About the MapReduce Technology Built into InetSoft's Business Intelligence Platform

This is the continuation of the transcript of a product demonstration of InetSoft's BI software for dashboards, reporting, and mashups. The presenter is Lauri King, Product Manager.

Lauri King (LK): All the actual data is coming from the atomic sources; we are not copying or moving the data anywhere in a persistent way. The data access engine is just live querying and uses a layer of caching for optimization purposes. The other aspect, which I will introduce now, is something that we brought into the product with the last version. It’s called Data Grid Cache. It’s a kind of MapReduce technology built into our business intelligence platform.

So essentially there are two technologies that we leverage to address a couple of concerns about data mashup. Typically people are very happy with data mashup, especially if they are starting a new project. It means they don’t have to go through the hassle of writing ETL scripts. They don’t have to create a data warehouse. They can really just grab the data where it is now, perform whatever manipulations they need, and then immediately build the dashboard.


Data Security Solution for Business Intelligence

But then the primary questions are, okay, what about security, and what about performance? We address security at the individual data block layer. As these individual sources are defined by IT, security is applied at that point. Remember that we were generating the SQL that gets run, and that SQL goes through the security layer. Based on my roles, my group memberships, and my permissions, it will automatically add a where clause, automatically include some row filtering, and automatically hide some columns, so that I only see the data that I am allowed to.
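The kind of role-based query rewriting described here can be sketched in a few lines. This is an illustrative sketch only, not InetSoft's implementation; the names (`User`, `secure_query`, `ROW_FILTERS`, `HIDDEN_COLUMNS`) are assumed for the example.

```python
# Sketch of role-based SQL rewriting: per-role row filters become a WHERE
# clause, and forbidden columns are dropped from the SELECT list.
from dataclasses import dataclass, field

# Rules an administrator might define per role (illustrative values).
ROW_FILTERS = {
    "sales_east": "region = 'EAST'",
    "sales_west": "region = 'WEST'",
}
HIDDEN_COLUMNS = {
    "analyst": {"salary", "ssn"},
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def secure_query(user, columns, table):
    """Rewrite a query so the user only sees permitted rows and columns."""
    # Hide any column forbidden by one of the user's roles.
    hidden = set().union(*(HIDDEN_COLUMNS.get(r, set()) for r in user.roles))
    visible = [c for c in columns if c not in hidden]
    sql = f"SELECT {', '.join(visible)} FROM {table}"
    # Append a WHERE clause combining the row filters for the user's roles.
    filters = [ROW_FILTERS[r] for r in sorted(user.roles) if r in ROW_FILTERS]
    if filters:
        sql += " WHERE " + " OR ".join(f"({f})" for f in filters)
    return sql

u = User("pat", {"analyst", "sales_east"})
print(secure_query(u, ["region", "revenue", "salary"], "orders"))
# → SELECT region, revenue FROM orders WHERE (region = 'EAST')
```

Because the rewrite happens before the SQL reaches the database, the user's own query text never has to mention, or even know about, the restrictions.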

By defining permissions at that atomic data source level, I can give the users a sandbox to play in. They can feel free to do whatever they want, and IT will never have to be concerned that somehow they are able to see something that they shouldn’t. That’s the data security solution.

Now for performance, we have several layers of caching in addition to caching individual queries. We have this technology, Data Grid Cache. It builds on column-based database techniques, but instead of following the in-memory approach, which requires a single server with a massive amount of RAM, we have taken an approach that uses disk plus intelligent indexes to query very quickly.
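The basic idea of a disk-based column store with lightweight indexes can be illustrated with a toy example. This is an assumed sketch, not InetSoft's implementation: each column is stored as a packed array of blocks, and a simple min/max index per block lets a query skip blocks that cannot possibly match.

```python
# Toy column-oriented store: one column, split into blocks, with a
# min/max index per block so scans can skip non-matching blocks.
from array import array

class ColumnBlock:
    def __init__(self, values):
        self.values = array("d", values)  # packed column data (on disk in practice)
        self.min = min(values)            # lightweight block index
        self.max = max(values)

def range_sum(blocks, lo, hi):
    """Sum values in [lo, hi], skipping blocks the index rules out."""
    total = 0.0
    for b in blocks:
        if b.max < lo or b.min > hi:
            continue                      # index says: no value here can match
        total += sum(v for v in b.values if lo <= v <= hi)
    return total

blocks = [ColumnBlock([1, 2, 3]), ColumnBlock([100, 200]), ColumnBlock([5, 6])]
print(range_sum(blocks, 2, 6))            # → 16.0 (2 + 3 + 5 + 6)
```

Storing columns separately means a query touching two columns of a wide table reads only those two columns from disk, and the per-block index turns many scans into skips.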


And we are using a MapReduce technique, so you can spread this processing across a cluster of machines and execute queries in parallel. So if you have many gigabytes or terabytes of data, like AT&T, one of our customers using this, you can have a cluster of machines, with the data persisted in these column-based database chunks spread across those different machines.

An individual query executes in parallel against each of those data blocks and then delivers the results back into our BI front-end. So that layer of complexity is hidden from the user. They don’t see any difference whether a query is running live against the atomic sources, hitting the local cache, or running in the remote Data Grid Cache. The end result is exactly the same.
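The map/reduce pattern described above can be sketched with standard-library tools. This is a minimal illustration under assumed names, not InetSoft's code: each "data block" stands in for a chunk on one cluster node, the map step computes a partial aggregate per block in parallel, and the reduce step merges the partials into the final answer.

```python
# Minimal map/reduce sketch: parallel per-block aggregation, then a merge.
from concurrent.futures import ThreadPoolExecutor

# Each block stands in for a column-store chunk on one cluster machine.
data_blocks = [
    [("EAST", 10), ("WEST", 5)],
    [("EAST", 7)],
    [("WEST", 3), ("EAST", 1)],
]

def map_block(block):
    """Map step: partial revenue-per-region aggregate for one block."""
    partial = {}
    for region, amount in block:
        partial[region] = partial.get(region, 0) + amount
    return partial

def reduce_partials(partials):
    """Reduce step: merge the per-block partial aggregates."""
    result = {}
    for partial in partials:
        for region, amount in partial.items():
            result[region] = result.get(region, 0) + amount
    return result

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_block, data_blocks))  # blocks in parallel
print(reduce_partials(partials))   # → {'EAST': 18, 'WEST': 8}
```

Because each block is aggregated independently, the same query scales by adding machines, and the front-end only ever sees the merged result.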

Advantages of Data Caching for BI Applications

Business intelligence (BI) applications empower users to explore vast datasets and uncover valuable insights. However, traditional data retrieval methods can lead to sluggish performance, especially when dealing with large datasets or frequent queries. This is where data caching comes in – a powerful technique that can significantly enhance the user experience of a BI application.

Turbocharged Performance: One of the most compelling advantages of data caching is its ability to dramatically improve query response times. By storing frequently accessed data or pre-computed query results in a high-speed cache, the BI application can bypass the need to constantly access the underlying data source. This translates to near-instantaneous responses for users, allowing them to explore data and generate reports much faster.
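The performance gain from result caching comes down to replacing a slow data-source round trip with a dictionary lookup. The following is a generic sketch (assumed names, not any specific BI product's API) of caching a query result keyed by its query text:

```python
# Minimal query-result cache: the first call pays the data-source cost,
# repeat calls for the same query are served from memory.
import time

_cache = {}

def run_query(sql):
    """Stand-in for hitting the underlying data source (slow)."""
    time.sleep(0.1)                     # simulated source latency
    return f"rows for: {sql}"

def cached_query(sql):
    if sql not in _cache:
        _cache[sql] = run_query(sql)    # cache miss: query the source
    return _cache[sql]                  # cache hit: near-instantaneous

t0 = time.perf_counter(); cached_query("SELECT * FROM sales"); cold = time.perf_counter() - t0
t0 = time.perf_counter(); cached_query("SELECT * FROM sales"); warm = time.perf_counter() - t0
print(warm < cold)                      # → True
```

Real BI caches add eviction and invalidation on top of this, but the speedup mechanism is exactly this miss-then-hit pattern.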

Reduced Load on Data Sources: Continuously querying a central database can put a strain on its resources, especially during peak usage times. Data caching acts as a buffer, reducing the number of requests reaching the main database. This not only improves the overall performance of the BI application but also helps maintain the health and stability of the data source.

Enhanced Scalability: As the volume of data and user base grows, a BI application without caching can struggle to keep up. Caching enables a BI system to scale more effectively. By keeping frequently accessed data readily available, the system can handle increased user traffic without compromising performance.

Improved User Experience: Fast and responsive BI applications translate to happier users. With data caching, users can spend less time waiting for queries to process and more time analyzing data and deriving insights. This fosters a more productive and data-driven work environment.

Important Considerations for Effective Data Caching

While data caching offers numerous benefits, it's crucial to implement it strategically. Here are some key considerations:

  • Cache Invalidation: Data in the real world changes constantly. It's essential to have mechanisms in place to ensure the cached data remains up-to-date. Strategies like time-based expiration or invalidation upon data source updates are crucial.
  • Cache Size Optimization: A cache that's too small won't provide significant benefits, while a cache that's too large can consume excessive resources. Finding the optimal cache size depends on factors like data access patterns and available memory.
  • Data Selection: Not all data needs to be cached. Identifying frequently accessed datasets and pre-computing commonly used queries will yield the most significant performance improvements.
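The first consideration above, cache invalidation, can be made concrete with a small sketch combining the two strategies mentioned: time-based expiration and explicit invalidation when the data source changes. The class and method names here are illustrative assumptions, not a specific product's API.

```python
# TTL cache sketch: entries expire after ttl_seconds, and can also be
# explicitly invalidated when the underlying data source is updated.
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}                       # key -> (value, stored_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self.store[key]               # time-based expiration
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic())

    def invalidate(self, key):
        self.store.pop(key, None)             # invalidation on source update

cache = TTLCache(ttl_seconds=0.05)
cache.put("q1", [1, 2, 3])
print(cache.get("q1"))        # → [1, 2, 3]
time.sleep(0.06)
print(cache.get("q1"))        # → None  (expired)
```

Choosing the TTL is itself a trade-off: a short TTL keeps data fresh at the cost of more misses, while a long TTL maximizes hit rate but risks serving stale results between source updates.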