About the MapReduce Technology Built into InetSoft's Business Intelligence Platform

This is the continuation of the transcript of a product demonstration of InetSoft's BI software for dashboards, reporting and mashups. The presenter is Byron Igoe, Product Manager.

Byron Igoe (BI): All the actual data is coming from the atomic sources; we are not copying or moving the data anywhere in a persistent way. The data access engine simply queries the sources live and uses a layer of caching for optimization. The other aspect, which I will introduce now, is something that we brought into the product with the last version. It’s called Data Grid Cache, and it’s a kind of MapReduce technology built into our business intelligence platform.
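
To make the idea of live querying plus a caching layer concrete, here is a minimal Python sketch that assumes a simple time-to-live (TTL) policy. `QueryCache`, `run_query`, and `ttl_seconds` are hypothetical names for illustration, not part of InetSoft's product; the point is only that cached results expire, so the atomic source remains the system of record and nothing is persisted permanently.

```python
import time
from typing import Callable, Dict, List, Tuple

Row = Tuple  # a result row; its shape is left open in this sketch


class QueryCache:
    """Hypothetical TTL result cache placed in front of a live data source.

    Results are kept only for `ttl_seconds`; after that, the next request
    goes back to the live source, so nothing is persisted permanently.
    """

    def __init__(
        self,
        run_query: Callable[[str], List[Row]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query  # executes SQL against the live source
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List[Row]]] = {}

    def query(self, sql: str) -> List[Row]:
        now = self._clock()
        hit = self._entries.get(sql)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # cached result is still fresh
        rows = self._run_query(sql)  # cache miss or expired entry: query live
        self._entries[sql] = (now, rows)
        return rows
```

Keying the cache on the SQL text is the simplest choice. Under the assumption that different users can see different rows, a real engine would also need the user's security context in the key (or cache only after security filtering has been applied to the SQL).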

So essentially there are two technologies that we leverage to address a couple of common concerns about data mashup. Typically people are very happy with data mashup, especially if they are starting a new project. It means they don’t have to go through the hassle of writing ETL scripts, and they don’t have to create a data warehouse. They can just grab the data where it lives now, do whatever manipulations they need, and then immediately build the dashboard.

Data Security Solution for Business Intelligence

But then the primary questions are: what about security, and what about performance? We address security at the individual data block layer. As IT defines these individual sources, it applies security to them. Remember that we were generating the SQL that was run; that SQL goes through the security layer, and based on my roles, my group memberships, and my permissions, it automatically adds a WHERE clause, applies row filtering, and hides some columns, so that I only see the data that I am allowed to see.
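
The row and column filtering described above can be sketched as a query rewrite. This is a minimal illustration, not InetSoft's security layer: `Policy`, `secure_query`, and the role names are hypothetical, and the sketch assumes that when a user holds several roles, every role's filter applies and every role's hidden columns are removed (the most restrictive interpretation).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set, Tuple


@dataclass
class Policy:
    """Hypothetical per-role rule: a row filter plus columns to hide."""
    row_filter: str = ""  # SQL predicate with placeholders, e.g. "region = ?"
    params: Tuple = ()  # values bound to the placeholders in row_filter
    hidden_columns: Set[str] = field(default_factory=set)


def secure_query(
    table: str,
    columns: Sequence[str],
    roles: Sequence[str],
    policies: Dict[str, Policy],
) -> Tuple[str, List]:
    """Rewrite a plain column request into SQL that enforces each role's policy.

    Assumption: every matching role's filter applies (combined with AND) and
    every role's hidden columns are dropped, i.e. the most restrictive reading.
    """
    hidden: Set[str] = set()
    filters: List[str] = []
    params: List = []
    for role in roles:
        policy = policies.get(role)
        if policy is None:
            continue  # role carries no restrictions in this sketch
        hidden |= policy.hidden_columns
        if policy.row_filter:
            filters.append(f"({policy.row_filter})")
            params.extend(policy.params)
    visible = [c for c in columns if c not in hidden]
    if not visible:
        raise PermissionError("none of the requested columns are visible")
    sql = f"SELECT {', '.join(visible)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql, params
```

For example, a user in a hypothetical `sales_east` role whose policy is `Policy("region = ?", ("East",), {"salary"})` who asks for `id, region, salary` from `orders` gets `SELECT id, region FROM orders WHERE (region = ?)` with the bound value `"East"`. Filter values stay as bound parameters rather than being spliced into the string; table and column names are interpolated, so this sketch assumes they come from IT-defined metadata rather than user input. Whether multiple roles combine with AND or OR is a policy decision, and many systems grant the union of permissions instead.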

By defining permissions at that atomic data source level, I can give the users a sandbox to play in. They can feel free to do whatever they want, and IT will never have to be concerned that somehow they are able to see something that they shouldn’t. That’s the data security solution.

Now for performance, we have several layers of caching, including caching of individual queries. On top of that, we have this technology, Data Grid Cache. It uses column-based database techniques, but instead of following the in-memory approach, which requires a single server with a massive amount of RAM, we store the columnar data on disk and use intelligent indexes to query it very quickly.
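
The disk-plus-index approach can be illustrated with a toy column store: the values of one column are written to disk in fixed-size chunks, and a per-chunk min/max index (often called a zone map) lets a range query skip chunks that cannot contain a match. This is a sketch of the general technique under simplifying assumptions (one integer column, no compression), not how Data Grid Cache is actually built; `ColumnStore`, `ChunkMeta`, and `chunk_size` are hypothetical names.

```python
import array
import os
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkMeta:
    """Index entry for one on-disk chunk: where it lives and its value range."""
    path: str
    count: int
    lo: int
    hi: int


class ColumnStore:
    """Hypothetical single-column store: integer values are split into
    fixed-size chunks on disk, each with a min/max entry in an in-memory index."""

    def __init__(self, directory: str, values: List[int], chunk_size: int = 4) -> None:
        self.chunks: List[ChunkMeta] = []
        self.chunks_read = 0  # counts disk reads, to show the index at work
        for i in range(0, len(values), chunk_size):
            part = array.array("q", values[i:i + chunk_size])
            path = os.path.join(directory, f"chunk_{i // chunk_size}.bin")
            with open(path, "wb") as f:
                part.tofile(f)
            self.chunks.append(ChunkMeta(path, len(part), min(part), max(part)))

    def range_query(self, lo: int, hi: int) -> List[int]:
        """Return every stored value v with lo <= v <= hi."""
        out: List[int] = []
        for meta in self.chunks:
            if meta.hi < lo or meta.lo > hi:
                continue  # the index rules this chunk out: skip the disk read
            part = array.array("q")
            with open(meta.path, "rb") as f:
                part.fromfile(f, meta.count)
            self.chunks_read += 1
            out.extend(v for v in part if lo <= v <= hi)
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = ColumnStore(d, [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23])
        print(store.range_query(11, 12), store.chunks_read)  # [11, 12] 1
```

Only one of the three chunks is read from disk in the usage example, because the index shows that the other two cannot hold values between 11 and 12. Storing each column separately is what makes this cheap: a query touching a few columns reads only those columns' chunks.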

And we are using a MapReduce technique, so that you can spread this processing across a cluster of machines and execute queries in parallel. So if you have many gigabytes or even terabytes of data, you can do what AT&T, one of our customers using this, does: they have a cluster of machines, and their data is persisted in these column-based database chunks spread across those machines.

An individual query executes in parallel against each of those data blocks, and the results are then delivered back into our BI front-end. That layer of complexity is hidden from the user. They don’t see any difference whether a query is running live against the atomic sources, hitting the local cache, or running in the remote Data Grid Cache. The end result is exactly the same.
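
The parallel execution can be sketched as a MapReduce-style aggregation: each data block computes a partial result (map), and the partial results are merged (reduce) before being handed to the front-end. In this minimal sketch, threads stand in for cluster nodes and in-memory lists stand in for the column chunks held on each machine; the function names and the `(region, amount)` row shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical row shape: (region, amount)
Partial = Dict[str, Tuple[float, int]]  # region -> (sum, count)


def map_block(rows: List[Row]) -> Partial:
    """Map step: aggregate one data block into per-region (sum, count)."""
    partial: Partial = {}
    for region, amount in rows:
        s, n = partial.get(region, (0.0, 0))
        partial[region] = (s + amount, n + 1)
    return partial


def merge(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine two partial results into a new one."""
    out = dict(a)
    for region, (s, n) in b.items():
        s0, n0 = out.get(region, (0.0, 0))
        out[region] = (s0 + s, n0 + n)
    return out


def parallel_group_sum(blocks: List[List[Row]], workers: int = 4) -> Partial:
    """Run the map step on every block concurrently, then reduce the results.

    Threads stand in for cluster nodes here; on a real cluster each map
    would run on the machine that holds the block.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce(merge, partials, {})
```

The map step returns sums and counts rather than averages because partial averages cannot be merged correctly on their own; an average per region can be computed after the reduce step as sum divided by count. The caller sees only the merged result, which mirrors the point that the user cannot tell where the query actually ran.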
