In Memory Business Intelligence Tools
InetSoft's BI software employs a combination of in-memory database technology and disk-based access to scale up for big data applications using commodity-priced servers. InetSoft's proprietary term for this approach is 'Data Grid Cache.'
Optimized, compressed indexes are loaded into memory, while the data can either remain on disk or be loaded into memory in chunks, depending on the available memory and the data needed for a given dashboard or visualization.
The data can either be accessed in real-time from a data warehouse, operational data store, or a mashup of several sources, or it can be configured to be cached on disk by InetSoft's Style Intelligence server application at specified time intervals. Incremental updates can be added on a scheduled basis or on demand, and cache re-writes can be scheduled for off-peak times.
This approach offers maximum flexibility and leaves the choice up to the enterprise. Data timeliness and performance requirements vary from case to case, and InetSoft provides the agility for all cases.
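The scheduled caching and incremental updates described above can be sketched as follows. This is a minimal illustration only; the class, method names, and refresh logic are hypothetical and do not reflect InetSoft's actual scheduler.

```python
# Hypothetical sketch of interval-based cache refresh with incremental
# updates: only rows added since the last refresh are pulled in, and a
# refresh runs only when the configured interval has elapsed.
import time

class DiskCache:
    def __init__(self, refresh_seconds):
        self.refresh_seconds = refresh_seconds
        self.rows = []              # cached rows
        self.last_refresh = 0.0     # timestamp of last refresh
        self.last_seen_id = 0       # high-water mark for incremental pulls

    def refresh_incremental(self, fetch_rows_since, now=None):
        # Skip the refresh entirely if it is not due yet.
        now = time.time() if now is None else now
        if now - self.last_refresh < self.refresh_seconds:
            return 0
        # Pull only rows newer than the high-water mark.
        new_rows = fetch_rows_since(self.last_seen_id)
        self.rows.extend(new_rows)
        if new_rows:
            self.last_seen_id = new_rows[-1][0]
        self.last_refresh = now
        return len(new_rows)

source = [(1, "a"), (2, "b"), (3, "c")]
fetch = lambda since: [r for r in source if r[0] > since]

cache = DiskCache(refresh_seconds=3600)
print(cache.refresh_incremental(fetch, now=3600))  # 3 rows loaded
print(cache.refresh_incremental(fetch, now=3700))  # 0 (not due yet)
```

Scheduling the refresh for an off-peak `now` is simply a matter of when the caller invokes it, which is the flexibility the text describes.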
View from the BI Verdict
"There’s one obvious reason for in-memory BI – it’s extremely fast, which is a big plus for users. The biggest performance bottleneck in typical BI applications is slow disk or even slower database access, which is hundreds of times slower than RAM access. Of course, disk access is not the only bottleneck, so in-memory tools are not hundreds of times faster overall than disk-based tools.
But that performance triggers a virtuous cycle: if an application is intrinsically very fast by default, it doesn’t need complicated data structures to further optimize performance. Even a simple, inefficient RAM-based application is likely to be at least an order of magnitude faster than even a very well optimized disk-based application. That saves time and effort (which translates into reduced consulting costs) when developing applications, and makes it easier for in-memory applications to re-structure dynamically or recalculate on-the-fly. This ease of development and flexibility is perhaps the biggest advantage of in-memory BI.
However, in-memory tools, though simpler to build than disk-based tools, do need to hold data as compactly as possible. It is not enough simply to load disk structures into memory, as these tend to store data rather inefficiently. The best in-memory BI tools can take an order of magnitude less space for the data than it would occupy in an RDBMS.
In-memory applications have always been very fast, of course, but two more recent developments help in-memory BI: the plunging cost of memory chips and 64-bit computers. One less obvious trend that helps in-memory BI is that modern computers crash much less often than in the past, so there’s little risk of randomly losing all the work performed in a session.
At first glance, 64-bit computing sounds like the most important breakthrough, as it allows access to far more RAM than the two or four GB accessible on 32-bit systems. However, it has always been possible to access more memory than the operating system can address, using techniques like bank switching.
Back in the days of 16-bit computers, these techniques allowed applications to access more than the maximum 640 KB of conventional memory. For example, in 1988, Lotus, Intel and Microsoft created the Expanded Memory Specification (LIM EMS), which allowed 64 KB chunks of application data to be paged into the reserved upper memory space – it may not sound like much today, but if most of the conventional memory was already filled with programs, an extra 64 KB could double the space available for application data (such as Lotus 1-2-3 spreadsheets).
Microsoft and Digital Research then added more features to their MS-DOS and DR-DOS operating systems that allowed extended memory to be used beyond the 1 MB addressable by 16-bit processors, for example to hold TSR programs and DOS components or as a high speed RAM disk.
Thus, long before 32-bit computers became established, 16-bit DOS computers routinely used more than the supposed upper limit of 1 MB of RAM. Exactly the same would have happened with 32-bit computers and their 4GB limit (once thought unimaginably high), but 64-bit processors became widely available at affordable prices early enough for these techniques not to be re-invented.
So the real breakthrough is the plunging cost of RAM, for which the BI industry can take no credit. It is the Asian semiconductor industry’s sustained heavy investment in advanced chip foundries that has made in-memory applications of all types practical and affordable. And that investment was not even driven by the needs of the BI industry, but consumer demand for electronic devices with ever more RAM. So today’s in-memory BI wave is thanks to demand in the consumer hardware market, rather than innovations in the business software industry."
What is Data Grid Cache?
Data Grid Cache is InetSoft's proprietary data querying technology. It is a columnar data store that uses in-memory technology to enable highly scalable big data analytics on a real time map-reduce (Hadoop-like) architecture.
In other words, it is a data accelerator: it stores data in columnar format and uses in-memory technology to speed up data processing, while the map-reduce data cluster provides horizontal scalability.
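To make the columnar idea concrete, here is a minimal sketch of a column-oriented in-memory store. The class and method names are invented for illustration and are not InetSoft's API; the point is only that a filtered aggregation touches just the two columns involved, never whole rows.

```python
# Minimal columnar in-memory store (illustrative only).
class ColumnStore:
    def __init__(self):
        self.columns = {}  # column name -> list of values

    def load(self, rows, schema):
        # Pivot row-oriented input into one array per column.
        for name in schema:
            self.columns[name] = [row[name] for row in rows]

    def sum_where(self, measure, dim, value):
        # Scan only the dimension and measure columns, not whole rows.
        dim_col = self.columns[dim]
        m_col = self.columns[measure]
        return sum(m for d, m in zip(dim_col, m_col) if d == value)

store = ColumnStore()
store.load(
    [{"region": "East", "sales": 100},
     {"region": "West", "sales": 250},
     {"region": "East", "sales": 50}],
    schema=["region", "sales"],
)
print(store.sum_where("sales", "region", "East"))  # 150
```

Because analytic queries typically aggregate a few columns over many rows, this layout reads far less data per query than a row store, which is the core of the acceleration.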
The data grid cache is optionally deployed when performance requirements call for it – to support big data, massive concurrency, or high reliability, or simply to avoid overtaxing the operational data stores.
Inspired by Hadoop and MapReduce
The data grid cache solution was inspired by both Hadoop and MapReduce. The indexes for querying the optimized data cache are column-based, the approach used by MapReduce. The data cache can be distributed over multiple servers to spread the processing load and provide failover, both key features of Hadoop technology.
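The map-reduce pattern referred to above can be sketched in a few lines: each node computes partial aggregates over its own chunk (the "map" step), and the partials are merged into the final answer (the "reduce" step). This is a generic illustration of the pattern, not InetSoft's implementation.

```python
# Map-reduce style aggregation over data chunks, mimicking how a query
# can be spread across cluster nodes.
from collections import Counter

def map_chunk(chunk):
    # "Map": each node computes partial sums for its chunk of rows.
    partial = Counter()
    for region, sales in chunk:
        partial[region] += sales
    return partial

def reduce_partials(partials):
    # "Reduce": merge the per-node partial sums into the final result.
    total = Counter()
    for p in partials:
        total.update(p)   # Counter.update adds counts together
    return dict(total)

rows = [("East", 100), ("West", 250), ("East", 50), ("West", 10)]
chunks = [rows[0:2], rows[2:4]]  # simulate two nodes
result = reduce_partials(map_chunk(c) for c in chunks)
print(result)  # {'East': 150, 'West': 260}
```

Because the map step is independent per chunk, adding nodes spreads the processing load, and replicated chunks let another node take over a failed node's map work – the two Hadoop-style properties the text mentions.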
How Does Data Grid Cache Employ In-Memory Database Technology?
A data grid cache intelligently queries and saves the data necessary to support a given dashboard or data visualization, including all filtering and drill-down levels. When a cluster of commodity-priced servers is used to run a data grid cache, the data file is automatically split into chunks and distributed to the nodes.
One data chunk may be copied to multiple nodes, so the cluster can continue operating in most cases even if some nodes fail. On each node, the dimensions are always loaded into memory when executing a query, while measures are loaded into memory in chunks.
If available memory allows, the entire measure column is loaded into memory as well. The advantage of not always loading measures into memory is greater speed for most interactive operations. Often, a user's interaction initiates a query that merely filters the rows and calculates the measure on a small subset of them. Having the dimensions in memory allows the filtering to be done quickly; after the filtering, only the rows needed for the final calculation are loaded into memory.
If there is enough memory available to hold the data grid cache, the entire data cache will be loaded into memory for processing.
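The filter-then-fetch behavior described above can be sketched as follows: the dimension column stays in memory, the filter runs against it, and measure values are then read in chunks only for the surviving rows. The structure and names here are hypothetical, not InetSoft's implementation.

```python
# Sketch of filter-then-fetch: dimensions stay in memory; measure
# values are read in chunks only for rows that survive the filter.

regions = ["East", "West", "East", "West", "East"]  # dimension, in memory

def read_measure_rows(row_ids, chunk_size=2):
    # Stand-in for chunked disk reads of a measure column.
    measure_on_disk = [100, 250, 50, 10, 75]
    for i in range(0, len(row_ids), chunk_size):
        for r in row_ids[i:i + chunk_size]:
            yield measure_on_disk[r]

# 1. Filter quickly using the in-memory dimension column.
matching = [i for i, r in enumerate(regions) if r == "East"]
# 2. Load only the matching measure values, chunk by chunk.
total = sum(read_measure_rows(matching))
print(total)  # 225
```

When the filter is selective, only a small fraction of the measure column is ever pulled into memory, which is why interactive drill-downs stay fast without holding the whole dataset in RAM.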
Why is Data Grid Cache Better than Pure In-Memory BI?
With a pure in-memory BI solution, the size of the database you can analyze or report on is limited by the amount of memory in the BI server. Yes, memory costs continue to fall, but terabyte-memory machines still cost multiples of gigabyte-memory machines. The data grid cache suffers from no such limit and can scale as performance, usage concurrency, and database size grow, simply by adding memory and/or commodity-priced servers to a cluster.
Therefore, a pure in-memory approach is not the best option for scalable, multi-user BI apps. 64-bit computing on column-based technologies can provide a better alternative for hefty OLAP projects.