What Big Data reporting tools are there? If we have all this data and we need to do analytics on it, what reporting tools can we use to get the stuff back out? Can we use SSRS, which is SQL Server Reporting Services?
The short answer to that one is yes. The key to all of these, and I want to put a big asterisk on this, because this is the approach we’re going to use today: I really think it’s kind of a stopgap measure, and it’s going to change. But I mentioned a thing called Hive that in effect creates a SQL abstraction layer over Hadoop. What Microsoft has done is create an ODBC driver for Hive, so in effect any ODBC client can talk to Hadoop via this ODBC driver and the Hive layer over Hadoop. That includes Reporting Services. That includes Integration Services.
It also includes Analysis Services, but only the new tabular mode. The original multidimensional mode of Analysis Services actually doesn’t really work with ODBC. It’ll only work with OLE DB data sources. I didn’t even know that until I tried to get it to work against Hadoop and discovered it. And then there’s Power View, which is Microsoft’s new analysis and data visualization product. It’s part of SQL Server, and it runs inside SharePoint 2010.
You can’t point that directly at Hadoop, but you can point it at a PowerPivot model or an Analysis Services tabular model that pulls data from Hadoop. So PowerPivot and most parts of the BI stack can get to it; the notable exception is the multidimensional mode of SQL Server Analysis Services.
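To make the ODBC path a little more concrete, here is a minimal sketch of what "any ODBC client" could look like from Python. It assumes the third-party `pyodbc` package and a Hive ODBC data source that has already been configured on the machine. The DSN name `HiveDSN`, the table `weblogs`, and the column `page` are hypothetical placeholders, not names from the talk.

```python
def build_connection_string(dsn: str) -> str:
    """Build an ODBC connection string that points at a preconfigured DSN."""
    return f"DSN={dsn}"


def build_report_query(table: str, group_col: str) -> str:
    """Build a simple HiveQL aggregation; Hive compiles it into Hadoop jobs."""
    return (
        f"SELECT {group_col}, COUNT(*) AS row_count "
        f"FROM {table} GROUP BY {group_col}"
    )


def run_hive_query(dsn: str, sql: str):
    """Run a query through the Hive ODBC driver and return all rows.

    Assumes pyodbc is installed and the DSN exists; both are assumptions
    of this sketch, so the import is deferred until the function is called.
    """
    import pyodbc  # third-party ODBC bridge

    # Hive has no conventional transactions, so autocommit is switched on.
    conn = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical usage, left commented out because it needs a live Hive DSN:
# rows = run_hive_query("HiveDSN", build_report_query("weblogs", "page"))
```

A reporting tool such as Reporting Services would be doing roughly the same thing behind the scenes: sending a SQL-style query through the ODBC driver and letting Hive translate it into work on the Hadoop cluster.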
Stepping back for a moment now, Big Data came into my life really from reading about it. I was aware of it, and I explored it, and it turned out to be the next face of BI. Technologically the worlds are a little bit different, and I would say even culturally they’re a little different. A lot of the practitioners in the Big Data world have never really done work with conventional BI technology, and vice versa.
Of course, there are exceptions to that. I would call myself one of those, but really, you know, it’s all about the same stuff. It’s about taking data that may be transactional and was captured operationally for tactical purposes, and doing analysis and aggregation on top of it for strategic purposes. The difference is in the granularity of the data, and typically in Big Data scenarios the granularity is a lot finer, so you’re getting a lot more data.
I spoke with someone at Teradata, which is probably the most notable name in massively parallel processing data warehouse appliances, the category that SQL Server Parallel Data Warehouse edition is in. He explained it in a way that I thought really made a lot of sense, which is that in BI typically what we’re aggregating over is our transactions, and with Big Data typically what we’re aggregating over is our interactions.
So think of an e-commerce kind of scenario. In BI maybe we’re just doing analysis over all the purchases that were made by customers and the things about those purchases, like what items, and all the attributes you can think of: what time of day, how they got to the site, that sort of thing. But then imagine, instead of only doing analysis on the purchases, what if you did analysis on every click in the web session that led to that purchase, and various cross-references between those things at that small, granular level? That would be more of a Big Data scenario.
So again, I mean, you’re working analytically and thinking the same way, just at a different scale, and I think you’ve got a sense of how the two are tied together.
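The transactions-versus-interactions distinction can be sketched with a toy example. Everything below is invented for illustration: the records, the field names, and the helper functions are hypothetical, and a real clickstream would be far too large to hold in a Python list. The point is only that the same kind of aggregation runs at two different grains.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: one row per completed transaction (BI grain).
purchases = [
    {"session": "s1", "item": "book", "amount": 12.50},
    {"session": "s2", "item": "lamp", "amount": 30.00},
]

# Hypothetical clickstream: one row per click (Big Data grain).
clicks = [
    {"session": "s1", "page": "home"},
    {"session": "s1", "page": "search"},
    {"session": "s1", "page": "book"},
    {"session": "s1", "page": "checkout"},
    {"session": "s2", "page": "home"},
    {"session": "s2", "page": "lamp"},
    {"session": "s2", "page": "checkout"},
    {"session": "s3", "page": "home"},
    {"session": "s3", "page": "search"},
]


def revenue_by_item(purchase_rows):
    """Aggregate over transactions: total revenue per item."""
    totals = defaultdict(float)
    for row in purchase_rows:
        totals[row["item"]] += row["amount"]
    return dict(totals)


def clicks_in_purchase_sessions(purchase_rows, click_rows):
    """Aggregate over interactions: clicks per session that ended in a purchase."""
    buying_sessions = {row["session"] for row in purchase_rows}
    counts = Counter(
        row["session"] for row in click_rows if row["session"] in buying_sessions
    )
    return dict(counts)


print(revenue_by_item(purchases))
print(clicks_in_purchase_sessions(purchases, clicks))
```

Even in this tiny sample there are nine click rows for two purchase rows, and session `s3` never shows up in the purchase data at all. That is the kind of detail that only exists at the interaction level.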