InetSoft Webinar: That’s Absolutely a Big Data Tool

This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "The Newest Buzz Word in BI: Big Data." The speaker is Abhishek Gupta, product manager at InetSoft.

That’s absolutely a Big Data tool. As for how it contrasts with Hadoop, first of all, Hadoop is typically based on commodity everything: cheap servers and cheap disks, all tied together in a cluster. PDW is not commodity. It’s high-end storage, high-end networking, high-end CPU, and because it’s an appliance, the amount of processing power you have is, by definition, finite.

With Hadoop you can keep adding nodes to the cluster if you want. So they’re not exactly the same, but MapReduce inside of Hadoop and the MPP inside of SQL Server Parallel Data Warehouse edition are both distributed processing approaches for dealing with large amounts of data, and PDW is definitely a Big Data tool.
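
To make the map and reduce idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop code. The function names, the sample log lines, and the idea of counting requests by status code are all assumptions chosen for illustration; on a real cluster each partition would sit on a different node and the map step would run on all of them in parallel.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (key, 1) pair per log line, keyed by the last field (status code)."""
    status = record.split()[-1]
    yield status, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(partitions):
    """Run map over every partition, then shuffle, then reduce each key."""
    pairs = []
    for partition in partitions:  # hypothetical: one partition per cluster node
        for record in partition:
            pairs.extend(map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample data, split into two partitions.
PARTITIONS = [
    ["GET /index.html 200", "GET /missing 404"],
    ["GET /index.html 200", "POST /login 500", "GET /about 200"],
]

if __name__ == "__main__":
    print(run_job(PARTITIONS))
```

Because the map step looks at one record at a time and the reduce step looks at one key at a time, adding more partitions (more nodes) does not change the logic, which is why this style of processing scales out.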

What tools do we have? What does Microsoft offer, and what open source tools are there? Big Data is very popular in the open source world also. So what tools do we have at our disposal for both creating the Big Data and also analyzing and getting those results out of Big Data implementations?

First, about Microsoft. Inside Microsoft, the research team was building its own sort of Big Data technology that effectively would have competed with Hadoop, and the decision was made to put a stop to that and to say, look, open source may not be their default approach to markets, but in this market, in the Big Data market, it’s all about Hadoop.

So they realized they should really be working with Hadoop as well. So Microsoft has teamed up with a company called Hortonworks, which is one of the more prominent companies in the Big Data world, and they’ve built their own distribution of Hadoop that runs on Windows. It runs both on Windows servers, and perhaps more importantly, on Windows Azure.

Hadoop is a whole ecosystem, so there are things that sit on top of Hadoop that you’ll find a lot. There is something called Hive, which in effect is a SQL abstraction over Hadoop, so that you can write SQL queries instead of Java code for MapReduce jobs.
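
As a rough illustration of why a SQL layer is attractive, the sketch below counts web requests by status code with a single GROUP BY query. It uses Python's built-in sqlite3 module purely as a stand-in, since it needs no cluster; it is not Hive. The table name, columns, and sample rows are assumptions. With Hive, a query of this general shape would be translated into MapReduce jobs for you instead of being hand-written in Java.

```python
import sqlite3


def count_by_status(rows):
    """Load (path, status) rows into an in-memory table and count them per status."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive
    conn.execute("CREATE TABLE requests (path TEXT, status INTEGER)")
    conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)
    # The declarative query replaces a hand-written map step and reduce step.
    query = (
        "SELECT status, COUNT(*) FROM requests "
        "GROUP BY status ORDER BY status"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data.
ROWS = [("/index.html", 200), ("/missing", 404), ("/index.html", 200)]

if __name__ == "__main__":
    print(count_by_status(ROWS))
```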

There is also something called Pig, which is a data transformation language. There’s something called Mahout, M-A-H-O-U-T, for doing predictive analytics with Hadoop, and there are a few other pieces as well, and Microsoft’s distribution has all of those pieces in it. So those are all open source projects. You can tell because they have silly names. That’s usually a clue that you’re in the world of open source, and Microsoft implemented all of those.

Now it’s still in private beta, but I expect it to be generally released very soon, and for the private beta you can request an invitation. You can just go to hadooponazure.com and request an invite. I think it can take up to a week before there is a response, but you can get in. And that’s the set of open source tools available to you if you understand the Windows world, but if you don’t, there is a Hadoop implementation also included on Amazon Web Services, and of course, you could install it on your own cluster of servers.

In terms of making the Big Data, as it were, you’re already going to have it. If you’re going to be doing Big Data analysis, those data sources are going to be obvious, too. You may be working on log files in a web context.

You may be working on sensor data in a supply chain or manufacturing context. You shouldn’t have to make the data. The whole issue typically is that something is already making the data that you want to study.
