InetSoft Webinar: Big Data – Hadoop or Data Warehouses?

This is the transcript of a podcast hosted by InetSoft on the topic of "Big Data – Hadoop or Data Warehouses?" The speaker is Abhishek Gupta, Product Manager at InetSoft.

Well, I think we should start with the definition: what is big data? First off, there has always been big data in business, but now it’s getting a little more attention. Big data is any dataset, any single repository of data, that exceeds the capability of most analytic tools to use it. In other words, it gets beyond the common business intelligence tool sets, beyond the popular analysis software solutions that we are all familiar with.

Another way of saying it is: whatever BI tool we have, this big data will break it. This big data makes it hard to use that reporting tool successfully. And usually you see that in very slow response times. You try to use a data query tool, and it takes 18 hours instead of 30 minutes. That is certainly something that slows you down and makes your life difficult.

Where does big data come from? Where do you see big data? Well, it has certainly been around for years. If we look back just 10 or 20 years, it was really all the transaction details and the call detail records. And these applications are still producing huge datasets and huge amounts of data for some of our clients.


Nowadays we’re seeing new forms of data, and this is why we have the new term big data. A good example is social media. Twitter and Facebook and all the blogs in the world, all of these things are creating huge amounts of data, structured and unstructured. Imagine pulling all this data in. Can we analyze it? Can we use it for some competitive advantage?

Another good example is sensor data. You see companies like disk manufacturers or airplane manufacturers or silicon manufacturers spitting off ten terabytes of data every hour from their manufacturing processes. And after you look at the sensor data for a little while, you begin to see ways to improve the quality of the product and find defects before there are disasters, either financial or physical.

There are other kinds of new data coming in. I think we are all familiar with healthcare data showing up: the x-rays and the MRIs and, of course, all the clinical trial data. Some of the trials in the pharmaceutical industry can generate a terabyte an hour. They need to study a new drug and figure out whether it will have bad side effects and what it will do to help the patient.

So now that we have computers everywhere, we’re seeing new kinds of data, and it has become a tsunami. It’s wonderful for the BI vendors and for the customers who can use their BI solutions. It’s not always wonderful, though, because the explosion of data can be mind-boggling.

As a business intelligence vendor, we’re obviously having to face it on multiple fronts. We’re doing what our customers ask of us, because that sets our priorities. Some of our customers are leading us into social analytics, trying to figure out how we can analyze the data from Twitter and report on the data from blogs, newspapers, and various other sources. We can put together an analytic environment to analyze the sentiment, behaviors, and feelings of people.

The idea is to respond to them in the marketplace. If they don’t like something, let’s fix it. If they do like something, let’s make a bigger advertising campaign. We’re also working with tools like MapReduce and Hadoop, which are being adopted quickly by the e-commerce companies that are doing a lot of clickstream analysis. Some of these clickstreams spin off a terabyte or two of data every day.

If you save all that data, it doesn’t take many days or months before you’ve got a fairly good-sized database. So ultimately these people are looking at the clickstreams, trying to find out what people are clicking on, what they want, and why they gave up on their shopping cart and walked away. And from this you can distill some meaningful information that says, let’s make our website easier to use.
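As a rough illustration of the shopping cart question above, here is a minimal sketch in plain Python of how a MapReduce-style job might classify sessions from a clickstream. The event names (`view_product`, `add_to_cart`, `checkout`), the session IDs, and all function names are hypothetical and chosen only for this example; a real job would run on a framework such as Hadoop over far larger data.

```python
from collections import defaultdict

# Hypothetical click events: (session_id, action). Illustrative data only.
CLICKS = [
    ("s1", "view_product"),
    ("s1", "add_to_cart"),
    ("s1", "checkout"),
    ("s2", "view_product"),
    ("s2", "add_to_cart"),
    ("s3", "view_product"),
]


def map_phase(events):
    """Emit one (session_id, action) key-value pair per click."""
    for session_id, action in events:
        yield session_id, action


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(session_id, actions):
    """Classify a single session from the actions seen in it."""
    if "add_to_cart" not in actions:
        return session_id, "browsed"
    if "checkout" in actions:
        return session_id, "purchased"
    return session_id, "abandoned"


def classify_sessions(events):
    """Run map, shuffle, and reduce in sequence on one machine."""
    grouped = shuffle(map_phase(events))
    return dict(reduce_phase(key, values) for key, values in grouped.items())


def abandonment_rate(outcomes):
    """Share of sessions with a cart that never reached checkout."""
    carts = [o for o in outcomes.values() if o != "browsed"]
    return sum(o == "abandoned" for o in carts) / len(carts) if carts else 0.0


if __name__ == "__main__":
    outcomes = classify_sessions(CLICKS)
    print(outcomes)
    print(abandonment_rate(outcomes))
```

Because each session is reduced independently of the others, the reduce step is the part a framework could spread across many commodity machines.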


Let’s make our interaction with our consumer personal. Let’s find ways to help them buy what they want to buy and give them the information they want to have. So this is another example of the kinds of data. Back to the manufacturers who have already been using InetSoft, one of the interesting areas is real-time BI. Some of them are taking data right off the manufacturing line and feeding it straight into the InetSoft system, and within minutes of items being produced, they know whether the quality of the product is meeting the minimum requirements and whether it’s coming closer to maximum quality.

So we’re approaching this in a lot of ways, and we’ve been integrating with MapReduce and Hadoop. Speaking of MapReduce, there are a lot of questions out in the marketplace surrounding it. Is it better to use MapReduce or a data warehouse for big data?

That's a good question that a lot of people are struggling with, and there is a lot of religious fervor on both sides. Should we use commodity hardware with parallel MapReduce for analytics, or should we use a traditional data warehouse on high-end hardware with relational tools and relational capabilities? The answer is simple: if you had a screwdriver, you could pound in a nail with it, but you might choose a different tool that would be more effective.

So MapReduce and Hadoop are two sides of the same coin. They provide process-oriented parallelism, and they use a lot of procedural languages. In contrast, relational databases have parallelism built into the data, and they handle the SQL language. So you use the correct tool for the job. Ultimately, these are very complementary. Now, they do overlap some. You can run reports with MapReduce, and a lot of people do that.


You can do data mining with MapReduce, and a lot of people do that. There are some forms of data mining that work best in Hadoop and don't work as well in the SQL data warehouse. In contrast, you can take a tool like InetSoft or some other analytics solution, hook it up to the data warehouse, and speed up analyses enormously. And so they become very effective and valuable.
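To make the overlap concrete, here is a small sketch that computes the same report two ways: once in a MapReduce style written as procedural Python, and once as a SQL query. The sales rows, the table name `sales`, and the function names are assumptions made up for this illustration. SQLite stands in for a data warehouse purely for convenience and says nothing about how a parallel warehouse executes the query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales rows: (region, amount). Illustrative data only.
SALES = [("east", 100), ("west", 250), ("east", 50), ("west", 25), ("north", 10)]


def totals_mapreduce(rows):
    """Procedural style: the programmer spells out the map and reduce steps."""
    # Map: emit a (region, amount) pair for every row.
    pairs = ((region, amount) for region, amount in rows)
    # Shuffle and reduce: sum the amounts that share a key.
    totals = defaultdict(int)
    for region, amount in pairs:
        totals[region] += amount
    return dict(totals)


def totals_sql(rows):
    """Declarative style: state the result wanted and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_mapreduce(SALES))
    print(totals_sql(SALES))
```

Both functions return the same totals per region. The difference is where the work is described: in the first, the processing steps are written by hand, while in the second, the query states the result and the database decides how to compute it.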

The consumer of these BI tools, the enterprise customer, has a decision to make: what is the best place to run the analytics for each one of these tools, each one of these functions for their business? It’s never just a simple do-it-one-way answer. But what we’ve decided is that these two tools should work together. They should exchange data. They should interact and make it easy for the customer to make that choice and change their mind when they want to. We think they should be complementary, and they should work together.