Big Data - Its Definition and an Overview

This is the transcript of a podcast hosted by InetSoft on the topic of "Big Data: Its Definition and an Overview." The speaker is Mark Flaherty, CMO at InetSoft.

Let’s start with the definition of big data. Big data has been around for some time, but now it’s getting a little more attention. Big data is any dataset, any single chunk of data, that exceeds the capability of most tools to use it. In other words, it goes beyond the common database toolsets, beyond the things that we’re all familiar with and that are popular.

Whatever data access tool we have, big data will break it, or at least make it hard to use that tool successfully. You usually see that in long response times. For instance, you try to use a business intelligence tool, and it takes 18 hours instead of 30 minutes. That certainly slows you down and makes your life difficult.

Where does big data come from?

Certainly over the years it began with storing historical data. But even if we look back just 10 or 20 years, it was really all the transaction data and the call detail records. These types of data are producing huge datasets for some of our clients.

New Forms of Data

Nowadays we’re seeing new forms of data, and this is why we have the new term, big data. A good example is social media: going out to Twitter and Facebook and all the blogs in the world and pulling all this data in. Can we analyze it? Can we use it for some competitive advantage?

Another good example is sensor data. You see companies like disk manufacturers, airplane manufacturers, or silicon manufacturers spitting off ten terabytes of data every hour. After you look at the sensor data for a little while, you begin to see ways to improve the quality of the product and find defects before they become disastrous, either financially or physically.

There are also other kinds of new data coming in. I think we’re all familiar with them. We’ve seen health care data showing up: x-rays, MRIs, and, of course, all the clinical trial data. Some of the trials that pharmaceutical companies run can generate a terabyte an hour just studying a new drug, trying to figure out whether it will have bad side effects and what it will do to help the patient.

Now that we’ve got computers everywhere, we’re seeing new kinds of data, and it has become a data tsunami. It’s wonderful for the vendors and for the customers that can use it. But it’s not always wonderful, because the explosion of data can be mind-boggling.

Put It All Together in an Analytic Environment

When it comes to social analytics, companies are trying to figure out how they can capture the data from Twitter, capture the data from blogs and from newspapers and various sources. Then they have to put it all together in an analytic environment to analyze sentiment and behaviors and feelings of people in order to be able to respond to them in the marketplace. If they don’t like something, let’s fix it. If they do like something, let’s make a bigger advertising campaign around it.

Big data technologies like MapReduce and Hadoop are being adopted quickly by the e-commerce companies doing a lot of click stream analysis. Some of these click streams spin off a terabyte or two of data every day. If you save all that data, it doesn’t take very many days in the month before you’ve got a fairly good-sized database.

So ultimately these people are looking into click streams, trying to find out what people are clicking on, what they want, why they gave up on their shopping cart and walked away. And from all of this data, they are trying to distill out some meaningful information such as let’s make our website easier to use. Let’s make our interaction with our consumer personal. Let’s find ways to help them buy what they want to buy and give them information they want to have. So this is another example of the kinds of big data being analyzed.
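To make the click stream idea concrete, here is a minimal sketch, in plain Python rather than Hadoop itself, of the map, group, and reduce steps you might use to find sessions where someone put an item in the cart and walked away. The event format, the action names, and every function name here are assumptions made for illustration; they are not taken from any particular product or from the podcast.

```python
from collections import defaultdict

# Hypothetical click events as (session_id, action) pairs.
EVENTS = [
    ("s1", "view"), ("s1", "add_to_cart"), ("s1", "purchase"),
    ("s2", "view"), ("s2", "add_to_cart"),
    ("s3", "view"),
    ("s4", "add_to_cart"), ("s4", "view"),
]


def map_phase(events):
    """Emit one (session_id, action) pair per click event."""
    for session_id, action in events:
        yield session_id, action


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(actions):
    """Classify a single session from the list of its actions."""
    if "add_to_cart" not in actions:
        return "browsed_only"
    return "purchased" if "purchase" in actions else "abandoned_cart"


def abandoned_sessions(events):
    """Return the sorted ids of sessions that added to the cart but never purchased."""
    grouped = shuffle(map_phase(events))
    return sorted(
        session_id
        for session_id, actions in grouped.items()
        if reduce_phase(actions) == "abandoned_cart"
    )


print(abandoned_sessions(EVENTS))
```

On a real cluster the grouping step is handled by the framework and the data would be spread across many machines, but the shape of the logic stays the same.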

And, of course, the manufacturers have already been using big data analytics. One of the interesting areas there is when they do it in real time. Some of them are taking data right off of the manufacturing line and feeding it straight into a business intelligence system, and within minutes of an item being produced, they know if its quality is meeting the minimum requirements, and if it’s coming closer to maximum quality.
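As a rough illustration of that kind of real-time check, the sketch below scores each item as it comes off the line against a minimum threshold and tracks whether a rolling average is moving up. The class name, the idea of a single numeric quality score, and the window size are all assumptions for the example, not a description of any specific manufacturer’s system.

```python
from collections import deque


class QualityMonitor:
    """Hypothetical monitor for a stream of per-item quality scores."""

    def __init__(self, minimum, window=3):
        self.minimum = minimum              # lowest acceptable score
        self.recent = deque(maxlen=window)  # most recent scores only

    def average(self):
        """Rolling average of recent scores, or None before any item is seen."""
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)

    def record(self, score):
        """Record one item and report its status and the current trend."""
        previous_avg = self.average()
        self.recent.append(score)
        current_avg = self.average()
        return {
            "meets_minimum": score >= self.minimum,
            "rolling_avg": current_avg,
            "improving": previous_avg is not None and current_avg > previous_avg,
        }


monitor = QualityMonitor(minimum=0.8)
for score in (0.75, 0.85, 0.95):
    print(score, monitor.record(score))
```

A production system would read scores from the line’s sensors instead of a hard-coded tuple, and would likely use more than one measurement per item.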
