Big Data Is Not Only For Unstructured Data

This is the transcript of a Webinar hosted by InetSoft on the topic of "What is Big Data and What isn't?" The speaker is Abhishek Gupta, product manager at InetSoft.

Hi, and welcome to Talking Big Data. Today we’re talking about myths about Big Data. The first one, which I hear and see quite frequently, honestly, is that Big Data is only for unstructured sources. I think this actually came up at one of the Big Data conferences I attended. The answer is no, that’s not true.

It’s a myth. It is a myth propagated by people who don’t know, and it came up yet again after an article was published in IBM Data Magazine. These technologies don’t care what the data looks like, which is one of the reasons why they’re so useful, right? In fact, I was on the phone today with one of our major bank customers up north.

The entire project involved structured data, and we used our data grid cache technology to tackle the processing challenges and gain those big insights. I think this myth about unstructured data persists for two reasons. One, a lot of people are glomming on to the Big Data space who just haven’t worked with the technologies and don’t know. The second is a little bit more of a theory. You have certain vendors out there, and I don’t like to name names, so I won’t mention that it’s Oracle, who are just scared of these new technologies.

They try to tell a story like, “Oh, this is for stuff that you are not going to put into an Oracle database anyway, right? It’s just bad information, right?” So, you know. Hopefully this won’t come up again, though I'm sure it will, but the answer is no: it’s for any data source you can think of, structured, unstructured, and semi-structured. It really doesn’t matter how the data gets stored.


Using an Analytical Tool on Big Data

Using an analytical tool on Big Data obviously does involve some considerations about the kind of data. For instance, video is truly unstructured, and it is going to be the hardest for this environment. But of course we work perfectly well with good old-fashioned structured data, and in fact Hive, HBase, and others are quite good at that. It’s one of the reasons why we've introduced SQL capabilities on top of our BI platform: there is a lot of structured data where good old-fashioned SQL is handy to exploit.

So, yes, I am telling you that Big Data is for structured and unstructured data. It is there to be used wherever it makes sense, right? So, we’re back to the fit-for-purpose architecture paradigm. But, really, the idea is to use it where it makes sense. And the project I referenced for our colleagues outside the Toronto area did not involve a huge volume of data either.

That’s another myth: that Big Data solutions are only for huge data flows. It was not unstructured data. It was three years' worth of structured data, and we just didn't know how we would want to work with it because we hadn’t worked with it before. The flexibility of the technologies was really what drove us, right? The ability to go deep, left, right, up, down, to work with things in different ways, and to export data in multiple formats: we loaded it raw and output it in all sorts of modified forms.

All those sorts of capabilities, the schema-on-read stuff that we’ve talked about before, are what drove us to use it, and they have nothing to do with the data per se. In fact, some people had never thought of working with structured data in these environments before, and once we showed them how easy it was, it was a bit eye-opening for them. So, anyway.
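To make the schema-on-read idea a little more concrete, here is a minimal Python sketch. It is not a description of the bank project or of any InetSoft product; it simply assumes raw comma-separated records are stored exactly as they arrive, and each consumer applies its own schema only at the moment it reads them. The sample records, column names, and the `read_with_schema` and `net_by_account` helpers are all hypothetical.

```python
import csv
import io

# Hypothetical raw records, stored as-is with no schema enforced at load time.
RAW = """2021-03-01,ACCT-1,deposit,250.00
2021-03-02,ACCT-2,withdrawal,40.50
2021-03-02,ACCT-1,withdrawal,100.00
"""

def read_with_schema(raw, schema):
    """Apply a schema, a list of (column_name, converter) pairs, at read time.

    Columns beyond the length of the schema are ignored, so a consumer can
    project just the fields it cares about from the same raw data.
    """
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: convert(value)
                     for (name, convert), value in zip(schema, fields)})
    return rows

# One hypothetical view: typed ledger entries with numeric amounts.
ledger_schema = [("date", str), ("account", str), ("kind", str), ("amount", float)]

# Another hypothetical view over the same raw text: only date and account.
activity_schema = [("date", str), ("account", str)]

def net_by_account(rows):
    """Sum deposits minus withdrawals per account."""
    totals = {}
    for r in rows:
        sign = 1 if r["kind"] == "deposit" else -1
        totals[r["account"]] = totals.get(r["account"], 0.0) + sign * r["amount"]
    return totals

if __name__ == "__main__":
    print(net_by_account(read_with_schema(RAW, ledger_schema)))
    print(sorted({r["date"] for r in read_with_schema(RAW, activity_schema)}))
```

Because no schema is fixed when the data is loaded, the same raw records can be reinterpreted later without reloading them, which is the kind of flexibility that matters when, as in the project above, you don't yet know how you will want to work with the data.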

So that is one myth shot down, with a bonus myth thrown in to boot. Now that we've agreed that Big Data technology is good for structured and unstructured data, I know this next one has to be true: Big Data has an inherent data quality problem. Not true. And in my tradition of not naming names, I won’t mention that it was SAP writing about this.


Is Big Data Unstructured?

Big data refers to data sets that are too large or complex to be processed and analyzed using traditional data processing tools. Big data emerged as a recognized area of study in the context of the massive data sets generated by online social activity on platforms such as Facebook, Twitter, and other social media, but the term was recognized to also describe the data challenges faced in many other disciplines, such as finance, climate analysis, genomics, and communications.

Structured data is data that is organized in a specific format, such as a table in a database or a spreadsheet. It is typically easy to process and analyze because it is organized in a way that is easy for computers to understand. Unstructured data, on the other hand, does not have a predetermined format and may include text, images, audio, and video. It is often more difficult to process and analyze because it does not fit neatly into a structured format.
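As a rough illustration of that difference, the Python sketch below answers the same question, total order value, once from structured records and once from free-text notes. The records, the notes, and the dollar-amount regular expression are hypothetical. The regex is a deliberately crude extraction heuristic that would miss amounts written any other way, which is exactly why unstructured data tends to be harder to process.

```python
import re

# Structured: every record has the same named fields, so aggregation is direct.
orders = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": 80.0},
]

# Unstructured: the same facts buried in hypothetical free-text notes.
notes = [
    "Customer A called and placed an order worth $120.00 yesterday.",
    "B ordered again; total came to $80.00 after the discount.",
]

def total_structured(rows):
    """Sum a known numeric field; no interpretation needed."""
    return sum(r["amount"] for r in rows)

# Crude heuristic: the first "$<number>" in a note is taken as its amount.
AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?)")

def total_unstructured(texts):
    """Extract an amount from each note before summing; unmatched notes are skipped."""
    total = 0.0
    for text in texts:
        match = AMOUNT.search(text)
        if match:
            total += float(match.group(1))
    return total

if __name__ == "__main__":
    print(total_structured(orders), total_unstructured(notes))
```

Both functions happen to agree on this tiny sample, but only the structured version is guaranteed to; the unstructured one depends entirely on how well the extraction rule matches the way people actually wrote the notes.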

Big data can be structured, unstructured, or a combination of both. Many organizations generate large amounts of unstructured data, such as social media posts, customer reviews, and log files, which can be challenging to process and analyze. One of the characteristics of these data sets is that it is often not obvious what relationships exist in the data or whether these might provide useful information. To work effectively with big data, organizations therefore often need specialized tools and techniques, such as machine learning and data mining, to extract insights and make data-driven decisions.

Next: Big Data Doesn’t Have to Have Data Quality Problems