InetSoft Webinar: Analytics Have Moved Beyond The Data Warehouse

Below is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of What Machine Learning Means for Company Analytics. The presenter is Abhishek Gupta, Chief Data Scientist at InetSoft.

Okay, let's move on to the next topic here. I think we have reached a point where analytics have moved beyond the data warehouse. A data warehouse is obviously a good repository for certain types of functions within the organization. But when we start talking about sound data, when customers call in and we record those calls, we might want to do sentiment analysis on them, categorize calls, or pick certain calls out of the wider body of calls to find out what worked and what didn't.
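
To make that concrete, here is a minimal sketch of scoring call snippets for sentiment. It assumes the calls have already been transcribed by speech-to-text and uses NLTK's VADER scorer; the snippets themselves are made up.

```python
# Minimal sketch: scoring transcribed call snippets for sentiment.
# Assumes speech-to-text has already run; uses NLTK's VADER lexicon scorer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

calls = [  # made-up snippets standing in for real transcripts
    "Thanks, the agent resolved my billing issue right away.",
    "I've been on hold for an hour and nobody can help me.",
]

for text in calls:
    scores = sia.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
    if scores["compound"] >= 0.05:
        label = "positive"
    elif scores["compound"] <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(f"{label:>8}  {scores['compound']:+.2f}  {text}")
```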

If we talk about transactional data, we're not just talking about storing transactions for operational purposes. We're really trying to detect anomalies in those transactions so that, if something really wrong is happening, we can intervene.
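
As a rough illustration of going beyond a single threshold rule, here is a hedged sketch using scikit-learn's IsolationForest on synthetic transaction features. The feature names and numbers are illustrative, not a production fraud model.

```python
# Sketch: flagging anomalous transactions with more than one business rule.
# An IsolationForest learns what "normal" looks like across several features
# at once, rather than thresholding a single field.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# columns: amount, hour_of_day, merchant_risk_score (synthetic stand-ins)
normal = np.column_stack([
    rng.lognormal(3.5, 0.6, 5000),   # typical purchase amounts
    rng.normal(14, 4, 5000) % 24,    # mostly daytime activity
    rng.beta(2, 8, 5000),            # mostly low-risk merchants
])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

new_txns = np.array([
    [45.0, 13.0, 0.10],    # ordinary lunchtime purchase
    [9800.0, 3.5, 0.85],   # large amount, 3:30 am, risky merchant
])
print(model.predict(new_txns))  # 1 = looks normal, -1 = anomaly, review it
```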

Sometimes it takes a little bit more than a simple business rule to determine whether an individual transaction is fraudulent, for example, or goes against policy. We also talk about sensor data and the whole Internet of Things, and this is another big area of hype. We could talk for an hour about this one. What data should those sensors be collecting?

We talk about oil rig maintenance and trying to determine the right time to schedule a maintenance appointment for those rigs. Shutting down a rig is big money, and doing it at the right time, so that it's neither inefficient nor costly, is key. Doing it too early means you're actually wasting money. I hate using that as an example because I would much rather we could find another way to get around than using oil.
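
As a back-of-the-envelope illustration of that timing trade-off, here is a toy expected-cost calculation. All of the costs and the failure curve are made-up numbers: maintain too often and you pay the planned-downtime cost too frequently; wait too long and the risk of a far costlier unplanned failure dominates.

```python
# Toy maintenance-timing trade-off with hypothetical numbers.
FAILURE_COST = 2_000_000     # unplanned rig shutdown (hypothetical, USD)
MAINTENANCE_COST = 150_000   # planned maintenance visit (hypothetical)

def failure_prob(day):
    """Toy failure curve: risk grows with days since the last overhaul."""
    return min(1.0, (day / 365) ** 3)

def expected_cost_per_day(day):
    p = failure_prob(day)
    total = p * FAILURE_COST + (1 - p) * MAINTENANCE_COST
    return total / day  # amortize over the operating interval

best = min(range(30, 361, 30), key=expected_cost_per_day)
print(f"schedule maintenance every ~{best} days "
      f"(about ${expected_cost_per_day(best):,.0f} per operating day)")
```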


With sensor data, it's about deciding what information we actually have to store in order to build up the analytics. That's the chicken-and-egg question, right? Sometimes you have to collect a lot of data first to figure out what's valuable in it. But when you're actually doing it in an operational context, it might just be a few touch points, and you can perform some analytics on that data while it's streaming: event stream processing.
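
Here is a minimal sketch of what analytics on a stream might look like: a rolling window that keeps only recent readings and raises an alert when a new value drifts far from the recent mean. The window size and threshold are arbitrary choices for illustration.

```python
# Sketch of event stream processing: keep only a rolling window of sensor
# readings and compute analytics on the fly, instead of storing every event.
from collections import deque
import statistics

class RollingMonitor:
    """Maintains the last `size` readings and flags drift as events arrive."""
    def __init__(self, size=100, limit=2.5):
        self.window = deque(maxlen=size)  # old readings fall off automatically
        self.limit = limit

    def observe(self, value):
        self.window.append(value)
        if len(self.window) < 10:
            return None  # not enough history yet
        mean = statistics.fmean(self.window)
        stdev = statistics.stdev(self.window)
        if stdev and abs(value - mean) > self.limit * stdev:
            return f"alert: {value:.1f} is far from rolling mean {mean:.1f}"
        return None

monitor = RollingMonitor()
stream = [70.1, 70.4, 69.9, 70.2, 70.0, 70.3, 69.8, 70.1, 70.2, 70.0, 95.6]
for reading in stream:  # stand-in for an endless sensor feed
    if (msg := monitor.observe(reading)):
        print(msg)
```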

I think event stream processing is going to play more and more of a role in how organizations consume analytics. Just because we have commodity storage now doesn't mean we have to go and store everything. Even commodity storage has a cost, and we don't want to dump all the data in there indefinitely just to see what might happen.

I think we have to be prudent in analysis. We have to take a little bit of an experimental approach to see what data has potential in it, and then, based on that, store the data that is really needed to solve a particular business purpose and feed those business requests.

As we are talking about machine learning and data streams, we're getting to the point where we can push more of the analytics into the stream. It's rarely the idea to go and fit a model there; I have seen very few use cases where fitting a model in a streaming context makes sense. But applying the score code of a particular model in a streaming context can be very valuable.

I mentioned fraud detection already. If we're talking about a manufacturing environment, we're checking on the health of that environment during a production cycle. As the data is streaming out, we can run some analytics on it and see: are we heading towards a breakdown, or are we heading towards lower production capacity? All those types of things can be enabled by pushing the analysis closer to the data creation source.
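
A minimal sketch of that split: the model is fitted and validated offline, then only its scoring step runs against events as they arrive. The model type, file name, and feature values here are illustrative stand-ins.

```python
# Sketch: fit the model offline, then apply only its "score code" to events
# as they stream in. No refitting happens in the online loop.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# --- Offline: train once, then serialize the approved model ---
X_train = np.random.default_rng(1).normal(size=(1000, 3))
y_train = (X_train[:, 0] + X_train[:, 2] > 1).astype(int)  # synthetic label
joblib.dump(LogisticRegression().fit(X_train, y_train), "model_v1.joblib")

# --- Online: score each arriving event against the frozen model ---
model = joblib.load("model_v1.joblib")

def score_event(features):
    """Probability that this event needs attention, from the frozen model."""
    return model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]

for event in ([0.2, -0.1, 0.4], [1.8, 0.3, 1.5]):  # stand-in for a stream
    p = score_event(event)
    if p > 0.5:
        print(f"flag for review (p={p:.2f}): {event}")
```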


Let's focus on that last bullet point: machine learning on streams. You may have heard of online learning; this is the idea that we actually update a model constantly based on data coming in. We're starting to be interested in this, and we're certainly seeing a lot of interest generally, but I'm not sure our customers want to do this quite yet. The one thing is, if you're going to do this, you really have to trust that stream, right?
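
For the curious, here is what online learning can look like in a minimal sketch using scikit-learn's partial_fit, where every labeled example arriving from the stream nudges the model's parameters. The synthetic stream stands in for real data.

```python
# Minimal online-learning sketch: each labeled example arriving from the
# stream incrementally updates the model via scikit-learn's partial_fit.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all classes must be declared for partial_fit

rng = np.random.default_rng(0)
for _ in range(1000):                  # stand-in for an endless event stream
    x = rng.normal(size=(1, 2))        # one incoming feature vector
    y = np.array([int(x[0, 0] + x[0, 1] > 0)])  # its (synthetic) label
    model.partial_fit(x, y, classes=classes)    # one incremental update

print("learned weights:", model.coef_, model.intercept_)
```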

If you're a bank, and you've spent a lot on your data science department, the models have to go through incredibly thorough documentation, validation, and regulatory processes. You're going to put this model out there, and then you're going to trust it to be updated based on the data that comes in on the stream.

Well, what if a horrible outlier comes in on the stream and throws off all the parameters in this model you've worked so long to build? I do think that in the future we will get to the point where a model updates its own parameters or rules based on data coming from a constant stream. That will happen. We will get there. I just want to point out that if you are going to do this, then you really need to trust and verify that stream, because it will change your model, and do you want that to happen?

You have to put some sensible constraints on how much that can happen in real time, so that the model cannot just totally flip on its head. You have to have a model that's robust against that.
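
One way to express such constraints, sketched here with illustrative thresholds: quarantine extreme outliers before they ever reach the model, and cap how far the weights may move on any single update.

```python
# Sketch of two guardrails on stream-driven updates: reject wild outliers
# before they touch the model, and bound the per-update parameter movement
# so one bad batch can't flip the model on its head.
import numpy as np
from sklearn.linear_model import SGDClassifier

MAX_STEP = 0.05  # largest allowed L2 change in the weights per update

model = SGDClassifier(random_state=0)
model.partial_fit(np.array([[0.0, 0.0], [1.0, 1.0]]), [0, 1], classes=[0, 1])

def guarded_update(x, y, hist_mean, hist_std):
    # 1) Gate: drop examples more than 4 sigma from recent feature history.
    if np.any(np.abs((x - hist_mean) / hist_std) > 4):
        return False                    # quarantined; model left untouched
    # 2) Clip: apply the update, but bound how far the weights can move.
    before = model.coef_.copy()
    model.partial_fit(x.reshape(1, -1), [y])
    step = model.coef_ - before
    norm = np.linalg.norm(step)
    if norm > MAX_STEP:
        model.coef_ = before + step * (MAX_STEP / norm)
    return True

ok = guarded_update(np.array([0.3, -0.2]), 1,
                    hist_mean=np.zeros(2), hist_std=np.ones(2))
print("updated" if ok else "rejected", model.coef_)
```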


Model management and model monitoring, we're going to talk about that in just a minute. The only thing I would say is that from a research perspective we're really happy about the online learning techniques we're starting to see in the literature. This is certainly something we are looking towards, and there's a lot of interest in it. I'm just not sure how much it is actually being implemented in our customer base at the moment.
