InetSoft Webinar: Preparing to Take on a Data Analytics Problem

This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "Best Practices in Data Mining." The speaker is Mark Flaherty, CMO at InetSoft.

Flaherty: When preparing to take on a data analytics problem, start by asking: what are you really trying to do, and how do you think you are going to answer the real problem at hand, rather than a naïve interpretation of the problem? For example, when I say I want to forecast inventory levels, anybody who does forecasting understands that a forecast is by definition incorrect.

So if I tell you that the level is going to be three, or five, or seven, with some range, is that going to be helpful to you? The question they really want answered is: are the inventory levels sufficient or not sufficient? If you can change things to a classification problem, then you can actually take action on the results. If people think about the results they are going to get and how they are going to use those results, then they will have a much higher success rate in understanding what predictive analytics can do for them.
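As a minimal sketch of this reframing, the point forecast can be mapped onto an actionable yes/no label. The threshold, store names, and forecast values below are illustrative assumptions, not figures from the webinar:

```python
# Hypothetical sketch: turning a point forecast (regression output)
# into a binary classification that a planner can act on.

REQUIRED_LEVEL = 100.0  # assumed units needed to meet expected demand

def classify_inventory(forecast_units: float) -> str:
    """Map a forecasted inventory level to an actionable label."""
    return "sufficient" if forecast_units >= REQUIRED_LEVEL else "insufficient"

# Illustrative per-store forecasts
forecasts = {"store_a": 120.0, "store_b": 85.0, "store_c": 101.5}
labels = {store: classify_inventory(f) for store, f in forecasts.items()}
```

The design choice here is the one Flaherty describes: instead of reporting "the level will be roughly 85, give or take", the output is "store_b: insufficient", which directly triggers a restocking decision.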

Moderator: Yes, right. That's a very good point, and let's talk about this for a second, because you are reminding me of something we talk a lot about these days. It's one of the great bits of wisdom I have picked up from the consulting world. Anytime you are dealing with something like predictive analytics or data mining, or even other kinds of enterprise data initiatives like master data management, one of the best mantras I have heard is: bring your data forward.


In other words, work with your data first. Examine the data without making too many presumptions about it, because if you approach a large data set with lots of presumptions, those presumptions will color what you expect from the data, and you wind up looking for patterns that validate what you thought before you opened your can of data, if you will. Is that your impression as well?

Flaherty: Definitely. One thing I have found is that people don't understand what predictive analytics is. They find out that predictive analytics does one of two things: it either tells them something they already know, or it tells them something they don't believe.

Moderator: That’s funny.

Flaherty: And if you fall into that trap, you lose before you start. If you frame the question so that you already know the answer, then predictive analytics isn't going to help you, because you have too much of a predefined vision of what the answer should be. You should prepare to be surprised, expect to be surprised, and be happily surprised by the discoveries you get with predictive analytics.

On the other hand, you do have to be able to frame the problem in a way that predictive analytics can understand. I think that’s one thing that people miss a lot when they approach a new predictive analytics problem. The example I’d like to give is the typical kind of churn analysis. Which customers are going to leave, which customers are going to stay?

It is easy to lose the time element. You have to remember you are looking at a slice of time in which certain customers left and certain customers didn't, and you do the analysis based on that particular slice of time. Furthermore, you may or may not have the data: the signal that indicates the thing you are looking for may not really exist in your data.
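The time-slice idea can be sketched as follows: features are computed only from activity observed before a cutoff date, and the churn label only from what happened after it. All customer IDs, dates, and amounts below are hypothetical, and "churned" is assumed to mean "no activity after the cutoff":

```python
from datetime import date

# Assumed cutoff separating the feature window from the label window
CUTOFF = date(2023, 1, 1)

# Illustrative (customer, transaction_date, amount) records
transactions = [
    ("cust_1", date(2022, 11, 3), 40.0),
    ("cust_1", date(2023, 2, 14), 25.0),
    ("cust_2", date(2022, 12, 20), 60.0),
]

def spend_before_cutoff(customer: str) -> float:
    """Feature: total spend observed strictly before the cutoff."""
    return sum(amt for c, d, amt in transactions if c == customer and d < CUTOFF)

def active_after_cutoff(customer: str) -> bool:
    """Did the customer transact on or after the cutoff?"""
    return any(c == customer and d >= CUTOFF for c, d, _ in transactions)

customers = {"cust_1", "cust_2"}
features = {c: spend_before_cutoff(c) for c in customers}
labels = {c: not active_after_cutoff(c) for c in customers}  # churned = silent after cutoff
```

Keeping the feature window and the label window on opposite sides of the cutoff is what prevents the model from "seeing the future" during training.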


If I am a supermarket and all I have is loyalty card data, I don't have a churn feed. There is no defined event saying that a person has left and is no longer my customer. So you have to come up with definitions, and you have to massage your data to make sure the concept you are trying to understand is somehow representable in your data. And in my experience there is always a time factor, which is often ignored in this space.
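One common way to manufacture such a definition is an inactivity rule: a customer counts as churned after some window with no purchases. The 90-day window and the dates below are assumptions for illustration, not a rule stated in the webinar:

```python
from datetime import date, timedelta

# Assumed business rule: no purchase for 90 days means "churned"
CHURN_WINDOW = timedelta(days=90)

def label_churn(last_purchase: date, as_of: date) -> bool:
    """Derive a churn label from loyalty-card activity alone."""
    return (as_of - last_purchase) > CHURN_WINDOW

# Illustrative last-purchase dates per customer
last_purchases = {
    "cust_1": date(2023, 1, 5),
    "cust_2": date(2023, 5, 20),
}
as_of = date(2023, 6, 1)
churned = {c: label_churn(d, as_of) for c, d in last_purchases.items()}
```

Note that the time factor Flaherty mentions is explicit here: the same customer can be "churned" or "active" depending on the `as_of` date chosen, so the definition only makes sense relative to a point in time.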
