Is a machine learning platform useful for a small dataset?

This post is the third in a series discussing a machine learning use case for a mobile app provider. The link to the full case study can be found at the end of the post. The first post can be found at https://www.inetsoft.com/blog/machine-learning-concepts-defining-churn-predictive-metrics/

Is machine learning useful for a small dataset?

Machine learning does not depend on having a large volume of data. The most widely known dataset for machine learning is called “iris”. It describes three species of iris flowers and contains only 150 rows (observations) and 5 columns: four measurements (features) plus the species label. This dataset has been used to test and validate many machine learning algorithms and models.
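To show how little data a model needs, here is a minimal sketch that trains and evaluates a classifier on the full iris dataset. It assumes scikit-learn is installed, because the library ships a built-in copy of iris. The decision tree and 5-fold cross-validation are illustrative choices on our part, not the method used in the case study.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load the bundled iris data: X holds the 4 measurements, y the species label.
X, y = load_iris(return_X_y=True)
print("Observations and features:", X.shape)

# Illustrative model choice: a decision tree with a fixed seed for repeatability.
clf = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: train on 120 rows, test on the remaining 30, five times.
scores = cross_val_score(clf, X, y, cv=5)
print(f"Mean 5-fold accuracy: {scores.mean():.2f}")
```

Cross-validation matters more on small datasets: with only 150 rows, a single train/test split can give a noisy estimate, so averaging over several folds gives a steadier picture of how the model will perform.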

In the business world, it is very common for a project to start with a spreadsheet of a few thousand rows. Data in such a spreadsheet is normally already aggregated and transformed, as in the use case we have been discussing. In business applications, the challenge is sometimes the number of features (columns) rather than the number of rows. Too many features require a lot more computational power and slow the process down. They can also make the results harder to interpret. That’s where business expertise is needed to choose the right set of features, at least initially.

As your machine learning practice becomes more mature, more data (observations) generally makes the results more accurate. That’s where Big Data can come into play.

The full case study can be found at https://insidebigdata.com/2017/04/14/predicting-mobile-app-user-churn-training-scaling-machine-learning-model/

If you’d like to watch a 2-minute introduction to InetSoft, see https://youtu.be/bQI2kFmwzmk

The next post in the series is at: https://www.inetsoft.com/blog/how-much-machine-learning-can-do-w-pc/