InetSoft Webinar: Analyzing Streaming Data
Below is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of What Machine Learning Means for Company Analytics. The presenter is Abhishek Gupta, Chief Data Scientist at InetSoft.
This ties into what we talked about in terms of deploying these models in production in real time. The earlier points we've made about automation apply in a typical organization, and even in the leading-edge Bay Area internet startups, but there's this distinction between software engineers and data scientists. What is this distinction?
Software engineers tend to be very good at writing code. They're very disciplined. Their code is very nice. It scales. It's easy to maintain. Data scientists are very good at analyzing data and making sense of it. Usually when a data scientist does an analysis and builds a model, they end up having to hand the model over to a software engineer, who then rewrites it for the production environment according to its standards.
On the other hand, a software engineer can do some of what a data scientist can do, because obviously there are these easy-to-use machine learning libraries available to them. It's possible that they do some of that, but I think if you talk to many people who work with data, they will tell you that for many software engineers, their strength is really in building something that has been well specified and spec'ed out.
With data scientists, their strength is in data discovery and in figuring out the right model to use. Whenever there's a little bit of ambiguity around the project, it becomes harder for a software engineer to replicate what a data scientist can bring to the table.
On the other hand, with streaming data there is a distinction between processing and simple analysis. At massive scale, even simple things like counting the top items become difficult. There are companies out there now who are able to do analytics of that sort at massive scale, as well as anomaly detection and correlations at massive scale. They tend to be the leading-edge companies. Then there is online learning. I think there are companies doing that, but it's still not a common thing to do.
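To illustrate why even "simple" top-item counting gets hard on a stream, here is a minimal sketch of one well-known approximate technique, the Misra-Gries heavy-hitters algorithm. The speaker doesn't name any particular method, so this is just one common approach: it finds frequent items in a single pass using a fixed, small number of counters instead of one counter per distinct item.

```python
def misra_gries(stream, k):
    """Approximate heavy hitters in one pass using at most k counters
    (Misra-Gries algorithm). Any item occurring more than n/(k+1)
    times in a stream of length n is guaranteed to survive."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Tiny example: "a" dominates the stream, so it survives the sketch.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a"]
print(misra_gries(stream, k=2))
```

The point of the sketch is the memory bound: at massive scale you cannot keep an exact counter per distinct item, so streaming systems trade exactness for constant memory, which is exactly the gap between "processing" and "simple analysis" the speaker is describing.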
Our next topic was, I believe, the third one on the agenda. It's this idea that in a large organization, there's just going to be a lot of different business intelligence software floating around. That's just the reality of today. I mean, there are all these great free tools. These things exist, they're high quality, and people use them for good reason. I've been involved in work on how a clustering procedure can help users choose the number of clusters, and we've done basic research on variable selection for support vector machines and for regression. I think it's clear to everyone that algorithms are really a commodity now.
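As a concrete illustration of the kind of "choose the number of clusters" procedure the speaker mentions, here is a minimal sketch of the elbow heuristic using a tiny hand-rolled 1-D k-means. This is an assumption for illustration only; the speaker doesn't say which method their work used.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Tiny 1-D k-means. Returns (centroids, inertia), where inertia is
    the within-cluster sum of squared distances."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster emptied.
        new_centroids = []
        for i, c in enumerate(clusters):
            new_centroids.append(sum(c) / len(c) if c else centroids[i])
        centroids = new_centroids
    inertia = sum(min((p - m) ** 2 for m in centroids) for p in points)
    return centroids, inertia

# Elbow heuristic: inertia always falls as k grows; pick the k where
# the drop flattens out. Here the data has three obvious groups.
points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
for k in range(1, 5):
    _, inertia = kmeans_1d(points, k)
    print(k, round(inertia, 3))
```

The design choice here is deliberate simplicity: in practice you would use a library implementation (scikit-learn's KMeans, for instance), but the heuristic itself, comparing inertia across candidate values of k, is the same.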
Organizations, especially large organizations, are just going to have different software around. The question is how to stop fighting over them. I like this one, and you like that one, so how do we use all of these things productively in a way that benefits your organization and solves business problems? Of course, on the Open Source side of things there's just so much flexibility, so many different algorithms to choose from, many of them coming out of universities.
Now we see large companies contributing to different Open Source libraries and driving the direction of Open Source packages. For anyone thinking about this from a managerial perspective, one interesting question is how you get involved in an Open Source project and help steer it in a way that benefits your company. I mean, this is something that's possible to do.