Abhishek: Yeah, there is a question that one of the attendees asked: is Alteryx a data shaper? The answer is absolutely. In addition, you mentioned Trifacta and Paxata. Those are two technologies that were kind of born with Hadoop, and that's where you were seeing the largest variety of data.
With that variety, you have to find a way of making the data really usable for broad analytics use cases. It depends on the shape of the data, whether it's nested files or something else. And so you saw technologies such as Trifacta and Paxata that were really built around leveraging the Hadoop platform to do that data shaping and processing right there.
Now they have expanded to other technologies, so they're not dependent solely on Hadoop. Alteryx has come from the other direction: it started with being able to shape and prepare data from a number of different sources, and now it actually leverages the processing power of Spark or Hadoop to do some of the transformations in memory.
The Data Pipeline
So this is an area we call the data pipeline: helping to move data from a landing zone or staging area into the data lake and on to different points of use. It's a very interesting and important part of the data flow, and a big part of that, again, is data lakes and Hadoop. Oftentimes with traditional ETL, the data transformations are done before the data lands in the system where we will do analytics.
The benefit of the data lake is that you're just landing the data there in its raw format, and then you're doing the transformations and the shaping after it's already landed. It's much more flexible that way, and it enables you to do the kind of shaping you need for each analytical use case. Larry, any other thoughts on this one?
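To make that pattern concrete, here is a minimal sketch of shaping data only after it has landed in the lake, using PySpark; the bucket paths, field names, and nested structure are illustrative assumptions, not anything from a specific customer environment.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shape-after-landing").getOrCreate()

# The raw events were landed in the lake exactly as they arrived -- no upfront ETL.
raw = spark.read.json("s3://example-data-lake/landing/web_events/")  # hypothetical path

# Shaping happens now, for one particular analytical use case:
# flatten the (assumed) nested 'user' struct and keep only the fields this analysis needs.
shaped = (
    raw.select(
        F.col("event_id"),
        F.col("event_time").cast("timestamp"),
        F.col("user.id").alias("user_id"),
        F.col("user.country").alias("country"),
    )
    .filter(F.col("event_time").isNotNull())
)

# Write a curated, analytics-ready copy; the raw landing data stays untouched,
# so a different use case can reshape the same raw data differently later.
shaped.write.mode("overwrite").parquet("s3://example-data-lake/curated/web_events/")

The key design point is that the transformation logic lives downstream of the landing zone, which is what makes the approach more flexible than transforming before load.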
Larry: Yes, so having been a consumer of data and a business user myself, I would say that incorporating data prep into the analytics workflow just makes a lot of natural sense, because at the end of the day, for great analytics you need great data, and the business user, or any user for that matter, doesn't have to wait for someone else to do that prep for him when he knows how the data needs to be shaped to answer or explore the particular question he is looking at.
How this helps our customers is that it drives agility in their analytics process, and it also makes the data available to large groups of stakeholders within their organizations a lot faster.
Abhishek: Yeah, exactly, and I'll close off with one question here, just as it's relevant to our context: does InetSoft shape data? The answer is yes. There is a pretty good amount of data preparation built into the InetSoft application itself, and we have things like cross-data-source joins and capabilities like that. So there's a lot of what we call data preparation built into InetSoft.
Alright, the next-to-last trend, Big Data grows up: Hadoop adds to enterprise standards. This keeps with the trend of Hadoop becoming a part of the enterprise analytics landscape, not a science project on the side. What we see is more investment in the security and governance components surrounding Hadoop, as it's a core enterprise system now.
Examples of that are Apache Sentry, which provides a system for enforcing fine-grained, role-based authorization of data and metadata stored in a Hadoop cluster; Apache Atlas, a key part of the data governance initiative, which empowers organizations to apply consistent data classification across the data ecosystem; and Apache Ranger, which provides centralized security administration for Hadoop.
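As a rough illustration of the fine-grained, role-based authorization Sentry enforces, here is a minimal sketch that issues Hive GRANT statements through PyHive; the host, group, and database names are made up for the example, and the exact grant grammar depends on the Hive and Sentry versions running in the cluster.

from pyhive import hive  # assumes a HiveServer2 endpoint with Sentry authorization enabled

conn = hive.Connection(host="hive-gateway.example.com", port=10000, username="admin")
cur = conn.cursor()

# Define a role, map it to an LDAP/OS group, and grant read access
# at the database and table level -- the "fine-grained" part.
for stmt in [
    "CREATE ROLE analyst_role",
    "GRANT ROLE analyst_role TO GROUP analysts",
    "GRANT SELECT ON DATABASE sales TO ROLE analyst_role",
    "GRANT SELECT ON TABLE sales.orders TO ROLE analyst_role",
]:
    cur.execute(stmt)

cur.close()
conn.close()

Ranger addresses similar needs through a central policy administration service, while Atlas layers classification and lineage metadata on top of the same data.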