Amazon has a hosted Hadoop implementation, Elastic MapReduce, which runs on its Elastic Compute Cloud (EC2). Then there is Microsoft Azure, and Google has a new SQL engine that I think is very fast and handles big data. But with the cloud you run into the issue of moving the data back and forth, because if you are moving petabytes across the wire, it's not practical.
Data tends to have gravity: it collects in one place and wants to stay there. I have heard other terms associated with Hadoop, like "data landfill" or "ha-dump." And I have also heard of the "Hadoop hangover."
It's an interesting emerging use case in which we see Hadoop being thought of as a staging environment on steroids, a place to stage and dump a massive amount of stuff that you are not quite sure what you want to do with. So you stream it as a set of flat files into Hadoop, let Hadoop deal with where it goes, and then if you want to write MapReduce against it, have fun.
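To make "write MapReduce against those flat files" concrete, here is a minimal sketch of the MapReduce pattern in pure Python, using the canonical word-count example. This is an illustration of the paradigm only, with made-up input data; a real Hadoop job would be written against the Java API or via Hadoop Streaming, and the framework, not your code, would run the shuffle across the cluster.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word on every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical flat-file content dumped into the staging area.
lines = ["big data big deal", "data gravity"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'deal': 1, 'gravity': 1}
```

The appeal of the staging-area use case is visible even in this toy: the mapper imposes no schema on the input, so you can dump files first and decide how to interpret them later.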
But in a lot of cases we are beginning to see other vendors create connectors to pull that Hadoop data into, for instance, Aster Data. So Teradata, Aster Data, and Hadoop form an ecosystem: you land the raw data in Hadoop, do more structured, columnar analytics in Aster Data, then pull those insights into Teradata and use your traditional data warehousing tools there.
Other examples are Hadoop and SAP HANA, and Hadoop and Greenplum. That's a very common use case. And I think that is an important point, because Big Data isn't about just one of these technologies. Most of the large companies that have Big Data need a combination of business intelligence solutions. It's not as if Hadoop is going to replace your data warehouse; you need all of these technologies.
I have talked to a lot of companies who are struggling with, "Well, if I do Hadoop, can I just get rid of my data warehouse?" And the answer is probably not. There are all kinds of cleansing and conforming steps that you still want to do with typical ETL processes. But there are an awful lot of use cases where you can just dump the data into Hadoop or a NoSQL database and run analytics against it a lot faster and cheaper. So it's just another analysis tool in your BI toolkit.
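The trade-off described above, skipping upfront ETL and imposing structure only at query time ("schema-on-read"), can be sketched as follows. The records and field names here are hypothetical; in practice this is conceptually what tools like Hive or Pig do over files sitting in HDFS.

```python
import json

# Hypothetical raw dump: records were loaded as-is, with no upfront
# cleansing or conforming, so some lines may be malformed.
raw_dump = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5}',
    'corrupt line -- no ETL cleansing was done before loading',
    '{"user": "a", "clicks": 2}',
]

def total_of(records, field):
    # Apply structure at read time, not load time: parse each record
    # when the query runs, and skip anything that doesn't parse.
    total = 0
    for rec in records:
        try:
            row = json.loads(rec)
        except ValueError:
            continue  # a warehouse ETL pipeline would have rejected this earlier
        total += row.get(field, 0)
    return total

print(total_of(raw_dump, "clicks"))  # 10
```

This is why the dump-and-query approach is fast and cheap to start with, and also why it doesn't replace the warehouse: the cleansing work doesn't disappear, it just moves into every query.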