The tool itself is used to specify data sources and the rules for extracting and processing that data, and then it executes the process for you. So it’s not really the same thing as programming in the traditional sense, where you write procedures and code. Instead, the environment provides a graphical interface where you specify rules, often with drag-and-drop components that show how data flows through a process.
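To make the contrast with procedural code concrete, here is a minimal sketch in Python of what a rule-based pipeline might look like once a tool has captured it. Everything here is hypothetical: the rule operations, field names, and the little run_pipeline engine stand in for what a real ETL tool builds from its graphical designer.

```python
# Hypothetical, declarative pipeline definition: you describe the rules;
# an engine interprets them for you, rather than you coding procedures.
pipeline = {
    "rules": [
        {"op": "filter", "field": "country", "equals": "US"},
        {"op": "rename", "from": "cust_name", "to": "customer_name"},
    ],
}

def apply_rule(rows, rule):
    """Toy rule interpreter: each rule is data, not procedural code."""
    if rule["op"] == "filter":
        return [r for r in rows if r.get(rule["field"]) == rule["equals"]]
    if rule["op"] == "rename":
        return [{rule["to"] if k == rule["from"] else k: v
                 for k, v in r.items()} for r in rows]
    raise ValueError(f"unknown rule: {rule['op']}")

def run_pipeline(defn, rows):
    """Toy engine: applies each declared rule in order."""
    for rule in defn["rules"]:
        rows = apply_rule(rows, rule)
    return rows

# Example: two extracted rows; only the US row survives the rules.
data = [{"cust_name": "Ann", "country": "US"},
        {"cust_name": "Bo", "country": "DE"}]
print(run_pipeline(pipeline, data))
# [{'customer_name': 'Ann', 'country': 'US'}]
```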
The way it works, in a sense, is that it takes these rules and either runs them through an engine or generates code that is compiled into an executable, which then runs within your production environment. Most ETL tools run in batch mode, because that’s how they evolved. So there is some kind of event that triggers the extract, or it’s schedule-driven, and the schedule dictates that at such-and-such a time, you’ll run this particular extract. And then there can be dependencies in the schedule, so that if one thing executes successfully, another thing is triggered to run.
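As a rough illustration of schedule-driven batch execution with dependencies, here is a toy scheduler in Python. The job names, times, and success flags are all invented; real ETL schedulers are far more capable, but the dependency idea is the same: a downstream job runs only if its predecessor succeeded.

```python
# Hypothetical batch schedule: each job runs at a set time, and a job
# declaring a dependency runs only if its predecessor succeeded.
def extract_orders():
    print("extracting orders...")
    return True  # pretend the extract succeeded

def load_warehouse():
    print("loading warehouse...")
    return True

schedule = [
    {"name": "extract_orders", "run_at": "02:00",
     "job": extract_orders, "depends_on": None},
    {"name": "load_warehouse", "run_at": "03:00",
     "job": load_warehouse, "depends_on": "extract_orders"},
]

def run_schedule(jobs):
    """Toy scheduler: run jobs in time order, honoring dependencies."""
    results = {}
    for job in sorted(jobs, key=lambda j: j["run_at"]):
        dep = job["depends_on"]
        if dep is not None and not results.get(dep, False):
            print(f"skipping {job['name']}: dependency {dep} did not succeed")
            continue
        results[job["name"]] = job["job"]()
    return results

run_schedule(schedule)
```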
ETL tools themselves are geared towards data-oriented developers and DBAs. So these aren’t the kind of tools aimed at application developers who are more into procedural coding and third-generation languages. We are looking for somebody who understands data, not necessarily an application programmer, and ideally somebody who knows databases and SQL, since about 80% of the time we are pulling data directly from databases.
So you can read multiple types of databases, files, and web services, and bring all of these things together. The tools facilitate this through connector libraries and integrated metadata stores underneath them. And that makes maintenance and traceability much easier than in a hand-coded environment.
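Here is a small self-contained sketch of pulling from two different kinds of sources, a relational database and a flat file, into one common row structure, with a "source" field carried along as a tiny stand-in for the lineage a metadata store would track. The table, file contents, and column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (in-memory SQLite for the sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo')")
db_rows = [{"id": i, "name": n, "source": "db"}
           for i, n in db.execute("SELECT id, name FROM customers")]

# Source 2: a flat file (a CSV string standing in for a real file).
csv_text = "id,name\n3,Cleo\n4,Dev\n"
file_rows = [{"id": int(r["id"]), "name": r["name"], "source": "file"}
             for r in csv.DictReader(io.StringIO(csv_text))]

# A web service source would be fetched and parsed similarly, e.g. JSON
# over HTTP; omitted here to keep the sketch self-contained.

# Integrated result: one common structure, with lineage carried along.
combined = db_rows + file_rows
for row in combined:
    print(row)
```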
ETL tools are good for bulk data movement: taking large volumes of data and transferring them in batch. They are also good for situations where you have complex rules and transformations, where you’ve got calculations, string manipulation, data changes, and integration of multiple sets of data, and in particular high volumes of data from different types of sources.
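To illustrate the kinds of transformations being described, here is a short Python sketch combining a calculation, some string manipulation, and the integration of two small data sets. All field names and values are invented.

```python
# Two source data sets to integrate: orders and a customer lookup.
orders = [{"order_id": 1, "cust_id": 10, "qty": 3, "unit_price": 9.99},
          {"order_id": 2, "cust_id": 11, "qty": 1, "unit_price": 24.50}]
customers = {10: " ann smith ", 11: "BO JONES"}

transformed = []
for o in orders:
    raw_name = customers[o["cust_id"]]  # integration: join on cust_id
    transformed.append({
        "order_id": o["order_id"],
        # String manipulation: trim and normalize the customer name.
        "customer": raw_name.strip().title(),
        # Calculation: derive a total the source data didn't carry.
        "total": round(o["qty"] * o["unit_price"], 2),
    })

print(transformed)
# [{'order_id': 1, 'customer': 'Ann Smith', 'total': 29.97},
#  {'order_id': 2, 'customer': 'Bo Jones', 'total': 24.5}]
```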