Achieving Analytic Agility with Data Mashups

Executive Summary

Catalyst

The need to react quickly to a rapidly changing business climate is forcing organizations to make faster business decisions. But quickly asking questions and getting answers is not a simple task. While business intelligence (BI) tools and applications have emerged as viable solutions, traditional approaches to building and deploying these systems are complex, cumbersome, and expensive, and they often fail to keep pace with BI end-user demands and expectations. More often than not, resource-constrained IT departments are overburdened with BI requests, sometimes taking days, weeks, or months to fulfill them.

The challenge is to deliver a BI system that breaks the mold of traditional BI solution deployments by removing this IT bottleneck and shifting the analysis to business users. What's needed is a more agile approach that allows business users to self-service their BI needs. Ovum believes a key enabler for achieving self-service BI agility is data mashup technology.

Ovum view

Many BI systems continue to be designed by IT, based on rigid, inflexible data sources. This has created a bottleneck of end-user change requests as business needs constantly change and evolve. The inability of IT to keep up has given rise to "user-centric" system designs that provision business end users with self-service capabilities to satisfy their BI needs and provide relief to overstretched IT departments. This is not a new concept. Just as most motorists expect to pump their own gas without an attendant, corporate business end users increasingly want and expect the right information in their hands at the right time, without the intervention of IT.

BI vendors are making the jump to self-service in different ways. A promising way to achieve self-service BI is an integration-oriented approach grounded in data mashup technology. Data mashups allow nontechnical business users to easily and quickly access, integrate, and display BI data from a variety of operational data sources, including those that are not integrated into the existing data warehouse, without having to understand the intricacies of the underlying data schemas.
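To make this concrete, below is a minimal sketch of a data mashup in code, assuming two hypothetical sources: a spreadsheet export of sales transactions (sales.csv) and a CRM extract of customer accounts (crm_accounts.csv). The file names, column names, and join key are illustrative only; a commercial mashup tool would typically expose the same blend through a visual interface rather than a script.

```python
import pandas as pd

# Hypothetical sources: a finance spreadsheet export and a CRM extract.
# Neither has been modeled into the data warehouse yet.
sales = pd.read_csv("sales.csv")            # assumed columns: account_id, region, amount
accounts = pd.read_csv("crm_accounts.csv")  # assumed columns: account_id, segment, owner

# Blend the two sources on a shared key -- no ETL job or predefined schema required.
mashup = sales.merge(accounts, on="account_id", how="left")

# Ad hoc analysis on the blended dataset: revenue by region and customer segment.
summary = (
    mashup.groupby(["region", "segment"], dropna=False)["amount"]
          .sum()
          .reset_index()
          .sort_values("amount", ascending=False)
)
print(summary.head(10))
```

The point is that the blend happens directly against the source extracts, without first modeling either file into the warehouse.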

Anatomy of a Data Mashup

[Figure: Data mashup diagram]

Does Data Mashup Eliminate the Need for ETL Processes?

Data mashup is a technique used to integrate data from multiple sources or formats into a single dataset for analysis or visualization purposes. While data mashup can offer benefits in terms of agility and flexibility, it does not necessarily eliminate the need for Extract, Transform, Load (ETL) processes entirely. Instead, it complements traditional ETL processes and can be used in conjunction with them to streamline data integration workflows. Here's how data mashup and ETL processes compare and how they can work together:

  1. Data Mashup:

    • Agile Integration: Data mashup allows users to quickly combine data from different sources or formats without the need for complex transformations or pre-defined schemas. It often involves using visual tools or self-service BI platforms to blend data interactively.
    • Ad Hoc Analysis: Data mashup is well-suited for ad hoc analysis or exploratory data tasks where users need to combine and analyze data on the fly without formal ETL processes.
    • User Empowerment: Data mashup empowers business users and analysts to perform data integration tasks without heavy reliance on IT or data engineering teams. It promotes self-service analytics and enables users to access and blend data as needed.
  2. ETL Processes:

    • Structured Data Pipelines: ETL processes involve structured pipelines for extracting data from source systems, transforming it according to predefined business rules or requirements, and loading it into a target data warehouse or data lake (a minimal sketch of such a pipeline follows this list).
    • Data Quality and Governance: ETL processes often include data cleansing, normalization, deduplication, and validation steps to ensure data quality and consistency. They also enforce data governance policies and standards.
    • Scalability and Performance: ETL processes are designed for handling large volumes of data efficiently and reliably. They can scale to process data from diverse sources and support complex transformation logic.
    • Batch Processing: ETL processes typically operate in batch mode, scheduled at regular intervals to refresh data warehouses or update analytical datasets. They ensure that data is processed and available for analysis in a consistent and timely manner.
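For contrast, here is a minimal sketch of the kind of batch ETL job described above, using a local SQLite file as a stand-in for both the operational source and the warehouse target. The database files, table names, cleansing rules, and single-run scheduling are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

SOURCE_DB = "orders_source.db"   # hypothetical operational system
WAREHOUSE_DB = "warehouse.db"    # hypothetical target warehouse

def extract() -> pd.DataFrame:
    """Pull raw orders from the source system."""
    with sqlite3.connect(SOURCE_DB) as conn:
        return pd.read_sql(
            "SELECT order_id, customer_id, amount, order_date FROM orders", conn
        )

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the pipeline's cleansing, deduplication, and validation rules."""
    df = raw.drop_duplicates(subset="order_id")        # deduplicate
    df = df.dropna(subset=["customer_id", "amount"])    # drop incomplete rows
    df["amount"] = df["amount"].astype(float)           # normalize types
    df = df[df["amount"] >= 0]                          # enforce a business rule
    return df

def load(clean: pd.DataFrame) -> None:
    """Write the conformed data into the warehouse fact table."""
    with sqlite3.connect(WAREHOUSE_DB) as conn:
        clean.to_sql("fact_orders", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    # In practice this would run on a schedule (e.g., a nightly batch); here it runs once.
    load(transform(extract()))
```

Unlike the ad hoc mashup shown earlier, each step here is explicit and repeatable, which is what makes the pipeline auditable and schedulable at scale.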

While data mashup can provide agility and flexibility for certain use cases, it may not be suitable for all scenarios, especially those involving large-scale data integration, complex transformations, or strict governance requirements. In many cases, organizations adopt a hybrid approach, leveraging both data mashup and ETL processes based on the specific needs of their use cases:

  • Complementary Approach: Organizations use data mashup for ad hoc analysis, prototyping, or exploratory tasks where agility and self-service capabilities are essential. They rely on ETL processes for structured, governed data integration tasks that require scalability, reliability, and data quality assurance.
  • Integrated Workflows: Data mashup tools and self-service BI platforms may integrate with ETL tools and data integration platforms to enable seamless workflows. For example, users can prototype data mashup scenarios using self-service tools and then operationalize them through automated ETL pipelines for production use, as sketched after this list.
  • Data Governance and Control: Organizations establish policies and guidelines to govern the use of data mashup tools and self-service capabilities, ensuring that data integration tasks adhere to data quality, security, and compliance standards.
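As one illustration of that prototype-to-production handoff, the sketch below refactors the earlier ad hoc blend into a parameterized function guarded by basic data-quality checks. The required columns and validation rules are assumptions standing in for an organization's own governance standards, not features of any particular tool.

```python
import pandas as pd

# Assumed publishing contract: columns the governed dataset must always contain.
REQUIRED_COLUMNS = {"account_id", "region", "segment", "amount"}

def build_revenue_mashup(sales_path: str, accounts_path: str) -> pd.DataFrame:
    """Re-create the prototyped blend as a repeatable, parameterized step."""
    sales = pd.read_csv(sales_path)
    accounts = pd.read_csv(accounts_path)
    return sales.merge(accounts, on="account_id", how="left")

def validate_for_publication(df: pd.DataFrame) -> pd.DataFrame:
    """Governance gate: enforce basic quality rules before the dataset is shared."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Mashup is missing required columns: {sorted(missing)}")
    if df["account_id"].isna().any():
        raise ValueError("Mashup contains rows without an account_id")
    if df.empty:
        raise ValueError("Mashup produced no rows; refusing to publish")
    return df

if __name__ == "__main__":
    # A scheduler or ETL tool would invoke these steps on a regular cadence.
    result = validate_for_publication(
        build_revenue_mashup("sales.csv", "crm_accounts.csv")
    )
    result.to_csv("published_revenue_mashup.csv", index=False)
```

Wrapping the blend this way turns a one-off exploration into a governed, repeatable dataset without losing the agility that made the mashup useful in the first place.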