This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "Agile BI: How Data Virtualization and Data Mashup Help." The speaker is Mark Flaherty, CMO at InetSoft.
That’s the first step in a data virtualization process: to create this normalized or base model of the disparate data. The second step is to transform it, improve its quality, and integrate it into new data models. The third step is to expose this information as a virtual relational database or as a Web service, such as an XML feed, or as something that can be accessed as a Java Object, a Java Portlet, or a SharePoint Web Part. The reason you want to do that is so you have reusability of these data models to serve multiple applications.
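To make those three steps concrete, here is a minimal Python sketch, assuming two made-up sources and pandas as the blending library; the source data, column names, and mappings are illustrative assumptions, not any particular product's model.

```python
import pandas as pd

# Step 1: pull disparate sources into a normalized "base model".
# (Hypothetical inline data stands in for a CRM extract and an ERP table.)
crm_raw = pd.DataFrame({"cust_id": [1, 2], "cust_name": ["Acme", "Globex"], "region_cd": ["E", "W"]})
erp_raw = pd.DataFrame({"CUSTOMER": [1, 2], "REVENUE": [1200.0, 950.0], "PERIOD": ["2024-01", "2024-01"]})

crm = crm_raw.rename(columns={"cust_id": "customer_id", "cust_name": "customer", "region_cd": "region"})
erp = erp_raw.rename(columns={"CUSTOMER": "customer_id", "REVENUE": "revenue", "PERIOD": "period"})

# Step 2: transform, improve quality, and integrate into a new data model.
base_model = crm.merge(erp, on="customer_id")
base_model["region"] = base_model["region"].map({"E": "East", "W": "West"})

# Step 3: expose the model through a reusable access point so multiple
# applications (dashboards, reports, services) can query the same view.
def query_revenue(region: str) -> pd.DataFrame:
    """Return the integrated view filtered by region."""
    return base_model[base_model["region"] == region][["customer", "region", "revenue", "period"]]

print(query_revenue("East"))
```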
You are basically providing virtualized access. Now, at runtime, any external application would call the dashboard or report created in this data mashup platform. The platform understands the query, optimizes it, and may decide, automatically or at design time, whether to pull real-time or cached data, in which case a scheduler is invoked to pre-fetch the data.
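The real-time-versus-cached decision can be sketched roughly as follows; the TTL setting, the fetch callable, and the refresh hook for a scheduler are assumptions made for illustration, not the platform's actual API.

```python
import time
from typing import Any, Callable

class CachedSource:
    """Serve cached results while they are fresh, otherwise fetch live.

    A scheduler can call refresh() ahead of time to pre-fetch data,
    so interactive dashboard queries hit the cache instead of the source.
    """

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float = 300.0):
        self._fetch = fetch          # live query against the source system
        self._ttl = ttl_seconds      # how long cached data is considered fresh
        self._data = None
        self._loaded_at = 0.0

    def refresh(self) -> None:
        """Pre-fetch data (for example, invoked by a scheduler off-hours)."""
        self._data = self._fetch()
        self._loaded_at = time.time()

    def get(self, real_time: bool = False) -> Any:
        """Return live data when demanded or when the cache is stale."""
        stale = (time.time() - self._loaded_at) > self._ttl
        if real_time or self._data is None or stale:
            self.refresh()
        return self._data

# Hypothetical usage: wrap a slow source query in a cached access path.
orders = CachedSource(fetch=lambda: [("A-100", 250.0), ("A-101", 75.0)], ttl_seconds=600)
orders.refresh()                    # scheduler pre-fetch
print(orders.get())                 # served from cache
print(orders.get(real_time=True))   # forced live pull
```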
You are not doing a full-blown replication of a data store. You are only selectively using caches or a scheduler to complement virtualization. There are a lot of optimization techniques, like push-down delegation, asynchronous access, parallel access, and automatically selecting the types of joins, which we will touch on briefly later if we have time.

Finally, from the management and monitoring perspective, since you are now using this as your virtual data layer, you need to understand governance and metadata. How are my different data models coming together? Do I have canonical models that these applications are using? How are they going to fit with the physical models? What is the change impact analysis? You use it to propagate changes to the data models in terms of security and control.
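As a rough illustration of two of the optimization techniques mentioned above, push-down delegation and parallel access, the sketch below sends the filter into the source query rather than filtering after the fact, and queries two sources concurrently; the in-memory SQLite databases and table names are stand-ins for real source systems.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_source(rows):
    """Stand-in for a remote source: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

source_a = make_source([("East", 120.0), ("West", 80.0)])
source_b = make_source([("East", 60.0), ("West", 200.0)])

def fetch(conn, region):
    # Push-down delegation: the WHERE clause runs inside the source,
    # so only matching rows travel back to the virtualization layer.
    cur = conn.execute("SELECT region, amount FROM sales WHERE region = ?", (region,))
    return cur.fetchall()

# Parallel access: query both sources concurrently, then stitch the results.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(fetch, conn, "East") for conn in (source_a, source_b)]
    combined = [row for f in futures for row in f.result()]

print(combined)  # rows from both sources, filtered at the source
```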
All of those things are also part of what data virtualization platforms have got to do. With that understanding, the best in class in this category really need to realize value from all data types. They need to provide flexible integration options for virtualization that minimize data replication. But they should not be so rigid that they don’t allow for replication when it is called for, using either caching strategies or scheduled pre-fetch, preload strategies.
The data mashup tool also needs to integrate with a lot of the common enterprise architecture infrastructure, such as LDAP for security, single sign-on, other modeling tools, etc. There have to be performance and scalability options combined with governance and flexibility features.

Now that we have a sense of what data virtualization is, at least at a high-level view, let's go back and understand how it fits in enterprise architecture, and we will look specifically at some customer examples.
We already touched on several of these points. One way to look at this is that it isn’t a matter of what type of application we are accessing, whether SOA applications, transactional applications, or the more persistent data stores. The data mashup platform provides a unifying effect across all of these types of applications.
One thing we are going to look at is how this is just one representation of the unification of potential users of three big data blocks: operational, transactional, and informational. Within that I have given a couple of examples. It could be a BPM-oriented application that is highly machine driven, like provisioning in a telco. Or it could be a human-driven business process, like claims processing.
Customer service is a very common cut across all industries. How do you talk to the customer, take an order, up-sell, cross-sell, fix a technical problem, etc.? But these activities are all related to dealing with source systems that are transactional systems in one shape or form, and they typically may use a BPM or EII and a messaging or transactional kind of system. Then over on the other side, you have more informational applications. Informational applications could be business intelligence types of applications that provide contextual information to the transactional application.
Data virtualization and data mashup solve related problems, but they are not the same. Both aim to let people work with data from multiple sources without painful copying and manual wrangling. However, they differ in architecture, governance, performance characteristics, and the kinds of use cases they serve best. Think of data virtualization as the “plumbing and contracts” layer for unified access, and data mashup as the “assembly and transformation” layer that shapes data for specific analytical questions.
Data virtualization creates an abstraction layer with semantic models and views. Client tools query those views as if they were single tables. Under the hood, the virtualization engine plans and pushes down queries to each source, stitches results, applies security, and returns the answer. There is usually minimal or no persistent storage in the virtualization tier; it is primarily a routing and optimization brain, sometimes with caching.
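As a loose analogy for that logical layer, the sketch below attaches two separate SQLite databases behind one connection and answers a single stitched query; real federation spans heterogeneous systems and adds planning, security, and caching, so treat the databases, schemas, and names here as hypothetical.

```python
import sqlite3

# One "hub" connection reaches two separate in-memory databases, standing
# in for a CRM system and an ERP system behind a logical access layer.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS crm")
hub.execute("ATTACH DATABASE ':memory:' AS erp")

hub.execute("CREATE TABLE crm.bookings (customer TEXT, amount REAL)")
hub.execute("CREATE TABLE erp.actuals  (customer TEXT, amount REAL)")
hub.execute("INSERT INTO crm.bookings VALUES ('Acme', 500.0)")
hub.execute("INSERT INTO erp.actuals  VALUES ('Acme', 450.0)")

# One federated query returns a stitched result; the consumer sees a
# single answer set rather than two source systems.
unified = hub.execute("""
    SELECT customer, amount, 'booked' AS status FROM crm.bookings
    UNION ALL
    SELECT customer, amount, 'actual' AS status FROM erp.actuals
""").fetchall()

print(unified)
```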
Data mashup tools, by contrast, offer a curated workspace to blend data—sometimes using the virtualized views as inputs, sometimes connecting directly to sources. They provide pipelines or step-by-step recipes (join sales to targets, map currencies, fix date grains, compute rolling 90-day metrics). The output might be an in-memory dataset, a refreshed extract, or a governed dataset published to dashboards.
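Here is a minimal sketch of such a recipe in pandas, using made-up sales, target, and exchange-rate data; the column names, rates, and grains are assumptions chosen only to show the join, currency mapping, date-grain fix, and rolling 90-day step.

```python
import pandas as pd

# Made-up daily sales plus monthly targets; rates are illustrative, not live FX.
sales = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=120, freq="D"),
    "region": ["EMEA", "AMER"] * 60,
    "amount": [100.0, 150.0] * 60,
    "currency": ["EUR", "USD"] * 60,
})
targets = pd.DataFrame({"region": ["EMEA", "AMER"], "monthly_target": [4000.0, 5000.0]})
fx_to_usd = {"EUR": 1.08, "USD": 1.0}

# Map currencies to a common unit, then join sales to targets.
sales["amount_usd"] = sales["amount"] * sales["currency"].map(fx_to_usd)
blended = sales.merge(targets, on="region")

# Fix the date grain: roll daily rows up to months so they compare to targets.
monthly = (blended
           .groupby([pd.Grouper(key="date", freq="MS"), "region"])["amount_usd"]
           .sum()
           .reset_index())

# Rolling 90-day revenue for one region on the daily grain.
emea_90d = (sales[sales["region"] == "EMEA"]
            .set_index("date")["amount_usd"]
            .rolling("90D").sum())

print(monthly.head())
print(emea_90d.tail())
```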
| Dimension | Data Virtualization | Data Mashup |
|---|---|---|
| Primary Goal | Unified, governed access to distributed data; minimize duplication. | Rapid combination and transformation of data for specific analyses. |
| Location of Data | Stays in original systems; accessed through a logical layer. | May stay virtual, be cached, or be extracted to a model or dataset. |
| Users | Data engineers, platform teams; sometimes power analysts. | BI developers, analysts, finance/ops power users. |
| Governance | Centralized semantic layer, role-based policies, lineage. | Project-level governance; recipes/pipelines are auditable but more decentralized. |
| Performance | Live federation; relies on pushdown and source performance. | Often preps and shapes data; can materialize for fast dashboards. |
| Change Management | Versioned views/semantic models; change once, affect many consumers. | Changes scoped to the mashup/pipeline; faster iteration for a given report. |
| Best For | Real-time access, minimizing data movement, enforcing enterprise-wide definitions. | Custom transformations, scenario-specific blends, rapid prototyping. |
The overlap is real: both can present a “single source of truth” experience to end users. Many platforms blur the line by offering virtualized connections plus transformation steps. In practice, a mashup can consume virtualized views, and a virtualization layer can expose pre-transformed composite views that feel like mashups. The difference is emphasis: virtualization is about how you access data (logically and securely), mashup is about what you do with that data (combine, clean, reshape) to answer questions.
The smartest architecture often uses both. A data virtualization layer standardizes access, security, and semantics across source systems. On top of that, a mashup layer performs business-specific blending and transformation, producing certified datasets for dashboards and self-service analysis. This division of labor keeps the enterprise contract stable while letting domains innovate at the edges.
Imagine finance wants a consolidated revenue dashboard. Data lives in the ERP (actuals), CRM (deals), and a data lake (usage). Data virtualization exposes clean, governed views: erp.revenue_actuals, crm.bookings, and lake.usage_events. A data mashup pipeline then blends those views at a common grain, aligns currencies and dates, computes the consolidated revenue metrics, and publishes a certified dataset for the dashboard, as sketched below.
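A minimal sketch of that pipeline follows, with made-up DataFrames standing in for the governed views (erp.revenue_actuals, crm.bookings, lake.usage_events); the grain, columns, and attainment metric are illustrative assumptions rather than a prescribed model.

```python
import pandas as pd

# Hypothetical stand-ins for the views exposed by the virtualization tier.
revenue_actuals = pd.DataFrame({"account": ["Acme", "Globex"], "period": ["2024-Q1"] * 2,
                                "actual": [450.0, 300.0]})
bookings = pd.DataFrame({"account": ["Acme", "Globex"], "period": ["2024-Q1"] * 2,
                         "booked": [500.0, 280.0]})
usage_events = pd.DataFrame({"account": ["Acme", "Acme", "Globex"],
                             "period": ["2024-Q1"] * 3, "events": [120, 80, 60]})

# Mashup steps: aggregate usage to the account/period grain, blend the three
# views, and derive the metric the dashboard needs.
usage = usage_events.groupby(["account", "period"], as_index=False)["events"].sum()
consolidated = (revenue_actuals
                .merge(bookings, on=["account", "period"])
                .merge(usage, on=["account", "period"]))
consolidated["attainment"] = consolidated["actual"] / consolidated["booked"]

print(consolidated)  # certified dataset published to the dashboard
```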
Governance and access are handled by the virtualization tier; speed and analytic shaping are delivered by the mashup pipeline. Everyone wins.