This is the continuation of the transcript of a Webinar hosted by InetSoft on the topic of "Agile BI: How Data Virtualization and Data Mashup Help." The speaker is Mark Flaherty, CMO at InetSoft.
That’s the first step in a data virtualization process: to create this normalized or base model of the disparate data. The second step is to transform it, improve its quality, and integrate it into new data models. The third step is to expose this information as a virtual relational database or as a Web service, such as an XML feed, or as something that can be accessed as a Java Object, a Java Portlet, or a SharePoint Web Part. The reason you want to do that is so you have reusability of these data models to serve multiple applications.
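To make those three steps concrete, here is a minimal Python sketch, assuming two made-up sources and pandas as the blending library; the source data, column names, and mappings are illustrative assumptions, not any particular product's model.

```python
import pandas as pd

# Step 1: pull disparate sources into a normalized "base model".
# (Hypothetical inline data stands in for a CRM extract and an ERP table.)
crm_raw = pd.DataFrame({"cust_id": [1, 2], "cust_name": ["Acme", "Globex"], "region_cd": ["E", "W"]})
erp_raw = pd.DataFrame({"CUSTOMER": [1, 2], "REVENUE": [1200.0, 950.0], "PERIOD": ["2024-01", "2024-01"]})

crm = crm_raw.rename(columns={"cust_id": "customer_id", "cust_name": "customer", "region_cd": "region"})
erp = erp_raw.rename(columns={"CUSTOMER": "customer_id", "REVENUE": "revenue", "PERIOD": "period"})

# Step 2: transform, improve quality, and integrate into a new data model.
base_model = crm.merge(erp, on="customer_id")
base_model["region"] = base_model["region"].map({"E": "East", "W": "West"})

# Step 3: expose the model through a reusable access point so multiple
# applications (dashboards, reports, services) can query the same view.
def query_revenue(region: str) -> pd.DataFrame:
    """Return the integrated view filtered by region."""
    return base_model[base_model["region"] == region][["customer", "region", "revenue", "period"]]

print(query_revenue("East"))
```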
You are basically providing virtualized access. Now, at runtime, any external application would call the dashboard or report created in this data mashup platform. The platform understands the query, optimizes it, and may decide, automatically or at design time, whether to pull real-time or cached data, in which case a scheduler is invoked to pre-fetch the data.
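The real-time-versus-cached decision can be sketched roughly as follows; the TTL setting, the fetch callable, and the refresh hook for a scheduler are assumptions made for illustration, not the platform's actual API.

```python
import time
from typing import Any, Callable

class CachedSource:
    """Serve cached results while they are fresh, otherwise fetch live.

    A scheduler can call refresh() ahead of time to pre-fetch data,
    so interactive dashboard queries hit the cache instead of the source.
    """

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float = 300.0):
        self._fetch = fetch          # live query against the source system
        self._ttl = ttl_seconds      # how long cached data is considered fresh
        self._data = None
        self._loaded_at = 0.0

    def refresh(self) -> None:
        """Pre-fetch data (for example, invoked by a scheduler off-hours)."""
        self._data = self._fetch()
        self._loaded_at = time.time()

    def get(self, real_time: bool = False) -> Any:
        """Return live data when demanded or when the cache is stale."""
        stale = (time.time() - self._loaded_at) > self._ttl
        if real_time or self._data is None or stale:
            self.refresh()
        return self._data

# Hypothetical usage: wrap a slow source query in a cached access path.
orders = CachedSource(fetch=lambda: [("A-100", 250.0), ("A-101", 75.0)], ttl_seconds=600)
orders.refresh()                    # scheduler pre-fetch
print(orders.get())                 # served from cache
print(orders.get(real_time=True))   # forced live pull
```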
You are not doing a full-blown replication of a data store. You are only selectively using caches or a scheduler to complement virtualization. There are a lot of optimization techniques, like push-down delegation, asynchronous access, parallel access, and automatically selecting the types of joins, which we will touch on briefly later if we have time.

Finally, from the management and monitoring perspective, since you are now using this as your virtual data layer, you need to understand governance and metadata. How are my different data models coming together? Do I have canonical models that these applications are using? How are they going to fit with the physical models? What is the change impact analysis? You use it to propagate changes to the data models in terms of security and control.
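As a rough illustration of two of the optimization techniques mentioned above, push-down delegation and parallel access, the sketch below sends the filter into the source query rather than filtering after the fact, and queries two sources concurrently; the in-memory SQLite databases and table names are stand-ins for real source systems.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_source(rows):
    """Stand-in for a remote source: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

source_a = make_source([("East", 120.0), ("West", 80.0)])
source_b = make_source([("East", 60.0), ("West", 200.0)])

def fetch(conn, region):
    # Push-down delegation: the WHERE clause runs inside the source,
    # so only matching rows travel back to the virtualization layer.
    cur = conn.execute("SELECT region, amount FROM sales WHERE region = ?", (region,))
    return cur.fetchall()

# Parallel access: query both sources concurrently, then stitch the results.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(fetch, conn, "East") for conn in (source_a, source_b)]
    combined = [row for f in futures for row in f.result()]

print(combined)  # rows from both sources, filtered at the source
```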
All of those things are also part of what data virtualization platforms have got to do. With that understanding, the best in class in this category really need to realize value from all data types. They need to provide flexible integration options for virtualization that minimize data replication. But they should not be so rigid that they don’t allow for replication when it is called for, using either caching strategies or scheduled pre-fetch, preload strategies.
The data mashup tool also needs to integrate with a lot of the common enterprise architecture infrastructure, such as LDAP for security, single sign-on, other modeling tools, etc. There have to be performance and scalability options combined with governance and flexibility features.

Now that we have a sense of what data virtualization is, at least at a high-level view, let's go back and understand how it fits in enterprise architecture, and we will look specifically at some customer examples.
We already touched on several of these points. One way to look at this is that it isn’t a matter of what type of application we are accessing, whether SOA applications, transactional applications, or the more persistent data stores. The data mashup platform provides a unifying effect across all of these types of applications.
One thing we are going to look at is how this is just one representation of the unification of potential users of three big data blocks: operational, transactional, and informational. Within that I have given a couple of examples. It could be a BPM-oriented application that is highly machine driven, like provisioning in a telco. Or it could be a human-driven business process, like claims processing.
Customer service is a very common cut across all industries. How do you talk to the customer, take an order, up-sell, cross-sell, fix a technical problem, etc.? But these activities are all related to dealing with source systems that are transactional systems in one shape or form, and they typically may use a BPM or EII and a messaging or transactional kind of system. Then over on the other side, you have more informational applications. Informational applications could be business intelligence types of applications that provide contextual information to the transactional application.
Data virtualization and data mashup solve related problems, but they are not the same. Both aim to let people work with data from multiple sources without painful copying and manual wrangling. However, they differ in architecture, governance, performance characteristics, and the kinds of use cases they serve best. Think of data virtualization as the “plumbing and contracts” layer for unified access, and data mashup as the “assembly and transformation” layer that shapes data for specific analytical questions.
Data virtualization creates an abstraction layer with semantic models and views. Client tools query those views as if they were single tables. Under the hood, the virtualization engine plans and pushes down queries to each source, stitches results, applies security, and returns the answer. There is usually minimal or no persistent storage in the virtualization tier; it is primarily a routing and optimization brain, sometimes with caching.
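As a loose analogy for that logical layer, the sketch below attaches two separate SQLite databases behind one connection and answers a single stitched query; real federation spans heterogeneous systems and adds planning, security, and caching, so treat the databases, schemas, and names here as hypothetical.

```python
import sqlite3

# One "hub" connection reaches two separate in-memory databases, standing
# in for a CRM system and an ERP system behind a logical access layer.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS crm")
hub.execute("ATTACH DATABASE ':memory:' AS erp")

hub.execute("CREATE TABLE crm.bookings (customer TEXT, amount REAL)")
hub.execute("CREATE TABLE erp.actuals  (customer TEXT, amount REAL)")
hub.execute("INSERT INTO crm.bookings VALUES ('Acme', 500.0)")
hub.execute("INSERT INTO erp.actuals  VALUES ('Acme', 450.0)")

# One federated query returns a stitched result; the consumer sees a
# single answer set rather than two source systems.
unified = hub.execute("""
    SELECT customer, amount, 'booked' AS status FROM crm.bookings
    UNION ALL
    SELECT customer, amount, 'actual' AS status FROM erp.actuals
""").fetchall()

print(unified)
```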
Data mashup tools, by contrast, offer a curated workspace to blend data—sometimes using the virtualized views as inputs, sometimes connecting directly to sources. They provide pipelines or step-by-step recipes (join sales to targets, map currencies, fix date grains, compute rolling 90-day metrics). The output might be an in-memory dataset, a refreshed extract, or a governed dataset published to dashboards.
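Here is a minimal sketch of such a recipe in pandas, using made-up sales, target, and exchange-rate data; the column names, rates, and grains are assumptions chosen only to show the join, currency mapping, date-grain fix, and rolling 90-day step.

```python
import pandas as pd

# Made-up daily sales plus monthly targets; rates are illustrative, not live FX.
sales = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=120, freq="D"),
    "region": ["EMEA", "AMER"] * 60,
    "amount": [100.0, 150.0] * 60,
    "currency": ["EUR", "USD"] * 60,
})
targets = pd.DataFrame({"region": ["EMEA", "AMER"], "monthly_target": [4000.0, 5000.0]})
fx_to_usd = {"EUR": 1.08, "USD": 1.0}

# Map currencies to a common unit, then join sales to targets.
sales["amount_usd"] = sales["amount"] * sales["currency"].map(fx_to_usd)
blended = sales.merge(targets, on="region")

# Fix the date grain: roll daily rows up to months so they compare to targets.
monthly = (blended
           .groupby([pd.Grouper(key="date", freq="MS"), "region"])["amount_usd"]
           .sum()
           .reset_index())

# Rolling 90-day revenue for one region on the daily grain.
emea_90d = (sales[sales["region"] == "EMEA"]
            .set_index("date")["amount_usd"]
            .rolling("90D").sum())

print(monthly.head())
print(emea_90d.tail())
```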
| Dimension | Data Virtualization | Data Mashup |
|---|---|---|
| Primary Goal | Unified, governed access to distributed data; minimize duplication. | Rapid combination and transformation of data for specific analyses. |
| Location of Data | Stays in original systems; accessed through a logical layer. | May stay virtual, be cached, or be extracted to a model or dataset. |
| Users | Data engineers, platform teams; sometimes power analysts. | BI developers, analysts, finance/ops power users. |
| Governance | Centralized semantic layer, role-based policies, lineage. | Project-level governance; recipes/pipelines are auditable but more decentralized. |
| Performance | Live federation; relies on pushdown and source performance. | Often preps and shapes data; can materialize for fast dashboards. |
| Change Management | Versioned views/semantic models; change once, affect many consumers. | Changes scoped to the mashup/pipeline; faster iteration for a given report. |
| Best For | Real-time access, minimizing data movement, enforcing enterprise-wide definitions. | Custom transformations, scenario-specific blends, rapid prototyping. |
The overlap is real: both can present a “single source of truth” experience to end users. Many platforms blur the line by offering virtualized connections plus transformation steps. In practice, a mashup can consume virtualized views, and a virtualization layer can expose pre-transformed composite views that feel like mashups. The difference is emphasis: virtualization is about how you access data (logically and securely), mashup is about what you do with that data (combine, clean, reshape) to answer questions.
The smartest architecture often uses both. A data virtualization layer standardizes access, security, and semantics across source systems. On top of that, a mashup layer performs business-specific blending and transformation, producing certified datasets for dashboards and self-service analysis. This division of labor keeps the enterprise contract stable while letting domains innovate at the edges.
Imagine finance wants a consolidated revenue dashboard. Data lives in the ERP (actuals), CRM (deals), and a data lake (usage). Data virtualization exposes clean, governed views: erp.revenue_actuals, crm.bookings, and lake.usage_events. A data mashup pipeline then blends those views at a common grain, aligns currencies and dates, computes the consolidated revenue metrics, and publishes a certified dataset for the dashboard, as sketched below.
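A minimal sketch of that pipeline follows, with made-up DataFrames standing in for the governed views (erp.revenue_actuals, crm.bookings, lake.usage_events); the grain, columns, and attainment metric are illustrative assumptions rather than a prescribed model.

```python
import pandas as pd

# Hypothetical stand-ins for the views exposed by the virtualization tier.
revenue_actuals = pd.DataFrame({"account": ["Acme", "Globex"], "period": ["2024-Q1"] * 2,
                                "actual": [450.0, 300.0]})
bookings = pd.DataFrame({"account": ["Acme", "Globex"], "period": ["2024-Q1"] * 2,
                         "booked": [500.0, 280.0]})
usage_events = pd.DataFrame({"account": ["Acme", "Acme", "Globex"],
                             "period": ["2024-Q1"] * 3, "events": [120, 80, 60]})

# Mashup steps: aggregate usage to the account/period grain, blend the three
# views, and derive the metric the dashboard needs.
usage = usage_events.groupby(["account", "period"], as_index=False)["events"].sum()
consolidated = (revenue_actuals
                .merge(bookings, on=["account", "period"])
                .merge(usage, on=["account", "period"]))
consolidated["attainment"] = consolidated["actual"] / consolidated["booked"]

print(consolidated)  # certified dataset published to the dashboard
```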
Governance and access are handled by the virtualization tier; speed and analytic shaping are delivered by the mashup pipeline. Everyone wins.