Different Business Groups Developing Different Data Models

Below is the transcript of a Webinar hosted by InetSoft on the topic of "Managing Data Complexity." The presenter is Mark Flaherty, Vice President of Marketing at InetSoft.

Mark Flaherty (MF):A different level of complexity looks at the historical development of application, so it used to be that we would have one large amount of computer that was being used for all of our batch applications. But as time-sharing became more of the norm, and then came the evolution of workgroup computing, we ended up with a kind of divestment of centralized control of our management of our information. This meant that we have got different groups developing different data models for essentially the same context.

Every organization or every group within company is looking at customers. Every group is looking at sales. Every group is looking at customer service, and so on. And so we end up with, instead of having one view of what a customer is, we end up with six or seven or eight, none of which is aligned with the other ones. And then, we have this requirement to centralize all that data or pull it into some data warehouse or some data mart or some master data repository.

But already the level of complexity that is associated with the variability of the way the applications have developed leads into our next level of complexity, which is the structural complexity that if we all understand. We all know what a person is or what a customer is. Except in one table, we have defined it in terms of a first name and a last name with a certain size and data type.

#1 Ranking: Read how InetSoft was rated #1 for user adoption in G2's user survey-based index Read More

Maybe it's a variable character string, 25 characters long, while in another data set, we might instead be representing a social security number as a numeric value of size nine digits, and other one might be looking at a social security number with a different name, but it's a nine character string.

Another data set might have a social security number using the data element name SSN, but it's an 11 character string. Telephone numbers are notorious for the variety in formats and structures, I guess if you were to look at that stack of business cards that is sitting on your desk that you have collected from the numerous places you have been, you probably find eight, nine or ten different ways that telephone numbers are represented.

So even though, we may even have a structured representation of similar concepts in different data sets and different applications, there is bound to be some level of variety and therefore, some level of complexity in the structures underlying those data sets.

And then we can even drill down a little bit more to another level of granularity which is the underlying meaning of those concepts? And of course, we all know what a customer is, don’t we? I already know the people are smiling and shaking their head, because they’ve spent hours having this argument and meetings about getting a single definition for the customer or a single definition for a product or a single structure for customer. And the fact is that that there are many different definitions for the same term, and we don’t define them specifically, because we all use them in different context and they all have different meanings.

For example, you can have a customer from the sales perspective which is the person who pays for the product or service or exchanges money for the products or service. But on the other hand, the customer support staff provides supports to customers, and typically that is somebody who has a copy or a license for the use of the product. And it's very possible that these two sets of customers don’t overlap.

When working with databases, you might encounter situations where customer data is stored in separate tables. This can happen if, for example, you have a table for online customers and another for in-store customers. These sets of customers might not overlap – someone who shops online might not be the same person who shops in-store. Joining these tables for analysis can be tricky because traditional joins (like inner joins) will only return rows where there's a match on the join condition. To address this, you can use different join types depending on your desired outcome.

If you want to see all customers, regardless of whether they shop online or in-store, you can use a left outer join or a right outer join. These joins will include all rows from one table and matching rows from the other, with null values for unmatched columns. Alternatively, a full outer join will include all rows from both tables, even if there's no match on the join condition, resulting in a combination of matched and unmatched data. Choosing the right join type depends on your specific needs. Outer joins are helpful for getting a complete picture of your customer base, including those who don't interact with both channels. But, they can also lead to a wider result set that might require additional filtering. Gemini may display inaccurate info, including about people, so double-check its responses.

Previous: Example of an Advanced Visualization
Next: Mobile BI