InetSoft Webinar: Telling the Truth with Your Data

This is the continuation of the transcript of a webinar hosted by InetSoft in January 2018 on the topic of "Data Visualization How To Techniques." The speaker is Abhishek Gupta, product manager at InetSoft.

The next point is talking about telling the truth with your data. It’s called graphical integrity. As a viewer, as a consumer of data and graphical data, there’s nothing more annoying than being lied to by a report, or by a chart, or by a graph, and yet this happens all the time.

There are a few different ways that people will lie with their graphs. One way is that they will misrepresent the data that they’re showing. They’ll misrepresent changes in the data. Very often they’ll use the length of a line or the size of an object to represent the relative growth of something. So, you’ll see lines getting longer and longer over time to indicate that a number gets larger and larger over time.

But unless that object gets larger in the same proportion as the number gets larger, that object is lying to you. So for example, here’s a graph that shows the mandated fuel economy for automobiles beginning in the late 1970s. The government mandated that the average fuel economy had to be 18 miles per gallon in 1978, and it went up every year until it was mandated to be 27.5 miles per gallon in 1985.

So over that period, that fuel economy was mandated to go up about 50% or so. But the graph that’s showing that information shows lines to represent those numbers. Those lines get bigger and bigger by about 8-10 times. So, a 50% increase in data is shown by about an 800% increase in the size of these lines that represent that data.

The graph is totally lying to us, and Tufte actually came up with a formula to measure this. He called it the lie factor. The lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. So, if you’ve got an 800% increase in the size of your graphic to represent only a 50% increase in your data, then your lie factor is 800% divided by 50%, which is what? 16.
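As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names `percent_change` and `lie_factor` are made up for this example and do not come from the webinar or from any library. The inputs are the figures quoted in the talk: 18 and 27.5 miles per gallon, and a roughly 800% growth in the length of the lines.

```python
def percent_change(old: float, new: float) -> float:
    """Return the percent change going from old to new."""
    return (new - old) / old * 100.0


def lie_factor(graphic_effect_pct: float, data_effect_pct: float) -> float:
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    if data_effect_pct == 0:
        raise ValueError("the data effect must be nonzero")
    return graphic_effect_pct / data_effect_pct


# Mandated fuel economy from the talk: 18 mpg in 1978, 27.5 mpg in 1985.
data_effect = percent_change(18.0, 27.5)
print(f"data effect: {data_effect:.1f}%")

# Rounded figures used in the talk: 800% graphic growth, 50% data growth.
print("lie factor (rounded inputs):", lie_factor(800.0, 50.0))

# Same graphic growth against the unrounded data change (assumes 800% is accurate).
print(f"lie factor (unrounded data): {lie_factor(800.0, data_effect):.1f}")
```

A lie factor close to 1 means the graphic changes in proportion to the data. The further the value is from 1, the more the graphic distorts the change.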

So, you’re lying quite a bit. And this is the picture that we’re talking about here. This is from the New York Times. Now, these are credible news sources doing this. And sometimes they do it deliberately, but more often than not they do it because they want to make it look pretty. They think making it look pretty is more important than showing the data truthfully.

How do you show data that is one-dimensional? How do people show that? Do they really show it fairly? This is a really common problem. What you’ll see a lot of the time is that the data points are one-dimensional. I’ve got a graph I’m looking at here. It’s the percentage of doctors devoted to family practice for different years.

In 1964, it was 27%, 1975 it was 16%, and 1990 it was 12%. Those are three numbers. Those are single points. Each number represents just a single dimension of data—just a point. But the graph I’m looking at represents each one of those points by a picture of a doctor—a nice little drawing of a doctor holding a clipboard.

And the problem with this is that these pictures are two-dimensional. So, what the picture is doing is it has the height of each doctor. It’s proportional to those numbers, 27, 16, and 12. But the picture doesn’t just have height. The picture also has width, and as you know the size of an object is proportional both to its height and its width.

And so, if you’re presenting one-dimensional data with a two-dimensional object, the size of that object tends to be exaggerated as it gets larger because the width and the height are both growing. And so, in this case here, 27% and 12%, we’ve got a difference of, what is that, about 100%? The number a little bit more than doubles between 12% and 27%.

But these pictures, even though they’re twice as high, they’re actually about four times the size, because when you double the height you also double the width. So, this is a lie factor of about four here because it’s representing a 100% difference in the data with a 400% difference in size. That’s a really common problem.
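The scaling effect can be sketched in a few lines of Python. This assumes each doctor picture is scaled uniformly, so that its width grows by the same ratio as its height. The helper names are hypothetical and chosen only for this example. Note that "four times the size" is, strictly, a 300% increase, so the lie factor lands somewhere between 3 and 4 depending on how the percentages are rounded.

```python
def area_ratio(height_ratio: float) -> float:
    """Area ratio of a picture whose height and width both grow by height_ratio."""
    return height_ratio ** 2


def pct_increase(ratio: float) -> float:
    """Convert a ratio (for example 2.0 for 'twice as big') to a percent increase."""
    return (ratio - 1.0) * 100.0


def lie_factor(graphic_ratio: float, data_ratio: float) -> float:
    """Percent change shown in the graphic divided by percent change in the data."""
    return pct_increase(graphic_ratio) / pct_increase(data_ratio)


# Rounded case from the talk: the data doubles, so the picture is twice as tall.
print(area_ratio(2.0))                   # 4.0, four times the area
print(lie_factor(area_ratio(2.0), 2.0))  # 3.0 with strict percent increases

# Unrounded case: 27% versus 12% of doctors in family practice.
data_ratio = 27 / 12
print(data_ratio)                                      # 2.25
print(area_ratio(data_ratio))                          # 5.0625
print(lie_factor(area_ratio(data_ratio), data_ratio))  # 3.25
```

Either way the conclusion from the talk holds: the pictures show a change several times larger than the change in the data. A bar of fixed width, where only the height varies, would bring the factor back to 1.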
