Excel 2007 and the Lie Factor

“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”

Edward Tufte calls violation of this principle the “Lie Factor”. The implementation of in-cell data bars in Microsoft Excel 2007 is a big offender.

Almost a year ago, I was surprised to discover that the Microsoft Excel 2007 development team didn’t understand what zero means. Their implementation of in-cell data bars showed a bar in a cell, even if the cell had a zero or very low value.

Data bars in the Excel 2007 prototype

That was in the Excel 2007 Beta. Things haven’t improved in the current version of Excel 2007. The default setting for data bars in Excel 2007 is to scale to bars so that the smallest bar is based on the smallest value in the selected range and the largest bar is based on the largest value. It still appears that the smallest bar will be no smaller than five or ten percent of the width of the cell. Here’s a sample:

Sample data bars in Excel 2007

So, if you select a range that has values between 600 and 700, the 600 would have a little bitty bar and the 700 would have a full-width bar. Based on the bars, it would look like the 700 is ten to twenty times larger than 600. Outside of Redmond, this is generally regarded as untrue.

What’s more, if you create two sets of data bars side by side, each group of data bars scales itself independently even though they look the same. Take a look at this screenshot:

Sample data bars from two different conditional formats in Excel 2007

Notice the top seven cells have data bars that have one set of scaling and the bottom data bars have a different scaling. However, they look identical, and users should generally expect these bars to have the same scale.

Here are the rules:

  1. Defaults matter! It doesn’t matter that you can do data bars correctly in Excel. The default should be to do it right and it should be hard to do it wrong.
  2. The “right way” to make data bars is to make the length of the data bar directly proportional to the value in the cell. If one cell has a value twice another it should have a bar that is twice as long.
  3. Remove the default gradient shading. The gradient makes it hard to tell where the bar ends, obscuring what you’re trying to show.
  4. Continuous cells with data bars should all use the same scale. Use different colors to indicate ranges that have different scales.

Excel 2007 supports at least twenty-five different combinations of ways of specifying the length of the data bar.

Five different ways of setting data bars

Exactly one of those ways is correct. Base the shortest bar on the number 0. Base the longest bar on the highest value. Turn off the gradient. If you want to see bars based off percentile or some custom formula, then be explicit. Create a new column, create your formula, create bars on that column.

Please, guys, this isn’t rocket science. This is plain common sense. You would not ship Microsoft Word with a glaring bug in the way text renders. You would not ship Excel with a broken statistical function that people use everyday. Delivering deceitful-by-design infographics betrays your central role in democratizing the analysis of data. Until you fix this, in-cell ASCII art still remains the best way to explore data visually.

A disclosure: We do not currently use Excel 2007 at Juice Analytics. This is not due to a high-minded sense of moral outrage but is merely a reflection of our clients’ environments.