Excel 2007 and the Lie Factor
By Chris Gemignani
June 7, 2007
Find more about:
analytics
excel
sparklines
tufte
visualization
“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”
Edward Tufte calls violation of this principle the “Lie Factor”. The implementation of in-cell data bars in Microsoft Excel 2007 is a big offender.
Almost a year ago, I was surprised to discover that the Microsoft Excel 2007 development team didn’t understand what zero means. Their implementation of in-cell data bars showed a bar in a cell, even if the cell had a zero or very low value.

That was in the Excel 2007 Beta. Things haven’t improved in the current version of Excel 2007. The default setting for data bars in Excel 2007 is to scale to bars so that the smallest bar is based on the smallest value in the selected range and the largest bar is based on the largest value. It still appears that the smallest bar will be no smaller than five or ten percent of the width of the cell. Here’s a sample:

So, if you select a range that has values between 600 and 700, the 600 would have a little bitty bar and the 700 would have a full-width bar. Based on the bars, it would look like the 700 is ten to twenty times larger than 600. Outside of Redmond, this is generally regarded as untrue.
What’s more, if you create two sets of data bars side by side, each group of data bars scales itself independently even though they look the same. Take a look at this screenshot:

Notice the top seven cells have data bars that have one set of scaling and the bottom data bars have a different scaling. However, they look identical, and users should generally expect these bars to have the same scale.
Here are the rules:
- Defaults matter! It doesn’t matter that you can do data bars correctly in Excel. The default should be to do it right and it should be hard to do it wrong.
- The “right way” to make data bars is to make the length of the data bar directly proportional to the value in the cell. If one cell has a value twice another it should have a bar that is twice as long.
- Remove the default gradient shading. The gradient makes it hard to tell where the bar ends, obscuring what you’re trying to show.
- Continuous cells with data bars should all use the same scale. Use different colors to indicate ranges that have different scales.
Excel 2007 supports at least twenty-five different combinations of ways of specifying the length of the data bar.

Exactly one of those ways is correct. Base the shortest bar on the number 0. Base the longest bar on the highest value. Turn off the gradient. If you want to see bars based off percentile or some custom formula, then be explicit. Create a new column, create your formula, create bars on that column.
Please, guys, this isn’t rocket science. This is plain common sense. You would not ship Microsoft Word with a glaring bug in the way text renders. You would not ship Excel with a broken statistical function that people use everyday. Delivering deceitful-by-design infographics betrays your central role in democratizing the analysis of data. Until you fix this, in-cell ASCII art still remains the best way to explore data visually.
A disclosure: We do not currently use Excel 2007 at Juice Analytics. This is not due to a high-minded sense of moral outrage but is merely a reflection of our clients' environments.





9 comments | Show all comments only the last 5 are shown
James L. said:
“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”
Although unrelated to the article, I was wondering what your take would be on the use of logarithmic scaling? This violates "direct proportionality", but is quite common in scientific/engineering fields (I myself used it the other day).
Chris Gemignani said:
My initial thought is that logarithmic scaling doesn't work in the context of in-cell graphing. Log scaling would be really hard to use without an axis and is probably best when you're comparing time-series trends in a line chart. Thanks.
Will Oswald said:
"You would not ship Excel with a broken statistical function"... erm, unless you include the LINEST function that up until Excel 2003 did not adjust for collinearity in multiple regressions, a fundamental problem
Chris Gemignani said:
You noticed I qualified my statement with "that people use everyday". I have heard about this problem and others in with Excel's statistical functions. These problems should have been fixed as soon as they were reported.
R Varley said:
Hi, I'm trying to write an evaluation document on Excel 2007; everyone seems to think it's rubbish for statistics, but no-one says what's wrong. I've been trawling the internet for days, and turned up nothing beyond "Everyone knows it's broken". Can you give me any pointers?
Thanks.
Patrick O'Beirne said:
1st Oct 2007:
Data Bars – Feedback Please
Today’s author: Scott Ruble, the program manager who leads the charting and visualization efforts in Excel. Scott is looking for some feedback on potential changes to data bar behaviour.
http://blogs.msdn.com/excel/archive/2007/10/01/data-bars-feedback-please.aspx
Chris Gemignani said:
Patrick, I commented on the Excel databars post. I'm astonished that these questions keep coming up. The solution is simple: "You need to start with the absolute principle that the bars you show _must_ be proportional to the numbers they represent."
Greg said:
Is there a way to turn off the gradient fill in the data bars?
Matt Craig said:
I have used another way to represent data bars "in cells", but it is "clunky". I create a bar chart, turn off every interface element except for the bars, make the bars slightly transparent (if I still want to see the numbers), and then put it on top of the cells.
It works, but it's "clunky" - the scaling and placement is finicky. It is pretty nice when it's done though, as you can apply appropriate rules to the display of the graph. (e.g. 0 = 0 length), and it gets rid of the annoying gradient (the reason I did it in the first place)
said:
Add a comment