30 Great Visualization Resources in 30 Days
By Ken Hilburn
July 6, 2010
Find more about:
visualization,
tufte,
tools,
charts,
infoviz
A lot of the applications that Juice creates are designed to make information more accessible to people who wouldn't consider themselves to be data experts. They realize the value in the data that they have, and in many cases they have some sort of analytics solution in place, but they know they're not getting as much value from their data as they should.
One of the hurdles we frequently come up against is that people who aren't actively participating in the visualization discussion don't know what's possible. All they've ever seen, in many cases, are the confusing dashboards, charts, and graphs that are all too prevalent from the vendors in our space. You know the ones: a thick layer of technology slathered with some gloss and wiggle, between two slices of "do it yourself".
In many cases, we find ourselves closing this gap by referring to some of the best examples of work out there. As we were thinking about this, the idea of providing a simple walkthrough of these examples came into being. The result: a 30-day calendar chock-full of the best skill-enhancing examples we could find.

Each day is a bite-sized chunk and takes only a few minutes to watch, read, do, or play. Some of the days consist of Juice content, but most days are from other sources that we've found useful.
You can download it to use yourself, or to share with your friends who need to expand their info-viz horizons. Either way, we think it'll get your creative juices flowing.
Tufte-Style Comparison Chart Generator
By Sal Uryasev
May 6, 2008
Find more about:
tufte
pil
comparison
chart
generator
Last week, we shared a rendition of a Tufte graphic using just a few lines of Nodebox code. As our commenters pointed out, Python is great, but it may not be every business analyst's carnal desire to learn a programming language just to generate some nifty graphs. I spent some time to push Chris's Nodebox rendition into a PIL-based Windows tool that can generate the same sort of comparison graph from an Excel file on the fly.
The result is The Comparison Chart Generator 1.0. The installation instructions are relatively simple. Unzip the zip file, and run comparisionchartgenerator.exe.
Alternatively, we have a new Excel chart that creates the same effect using only Excel functionality. Download the Excel Tufte Line Chart here.
If you are using the Chart Generator, start with some data in an Excel (xls) or Comma Delimited (csv) format. The data for this graph has to be contained within the first sheet starting with cell A1, as in the following picture.
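For readers who would rather script the same step, here is a minimal Python sketch of reading such a file. It assumes a header row followed by rows of label, first value, last value; the screenshot is the authority on the real layout, not this sketch. The function name `read_comparison_rows` and the sample text are hypothetical, and this is not the generator's actual parsing code.

```python
import csv
import io

def read_comparison_rows(text):
    """Parse CSV text into a header and a list of (label, first, last) tuples.

    Assumed layout (not confirmed by the tool's source): row 1 holds the
    column titles, and every later row is: label, first value, last value.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        if not record or not record[0].strip():
            continue  # skip blank lines
        label = record[0].strip()
        rows.append((label, float(record[1]), float(record[2])))
    return header, rows

# Hypothetical sample data in the assumed layout
sample = "Country,1970,1979\nSweden,46.9,57.4\nJapan,20.7,26.6\n"
header, rows = read_comparison_rows(sample)
print(header)
print(rows)
```

The tuples it returns have the same (label, first, last) shape used in the NodeBox version of the chart, so the same drawing logic could consume them.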

Select an input file. There are a couple example files bundled with the download.

After selecting a file, you'll be prompted to modify a few of the basic options available for the chart.

Finally, save the result as a jpeg.

Here is the same image found in Tufte's textbook processed using the Comparison Chart Generator. It is generated using the csv example file bundled with the download.

Those of us who have undergone lasik eye-improvement surgery may still prefer the sharp crisp Nodebox results, but for the rest of us, this image looks pretty good. Let us know if this tool is useful. If there is enough of a positive response, we may consider expanding functionality for other fancy Tufte-esque charts.
If you do prefer Nodebox, I have an updated script here. This pushes the script up to 20 lines of code or so, but the extra 9 lines allow the labels to push themselves apart on their own. If you want to look at the source code for the Windows program, you can get it here. I used py2exe to compile it into an executable. The code, however, has not been thoroughly commented or cleaned as of yet, so edit it at your own risk.
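One way labels can "push themselves apart" is a simple one-dimensional relaxation: sort the labels by position and nudge any two neighbors that sit closer than a minimum gap. The sketch below only illustrates that idea in plain Python. It is not the updated NodeBox script, and the names `spread_labels`, `min_gap`, and `max_passes` are made up for this example.

```python
def spread_labels(values, min_gap, max_passes=100):
    """Return adjusted positions so neighboring labels are at least min_gap apart.

    Overlapping neighbors are pushed apart symmetrically, so the group
    stays centered on its original position. Results are returned in the
    same order as the input values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    pos = [float(values[i]) for i in order]
    for _ in range(max_passes):
        moved = False
        for i in range(1, len(pos)):
            overlap = min_gap - (pos[i] - pos[i - 1])
            if overlap > 1e-9:
                # split the correction evenly between the two neighbors
                pos[i - 1] -= overlap / 2
                pos[i] += overlap / 2
                moved = True
        if not moved:
            break
    result = [0.0] * len(values)
    for rank, i in enumerate(order):
        result[i] = pos[rank]
    return result

# Three labels that would collide: two at 35.2 and one at 34.9
print(spread_labels([35.2, 35.2, 34.9], min_gap=1.0))
```

The difference between each adjusted position and the original value plays the role of a hand-tuned fudge factor. The lines themselves would still be drawn from the true values; only the labels move.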
21 comments
lucas said:
Keep going, guys! I'm looking forward to seeing other Tufte-esque charts here.
And thanks a lot for the Nodebox, what an amazingly useful piece of software!
Asim said:
sal,
it took me a while to put all the pieces together. "using python...using excel..." but i realised that you may be interested in using resolver one:
http://www.resolversystems.com/products/
(i'm certain you've heard of it before, but let me describe it for the benefit of others)
it integrates a spreadsheet environment with a built in ironpython interpreter. that way, you wouldn't have to mess around with PIL and py2exe.
watch the one minute screencast:
http://www.resolversystems.com/screencasts/resolver-one-in-one/
and download it for free under a non-commercial license. big down side: only for windows (i'm a mac user, and don't enjoy working in a virtualised environment).
hope this is of interest to you, take care.
asim
Bilsko said:
Just tried it out on my Vista machine with Excel 2007 and it works great. Of course, I had to save the file as .xls so compchart could read it (it still baffles me that Microsoft had to go and introduce .xlsx as a file type...)
Rob said:
I just tried to run the .exe file and got an error: "The specified module could not be found. Loadlibrary (pythondl) failed"
Any idea what this means and (more importantly) how to get around it?
Thx
johnny m said:
Awesome! However, all I get are export errors. But you have inspired me to begin to learn Python.
Traceback (most recent call last):
File "comparisonchartgenerator.py", line 247, in <module>
File "Image.pyc", line 1405, in save
File "JpegImagePlugin.pyc", line 409, in _save
File "ImageFile.pyc", line 493, in _save
IOError: encoder error -2 when writing image file
Madelaine said:
Cool, thanks. I might use this for gene expression data sometime.
derek said:
That's very nice. For extra sharp crispness, can you arrange for the image to be saved as GIF or PNG? Generally speaking, JPG is a very bad format to choose for graphs. The compression algorithm, which was designed for photographs with their smooth color gradients and few sharp edges, handles text, lines, and solid blocks, with their uniform fields of few colors and many sharp edges, very badly, and the file is almost never as compact as a GIF achieves.
The image above shows the characteristic "newsprint smudged by fingers" visual effect of text in jpegs, and the file is 57K. You should find a lossless compression format both sharper in appearance and smaller in size.
Sal said:
I picked JPEG as a default since the PNG format is less known within Windows. Functionality for PNG is already included in the program, but is not obvious. When you are offered to save the file, ignore the *.jpg suggestion, and simply name it "whateveryouwant.png". You will have the output converted into the right format.
The GIF format is also built in if you want to try it out, but for some reason the PIL library that I used has not been creating great-looking GIF images. I would avoid them. The PNG looks very nice though.
derek said:
Thanks. Unfortunately, it may call itself a PNG, but it's still got jpeg artifacts. Also, bizarrely, the pseudo-PNG comes out at 60K compared to the jpeg's 40K.
There's no reason for such a simple graphic to have that kind of bloat. At the risk of tooting my own trumpet, see this 800x600 graph (http://i146.photobucket.com/albums/r264/del_c/politics-charts/DoDDeaths3.png), which I think packs a fair bit more info into only 13.5K.
(and the 400x300 thumbnail version (http://i146.photobucket.com/albums/r264/del_c/politics-charts/DoDDeaths2small.png), designed to fit into the narrow column of a blog, is a mere 3.9K!)
Chris Gemignani said:
Derek,
We've had a number of problems getting a high quality image out of the Python Imaging Library (PIL). For this application, GIF would be best, but PIL was producing some ugly files.
Those graphics are really nice. Excel, too!
We use ImageMagick in house, but we can't package that in an app. A nice approach when using Excel is to output an image slightly bigger than you need then scale it down slightly with ImageMagick. This gives you anti-aliased lines and text that you don't get by default from Excel. It's what we used to produce the Colbert Bump graphs.
Nick said:
Hi,
This looks great! But for some reason the download link for the source for the windows version does not seem to work - I'd love to study the code, to learn how to use basic python to make my own tufte-esque charts.
Christian said:
Thank you for this post, it looks great! I love Tufte's work and read your blog frequently in Google Reader.
The output file (.png or .jpg) could be of a much wider use if it was a .wmf file, because this would enable me to change the colour of one line or text and make any additions I like with Illustrator. Is it possible to get a .wmf version? That would be fantastic.
Sal said:
Code should be accessible.
Most of the code deals with the GUI interface and with parsing excel/csv files. The actual PIL interaction starts around line 196.
I don't believe that PIL actually supports the wmf format. I am fixing up a presentable version of this sort of graph in Excel to add to the next version of chartchooser (http://chartchooser.juiceanalytics.com/). I'll put up a draft version of that when I have it cleaned up - it should be sufficiently editable to not need Illustrator.
Kasper said:
Great tool. One question: Is there a way to change the number of decimals shown? Currently it seems to show just one decimal, whatever the number format in the xls spreadsheet.
Sal said:
As promised, I posted an excel chart of the same graph. You can find the link near the top of the page.
Jose Hernandez said:
I have an alternative post on a dynamic Excel bumpchart that combines charts with the cell grid. You can download it at http://sites.google.com/a/visual-catalyst.com/info_displays/Home/tufte_example_bumpchart.xls?attredirects=0
This display works for all versions of Excel. I'm working on a how to that describes how you can extend this type of chart.
Christof said:
Excellent work. I'm impressed!
John said:
awesome - using it right now. More Tufte style charting programs please!
Andrew said:
Can you do a chart with more than two columns?
Ahem. said:
I think you're missing the point Edward Tufte was making when he made his original chart. Because he took into consideration that the data was all going in the same direction (down) he was able to design a chart where it was pre-planned that there wouldn't be any x's or crossing lines.
(See http://nymag.com/daily/entertainment/2007/06/edward_tufte_and_the_triumph_o.html)
Edward Tufte would find another solution to the data above.
Travis said:
"Because he [Tufte] took into consideration that the data was all going in the same direction (down) he was able to design a chart where it was pre-planned that there wouldn't be any x's or crossing lines."
Not true. Do some googling on Tufte and "bumps chart" or "bumps races" for great examples
Real-World Tufte Graphics in 11 Lines of Code
By Chris Gemignani
May 2, 2008
Find more about:
tufte
graphics
design
nodebox
python
One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.
Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.
Raise your hand if you have a graphic design assistant at your beck and call. I thought not.
One of the tools we use for rapid prototyping at Juice is NodeBox.
NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.
All true. But it's more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world's easiest programming language. Oops, here's the right link.
I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p 158.

Here's the code. It's 11 lines of code if you exclude entering the data and setting things like fonts and colors.
size(500,700)
font('Palatino')
fontsize(12)
stroke(0.4) # a medium grey for lines
fill(0.2) # a slightly darker grey for text

# data = (label, first, last, first-label fudge factor, last-label fudge factor)
data = [ ('Sweden', 46.9, 57.4, 0., 0.),
         ('Netherlands', 44.0, 55.8, .3, 0.),
         ('Norway', 43.5, 52.2, 0., 0.),
         ('Britain', 40.7, 39.0, 0., 0.),
         ('France', 39.0, 43.4, 0., 0.6),
         ('Germany', 37.5, 42.9, 0., -0.4),
         ('Belgium', 35.2, 43.2, 0., 0.),
         ('Canada', 35.2, 35.8, .8, 0.4),
         ('Finland', 34.9, 38.2, -0.5, 0.),
         ('Italy', 30.4, 35.7, 0.3, -0.3),
         ('United States', 30.3, 32.5, -0.3, 0.),
         ('Greece', 26.8, 30.6, 0.4, 0.),
         ('Switzerland', 26.5, 33.2, -0.2, 0.1),
         ('Spain', 22.5, 27.1, 0., 0.3),
         ('Japan', 20.7, 26.6, 0., 0.), ]

text("Current Receipts of Government as a Percentage of "
     "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)

def ypos(val):
    # calculate a vertical position by scaling between 10% and 90%
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range
alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)

for label, start, end, startfudge, endfudge in data:
    # left-hand label and value, nudged by the fudge factor to avoid overlap
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    # right-hand value and label
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    # the connecting line uses the true values, not the fudged ones
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))
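As an aside, the ypos scaling can be checked outside NodeBox. The sketch below assumes a canvas height of 700 (from the size(500,700) call) and passes the bounds in explicitly instead of relying on NodeBox's HEIGHT global. The function name `scale_y` is hypothetical.

```python
def scale_y(val, minval, maxval, height=700):
    """Map val to a vertical pixel position between 10% and 90% of height.

    Larger values land nearer the top (smaller y), because y grows
    downward in screen coordinates.
    """
    return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))

lo, hi = 20.7, 57.4  # smallest and largest values in the data above
print(scale_y(hi, lo, hi))  # the largest value sits 10% of the way down
print(scale_y(lo, lo, hi))  # the smallest value sits 90% of the way down
```

Because the mapping is linear, equal differences in the data become equal vertical distances on the page, which is what lets the slope of each line be read as a rate of change.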
Here's what the result looks like.

We have some great followups to this planned for next week. We'll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We have some great plans for mashing these graphics up with our just released Google Analytics API.
23 comments
Clint said:
Chris,
You tout that "It's 11 lines of code if you exclude entering the data and setting things like fonts and colors"
How long did it take you to code and what's the comparable length of time for a designer in Illustrator?
Seems to me that savvy python scriptors are just as rare as designers so I'm not sure there's a winner here.
Asim said:
If you're looking at visualisation using Python check out R:
http://www.r-project.org/
and the corresponding Python package:
http://rpy.sourceforge.net/
Here are some examples of using R:
http://addictedtor.free.fr/graphiques/thumbs.php
I've used an R/RPy combination successfully in work and academic assignments. One downside is that it's difficult to set up RPy on Linux/Macs.
Tony said:
I'm with Clint on this one. Six in one, half dozen in another... If you are a programmer, sure, Python would be the obvious choice.
My thinking is that people would rather learn Illustrator where the work is visible versus Python where it's a lot of unfamiliar characters in specific strings that translates into an end product.
Now to your credit, Python and Nodebox don't cost $599 like Illustrator. So that's a big plus if you have programming skills or want to learn them.
Chris Gemignani said:
@Clint/Tony:
It's leverage, leverage, leverage. The code solution lets me produce 1000 graphs for no more than the cost of producing one. It lets me produce next month's graph for no more than the cost of producing this one. It lets me build an API like http://code.google.com/apis/chart/. Admittedly this takes yet more skills and experience, but the problem is getting easier, not harder.
Of course there's a benefit to free and open source too. I don't need a purchase order, I don't need to talk to my boss, to get something done, etc.
The time spent on this project is probably about the same as in Illustrator. It was about 10 minutes to get to basic, working code. Then an hour of extra primping to make it pretty for the blog. Frankly, I'm not really sure how someone would produce an accurate technical drawing like this in Illustrator. Tufte mentions an Excel data import function, but that sounds like extra complexity too.
@Asim: I'm aware of the R stuff. We don't use it, but it's great. This NodeBox approach is more pixel-perfect particularly if you're seeking a very specific look.
Sal said:
Code, however, is very reusable. Your programmer only needs to create it once, and then, with minor adjustments, any similar graph can be drawn by non-programmers.
Tony said:
@ Chris - Great point!
Nick said:
Speak of a coincidence - While browsing Tufte's site looking for advice on programs that make tables like this (actually looking for a way to reproduce the cancer survival rates ones) I followed a link to this site on cleaning up Excel graphs and end up finding what I was looking for in the first place! I was even thinking about learning to use python to do it too....
Scott Zakrajsek said:
First, I just wanted to say that I love the tips on this blog.
I'm really interested in the follow up and would like to see the flexibility and visual aesthetic of Python. I'm still a big fan of Illustrator, I think that knowledge of a few basic AI tools can deliver a large variety of graph types. Check out this link, a great example of clean data visualization done w/ illustrator:
http://feltron.com/index.php?/content/2007_annual_report/P0/
Brendan O'Connor said:
I actually use R (http://www.statmethods.net/) for static data visualizations like this (e.g. a color wheel of words: http://blog.doloreslabs.com/?p=11). It's definitely a weird choice, but (1) I think its data management and mathematical list operations are easier than Python or Ruby, and (2) it has a small amount of GUI integration. I see that NodeBox is a bit better than PIL on those points though...
Shane said:
Great post. I'm very interested in hearing about other methods that don't require OS X.
Andrew said:
Very true insights on using Adobe Illustrator. My background isn't in graphic design, and I haven't spent a number of years taking courses in AI. I generally find that Adobe's software is obtuse and confusing. Perhaps it is easy to use for fanatics, but for occasional, reluctant users it's a nightmarish experience.
Thanks for providing an alternative for the rest of us.
Mike said:
For Linux users NodeBox can be run using QT, there is some info about installing at http://dev.nodebox.net/wiki/Qt
Using this method NodeBox is running fine for me but the code above shows an error:
Traceback (most recent call last):
File "/home/luser/try-qt/nodebox/gui/qt/__init__.py", line 534, in _compileScript
self._code = compile(source + "\n\n", self.scriptName, "exec")
File "<untitled>", line 7
<h1>data = (label, first, last, label-fudge-factor)</h1>
^
SyntaxError: invalid syntax
(not sure if this is a problem with the code or the NodeBox QT version)
Mike said:
Update- got QT NodeBox to run on Ubuntu 8.04 and run the updated script from http://media.juiceanalytics.com/downloads/tufte_nodebox_forcepush.py just fine!
The font('Palatino') command was still showing an error but it worked fine with that line removed ;)
Big thumbs up for Tufte on Linux using NodeBox :D
Sal said:
Whoa - nice find Mike. I have it running on Ubuntu 8.04, and will definitely use this in server-side applications.
I think I found the bug with the font setup in the Nodebox Qt code. If you open up /try-qt/nodebox/graphics/qt.py and go to line 884, and change 'return f.exactMatch()' to 'return f', the font feature works again. You can even download the Palatino font and point to it with the full path.
Pradeep Gowda said:
I've implemented an in-browser version of this graphic using JavaScript and the processing.js library.
http://pradeepgowda.com/programming/tuft-graphics-processingjs.html
Jonno said:
I'm a statistician and have had similar frustrations with implementing interpretable graphs. A common tool I would use for this is R, a free statistical programming language with excellent graphing capabilities. The code would be about the same length as NodeBox (at a guess).
http://www.r-project.org/
Chris Gemignani said:
Who's up for a multi-language infographics shootout?
Tim said:
That's cool !
I was wondering if there was a way to generate these graphics through command line ? that way we could embed this in web application and get the graphics generated dynamically
note: looks like comments in your code got converted to html (# -> h1)
Kragen Javier Sitaker said:
Is there a way to get old-style numerals with NodeBox? I suppose you have to find an installed font on your Mac with old-style numerals.
Pradeep's processing.js demo is awesome, but from the screenshot lacks antialiasing. (I'm not yet a Firefox 3 Achiever.)
Luke said:
Dude, why reproduce the errors ("fudge factors") in the original?
The Dude said:
@Luke: Dude, the fudge factors are not errors. They are there so that the text labels do not overlap.
Michael Galloy said:
I made an IDL implementation; the results are here (http://michaelgalloy.com/wp-content/uploads/2008/08/receipts.png). It wasn't too bad to have it automatically compute the fudge factors (at least in simple cases).
Ahem. said:
I think you're missing the point Edward Tufte was making when he made his original chart. Because he took into consideration that the data was all going in the same direction (down) he was able to design a chart where it was pre-planned that there wouldn't be any x's or crossing lines.
(See http://nymag.com/daily/entertainment/2007/06/edward_tufte_and_the_triumph_o.html)
Edward Tufte would find another solution to the data above.
New Year’s Resolution: Tufte and the iPhone
By Chris Gemignani
January 24, 2008
Find more about:
interface
sparklines
tufte
visualization
Edward Tufte has produced an illuminating video tour of the user interface of the iPhone. The video illustrates Tufte’s struggles to come to grips with the difference between dynamic screen resolution and the resolution of printed paper. Tufte is prone to grandiose pronouncements, like this one:
All history of improvements in human communication is written in terms of improvements in resolution: to produce, for viewers of evidence, more bits per unit time, and more bits per unit area. Slideware is contrary to that history. Trading in reductions in resolution for user convenience or for pitching may be useful in mass market products or in commercial art, but not for technical communications. The solution is not to rescue slideware design; the solution is to use a different, better, and content-driven presentation method. On this solution, see our thread PowerPoint Does Rocket Science—and Better Techniques for Technical Reports — Tufte Nov 10 2006
Somehow, I don’t think the importance of the Gutenberg Bible related to it showing “more bits per unit area.” Quick, count the “bits per unit area.”


It didn’t take bits per unit area to revolutionize communication in the past and it won’t in the future either. The iPhone is a tremendously engaging information device and points the way forward for information displays. Here’s what the iPhone does well:
Maximize screen real estate: Controls are only visible when needed, fading away gently when you are concentrating on content. Tufte furiously neologizes, calling this “computer information debris.” Control junk is more apt, more terse, more Tuftian.
Direct manipulation: As Tufte says: information is the interface. Filtering and choosing should take place in the context of direct manipulation. A good essay on the possibilities of direct manipulation can be found here.
Fun: Above all, information can be fun and engaging to navigate. Tufte condemns Apple’s stock ticker for having “cartoony” and PowerPoint-like displays and offers an improved version (with 5 digits of precision). Apple’s cheery display offers a more entertaining, usable interface for day-to-day usage.
With our empathy for the day-to-day troubles of the business person seeking insight in data, it’s frustrating listening to Tufte. He is clearly an academic, with academic interests and academic timeframes. As much as his work is respected and inspirational within business circles, he makes little effort to enable his message to be implemented.
Good Tufte: Clutter and overload are not an attribute of information, they are failures of design. If the information is in chaos, don’t start throwing out information, instead fix the design.
Bad Tufte: “…the conclusion of sparkline analysis in Beautiful Evidence, where the idea is to make our data graphics at least operate at the resolution of good typography (say 2400 dpi).” http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0002NC&topicid=1 *Ed: At least 2400 dpi? Orly?
Mostly right Tufte: “Thus the iPhone got it mostly right.”
Mostly wrong Tufte: “Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.” http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0000Jr&topicid=1&topic=Ask%20E%2eT%2e
It is heartening to see Tufte engage and connect his mental frameworks to our modern, screen-oriented, graphics-accelerated, not graphics-designed world. But the future of information design and interaction belongs to the iPhone, not the printed page.
3 comments
ross said:
Nice post, thanks for making it. I found it interesting, and I think it's good that people are prepared to question Tufte, who seems to have, rightly or wrongly, some godlike stature.
For my part, I have used the TyTN series since mark 1, and these, running Windows Mobile, have had all of the features (more or less) of the iPhone for some time. Compromise is the key with small devices. Until we get screens that can project into air! :-)
Cheers
Ross
mahalie said:
It's always folly to never question anything someone says just because you have a lot of respect for their ideas generally. Yet I see many bloggers flame well-respected experts...probably as traffic bait. So great to hear a voice of reason. Thanks!
darrell said:
"To clarify add Detail" - as an example, Tufte adds a satellite weather pattern to augment a weather forecast of X degrees and Partly Cloudy. How does that clarify? You need expertise to interpret it, and it didn't offer analysis / interpretation just raw data (satellite view).
I understand his point if you're presenting to a panel of experts. But the iPhone is sold to consumers, not weather forecasters.
Few of us have weather forecasting expertise (beyond idle speculation). Using the satellite video, a non-expert could probably guess the degree of cloudiness, and perhaps the direction of the wind. Other useful info like wind speeds, wind chill factor, probability of precipitation and temperature are not aided by the satellite visual.
Eye candy; yes. Useful; only to a limited expert audience, and only with additional information not displayed.
"To Clarify; first consider the audience, then add relevant detail."
Excel 2007 and the Lie Factor
By Chris Gemignani
June 7, 2007
Find more about:
analytics
excel
sparklines
tufte
visualization
“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”
Edward Tufte calls violation of this principle the “Lie Factor”. The implementation of in-cell data bars in Microsoft Excel 2007 is a big offender.
Almost a year ago, I was surprised to discover that the Microsoft Excel 2007 development team didn’t understand what zero means. Their implementation of in-cell data bars showed a bar in a cell, even if the cell had a zero or very low value.

That was in the Excel 2007 Beta. Things haven’t improved in the current version of Excel 2007. The default setting for data bars in Excel 2007 is to scale the bars so that the smallest bar is based on the smallest value in the selected range and the largest bar is based on the largest value. It still appears that the smallest bar will be no smaller than five or ten percent of the width of the cell. Here’s a sample:

So, if you select a range that has values between 600 and 700, the 600 would have a little bitty bar and the 700 would have a full-width bar. Based on the bars, it would look like the 700 is ten to twenty times larger than 600. Outside of Redmond, this is generally regarded as untrue.
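Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data. The arithmetic for the 600/700 case can be sketched as follows. The `floor` of 5% for the shortest bar is a guess based on the "five or ten percent" observation above, not a documented Excel constant, and the function names are hypothetical.

```python
def minmax_bar(value, lo, hi, floor=0.05):
    """Bar length as a fraction of cell width under min-max scaling.

    The smallest value in the range gets a stub of `floor` (an assumed
    figure) and the largest value fills the cell.
    """
    return floor + (1.0 - floor) * (value - lo) / (hi - lo)

def lie_factor(v1, v2, bar1, bar2):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data."""
    effect_in_graphic = (bar2 - bar1) / bar1
    effect_in_data = (v2 - v1) / v1
    return effect_in_graphic / effect_in_data

short = minmax_bar(600, 600, 700)
full = minmax_bar(700, 600, 700)
print(full / short)                        # how many times longer the 700 bar looks
print(lie_factor(600, 700, short, full))   # far above 1 means heavy exaggeration
print(lie_factor(600, 700, 600 / 700.0, 1.0))  # zero-based proportional bars
```

A Lie Factor of 1 means the graphic is honest. Zero-based proportional bars land there by construction, while min-max scaling over a narrow range inflates the factor dramatically.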
What’s more, if you create two sets of data bars side by side, each group of data bars scales itself independently even though they look the same. Take a look at this screenshot:

Notice the top seven cells have data bars that have one set of scaling and the bottom data bars have a different scaling. However, they look identical, and users should generally expect these bars to have the same scale.
Here are the rules:
- Defaults matter! It doesn’t matter that you can do data bars correctly in Excel. The default should be to do it right and it should be hard to do it wrong.
- The “right way” to make data bars is to make the length of the data bar directly proportional to the value in the cell. If one cell has a value twice another it should have a bar that is twice as long.
- Remove the default gradient shading. The gradient makes it hard to tell where the bar ends, obscuring what you’re trying to show.
- Contiguous cells with data bars should all use the same scale. Use different colors to indicate ranges that have different scales.
Excel 2007 supports at least twenty-five different combinations of ways of specifying the length of the data bar.

Exactly one of those ways is correct. Base the shortest bar on the number 0. Base the longest bar on the highest value. Turn off the gradient. If you want to see bars based off percentile or some custom formula, then be explicit. Create a new column, create your formula, create bars on that column.
Please, guys, this isn’t rocket science. This is plain common sense. You would not ship Microsoft Word with a glaring bug in the way text renders. You would not ship Excel with a broken statistical function that people use everyday. Delivering deceitful-by-design infographics betrays your central role in democratizing the analysis of data. Until you fix this, in-cell ASCII art still remains the best way to explore data visually.
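The rules above translate into very little code. The following Python sketch draws zero-based text bars whose length is directly proportional to the value, in the spirit of in-cell ASCII art. The function name `text_bars` and its parameters are hypothetical, and clamping negative values to an empty bar is a simplification made for this example.

```python
def text_bars(values, width=20, scale_max=None, mark="|"):
    """Return one text bar per value, with length proportional to the value.

    Bars are anchored at zero, so a value of 0 gets an empty bar and a
    value twice as large gets a bar twice as long (up to rounding).
    Pass the same scale_max to several ranges to keep them comparable.
    """
    if scale_max is None:
        scale_max = max(values)
    if scale_max <= 0:
        return ["" for _ in values]
    return [mark * int(round(width * max(v, 0) / float(scale_max)))
            for v in values]

values = [0, 350, 600, 700]
for value, bar in zip(values, text_bars(values)):
    print("%5d %s" % (value, bar))
```

Passing an explicit `scale_max` is the hook for the shared-scale rule: two adjacent ranges drawn with the same `scale_max` can be compared bar for bar.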
A disclosure: We do not currently use Excel 2007 at Juice Analytics. This is not due to a high-minded sense of moral outrage but is merely a reflection of our clients' environments.
9 comments
James L. said:
“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”
Although unrelated to the article, I was wondering what your take would be on the use of logarithmic scaling? This violates "direct proportionality", but is quite common in scientific/engineering fields (I myself used it the other day).
Chris Gemignani said:
My initial thought is that logarithmic scaling doesn't work in the context of in-cell graphing. Log scaling would be really hard to use without an axis and is probably best when you're comparing time-series trends in a line chart. Thanks.
Will Oswald said:
"You would not ship Excel with a broken statistical function"... erm, unless you include the LINEST function that up until Excel 2003 did not adjust for collinearity in multiple regressions, a fundamental problem
Chris Gemignani said:
You noticed I qualified my statement with "that people use every day". I have heard about this problem and others with Excel's statistical functions. These problems should have been fixed as soon as they were reported.
R Varley said:
Hi, I'm trying to write an evaluation document on Excel 2007; everyone seems to think it's rubbish for statistics, but no-one says what's wrong. I've been trawling the internet for days, and turned up nothing beyond "Everyone knows it's broken". Can you give me any pointers?
Thanks.
Patrick O'Beirne said:
1st Oct 2007:
Data Bars – Feedback Please
Today’s author: Scott Ruble, the program manager who leads the charting and visualization efforts in Excel. Scott is looking for some feedback on potential changes to data bar behaviour.
http://blogs.msdn.com/excel/archive/2007/10/01/data-bars-feedback-please.aspx
Chris Gemignani said:
Patrick, I commented on the Excel databars post. I'm astonished that these questions keep coming up. The solution is simple: "You need to start with the absolute principle that the bars you show _must_ be proportional to the numbers they represent."
Greg said:
Is there a way to turn off the gradient fill in the data bars?
Matt Craig said:
I have used another way to represent data bars "in cells", but it is "clunky". I create a bar chart, turn off every interface element except for the bars, make the bars slightly transparent (if I still want to see the numbers), and then put it on top of the cells.
It works, but it's "clunky" - the scaling and placement are finicky. It is pretty nice when it's done though, as you can apply appropriate rules to the display of the graph (e.g. 0 = 0 length), and it gets rid of the annoying gradient (the reason I did it in the first place).

8 comments
oli said:
I'd love to learn more on this subject but I am having trouble. None of the URLs work. Is it just me?
Thx
Oli
Jack Lucky said:
Thanks for making this lesson. Luv it.
Week 1, day 4 is paid content (can you please look into it?)
Ken said:
@Jack Lucky: Bummer. This is new... and annoying. You can (as I'm sure you've already considered) register for a free 14 day trial to see the article (no credit card required), but I can certainly understand the hesitation to do so.
To help you better decide if it's worth enduring the registration process, here is a summary of the article:
It starts with a discussion of factors driving Martin Wattenberg's 1998 creation of the treemap (one of our favorite visualizations - in Zach's mind, there's a treemap hidden in _every_ data set... somewhere). It then continues by discussing other advanced visualizations such as presidential word clouds, Wikipedia change timelines (the only image shown), and infographics. It concludes with a discussion of how the brain is better at processing visual information.
It's a who's who list of information visualization groundbreakers such as Wattenberg, Viegas, Tufte, NYT, and Tableau. It's one report in a nine-report series including "Data, data everywhere", "Needle in a haystack", and "Handling the cornucopia." Keep in mind, however, that it shows only one image of the many examples that it discusses - sort of ironic, huh?
We apologize for the required registration reference, but hopefully this will help.
Chris said:
Thanks for putting this together, nicely done. Just curious, do you have any other examples of guides that use a similar format? I posted the link and wrote a couple of paragraphs about it on my blog at http://freshspectrum.com
paresh said:
Apart from spreading this to people who are already initiated into the world of data visualization, guys reading the data visualization blogs, we should also spread it to others who may only be peripherally aware of this field. Doing my bit - spreading it among finance and accounting professionals [Linkedin Group].
Ken said:
@Paresh - Yes! Thanks for helping others "see."
Nemo said:
Thanks, but why are you giving URLs in a PDF document and not a simple web page? (PDF viewers are not web browsers, and your links in Acrobat Reader on my Mac are not clickable!)
James said:
and for some tardy responses,
@Chris - Glad you found it helpful for you and your readers. I'm curious myself if there are other materials presented this way! If you find any, do share. It was simply my effort in always reevaluating how we present information.
@Nemo - The links should be working on the latest version of Adobe Reader (9.3.3) from www.adobe.com