Real-World Tufte Graphics in 11 Lines of Code
By Chris Gemignani
May 2, 2008
One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.
Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.
Raise your hand if you have a graphic design assistant at your beck and call. I thought not.
One of the tools we use for rapid prototyping at Juice is NodeBox.
NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.
All true. But it's more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world's easiest programming language. Oops, here's the right link.
I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p. 158.

Here's the code. It's 11 lines of code if you exclude entering the data and setting things like fonts and colors.
size(500,700)
font('Palatino')
fontsize(12)
stroke(0.4) # a medium grey for lines
fill(0.2) # a slightly darker grey for text
# data = (label, first, last, first-label-fudge-factor, last-label-fudge-factor)
data = [ ('Sweden', 46.9, 57.4, 0., 0.),
('Netherlands', 44.0, 55.8, .3, 0.),
('Norway', 43.5, 52.2, 0., 0.),
('Britain', 40.7, 39.0, 0., 0.),
('France', 39.0, 43.4, 0., 0.6),
('Germany', 37.5, 42.9, 0., -0.4),
('Belgium', 35.2, 43.2, 0., 0.),
('Canada', 35.2, 35.8, .8, 0.4),
('Finland', 34.9, 38.2, -0.5, 0.),
('Italy', 30.4, 35.7, 0.3, -0.3),
('United States', 30.3, 32.5, -0.3, 0.),
('Greece', 26.8, 30.6, 0.4, 0.),
('Switzerland', 26.5, 33.2, -0.2, 0.1),
('Spain', 22.5, 27.1, 0., 0.3),
('Japan', 20.7, 26.6, 0., 0.), ]
text("Current Receipts of Government as a Percentage of "
     "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)
def ypos(val):
    # calculate a vertical position by scaling between 10% and 90%
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))
# find the minimum and maximum values in the range
alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)
for label, start, end, startfudge, endfudge in data:
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))
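The only arithmetic in the listing is the ypos helper, which is a plain linear rescaling. Here's a minimal sketch of it as ordinary Python so it can be checked outside NodeBox. Passing the minimum, maximum and height as arguments is an assumption made for the sketch; in the listing they are the module-level minval and maxval and NodeBox's HEIGHT.

```python
# Stand-alone version of the linear scaling used by ypos() in the listing.
# Assumption: height, minval and maxval are passed in explicitly so this
# runs in plain Python, without NodeBox's HEIGHT global.

def ypos(val, minval, maxval, height):
    """Map val in [minval, maxval] to a y coordinate between 90% and 10% of height."""
    return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# Extremes taken from the data table: Japan in 1970 and Sweden in 1979.
MINVAL, MAXVAL, HEIGHT = 20.7, 57.4, 700

bottom = ypos(MINVAL, MINVAL, MAXVAL, HEIGHT)  # smallest value, 90% of the way down
top = ypos(MAXVAL, MINVAL, MAXVAL, HEIGHT)     # largest value, 10% of the way down
```

NodeBox measures y from the top of the canvas, so the smallest value gets the largest y and lands near the bottom of the image, and every other value falls proportionally in between.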
Here's what the result looks like.

We have some great follow-ups to this planned for next week. We'll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We also have plans for mashing these graphics up with our just-released Google Analytics API.
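One piece that would need automating before graphs like this can be generated unattended is the fudge factors, which were set by hand in the data table so that neighbouring labels don't overlap. Here's a minimal sketch of one way to compute them, assuming a simple greedy pass is good enough. The function name auto_fudge and the min_gap value of 0.8 are hypothetical, and this is not how the values in the listing were chosen.

```python
# Hypothetical helper: compute label offsets instead of hand-tuning them.
# Assumption: a one-pass greedy push-down is acceptable for this chart.

def auto_fudge(values, min_gap):
    """Return one offset per value so adjusted values are at least min_gap apart.

    Walk the values from largest to smallest and push a label down whenever
    it would sit closer than min_gap to the label placed just above it.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    fudge = [0.0] * len(values)
    previous = None
    for i in order:
        placed = values[i]
        if previous is not None and previous - placed < min_gap:
            placed = previous - min_gap
        fudge[i] = placed - values[i]
        previous = placed
    return fudge

# The 1970 column from the data table, in the same order.
starts_1970 = [46.9, 44.0, 43.5, 40.7, 39.0, 37.5, 35.2, 35.2, 34.9,
               30.4, 30.3, 26.8, 26.5, 22.5, 20.7]
offsets = auto_fudge(starts_1970, min_gap=0.8)
```

The offsets are in the same units as the data, so each one could stand in for startfudge in the loop. This sketch only ever pushes labels down, while the hand-tuned values nudge in both directions, so crowded stretches will drift lower than they do in the original.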





23 comments
Clint said:
Chris,
You tout that "It's 11 lines of code if you exclude entering the data and setting things like fonts and colors"
How long did it take you to code and what's the comparable length of time for a designer in Illustrator?
Seems to me that savvy Python scripters are just as rare as designers so I'm not sure there's a winner here.
Asim said:
If you're looking at visualisation using Python check out R:
http://www.r-project.org/
and the corresponding Python package:
http://rpy.sourceforge.net/
Here are some examples of using R:
http://addictedtor.free.fr/graphiques/thumbs.php
I've used an R/RPy combination successfully in work and academic assignments. One downside is that it's difficult to set up RPy on Linux/Macs.
Tony said:
I'm with Clint on this one. Six in one, half dozen in another... If you are a programmer, sure, Python would be the obvious choice.
My thinking is that people would rather learn Illustrator where the work is visible versus Python where it's a lot of unfamiliar characters in specific strings that translates into an end product.
Now to your credit, Python and Nodebox don't cost $599 like Illustrator. So that's a big plus if you have programming skills or want to learn them.
Chris Gemignani said:
@Clint/Tony:
It's leverage, leverage, leverage. The code solution lets me produce 1000 graphs for no more than the cost of producing one. It lets me produce next month's graph for no more than the cost of producing this one. It lets me build an API like http://code.google.com/apis/chart/. Admittedly this takes yet more skills and experience, but the problem is getting easier not harder.
Of course there's a benefit to free and open source too. I don't need a purchase order, I don't need to talk to my boss, to get something done, etc.
The time spent for this project is probably about the same as Illustrator. It was about 10 minutes to get to basic, working code. Then an hour of extra primping to make it pretty for the blog. Frankly, I'm not really sure how someone would produce an accurate technical drawing like this in Illustrator. Tufte mentions an Excel data import function, but that sounds like extra complexity too.
@Asim: I'm aware of the R stuff. We don't use it, but it's great. This NodeBox approach is more pixel-perfect particularly if you're seeking a very specific look.
Sal said:
Code, however, is very reusable. Your programmer only needs to create it once, and then, with minor adjustments, any similar graph can be drawn by non-programmers.
Tony said:
@ Chris - Great point!
Nick said:
Speak of a coincidence - While browsing Tufte's site looking for advice on programs that make tables like this (actually looking for a way to reproduce the cancer survival rates ones) I followed a link to this site on cleaning up Excel graphs and end up finding what I was looking for in the first place! I was even thinking about learning to use python to do it too....
Scott Zakrajsek said:
First, I just wanted to say that I love the tips on this blog.
I'm really interested in the follow up and would like to see the flexibility and visual aesthetic of Python. I'm still a big fan of Illustrator, I think that knowledge of a few basic AI tools can deliver a large variety of graph types. Check out this link, a great example of clean data visualization done w/ illustrator:
http://feltron.com/index.php?/content/2007_annual_report/P0/
Brendan O'Connor said:
I actually use R (http://www.statmethods.net/) for static data visualizations like this (e.g. a color wheel of words: http://blog.doloreslabs.com/?p=11). It's definitely a weird choice, but (1) I think its data management and mathematical list operations are easier than Python or Ruby, and (2) it has a small amount of GUI integration. I see that NodeBox is a bit better than PIL on those points though...
Shane said:
Great post. I'm very interested in hearing about other methods that don't require OS X.
Andrew said:
Very true insights on using Adobe Illustrator. My background isn't in graphic design, and I haven't spent a number of years taking courses in AI. I generally find that Adobe's software is obtuse and confusing. Perhaps it is easy to use for fanatics, but for occasional, reluctant users it's a nightmarish experience.
Thanks for providing an alternative for the rest of us.
Mike said:
For Linux users NodeBox can be run using QT, there is some info about installing at http://dev.nodebox.net/wiki/Qt
Using this method NodeBox is running fine for me but the code above shows an error:
Traceback (most recent call last):
File "/home/luser/try-qt/nodebox/gui/qt/__init__.py", line 534, in _compileScript
self._code = compile(source + "\n\n", self.scriptName, "exec")
File "<untitled>", line 7
<h1>data = (label, first, last, label-fudge-factor)</h1>
^
SyntaxError: invalid syntax
(not sure if this is a problem with the code or the NodeBox QT version)
Mike said:
Update- got QT NodeBox to run on Ubuntu 8.04 and run the updated script from http://media.juiceanalytics.com/downloads/tufte_nodebox_forcepush.py just fine!
The font('Palatino') command was still showing an error but it worked fine with that line removed ;)
Big thumbs up for Tufte on Linux using NodeBox :D
Sal said:
Whoa - nice find Mike. I have it running on Ubuntu 8.04, and will definitely use this in server-side applications.
I think I found the bug with the font setup in the Nodebox Qt code. If you open up /try-qt/nodebox/graphics/qt.py and go to line 884, and change 'return f.exactMatch()' to 'return f', the font feature works again. You can even download the Palatino font and point to it with the full path.
Pradeep Gowda said:
I've implemented an in-browser version of this graphic using JavaScript and the processing.js library.
http://pradeepgowda.com/programming/tuft-graphics-processingjs.html
Jonno said:
I'm a statistician and have had similar frustrations with implementing interpretable graphs. A common tool I would use for this is R - a free statistical programming language with excellent graphing capabilities. The code would be about the same length as NodeBox (at a guess).
http://www.r-project.org/
Chris Gemignani said:
Who's up for a multi-language infographics shootout?
Tim said:
That's cool!
I was wondering if there was a way to generate these graphics through the command line? That way we could embed this in a web application and get the graphics generated dynamically.
note: looks like comments in your code got converted to html (# -> h1)
Kragen Javier Sitaker said:
Is there a way to get old-style numerals with NodeBox? I suppose you have to find an installed font on your Mac with old-style numerals.
Pradeep's processing.js demo is awesome, but from the screenshot lacks antialiasing. (I'm not yet a Firefox 3 Achiever.)
Luke said:
Dude, why reproduce the errors ("fudge factors") in the original?
The Dude said:
@Luke: Dude, the fudge factors are not errors. They are there so that the text labels do not overlap.
Michael Galloy said:
I made an IDL implementation (results: http://michaelgalloy.com/wp-content/uploads/2008/08/receipts.png). It wasn't too bad to have it automatically compute the fudge factors (at least in simple cases).
Ahem. said:
I think you're missing the point Edward Tufte was making when he made his original chart. Because he took into consideration that the data was all going in the same direction (down) he was able to design a chart where it was pre-planned that there wouldn't be any x's or crossing lines.
(See http://nymag.com/daily/entertainment/2007/06/edward_tufte_and_the_triumph_o.html)
Edward Tufte would find another solution to the data above.