Real-World Tufte Graphics in 11 Lines of Code
By Chris Gemignani
May 2, 2008
Find more about:
tufte
graphics
design
nodebox
python
One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.
Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.
Raise your hand if you have a graphic design assistant at your beck and call. I thought not.
One of the tools we use for rapid prototyping at Juice is NodeBox.
NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.
All true. But it's more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world's easiest programming language. Oops, here's the right link.
I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p 158.

Here's the code. It's 11 lines of code if you exclude entering the data and setting things like fonts and colors.
size(500,700)
font('Palatino')
fontsize(12)
stroke(0.4) # a medium grey for lines
fill(0.2) # a slightly darker grey for text
# data = (label, first, last, first-label-fudge, last-label-fudge)
data = [ ('Sweden', 46.9, 57.4, 0., 0.),
('Netherlands', 44.0, 55.8, .3, 0.),
('Norway', 43.5, 52.2, 0., 0.),
('Britain', 40.7, 39.0, 0., 0.),
('France', 39.0, 43.4, 0., 0.6),
('Germany', 37.5, 42.9, 0., -0.4),
('Belgium', 35.2, 43.2, 0., 0.),
('Canada', 35.2, 35.8, .8, 0.4),
('Finland', 34.9, 38.2, -0.5, 0.),
('Italy', 30.4, 35.7, 0.3, -0.3),
('United States', 30.3, 32.5, -0.3, 0.),
('Greece', 26.8, 30.6, 0.4, 0.),
('Switzerland', 26.5, 33.2, -0.2, 0.1),
('Spain', 22.5, 27.1, 0., 0.3),
('Japan', 20.7, 26.6, 0., 0.), ]
text("Current Receipts of Government as a Percentage of "
     "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)
def ypos(val):
    # calculate a vertical position by scaling between 10% and 90%
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range
alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)
for label, start, end, startfudge, endfudge in data:
    # 1970 label and value, right-aligned on the left side
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    # 1979 value and label, left-aligned on the right side
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    # the connecting line uses the true values, without any fudge
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))
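The fudge factors in the data were tuned by hand so that labels don't overlap. As a rough sketch of how that step might be automated, the snippet below is plain Python that runs outside NodeBox. It reuses the same scaling as ypos, but takes minval and maxval as arguments. The spread_labels helper, the min_gap parameter, and the 14-pixel default are assumptions for illustration, not part of the original script, and the helper works in pixels rather than in data units like the hand-entered fudge factors.

```python
# Standalone sketch (no NodeBox required): nudge label positions so that
# neighbouring labels in one column are at least `min_gap` pixels apart.
HEIGHT = 700  # assumed canvas height, matching size(500,700) in the post


def ypos(val, minval, maxval, height=HEIGHT):
    # same linear scaling as the post: maxval lands at 10% of the height,
    # minval at 90% (y grows downward)
    return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))


def spread_labels(values, minval, maxval, min_gap=14.0):
    """Return one label y position (in pixels) per value, in input order.

    Hypothetical helper: labels are only ever pushed downward, just far
    enough to keep `min_gap` pixels between neighbours. The connecting
    lines would still be drawn at the unadjusted ypos() of each value.
    """
    # visit labels from the top of the page to the bottom (smallest y first)
    order = sorted(range(len(values)),
                   key=lambda i: ypos(values[i], minval, maxval))
    placed = [0.0] * len(values)
    previous = None
    for i in order:
        y = ypos(values[i], minval, maxval)
        if previous is not None and y - previous < min_gap:
            y = previous + min_gap  # too close to the label above: push down
        placed[i] = y
        previous = y
    return placed


# 1970 values for three countries whose labels would otherwise collide
labels = ['Belgium', 'Canada', 'Finland']
values_1970 = [35.2, 35.2, 34.9]
positions = spread_labels(values_1970, minval=20.7, maxval=57.4)
for name, y in zip(labels, positions):
    print(name, round(y, 1))
```

Pushing labels only downward is the simplest possible rule. A hand-tuned layout like the one above can move labels in either direction, which usually keeps them closer to their lines.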
Here's what the result looks like.

We have some great follow-ups to this planned for next week. We'll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We also have plans for mashing these graphics up with our just-released Google Analytics API.
Analytics Roundup: Tips for showing, sharing, communicating
By Chris Gemignani
December 6, 2007
Find more about:
Business_Intelligence
analytics
business
charts
excel
google
graphics
graphs
powerpoint
presentation
- Developer's Guide - Google Chart API - Google Code
- Beautiful stuff, particularly the Venn diagram.
- Align Journal - BI Worst Practices
- We often see articles on BI "Best Practices." Here is an article telling us what NOT to do.
- flot - Google Code
- Attractive Javascript plotting for jQuery.
- ongoing · On Communication
- Interesting blog post about how different forms of communication rank for immediacy, lifespan, and audience reached.
- The Excel Magician: 70+ Excel Tips and Shortcuts to help you make Excel Magic : Codswallop
- SlideShare
- Source for presentation ideas.
Analytics Roundup: infographics and visualizations
By Chris Gemignani
October 29, 2007
Find more about:
census
design
graphics
mapping
movie
nytimes
video
visualization
zipcode
- Visualization for the Masses: Information Graphics and the New York Times
- He explained how a 30-person team creates the impressive infographics and visualizations we see in the newspaper every week.
- information r/evolution movie
- This video explores the changes in the way we find, store, create, critique & share information. A nice video illustration of some of Shirky's essays.
- demographics by ZIP Code - ZIPskinny
- Colorful visualization comparing demographic attributes of zip codes.
1 comment
saraw1 said:
I love your site and I LOVE your work! This is amazing stuff. BTW, I know the ZIP skinny link does not refer to your work, but since you asked for comments, I'd like to point out that zip codes are not a meaningful unit for socio-demo-geographic analysis. They were developed for the sole purpose of facilitating mail delivery. For the aforementioned analysis, the US Census's categories are far better.
Analytics Roundup
By Ken Hilburn
September 14, 2007
Find more about:
analytics
email
graphics
nytimes
productivity
social_network
statistics
trends
video
visualization
- Nielsen/NetRatings' August social media numbers: Not much change
- Interesting post I stumbled on related to Nielsen's web analytics service. Several references to "juicy" or "juiciness".
- Inbox Zero
- Merlin Mann on cleaning your e-mail inbox.
- The New York Times > Home Prices Across the Nation
- The most interesting / important part may be the talking head in the lower left. Should you be annotating your reports with video?
- Introduction to Statistical Thought—free ebook
- 1) explains how statisticians think about data
2) introduces modern statistical computing
3) has lots of real examples
Analytics Roundup: Chicken presentation and so much more
By Ken Hilburn
August 18, 2007
Find more about:
ai
algorithms
blog
business
collaborative-filtering
datamining
excel
graphics
graphs
humor
intelligence
machinelearning
netflix
powerpoint
presentations
squarepie
statistics
trends
typography
usability
userexperience
video
visualization
- Programming Collective Intelligence
- Pulling information from community contributed data.
- Videos that can change your organization
- Top ten business videos on YouTube.
- The Encyclopedia of Business Cliches
- UC Berkeley CS160 User Interfaces Fall 06
- Course readings and student notes.
- Language Log: Chicken: the PowerPoint Presentation
- The presentation you dare not give.
- Prometheus Meets the Enterprise Management System
- I laughed, I cried, I laughed again.
- Diagrams: Tools and Tutorials
- Data Visualization: Modern Approaches
- A grab bag of ideas.
- fontblog : Introducing Ambiguity
- A typographic symbol to indicate ambiguity. Compare to the typographic mark lol, which indicates stupidity.
- Whimsley: The Netflix Prize: 300 Days Later
- Process Trends Website
- Good Excel charting and visualization tips.
- BusinessWeek: Who Participates And What People Are Doing Online
- A simple and fairly effective use of square pies.
23 comments
Clint said:
Chris,
You tout that "It's 11 lines of code if you exclude entering the data and setting things like fonts and colors"
How long did it take you to code and what's the comparable length of time for a designer in Illustrator?
Seems to me that savvy Python scripters are just as rare as designers so I'm not sure there's a winner here.
Asim said:
If you're looking at visualisation using Python check out R:
http://www.r-project.org/
and the corresponding Python package:
http://rpy.sourceforge.net/
Here are some examples of using R:
http://addictedtor.free.fr/graphiques/thumbs.php
I've used an R/RPy combination successfully in work and academic assignments. One downside is that it's difficult to set up RPy on Linux/Macs.
Tony said:
I'm with Clint on this one. Six in one, half dozen in another... If you are a programmer, sure, Python would be the obvious choice.
My thinking is that people would rather learn Illustrator where the work is visible versus Python where it's a lot of unfamiliar characters in specific strings that translates into an end product.
Now to your credit, Python and Nodebox don't cost $599 like Illustrator. So that's a big plus if you have programming skills or want to learn them.
Chris Gemignani said:
@Clint/Tony:
It's leverage, leverage, leverage. The code solution lets me produce 1000 graphs for no more than the cost of producing one. It lets me produce next month's graph for no more than the cost of producing this one. It lets me build an API like http://code.google.com/apis/chart/. Admittedly this takes yet more skills and experience, but the problem is getting easier not harder.
Of course there's a benefit to free and open source too. I don't need a purchase order, I don't need to talk to my boss, to get something done, etc.
The time spent for this project is probably about the same as Illustrator. It was about 10 minutes to get to basic, working code. Then an hour of extra primping to make it pretty for the blog. Frankly, I'm not really sure how someone would produce an accurate technical drawing like this in Illustrator. Tufte mentions an Excel data import function, but that sounds like extra complexity too.
@Asim: I'm aware of the R stuff. We don't use it, but it's great. This NodeBox approach is more pixel-perfect particularly if you're seeking a very specific look.
Sal said:
Code, however, is very reusable. Your programmer only needs to create it once, and then, with minor adjustments, any similar graph can be drawn by non-programmers.
Tony said:
@ Chris - Great point!
Nick said:
Speak of a coincidence - While browsing Tufte's site looking for advice on programs that make tables like this (actually looking for a way to reproduce the cancer survival rates ones) I followed a link to this site on cleaning up Excel graphs and end up finding what I was looking for in the first place! I was even thinking about learning to use python to do it too....
Scott Zakrajsek said:
First, I just wanted to say that I love the tips on this blog.
I'm really interested in the follow up and would like to see the flexibility and visual aesthetic of Python. I'm still a big fan of Illustrator, I think that knowledge of a few basic AI tools can deliver a large variety of graph types. Check out this link, a great example of clean data visualization done w/ illustrator:
http://feltron.com/index.php?/content/2007_annual_report/P0/
Brendan O'Connor said:
I actually use R (http://www.statmethods.net/) for static data visualizations like this (e.g. a color wheel of words: http://blog.doloreslabs.com/?p=11). It's definitely a weird choice, but (1) I think its data management and mathematical list operations are easier than Python or Ruby, and (2) it has a small amount of GUI integration. I see that NodeBox is a bit better than PIL on those points though...
Shane said:
Great post. I'm very interested in hearing about other methods that don't require OS X.
Andrew said:
Very true insights on using Adobe Illustrator. My background isn't in graphic design, and I haven't spent a number of years taking courses in AI. I generally find that Adobe's software is obtuse and confusing. Perhaps it is easy to use for fanatics, but for occasional, reluctant users it's a nightmarish experience.
Thanks for providing an alternative for the rest of us.
Mike said:
For Linux users NodeBox can be run using QT, there is some info about installing at http://dev.nodebox.net/wiki/Qt
Using this method NodeBox is running fine for me but the code above shows an error:
Traceback (most recent call last):
File "/home/luser/try-qt/nodebox/gui/qt/__init__.py", line 534, in _compileScript
self._code = compile(source + "\n\n", self.scriptName, "exec")
File "<untitled>", line 7
<h1>data = (label, first, last, label-fudge-factor)</h1>
^
SyntaxError: invalid syntax
(not sure if this is a problem with the code or the NodeBox QT version)
Mike said:
Update- got QT NodeBox to run on Ubuntu 8.04 and run the updated script from http://media.juiceanalytics.com/downloads/tufte_nodebox_forcepush.py just fine!
The font('Palatino') command was still showing an error but it worked fine with that line removed ;)
Big thumbs up for Tufte on Linux using NodeBox :D
Sal said:
Whoa - nice find Mike. I have it running on Ubuntu 8.04, and will definitely use this in server-side applications.
I think I found the bug with the font setup in the Nodebox Qt code. If you open up /try-qt/nodebox/graphics/qt.py and go to line 884, and change 'return f.exactMatch()' to 'return f', the font feature works again. You can even download the Palatino font and point to it with the full path.
Pradeep Gowda said:
I've implemented an in-browser version of this graphic using JavaScript and the processing.js library.
http://pradeepgowda.com/programming/tuft-graphics-processingjs.html
Jonno said:
I'm a statistician and have had similar frustrations with implementing interpretable graphs. A common tool I would use for this is R, a free statistical programming language with excellent graphing capabilities. The code would be about the same length as NodeBox (at a guess).
http://www.r-project.org/
Chris Gemignani said:
Who's up for a multi-language infographics shootout?
Tim said:
That's cool !
I was wondering if there was a way to generate these graphics through command line ? that way we could embed this in web application and get the graphics generated dynamically
note: looks like comments in your code got converted to html (# -> h1)
Kragen Javier Sitaker said:
Is there a way to get old-style numerals with NodeBox? I suppose you have to find an installed font on your Mac with old-style numerals.
Pradeep's processing.js demo is awesome, but from the screenshot lacks antialiasing. (I'm not yet a Firefox 3 Achiever.)
Luke said:
Dude, why reproduce the errors ("fudge factors") in the original?
The Dude said:
@Luke: Dude, the fudge factors are not errors. They are there so that the text labels do not overlap.
Michael Galloy said:
I made an IDL implementation; the results are here (http://michaelgalloy.com/wp-content/uploads/2008/08/receipts.png). It wasn't too bad to have it automatically compute the fudge factors (at least in simple cases).
Ahem. said:
I think you're missing the point Edward Tufte was making when he made his original chart. Because he took into consideration that the data was all going in the same direction (down) he was able to design a chart where it was pre-planned that there wouldn't be any x's or crossing lines.
(See http://nymag.com/daily/entertainment/2007/06/edward_tufte_and_the_triumph_o.html)
Edward Tufte would find another solution to the data above.