tutorial

Recreating the NY Times Cancer Graph

This New York Times cancer graph is a beautiful piece of work.

NY Times cancer graphic

I wanted to see if we could reproduce it with everyday tools.

Excel reproduction of the NY Times cancer graphic

Click here to watch a screencast showing how it was done. Warning the screencast is a little long—14 minutes—and a little unpolished. One cut, no retakes, banzai analytics!

Derek raised an interesting question about how to find the fonts used by the New York Times. While I don’t think you can find a high quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista and these are also available to Microsoft Office users through a compatibility pack. Here’s a link or google for "microsoft office compatibility pack". I recommend using these fonts.

Here’s a version of the graph with these new fonts and more emphasis on getting the typography right.

Excel reproduction of the NY Times cancer graphic with better fonts

Why make 100 charts when one will do?

Charts are a great way to explore data. Here is some American baseball data showing player salaries over a five year period.

Baseball salaries by team over time

Charting this data with a line chart would allow us to see trends in salaries by team. However, when we use Excel’s default chart, we get something that looks like this:

Excel’s default multiline chart

That’s quite a mess. It would be a lot easier if we could create one chart for each row.

The OFFSET function is going to help. In its simplest form the OFFSET function works like this:

OFFSET(anchor, rows from anchor, columns from anchor)

That is, OFFSET will start with the anchor cell, go down a number of rows from that anchor and over a number of columns and return the value it finds.

OFFSET function

We can use the OFFSET function to create cells that pull a single row of data out of the table dynamically. We create a new row atop of our data and create a series of OFFSET functions that all rely on a single cell (the big yellow one) for their row offset. So changing one cell will pull different rows of data into our fixed location.

Creating a dynamic row that doesn’t move

Now, chart the data that doesn’t move.

Charting the dynamic row

After fixing the chart, we’d like to make it easy to change the value in the big yellow cell.

We can use Excel Forms to build a lightweight user interface. Bring up the Excel forms toolbar by rightclicking on any toolbar and choosing Forms. Place a scrollbar beside the chart.

Excel Forms

Right clicking on the scrollbar allows you to Format Control. Link the control to the cell that is controlling all the row offsets. Now, moving the scrollbar will update the chart.

Chart with scrollbar
Selecting Format Control
Formatting the scrollbar control

Now, the scrollbar controls the chart. Here is the baseball spreadsheet for you to play with: Baseball_offset.xls Have fun!

On the way to 100 charts

Note: this post is adapted from a presentation I gave at eMetrics 2007 in San Francisco.