1. Skip to navigation
  2. Skip to content
  3. Skip to sidebar

This New York Times cancer graph is a beautiful piece of work.

NY Times cancer graphic

I wanted to see if we could reproduce it with everyday tools.

Excel reproduction of the NY Times cancer graphic

Click here to watch a screencast showing how it was done. Warning the screencast is a little long—14 minutes—and a little unpolished. One cut, no retakes, banzai analytics!

Derek raised an interesting question about how to find the fonts used by the New York Times. While I don’t think you can find a high quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista and these are also available to Microsoft Office users through a compatibility pack. Here’s a link or google for “microsoft office compatibility pack“. I recommend using these fonts.

Here’s a version of the graph with these new fonts and more emphasis on getting the typography right.

Excel reproduction of the NY Times cancer graphic with better fonts

Topics:
, , , ,
  • http://www.esnips.com/web/WebAnalyticsGraphs/ Daniel Waisberg

    Hi Chris,

    the screencast is really amazing. I learned some very helpful tips. Please do others like this one.

    Now, there is only one thing that you did not include in your graph (which is a long time doubt I have): the gridlines crossing OVER the bars. I have seen several graphs by Tufte and Phew that show white gridlines crossing over grey bars; I love that! Isn’t it possible to do it on Excel?

    As for the graph itself, there is one thing I did not like. IMHO, there should be one additional piece of information, something like “probability of death”. Not sure if this one would make. My problem is that the graph might be used to say, for example, that prostate cancer has a worrying number of new cases, and deaths might increase. However, since we do not have information regarding old cases to compare to deaths, the simple design might lead us to think deaths will highly increase as a consequence of high number of new cases. Sounds reasonable?

    Anyway, thank you very for the screencast.

  • http://juiceanalytics.com Chris Gemignani

    Daniel,

    The built-in gridlines pass _under_ the chart elements in Excel rather than over, so it’s not possible to recreate the Times’ treatment.

    Probability of death is certainly an important additional measure, but so is median survival time or expected years of life. You can infer probability of death in a crude way from DEATHS / NEW CASES and I’m happy enough with that.

    Cheers, Chris

  • http://www.supportanalytics.com Tony Rose

    Nice job Chris. I had to laugh a few times as you were getting some push-back from Excel. Sometimes the first cut is better, or more entertaining, than a polished one… Showing how to format the number so MEN and WOMEN are labeled at zero was very helpful. The NYT did a very nice job setting this chart up. Keep up the great work!

  • http://juiceanalytics.com Chris Gemignani

    Thanks, Tony. There was a lot more mumbling under my breath than you heard. I recorded using Parallels to run Excel on my Mac and the keyboard mapping is a little wonky and the delete key doesn’t work. Grrr.

    It is so much easier to recreate an existing design than to create a great new design from scratch.

  • http://rupeshtripathi.com Rupesh Tripathi

    Easier workaround that space guesswork (to centrally align the category labels), Chris, is to add another column where you can put in the number of spaces required; and use the formula =CONCATENATE(REPT(” “,no. of spaces,category label). Some hit and trials will give the result.For a long list / to start with something scientific before hit and trial, here is the solution:
    - Figure out the length of longest label(LEN function).
    - For each label, use the formula =(length of longest label- length of current label)/2
    - Logically this should put in the correct number of spaces in front of each label.. however due to some reason, (may be the width of space character), the alignment seems slightly leftwards. So I add “1″ to the formula above.
    - In most cases, this would render center alignment. In the skewed cases, one can overwrite the formula with a higher/lower number and achieve desired result.

    Wonderful screencast.

  • http://rupeshtripathi.com Rupesh Tripathi

    Small correction – Apologies.
    Read the formula =CONCATENATE(REPT(” “,no. of spaces,category label) as
    =CONCATENATE(REPT(” “,no. of spaces),category label)

  • http://juiceanalytics.com Chris Gemignani

    Interesting idea, Rupesh. Centering those darn labels was the most time consuming part of this exercise, and they still don’t look all that great. The slight leftward bias in my labels also relates to charting internal margins and other Excel esoterica. Again, Grrrr.

  • http://www.michaeldoan.com Michael Doan

    Great screencast! Thanks for sharing it. I’ve been using Excel for awhile now but I’ve never had much of an opportunity to incorporate graphs. When I do, its often a frustrating experience.

  • James McMurry

    Nicely done. Excel graphing masters have their own special kung-fu.

    Side note: Assuming you’re using a Mac notebook (sounds like it), try using the “fn” key in conjunction with “delete”. This should make Windows delete as you expect.

  • http://juiceanalytics.com Chris Gemignani

    James, I am using a Mac notebook and fn-Delete does indeed work as a Windows delete. Many thanks.

  • derek

    Gridlines over the top is easy if you roll your own using a Line or XY (Scatter) series.

    There’s a limit to the different types of series that can be combined, but quickly glancing at the design, it doesn’t immediately seem to me that you’ve reached it yet.

  • derek

    A question for you typography experts: what widely-available free font most closely mimics the lettering in New Yorks Times graphics?

  • Mike Ward

    Very nice graph, learnt a couple of neat tricks there.

    Presuming that we’d be creating this graph to be printed out, or exported as a graphic, I’d be tempted to use either a text box or do something with the camera object here for the series labels.

    It might require some altering of cell heights, but I’m sure something could be achieved.

  • http://nodependenciesnologo.wordpress.com Nils

    I’ve only just started making screencasts professionally, but this is one amazing example. Something to look up and aspire to.

  • http://juiceanalytics.com Chris Gemignani

    Good question, Derek. I’m not a huge typography geek, but the NYT font looks like Helvetica Neue which is a really nice multi-weight version of Helvetica that’s available free on OS X, but not in Windows land. It’s a font we use a lot for presentations here in Juiceland.

    Microsoft released a bunch of good new fonts with Windows Vista. They’re available as a free download for older versions of Office if you google for “Microsoft Office Compatibility Pack”. If you’re interested in typography and use Windows you should get these fonts.

    I tried using these fonts on the graph and there is a slight improvement. See above for a version with improved fonts.

  • doug

    For the headline face, NYT is using Franklin Gothic (Bitstream, not ITC); for the body text, Helvetica. Their online features (at nytimes.com) seem to be using Helvetica for all text.

    Great screencast.

  • http://juiceanalytics.com Chris Gemignani

    Thanks, Doug. Lazyweb, we thank thee!

  • http://www.datawrangling.com Pete Skomoroch

    Nice work Chris. I thought this chart looked sharp when I saw it on reddit the other day. There have been a number of nice graphics and interactive “infographics” coming out of the NYT lately. Any idea what they use to produce these?

    Can we look forward to a script/screencast showing how to reproduce this in matplotlib/python?

    -Pete

  • derek

    Thanks Chris and Doug. I had a Franklin-a-like already, and I’ve downloaded the Powerpoint 2007 Viewer to get the new Vista fonts, and used Calibri for a Helvetica substitute.

  • http://richardsona.squarespace.com Adam Richardson

    This is a great tutorial, I learned a lot on this. I’ve always been turned off by Excel’s charts but never spent the time to learn how to make them better.

    Wrote it up on my blog too: http://richardsona.squarespace.com

  • Henk

    Well done, guys.
    I was somewhat intrigued by Daniel Waisberg’s wish to have the gridlines OVER the bars. Dereks’ suggestion to make a combichart with your own gridlines is one possible solution but I don’t think so easy as he said for most of us. I see two possible ways to get these gridlines on top of the bars (although I would like to add that I don’t think it’s very necessary to do so). I share both ideas here, for discussion purposes.
    1. You can use a stacked bar chart with a thin “white” lining around the fill. This requires a bit juggling with the spreadsheet with conditional numbers (for the length of the bars; if it exceeds the default value between the grid values, it needs to be cut off).
    Note: I fear this is not easier than derek’s suggestion. It also requires some forward thinking about the grid, the default value taken as a variable.
    2. An overlay chart. Essentially you split the chart in (a) the bars and (b) the value and category axis, including axes values and grid lines. Make sure the plot area is transparent, and that the dimensions are EXACTLY equal. Now position (b) on top of (a) (Select chart (b) and use ALT+mouse to snap it into position over (a) accurately).
    Note: this method is a bit inflexible in the sense that resizing is elaborative. Moreover, in this particular example of a combined chart it may not be so easy to do.

    I hope this makes sense.

  • derek

    Well, I meant for someone with the skills to reproduce Chris’s graph. At that level of play, the drawing of arbitrary lines over the top usng a XY (Scatter) graph range should be peanuts.

    Unfortunately I can’t get the Quicktime screencast to work, so I don’t know in enough detail how the trick was done to advise how to take the next steps. It would be something like this: create a range of cells with the following values:

    X-value    Y-value
    -150,000   0
    -150,000   12
    
    -100,000   0
    -100,000   12

    …and so on. Fix the secondary x and y axis scales to the appropriate values (if they’re not already being used for something else–if they are, have a think aboubt designing your way around the problem). Format the line so it is the same as the background colour (white). Now they will only be visible when they cut across the bars.

    Build-your-own axis scales (which is very similar to this) is also a highly useful technique; I would go so far as to say it’s the most powerful single piece of Excel hackery there is. Both are described in the Juice article “Tufte Charts in Excel” by Zach.

  • http://www.thetimoneygroup.com Brian Timoney

    Make sure you install those Windows updates that they were bothering you with during the screencast….

  • http://www.neohide.com Keith

    I bet that is a very good illustration of how to produce a good presentation, particularly in a business point of view.

    All marketing managers should learn to produce these types of presentations!

  • derek

    Make sure you install those Windows updates that they were bothering you with during the screencast….

    They’re not bothering me with any updates, either in Opera or Internet Explorer. Both browsers, after taking a long time to download the very large screencast, report:

    “QuickTime is missing software required to perform this operation.
    Unfortunately, it is not available on the QuickTime server.”

    I don’t see any suggested action arising from that message.

  • Shawn Mitchell

    I am not an Excel guru but I did manage to get the grid lines to show through the bars in the graph.

    Below is a link to a screen shot of the effect using Excel 2007.

    Photo Sharing and Video Hosting at Photobucket

  • Shawn Mitchell

    Here is a close up of the chart with both data series using transparency.

    Photo Sharing and Video Hosting at Photobucket

  • derek

    Shawn, you’re using a feature that is new in Excel 2007, but you’ve misunderstood the effect that is being sought. It is not to have a black gridline visible under a translucent bar (though there are ways to acheive that effect pre-2007 also).

    It is to have a white gridline visible *over* a solid bar, but invisible on the white background. That is acheived by a custom line series, but not, I believe, by built-in gridlines even in XL2007. They’re still underneath the data series.

  • NH

    Cool… my recollection of font usage:

    AIC Franklin Gothic -heds & subheads, 12pt. (1 col.), 14pts. (2cols.), etc… also used for smaller uppercase (7pts.)

    Helvetica & Helvetica-Light: body text (9pts.)
    Helvetica-Bold (8-9pts.): bolded upper & lower case text.

    Helvetica Light-Oblique: source line (7 or 8pts?)

    AIC Imperial: credit (5.8pts?)

  • jen

    I can’t get my second graph to flip and display in reverse. It does one of two things:
    1. the order of the Y axis values sorts in the opposite order;
    2. only one of the data series moves to the opposite side. the one on the secondary axis stays put.

    Any suggestions?? Thanks!

  • http://javaunmoradi.com Javaun

    Hi Derek. I too use Excel 2003, and so I guess I don’t have the bar transparency feature that Shawn proposed to make the gridline appear to float over the bar. Still, Shawn’s idea would work to make the gridline float over the bar but appear transparent on the background. He simply needs to change the dotted gridline color to white. The white will show briefly through the transparency (may appear off-white) but will be indistinguishable against the backround. I’m guessing that for the NY Times graph, they did a rough mockup in excel using ugly colors and ugly fonts, and then a designer traced it (to preserve the scale) in Illustrator and beautified it with color and fonts.

  • sesha

    Great work. Keep posting to benefit many like me.
    Can you also help me in constructing graphs on a mckinsey chart that we use at our office. My problem is to edit the text boxes and graphs every time i need to update the data

  • http://www.juiceanalytics.com Zach

    Sesha, we have developed an approach for automatically updating PowerPoint slides (charts, text boxes, tables) from Excel spreadsheets. I’m not sure if that is exactly what you are referring to. We can discuss offline if it is.

  • Sarah

    I created a similar graph using Jon Peltier’s tornado graph as a starting point. I was able to get white gridlines on top of the bars by creating a dummy series and then adding y-error bars. I had the additional requirement of getting the Male and Female sides into a single chart, so I had to use a dummy series for the y axis anyway. Here is what it looks like: http://flickr.com/photos/saamiam/2176279190/

  • brandie

    my father died of lung cancer…hahahha jking

  • Ashutosh

    I think NYT uses Tableau to create its graphs. I have many charts in NYT, which no doubt look like Tableau charts.

  • Jon Peltier

    Nice charts, and comparisons within sex are easy. The problem with two-sided “tornado” charts like this, is it is very difficult to compare the two sides, male vs female. Obviously men have more prostate cancer, and women more ovarian and breast cancer. But I can’t tell the differences in pancreatic, colorectal, or non-hodgkins.

    An option would be to put more space between the males bars, and insert the corresponding female bars.

    What was the basis for sorting? It wasn’t by value, nor by alphabetical order.

  • Pingback: Compatibility Cancer And Cancer | AllGraphicsOnline.com

  • Yan

    Does anyone know how to have different gap widths for different series in Excel 2010? The options tab doesn’t seem to exist any more.

    Thanks guys!