Recreating the NY Times Cancer Graph
By Chris Gemignani
July 29, 2007
Find more about: charts, excel, screencast, squarepie, tutorial
This New York Times cancer graph is a beautiful piece of work.
I wanted to see if we could reproduce it with everyday tools.

Click here to watch a screencast showing how it was done. Warning: the screencast is a little long (14 minutes) and a little unpolished. One cut, no retakes, banzai analytics!
Derek raised an interesting question about how to find the fonts used by the New York Times. While I don't think you can find a high-quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista, and these are also available to Microsoft Office users through a compatibility pack. Here's a link, or google for "microsoft office compatibility pack". I recommend using these fonts.
Here's a version of the graph with these new fonts and more emphasis on getting the typography right.

"Business Intelligence isn't a technical problem, it's a social problem"
By Zach Gemignani
March 23, 2007
Find more about: screencast
Yesterday I presented to a B-eye-network audience our perspective on why business intelligence is broken and what can be done to fix it. The full PDF version (4 MB) of the presentation can be downloaded.
A sampling of the fun:
"Chart-based encryption -- data goes in, no information comes out"

On the excessive emphasis on reporting over analysis...

"Technologists are looking to build an atomic-baloney slicer"..."Nobody ever got fired for adding more requirements"

"Data analysis isn't just for the data analysts anymore"

"Have you ever worked with a reporting tool that outputted to PDF?"
Hopefully we stirred the pot a little with this presentation. A recording of the B-eye-network event should be available soon.
13 comments
Robert Kosara said:
That was a great talk! I really liked the slides, the pictures were great and there were some very quotable statements. I think especially the points about reporting vs. analysis and data/analysis vs. presentation were good and are often overlooked. The remarks about BI being a social rather than a technical problem, and about "chart-based encryption" were also spot on: it's unbelievable what visual nonsense is being produced.
Your examples were also interesting, because they were unexpected. NameVoyager and WeFeelFine.org aren't exactly what I would consider business visualizations. But they clearly illustrate what good visualization should do (though they are certainly more on the presentation than the analysis side, which also makes them so appealing: they tell a story, or at least give you the building blocks for one).
What I missed was some kind of big final example at the end that would apply what you said to some real business cases. I am all for visualizing blogs and baby names, but how do these things apply to the typical BI or CRM application? And what does analysis vs. presentation mean in a practical example?
David A. Heiser said:
I can't understand business these days.
Microsoft expanded the graphic "blings" in Excel, because business asked for all these really weird pyramid columns in pastels. Pastels!! Business asking for pastel colors for presentations??? I thought presentations were to upset the audience, not put them to sleep.
What's the relevance of cows? Is it related to a new advertising campaign? Analysis is how you can distort the facts to persuade the audience. If you have to wake up the audience with spectacular displays, why not just bring a wide screen HD tv set and do a surround presentation with cows mooing in the background (7.1 sound).
You could perhaps have a parade around the perimeter walls of the conference room, with a real brass band. Get everybody to join in. Signifies agreement with the latest management program.
DAHeiser
Michel Guillet said:
The slides and discussion really resonated with me. For 10 years I labored with the presentation of 1,000s of slides to execs about customers and statistics at The Coca-Cola Company. Thursday's discussion finally answered my question, "what am I doing wrong?" The problem was or is a social one. My perspective is changed.
Thank you for providing the enlightenment.
Guillet
Brian Timoney said:
Zach:
Good job.
I'm shamelessly thieving your approach with your PowerPoints and dropped $11 on iStockPhoto to get the appropriately resonating photos for an upcoming conference presentation.
The "chart-based encryption" joke was made funnier since you had no way of knowing if any of the webinar audience "got it." As an East Coaster transplanted west, I've had my share of lines during talks that were pure comedic gold be met with earnest non-comprehension. (Another reason we're so well liked...)
Great stuff; I've already dropped the herding cats vs. herding cows distinction on a co-worker.
BT
Jeff O'Connor said:
Please come and give this presentation at my company. I and my coworkers have given similar ones for years now, but they don't seem to carry the same weight as the ones that better-paid outside contractors give.
As an outsider you'll have instant credibility with my senior management, which I and my coworkers lack because we're part of - you know - a cost-center.
To ensure that everything you say to my superiors is taken as seriously as possible, I will disavow any knowledge of having made this post...
Methods In Excel » Reporting and Analysis said:
[...] I was interested to read what Zach at Juice Analytics had to say on the matter. To paraphrase, reporting is for things we know well and are predictable, whereas analysis is for things that are unknown and erratic. Now I’ve only looked at the slides (check them out, they’re good!) so I may have missed the context, but that’s not really the point. [...]
Fraser Moffatt said:
Interesting presentation. I am especially drawn to the Mindset|Skillset|Toolset paradigm and would be interested in the audio recording, an expansion of these ideas, or maybe some reference to follow up on.
I've always thought that if you provide the tools, the mindset will follow. Sometimes it does, and sometimes there's a lot of kicking and screaming.
I'm in an org where the tools are there (a multimillion dollar BI/Datamart/Reporting "solution"), but the mindset is non-existent and I'm wondering how to overcome this.
The concept of BI as a social problem is compelling and I would like to know more about your thoughts on this.
Zach said:
Thank you all for the feedback on the presentation. If you are interested in continuing the conversation, please send me an e-mail at zach.gemignani@juiceanalytics.com and we can set up a time to discuss in more depth. We always find it helpful to hear more stories about the challenges people face in the BI trenches.
Jay Jakosky said:
Nice presentation, and your examples were excellent.
Yaju Arya said:
Excellent stuff, awesome!!
Emily Breed said:
Zach, the cows-vs.-cats comparison is a great one. Would it be all right to borrow that idea? (I work in risk management, and we deal with a lot of people who'd prefer to pay attention to the cows only, even though we have cats overrunning the place...)
Zach said:
Emily, feel free...and use the pictures too.
Drew said:
Nice work! Thanks for sharing this.
A Juice Web Event: Empowering the Analyst
By Zach Gemignani
March 6, 2007
Find more about: analytics, presentations, screencast
Our friends at Tableau invited us to lead off a webinar about the broken bits of Business Intelligence and what is needed to fix it. With the provocative title "The Score: IT-centric BI – 5, Information Worker – 0", we intend to hit blog themes such as the plight of the noble but beaten-down analyst, the misplaced emphasis on bulky technology solutions, and the false deification of the Executive Dashboard. We'd love to have you stop by on March 22 at 2:00 ET. Go here to register.
The session abstract is below:
Empowering the "Everyday Data" Analyst
Like it or not, we've all become "everyday data" analysts during the last decade. We became document specialists and spreadsheet experts ten years before that. We have standard tools for creating documents, spreadsheets, and presentations right on our desktops. These applications are familiar and easy to use – even if we only use them infrequently. Why don't we have the same for working with data?
Everyone agrees that we have plenty of data: it streams through our departments and across our desktops every day. But despite the big, IT-centric BI solutions that exist in our organizations, it's the tools and skills for investigating and making sense of "everyday data" that we're missing. The people who have the most to gain from data analysis are often the least capable of doing so. Where's the BI equivalent of Word or Visio?
Join Zach Gemignani, co-founder of Juice Analytics, for this free web seminar. Drawing on his years of experience with analytics client engagements, he will present the real-world struggle of "everyday data" analysts. You will learn:
- How the IT-centric view of BI should change
- How we can empower the "everyday data" analysts in our organizations
- What shifts in approach and technology are necessary for effectively working with data
1 comment
Wade said:
Missed (unfortunately) the webinar. Will you be making any materials from the event available on your site?
Cheers,
Wade
Squaring the Pie Solutions Screencast
By Chris Gemignani
December 14, 2006
Find more about: excel, infographics, screencast, squarepie, tools, visualization
1 comment
Brett said:
Is there a reason that the navigation to the next and previous posts etc. has not been included on this page? It makes this page a bit of a dead end, and having to navigate around it seems a bit clumsy. Really great site guys!!
Solving the Pie
By Chris Gemignani
December 14, 2006
Find more about: design, excel, infovis, innovation, screencast
Last week I challenged you to reproduce this alternative to pie charts in Excel. I promised a screencast to show how it's done.

Eighteen people answered the call with nearly three dozen different solutions. Click here to watch the screencast showing how to accomplish the two most popular solutions: filling cells with conditional formatting and pushing the column chart to extremes.
If you want to look at the source, Clint Ivy produced an excellent version of the cell filling approach.
Dermot Balson submitted a terrific version of the column chart approach.
Thank you to everyone who submitted a solution.
14 comments
Robert Kosara said:
Great to see that you picked up the idea, and made this challenge around it! Being able to do more than the old pie and bar charts in Excel makes these techniques accessible to a lot more people, and hopefully shows them how much more can be done with visualization than the simple things Excel offers directly.
I agree with some of the comments in the other posting, there are some issues with perception here. Especially when only showing one number, filling the cells from the bottom (like a bar chart) is probably a better idea. However, when comparing several numbers, having areas as square as possible is preferable (squarified treemaps have shown this quite clearly). But this shows that even for simple graphics like this, there is still quite a bit of work to be done to understand how to use them most effectively.
Anyway, great blog, and you have really active readers/contributors!
Rage on Omnipotent » Blog Archive » Square pies said:
[...] Nice ways of getting square pie charts. We’ll have to start using them. [...]
David Boyle said:
Fantastic. And LOVE the video.
The challenge now is for someone to build in the functionality for showing two different ranges in the same chart.
David Boyle said:
I answered my own question and emailed in a solution (building on the previous work) that allows multiple values in the same chart. It is also here: http://beglen.googlepages.com/square_pie_DBB.xls
ross said:
Looks nice. I think this is a terrible chart, for the reasons made before, but it does look nice. I could not view the video sadly, in FF or IE :-( - could you put it on YouTube?
Henk said:
This exercise took me back to ages ago, when someone showed me how to square a circle. But that's maths, not Excel.
GrahamC said:
Surely to show 2 values in the same chart you'll have the 1st colour, which is the lower number, then the 2nd colour, which is the higher number. The 2nd colour only needs to be stacked on top of the 1st colour to show the comparative difference?
DBM Forum » Blog Archive » Ban de Pie Chart said:
[...] Still, this design does not really seem to be a good alternative: visually, the first “pie” (or should we call this chart a “cake”?) looks more than 56% filled. A different fill (for example, whole rows from the bottom up) might address this. Filling in multiple segments can also become unclear. If you still want to make a square pie in Excel, Juice Analytics has several examples worked out in Excel. [...]
Googlizing myself | Chris Teplovs said:
[...] In that alert I found a blog post on Data Visualization Gone Wrong, which was fun to read and reminded me of the Coda Hale’s rant against Google Analytics’ pie charts. The Gone Wrong posting led me to juiceanalytics, which is also helpful. [...]
Mozlog.nl marktonderzoek weblog » Blog Archief » Square pie-chart of pie-chart? said:
[...] The square pie chart is offered as a solution. The blog juiceanalytics has posted a number of Excel files for it. [...]
SPSSlog.com » Happy holidays! said:
[...] If you need something to do during your time off, try out the great online graph service Swivel, or try making the coolest SQUARE PIE-CHART graphs with SPSS! Post all your creations in our comments… [...]
Stefan Schwarzer said:
Hi there,
to make things more complicated, I would like my square to display two different colors (in addition to white or "empty"). Say I have a value of 62 (which is, for example, the percentage of students finishing a course). Then I have another value, say 38, which is the percentage (out of the total) of students finishing with a specific grade.
So, my square looks like this (2 for those students finishing and 1 for those with a specific grade):
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 2 2 2 2 2 2 2
0 0 0 2 2 2 2 2 2 1
0 0 2 2 1 1 1 1 1 1
0 0 2 2 1 1 1 1 1 1
0 0 2 2 1 1 1 1 1 1
0 0 2 2 1 1 1 1 1 1
0 0 2 2 1 1 1 1 1 1
0 0 2 2 1 1 1 1 1 1
But how can I visualize it in this way: http://geodata.grid.unep.ch/Picture_2.png ?
Chris Gemignani said:
Stefan, here's a quick capsule of one way to do what you want.
Create a 10x10 block and number the blocks in the order in which you would like them to be filled. For instance:
1 2 5 . . . . . . .
3 4 6 . . . . . . .
7 8 9 . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
Then if you want to show blocks for two values (call them A and B), use a two level conditional format to colorize the blocks:
IF blockval <= A: make color red
IF blockval <= A+B: make color orange
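The two-level rule above can be sketched outside Excel too. This is only an illustration in Python, not the spreadsheet itself: the function names `fill_order` and `colorize` are made up, the fill order simply extends the 3x3 numbering shown above to the full 10x10 block, and the values 38 and 24 are an assumed reading of Stefan's question (38 with the grade, plus the remaining 24 of the 62 who finished).

```python
def fill_order(n=10):
    """Number an n x n block so the filled area grows as a square.

    Reproduces the pattern in the reply above:
        1 2 5
        3 4 6
        7 8 9
    """
    order = [[0] * n for _ in range(n)]
    k = 1
    for shell in range(n):
        # First the new column, top to bottom (rows above the shell row).
        for r in range(shell):
            order[r][shell] = k
            k += 1
        # Then the new row, left to right, ending on the diagonal.
        for c in range(shell + 1):
            order[shell][c] = k
            k += 1
    return order


def colorize(order, a, b):
    """Apply the two-level conditional format to every block."""
    def color(blockval):
        if blockval <= a:
            return "red"
        if blockval <= a + b:
            return "orange"
        return "white"  # unfilled block
    return [[color(v) for v in row] for row in order]


if __name__ == "__main__":
    grid = colorize(fill_order(10), a=38, b=24)
    symbols = {"red": "1", "orange": "2", "white": "0"}
    for row in grid:
        print(" ".join(symbols[c] for c in row))
```

Because the first rule is tested before the second, the two conditions never need to exclude each other, which matches how stacked conditional formats are evaluated in order.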



35 comments
Daniel Waisberg said:
Hi Chris,
the screencast is really amazing. I learned some very helpful tips. Please do others like this one.
Now, there is only one thing that you did not include in your graph (and it is a long-standing question of mine): the gridlines crossing OVER the bars. I have seen several graphs by Tufte and Few that show white gridlines crossing over grey bars; I love that! Isn't it possible to do it in Excel?
As for the graph itself, there is one thing I did not like. IMHO, there should be one additional piece of information, something like "probability of death". Not sure if this one would make. My problem is that the graph might be used to say, for example, that prostate cancer has a worrying number of new cases, and deaths might increase. However, since we do not have information regarding old cases to compare to deaths, the simple design might lead us to think deaths will highly increase as a consequence of high number of new cases. Sounds reasonable?
Anyway, thank you very much for the screencast.
Chris Gemignani said:
Daniel,
The built-in gridlines pass _under_ the chart elements in Excel rather than over, so it's not possible to recreate the Times' treatment.
Probability of death is certainly an important additional measure, but so is median survival time or expected years of life. You can infer probability of death in a crude way from DEATHS / NEW CASES and I'm happy enough with that.
Cheers, Chris
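The crude inference mentioned above is a single division. A minimal sketch in Python, where the function name `crude_death_ratio` and the numbers in the usage line are placeholders and not figures from the Times graph:

```python
def crude_death_ratio(deaths, new_cases):
    """Rough proxy for probability of death: DEATHS / NEW CASES.

    This compares this year's deaths with this year's new cases, so it
    ignores people diagnosed in earlier years. Treat it as a crude ratio.
    """
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return deaths / new_cases


if __name__ == "__main__":
    # Placeholder numbers, for illustration only.
    print(crude_death_ratio(deaths=25, new_cases=100))
```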
Tony Rose said:
Nice job Chris. I had to laugh a few times as you were getting some push-back from Excel. Sometimes the first cut is better, or more entertaining, than a polished one... Showing how to format the number so MEN and WOMEN are labeled at zero was very helpful. The NYT did a very nice job setting this chart up. Keep up the great work!
Chris Gemignani said:
Thanks, Tony. There was a lot more mumbling under my breath than you heard. I recorded using Parallels to run Excel on my Mac and the keyboard mapping is a little wonky and the delete key doesn't work. Grrr.
It is so much easier to recreate an existing design than to create a great new design from scratch.
Rupesh Tripathi said:
An easier workaround than space guesswork (to centrally align the category labels), Chris, is to add another column where you can put in the number of spaces required, and use the formula =CONCATENATE(REPT(" ",no. of spaces,category label). Some hit and trials will give the result. For a long list, or to start with something scientific before hit and trial, here is the solution:
- Figure out the length of longest label(LEN function).
- For each label, use the formula =(length of longest label- length of current label)/2
- Logically this should put in the correct number of spaces in front of each label.. however due to some reason, (may be the width of space character), the alignment seems slightly leftwards. So I add "1" to the formula above.
- In most cases, this would render center alignment. In the skewed cases, one can overwrite the formula with a higher/lower number and achieve desired result.
Wonderful screencast.
Rupesh Tripathi said:
Small correction - Apologies.
Read the formula =CONCATENATE(REPT(" ",no. of spaces,category label) as
=CONCATENATE(REPT(" ",no. of spaces),category label)
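Rupesh's padding rule can be tried out away from the spreadsheet. The sketch below is in Python rather than Excel; the function name `pad_labels` and the sample labels are made up for illustration. It applies the steps listed above: half the difference from the longest label, plus the extra 1, with the fraction dropped the way REPT truncates a non-integer repeat count.

```python
def pad_labels(labels, extra=1):
    """Prefix each label with spaces so shorter labels shift toward center.

    spaces = (longest length - current length) / 2 + extra,
    truncated to a whole number as REPT would do.
    """
    longest = max(len(label) for label in labels)
    padded = []
    for label in labels:
        spaces = (longest - len(label)) // 2 + extra
        # Same result as =CONCATENATE(REPT(" ", spaces), label)
        padded.append(" " * spaces + label)
    return padded


if __name__ == "__main__":
    # Hypothetical category labels, not taken from the original chart data.
    for text in pad_labels(["Lung", "Prostate", "Colon and rectum"]):
        print(repr(text))
```

With a proportional font the space character is narrower than most letters, so the result still needs the manual nudging Rupesh describes.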
Chris Gemignani said:
Interesting idea, Rupesh. Centering those darn labels was the most time consuming part of this exercise, and they still don't look all that great. The slight leftward bias in my labels also relates to charting internal margins and other Excel esoterica. Again, Grrrr.
Michael Doan said:
Great screencast! Thanks for sharing it. I've been using Excel for a while now but I've never had much of an opportunity to incorporate graphs. When I do, it's often a frustrating experience.
James McMurry said:
Nicely done. Excel graphing masters have their own special kung-fu.
Side note: Assuming you're using a Mac notebook (sounds like it), try using the "fn" key in conjunction with "delete". This should make Windows delete as you expect.
Chris Gemignani said:
James, I am using a Mac notebook and fn-Delete does indeed work as a Windows delete. Many thanks.
derek said:
Gridlines over the top is easy if you roll your own using a Line or XY (Scatter) series.
There's a limit to the different types of series that can be combined, but quickly glancing at the design, it doesn't immediately seem to me that you've reached it yet.
derek said:
A question for you typography experts: what widely available free font most closely mimics the lettering in New York Times graphics?
Mike Ward said:
Very nice graph, learnt a couple of neat tricks there.
Presuming that we'd be creating this graph to be printed out, or exported as a graphic, I'd be tempted to use either a text box or do something with the camera object here for the series labels.
It might require some altering of cell heights, but I'm sure something could be achieved.
Nils said:
I've only just started making screencasts professionally, but this is one amazing example. Something to look up and aspire to.
Chris Gemignani said:
Good question, Derek. I'm not a huge typography geek, but the NYT font looks like Helvetica Neue which is a really nice multi-weight version of Helvetica that's available free on OS X, but not in Windows land. It's a font we use a lot for presentations here in Juiceland.
Microsoft released a bunch of good new fonts with Windows Vista. They're available as a free download for older versions of Office if you google for "Microsoft Office Compatibility Pack". If you're interested in typography and use Windows you should get these fonts.
I tried using these fonts on the graph and there is a slight improvement. See above for a version with improved fonts.
doug said:
For the headline face, NYT is using Franklin Gothic (Bitstream, not ITC); for the body text, Helvetica. Their online features (at nytimes.com) seem to be using Helvetica for all text.
Great screencast.
Chris Gemignani said:
Thanks, Doug. Lazyweb, we thank thee!
Pete Skomoroch said:
Nice work Chris. I thought this chart looked sharp when I saw it on reddit the other day. There have been a number of nice graphics and interactive "infographics" coming out of the NYT lately. Any idea what they use to produce these?
Can we look forward to a script/screencast showing how to reproduce this in matplotlib/python?
-Pete
derek said:
Thanks Chris and Doug. I had a Franklin-a-like already, and I've downloaded the PowerPoint 2007 Viewer to get the new Vista fonts, and used Calibri as a Helvetica substitute.
Adam Richardson said:
This is a great tutorial, I learned a lot on this. I've always been turned off by Excel's charts but never spent the time to learn how to make them better.
Wrote it up on my blog too: http://richardsona.squarespace.com
Henk said:
Well done, guys.
I was somewhat intrigued by Daniel Waisberg's wish to have the gridlines OVER the bars. Derek's suggestion to make a combination chart with your own gridlines is one possible solution, but I don't think it is as easy as he said for most of us. I see two possible ways to get these gridlines on top of the bars (although I would like to add that I don't think it's very necessary to do so). I share both ideas here, for discussion purposes.
1. You can use a stacked bar chart with a thin "white" lining around the fill. This requires a bit of juggling in the spreadsheet with conditional numbers (for the length of the bars; if a bar exceeds the default value between the grid values, it needs to be cut off).
Note: I fear this is not easier than derek's suggestion. It also requires some forward thinking about the grid, the default value taken as a variable.
2. An overlay chart. Essentially you split the chart in (a) the bars and (b) the value and category axis, including axes values and grid lines. Make sure the plot area is transparent, and that the dimensions are EXACTLY equal. Now position (b) on top of (a) (Select chart (b) and use ALT+mouse to snap it into position over (a) accurately).
Note: this method is a bit inflexible in the sense that resizing is laborious. Moreover, in this particular example of a combined chart it may not be so easy to do.
I hope this makes sense.
derek said:
Well, I meant for someone with the skills to reproduce Chris's graph. At that level of play, drawing arbitrary lines over the top using an XY (Scatter) graph range should be peanuts.
Unfortunately I can't get the Quicktime screencast to work, so I don't know in enough detail how the trick was done to advise how to take the next steps. It would be something like this: create a range of cells with the following values:
X-value     Y-value
-150,000    0
-150,000    12
-100,000    0
-100,000    12
...and so on. Fix the secondary x and y axis scales to the appropriate values (if they're not already being used for something else; if they are, have a think about designing your way around the problem). Format the line so it is the same as the background colour (white). Now the lines will only be visible where they cut across the bars.
Build-your-own axis scales (which is very similar to this) is also a highly useful technique; I would go so far as to say it's the most powerful single piece of Excel hackery there is. Both are described in the Juice article "Tufte Charts in Excel" by Zach (http://www.juiceanalytics.com/writing/2006/08/tufte-charts-in-excel/).
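The range of cells derek describes is repetitive enough to generate rather than type. A small sketch in Python, with a made-up function name `gridline_rows`; the blank row between each pair is an assumption added here (it relies on the chart plotting empty cells as gaps) so that neighbouring gridlines are not joined by a diagonal segment.

```python
def gridline_rows(x_values, y_min, y_max):
    """Build (x, y) rows for vertical gridlines in an XY (Scatter) series.

    Each gridline is two points with the same x. A None row stands for a
    blank spreadsheet row separating one gridline from the next.
    """
    rows = []
    for i, x in enumerate(x_values):
        if i > 0:
            rows.append(None)  # blank row: break the line between gridlines
        rows.append((x, y_min))
        rows.append((x, y_max))
    return rows


if __name__ == "__main__":
    # The x positions and the 0..12 range follow derek's example values.
    for row in gridline_rows([-150000, -100000, -50000], 0, 12):
        print("" if row is None else f"{row[0]}\t{row[1]}")
```

The printed output is tab-separated, so it can be pasted straight into two spreadsheet columns.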
Brian Timoney said:
Make sure you install those Windows updates that they were bothering you with during the screencast....
Keith said:
I bet that is a very good illustration of how to produce a good presentation, particularly in a business point of view.
All marketing managers should learn to produce these types of presentations!
derek said:
"Make sure you install those Windows updates that they were bothering you with during the screencast...."
They're not bothering me with any updates, either in Opera or Internet Explorer. Both browsers, after taking a long time to download the very large screencast, report:
"QuickTime is missing software required to perform this operation.
Unfortunately, it is not available on the QuickTime server."
I don't see any suggested action arising from that message.
Shawn Mitchell said:
I am not an Excel guru but I did manage to get the grid lines to show through the bars in the graph.
Below is a link to a screen shot of the effect using Excel 2007.
Screenshot: http://i159.photobucket.com/albums/t152/robertshawnmitchell/transparent_bar_graph.jpg
Shawn Mitchell said:
Here is a close up of the chart with both data series using transparency.
Close-up: http://i159.photobucket.com/albums/t152/robertshawnmitchell/closeup.jpg
derek said:
Shawn, you're using a feature that is new in Excel 2007, but you've misunderstood the effect that is being sought. It is not to have a black gridline visible under a translucent bar (though there are ways to achieve that effect pre-2007 also).
It is to have a white gridline visible *over* a solid bar, but invisible on the white background. That is achieved by a custom line series, but not, I believe, by built-in gridlines even in XL2007. They're still underneath the data series.
NH said:
Cool... my recollection of font usage:
AIC Franklin Gothic -heds & subheads, 12pt. (1 col.), 14pts. (2cols.), etc... also used for smaller uppercase (7pts.)
Helvetica & Helvetica-Light: body text (9pts.)
Helvetica-Bold (8-9pts.): bolded upper & lower case text.
Helvetica Light-Oblique: source line (7 or 8pts?)
AIC Imperial: credit (5.8pts?)
jen said:
I can't get my second graph to flip and display in reverse. It does one of two things:
1. the order of the Y axis values sorts in the opposite order;
2. only one of the data series moves to the opposite side. the one on the secondary axis stays put.
Any suggestions?? Thanks!
Javaun said:
Hi Derek. I too use Excel 2003, and so I guess I don't have the bar transparency feature that Shawn proposed to make the gridline appear to float over the bar. Still, Shawn's idea would work to make the gridline float over the bar but disappear against the background. He simply needs to change the dotted gridline color to white. The white will show faintly through the transparency (it may appear off-white) but will be indistinguishable against the background. I'm guessing that for the NY Times graph, they did a rough mockup in Excel using ugly colors and ugly fonts, and then a designer traced it (to preserve the scale) in Illustrator and beautified it with color and fonts.
sesha said:
Great work. Keep posting to benefit many like me.
Can you also help me in constructing graphs on a McKinsey chart that we use at our office? My problem is having to edit the text boxes and graphs every time I need to update the data.
Zach said:
Sesha, we have developed an approach for automatically updating PowerPoint slides (charts, text boxes, tables) from Excel spreadsheets. I'm not sure if that is exactly what you are referring to. We can discuss offline if it is.
Sarah said:
I created a similar graph using Jon Peltier's tornado graph as a starting point. I was able to get white gridlines on top of the bars by creating a dummy series and then adding y-error bars. I had the additional requirement of getting the Male and Female sides into a single chart, so I had to use a dummy series for the y axis anyway. Here is what it looks like: http://flickr.com/photos/saamiam/2176279190/
brandie said:
my father died of lung cancer...hahahha jking