Recreating the NY Times Cancer Graph
By Chris Gemignani
July 29, 2007
Find more about:
charts
excel
screencast
squarepie
tutorial
This New York Times cancer graph is a beautiful piece of work.
I wanted to see if we could reproduce it with everyday tools.

Click here to watch a screencast showing how it was done. Warning: the screencast is a little long (14 minutes) and a little unpolished. One cut, no retakes, banzai analytics!
Derek raised an interesting question about how to find the fonts used by the New York Times. While I don't think you can find a high quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista and these are also available to Microsoft Office users through a compatibility pack. Here's a link or google for "microsoft office compatibility pack". I recommend using these fonts.
Here's a version of the graph with these new fonts and more emphasis on getting the typography right.

Analytics Roundup: Square Pie of Death
By Chris Gemignani
July 29, 2007
Find more about:
design
infographics
squarepie
visualization
- NY Times: % of Americans who believe that after death...
- Astonishingly awful square pie from the NYT, who are normally infographic innovators.
- raganwald: Beware of the Turing Tar-Pit
- Know the difference between general and specific in building tools.
Square Pie in the Eye
By Chris Gemignani
July 29, 2007
Find more about:
design
infovis
visualization
The New York Times—normally a source of clear and interesting infographics—produced the following graphic over the weekend.
This is bafflingly awful—it’s Tiger Woods carding a 90. Square pies are an infographic seasoning—they’re cilantro, not steak. Here are a few of the problems with this graphic:
The color choices are bad. The saturations between groups are considerably different. The yellow is highly saturated while the other colors are not. The increased saturation draws your attention to the yellow area, but this is just a category like the others. I’d imagine someone with red-green color blindness would have trouble distinguishing the other colors.
There’s a hole in the center. Presumably this indicates people who didn’t respond to the question, but this is not noted. There are no gridlines in the white section even though the non-responding group should be treated visually like the other groups.
It’s hard to compare the sizes of groups. People are better at comparing lengths than volumes. Mixing lengths and volumes (some of the response categories are arranged linearly, while the inner category is basically a volume with a hole in it) makes it nearly impossible for people to use their spatial skills to size up the differences. Asking people to compare lines and donuts is like asking whether you prefer the color blue or raw carrots. For the record, I prefer carrots.
If you’re interested in the concept of square pie charts, the place to start is at EagerEyes. If you want to learn how to make them yourself, check out our contest, results, and screencast.
The Times is still a source of great design and inspiration. Here’s another graphic they also produced over the weekend that shows cancer incidence, survival rate, and gender differences in a way that is clear, clean, and concise.
7 comments
Tony Rose said:
Ughh. This square pie is horrid. Possibly a good example of some folks trying too hard to "spice" up data visualization. I absolutely agree with your feedback regarding this being awful. Kind of like, A for effort, but F for results. I am still debating whether this is better than a regular pie chart. My conclusion is no. A regular pie would have at least shown this a little bit better. The best way would have probably been a bar chart.
Jim said:
Good analysis. You are right about the color problem. I am (mildly, I believe) red-green color blind and I basically can't tell the difference between the "return to earth in a different form" and "live in a different place" sections without looking for the thick grid line in between. Even with the line it took me about 5 seconds to find the break.
Chris Gemignani said:
Tony, One of the really bad decisions here is how the pie is filled by spiraling in from the outside. Filling the square pie by filling a row of blocks horizontally until you reach the target would have been a little better.
Tony Rose said:
Yes, exactly. That spiraling is extremely difficult to follow and adds no value. Using a square pie and shading starting in the lower right for each value would have been better, but would have created five graphs. I think a bar graph would give the readers the same information and would cut down on the comprehension time.
Your post back in December shows an example:
http://www.juiceanalytics.com/writing/2006/12/solving-the-pie/
Jordan Lund said:
My initial reaction towards square pie charts, pixel charts, or whatever you want to call them is that they are horrible and unreadable.
However now that I've had time to think about it, they could work provided there were a style manual for setting them up.
For example:
1) Charts must be filled in from left to right from the largest value to the smallest value.
2) Data labels must be placed in the first column which contains a majority of that value.
3) It is allowable to complete a column with a smaller value if this will prevent other columns from being broken un-naturally.
So, taking the NYT data and applying the three rules I just created, you end up with this:
http://img293.imageshack.us/img293/5534/gridchartco5.jpg
Sherman Dorn said:
Okay, the square-pie nonsense trumps today's awful bubble-map on attacks in Afghanistan (http://www.nytimes.com/imagepages/2007/08/12/world/20070812_AFGHAN_GRAPHIC.html). Good article, bad infographics, no cookie for Times graphics folks today!
Bob said:
It's a visual gag.
It's supposed to look like the light at the end of a tunnel, not make it easier to compare the proportion of 4 categories (which even a seven year old can do, even without the aid of a chart, and certainly without complaining about how difficult it is).
TV Ratings and Online Audiences... Or, Where to Find Skeet Ulrich's Bio
By Zach Gemignani
July 23, 2007
Find more about:
metrics
statistics
The TV ratings system is broken. Everyone knows it, but nobody wants to admit it. Nielsen ratings struggle to accurately measure audience quantity (limited tracking of DVR usage and online viewers) and quality (are viewers engaged? are they skipping the ads?). However, admitting so would undermine the delicate balance TV networks share with their advertisers.
I caught an interesting segment on KCRW's "The Business" podcast about TV series that find themselves on the "bubble," i.e. at risk of getting canceled. The producer of CBS's Jericho, "a post-apocalyptic drama starring Skeet Ulrich" (shouldn't that description alone put it on the chopping block?), explained how they received a temporary stay of execution when their small but loyal audience protested network plans to cancel the show. The interview raised questions about the validity of Nielsen ratings and how a fervent online audience can bring additional perspective to the performance of a show.
All this talk of measurement gave me an itch to look at some real data. I tracked down the Nielsen audience size (Subscription required) for TV series over the 2006-2007 TV season. Then I pulled from comScore (a Juice client and leading source for data about Internet traffic and usage behaviors) the unique visitors and time spent on websites of TV shows over the same September to May time period.
I had a few questions I was curious about:
- Which shows have disproportionately larger internet audiences, an indicator of a loyal and rabid fan base? Are there other shows like Jericho that struggle to build a large TV audience, but have a strong online following?
- Which TV show sites have the most engaged audiences?
- What TV networks have been most successful at building online traffic to their sites? Which types of shows spawn online audiences?
The table below shows the top 20 TV series by ratio of monthly unique website visitors to average TV viewership. This metric suggests an ability to get viewers to look for more content, whether it is additional video, information about the actors, or discussion boards. If Jericho's 9.5 million TV viewers (tied for 48th overall) represents the proverbial bubble, there are eight other shows with bubble-level ratings that can also claim strong online support (highlighted in this list).

I also wanted to get a sense as to the engagement of the online audience. Were people simply stopping by the website to check the TV schedule, or were they digging deep for more content? One measure that gets at this question is minutes per unique visitor. The top 20 websites are listed below. Interestingly, 12 of these sites are also found in the previous table. Jericho is one of four of the bad-Nielsen-ratings/strong-online-audience group that overlap with the table above. (NBC, if you are grousing about ratings for The Office, hopefully these numbers will make you feel a little better.)
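Both metrics used so far are simple ratios, and they can be sketched in a few lines of Python. This is only an illustration: the show names and every figure below are made-up placeholders (the Nielsen and comScore numbers are not reproduced here), and the function and field names are assumptions chosen for readability.

```python
# Hypothetical placeholder data: NOT the actual Nielsen or comScore figures.
shows = [
    {"show": "Show A", "tv_viewers": 9_500_000,
     "web_visitors": 522_500, "web_minutes": 3_135_000},
    {"show": "Show B", "tv_viewers": 20_000_000,
     "web_visitors": 400_000, "web_minutes": 1_000_000},
]


def web_to_tv_ratio(web_visitors, tv_viewers):
    """Monthly unique website visitors divided by average TV viewers."""
    return web_visitors / tv_viewers


def minutes_per_visitor(web_minutes, web_visitors):
    """Total minutes spent on the site divided by unique visitors."""
    return web_minutes / web_visitors


def rank_by_ratio(rows):
    """Sort shows from the largest web-to-TV ratio to the smallest."""
    return sorted(
        rows,
        key=lambda r: web_to_tv_ratio(r["web_visitors"], r["tv_viewers"]),
        reverse=True,
    )


for row in rank_by_ratio(shows):
    ratio = web_to_tv_ratio(row["web_visitors"], row["tv_viewers"])
    engagement = minutes_per_visitor(row["web_minutes"], row["web_visitors"])
    # The ratio is formatted as a percentage of the TV audience.
    print(f"{row['show']}: {ratio:.1%} web/TV, {engagement:.1f} min per visitor")
```

Read this way, a ratio of 0.055 displays as 5.5%, meaning the monthly web audience is about one-eighteenth the size of the TV audience, not 5.5 times larger.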

The final table addresses my third question about the TV networks and types of shows that are best at building an online audience. ABC has done more than twice as well as CBS in getting viewers online, which may be a reflection of the traditionally older CBS audience. Note: I pulled the top-end outliers (American Idol, So You Think You Can Dance, and Deal or No Deal) from the network comparison.
The second half of the table brings those TV series back into the mix in the reality/contest category, and you can see the impact. I was surprised at the dearth of sitcoms on this list. It may be that a website for a sitcom doesn't typically make sense.

With all the money spent on TV advertising, I can only hope the networks go beyond the top-line Nielsen ratings to try to get a complete picture of their audiences.
16 comments
Jane said:
Excellent article. It seems that CBS is well aware of the online fans yet they continue to depend on Nielsen's. In an article yesterday Leslie Moonves is quoted as saying he wants Jericho viewers in front of the TV because that's how CBS gets paid. This man needs to get a clue. We are in front of the TV and we've gathered many new viewers so CBS needs to make sure they're counted. They won't be as long as this reliance upon Nielsen continues.
Our Jericho board even got nominated for an Emmy. We're the best free advertising CBS could get.
gt said:
Very interesting article and the extra lengths you went to in order to get the WHOLE picture is greatly appreciated. I totally agree that all viewers need to be counted.
jkegrngrl said:
Skeet Ulrich was the reason I tuned in to the show. He is a phenomenal actor, and when he first came on the scene 12 years ago, he was touted as a virtuoso. He has matured and grown as an actor even more so than his contemporaries and deserves a freaking emmy, especially for the dramatic season finale. Old stereotypes are the refuge for the ill-informed.
ahma said:
Thank you so much, your article gives me hope that the networks might eventually realize having viewers sitting in front of a tv viewing their show live is not the be all end all it once was. This is not the 1950's or 1960's; we have all this new technology that allows us to view shows at times that are more convenient for us. I would certainly think that someone at some network will finally come up with a way to count all their viewers. I would love to know what the criteria are for being a Nielsen family. Father works 9 to 5, Mom stays home and fixes dinner to be served by 5:30 so we can all sit down in front of our tv's in time for prime time television. No taping, no dvr, no Tivo, no ON Demand, just you in your chair in front of the tv. Thankfully, Nina Tassler seems to realize there is a bigger picture. Unfortunately, Mr. Moonves is her boss. That would be "CLUE"LES Moonves.
Grace said:
I sure hope Les Moonves and Nina Tassler read this because I LOVE JERICHO!!!
I don't watch online and I am not a Nielsen family, so I never get counted. NOT FAIR!!
I never missed an episode of JERICHO yet!
JERICHO ROCKS!! WATCH IT! You'll LOVE it!!
I PROMISE!!
Jess said:
Thanks for the article - I have been fascinated by this story (although I should admit, I've also been a participant). One thing to also take into consideration is that your numbers take into account those internet views that can be monetized by CBS - but because you are only looking at the 'official' cbs.com site, you aren't really seeing the full impact of the online audience. For example, over 100,000 fans signed a petition, and the self-proclaimed 'Jericho Rangers' have created countless sites to fuel the fan movement. Some of the most popular (no, I don't run any of these) are jericholives.com and jerichorallypoint.com.
Paul said:
Nice post Zach. I agree that the Nielsen ratings system is broken, and getting more broken. However, it's not AS broken as most people think. It exists to measure investment in TV advertisements - not the popularity of TV programs. These things used to be one and the same; these days they are diverging - as you point out. My guess is the divergence hasn't yet reached a point whereby the monetization of the non-TV audience comes close to making up for the shortfall in TV audience ratings revenue. Why is that? Interested to hear your thoughts (if you agree).
Zach said:
Paul, I agree. In fact, the larger failure is on the part of the networks, not Nielsen. Networks need to find ways to succeed financially both when they have a large and shallow audience and when they have built a narrow and loyal audience. Why treat every eyeball uniformly? Surely there are ways to generate more dollars per viewer for these fanatical Jericho fans.
Paul said:
Completely agree. But it's a bit of a conundrum for the networks. The real value in a fanatical fan is that they consume (or are prepared to pay to consume) more of the content. But as long as the networks try and monetize shows based on advertised product consumption (ultimately), they never tap into that real value.
Let's see... 100,000 fanatical Jericho fans each paying $1.50 for one of a 10 episode season... $1.5M. Hmmm... if they paid Skeet what he is actually worth - $10/show - they should be able to break even :)
John said:
I believe the issue here is much more fundamental than network programming decisions and Nielsen ratings. The “broadcasting” technology and business model is a mass market medium which places value on aggregating large, demographically identifiable audiences.
The ultimate users of Nielsen data are the media buyers, and by extension the product advertisers. They are usually much more interested in how the ratings break down at the gender/age level. For example, when Chevrolet wants to advertise the Silverado pickup truck, they are probably going to want to target Males aged 18-34. They might concentrate their media buy on “Monday Night Football,” for instance, rather than spreading it out over many fragments of their target audience, even if those fragments are equally loyal and reliable viewers of other respective programs.
Compare this to the Google model, where the smallest advertiser can pinpoint a potential base of customers for their products in a precise and predictable way by using contextual ad placement.
The other fundamental problem for broadcasters is that loyalty alone is almost impossible to monetize unless a show is highly popular and can be franchised (“Law and Order” for example). The ancillary revenue that networks receive from web sites and merchandise is minuscule compared to the dollars that advertisers will pay to reach a large television audience.
Hadley Wickham said:
In the first table, the second column is labelled "Website audience / TV audience", but the values in the columns are percents. This doesn't make sense to me. Does 5.5% mean there were 5.5 times as many web viewers as tv viewers, or that the number of website viewers was only 5.5% of the number of tv viewers? It's a big difference!
A scatterplot of web audience vs tv audience would also be useful, especially if supplemented with some reference lines (eg. 2x 5x 10x)
Paul Robinson said:
Just out of curiosity, why did you ignore Deal or No Deal in your conclusions? It has by *far* the biggest gap between Nielsen and website audience and it has the longest avg visit time online - yet you don't refer to it once.
I also agree with Hadley - you've spent time putting this stuff together, which is great, but you've not explained what the figures actually mean. Tufte would be ashamed of you! :-)
Zach said:
Hadley, You are correct in pointing out that I incorrectly used percentages when it isn't truly a percentage. The metric is intended to show the size of the online audience relative to the TV audience -- but it isn't as if one is truly a percentage of the other. 5.5% represents the ratio of one audience to the other (as shown in the column header). I find it a stretch to interpret 5.5% as 5.5x.
Paul, Good observation. I had suspected that "contest shows" like Deal or No Deal or American Idol drive traffic to their site by getting people to vote online or play an online version of the game (or look at photo galleries of the Deal models in skimpy dresses). In that sense, I was more interested in talking about shows that seemed to be creating loyal audiences through the characters and content of the show.
Jennifer Reed said:
I was a Nielsen TV home. The amount of equipment that had to be placed in and on all my tvs, vcrs, video games etc. sucked. But overall it was kind of cool. Shows like House, Dateline NBC, and the entire cartoon network were watched. I have a large family and we made sure we watched television of substance, not like the crap with Paris Hilton. It is kind of cool to feel you have a say in what's good tv. I did this for a few years until I moved. There was no money paid to participate except $30.00 every six months to cover the electricity all the annoying equipment used. Furthermore, they wanted us to be very secretive and completely accurate in what we watched, advising us not to use the tv for company noise, etc. Nielsen, to me, is very competent in how they research who watches what. They once even called me because the tv was on for several hours on the same channel and wanted to know why. It was because the kids were sick and watched cartoon network all that day. Please, I would not doubt Nielsen, they are going to be the most accurate you could get unless you monitored every home in the entire world.
Zach said:
Jennifer, Thanks for sharing the details of the Nielsen family experience. I've always wondered what exactly was involved. My concern isn't whether they do what they set out to do well...it is that they don't attempt to capture the full picture. With DVRs/TiVos and online viewing, the outside-the-living-room picture is becoming increasingly relevant.
Tina said:
Hi Zach,
I am a student in economics at Harvard University and am looking for data on online viewer audiences for specific tv shows on a weekly basis. Did comScore have this data? Do you have to pay for the data? Do you have suggestions of how I could find this data?
What is Worse Than a "Super Mugging"?
By Zach Gemignani
July 8, 2007
Find more about:
bi
business
reporting
I don't know what you call it, but I know it when I see it. A couple months back I wrote about IBM's sweet $80 million contract to develop ARIS (Achievement Reporting and Innovation System) for the New York City public schools. At the time I used some harsh words to describe this fleecing: swindle...preying on clients' lack of expertise...Dr. Evil...wasted time and effort.
News comes to me from Leonie Haimson, Executive Director of Class Size Matters, that the $80 million price tag is, well, a starting point. She pointed me to a recent article that describes the creeping costs:
The education department's new $80 million student-tracking computer system just got more expensive - and some parents are questioning whether that's the best use of the money.
To ensure that children's test scores and other private data don't get into the wrong hands, the city began accepting bids this week from companies that specialize in safeguarding information, which experts say could add several million dollars to the system's price.
"What's not lost on parents of kids in overcrowded schools is that with the money being spent on this, we could build and staff several more schools," said Tim Johnson, president of the Chancellor's Parent Advisory Council.
Parents are also wondering whether the system's mounting cost is worth it - and why education officials didn't anticipate the extra cost sooner. —New York Daily News
It does seem odd that an $80 million system wouldn't come pretty well stocked with security, particularly from a blue-chip vendor like IBM. On top of that, Leonie hints at other costs that aren't being directly counted toward the implementation of this system:
This initiative has mushroomed into a huge expense that threatens to overwhelm the entire school system, with all the SAFS, data inquiry teams, tests, and even the community district superintendents gobbled up to interpret and try to "coach" schools in the use of the massive data that will be spewed out. The DOE wants to charge much of this to the "contracts for excellence" and our CFE dividend, though it’s a real stretch to see if any of this falls under the specific programs outlined by the state.
Good luck to Leonie, Patrick Sullivan and the others who are stepping up to question this white elephant project.
5 comments
yoshi said:
I won't comment on the contract itself except to say that public schools are generally unsophisticated IT users and do naive things like issue press releases on all the wonderful things a system will do before a single line of code is written.
But to the security question. IBM has purchased several security vendors (most notably ISS) and has always had on staff many excellent IT security folks. However - these folks are never involved in bidding, designing, or developing of systems for clients. Or if they are - it's usually not a very involved relationship. It's no surprise that during the design or development of whatever system they are putting in here that the question of data integrity and access has been raised. It could be a new state law or an audit that is pushing the issue. Or perhaps someone getting a clue (unlikely). But knowledgeable security practitioners are rarely involved at the beginning - which is where they are needed most.
derek said:
It certainly sounds as if IBM have pulled off the Rice Krispie trick (http://www.juiceanalytics.com/writing/2006/12/consulting-and-rice-krispie-treats/) on a grand scale.
Zach said:
Now that is a loyal reader!
derek said:
Surely you mean "now that's what I call combating recency bias!"? :-)
Rob said:
Congrats on your 100th post!
Analytics Roundup: Open knowledge resources
By Zach Gemignani
July 6, 2007
Find more about:
data
- Comprehensive Knowledge Archive Network
- CKAN is a registry of open knowledge packages and projects... the place to search for open knowledge resources as well as register your own—be that a set of Shakespeare's works, a global population density database, the voting records of MPs, and more.
Choosing the Right Metric
By Zach Gemignani
July 1, 2007
Find more about:
dashboard
metrics
Misaligned goals, distorted behaviors, and a misguided sense of success... no, I'm not referring to college graduates. I'm talking about the problems caused by using the wrong metrics in your organization. You've probably seen examples like tracking average customer profitability and losing perspective on the variance in profitability or evaluating customer service reps on calls handled without regard for the quality of the experience. I'd like to offer up a quick-bake recipe for choosing the right metric.
Step 1: Set the context
Metrics generally serve one of two purposes. Start by understanding what you are trying to achieve.
1. Identifying problems. Defining the right metrics in this case requires you to do a little detective work: What is the data residue of a problem? What evidence can be found and how exactly does it show up?
2. Measuring performance. The right success metrics need to focus on measures that can be controlled and where improvement in the number is unambiguously a good thing.
Step 2: Balance the four dimensions of a good metric

Lots of metrics fail in at least one of these dimensions. A few examples:
- Common interpretation: We had a client who made a distinction between "leads" and "prospects" in their marketing organization. Prospects had theoretically expressed more interest in the service through their actions. Unfortunately the line between leads and prospects was always hard to decipher and the definitions were hard to communicate. On a related note, we got a kick out of Tom Davenport's (author of "Competing on Analytics") assertion that a company competing on analytics needs to "invent proprietary metrics for use in key business processes." There is nothing inherently wrong with "invented proprietary metrics" but it sounds like something that is designed to confuse anyone outside of the inner sanctum.
- Actionable: Metrics are frequently too broad for the impact that a particular group can have. Customer satisfaction is a popular dashboard staple, but it is hard for most managers to see how they can have a significant impact on the number.
- Accessible, credible data: Sometimes the most valuable and obvious metrics are frustratingly hard to track. In the web analytics world, unique visitors is important to know, but user deletion of cookies has thrown a wrench into the works.
- Transparent, simple calculation: Top NFL agent Leigh Steinberg says of the famous quarterback ratings metric: "Other than one attorney in our office, I am unaware of a single human being who has the capacity to figure a quarterback rating." I don't know what kind of art majors he hires, but all they need to do is use the simplified formula: (83.33 * Comp %) + (4.16667 * Yds per att) + (333.333 * TD pct) - (416.667 * INT pct) + 25/12.
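For the curious, the simplified quarterback formula quoted above fits in one Python function. This is a sketch under stated assumptions: the function name and the sample stat line are hypothetical, and the three percentage inputs are assumed to be fractions of pass attempts (0.65 rather than 65). The official rating also caps each of its four components before summing; the one-line version skips those caps, so it only agrees with the official number when no component hits its floor or ceiling.

```python
def simplified_passer_rating(comp_pct, yds_per_att, td_pct, int_pct):
    """Simplified rating using the coefficients quoted in the text.

    comp_pct, td_pct and int_pct are fractions of pass attempts
    (e.g. 0.65, not 65); yds_per_att is yards per attempt.
    No per-component caps are applied.
    """
    return (
        83.33 * comp_pct
        + 4.16667 * yds_per_att
        + 333.333 * td_pct
        - 416.667 * int_pct
        + 25 / 12
    )


# Hypothetical stat line: 77.5% completions, 12.5 yards per attempt,
# touchdowns on 11.875% of attempts, no interceptions.
rating = simplified_passer_rating(0.775, 12.5, 0.11875, 0.0)
print(round(rating, 1))  # 158.3
```

Five multiplications and an addition: simple to compute, even if the coefficients do little to explain what the number means.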
(Want a little validation of this framework? Avinash, respected web analytics guru, just published a post with "Four Attributes of Great Metrics" and he landed on a strikingly similar set of four: 1) instantly useful (i.e. actionable); 2) relevant (i.e. common interpretation); 3) timely (i.e. accessible); 4) uncomplex (i.e. transparent and simple).)
Step 3: Avoid the metrics bugaboos
Finally, here are a few traps that I've seen in deciding on appropriate metrics:
- Trending and distributions: Don't always try to compress a metric into a single number. Often it is more revealing to show the metric across time or as a distribution to uncover variance.
- Edge cases: There will always be edge cases where a metric may not mean what you think it means. These situations are worth understanding, but you shouldn't allow the perfect to be the enemy of the good.
- Setting goals: Could you hold someone accountable for this metric without them throwing out a half-dozen reasons why it doesn't make sense? It's a decent test of the value of the metric.
- Self-serving: Be careful that you don't select metrics simply because you know they'll make you look good.
14 comments
Jeff said:
Well thought out, illustrated and extremely relevant. I encounter too many quarterback rankings that not only are too esoteric to be relevant but miss the mark of being actionable. I am lucky to be part of an organization that is quick to call BS on metrics that miss the mark. I appreciate this perspective and will be referencing often :)
Friedbeef said:
That is an extremely well written article. Thanks a lot for sharing
derek said:
Jeff, what's a "quarterback ranking"?
Zach said:
Derek, Jeff is referencing the quarterback rating system used in the NFL to try to measure the effectiveness of individual quarterbacks. They use an intricate calculation to arrive at a single number. More here: http://en.wikipedia.org/wiki/Passer_rating
William Reeve said:
As a former McKinsey consultant, and now a COO of an e-commerce company, I appreciate the value of good metrics. I have to say tho that I have never seen anybody distill the essence of a good metric nearly as effectively as you have in this article. Thank you very much for your article. My former employer, Forrester Research, could charge $000s for such insight - and rightly so!
Henk said:
Finding the right metrics (or KPIs) to measure performance or to identify problem areas for an organisation is THE challenge, indeed. On the highest level, they are usually too abstract to be meaningful (actionable), and drilling down may easily let you get lost in a sea of details, failing to see the forest for the trees. This article nicely summarizes the problem and points in the right direction for analysis. Well done, Zach. We need you!
Darius Wiles said:
If you are interested in this article, you may want to take a look at Andrew Jaquith's book, "Security Metrics: Replacing Fear, Uncertainty, and Doubt". It was recommended to me but I've only just started reading it so haven't drawn my own conclusion yet.
Ben Yates said:
Your blog is great, but your navigation links don't work (Firefox, Windows XP). Diminishes your credibility, which rests on being uber-cool tufte-style usability geniuses.
Jeff said:
I've got FF & WXP here, along with the rest of my office. Links work fine.
Eduardo said:
He might be referring to the "Previous" and "Next" article links at the bottom of the writing. Those both link back to this page instead of the previous and next articles like they should. Not credibility diminishing in my eyes, but a smidge of an inconvenience.
Matt said:
Thanks, I've been wrestling with a document I've been writing, 'Key Metrics' for our support centre. After several weeks I decided I didn't like where it was going, but didn't understand why.
Your post, and metric framework, has given me plenty to think about.
I get that metrics should be actionable, but some are just interesting and are useful for describing a point in time.
For instance, Number of Calls to service desk, is a reasonable barometer of how busy the service desk is.
Your framework seems to devalue these types of stats.
I would love some clarification around this.
Eric W said:
I presume you mean "credible" data?
Dawn said:
The diagram displays the error in spelling. But it's correct in the paragraphs below. Minor error Eric.
gihan(mayura) said:
the diagram displays a summary of the metrics

37 comments
Daniel Waisberg said:
Hi Chris,
the screencast is really amazing. I learned some very helpful tips. Please do others like this one.
Now, there is only one thing that you did not include in your graph (which is a long time doubt I have): the gridlines crossing OVER the bars. I have seen several graphs by Tufte and Few that show white gridlines crossing over grey bars; I love that! Isn't it possible to do it in Excel?
As for the graph itself, there is one thing I did not like. IMHO, there should be one additional piece of information, something like "probability of death". Not sure if this one would make. My problem is that the graph might be used to say, for example, that prostate cancer has a worrying number of new cases, and deaths might increase. However, since we do not have information regarding old cases to compare to deaths, the simple design might lead us to think deaths will highly increase as a consequence of high number of new cases. Sounds reasonable?
Anyway, thank you very much for the screencast.
Chris Gemignani said:
Daniel,
The built-in gridlines pass _under_ the chart elements in Excel rather than over, so it's not possible to recreate the Times' treatment.
Probability of death is certainly an important additional measure, but so is median survival time or expected years of life. You can infer probability of death in a crude way from DEATHS / NEW CASES and I'm happy enough with that.
Cheers, Chris
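The crude DEATHS / NEW CASES inference Chris mentions can be written out as a one-line calculation. This is only a sketch: the function name is made up for illustration, and the numbers in the usage line are placeholders, not figures from the chart. As Daniel points out, deaths in a given year largely come from cases diagnosed in earlier years, so the ratio is a rough indicator rather than a true probability.

```python
def crude_death_ratio(deaths, new_cases):
    """Rough deaths-to-new-cases ratio for one cancer type in one year.

    Hypothetical helper: it simply divides the two bar values,
    which is the crude inference described in the comment above.
    """
    if new_cases <= 0:
        raise ValueError("new_cases must be positive")
    return deaths / new_cases


# Placeholder inputs, NOT real statistics.
ratio = crude_death_ratio(deaths=25, new_cases=100)
print(ratio)
```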
Tony Rose said:
Nice job Chris. I had to laugh a few times as you were getting some push-back from Excel. Sometimes the first cut is better, or more entertaining, than a polished one... Showing how to format the number so MEN and WOMEN are labeled at zero was very helpful. The NYT did a very nice job setting this chart up. Keep up the great work!
Chris Gemignani said:
Thanks, Tony. There was a lot more mumbling under my breath than you heard. I recorded using Parallels to run Excel on my Mac and the keyboard mapping is a little wonky and the delete key doesn't work. Grrr.
It is so much easier to recreate an existing design than to create a great new design from scratch.
Rupesh Tripathi said:
An easier workaround than space guesswork (to centrally align the category labels), Chris, is to add another column where you can put in the number of spaces required, and use the formula =CONCATENATE(REPT(" ",no. of spaces,category label). Some hit and trial will give the result. For a long list, or to start with something scientific before hit and trial, here is the solution:
- Figure out the length of longest label(LEN function).
- For each label, use the formula =(length of longest label- length of current label)/2
- Logically this should put the correct number of spaces in front of each label. However, for some reason (maybe the width of the space character), the alignment seems slightly leftwards, so I add "1" to the formula above.
- In most cases, this would render center alignment. In the skewed cases, one can overwrite the formula with a higher/lower number and achieve desired result.
Wonderful screencast.
Rupesh Tripathi said:
Small correction - Apologies.
Read the formula =CONCATENATE(REPT(" ",no. of spaces,category label) as
=CONCATENATE(REPT(" ",no. of spaces),category label)
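Rupesh's padding rule can be sketched outside Excel to check the space counts before typing them into a helper column. The function name and the sample labels below are illustrative assumptions; the rule itself (half the length difference, plus one) is the one described in the comment. Integer division is used because a fractional count of spaces cannot be repeated.

```python
def pad_labels(labels, extra=1):
    """Prefix each label with spaces so shorter labels shift toward the centre.

    Mirrors the rule above:
    spaces = (length of longest label - length of current label) / 2 + extra
    """
    longest = max(len(label) for label in labels)
    padded = []
    for label in labels:
        # Whole number of spaces only; 'extra' is the manual "+1" nudge.
        n_spaces = (longest - len(label)) // 2 + extra
        padded.append(" " * n_spaces + label)
    return padded


# Illustrative labels only.
for text in pad_labels(["Lung", "Prostate"]):
    print(repr(text))
```

With a proportional font the character count is only a starting point, which is why the comment suggests overriding individual values by hand.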
Chris Gemignani said:
Interesting idea, Rupesh. Centering those darn labels was the most time consuming part of this exercise, and they still don't look all that great. The slight leftward bias in my labels also relates to charting internal margins and other Excel esoterica. Again, Grrrr.
Michael Doan said:
Great screencast! Thanks for sharing it. I've been using Excel for a while now but I've never had much of an opportunity to incorporate graphs. When I do, it's often a frustrating experience.
James McMurry said:
Nicely done. Excel graphing masters have their own special kung-fu.
Side note: Assuming you're using a Mac notebook (sounds like it), try using the "fn" key in conjunction with "delete". This should make Windows delete as you expect.
Chris Gemignani said:
James, I am using a Mac notebook and fn-Delete does indeed work as a Windows delete. Many thanks.
derek said:
Gridlines over the top is easy if you roll your own using a Line or XY (Scatter) series.
There's a limit to the different types of series that can be combined, but quickly glancing at the design, it doesn't immediately seem to me that you've reached it yet.
derek said:
A question for you typography experts: what widely-available free font most closely mimics the lettering in New York Times graphics?
Mike Ward said:
Very nice graph, learnt a couple of neat tricks there.
Presuming that we'd be creating this graph to be printed out, or exported as a graphic, I'd be tempted to use either a text box or do something with the camera object here for the series labels.
It might require some altering of cell heights, but I'm sure something could be achieved.
Nils said:
I've only just started making screencasts professionally, but this is one amazing example. Something to look up and aspire to.
Chris Gemignani said:
Good question, Derek. I'm not a huge typography geek, but the NYT font looks like Helvetica Neue which is a really nice multi-weight version of Helvetica that's available free on OS X, but not in Windows land. It's a font we use a lot for presentations here in Juiceland.
Microsoft released a bunch of good new fonts with Windows Vista. They're available as a free download for older versions of Office if you google for "Microsoft Office Compatibility Pack". If you're interested in typography and use Windows you should get these fonts.
I tried using these fonts on the graph and there is a slight improvement. See above for a version with improved fonts.
doug said:
For the headline face, NYT is using Franklin Gothic (Bitstream, not ITC); for the body text, Helvetica. Their online features (at nytimes.com) seem to be using Helvetica for all text.
Great screencast.
Chris Gemignani said:
Thanks, Doug. Lazyweb, we thank thee!
Pete Skomoroch said:
Nice work Chris. I thought this chart looked sharp when I saw it on reddit the other day. There have been a number of nice graphics and interactive "infographics" coming out of the NYT lately. Any idea what they use to produce these?
Can we look forward to a script/screencast showing how to reproduce this in matplotlib/python?
-Pete
derek said:
Thanks Chris and Doug. I had a Franklin-a-like already, and I've downloaded the Powerpoint 2007 Viewer to get the new Vista fonts, and used Calibri for a Helvetica substitute.
Adam Richardson said:
This is a great tutorial, I learned a lot on this. I've always been turned off by Excel's charts but never spent the time to learn how to make them better.
Wrote it up on my blog too: http://richardsona.squarespace.com
Henk said:
Well done, guys.
I was somewhat intrigued by Daniel Waisberg's wish to have the gridlines OVER the bars. Derek's suggestion to make a combination chart with your own gridlines is one possible solution, but I don't think it is as easy as he said for most of us. I see two possible ways to get these gridlines on top of the bars (although I would like to add that I don't think it's very necessary to do so). I share both ideas here, for discussion purposes.
1. You can use a stacked bar chart with a thin "white" lining around the fill. This requires a bit of juggling in the spreadsheet with conditional numbers (for the length of the bars; if it exceeds the default value between the grid values, it needs to be cut off).
Note: I fear this is not easier than derek's suggestion. It also requires some forward thinking about the grid, the default value taken as a variable.
2. An overlay chart. Essentially you split the chart in (a) the bars and (b) the value and category axis, including axes values and grid lines. Make sure the plot area is transparent, and that the dimensions are EXACTLY equal. Now position (b) on top of (a) (Select chart (b) and use ALT+mouse to snap it into position over (a) accurately).
Note: this method is a bit inflexible in the sense that resizing is laborious. Moreover, in this particular example of a combined chart it may not be so easy to do.
I hope this makes sense.
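The "conditional numbers" in Henk's first option amount to cutting each bar into stacked pieces no longer than the grid interval, so that the white border between pieces falls on a gridline. A minimal sketch of that arithmetic follows; the function name and the sample values are assumptions for illustration, and in Excel the same result would come from per-column formulas rather than a loop.

```python
def split_into_segments(value, step):
    """Cut a bar of length 'value' into stacked pieces of at most 'step'.

    Each full piece ends exactly on a grid value; the last piece holds
    whatever is left over.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    segments = []
    remaining = value
    while remaining > 0:
        piece = min(step, remaining)
        segments.append(piece)
        remaining -= piece
    return segments


# Illustrative bar length and grid interval.
print(split_into_segments(120000, 50000))
```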
derek said:
Well, I meant for someone with the skills to reproduce Chris's graph. At that level of play, the drawing of arbitrary lines over the top using an XY (Scatter) graph range should be peanuts.
Unfortunately I can't get the Quicktime screencast to work, so I don't know in enough detail how the trick was done to advise how to take the next steps. It would be something like this: create a range of cells with the following values:
<pre>X-value Y-value
-150,000 0
-150,000 12
-100,000 0
-100,000 12</pre>
...and so on. Fix the secondary x and y axis scales to the appropriate values (if they're not already being used for something else--if they are, have a think about designing your way around the problem). Format the line so it is the same as the background colour (white). Now the lines will only be visible when they cut across the bars.
Build-your-own axis scales (which is very similar to this) is also a highly useful technique; I would go so far as to say it's the most powerful single piece of Excel hackery there is. Both are described in the Juice article "<a href="http://www.juiceanalytics.com/writing/2006/08/tufte-charts-in-excel/">Tufte Charts in Excel</a>" by Zach.
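The X/Y range derek describes can be generated rather than typed by hand. The sketch below is an assumption-laden helper, not something from the screencast: the function name is invented, and the optional blank row between gridlines is an addition here, on the assumption that the chart is set to show empty cells as gaps so that consecutive gridlines are not joined by a diagonal segment.

```python
def gridline_points(x_positions, y_min, y_max, blank_between=True):
    """Build (x, y) rows for vertical gridlines drawn as an XY series.

    Each gridline is two points: bottom (x, y_min) and top (x, y_max).
    A (None, None) row stands for an empty spreadsheet row.
    """
    rows = []
    for i, x in enumerate(x_positions):
        if blank_between and i > 0:
            rows.append((None, None))
        rows.append((x, y_min))
        rows.append((x, y_max))
    return rows


# Reproduces the four rows listed above when blank rows are turned off.
for x, y in gridline_points([-150000, -100000], 0, 12, blank_between=False):
    print(x, y)
```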
Brian Timoney said:
Make sure you install those Windows updates that they were bothering you with during the screencast....
Keith said:
I bet that is a very good illustration of how to produce a good presentation, particularly from a business point of view.
All marketing managers should learn to produce these types of presentations!
derek said:
<i>Make sure you install those Windows updates that they were bothering you with during the screencast....</i>
They're not bothering me with any updates, either in Opera or Internet Explorer. Both browsers, after taking a long time to download the very large screencast, report:
"QuickTime is missing software required to perform this operation.
Unfortunately, it is not available on the QuickTime server."
I don't see any suggested action arising from that message.
Shawn Mitchell said:
I am not an Excel guru but I did manage to get the grid lines to show through the bars in the graph.
Below is a link to a screen shot of the effect using Excel 2007.
<a href="http://photobucket.com" target="_blank"><img src="http://i159.photobucket.com/albums/t152/robertshawnmitchell/transparent_bar_graph.jpg" border="0" alt="Photo Sharing and Video Hosting at Photobucket"></a>
Shawn Mitchell said:
Here is a close up of the chart with both data series using transparency.
<a href="http://photobucket.com" target="_blank"><img src="http://i159.photobucket.com/albums/t152/robertshawnmitchell/closeup.jpg" border="0" alt="Photo Sharing and Video Hosting at Photobucket"></a>
derek said:
Shawn, you're using a feature that is new in Excel 2007, but you've misunderstood the effect that is being sought. It is not to have a black gridline visible under a translucent bar (though there are ways to achieve that effect pre-2007 also).
It is to have a white gridline visible *over* a solid bar, but invisible on the white background. That is achieved by a custom line series, but not, I believe, by built-in gridlines even in XL2007. They're still underneath the data series.
NH said:
Cool... my recollection of font usage:
AIC Franklin Gothic -heds & subheads, 12pt. (1 col.), 14pts. (2cols.), etc... also used for smaller uppercase (7pts.)
Helvetica & Helvetica-Light: body text (9pts.)
Helvetica-Bold (8-9pts.): bolded upper & lower case text.
Helvetica Light-Oblique: source line (7 or 8pts?)
AIC Imperial: credit (5.8pts?)
jen said:
I can't get my second graph to flip and display in reverse. It does one of two things:
1. the order of the Y axis values sorts in the opposite order;
2. only one of the data series moves to the opposite side. the one on the secondary axis stays put.
Any suggestions?? Thanks!
Javaun said:
Hi Derek. I too use Excel 2003, and so I guess I don't have the bar transparency feature that Shawn proposed to make the gridline appear to float over the bar. Still, Shawn's idea would work to make the gridline float over the bar but appear transparent on the background. He simply needs to change the dotted gridline color to white. The white will show faintly through the transparency (may appear off-white) but will be indistinguishable against the background. I'm guessing that for the NY Times graph, they did a rough mockup in Excel using ugly colors and ugly fonts, and then a designer traced it (to preserve the scale) in Illustrator and beautified it with color and fonts.
sesha said:
Great work. Keep posting to benefit many like me.
Can you also help me in constructing graphs on a McKinsey chart that we use at our office? My problem is that I have to edit the text boxes and graphs every time I need to update the data.
Zach said:
Sesha, we have developed an approach for automatically updating PowerPoint slides (charts, text boxes, tables) from Excel spreadsheets. I'm not sure if that is exactly what you are referring to. We can discuss offline if it is.
Sarah said:
I created a similar graph using Jon Peltier's tornado graph as a starting point. I was able to get white gridlines on top of the bars by creating a dummy series and then adding y-error bars. I had the additional requirement of getting the Male and Female sides into a single chart, so I had to use a dummy series for the y axis anyway. Here is what it looks like: http://flickr.com/photos/saamiam/2176279190/
brandie said:
my father died of lung cancer...hahahha jking
Ashutosh said:
I think the NYT uses Tableau to create its graphs. I have seen many charts in the NYT which no doubt look like Tableau charts.
Jon Peltier said:
Nice charts, and comparisons within sex are easy. The problem with two-sided "tornado" charts like this is that it is very difficult to compare the two sides, male vs female. Obviously men have more prostate cancer, and women more ovarian and breast cancer. But I can't tell the differences in pancreatic, colorectal, or non-Hodgkin's.
An option would be to put more space between the male bars, and insert the corresponding female bars.
What was the basis for sorting? It wasn't by value, nor by alphabetical order.