tufte

Tufte shares wisdom for data presenters

Tufte1

For a famous person, Edward Tufte is adept at avoiding the paparazzi. You probably know this iconic Tufte teaching picture, but it is pretty hard to find another picture of him.

Until now. The clever folks at AdAgeStat were able to get a shot (undoubtedly with a bowtie camera) of Tufte for an interview on their AdAgeStat blog.

Tufte in full color

The interview is worth a read. It covers some of the typical Tufte hobby-horses, like this rant about PowerPoint:

"PowerPoint benefits the bottom 10% of presenters by forcing them to have points, some points ... any points at all. And the best 10% of presenters have such good content, style and self-awareness that PowerPoint does little damage. PowerPoint should be used solely as a projector operating system to show 100% content, without the bullet grunts, logos and the formatting nonsense from the Strategic Communications Department, and the $20 million Pentagram corporate format guidelines."

That stuff aside, there were some great nuggets about data presentation. For example his take on presenters and credibility:

"Presenters need (1) to tell a coherent story and (2) to convince their audience of their credibility. A good way to gain credibility is not to have lied to the same audience last month. Another is to demonstrate that you are not a cherry picker, basing your case on evidence selection rather than on evidence. Another necessity is to demonstrate your mastery of detail."

In my experience, providing your audience with some (limited) flexibility to interact with the data is a great mechanism for building credibility. Have the confidence to allow access to more than cherrypicked data and you won't come across as manipulative.

Tufte pushes back on the notion of being "overwhelmed by data" by saying:

"Overload, clutter, and confusion are not attributes of information, they are failures of design. So if something is cluttered, fix your design, don't throw out information. If something is confusing, don't blame your victim -- the audience -- instead, fix the design."

In the world of business intelligence and reporting software, there isn't a lot of empathy with audiences. The focus is squarely on the user trying to create something, not the reader trying to understand the content.

Finally, he hits on a seldom-discussed gap in data analysis by noting that "good content reasoners and presenters are rare, designers are not."

In conversations with people like Andrew Abela and Nancy Duarte, we've thought a lot about how tools can help people better present data. In the end, it is still a very human art form to synthesize understanding about a problem and construct a logical argument or story around it. Tools can only help facilitate and guide the process. That's what we are trying to do with Slice.

30 Great Visualization Resources in 30 Days

A lot of the applications that Juice creates are designed to make information more accessible to people who wouldn’t consider themselves to be data experts. They realize the value in the data that they have, and in many cases they have some sort of analytics solution in place, but they know they’re not getting as much value from their data as they should. One of the hurdles we frequently come up against is that people who aren’t actively participating in the visualization discussion don’t know what’s possible. All they’ve ever seen, in many cases, are the confusing dashboards, charts, and graphs that are all too prevalent from the vendors in our space. You know the ones: a thick layer of technology slathered with some gloss and wiggle, between two slices of "do it yourself".

In many cases, we find ourselves closing this gap by referring to some of the best examples of work out there. As we were thinking about this, the idea of providing a simple walkthrough of these examples came into being. The result: a 30-day calendar chock-full of the best skill-enhancing examples we could find.

30 Days to Better Visualization

Each day is a bite-sized chunk and takes only a few minutes to watch, read, do, or play. Some of the days consist of Juice content, but most days are from other sources that we’ve found useful.

You can download it to use yourself, or to share with your friends who need to expand their info-viz horizons. Either way, we think it’ll get your creative juices flowing.

Tufte-Style Comparison Chart Generator

Last week, we shared a rendition of a Tufte graphic using just a few lines of Nodebox code. As our commenters pointed out, Python is great, but it may not be every business analyst’s carnal desire to learn a programming language just to generate some nifty graphs. I spent some time turning Chris’s Nodebox rendition into a PIL-based Windows tool that can generate the same sort of comparison graph from an Excel file on the fly.

The result is The Comparison Chart Generator 1.0. The installation instructions are relatively simple. Unzip the zip file, and run comparisionchartgenerator.exe.

Alternatively, we have a new Excel chart that creates the same effect using only Excel functionality. Download the Excel Tufte Line Chart here.

If you are using the Chart Generator, start with some data in an Excel (xls) or Comma Delimited (csv) format. The data for this graph has to be contained within the first sheet starting with cell A1, as in the following picture.

Excel Dialog

Select an input file. There are a couple of example files bundled with the download.

Open File Dialog

After selecting a file, you’ll be prompted to modify a few of the basic options available for the chart.

Options Dialog

Finally, save the result as a jpeg.

Save File Dialog

Here is the same image found in Tufte’s textbook processed using the Comparison Chart Generator. It is generated using the csv example file bundled with the download.

Tufte-esque Chart by Comparison Chart Generator

Those of us who have undergone LASIK eye-improvement surgery may still prefer the sharp, crisp Nodebox results, but for the rest of us, this image looks pretty good. Let us know if this tool is useful. If there is enough of a positive response, we may consider expanding functionality for other fancy Tufte-esque charts.

If you do prefer Nodebox, I have an updated script here. This pushes the script up to 20 lines of code or so, but the extra 9 lines allow the labels to push themselves apart on their own. If you want to look at the source code for the Windows program, you can get it here. I used py2exe to compile it into an executable. The code, however, has not been thoroughly commented or cleaned as of yet, so edit it at your own risk.
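The updated script is not reproduced here, so the following is only a rough sketch of one way labels could push themselves apart, not the code from that script. It assumes label positions are plain pixel values and that a simple one-pass nudge is acceptable; the function name `spread_labels` and the `min_gap` parameter are made up for illustration.

```python
def spread_labels(positions, min_gap):
    """Return adjusted y positions, in the original order, so that
    neighbouring labels end up at least min_gap pixels apart."""
    # Visit the labels from the smallest y position to the largest.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    adjusted = list(positions)
    for prev, cur in zip(order, order[1:]):
        # If a label sits too close to the one before it, nudge it further along.
        if adjusted[cur] - adjusted[prev] < min_gap:
            adjusted[cur] = adjusted[prev] + min_gap
    return adjusted


# Two labels that collide (such as two countries with nearly equal values)
# get separated, while a label with room to spare is left alone.
print(spread_labels([100, 102, 300], 12))
```

A pass like this only ever moves labels in one direction, which is why hand-tuned fudge factors can still look better on a crowded chart.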

Real-World Tufte Graphics in 11 Lines of Code

Check out our followup post that describes how we created a downloadable Windows application or an excel spreadsheet you can use to create these graphics.

One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.

Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.

Raise your hand if you have a graphic design assistant at your beck and call. I thought not.

One of the tools we use for rapid prototyping at Juice is NodeBox.

NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.

All true. But it’s more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world’s easiest programming language. Oops, here’s the right link.

I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p 158.

Tufte Current Receipts Graphic

Here’s the code. It’s 11 lines of code if you exclude entering the data and setting things like fonts and colors.

size(500, 700)
font('Palatino'); fontsize(12)
stroke(0.4)  # a medium grey for lines
fill(0.2)  # a slightly darker grey for text

# data = (label, first, last, label-fudge-factor)
data = [
    ('Sweden', 46.9, 57.4, 0., 0.),
    ('Netherlands', 44.0, 55.8, .3, 0.),
    ('Norway', 43.5, 52.2, 0., 0.),
    ('Britain', 40.7, 39.0, 0., 0.),
    ('France', 39.0, 43.4, 0., 0.6),
    ('Germany', 37.5, 42.9, 0., -0.4),
    ('Belgium', 35.2, 43.2, 0., 0.),
    ('Canada', 35.2, 35.8, .8, 0.4),
    ('Finland', 34.9, 38.2, -0.5, 0.),
    ('Italy', 30.4, 35.7, 0.3, -0.3),
    ('United States', 30.3, 32.5, -0.3, 0.),
    ('Greece', 26.8, 30.6, 0.4, 0.),
    ('Switzerland', 26.5, 33.2, -0.2, 0.1),
    ('Spain', 22.5, 27.1, 0., 0.3),
    ('Japan', 20.7, 26.6, 0., 0.),
]

text("Current Receipts of Government as a Percentage of "
     "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)

def ypos(val):
    # calculate a vertical position by scaling between 10% and 90%
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range
alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)

for label, start, end, startfudge, endfudge in data:
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))

Here’s what the result looks like.

Tufte Current Receipts Graphic with NodeBox
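If you want to sanity-check the vertical scaling without opening NodeBox, the ypos calculation can be tried in plain Python. This is just a sketch: the `make_ypos` wrapper is our own addition for illustration, the height of 700 comes from the size(500, 700) call, and 20.7 and 57.4 are the smallest and largest values in the data above.

```python
HEIGHT = 700  # canvas height, matching size(500, 700) in the script


def make_ypos(minval, maxval, height=HEIGHT):
    """Build a ypos function that maps values onto 10%-90% of the height.
    (Hypothetical wrapper; the NodeBox script uses globals instead.)"""
    def ypos(val):
        # y grows downward, so the smallest value lands near the bottom
        # (90% of the height) and the largest near the top (10%).
        return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))
    return ypos


ypos = make_ypos(20.7, 57.4)
for label, val in [("Japan 1970", 20.7), ("Britain 1979", 39.0), ("Sweden 1979", 57.4)]:
    print("%-14s %6.1f" % (label, ypos(val)))
```

The smallest value should come out at roughly 630 pixels from the top and the largest at roughly 70, with everything else in between.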

We have some great followups to this planned for next week. We’ll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We also have plans for mashing these graphics up with our just-released Google Analytics API.

Check out our followup post that describes how we created a downloadable Windows application you can use to create these graphics.

New Year’s Resolution: Tufte and the iPhone

Edward Tufte has produced an illuminating video tour of the user interface of the iPhone. The video illustrates Tufte’s struggles to come to grips with the difference between dynamic screen resolution and the resolution of printed paper. Tufte is prone to grandiose pronouncements, like this one:

All history of improvements in human communication is written in terms of improvements in resolution: to produce, for viewers of evidence, more bits per unit time, and more bits per unit area. Slideware is contrary to that history. Trading in reductions in resolution for user convenience or for pitching may be useful in mass market products or in commercial art, but not for technical communications. The solution is not to rescue slideware design; the solution is to use a different, better, and content-driven presentation method. On this solution, see our thread PowerPoint Does Rocket Science—and Better Techniques for Technical Reports — Tufte Nov 10 2006

Somehow, I don’t think the importance of the Gutenberg Bible related to it showing “more bits per unit area.” Quick, count the “bits per unit area.”

Gutenberg bible courtesy of Wikipedia
Illustrated bible courtesy of Wikipedia

It didn’t take bits per unit area to revolutionize communication in the past and it won’t in the future either. The iPhone is a tremendously engaging information device and points the way forward for information displays. Here’s what the iPhone does well:

Maximize screen real estate: Controls are only visible when needed, fading away gently when you are concentrating on content. Tufte furiously neologizes, calling this “computer information debris.” Control junk is more apt, more terse, more Tuftian.

Direct manipulation: As Tufte says: information is the interface. Filtering and choosing should take place in the context of direct manipulation. A good essay on the possibilities of direct manipulation can be found here.

Fun: Above all, information can be fun and engaging to navigate. Tufte condemns Apple’s stock ticker for having “cartoony” and PowerPoint-like displays and offers an improved version (with 5 digits of precision). Apple’s cheery display offers a more entertaining, usable interface for day-to-day usage.

With our empathy for the day-to-day troubles of the business person seeking insight in data, it’s frustrating listening to Tufte. He is clearly an academic, with academic interests and academic timeframes. As much as his work is respected and inspirational within business circles, he makes little effort to enable his message to be implemented.

Good Tufte: Clutter and overload are not an attribute of information, they are failures of design. If the information is in chaos, don’t start throwing out information, instead fix the design.

Bad Tufte: “…the conclusion of sparkline analysis in Beautiful Evidence, where the idea is to make our data graphics at least operate at the resolution of good typography (say 2400 dpi).” http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0002NC&topicid=1 *Ed: At least 2400 dpi? Orly?

Mostly right Tufte: “Thus the iPhone got it mostly right.”

Mostly wrong Tufte: “Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.” http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0000Jr&topicid=1&topic=Ask%20E%2eT%2e

It is heartening to see Tufte engage and connect his mental frameworks to our modern, screen-oriented, graphics-accelerated, not graphics-designed world. But the future of information design and interaction belongs to the iPhone, not the printed page.

Excel 2007 and the Lie Factor

“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”

Edward Tufte calls violation of this principle the “Lie Factor”. The implementation of in-cell data bars in Microsoft Excel 2007 is a big offender.

Almost a year ago, I was surprised to discover that the Microsoft Excel 2007 development team didn’t understand what zero means. Their implementation of in-cell data bars showed a bar in a cell, even if the cell had a zero or very low value.

Data bars in the Excel 2007 prototype

That was in the Excel 2007 Beta. Things haven’t improved in the current version of Excel 2007. The default setting for data bars in Excel 2007 is to scale the bars so that the smallest bar is based on the smallest value in the selected range and the largest bar is based on the largest value. It still appears that the smallest bar will be no smaller than five or ten percent of the width of the cell. Here’s a sample:

Sample data bars in Excel 2007

So, if you select a range that has values between 600 and 700, the 600 would have a little bitty bar and the 700 would have a full-width bar. Based on the bars, it would look like the 700 is ten to twenty times larger than 600. Outside of Redmond, this is generally regarded as untrue.
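To put a number on that, Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data. Here is a minimal sketch of that arithmetic. The function names are ours, and the bar widths (10% of the cell for the smallest bar, 100% for the largest) are an assumption based on the behavior described above, not measured values.

```python
def effect_size(first, second):
    """Relative change from first to second, as a fraction of first."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


# Data goes from 600 to 700; the bars (assumed) go from 10% to 100% of the cell.
print("Lie Factor: %.0f" % lie_factor(600, 700, 10, 100))
```

A Lie Factor of 1 means the graphic is honest. Under these assumed widths, the default data bars come out at about 54.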

What’s more, if you create two sets of data bars side by side, each group of data bars scales itself independently even though they look the same. Take a look at this screenshot:

Sample data bars from two different conditional formats in Excel 2007

Notice the top seven cells have data bars that have one set of scaling and the bottom data bars have a different scaling. However, they look identical, and users should generally expect these bars to have the same scale.

Here are the rules:

  1. Defaults matter! It doesn’t matter that you can do data bars correctly in Excel. The default should be to do it right and it should be hard to do it wrong.
  2. The “right way” to make data bars is to make the length of the data bar directly proportional to the value in the cell. If one cell has a value twice another it should have a bar that is twice as long.
  3. Remove the default gradient shading. The gradient makes it hard to tell where the bar ends, obscuring what you’re trying to show.
  4. Contiguous cells with data bars should all use the same scale. Use different colors to indicate ranges that have different scales.

Excel 2007 supports at least twenty-five different combinations of ways of specifying the length of the data bar.

Five different ways of setting data bars

Exactly one of those ways is correct. Base the shortest bar on the number 0. Base the longest bar on the highest value. Turn off the gradient. If you want to see bars based off percentile or some custom formula, then be explicit. Create a new column, create your formula, create bars on that column.
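The zero-based rule is easy to express in code. The sketch below draws text bars whose length is directly proportional to the value, with 0 mapping to no bar and the largest value mapping to the full width. The function name `data_bars`, the default width, and the bar character are illustrative choices, not anything Excel provides.

```python
def data_bars(values, width=20, mark="|"):
    """Return one text bar per value, scaled from 0 up to the largest value."""
    top = max(values)
    if top <= 0:
        # Nothing positive to draw, so every bar is empty.
        return ["" for _ in values]
    # Length is proportional to the value; negative values get no bar.
    return [mark * int(round(width * max(v, 0) / top)) for v in values]


for value, bar in zip([0, 600, 700], data_bars([0, 600, 700], width=35)):
    print("%4d %s" % (value, bar))
```

With this scaling, 600 gets a bar six-sevenths as long as the bar for 700, and 0 gets no bar at all.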

Please, guys, this isn’t rocket science. This is plain common sense. You would not ship Microsoft Word with a glaring bug in the way text renders. You would not ship Excel with a broken statistical function that people use every day. Delivering deceitful-by-design infographics betrays your central role in democratizing the analysis of data. Until you fix this, in-cell ASCII art remains the best way to explore data visually.

A disclosure: We do not currently use Excel 2007 at Juice Analytics. This is not due to a high-minded sense of moral outrage but is merely a reflection of our clients’ environments.

Analytics Roundup: Fixing Excel and dashboards

Ask E.T.: Cleaning up Excel’s poshlust graphics
Discussion of the Chart Cleaner along with other approaches to make Excel look good.

The Universe of Discourse : Excessive precision
Humorous take on one of the ways excessive precision can creep into reports.

Data Visualization: Intelligent Dashboard Design
The third in a series of columns that feature the winners of DM Review’s 2005 data visualization competition.

Godin, Tufte and Types of Infographics

A few days ago Zach wrote about Seth Godin’s take on Edward Tufte. You know you’re really onto something when your first three comments include: "Seth Godin is out of his gourd and totally wrong." and "Hallelujah, Seth!!!!!" (note the five exclamation points).

Let’s start with some facts:

  • Godin is a provocateur. "I think this is one of the worst graphs ever made," he says about the Napoleon graph. That’s hardly a well-reasoned statement—but it makes a point. Personally, I think Godin’s in way over his head when talking about what graphs are for.
  • Tufte is a provocateur. "This is, for example, the conclusion of sparkline analysis in Beautiful Evidence, where the idea is to make our data graphics at least operate at the resolution of good typography (say 2400 dpi).", he writes. This provocation is more subtle than his well documented aversion to PowerPoint. He’s saying that a computer screen is not an effective tool for data graphics.
  • We are provocateurs, too. Pitting these luminaries against each other with only a brief amount of context is a recipe for delightful blog swirl and discussion.

In a battle between provocateurs it’s best to at least keep your sense of humor about you. You should also be careful and clear when defining your terms.

There are at least two categories for infographics: exploratory and explanatory. A great example of exploratory infographics is what Hans Rosling is doing with Gapminder. This shows us that we can use infographics to go on a personal journey of discovery to understand data. I choose what questions to explore and how to represent the data. Exploratory graphics can be quite complex because I maintain a thread of context in my mind as I explore the data. Animation is very useful here to help maintain context while changing dimensions.

Explanatory graphics are at best the distilled product of exploration and at worst, as Tufte often points out, a tool of deception. Explanatory graphics are often used to establish facts to guide a discussion. "We’re selling more widgets than wodgets! The widget sales trend is up!". I’m sure this is what Godin is talking about: "I think you’re trying to make a point in two seconds for people who are too lazy to read the 40 words underneath."

Tufte has done a great job at increasing awareness of good information display. On the other hand, he promotes graphs that are strongly tied to a specific context, a single facet of the data that they are illustrating. For instance, the Minard graph is a story about the survivors of Napoleon’s march. It does not directly illuminate the battles fought, or how men died, or the story of the armies that faced Napoleon, or the demographics of his army, or the strategic choices Napoleon faced. While this famous graph illustrates many dimensions, it obscures many others, and we need to be aware of this editorial judgment.

Tufte frustrates on a number of levels. He is enormously influential in business. Businesses send people to his seminars and they come back energized with the essential truthfulness of his message. Yet weeks later those principles are abandoned because his message is so impractical. No one in business is going to design a graph in Adobe Illustrator as he can. They use Excel. Seldom can we spend days or weeks refining and testing a graph. The work must be done and then we move on.

Notes:

The Google Video links here jump directly to a point in the presentation. You can create direct links like this:

http://video.google.com/videoplay?docid=-4101280286098310645#17m26s

The #17m26s appended to the end of the URL jumps you to 17 minutes and 26 seconds into the video.
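If you are building a lot of these links, a tiny helper saves some typing. This is a sketch that only assumes the #XmYs pattern shown above; the function name `link_at_time` is ours, and we have not checked whether single-digit seconds need zero padding, so it leaves them unpadded.

```python
def link_at_time(url, minutes, seconds):
    """Append a #XmYs fragment so the link jumps to a point in the video."""
    if minutes < 0 or not 0 <= seconds < 60:
        raise ValueError("expected minutes >= 0 and 0 <= seconds < 60")
    return "%s#%dm%ds" % (url, minutes, seconds)


print(link_at_time(
    "http://video.google.com/videoplay?docid=-4101280286098310645", 17, 26))
```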

Thanks for all the comments on the previous post, but I wanted to single out Jorge Camoes for a particularly level-headed comment. Thanks.

Some more discussion can be found at Emergent Chaos, where Thomas Ptacek has posted an insightful comment.