excel

Delivering Data in Excel: The DTP Framework

Here at Juice we build fewer Excel dashboards than we used to. Excel itself is a decidedly imperfect vessel for any serious development--it’s simply too easy to veer off of the disciplined track onto the underbrush.

Even so, Excel remains a playground where we can do surprising things. For instance, check out our Excel lightbox and an Excel tagcloud. We could appropriate everything that you find on the webbiest of Web 2.0 websites and build our Uruk-hai equivalents.

The key to staying on the rails when building Excel tools--either dynamic dashboards or simply to explore data--is discipline. At Juice, we use a methodology that we call "DTP" (Data Tansform Present). The foundation of DTP is the rigorous separation of data from presentation. This is similar to a well-known approach when building computer user interfaces called Model-View-Controller. I’m going to cover some of the key principles and we’ll follow up with an example later on the blog.

Data

Data is the raw material of any visualization or report. It needs to be easy to add data or change data without having to change anything else about your dashboard.

We store raw data with dimensions preceding metrics in blocks in separate worksheets. If you want to sound pretentious, you can call this "first PivotTable normal form". Key points:

  • Have one worksheet for each data source.
  • Call these sheets "Data", or "{Title} Data".
  • Place them at the end of your workbook.
  • Data is snug to the top left of the spreadsheet. This allows us to use dynamic ranges. Dynamic ranges let you add data and have it automatically incorporated in all PivotTables.
  • Ensure that column names are in the first row.
  • Dimensions before metrics
  • Place your dimensions before metrics.

Transform

We use PivotTables to transform the data into the structure we need.

  • Call these sheets "Transform" or "XXXXXXX Transform".
  • Create one sheet for each issue that you are exploring. This doesn’t mean that you will only create one PivotTable. You may have multiple PivotTables to support different views or perspectives on an issue.
  • Show all items
  • Turn on "show items with no data" for row and column dimensions.
  • We are seeking predictability, we want to the PivotTable to always be the same size regardless of what the PageField filters are.
  • All dimensions must have a home
  • Place all the dimensions that aren’t used as rows or columns in the PivotTable as page fields. Every dimension should have a home. PivotTable settings
    • Set all PivotTables to not store data and refresh on open.

Present

The Presentation page copies data from the Transform page(s) and formats it for display. It also allows users to control what data is being displayed.

  • Build a user interface to interact with your data. There are many ways to let people interact with your data, but one of the easiest is to use a PivotTable as your interface. This is described below.
  • We use an in-house style guide for graphs that you can see in our Chart Chooser.
  • If the Presentation page is likely to be printed, preset the print range.
  • When copying data from the transformation page to the presentation page, blank values will come out as zeros. We use a simple formula, =if(’Transform!A2’<>"",’Transform!A2’, ""), to ensure that blanks remain blanks.

Using a PivotTable as your interface

A simple way to let people manipulate your data is place a PivotTable containing only PageFields but no data on the presentation sheet. A Visual Basic macro triggered to run whenever the PivotTable changes then pushes out any changes to the master PivotTable to all the PivotTables on your Transform sheet.

Here is the code to make this happen.

This drives our PivotTables in concert and ensures they stay in sync.

That’s a basic overview of our DTP technique. You can try a simplified version of DTP here.

DTP Example.xls

We’ll be back soon to talk through this example.

Bubble, bubble toil and trouble

Recently we wanted to show how Concentrate, our new long-tail keyword tool, could give you a view of search patterns across travel websites.As political junkies, we were inspired by this chart from our friends at the NY Times.

NY Times candidate word bubble chart

The first tool we tried, simply on principle, was Excel 2003. As expected, making a NY Times quality bubble chart in Excel 2003 is a hard problem. Here’s a draft of how far I got before giving in to label fatigue.

Excel NY Times bubble

The bubbles themselves aren’t tough, but getting the labels right is hard. I’d love to see a solution, so if any reader wants to tackle it eternal fame can be yours. Here is a CSV if you want to try.

travelpatterns.csv

Another of the tools we use at Juice is NodeBox, which we used to make this:

Concentrate pattern comparison

Here’s the code that made the graph.

The power of a programmatic approach like this is that by changing a line or two, you can get the following. Click for a larger version. Click the text for the code..

Display the largest item in each row as a red square

With great power comes a great need to exercise restraint. Otherwise you end up like these poor chaps. Must... flex... restraint... muscles...

Decorating data

An early Christmas present has arrived from the DabbleDB team for the 100 million or so of us that have to work with data on a day to day basis.

They’ve created a do-what-I-mean web tool that lets you show how you want data to be restructured and bang! it’s done. Check out the video.

Cleanup data in action

It’s a great idea and a elegant, easy to use interface. There are so many directions I’d love to see them take this tool.

Cleanupdata is a great name, but they’re really giving you better ways to restructure data. This tool won’t help you find and fix errors and anomalies in data. At least not yet.

I also hope they extend cleanupdata to let people automate these data restructuring operations. If only you could apply a cleanup created in cleanupdata.com to 1,000 Excel spreadsheets or to a database table.

If you like this, it’s worth checking out DabbleDB. They have rethought the database with a database/spreadsheet/web forms/visualizer platypus of a tool. It lets your data be pliable in ways that databases don’t allow, while retaining structure that spreadsheets don’t recognize.

Added: Avi Bryant, one of the authors of the cleanupdata.com service notes that the example in the screencast is motivated by this post on cleaning data in Excel. Compare and contrast. I know most people would prefer to avoid ="("&MID(H2,1,3)&") "&MID(H2,4,3)&"-"&MID(H2,7,4) in order to format a phone number.

Analytics Roundup: TIps for showing, sharing, communicating

Developer’s Guide - Google Chart API - Google Code
Beautiful stuff, particularly the Venn diagram.

Align Journal - BI Worst Practices
We often see articles on BI "Best Practices" here is an article telling us what NOT to do.

flot - Google Code
Attractive Javascript plotting for jQuery.

ongoing · On Communication
Interesting blog post about how different forms of communication rank for immediacy, lifespan, and audience reached.

The Excel Magician: 70+ Excel Tips and Shortcuts to help you make Excel Magic : Codswallop

SlideShare
Source for presentation ideas.

Introducing Chart Chooser

Find and Download Great-Looking Excel and PowerPoint Charts

Chart Chooser is an online tool that answers two questions we commonly get:

  1. What type of chart should I use to show my data?
  2. How can I make good looking Excel or PowerPoint charts?
Chart Chooser

Chart Chooser is easy:

  1. Check the boxes on the left that best describe your objective
  2. Select the chart that you want to use
  3. Choose from Excel or PowerPoint downloads to get a formatted chart template

A few notes about Chart Chooser:

  • Thanks to Andrew Abela of Extreme Presentations for inspiring Chart Chooser with his “Choosing a Good Chart” post and for working with us to put this tool together.
  • We’ve tried to make the charts both Tufte-compliant (i.e. minimal chart-junk) and visually attractive (thanks to Google for the color scheme).
  • Feel free to suggest other types of charts that you’d like to see in the Chart Chooser. Send an example to chartchooser@juiceanalytics.com.
  • If you’d like a customized version of Chart Chooser for your organization, write us at chartchooser@juiceanalytics.com or call me at 202.251.7750.

A Boon or a Pest? Google Apps Haiku Contest

We were recently asked to answer some questions about our usage of Google Apps. We’re writing up some business-prosey answers, but poetry is another way of capturing the experience. Here are the questions and our answers in loose haiku.

1) Where are you located?

In Herndon, VA
Beside the flowing traffic
Grove Street, 555

2) What does your business do?

What are these numbers?
Sea of corporate data
Juice is your life raft

3) How many people do you employ?

The cat leaps, clawing
The coiled bird escapes
Seven feathers fall

4) Who are your main competitors?

Thundering feet pound
Yet by the rippling puddle
The mammal sips uncaring

5) Why did you decide to use Google Apps, and why did you choose Google over other commercial or opensource alternatives?

Spring air warms the tree
Talking Heads song can’t fight
O2 for free

6) What products did it replace and why?

Old friends whither
In spring, new shoots grow
Excellence in change

7) Are you using the Standard (free) version or Premier (paid) version and why?

The raven’s keen eye
Gathers all he needs
He has no wallet

8) Which applications do you use (Gmail, Talk, Calendar, Docs and Spreadsheets, Page Creator….All?) . Which ones give you the most benefit?

Star, thread, search
Dinner for seven at seven
Featureful sunrise

9) How many people are using Google Apps and how?

Does the happy frog count
Beside the spring bullrushes
How many croaks he hears?

10) What benefits have you derived from using the Google Apps? (quantifiable benefits if available)

Deep frozen roots
Towering tree, branches drooping 
A nut in the snow

11) What features of the product do you appreciate most and why?

Was it you or me?
Making rash changes
Revision history

12) What’s been the overall impact of using the Google Apps?

Hive mind emerges
Cicadia-like, a boon?
or a pest?

13) Any advice you’d give others in implementing and using Google Apps?

Internet down?
Keep a chair warm
At local Starbucks

14) Are you using any other Google applications such as Maps? AdWords? AdSense? Please elaborate.

Reroute my route? Cool!
Every trip now includes
A stop at IKEA

15) What improvements would you like to see in Google Apps that would benefit your business?

A shopping list
is useful, but PivotTables
sparkle in sunshine

Care to share your experience with Google Apps? We’ll highlight the best haiku in a later post.

Analytics Roundup: Chicken presentation and so much more

Programming Collective Intelligence
Pulling information from community contributed data.

Videos that can change your organization
Top ten business videos on YouTube.

The Encyclopedia of Business Cliches

UC Berkeley CS160 User Interfaces Fall 06
Course readings and student notes.

Language Log: Chicken: the PowerPoint Presentation
The presentation you dare not give.

Prometheus Meets the Enterprise Management System
I laughed, I cried, I laughed again.

Diagrams: Tools and Tutorials

Data Visualization: Modern Approaches
A grab bag of ideas.

fontblog : Introducing Ambiguity
A typographic symbol to indicate ambiguity, compare to the typographic mark lol which indicates stupidity.

Whimsley: The Netflix Prize: 300 Days Later

Process Trends Website
Good excel charting and visualization tips.

BusinessWeek: Who Participates And What People Are Doing Online
A simple and fairly effective use of square pies.

Recreating the NY Times Cancer Graph

This New York Times cancer graph is a beautiful piece of work.

NY Times cancer graphic

I wanted to see if we could reproduce it with everyday tools.

Excel reproduction of the NY Times cancer graphic

Click here to watch a screencast showing how it was done. Warning the screencast is a little long—14 minutes—and a little unpolished. One cut, no retakes, banzai analytics!

Derek raised an interesting question about how to find the fonts used by the New York Times. While I don’t think you can find a high quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista and these are also available to Microsoft Office users through a compatibility pack. Here’s a link or google for "microsoft office compatibility pack". I recommend using these fonts.

Here’s a version of the graph with these new fonts and more emphasis on getting the typography right.

Excel reproduction of the NY Times cancer graphic with better fonts

Excel 2007 and the Lie Factor

“The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented.”

Edward Tufte calls violation of this principle the “Lie Factor”. The implementation of in-cell data bars in Microsoft Excel 2007 is a big offender.

Almost a year ago, I was surprised to discover that the Microsoft Excel 2007 development team didn’t understand what zero means. Their implementation of in-cell data bars showed a bar in a cell, even if the cell had a zero or very low value.

Data bars in the Excel 2007 prototype

That was in the Excel 2007 Beta. Things haven’t improved in the current version of Excel 2007. The default setting for data bars in Excel 2007 is to scale to bars so that the smallest bar is based on the smallest value in the selected range and the largest bar is based on the largest value. It still appears that the smallest bar will be no smaller than five or ten percent of the width of the cell. Here’s a sample:

Sample data bars in Excel 2007

So, if you select a range that has values between 600 and 700, the 600 would have a little bitty bar and the 700 would have a full-width bar. Based on the bars, it would look like the 700 is ten to twenty times larger than 600. Outside of Redmond, this is generally regarded as untrue.

What’s more, if you create two sets of data bars side by side, each group of data bars scales itself independently even though they look the same. Take a look at this screenshot:

Sample data bars from two different conditional formats in Excel 2007

Notice the top seven cells have data bars that have one set of scaling and the bottom data bars have a different scaling. However, they look identical, and users should generally expect these bars to have the same scale.

Here are the rules:

  1. Defaults matter! It doesn’t matter that you can do data bars correctly in Excel. The default should be to do it right and it should be hard to do it wrong.
  2. The “right way” to make data bars is to make the length of the data bar directly proportional to the value in the cell. If one cell has a value twice another it should have a bar that is twice as long.
  3. Remove the default gradient shading. The gradient makes it hard to tell where the bar ends, obscuring what you’re trying to show.
  4. Continuous cells with data bars should all use the same scale. Use different colors to indicate ranges that have different scales.

Excel 2007 supports at least twenty-five different combinations of ways of specifying the length of the data bar.

Five different ways of setting data bars

Exactly one of those ways is correct. Base the shortest bar on the number 0. Base the longest bar on the highest value. Turn off the gradient. If you want to see bars based off percentile or some custom formula, then be explicit. Create a new column, create your formula, create bars on that column.

Please, guys, this isn’t rocket science. This is plain common sense. You would not ship Microsoft Word with a glaring bug in the way text renders. You would not ship Excel with a broken statistical function that people use everyday. Delivering deceitful-by-design infographics betrays your central role in democratizing the analysis of data. Until you fix this, in-cell ASCII art still remains the best way to explore data visually.

A disclosure: We do not currently use Excel 2007 at Juice Analytics. This is not due to a high-minded sense of moral outrage but is merely a reflection of our clients’ environments.

Analytics Roundup: Databases and spreadsheets

Sharing what matters | Economist.com
This requires a new level of flexibility in the database. When building most databases today, programmers decide in advance what sort of questions users might wish to ask of the data, by defining what are known as the “schema"

Allowing Business Users To Program Your System Is A Recipe For Disaster
Parts of the solution make sense, but the problem is scare-mongering.

Spreadsheet errors, news stories about spreadsheets with costly mistakes
Ok, now who’s going to measure the benefits?