
When Nathan Yau of FlowingData poses a visualization challenge, we listen. When that data prominently features Charlie Sheen who is busily stirring television’s pot with a very big spoon, we can’t resist. And also because we share Charlie’s philosophy: “The only thing I’m addicted to right now is winning.”

Nathan challenged people to build a visualization showing TV’s top earners from a CSV data file with 72 actors. Here is our entry.


An introductory video

How did we approach the problem of making TV Actor data tell a story?

Nathan’s dataset was neither very rich (just actor name, type of show, and salary) nor very large (72 actors were included). Fortunately, the two weaknesses combined to be a strength: it was easy to hit IMDB and fill in essential missing details. We added show name, network, average IMDB rating, and gender. Here is the updated data set.

1) Tell a clear story

What information is most interesting? You can never go wrong focusing on the war between the sexes. We wanted to let people see the data in many different ways while always being able to easily tell the difference between men and women. We also wanted to show both averages and detail to let you see which actors are driving those averages.

2) Depth and details

When designing the Explorer we were inspired by David McCandless and Andy Perkins’ “lava lamp” visual exploration of scientific support for medical supplements.

Snake Oil Supplements Visualization

Our visualization is a simplified version of a typical bubble chart that eliminates positioning along the x-axis as an indicator of value. Interestingly, this simplification opens up possibilities to use that axis in new ways. We could now split the chart horizontally to show dimensions such as show type, gender, and network.

The bubbles were placed on top of a horizontally scaled bar chart that showed the average income for the group. The width of the bar depends on the total income of the group. Here are the two visualizations seen separately.

TVs top earners bars TVs top earners bubbles

Salaries were scaled vertically from $0 to $700k, with Charlie Sheen’s bubble protruding assertively out of the top of the frame.
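For readers who like to see the arithmetic, here is a minimal sketch of the two calculations described above: the per-group average and total that drive the bars, and the vertical scale for the bubbles. It is not the code behind our entry. The sample rows and the names `group_stats` and `bubble_y` are made up for illustration, and we assume a simple linear scale.

```python
from collections import defaultdict

# Hypothetical sample rows: (actor, group, salary per episode in dollars).
# These figures are placeholders, not values from Nathan's dataset.
rows = [
    ("Actor A", "Comedy", 400_000),
    ("Actor B", "Comedy", 100_000),
    ("Actor C", "Drama", 250_000),
]

def group_stats(rows):
    """Return {group: (average, total)} of salary for each group."""
    salaries = defaultdict(list)
    for _actor, group, salary in rows:
        salaries[group].append(salary)
    return {g: (sum(v) / len(v), sum(v)) for g, v in salaries.items()}

def bubble_y(salary, height, top=700_000):
    """Map a salary onto a frame where $0 sits on the bottom edge
    (y = height) and `top` sits on the top edge (y = 0).
    Salaries above `top` give a negative y, so they poke out of the frame."""
    return height * (1 - salary / top)

stats = group_stats(rows)
```

With a $700k ceiling, a reported $1.25 million salary lands at a negative y, which is exactly the bubble that sticks out of the top.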

3) Pow! One more thing

Steve Jobs is famous for ending presentations by showing “just one more thing”, then socking his audience with some unbelievably cool new feature. We wanted this visualization to have that same sense of surprise and discovery. If you hold down the mouse button on any group, the bubbles part to reveal the Male/Female breakout of average salaries within that group.

TVs top earners one more thing

Finding insights

With richer data and a flexible visualization, we found some interesting stories in the data. Why does Charlie Sheen get to demand a raise from his reported salary of $1.25 million per episode while that amiable fellow Matthew Morrison from hit show Glee makes a piddly $30,000 per episode? When we break the chart out by “seasons on air,” it is clear that as a show ages, the actor’s ability to capture more value increases. Presumably the risk of the show failing declines and the actor’s bargaining power improves. Charlie Sheen has spent the past few weeks testing this hypothesis.

TVs top earners seasons on the air breakout

If this sort of thing interests you, perhaps we can help you with your dashboard and data display problems.




Juice’s own Ken Hilburn brings it home at the Strata conference. He sits down with Mac Slocum of O’Reilly for a few data softballs. Here’s a second-by-second account.

[0:03] Q: What are the most common visualization mistakes that people are making?

[0:03] A: The bottom line is usability. Stay focused on your purpose: making decisions, taking action. Stay simple; if you have to explain, you’re failing.

[0:15] Brett Favre endorses Wranglers, but Ken wears Data Wanglers [not shown].

[0:59] Dropping names, and twisting the knife on usability.

[1:25]? What better time to confront a little gap in your knowledge than when you’re being filmed? It’s Antoine de Saint Exupéry.

[1:48] Q: Do we need different tools to create simple visualizations?

[2:05] Plentiful shout-outs to friends in the industry. Even Business Objects gets a friendly mention. Is Ken getting soft?

[2:53] A difficult point to cover in a short time. We speak of data journalism and telling stories with data, but there are really no great tools that allow this in an everyday business environment. There is a lot of attention, not much progress.

[3:15] Q: What makes a great dashboard?

[3:25] A: Zach and Ken just delivered a 3 hour tutorial on the subject earlier in the week. Can Ken cover it in 30 seconds? Attention, context, and data drilling are keys but there’s precious little time to do more than mention big concepts.

[4:20] Q: Are dashboards too complicated?

[4:22] A major softball to close the session. Cheshire cat grin from Ken.

[4:54] “Getting your brain around it.” What Ken isn’t saying is that we know today’s businessfolk aren’t just looking at one dashboard, they have to look at many. They have to integrate info from lots of disparate systems into a picture of how their business is doing. If your info is harder, more complex, presents a bigger cognitive load, or is slower to load, then it’s going to be less valuable.

If you haven’t had the pleasure of getting to know Ken, stay tuned. We’ll be doing more Viva Visualization events in ’11 with more detailed prescriptions for great dashboards and there’s no better way to spend your morning than eating a nice warm bagel and listening to Ken.




At Juice, we work with web analytics APIs large and small, from Google, comScore and Omniture. The Google Analytics API is our favorite. It powers the world’s best, most widely deployed analytics site. And it powers Juice products like Concentrate (innovative search analytics) and Vasco de Gapi (a tool for exploring the Google Analytics API).

We were approached by the Google Analytics API team to find ways to explore new ways of looking at data with the API, and we were excited by the possibilities. We’ve been working on our own visualization framework, JuiceKit™, that integrates the power of the Flare Visualization Library with Adobe Flex.

The result is Analytics Visualizations, two visualizations powered by the Google Analytics API that are free to use. You just need a Google account with access to Google Analytics data to explore your own data.

Analytics Visualizations Home Page

Referrer Flow

Curious about what sites are linking to you and what content is benefiting the most? Referrer Flow answers those questions and shows how results change over time. Here is a brief video introduction:

Referrer Flow is a stream of daily treemaps showing pageviews and bounce rates for various groupings of your website’s pages. You can group by combinations of page title, referrer and url. Clicking on the treemap will filter all the data by the page, referrer or url that you clicked on. Click again to clear your filter.

Keyword Tree

A list of top keywords isn’t enough to really understand how people are searching and finding your site. Keyword Tree visually displays the most frequently used search keywords and how they are used together. Here’s a video overview:

You’ll see a frequently used search term at the center and the words and phrases that are most often used in combination with that word. Pick a different starting word by typing into the box in the upper right or selecting from the top words across the bottom of the screen. The words are sized by their frequency of use and colored by bounce rate (or % new visitors or average time on site). Roll over a word to see details about that combination of connected words.
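To make the idea concrete, here is a rough sketch of the counting that sits underneath a display like this: given a starting word, tally the words that appear alongside it, weighted by visits. This is an illustration, not the Keyword Tree code. The phrases and the `companion_words` helper are made up, and unlike the real tool this sketch ignores word order.

```python
from collections import Counter

# Hypothetical search phrases with visit counts; not real analytics data.
searches = {
    "excel dashboard": 40,
    "excel dashboard design": 15,
    "dashboard design tips": 10,
    "excel charts": 5,
}

def companion_words(searches, root):
    """Count how often each other word appears in phrases containing `root`,
    weighted by the number of visits for that phrase."""
    counts = Counter()
    for phrase, visits in searches.items():
        words = phrase.split()
        if root in words:
            for word in words:
                if word != root:
                    counts[word] += visits
    return counts

tree = companion_words(searches, "dashboard")
```

The counts would drive the size of each word; a second metric such as bounce rate could be carried along the same way to drive color.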

Depth and Discovery

In designing these visualizations we focused on the question: how can we let users uncover the unexpected? That means designing targeted visualizations focused on limited, well-defined issues. The Referrer Flow monomaniacally focuses on a single question: “What pages are people viewing on your site and where are they coming from?” The Keyword Tree is laser-focused on word ordering and what that means for keyword performance.

The Google Analytics reporting tool is a great general-purpose reporting solution. It gives advanced users everything they need to answer specific questions. However, its generality means it has limited ability to focus on two issues: depth and discovery.

The Google Analytics API is Google’s solution to this problem. It’s an opportunity both for businesses like ours that can create new ways of analyzing data, and for large sites that can use the API for integration, custom analytics, and more.

Thanks to Nick Mihailovski at Google for his gracious support, help and encouragement and Avinash Kaushik for inspiring this idea.




The following is an excerpt from our three-part series: “A Guide to Creating Dashboards People Love to Use”. It is chock full of best practices and practical tips for designing dashboards. This particular nugget is something we’ve used to great effect and wanted to make sure our readers didn’t miss out simply because they were afraid of ending up on our mailing list. There is even a movie version.

We’d like to offer a simple framework for effective use of fonts in your dashboard. With a few simple decisions, you can ensure that the text on the dashboard will both look good and communicate effectively. The majority of text on the page falls into four categories:

  • Body text is clean, readable content
  • Headers separate and name major sections of your work
  • Notes describe additional things the reader should be aware of. These should fade into the background unless we call attention to them.
  • Emphasis text is what we want our reader to pay particular attention to.

The following table describes an approach for deciding how to display each of these text types. The yellow highlights indicate where you need to make decisions.

Simple Font Framework

It comes down to three basic decisions:

  • Choose size and font of the body text
  • Decide if the header is going to flip to serif or sans-serif—and whether it is going to have any style
  • Decide what to do about emphasis—color or (bold or italic)

A few things don’t fit neatly into one of the four text categories listed above, such as table headers and graph titles. We tend to use a combination of styles to handle these exceptions.

Stick to this framework and we guarantee your dashboard will look better. Take a look at this example, starting with a standard-looking Excel report without much thought put toward the fonts:
Simple Font Framework Example 1


The following version of the same report cleans up the table, chart, and fonts:

Simple Font Framework Example 2


A final version uses Georgia for the title font and brings in a new emphasis color. The result: a totally different but equally clean and readable report.

Simple Font Framework Example 3




Here at Juice we build fewer Excel dashboards than we used to. Excel itself is a decidedly imperfect vessel for any serious development: it’s simply too easy to veer off the disciplined track into the underbrush.

Even so, Excel remains a playground where we can do surprising things. For instance, check out our Excel lightbox and an Excel tagcloud. We could appropriate everything that you find on the webbiest of Web 2.0 websites and build our Uruk-hai equivalents.

The key to staying on the rails when building Excel tools, whether dynamic dashboards or simple data explorations, is discipline. At Juice, we use a methodology that we call “DTP” (Data Transform Present). The foundation of DTP is the rigorous separation of data from presentation. This is similar to a well-known approach when building computer user interfaces called Model-View-Controller.
I’m going to cover some of the key principles and we’ll follow up with an example later on the blog.

Data

Data is the raw material of any visualization or report. It needs to be easy to add data or change data without having to change anything else about your dashboard.

We store raw data with dimensions preceding metrics in blocks in separate worksheets. If you want to sound pretentious, you can call this “first PivotTable normal form”. Key points:

  • Have one worksheet for each data source.
  • Call these sheets “Data”, or “{Title} Data”.
  • Place them at the end of your workbook.
  • Data is snug to the top left of the spreadsheet. This allows us to use dynamic ranges. Dynamic ranges let you add data and have it automatically incorporated in all PivotTables.
  • Ensure that column names are in the first row.
  • Place your dimensions before metrics.
    Dimensions before metrics

Transform

We use PivotTables to transform the data into the structure we need.

  • Call these sheets “Transform” or “XXXXXXX Transform”.
  • Create one sheet for each issue that you are exploring. This doesn’t mean that you will only create one PivotTable. You may have multiple PivotTables to support different views or perspectives on an issue.
  • Turn on “show items with no data” for row and column dimensions. Show all items
  • We are seeking predictability: we want the PivotTable to always be the same size regardless of what the PageField filters are.
  • Place all the dimensions that aren’t used as rows or columns in the PivotTable as page fields. Every dimension should have a home.
    All dimensions must have a home

    • Set all PivotTables to not store data and refresh on open.
      PivotTable settings
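If you think better in code than in PivotTable dialogs, here is a small sketch of why “show items with no data” matters. It is plain Python rather than Excel, and the data, dimension names, and `pivot` helper are all hypothetical. The point is only that the output grid keeps the same shape whatever filter is applied, with empty cells left blank rather than dropped.

```python
# Hypothetical data block: dimensions first, then the metric.
data = [
    ("East", "Widgets", 10),
    ("East", "Gadgets", 5),
    ("West", "Widgets", 7),
]

def pivot(rows, row_items, col_items, keep=lambda r: True):
    """Sum the metric into a grid that always has one row per row item and
    one column per column item, even when a filter leaves a cell empty."""
    grid = {r: {c: "" for c in col_items} for r in row_items}
    for region, product, value in rows:
        if keep((region, product, value)):
            current = grid[region][product]
            grid[region][product] = value if current == "" else current + value
    return grid

regions = ["East", "West"]
products = ["Widgets", "Gadgets"]
full = pivot(data, regions, products)
east_only = pivot(data, regions, products, keep=lambda r: r[0] == "East")
```

Because `east_only` still has a row for West, anything that reads a fixed cell position from the grid keeps working after the filter changes.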

Present

The Presentation page copies data from the Transform page(s) and formats it for display. It also allows users to control what data is being displayed.

  • Build a user interface to interact with your data. There are many ways to let people interact with your data, but one of the easiest is to use a PivotTable as your interface. This is described below.
  • We use an in-house style guide for graphs that you can see in our Chart Chooser.
  • If the Presentation page is likely to be printed, preset the print range.
  • When copying data from the transformation page to the presentation page, blank values will come out as zeros. We use a simple formula, =IF(Transform!A2<>"", Transform!A2, ""), to ensure that blanks remain blanks.

Using a PivotTable as your interface

A simple way to let people manipulate your data is to place a PivotTable containing only PageFields but no data on the presentation sheet. A Visual Basic macro, triggered to run whenever this master PivotTable changes, then pushes any changes made to it out to all the PivotTables on your Transform sheet.

Here is the code to make this happen.

This drives our PivotTables in concert and ensures they stay in sync.


That’s a basic overview of our DTP technique. You can try a simplified version of DTP here.

DTP Example.xls

We’ll be back soon to talk through this example.




Recently we wanted to show how Concentrate, our new long-tail search analytics tool, could give you a view of search patterns across travel websites.
As political junkies, we were inspired by this chart from our friends at the NY Times.


NY Times candidate word bubble chart

The first tool we tried, simply on principle, was Excel 2003. As expected, making a NY Times quality bubble chart in Excel 2003 is a hard problem. Here’s a draft of how far I got before giving in to label fatigue.

Excel NY Times bubble

The bubbles themselves aren’t tough, but getting the labels right is hard. I’d love to see a solution, so if any reader wants to tackle it, eternal fame can be yours. Here is a CSV if you want to try.

travelpatterns.csv

Another of the tools we use at Juice is NodeBox, which we used to make this:

Concentrate pattern comparison

Here’s the code that made the graph.

The power of a programmatic approach like this is that by changing a line or two, you can get the following. Click for a larger version. Click the text for the code.

With great power comes a great need to exercise restraint. Otherwise you end up like these poor chaps. Must… flex… restraint… muscles…




Decorating data

Chris Gemignani

An early Christmas present has arrived from the DabbleDB team for the 100 million or so of us that have to work with data on a day to day basis.

They’ve created a do-what-I-mean web tool that lets you show how you want data to be restructured and bang! it’s done. Check out the video.

Cleanup data in action

It’s a great idea and an elegant, easy-to-use interface. There are so many directions I’d love to see them take this tool.

Cleanupdata is a great name, but they’re really giving you better ways to restructure data. This tool won’t help you find and fix errors and anomalies in data. At least not yet.

I also hope they extend cleanupdata to let people automate these data restructuring operations. If only you could apply a cleanup created in cleanupdata.com to 1,000 Excel spreadsheets or to a database table.

If you like this, it’s worth checking out DabbleDB. They have rethought the database with a database/spreadsheet/web forms/visualizer platypus of a tool. It lets your data be pliable in ways that databases don’t allow, while retaining structure that spreadsheets don’t recognize.

Added: Avi Bryant, one of the authors of the cleanupdata.com service notes that the example in the screencast is motivated by this post on cleaning data in Excel. Compare and contrast. I know most people would prefer to avoid ="("&MID(H2,1,3)&") "&MID(H2,4,3)&"-"&MID(H2,7,4) in order to format a phone number.
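For comparison, here is roughly what that same formula does when written out in Python. This is only an illustration of the restructuring step, not how cleanupdata works. The `format_phone` name is ours, and the input check is an extra assumption that the Excel formula does not make.

```python
def format_phone(digits):
    """Format a 10-digit string as (XXX) XXX-XXXX, mirroring the MID formula."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected exactly 10 digits")
    # Same slices as MID(H2,1,3), MID(H2,4,3) and MID(H2,7,4).
    return "(%s) %s-%s" % (digits[0:3], digits[3:6], digits[6:10])

formatted = format_phone("6155551234")
```

It is more readable than the formula, but you still have to write it, which is the step a show-me-what-you-mean tool lets you skip.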




Analyticstime!

Chris Gemignani

If you struggle to legitimize analytics within your organization, you can’t touch this video for a powerful explanation of the impact of analytics:

MC Hammer at the AlwaysOn/STVP Summit at Stanford, “Music Artists Go Entrepreneurial.” Around minute 24:00.




Straight from the parallel universe where clever and horrible go together like peanut butter and chocolate comes the following press release:

We are excited to announce the launch of our new community website for Sears and Kmart customers. The service you originally registered with, My SHC Community is now called sk-YOU. The new name represents “Sears and Kmart, building a better relationship with you” and that is of course, part of our vision and mission. It is a growing and personalized online community currently comprised of 40,000 consumers who want to be heard. You can share ideas, opinions and thoughts on a wide variety of topics from travel to kitchen appliances and cell phone service. It enables you to provide feedback and guidance on the offers and shopping experiences that are most important to you.

I can see how this sounded wildly clever in a meeting.

Mash Sears, Kmart, and “you” all together and look what you get. It shows our commitment to the customer and it sounds like “sku”.

Bzzzt, horrible. People don’t care about stock keeping units, and they certainly don’t want to be associated with one. They don’t care about clever. Unless you’re a financier, there’s no reason to associate Sears with Kmart. Branding should help the customer understand and remember a product. It’s not about how you perceive the customer or about how you perceive an internal initiative. The dash and all-caps YOU make it harder for the customer to remember. But I ramble.

At Juice, our naming bible is available in PDF form from Igor International.

http://www.igorinternational.com/process/naming-guide-product-company-names.php

The central wisdom of this guide—and it’s packed full of gems, naming taxonomies by industry, checklists, taglines, case studies—is that names fall into the following categories.

Descriptive names (names that describe what the product or company does)
BMW, IBM, AdWords

  • Good for a product, easy to remember
  • Rough sledding for a company name, as there will be dozens of companies in the field with similar names (unless you have 100 years of meticulous branding like BMW and IBM)

Invented names with latin roots

  • Aquilent, Taligent, Acela, Agilent
  • “Safe” choices, hard to remember, a blank slate. Generally too clever by half. Hey, did you think it was clever to name a company as a cross between “agile” and “intelligent”? Nobody cares!

Invented names that are fun to say

  • Snapple, Oreo, Kodak
  • Fun to say, opens the door for lots of positive associations with strong branding

Experiential names (names that describe the experience of the company or product)

  • Navigator, Safari, TrailBlazer, Fidelity
  • Intuitive but common, doesn’t differentiate, a workmanlike approach for a product

Evocative names (names that evoke feelings about the experience you will have with the company—those feelings may even be initially negative)

  • Caterpillar, Apple, Amazon, AirPort, Target, Yahoo, Virgin
  • Connects emotionally with people because they have lots of previous experience with the word. “Scary” choices that are hard to get a committee to agree to

We are often asked why we’re named “Juice”. Igor is the answer. When we go places, people say “Heeey, Juice guys!” If you’re a client, be aware you’re not the first one to use that line. We benefit from every dollar Nantucket Nectars spends on their “Juice Guys” ads and we love it. Every dollar Tropicana spends helps you remember our name. Even OJ Simpson is on our branding team.

If you’re naming an internal product, steer toward descriptive names or evocative names. If you’re creating a reporting portal, don’t be afraid to call it “Report Portal”. Or call it “Butterfly” or “Moonbeam.” Brighten people’s lives by delivering fun, or ease their lives by not making them remember some obscure acronym. Most of all, remember to be a servant of your customers and that clever is not equal to smart.




Check out our followup post that describes how we created a downloadable Windows application or an Excel spreadsheet you can use to create these graphics.

One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.

Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.

Raise your hand if you have a graphic design assistant at your beck and call. I thought not.

One of the tools we use for rapid prototyping at Juice is NodeBox.

NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.

All true. But it’s more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world’s easiest programming language. Oops, here’s the right link.

I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p 158.

Tufte Current Receipts Graphic

Here’s the code. It’s 11 lines of code if you exclude entering the data and setting things like fonts and colors.

size(500,700)
font('Palatino')
fontsize(12)
stroke(0.4)  # a medium grey for lines
fill(0.2)    # a slightly darker grey for text  

# data = (label, first, last, first-label-fudge-factor, last-label-fudge-factor)

data = [ ('Sweden', 46.9, 57.4, 0., 0.),
         ('Netherlands', 44.0, 55.8, .3, 0.),
         ('Norway', 43.5, 52.2, 0., 0.),
         ('Britain', 40.7, 39.0, 0., 0.),
         ('France', 39.0, 43.4, 0., 0.6),
         ('Germany', 37.5, 42.9, 0., -0.4),
         ('Belgium', 35.2, 43.2, 0., 0.),
         ('Canada', 35.2, 35.8, .8, 0.4),
         ('Finland', 34.9, 38.2, -0.5, 0.),
         ('Italy', 30.4, 35.7, 0.3, -0.3),
         ('United States', 30.3, 32.5, -0.3, 0.),
         ('Greece', 26.8, 30.6, 0.4, 0.),
         ('Switzerland', 26.5, 33.2, -0.2, 0.1),
         ('Spain', 22.5, 27.1, 0., 0.3),
         ('Japan', 20.7, 26.6, 0., 0.), ]

text("Current Receipts of Government as a Percentage of "
      "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)

def ypos(val):
    # calculate a vertical position by scaling between 10% and 90% 
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range

alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)

for label, start, end, startfudge, endfudge in data:
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))

Here’s what the result looks like.

Tufte Current Receipts Graphic with NodeBox

We have some great followups to this planned for next week. We’ll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We have some great plans for mashing these graphics up with our just released Google Analytics API.

Check out our followup post that describes how we created a downloadable Windows application you can use to create these graphics.



