Analytics Roundup: Visualization goodies

The Problem With Presentations
Don't let presentation software keep you from getting your story across,

Webfoot's Mapeteria: Map Colouring
Want to make a choropleth thematic map (i.e. coloured based on your data) for Canadian provinces, U.S. states, or French départements?

The Econ 101 Management Method - Joel on Software
Instead of having smart people figure out how to train their frontline customer service workers to serve customers well and profitably, they make up metrics that sound good and let the low wage, high-turnover customer service people come up with their own.

The rise and fall of IT | Perspectives | CNET News.com
Scrap IT? A well-reasoned argument for scrapping the term "IT."

Gallery of Data Visualization

Summize - Summarized product reviews
A nice visualization for showing rankings.

The Extreme Presentation(tm) Blog: Choosing a good chart

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

0 comments | Add a comment

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment






GIS for the rest of us... or Another Excel-based Mapping Tool

A client recently asked us if we could help him find the five locations that have the most customers nearby. I dug into the annals of our blog to find two tools that together ("wonder twin powers activate... shape of mini-GIS solution") could solve this problem:

1. Juice geo-coding tool lets you get the precise latitude and longitude for a list of addresses and plot these locations in Google Earth.

2.ZIP code finder lets you enter a US address and returns zip codes within a certain number of miles of that address.

Led by our resident Excel guru David, we combined these features into an Excel tool that lets you answer common location-based questions such as:

  • How many libraries are within 10 miles of Worchester, MA?
  • Which cities have the most libraries within 10 miles? 20 miles?
  • Could I see the library locations on a map, please?

Here's how it works:

1. Drop your location information into the "Geocode data" tab. We are using the Yahoo geocoding service.

2. Go to the results tab and fill in the table with your selected location addresses and distances. Press "Calculate."

Download the Excel file here: Juice Distance Tool

We had a couple other features in mind, but thought it would be better to get some reader reaction before we loaded it up.

This tool is released under the MIT license.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

10 comments | Show all comments only the last 5 are shown


July 1, 2007
Zach said:

Last I tried, Yahoo's geocoding service (the one baked into this tool) returns a lat/long value for just a zip code.


July 2, 2007
kelvin said:

This is a GREAT TOOL! but ya, I can't see the site names from the JUICE tool...however, geocode 3.1 xls works, BUT it ends with a debug error after it completes the macro. the end result file provides the site names tho, which is VERY HELPFUL for me! is there a way to plugin the address for each site name/location? that would be ideal.

I use this tool to help me assess the number of apartment complexes in a given area. VERY USEFUL for me to get a quick idea of competition in a particular area i'm looking to invest in!


February 12, 2008
Jerry said:

Chris (#4) It has nothing to do with Excel 2003. The outputKML macro is referring to wrong column numbers. I changed my Geocode tab to make the Name/Description column into two separate columns & then changed two lines to reflect it:
name = CStr(ActiveSheet.Cells(r, 7))
description = CStr(ActiveSheet.Cells(r, 8))


February 12, 2008
Jerry said:

If you're behind a firewall, the calls to http.send will fail ....

Function yahooAddressLookup(addr As String, Optional Row As Long) As String
' perform RESTian lookup on Yahoo
Dim response As String

URL = "http://local.yahooapis.com/MapsService/V1/geocode?appid=yahoo_test&location=" & addr

'Create Http object
If IsEmpty(http) Then Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

'Send request To URL
http.Open "GET", URL

http.send
....
....

Only way around it is to add code for http options to work with proxy authentication


October 26, 2008
Eric said:

Do you guys know how I can implement something similar to your Excel geocoder into MS Access where I can use a table of address records instead of a sheet of address rows in Excel.

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Analytics Roundup: Naming matters

Igor | Naming companies, naming products
Definitive essay on naming from the company that inspired our name.

Vox Populi: Best practices for file naming | 43 Folders
One approach to a tough problem that we all have.

Seth's Blog: Worst powerpoint slide ever used by a CEO
Pretty bad, but surely not the worst.

HARLEM-13-GIGAPIXELS.COM
Extreme data presentation in a different realm.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

0 comments | Add a comment

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment






Why make 100 charts when one will do?

Charts are a great way to explore data. Here is some American baseball data showing player salaries over a five year period.

Baseball salaries by team over time

Charting this data with a line chart would allow us to see trends in salaries by team. However, when we use Excel’s default chart, we get something that looks like this:

Excel's default multiline chart

That’s quite a mess. It would be a lot easier if we could create one chart for each row.

The OFFSET function is going to help. In its simplest form the OFFSET function works like this:

OFFSET(anchor, rows from anchor, columns from anchor)

That is, OFFSET will start with the anchor cell, go down a number of rows from that anchor and over a number of columns and return the value it finds.

OFFSET function

We can use the OFFSET function to create cells that pull a single row of data out of the table dynamically. We create a new row atop of our data and create a series of OFFSET functions that all rely on a single cell (the big yellow one) for their row offset. So changing one cell will pull different rows of data into our fixed location.

Creating a dynamic row that doesn't move

Now, chart the data that doesn’t move.

Charting the dynamic row

After fixing the chart, we’d like to make it easy to change the value in the big yellow cell.

We can use Excel Forms to build a lightweight user interface. Bring up the Excel forms toolbar by rightclicking on any toolbar and choosing Forms. Place a scrollbar beside the chart.

Excel Forms

Right clicking on the scrollbar allows you to Format Control. Link the control to the cell that is controlling all the row offsets. Now, moving the scrollbar will update the chart.

Chart with scrollbar Selecting Format Control Formatting the scrollbar control

Now, the scrollbar controls the chart. Here is the baseball spreadsheet for you to play with: Baseball_offset.xls Have fun!

On the way to 100 charts

Note: this post is adapted from a presentation I gave at eMetrics 2007 in San Francisco.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

34 comments | Show all comments only the last 5 are shown


October 13, 2009
ray said:

great and simply outstanding!!!


October 22, 2009
Tim said:

This is great!! This is such a time saver. One question, how do you auto scale the axis in case you have a very wide range of data. For example in one graph I need a to scale to 200, in the next one I need to scale to 5000.

Thanks and keep up the great work you're doing.


February 26, 2010
Sean said:

It sounds like an amazing concept, however, whilst using Excel 2003. It seems that the 'Control' tab on the 'Format Control...' window is missing. Would you be able to provide a sample excel document so that I could look at the VB code behind the scroll bar? or provide the code on here?

Many thanks!


February 26, 2010
Sean said:

Oops, I didn't see the sample document mid way down the page!

Many thanks!


June 29, 2010
Yan said:

Love it! Very easy to understand. Thanks!

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





The Google Analytics relaunch

Google Analytics has been rebuilt and the result redefines the frontiers of doing analytics on the web. Avinash Kaushik has the definitive early review.

Google Analytics v2

I had the privilege of attending the launch and playing with the early release. Here are a few things I noticed.

  • Speak my language: Google has put a lot of effort into replacing specialized terms with everyday ones. This makes the application usable by a broad base of people and is one way to fight GUI Jock-itis.
  • Speed kills: The interface is easily reconfigurable and fast. I've long argued that interface speed is a substitute for configuration options. I'm curious to play with the tool and get a better sense if this is true.
  • Flex rules: Much of the componentry for viewing data in Google Analytics is built in Adobe Flex. This is similar to Google Finance, and not at all like GMail or Google Reader, which use the GWT. We believe this has profound implications for analytical tools on the web and will dig into this in later posts.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

2 comments


May 10, 2007
FM said:

Nice and must say timely review, I've been using Analytics.google.com and found it good, some time.
some time it's not an average site owner would like to look at, i mean you may lost your way through it.

Talking about Speed, it's been major sat back, till the day. however if it's improved in relaunch, it's great.


June 20, 2007
Lynn Cherny said:

I had lunch with a friend recently who told me that the Mindmaps fellow who built the new Google Analytics was just more used to Flex. So it may not be sinister or deep, although I agree with you on the overall benefits of the feel and design :-)

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Dashboard Storytelling

Everyone wants a dashboard and the promise of a world in which the intricacies of your business are clearly laid out on a single page. Dashboards can make running your business as easy as driving a car, where slight adjustments and careful attention to warnings mean smooth sailing on the road to success.

I'm not so convinced. For someone who is, check out the mysterious Dashboard Spy. He/she has a massive collection of dashboard screenshots and describes these precious morsels as "simple to understand and impressive to look at, these scorecards are becoming 'must-haves' for all enterprises."

If we already live in a dashboard-centric world, we might as well do them right. I see at least three areas where dashboards need improvement: depth, information display, and storytelling.

Depth. Stephen Few makes a worthwhile distinction between dashboards and something he calls "faceted analytical displays" (FADs):

  • A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.
  • A faceted analytical display is a set of interactive charts (primarily graphs and tables) that simultaneously reside on a single screen, each of which presents a somewhat different view of a common dataset, and is used to analyze that information.

We might consider dashboards a static version of FADs (or we could consider FADs a versatile dashboard). If that's true (and I'm sure Stephen will step in to correct me), then who wants a plain dashboard? Why build something that only raises questions but doesn't give the user any ability to drill down, explore, tweak parameters, or otherwise try to answer those questions?

Information display. Like most reporting, dashboards suffer from poor information design. Here's our list of blogs that preach the right way and highlight the offenders. Here are two particularly misguided design approaches that I've seen recently...

Just because it is called a dashboard doesn't mean you need to take the concept literally (via Dashboard Spy)

Just because you can make it shiny doesn't mean you should. Crystal Xcelsius not only vigorously embraces pie charts, but they add a "reflective kidney bean" to further derail the information display.

Storytelling. Most dashboards are loose affiliations of charts—a hodgepodge of graphics on the same topic intended to offer a full view of a situation. It is the same problem so many people run into in creating PowerPoint presentations.

You want the information to easily slide into the viewer's brain and stick when it gets there. The best dashboards have story-like features such as:

  • Set the stage. What is the context? Who are the characters?
  • Focus on only the important elements and themes; don't try to be a comprehensive account of everything that happened. Ruthlessly cut extraneous content.
  • Offer recognizable characters to spare the reader's precious attention. There is a high cost to asking readers to learn from scratch. For dashboards this means terms, metrics, graphics, and metaphors that are familiar within the organization.
  • Create flow and cohesiveness from chapter to chapter. Themes and characters reappear chapter after chapter. A good dashboard isn't a bunch of disjointed charts, but a logical flow from one analytical examination to the next.
  • Levels of detail. Some elements of the story span the entire experience; other details provide the insights and seasoning to keep your interest.

Here's a good example of a dashboard (perhaps FAD) from Visual I-O that has many of these storytelling elements.

In contrast, the following dashboards (courtesy of Dashboard Spy) don't attempt to explain anything to the reader:

If you've seen a worse dashboard, sent it our way and we'll put together a gallery of the worst of the worst. Please redact any company-specific information.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

9 comments | Show all comments only the last 5 are shown


May 16, 2007
nixnut said:

Hello Zach,

Perhaps my example of senior managers is a wrong example. I was merely trying to point out that there are groups of dashboard users that would use a dashboard for monitoring and not for analysis. I think that the difference between monitoring and analysis is large enough that the principles of perception and cognition would lead to different designs.

Having a tool that is capable of designing visual displays to cover the full range from monitoring KPI's to sophisticated analysis is indeed something to aim for. Alas such tools are not available yet (at least none that I know of).

While such a tool would allow you to build both dashboards and FAds these are still different things imho. The dashboard would still display the KPI's (or their derived metrics) and the FAD would let you look at the numbers behind the values for these metrics, generate and look at different perspectives, possibly do some simulation etc.

I didn't say FADs are restricted to one data source. I said they would usually work with just one dataset. I suppose my wording wasn't too clear. By dataset I mean a set of data about one subject area. In a dashboard it's natural to display metrics from several subject areas that are not related (other than by organisation) or not related in ways that are useful for filtering. Filtering or brushing as you could do in a FAD would update all the facets in the FAD to reflect the selection made. But in a dashboard that would only update the relevant metrics and leave the rest untouched. If you are interested in playing around with a metric (or a set of related metrics) it would make more sense to me to design a FAD for that purpose and use the dashboard as a starting point to drill down to the FAD from the metric on the dashboard. More sense than trying to force every dashboard to be a FAD that is.

I hope this post makes more sense to you than my first :-)


May 18, 2007
ltweedie said:

Michael

I entirely agree that dashboards/FADs showing predicitive models are a whole area that is very unexplored!

I did something very similar to this in my thesis in 1997 "The Influence Explorer" (a quick websearch will bring up the relevent papers) where we sampled a response surface model (Nelders generalised linear models to be precise) and then visualised it using interactive histograms, scatterplots and various other tools.

My experience in showing it to users was that it was immensely powerful as a tool to communicate a model. Suddenly analysts were able to make their models real. So that these models became real shared problem representations.

You could explore the relationship between inputs and outputs fluidly and easily.

I still haven't really seen this done in many other places and yet it is such a simple idea to put into practice. Has it?

Lisa


May 18, 2007
ltweedie said:

Zach/nixnut

Surely adding interactivity is about how much complexity you add to a tool.

I would say the important thing is not whether to add interactivity to a dashboard but whether it is justified in the context. So there will be situations in which allowing a user to interactively track back through time or drill down on a piece of data would be central to the dashboard design.

My hunch is that Responsive (dynamic) interaction enables a user to quickly compare many graphs in a way that is just not the same in static view. I guess we need some research to back this up - anyone know of any?

In an analysis situation we want to provide unlimited freedom. In a dashboard situation we want to provide key information quickly and clearly.

Thus I argue that in a dashboard design one should consider what key activities are going on and assess whether interactivity is appropriate. I believe there will be situations when it is very pertinent.

Lisa


June 13, 2007
Ted Cuzzillo said:

I like the idea of a “faceted analytics display.” I’m sure FADs are important. I just hate to see Stephen Few opt for this term because inept designers have spoiled “dashboard.”

Dashboard is a valuable metaphor and should be defended. I’m afraid FAD will be forgotten.

Isn’t a FAD just a dashboard with extra features? When they added tachometers to auto dashboards, did dashboards become something else? If you add new software or a new peripheral to your computer, isn’t it still a computer?

Perhaps we could think of it as a simulation dashboard--but still a dashboard. Calling it a simulation dashboard is still stronger than calling it a FAD.

Using “simulation” might force a modifier onto dashboards that don’t interact. How about “dumb”? Then we’d call the really bad dashboards just “dumber.”


June 14, 2007
Zach said:

Ted, I agree completely. FAD isn't likely to stick. Simulation (or perhaps interactive) dashboard is a good modifier.

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment





Analytics Roundup: Fixing Excel and dashboards

Ask E.T.: Cleaning up Excel's poshlust graphics
Discussion of the Chart Cleaner along with other approaches to make Excel look good.

The Universe of Discourse : Excessive precision
Humorous take on one of the ways excessive precision can creep into reports.

Data Visualization: Intelligent Dashboard Design
The third in a series of columns that feature the winners of DM Review's 2005 data visualization competition.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

0 comments | Add a comment

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment






ANDs, ORs, and IFs: Comparing big lists in Excel

One problem we face when manipulating large amounts of data in Excel is checking to see if two lists of the same length contain the same items. For instance, we might be given a list of products that a company has for sale this month, running to thousands of items, then the following month, we get another list of products for sale and we need to see if there has been any change between those two lists. This isn’t too hard to deal with when you only have a hundred or so items, but it gets a little thorny when your list runs to tens of thousands.

What we do is line the two lists up, side by side, in sorted order.

Two lists

Use the simple “A1=B1” formula to compare pairs of items in the lists.

Formula for comparing elements in the lists

If the pairs are the same, this will be true, otherwise they’ll be false.

Results of comparing elements in the lists

Copy this formula down for all your rows. Then use the AND function and give it the entire range of comparison formulas.

Checking comparisons

This will only be true if every single one of the values in your list are exactly matches. If even one comparison is false, this big AND statement will evaluate to false.

This is a quick and dirty approach. For tougher problems, we use a slightly more complicated formula in the comparison where we evaluate it to 1 if the value is true, 0 if the value is false. This gives us more flexibility to combine comparisons, but that’s a topic for another post.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

19 comments | Show all comments only the last 5 are shown


May 5, 2007
Robbin Steif said:

To the person who wrote, this is too easy a post for Juice: you probably have a strong command of the topic, but not everyone does. And like someone else wrote: it's nice to be reminded of tricks again.

To the person who complained that then you have an error somewhere and you can find it: all you need to do is sort all the rows by the column with the proof in it, so that you can see where the problem is. You can sort Ascending, since F comes before T (how's that for easy advice?)

And finally, you can use a similar trick to dedupe email addresses, or other lists where the info will be identical. I am sure you Juicers are all over this one. But since the data are on two lines, you have to do it like this =a1=b1 And copy all the way down. Then you copy the results and repaste them right where they were, but PASTE SPECIAL - Values. Then you sort on the IF column


May 7, 2007
mike harding said:

well in the world of statistical genetics, excel doesn't have enough room in one tab. it would be useful to know how to do this in R or SAS.


May 9, 2007
AW said:

Mike,
In SAS, you'd want to use a sort-sort-merge or some inner joins.
R is on my schedule to learn in the next 2 months; so I got nothing for you there...


June 25, 2007
Brenda said:

Looks like it works for text string comparison as well!


October 11, 2007
Will said:

I think the use of MATCH is a much better option. Using the data as you have displayed it, in cell D3, you would have =IF(ISNA(VLOOKUP(C4,B$3:B$xxxxx,2,FALSE)),1,0), and then in cell D4 on downwards you would have =IF(ISNA(VLOOKUP(C4,B$3:B$xxxxx,2,FALSE)),D3+1,D3) where xxxxxx is the final row in your set. In cell D2, you have =MAX(D3:Dxxxxx). This will then tell you how many new datapoints you have.

Your name

Email (optional, will not be shared)

Type the word "juice" (required to confuse the spammers)

Your comment


Add a comment