
Our Blog

[Insert witty opening here].

You see? In principle, I know that a blog post draws you (the reader) in when it starts with a story, something smart, or a joke. Don’t overwhelm people right from the get-go. Start with a metaphor or phrase that relates to the article.

That introduction relates to what I’m really interested in talking about: principles. We’re launching an exciting new resource today, and it has to do with principles, design principles that is. This resource will remind you to do things like use gradients appropriately or provide instruction. Its goal is to direct your design towards information presentation that focuses on the human element.

“Engineers start with technology. MBAs start with funding. Designers start with people. The trick is to get interdisciplinary teams to raise their collective I.Q. by working in the overlap of those three areas. That’s where innovation flourishes.” (Moggridge)

At Juice, we start with people and great consideration for that overlap. Therefore, we’re not only about what information to show but also how to show it. And behind those two basic ideas is an awful lot of thinking > developing > learning > and iteration. Through that process we’ve gathered a (rather long) list of principles that inform our decisions, and we hope it can help you with yours too. Rather than trying to be sure your application supports all principles, approach it more like a stack of flash cards and pull out the relevant ones. With experience, you’ll realize you’re doing these things naturally. Understanding the drivers of design thinking is invaluable for introducing objectivity into application design.

There are two parts to this:

  1. View the list and explore the content on our Design Principles page.
  2. Engage in discussion on our Quora Design Principles Board.

This list will likely grow and shrink over time through the refinement process. The descriptions of each principle definitely will. Our goal is not to be exhaustive, but helpful.

There is a slight catch. So far, we’ve only fully written up a few of the many principles, which means we have a long way to go. We’re going to embrace process on this one with what might appear to be a (very intentional) turtle’s pace. Still, we’ve made the titles as concretely informative as we could before filling out all their content. Feel free to run (err, walk) right alongside us or check in every now and then to evaluate your projects against the list. If you find these helpful or would like to share your experience or opinion on any of them, we invite you to engage in the discussion and vote up or down the principles you find more or less useful. Share your insights on why. Let us know which one you’d like to see next. Keep us honest, and the visualization community successful. Happy designing.






Twitter’s wild popularity hasn’t obscured the fact that the service needs to eventually make money. The concept of “Twitter analytics” as a revenue stream has come up often enough to make my ears itch and my nose burn.

Twitter’s new business development lead explains that the company is “developing a range of analytics and metrics products and services built around the information contained in tweets”…and “trying to figure out what are the appropriate metrics around engagement and how to convey those.”

Web Strategist Jeremiah Owyang raises the concept of a Twitter CRM solution, in which Twitter would offer brands its own analytics system to help them track and manage the conversations.

The Twitter ecosystem has responded with a wide range of tools for analysis of Twitter data. Web analytics behemoth Omniture recently announced the integration of Twitter data into their platform. At the same time, web analytics consultant Eric T. Peterson has been vigorously marketing Twitalyzer, a tool to evaluate individuals’ use of Twitter and metrics of influence. Google’s Chrome Experiments released a cool visualization tool called Social Collider that reveals cross-connections between conversations on Twitter. Here are a few more Twitter analytics tools that I’ve run across:

Despite all the activity, I haven’t yet seen a solution that offers the kind of valuable analytics that a company could use to understand the Twitter conversation relevant to their business. The applications above are either focused on the measurement of individual Twitter users or offer a high-level tracking of words and phrases in the general conversation. They treat tweets as transactions: How many? How valuable? Who’s listening? Who’s responding?

To me, the great and more rewarding challenge in Twitter analytics is to synthesize the substance of those conversations. Imagine if you went to a party and could overhear everything that everyone else was saying. Who talked the most and who had the greatest audience is less interesting than what topics people were discussing and what was said.

I wanted to take a shot at this type of Twitter analytics.


Analysis Approach

First I had to define a particular domain or topic area. For expediency, I focused on all the tweets that included the word “analytics.” Using the Twitter search API, I pulled the first 500 tweets for each day in March and parsed the results to pull out users, urls, and other characteristics of the tweets.

To analyze the words and phrases being used, I uploaded the resulting 11,300 tweets into Concentrate, our search analytics tool. Concentrate is optimized for search query text (i.e. short phrases without a lot of punctuation). Nevertheless, it has a number of features that make text analysis easier, including breaking out the most common words, phrases and patterns. It also allows for filtering by words to create frequency statistics.

There were two main questions I wanted to address:

  1. What topics are people discussing?
  2. What is the structure of the conversation?

Topics of Conversation

The content of the Twitter conversation can be analyzed as words, sites/links, people/groups, and company/products.

Words

I used Concentrate to find the most common words, then I dumped those words into Many Eyes to create this “Wordle-brand” word cloud. Many Eyes has a nice feature that takes out the “common English words.” Clearly Google dominates the conversation, and I even had to artificially reduce its value to make the other words legible.

Word cloud

Below are the top 10 (non-common) words that show up in the analytics conversation:

Top words
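For readers who want to try the word counting without Concentrate or Many Eyes, here is a minimal Python sketch of the same idea. It is not how either tool works internally; the stop list and the sample tweets are made up for illustration, and `top_words` is a hypothetical helper name.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption); the actual list of
# "common English words" used by Many Eyes is not reproduced here.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "for", "is", "with", "rt"}


def top_words(tweets, n=10):
    """Return the n most frequent non-common words as (word, count) pairs."""
    words = []
    for text in tweets:
        text = re.sub(r"http://\S+", " ", text.lower())  # drop links first
        words.extend(w for w in re.findall(r"[a-z']+", text) if w not in STOP_WORDS)
    return Counter(words).most_common(n)


# Made-up sample tweets, not taken from the March data set.
sample_tweets = [
    "Google Analytics adds new reports",
    "New post on Google Analytics http://example.com/x",
    "Analytics for the rest of us",
]
print(top_words(sample_tweets, 3))
```

Swapping in a longer stop list changes which words survive, so the ranking is only as good as that list.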

Twitter has become a mechanism for sharing interesting links (I’ll get to data on that in a bit). Looking at the most popular sites and specific links gives a sense for what people in this community are reading and talking about.

Top sites and links

People and Groups

Twitter users have a few conventions for connecting tweets to people or groups:

  • “#” (i.e. hashtag) associates the message with a group, topic or event.
  • “RT” (or “via”) is to repeat or “retweet” something someone else has said.
  • “@” associates a tweet with another user, whether retweeting their message or directing a comment to them.

Here are the most common groups and people referenced in the Twitter data.

Top people and groups

And the people with the most tweets using the word “analytics”:

Top talkers

Companies and Products

I was also interested in what companies or products were referred to most frequently. It is no surprise that Google dominates the conversation. Microsoft gets on the board with the recent closing of their adCenter product. I think we can safely assume they won’t be showing up that often in the future.

Top companies


Conversation Structure

Beyond the specific content of the conversation, I was also curious about how people who are talking about analytics tend to use Twitter.

Types of Tweets

Eric T. Peterson has four things he considers “signal” (versus “noise”) in the Twitter conversation:

  • References to other people (defined by the use of “@” followed by text)
  • Links to URLs you can visit (defined by the use of “http://” followed by text)
  • Hashtags you can explore and participate with (defined by the use of “#” followed by text)
  • Retweets of other people, passing along information (defined by the use of “rt”, “r/t/”, “retweet” or “via”)

While I’m not fond of this definition, examining these different types of tweets (along with question-based tweets) provides a good lens into the nature of the conversation. The following chart shows the percentage of tweets that fall into each of those categories.

Tweet Types
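The four “signal” definitions, plus a question check, can be sketched in a few lines of Python. This is only an approximation of the categories described above: the regular expressions are my reading of those definitions, the question test (any “?” left once links are removed) is an assumption, and the sample tweets are invented.

```python
import re

# Patterns follow the four "signal" definitions quoted above.
URL = re.compile(r"http://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
RETWEET = re.compile(r"\b(rt|r/t|retweet|via)\b", re.IGNORECASE)


def tweet_types(text):
    """Return the set of categories a single tweet falls into."""
    types = set()
    if MENTION.search(text):
        types.add("reference")
    if URL.search(text):
        types.add("link")
    if HASHTAG.search(text):
        types.add("hashtag")
    if RETWEET.search(text):
        types.add("retweet")
    # Assumption: a tweet is a question if a "?" remains after removing links.
    if "?" in URL.sub("", text):
        types.add("question")
    return types


def type_percentages(tweets):
    """Percentage of tweets in each category (a tweet can count in several)."""
    counts = {}
    for text in tweets:
        for kind in tweet_types(text):
            counts[kind] = counts.get(kind, 0) + 1
    return {kind: 100.0 * n / len(tweets) for kind, n in counts.items()}


# Invented sample tweets for illustration only.
sample = [
    "RT @someuser: new post on Twitter analytics http://example.com/a",
    "Anyone have a favorite web analytics tool?",
    "Heading to the #analytics meetup tonight",
    "Reading about analytics today",
]
print(type_percentages(sample))
```

Because one tweet can land in several categories, the percentages will not add up to 100.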

It would be all the more interesting if you could follow the types of tweets across time and compare against other topic areas. I suspect that the URL linking within Twitter is on the rise and is turning Twitter into a Delicious-style bookmark sharing service — without the functionality to save, tag, annotate, and view the bookmarks at your leisure.

Given all the sharing of links, I wanted to get a clearer picture of what happens when a link becomes popular. The graphic below shows some of the top links during the month and the amount they showed up in tweets by day. The red bars represent days where ten or more tweets included the link. A couple links demonstrated popularity over a week or so, but the rest sizzled then disappeared in a day or two.

Link Evolution
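The counting behind a chart like this is simple enough to sketch. The Python below tallies how many tweets mention each link on each day and flags the days that reach the ten-tweet threshold. The input shape (a day label plus a list of URLs per tweet), the function names, and the sample data are all assumptions for illustration.

```python
from collections import Counter


def link_counts_by_day(tweets):
    """tweets: iterable of (day, [urls]) pairs. Returns a Counter keyed by (url, day)."""
    counts = Counter()
    for day, urls in tweets:
        for url in set(urls):  # count a link at most once per tweet
            counts[(url, day)] += 1
    return counts


def hot_days(counts, threshold=10):
    """Map each url to the sorted days on which it appeared in >= threshold tweets."""
    hot = {}
    for (url, day), n in counts.items():
        if n >= threshold:
            hot.setdefault(url, []).append(day)
    return {url: sorted(days) for url, days in hot.items()}


# Made-up data: link "a" has one busy day, link "b" never reaches the threshold.
tweets = (
    [("03-02", ["http://example.com/a"])] * 12
    + [("03-03", ["http://example.com/a"])] * 3
    + [("03-03", ["http://example.com/b"])] * 4
)
counts = link_counts_by_day(tweets)
hot = hot_days(counts)
print(hot)
```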

Activity Distribution

Finally, I took a look at the distribution of users by the number of tweets including the word “analytics.” It was no surprise that the vast majority of the 7,700 twitterers only used the word once in March (of course this doesn’t tell us about their other twittering activity). Obviously there is a small population of people at the core of the discussion.

Activity Distribution
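If you have the author of each matching tweet, the distribution itself is a two-step count: tweets per user, then users per tweet count. A minimal Python sketch, with invented user names:

```python
from collections import Counter


def activity_distribution(authors):
    """authors: one entry per tweet. Returns {tweets_per_user: number_of_users}."""
    tweets_per_user = Counter(authors)
    return Counter(tweets_per_user.values())


# Invented authors: one frequent poster and three one-time posters.
authors = ["ann", "bob", "ann", "cat", "dan", "ann"]
distribution = activity_distribution(authors)
print(sorted(distribution.items()))
```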


While you’d have to go into more depth to answer detailed questions, there are a number of interesting take-aways for me, including:

  • “Analytics” means “web analytics”, not business intelligence or general reporting about sales, operations, or marketing.
  • Google Analytics is the star of the party. Of course, the fact that the brand name includes “analytics” is an advantage, but I didn’t see a giant “Juice” in the word cloud.
  • Twitter is an echo-chamber. The content clusters around particular subjects, with people retweeting and sharing links about the big news of the day. There are a dozen or so stories that dominated the conversation over this time period.

What’s next?

There are a lot more views of this data that could be enlightening for a company interested in having a real-time understanding of their marketplace. For example, it would be interesting to provide more insight into:

  • Who is at the center of these conversations?
  • What is the positive or negative tone of the discussion (Twitter actually offers this information as part of their API)?
  • How is the conversation changing over time?
  • What is the best way to define the boundaries of a domain-specific conversation?

These are the types of questions that I’d like to see addressed in a more complete Twitter analytics tool.







Recently we wanted to show how Concentrate, our new long-tail keyword tool, could give you a view of search patterns across travel websites.
As political junkies, we were inspired by this chart from our friends at the NY Times.


NY Times candidate word bubble chart

The first tool we tried, simply on principle, was Excel 2003. As expected, making a NY Times quality bubble chart in Excel 2003 is a hard problem. Here’s a draft of how far I got before giving in to label fatigue.

Excel NY Times bubble

The bubbles themselves aren’t tough, but getting the labels right is hard. I’d love to see a solution, so if any reader wants to tackle it eternal fame can be yours. Here is a CSV if you want to try.

travelpatterns.csv

Another of the tools we use at Juice is NodeBox, which we used to make this:

Concentrate pattern comparison

Here’s the code that made the graph.

The power of a programmatic approach like this is that by changing a line or two, you can get the following. Click for a larger version. Click the text for the code.

With great power comes a great need to exercise restraint. Otherwise you end up like these poor chaps. Must… flex… restraint… muscles…




This is a follow up to “Target Long Tail Searches with Keyword Patterns”

To get a sense of the scale of the long tail in search, Dustin Woodard recently put together an analysis of U.S. search data collected by Hitwise over a 3 month period, during which they measured 14 million different search terms. How did these break down?

  • Top 100 terms: 5.7% of all search traffic
  • Top 500 terms: 8.9% of all search traffic
  • Top 1,000 terms: 10.6% of all search traffic
  • Top 10,000 terms: 18.5% of all search traffic

This means if you had a monopoly over the top 1,000 search terms across all search engines (which is impossible), you’d still be missing out on 89.4% of all search traffic. There’s so much traffic in the tail it is hard to even comprehend. To illustrate, if search were represented by a tiny lizard with a one-inch head, the tail of that lizard would stretch for 221 miles.
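The 89.4% figure is just the remainder once the head is taken out. Here is the same subtraction for each of the four cutoffs listed above, as a small Python sketch (the variable names are mine):

```python
# Share of all search traffic captured by the top N terms, from the list above.
head_share = {100: 5.7, 500: 8.9, 1000: 10.6, 10000: 18.5}

# Whatever the head does not capture is left in the tail.
tail_share = {top_n: round(100 - share, 1) for top_n, share in head_share.items()}

for top_n, share in tail_share.items():
    print(f"Outside the top {top_n:,} terms: {share}% of search traffic")
```

Even at the top 10,000 terms, more than four fifths of the traffic is still out in the tail.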

Yesterday, we described the concept of search patterns and how you can use them to summarize this type of long tail text data. Today, we will walk through a case study we put together to explain how Concentrate’s pattern discovery feature will help you find new competitive insights.

You can replicate this study yourself by signing up for the Plus version of Concentrate and loading competitive search data from providers like Hitwise, Compete, Keyword Discovery, or comScore. The input search data used in our analysis consisted of a sample of unique queries leading to clicks on top travel domains during Spring 2006, along with their frequency of occurrence (the chart is truncated after the 20th query):

Raw search data: most frequent queries by site

unique search queries for travel sites

We loaded the full dataset of queries into Concentrate to generate summary patterns for each of 5 top travel sites. After each file of unique queries and associated metrics is loaded, the application generates reports which include summary statistics based on the head (top 50) and tail queries for each site. This is a good way to start looking at the data if we want to get a sense of each site’s long tail search strategy:

Head vs. tail queries for top travel sites

head vs tail for travel searches
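A head vs. tail split like this can be approximated from any list of queries and frequencies. The Python below is a rough sketch, not Concentrate’s report logic: it ranks queries by frequency, treats the top `head_size` as the head, and reports each side’s share. The function name and the sample counts are made up.

```python
def head_tail_split(query_counts, head_size=50):
    """query_counts: {query: frequency}. Returns traffic shares for head and tail."""
    ranked = sorted(query_counts.values(), reverse=True)
    total = sum(ranked)
    head = sum(ranked[:head_size])
    return {
        "head_share": head / total,
        "tail_share": (total - head) / total,
        "tail_queries": max(len(ranked) - head_size, 0),
    }


# Made-up counts; head_size is shrunk to 2 so the tiny sample has a tail.
sample_counts = {
    "expedia": 60,
    "cheap flights": 20,
    "hotels in rome": 10,
    "things to do in austin": 10,
}
split = head_tail_split(sample_counts, head_size=2)
print(split)
```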

It appears that the long tail makes up the overwhelming majority of traffic for the travel planning and review sites, but is a much smaller percentage for transaction focused sites like Expedia and Travelocity. Measuring the size of the head and tail gives us a rough idea what is going on, but we need to dig deeper if we want to benchmark where we stand in various categories and produce actionable insights. Inspired by a recent New York Times infographic “Words They Used”, our data visualization guru, Chris Gemignani, downloaded the Pattern CSV file that Concentrate generated for each of these sites and created the following view of competition in the travel search sector:

Comparing travel searches by pattern

long tail query patterns from Concentrate

This chart compares the proportion of searches that go to each travel site for the top 25 patterns in the travel sector. The site getting the most traffic for each pattern is highlighted. Only searches that wound up at one of these five travel sites are considered.
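The numbers behind a chart like this come down to one share calculation per pattern. As a hedged sketch in Python (the site names and counts are placeholders, not the study’s data, and `pattern_shares` is a hypothetical helper):

```python
def pattern_shares(counts):
    """counts: {pattern: {site: searches}}.

    Returns {pattern: (shares_by_site, leading_site)}, where shares are
    fractions of the searches that reached any of the compared sites.
    """
    result = {}
    for pattern, by_site in counts.items():
        total = sum(by_site.values())
        shares = {site: n / total for site, n in by_site.items()}
        leader = max(shares, key=shares.get)  # the site to highlight
        result[pattern] = (shares, leader)
    return result


# Placeholder counts for illustration only.
sample_counts = {
    "hotels in [x]": {"site_a": 20, "site_b": 60, "site_c": 20},
    "cheap [x]": {"site_a": 45, "site_b": 5},
}
shares = pattern_shares(sample_counts)
for pattern, (by_site, leader) in shares.items():
    print(pattern, leader, round(by_site[leader], 2))
```

Note that the denominator only includes searches that ended up at one of the compared sites, which matches the caveat above.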

The difference in search pattern profiles for these sites is striking. TripAdvisor leads the pack in the long tail, which makes sense given the huge amount of long tail user generated content on the site. TripAdvisor owns most of the pattern categories, but Yahoo Travel and Hotel-Guides take the lead in niche areas like maps and hotels. Traffic to Expedia and Travelocity is largely composed of navigational and branded queries (not shown). The only long tail patterns they have significant share for are “[x] ticket” and “cheap [x]”.

The input data we used reflects referrals to these sites from a sample population of users who clicked on search engine result pages. Factors which will affect the number and type of search referrals a site received in this data include: how representative the sample is of the population of U.S. searchers as a whole, how much relevant content a site has for a given query pattern, and how well that content ranks in Google and other search engines.

If a travel website repeated this study with Concentrate using current competitive data, then uploaded additional search data for their own site including other metrics beyond search frequency (see our demo using Google Analytics), the results might reveal that “things to do in [x]” queries lead to high quality visits and their site has a chance at winning more searches for that pattern. Based on this information they might decide to make a move on TripAdvisor in that content category. Mark Jackson describes some strategies to apply within the travel sector in the Search Engine Watch article “Should Your SEO Strategy Target the Head or the Long Tail?” Using Concentrate, a travel website could streamline the process by downloading thousands of real queries for this pattern sent to their competitor:

Some queries in TripAdvisor pattern: “things to do in [x]”

long tail travel search pattern

Take Action: Some ideas for next steps




On Friday, we launched our new keyword tool: Concentrate. One of its key features is a scalable algorithm that automatically discovers patterns in large amounts of search data and clusters long tail queries into manageable groups. This post will explain how using Concentrate’s pattern discovery feature can simplify search data analysis and give you an edge on the competition.

To explain how valuable Concentrate’s pattern discovery can be, we put together a case study of the travel sector using the Plus version of Concentrate and the type of competitive search data available from commercial providers like Hitwise or Compete. We will go into the details tomorrow, but here is a sneak peek at the results. This chart shows the share of travel searches by site in Spring 2006 and was generated using reports downloaded from Concentrate pattern discovery:

Travel Sector Searches: Comparing sites by pattern share

long tail query patterns from Concentrate

The Long Tail of Search

Search analytics starts by looking at the most frequent search queries driving traffic to your site or that of your competitors (these are often called the “head queries”). For most sites, these queries are a fraction of your total search traffic and just the tip of the iceberg in terms of insight about your audience. Queries like “cheap hotels in liverpool ny” may only occur once or twice in a given month, but when aggregated with other rare phrases can make up the bulk of your traffic.

The concept of the long tail in business intelligence has been a topic of debate over the last few years. One area where the long tail is alive and well is in search. The landscape of user search queries is dominated by the long tail, and most studies indicate that referrals from these long tail phrases are more likely to lead to purchases on your site. Natural search isn’t the only area where the long tail turns out to be critical. Paid search efforts which ignore the long tail are potentially missing out on a large chunk of revenue. The challenge of the long tail is that dealing with massive amounts of query data quickly becomes unmanageable.

Traditional Search Reports: head queries for some top travel sites

traditional search keyword reports

If you have hundreds of pages of unique queries to sort through manually, forming an actionable view of that data is a painful process. This is why most people only look at the first few pages of queries.

Categorizing Queries using Patterns

Finding frequent search patterns is the key to making search data understandable. Patterns let you treat groups of long tail searches like popular individual queries.

Our concept of patterns is similar to an example described by Brian Brown in a recent SEOMoz post. Patterns are templates for searches that have a similar structure. For instance, the pattern “jobs in [x]” represents searches for jobs in some location. The “[x]” is a wildcard that can stand for one or more words. These “masked terms” are often variants of similar concepts, like locations or celebrity names. Depending on the nature of your site, up to 80% of your long tail search traffic could be summarized using just the top 20 query patterns.
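To make the idea of a pattern concrete, here is a deliberately naive Python sketch. It is not Concentrate’s algorithm (which isn’t described here); it simply masks every contiguous run of words in each query with “[x]”, keeps templates matched by at least two different queries, and ranks them by traffic. The function names and sample queries are invented.

```python
from collections import defaultdict


def candidate_patterns(query):
    """All templates made by replacing one contiguous run of words with [x]."""
    words = query.split()
    patterns = set()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            if end - start == len(words):
                continue  # keep at least one literal word
            patterns.add(" ".join(words[:start] + ["[x]"] + words[end:]))
    return patterns


def discover_patterns(query_counts, min_queries=2):
    """Return (pattern, traffic) pairs matched by at least min_queries queries."""
    traffic = defaultdict(int)
    matched = defaultdict(set)
    for query, count in query_counts.items():
        for pattern in candidate_patterns(query):
            traffic[pattern] += count
            matched[pattern].add(query)
    keep = [(p, traffic[p]) for p in traffic if len(matched[p]) >= min_queries]
    # Most traffic first; on ties prefer the pattern with more literal words.
    return sorted(keep, key=lambda item: (-item[1], -len(item[0].split()), item[0]))


# Invented queries and frequencies.
sample_queries = {
    "jobs in atlanta": 5,
    "jobs in new york": 3,
    "jobs in boston": 2,
    "cheap hotels": 4,
}
patterns = discover_patterns(sample_queries)
print(patterns)
```

This brute-force masking would be far too slow and noisy on millions of queries, which is exactly why a scalable pattern discovery algorithm is worth having.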

Concentrate Pattern Summary View for TripAdvisor.com

Example of Concentrate search pattern view

The next iteration of Concentrate’s learning algorithms will replace many of these wildcards with named entity labels. For example: “hotels in [x]” will become “hotels in [City]”. See our FAQ for more details on special pattern categories like navigational queries. Tomorrow, we’ll cover the travel case study in detail.




Here at top-secret Juice headquarters, some major new products are in the works, and we want to promote them with Google’s revenue powerhouse (also known as Google AdWords). Thus, after three weeks of self-imposed AdWords boot camp, I have emerged with a few scrapes and burns, along with some tips that I wish I had been armed with since the beginning.

The natural place to start learning about Google AdWords is the official Help Center, an expansive and neatly categorized resource. But what happens if your inhuman schedule or dwindling coffee supplies don’t allow you the luxury of navigating through the help center hierarchy or sifting through its search results? While you might be able to maintain a semblance of a campaign without answering those lingering questions, you run a high risk of letting potential viewers slip away, never seeing your ad, and wasting money on high CPCs (cost-per-click).

You are hereby invited to learn from my mistakes. I am forgoing the usual basic topics in favor of questions whose answers are more time-consuming and tedious to find. It took me a few weeks to get comfortable with AdWords and figure out these answers myself, but it will only take you a few minutes!

Read on to learn the answers to:

  1. How creative should I be with my ad text?
  2. How do I find out what keywords my competitors are using?
  3. Why has Google’s heartless algorithm condemned my keyword as inactive?
  4. How do I get bolded words in my ad?
  5. What is dynamic keyword insertion, and how do I use it?
  6. What is the difference between a campaign and an ad group?
  7. What is the difference between keywords and placements?

1. How creative should I be with my ad text?

When I was but an AdWords newbie, I held the misconception that creative ads were all that I needed to pull in clicks. Pop psychologists might credit my right brain, starved for attention in the left brain’s home turf (programming! algorithms! programming these algorithms!), for seizing upon the opportunity to design some artistic and imaginative ad copy:


The “Viva la Revolucion” ad was my baby. But it turned out to have a face only a mother could love, as evidenced by the zero people who clicked on it. To the stunned disappointment of my right brain, Google AdWords is just as algorithm-fueled as any of Google’s other products. In fact, Google AdWords runs much like the ubiquitous search engine does, treating your keywords, ads, and landing page similar to the way it treats the 1 trillion pages it crawls while looking for content.

2. How do I find out what keywords my competitors are using?

Google won’t tell you: it’s in their privacy policy. But services such as KeywordSpy will. KeywordSpy not only gives you lists of your competitors’ (and your potential) keywords, but also provides other metrics for each keyword, including ROI, price per click, and number of competitors.

3. Why has Google’s heartless algorithm condemned my keyword as inactive?

Sometimes, Google will refuse to show ads for certain keywords unless you pay an absurdly large CPC. The large CPC is meant to discourage you from following any of these bad habits:

  • You dumped a lot of unrelated (or weakly related) keywords into one gigantic ad group.
    Try making many smaller ad groups, each with its own tightly-connected set of keywords. Ideally, every keyword in a given ad group is a synonym for all the other keywords in the ad group. This also helps tremendously with writing ads that use dynamic keyword insertion (see question #5), since forcing ads to accommodate keywords covering a wide range of topics and/or parts of speech makes the ads vague and unspecific. To find keywords that deserve synonym status, use Google Sets. It’s like a thesaurus on steroids.

  • Your keyword, ads, and landing page aren’t “relevant” enough to each other.
    All members of the Holy Trinity of content (keywords, ads, and landing page) need to draw from the same words to be considered related. Try making sure that they line up.

  • The cost per click you set for that keyword falls below the minimum.
    This is the nicer way of saying that you have to spend more money.

4. How do I get bolded words in my ad?

You can’t designate specific words to be bolded (or formatted in any way, for that matter). You can, however, make sure to include keywords (words the user types in that you have selected for your ads) in your ad title and/or body. Just as it bolds keywords in search results, Google bolds keywords in ads. Your keywords do not have to be exact matches with the words in your ad. In the example below, a search for the keyword phrase “report automation” produces an ad that not only bolds “report” and “automation,” but also their variants “reports” and “automating.”

5. What is dynamic keyword insertion, and how do I use it?

This technique (sometimes known as “wildcards”) is how eBay and Target can pull off “Buy _____ now” for every conceivable adjective-noun combination. It allows you to make the same ad apply to multiple keywords. The format is:

The word immediately following the colon (no spaces) indicates the word you want to be shown when the keyword is too long to fit in the ad. Since I chose that word to be “executive dashboards,” the ad prompted by a too-long keyword would look like this:

Here is the same ad with other keywords swapped in, thanks to dynamic keyword insertion:



You can tweak the capitalization of the keyword with Google’s guidance, in the form of this handy table and more.
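To see the substitution rule in action outside of AdWords, here is a small Python simulation. It assumes the placeholder is written as `{KeyWord:default text}`, that the capitalization of the word “keyword” inside the braces controls the casing of the inserted text (only three variants are handled here), and that the line being filled is limited to 25 characters. Treat the limit and the casing rules as my assumptions and check them against Google’s table; `insert_keyword` is a hypothetical helper, not part of any Google tool.

```python
import re

# Only three casing variants are modelled in this sketch.
PLACEHOLDER = re.compile(r"\{(keyword|Keyword|KeyWord):([^}]*)\}")


def capitalize(keyword, style):
    if style == "keyword":  # all lowercase
        return keyword.lower()
    if style == "Keyword":  # first letter only
        return keyword[:1].upper() + keyword[1:].lower()
    # "KeyWord": first letter of every word
    return " ".join(w[:1].upper() + w[1:].lower() for w in keyword.split())


def insert_keyword(template, keyword, max_length=25):
    """Fill the placeholder with the keyword, or with the default if too long."""

    def render(use_keyword):
        def replacement(match):
            if use_keyword:
                return capitalize(keyword, match.group(1))
            return match.group(2)  # the text after the colon

        return PLACEHOLDER.sub(replacement, template)

    text = render(True)
    return text if len(text) <= max_length else render(False)


headline = "{KeyWord:Executive Dashboards}"
short = insert_keyword(headline, "report automation")
too_long = insert_keyword(headline, "automated financial reporting software")
print(short)
print(too_long)
```

The second call falls back to the default because the capitalized keyword would run past the assumed 25-character limit.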

6. What is the difference between a campaign and an ad group?

A campaign is made up of one or more ad groups. Each campaign has one budget (i.e., $10/day) that is shared between all of its ad groups. Each ad group can be customized with different ad variations, keywords, placements, days and times the ad is shown, etc. Therefore, most modifying and experimenting happens on the ad group level.

7. What is the difference between keywords and placements?

Keywords produce what people usually think of when they think of Google AdWords. When a user performs a Google search for a keyword you have selected, your ad appears on the side (or top, if your budget is very generous) of the results page. Placements occur in the “content network,” which is made of individual sites that get paid to show Google ads. If you sign up for a lot of placements, you’ll get a lot of clicks—but only because of the sheer volume of people seeing your ad. In some ways, placements are less targeted than keywords because people who clicked on your ad in the content network aren’t actively searching, as they are when they find your ad through natural searches. There are two types of placements:

  1. Placements You Select
     Google’s Placement Tool allows you to browse a gigantic list of sites organized by topic. Any of these sites could have your ad on it. The Placement Tool will also suggest sites and break down your potential audience by demographic.

  2. Placements Google Selects
     Google will select sites in the content network based on information from your current campaign. These sites may make up the bulk of your impressions and clicks on the content network and in general (in other words, clicks from Google’s selected placements may outnumber both clicks from your selected placements and clicks from organic searches).

This list is by no means a comprehensive examination of AdWords, but at least now you can consider yourself three weeks wiser and three weeks closer to writing one that is.




Last week, we shared a rendition of a Tufte graphic using just a few lines of Nodebox code. As our commenters pointed out, Python is great, but it may not be every business analyst’s carnal desire to learn a programming language just to generate some nifty graphs. I spent some time to push Chris’s Nodebox rendition into a PIL-based Windows tool that can generate the same sort of comparison graph from an Excel file on the fly.

The result is The Comparison Chart Generator 1.0. The installation instructions are relatively simple. Unzip the zip file, and run comparisionchartgenerator.exe.

Alternatively, we have a new Excel chart that creates the same effect using only Excel functionality. Download the Excel Tufte Line Chart here.

If you are using the Chart Generator, start with some data in an Excel (xls) or Comma Delimited (csv) format. The data for this graph has to be contained within the first sheet starting with cell A1, as in the following picture.

Excel Dialog

Select an input file. There are a couple example files bundled with the download.

Open File Dialog

After selecting a file, you’ll be prompted to modify a few of the basic options available for the chart.

Options Dialog

Finally, save the result as a jpeg.

Save File Dialog

Here is the same image found in Tufte’s textbook processed using the Comparison Chart Generator. It is generated using the csv example file bundled with the download.

Tufte-esque Chart by Comparison Chart Generator

Those of us who have undergone lasik eye-improvement surgery may still prefer the sharp crisp Nodebox results, but for the rest of us, this image looks pretty good. Let us know if this tool is useful. If there is enough of a positive response, we may consider expanding functionality for other fancy Tufte-esque charts.

If you do prefer Nodebox, I have an updated script here. This pushes the script up to 20 lines of code or so, but the extra 9 lines allow the labels to push themselves apart on their own. If you want to look at the source code for the Windows program, you can get it here. I used py2exe to compile it into an executable. The code, however, has not been thoroughly commented or cleaned as of yet, so edit it at your own risk.




Have you run into this problem: you have a list of phone numbers and associated values which would be best shown geographically to see patterns, but there isn’t a clear way to put the data on a map. Maybe you’d like to see a map of customer service calls by call duration or inbound sales by average order size.

I wanted to share how to MacGyver a solution with a piece of twine, bubble gum, Excel, and a free online map tool. To me, this is a nice testament to the simple but powerful data visualizations that can be accomplished without programming skills or expensive applications.

1. Pull out area codes

First I pulled the area codes from my list of phone numbers using the formula below. This simply checks if the phone number starts with 1, then grabs the appropriate three digits for the area code.

=VALUE(IF(LEFT(E7,1)="1",MID(E7,2,3),MID(E7,1,3)))
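If your phone numbers live outside Excel, the same check is easy to do in Python. This sketch goes one step further than the formula by stripping punctuation first, so it also copes with numbers like 1-404-555-0123; that extra step and the function name are my additions.

```python
import re


def area_code(phone):
    """Return the three-digit area code, skipping a leading 1 if present."""
    digits = re.sub(r"\D", "", str(phone))  # keep digits only
    if digits.startswith("1"):
        digits = digits[1:]
    return int(digits[:3])


print(area_code("1-404-555-0123"))
print(area_code("(404) 555-0123"))
```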

2. Convert area codes into states

For my purposes, mapping the phone numbers by state was sufficient. Ideally, we would map the phone numbers to precise latitude and longitude coordinates by doing a reverse lookup of addresses then using the Excel geocoding tool.

First I needed a lookup table that could link my list of area codes to states. I wasn’t able to track down a good data table, so I grabbed the data from All Area Codes and cleaned it up. Here is a lookup table of area codes by state.

An aside: I have a pet peeve with people who sell data that feels like it should be publicly available. You’ll run across these businesses when looking for basic information about ZIP codes, MSAs, or area codes. Here is an example of one of these parasitic businesses.

Zip code product

3. Create your summary data set

I used a pivot table to summarize metrics by state.
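Steps 2 and 3 can also be sketched in Python: a lookup from area code to state, then a count and average per state, which is roughly what the pivot table produces. The lookup below holds only a handful of area codes for illustration (the real table is much longer), and the call durations are invented.

```python
from collections import defaultdict

# A few well-known area codes only; a real lookup table covers every code.
AREA_CODE_TO_STATE = {212: "NY", 718: "NY", 312: "IL", 404: "GA", 415: "CA", 617: "MA"}


def summarize_by_state(calls):
    """calls: iterable of (area_code, value). Returns {state: (count, average)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for code, value in calls:
        state = AREA_CODE_TO_STATE.get(code)
        if state is None:
            continue  # unknown area code: leave it out of the summary
        totals[state][0] += 1
        totals[state][1] += value
    return {state: (n, total / n) for state, (n, total) in totals.items()}


# Invented call durations in minutes.
calls = [(212, 4.0), (718, 6.0), (404, 3.0), (999, 1.0)]
summary = summarize_by_state(calls)
print(summary)
```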

4. Create colorized map of the US

Our friend Ducky Sherwood has generously put together an online tool called Mapeteria that will generate a colorized overlay of US states. In Ducky’s words: “Want to make a choropleth thematic map (i.e. coloured based on your data) for Canadian provinces, U.S. states, or French départements?” This overlay can be viewed in either Google Maps or Google Earth.

Here’s where it gets a little tricky. You will need to provide Mapeteria with a URL to a properly structured CSV file. Posting a CSV file to a web server isn’t trivial if you aren’t running your own web site. I found one free service called FileDEN that did the job (other suggestions?). Beware all the advertising—and in all likelihood they immediately sold my e-mail address at registration. Nevertheless, you can upload a file here and it will give you a URL which can be used to create your map.

Here’s an example of the results:

State Map




This New York Times cancer graph is a beautiful piece of work.

NY Times cancer graphic

I wanted to see if we could reproduce it with everyday tools.

Excel reproduction of the NY Times cancer graphic

Click here to watch a screencast showing how it was done. Warning: the screencast is a little long (14 minutes) and a little unpolished. One cut, no retakes, banzai analytics!

Derek raised an interesting question about how to find the fonts used by the New York Times. While I don’t think you can find a high quality free version of these fonts (Helvetica Neue, Univers?), Microsoft has made some very good new fonts for Vista and these are also available to Microsoft Office users through a compatibility pack. Here’s a link, or search Google for “microsoft office compatibility pack”. I recommend using these fonts.

Here’s a version of the graph with these new fonts and more emphasis on getting the typography right.

Excel reproduction of the NY Times cancer graphic with better fonts
