Bubble, bubble toil and trouble

Recently we wanted to show how Concentrate, our new long-tail keyword tool, could give you a view of search patterns across travel websites. As political junkies, we were inspired by this chart from our friends at the NY Times.

NY Times candidate word bubble chart

The first tool we tried, simply on principle, was Excel 2003. As expected, making a NY Times-quality bubble chart in Excel 2003 is a hard problem. Here’s a draft of how far I got before giving in to label fatigue.

Excel NY Times bubble

The bubbles themselves aren’t tough, but getting the labels right is hard. I’d love to see a solution, so if any reader wants to tackle it, eternal fame can be yours. Here is a CSV if you want to try.


Another of the tools we use at Juice is NodeBox, which we used to make this:

Concentrate pattern comparison

Here’s the code that made the graph.

The power of a programmatic approach like this is that by changing a line or two, you can get the following. Click for a larger version. Click the text for the code.

Display the largest item in each row as a red square
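
The variant above hinges on two small decisions: how a value becomes a bubble size, and which item in a row gets singled out. Here is a minimal sketch of both in plain Python (not the NodeBox script itself). The function name, the site names, and the share values are all made up for illustration. The one real design choice is scaling the radius by the square root of the value, so that bubble area, not radius, tracks the data.

```python
import math

def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius such that bubble AREA is proportional to value."""
    if max_value <= 0 or value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)

# Hypothetical pattern shares for one row of the chart (not real data).
row = {"site_a": 0.40, "site_b": 0.10, "site_c": 0.025}

largest = max(row.values())
radii = {site: bubble_radius(share, largest) for site, share in row.items()}

# The item a variant could draw differently, e.g. as a red square.
highlight = max(row, key=row.get)
```

With these numbers, a site with a quarter of the largest share gets half the largest radius, which is what keeps big values from visually swamping the chart.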

With great power comes a great need to exercise restraint. Otherwise you end up like these poor chaps. Must... flex... restraint... muscles...

Target Long Tail Searches with Keyword Patterns

On Friday, we launched our new keyword tool: Concentrate. One of its key features is a scalable algorithm that automatically discovers patterns in large amounts of search data and clusters long tail queries into manageable groups. This post will explain how using Concentrate’s pattern discovery feature can simplify search data analysis and give you an edge on the competition. To explain how valuable Concentrate’s pattern discovery can be, we put together a case study of the travel sector using the Plus version of Concentrate and the type of competitive search data available from commercial providers like Hitwise or Compete. We will go into the details tomorrow, but here is a sneak peek at the results. This chart shows the share of travel searches by site in Spring 2006 and was generated using reports downloaded from Concentrate pattern discovery:

Travel Sector Searches: Comparing sites by pattern share

long tail query patterns from Concentrate

The Long Tail of Search

Search analytics starts by looking at the most frequent search queries driving traffic to your site or to your competitors' sites (these are often called the "head queries"). For most sites, these queries account for only a fraction of total search traffic and are just the tip of the iceberg in terms of insight about your audience. Queries like "cheap hotels in liverpool ny" may occur only once or twice in a given month, but when aggregated with other rare phrases they can make up the bulk of your traffic.

The concept of the long tail in business intelligence has been a topic of debate over the last few years. One area where the long tail is alive and well is in search. The landscape of user search queries is dominated by the long tail, and most studies indicate that referrals from these long tail phrases are more likely to lead to purchases on your site. Natural search isn’t the only area where the long tail turns out to be critical. Paid search efforts which ignore the long tail are potentially missing out on a large chunk of revenue. The challenge of the long tail is that dealing with massive amounts of query data quickly becomes unmanageable.

Traditional Search Reports: head queries for some top travel sites
traditional search keyword reports

If you have hundreds of pages of unique queries to sort through manually, forming an actionable view of that data is a painful process. This is why most people only look at the first few pages of queries.

Categorizing Queries using Patterns

Finding frequent search patterns is the key to making search data understandable. Patterns let you treat groups of long tail searches like popular individual queries.

Our concept of patterns is similar to an example described by Brian Brown in a recent SEOMoz post. Patterns are templates for searches that have a similar structure. For instance, the pattern “jobs in [x]” represents searches for jobs in some location. The “[x]” is a wildcard that can stand for one or more words. These “masked terms” are often variants of similar concepts, like locations or celebrity names. Depending on the nature of your site, up to 80% of your long tail search traffic could be summarized using just the top 20 query patterns.
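
To make the template idea concrete, here is a rough sketch of how queries could be rolled up under patterns you already know. It is not Concentrate's discovery algorithm, which this post doesn't describe; it only matches queries against a given list of patterns. The helper names and the sample queries are hypothetical, and the sketch assumes “[x]” stands for one or more space-separated words.

```python
import re
from collections import Counter

def pattern_to_regex(pattern):
    """Compile a template like 'jobs in [x]'; [x] matches one or more words."""
    literal_parts = [re.escape(part) for part in pattern.split("[x]")]
    wildcard = r"(\S+(?: \S+)*)"
    return re.compile("^" + wildcard.join(literal_parts) + "$")

def summarize(queries, patterns):
    """Count queries per pattern (first match wins); unmatched go under None."""
    compiled = [(p, pattern_to_regex(p)) for p in patterns]
    counts = Counter()
    for query in queries:
        # Lowercase and collapse whitespace before matching.
        normalized = " ".join(query.lower().split())
        label = next((p for p, rx in compiled if rx.match(normalized)), None)
        counts[label] += 1
    return counts

# Made-up sample queries, for illustration only.
queries = ["jobs in denver", "Jobs in  new york", "hotels in paris", "juice analytics"]
patterns = ["jobs in [x]", "hotels in [x]"]
counts = summarize(queries, patterns)
```

Here the two job searches collapse into a single “jobs in [x]” row, which is the sense in which a pattern lets you treat a group of rare queries like one popular query.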

Concentrate Pattern Summary View for
Example of Concentrate search pattern view

The next iteration of Concentrate’s learning algorithms will replace many of these wildcards with named entity labels. For example: “hotels in [x]” will become “hotels in [City]”. See our FAQ for more details on special pattern categories like navigational queries. Tomorrow, we’ll cover the travel case study in detail.

Introducing Concentrate for Long Tail Search Analytics

We are thrilled to introduce Concentrate™, an innovative long-tail keyword tool. Concentrate is for SEO and paid search professionals who want to make sense of search keyword data and make the most of search investments.

Check out the demo here. Or try out the free version here (you’ll need admin access to a Google Analytics account).

We built Concentrate because we saw a fundamental conflict in the world of search analysis: On the one hand, search keyword data is terrifically interesting and valuable. It can tell you what your visitors and customers want and how they think about you and your products.

Juice Analytics keywords

Unfortunately, search query data is also big, messy, and hard to get your hands around. In a typical month, the Juice site gets over 10,000 visits from over 7,000 unique keywords.

Even if I could somehow wrap my head around our top 100 keywords, I’d only understand 25% of the visits. For people spending money on search engine optimization or paid search campaigns, that’s a big blind spot to accept.
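
If you want to run the same check on your own keyword report, the arithmetic is short. This is a sketch on synthetic numbers, not our actual Juice keyword data, and top_n_coverage is a name invented for this example.

```python
def top_n_coverage(visits_by_keyword, n):
    """Share of all visits that come from the n most-visited keywords."""
    total = sum(visits_by_keyword.values())
    if total == 0:
        return 0.0
    top_visits = sorted(visits_by_keyword.values(), reverse=True)[:n]
    return sum(top_visits) / total

# Synthetic report: two head keywords plus twelve one-off long tail phrases.
visits = {"head query one": 3, "head query two": 1}
visits.update({f"rare phrase {i}": 1 for i in range(12)})

head_share = top_n_coverage(visits, 2)  # 4 of 16 visits
```

In this toy report the two head keywords account for 4 of 16 visits, so reading only the head would leave three quarters of the traffic unexplained.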

We want you to understand and act on all your search data. Concentrate ingests data from sources that most sites already have available (e.g., Google Analytics, Omniture, Coremetrics, Hitwise, Compete), enhances this data by finding common patterns and query types, and visualizes search phrases for exploration and analysis.

Over the next couple of weeks, we will share examples of some of the interesting things you can do with Concentrate, including:

Pattern identification to condense the long tail into keyword phrases with similar structures. For example, here are some common search patterns from a cooking web site (the “[x]” represents a wildcard).


Keyword visualization to show the connections between keywords and the relative performance of phrases. This wordtree shows the frequency of words within phrases (size) and average time spent on site (color).
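
The two measures behind a view like the wordtree (word frequency for size, average time on site for color) can be approximated from a plain keyword report. The sketch below is an assumption about how such numbers could be derived, not a description of how Concentrate computes them; the function name and the sample rows are made up.

```python
from collections import defaultdict

def word_stats(phrases):
    """phrases: iterable of (phrase, visits, avg_seconds_on_site).

    Returns {word: (total_visits, visit_weighted_avg_seconds)}.
    """
    visits = defaultdict(int)
    seconds = defaultdict(float)
    for phrase, n, avg_secs in phrases:
        if n <= 0:
            continue  # skip rows with no visits to avoid dividing by zero
        # Count each word once per phrase, even if it repeats in the phrase.
        for word in set(phrase.lower().split()):
            visits[word] += n
            seconds[word] += n * avg_secs
    return {w: (visits[w], seconds[w] / visits[w]) for w in visits}

# Made-up report rows: (phrase, visits, average seconds on site).
report = [
    ("excel charts", 10, 60.0),
    ("excel dashboard", 30, 100.0),
    ("dashboard design", 10, 20.0),
]
stats = word_stats(report)
```

Weighting the time on site by visits keeps a single long visit on a rare phrase from dominating the color of a common word.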


Congratulations to Chris, Pete, and Sal for all their hard work, diligence, and creative problem solving to launch this solution.