Wordtree for Visual Text Exploration

Analytics can be all about having the right tool for the job. When your data is text, using traditional analysis tools (e.g. Excel, OLAP tools) is like peeling a mango with a chainsaw.

There are a number of visual exploration tools specifically designed for text data, including:

  • Word clouds like Wordle (fun but superficial);
  • Network diagrams like Visual Thesaurus (good for individual words, not text);
  • Trend graphs like Baby Name Voyager or Google Trends;
  • Granular presentations for interacting with and exploring individual phrases, e.g. We Feel Fine and Twistori;
  • "Word trees" that let you navigate through lines of text to understand the most frequent words, relationships between words, and common phrase and sentence structures.

It is quite difficult to find a Word Tree in the wild. The brilliant team at IBM's Many Eyes was the first to make Word Trees generally available. The same Many Eyes team has also created an alternative approach to visual text exploration with a tool called Phrase Net.

Phrase Net

Recently, we built a slightly different take on the Word Tree in Concentrate, our tool that lets users explore huge search query lists to see how people use search keywords. For geeky entertainment, we created a special Concentrate demo account with the lyrics of songs from Rolling Stone's 500 Greatest Songs of All Time. Click here to sign in to the demo (press Submit and then choose WordTree at the top).

Here's how our Word Tree works:

  • The box at the center is your starting point. When you open a Word Tree, it will contain the most common word in the text data. You can edit this box to "re-center" the wordtree (name that tune):

Wordtree image

  • Stretched out on either side are words and phrases that are tied to that center word. The size of the words represents their relative frequency.

Wordtree image

  • Rolling over the words/phrases will highlight the connections to your center word and on the other side. You'll also see a pop-up box with examples of the phrases containing selected words.

Wordtree image

  • You can open or close branches by clicking on a word. Words with hidden branches are highlighted in orange. We also have the ability to colorize the words based on a metric in your text data.
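For the curious, the counting behind a tree like this can be sketched in a few lines of Python. This is a rough illustration only, not the code behind Concentrate: the function names (`tokenize`, `most_common_word`, `branches`), the `depth` parameter, and the sample lines are all made up for the example. It picks the most common word as the center, then counts the phrases that appear on either side of it. Those counts are what would drive the word sizes.

```python
from collections import Counter


def tokenize(line):
    """Lowercase a line and split on whitespace, trimming edge punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in line.lower().split())
    return [w for w in words if w]


def most_common_word(lines):
    """Return the most frequent word across all lines (the default center)."""
    counts = Counter(w for line in lines for w in tokenize(line))
    return counts.most_common(1)[0][0]


def branches(lines, center, depth=2):
    """Count the phrases that precede and follow each occurrence of `center`.

    `depth` is a hypothetical knob: how many words to take on each side.
    """
    left, right = Counter(), Counter()
    for line in lines:
        words = tokenize(line)
        for i, w in enumerate(words):
            if w != center:
                continue
            before = words[max(0, i - depth):i]
            after = words[i + 1:i + 1 + depth]
            if before:
                left[" ".join(before)] += 1
            if after:
                right[" ".join(after)] += 1
    return left, right


# Made-up sample text, standing in for song lyrics or search queries.
lines = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog chased the cat",
]
center = most_common_word(lines)
left, right = branches(lines, center, depth=1)
```

Re-centering the tree is just a matter of calling `branches` again with a different `center`. A real tool would also need to handle stop words, stemming, and much larger inputs, none of which this sketch attempts.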

While these more advanced visualizations are a start, I suspect there is a lot of room for other tools and techniques to visually explore text data. I'd be curious to hear about other tools you've seen along these lines.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

4 comments


May 26, 2009
Jen said:

I had a lot of fun playing with the masters.org word trees this year: http://www.masters.com/en_US/visualization/index.html It did feel very linear though ... seems like there needed to be more ways to explore. OTOH, since I truly was just playing without an objective in mind, it's tough to say what I would "need".


May 26, 2009
tim said:

You may already be aware of these, but just in case:

www.notcot.com/archives/2008/04/stefanie_posave.php

neoformix.com/2008/StephaniePosavec.html

neoformix has lots of other text analyses scattered through the archives.


June 4, 2009
Aseem said:

One of the things that would be cool is to be able to color code the words in terms of an event (orders, getting to page x, email capture, etc.). That way you could look to see or create new high-conversion phrases. Of course you could end up with really dumb combinations, but it would be interesting.


June 4, 2009
Zach said:

Aseem, our wordtree actually does that -- it just requires that you have that additional metric for each word/phrase/sentence as you suggest.






Gartner Identifies the "Last Mile" of BI

Over the past few years we've made the point that today's BI vendors stop short of joining data to decision makers at the point of decision and action. We like to call this problem the "last mile". As it turns out, Gartner does, too.

According to a recent article, Gartner analyst Kurt Schlegel states in the report "Overcoming the Gap Between Business Intelligence and Decision Support" that most companies still aren't able to link BI to "the last mile" of making decisions that actually help their businesses.

Gartner joins a short list of other prominent voices (Tableau, SAS) in the BI community that have already come on board with Juice on this concept. We're very glad to see others addressing the gap of making information really and truly useful for decision makers.

While we're at it, that's not the only theme that has seeped into the Gartner perspective: Gartner's global BI manager Ian Bertram says the fundamental problem with BI isn't about technology; it has to do with making BI work better for people. In other words, "BI isn't a technical problem, it's a social one."

So Gartner folks, if you're out there and following our blog, we're excited to see you coming alongside us. And as long as you're listening, here are a few other ideas we'd love to see you consider as well:


2 comments


May 13, 2009
Brian Timoney said:

I happily thieved the "last mile" meme for a Geospatial conference last year, so add me to the list of shameless plagiarizers.

I've since simplified the concept and now bandy about "Screen Captures in the Boardroom" to illustrate the disconnect.

Brian


May 14, 2009
James Taylor said:

I think the last mile also needs to consider the fact that the person on the end of the chain may be a decision-deliverer not a decision-maker so the system may need to embed the decision-making not just deliver the information. I like the phrase Action Support system, in contrast to Decision Support, in this regard.
Check out http://jtonedm.com/2009/04/13/from-decision-support-to-action-support/

JT






Breaking Free of the One-Page Dashboard Rule

Conventional wisdom says that an executive dashboard must fit on a single page or screen. The argument hinges on a pair of assertions about this constraint: it provides necessary discipline to focus on only the most critical information; and it enables the audience to see results "at a glance."

The "discipline" argument is made forcefully by Avinash Kaushik (among others).

"if your dashboard does not fit on one page, you have a report, not a dashboard...This rule is important because it encourages rigorous thought to be applied in selecting the golden dashboard metric."

I buy wholeheartedly into the value of constraints. However, defining a useful constraint as a "rule" assumes there is only one viable means to achieve the desired ends. Confining visual real estate is but one way to focus your thinking. There are others: How about limiting yourself to five key measures? How about demanding that a dashboard can be understood in 3 minutes by a new user? How about only presenting exceptions?

The argument that a one-page dashboard necessarily provides a view of your business "at a glance" is more self-deceiving. Well-known information-ista Stephen Few uses this rationale in his definition of a dashboard:

A visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance. PDF

I check my speedometer "at a glance". I "glance" at a Heads-up Display (HUD) on a video game showing how much energy my character has remaining. These displays communicate but a single number that is already hovering on the corner of my consciousness. If we follow this advice literally, we'd show:

Acme Widgets Dashboard

Assuming one page gives you quick, easy comprehension is like assuming all red cars are fast. That's simply not true. It must be duly noted, however, that all red cars are cool.

Stretch Trabant image courtesy jetow@flickr.com

More often, people follow the one-page dashboard rule off a cliff like these folks.

dashboard

There are real problems with this definition:

Dashboard definition

  • In reality, the one-page rule leads to jamming information into the available space.
  • When everything must fit on a page, there isn't room to describe the connections between information or fashion a story from the data.
  • A good dashboard raises more questions than it can answer. Sticking to a static piece of paper limits any ability to find or present explanations.

Don't get me wrong: a one-page dashboard is often an effective way to create "a visual display of the most important information needed to achieve one or more objectives." But with streaming video, interactive visualizations, podcasts, Kindles, smart phones, video projectors... is it really necessary to limit ourselves to an 8.5" x 11" piece of paper? Or might we open ourselves up to some more creative solutions for sharing the numbers: a short movie, a few slides, a short text narrative, or 140 characters?

I'd like to use this definition instead and will be back soon with some ideas on how to make your dashboards clear and concise.

Dashboard definition


10 comments (only the last 5 are shown)


May 11, 2009
Jose said:

I suggest that a "dashboard" as defined by Few is only one of many views necessary to tell a story: it answers 1) the "What", as in "Status, Trend/Change or Contribution". We also need 2) supporting evidence, "Who, When and Where" (Details), and 3) Analysis, "Why and How".

In general, even with just a few metrics, I find that explaining just "what" is challenging enough in one page. Think of the financial page of a newspaper: it demands a combination of graphs, tabular info and narrative text. So when creating an information display, I try to organize different views according to their purpose - and make them display/print in one page.

The opportunity that computers give us over paper is the ability to link all different views with common filters, so that the user is able to iterate on formulating and answering questions in the display (cycling through Shneiderman's information seeking mantra as many times as needed).


June 23, 2009
Chris Curran said:

Good points Zach, especially the one you make in the comments regarding understanding audience for a dashboard. In my experience, senior business leaders don't have the time or attention span for a desktop-based UI dashboard. So paper and/or blackberry/mobile must be considered, at least for the "overview first" level of information.

More on my blog at http://www.ciodashboard.com/


July 8, 2009
Stephen Few said:

Zach,

Your definition differs from mine because we seem to be talking about different things entirely. I define a dashboard as an information display that is used to "monitor" what's going on. You are referring to a display that is used for data analysis or telling a story (two very different forms of data presentation which can't be displayed in the same manner).

A display that's used for regularly monitoring what's going on in an effort to maintain situation awareness requires a much different design than one that's used for data analysis or storytelling. When you're monitoring information for situation awareness, you must see the pieces on a single screen or page to make all the necessary connections and comparisons that are needed to build the big picture in your head of what's going on.

If we want to cut through the confusion that exists regarding proper information display, we must be careful to define our terms carefully and declare our purposes clearly. Multiple pages or screens often work well for telling a story, which must be delivered one piece at a time in the proper sequence. Multiple screens can also work well for analysis as your focus changes from the pursuit of one question to another. Multiple screens do not work well when you need to make comparisons and connections, however, because if the things that must be connected aren't in front of your eyes at the same time, you're forced to rely on working memory, which is extremely limited. In other words, the restriction to a single screen in this case is not arbitrary, but based on scientific evidence of what's actually required to do the job.


July 17, 2009
jeffrey weir said:

I see there's a good thread that covers some of this at http://www.perceptualedge.com/discussion.htm under New Topic Proposals/Nomenclature for visualization, dashboards, analytic tools, etc.


July 18, 2009
Zach said:

Stephen,
I can appreciate the distinction between a monitoring tool and an analysis tool. However, I don't think that fully explains our difference of perspective. Even in the case of a monitoring application, a bunch of factors need to be simultaneously optimized to ensure it communicates effectively (e.g. readability, layout and structure, connections and comparison, information design). The one-page constraint elevates the importance of comparison above other factors that have significant impact on the overall success of the dashboard. The constraint has real impacts:
* Tiny fonts and graphics to squeeze in all the information
* Inability to lay out the information to reflect the structure of the business (i.e. show connections)
* Inability to position graphics in ways that support comparison
* All the relevant information has to be shown at once, rather than gradually revealing detail as the user expresses interest.

It is as if you told me that the goal of a new car model is to achieve 40 miles per gallon of gas. It is a fine goal, but it entails sacrifices to comfort, fun, and innovation. You'll never end up with an electric car.
