Analytics is often all about having the right tool for the job. When your data is text, traditional analysis tools (e.g., Excel, OLAP tools) are like peeling a mango with a chainsaw.
There are a number of visual exploration tools specifically designed for text data, including:
- Word clouds like Wordle (fun but superficial);
- Network diagrams like Visual Thesaurus (good for individual words, not text);
- Trend graphs like Baby Name Voyager or Google Trends;
- Granular presentations for interacting with and exploring individual phrases, e.g., We Feel Fine and Twistori;
- "Word trees" that let you navigate through lines of text to understand the most frequent words, relationships between words, and common phrase and sentence structures.
It is quite difficult to find a Word Tree in the wild. The brilliant team at IBM’s Many Eyes were the first to make Word Trees generally available. The same Many Eyes team have also created an alternative approach to visual text exploration with a tool called Phrase Net.
Recently, we built a slightly different take on the Word Tree in Concentrate, our tool for exploring large lists of search queries to see how people use search keywords. For geeky entertainment, we created a special Concentrate demo account with the lyrics of songs from Rolling Stone’s 500 Greatest Songs of All Time. Click here to sign in to the demo (press Submit, then choose WordTree at the top).
Here’s how our Word Tree works:
- The box at the center is your starting point. When you open a Word Tree, it contains the most common word in the text data. You can edit this box to "re-center" the Word Tree (name that tune).
- Stretched out on either side are the words and phrases that appear before and after the center word. The size of each word represents its relative frequency.
- Rolling over a word or phrase highlights its connections to the center word and to the words on the other side. You’ll also see a pop-up box with example phrases containing the selected word.
- You can open or close branches by clicking on a word. Words with hidden branches are highlighted in orange. We can also colorize the words based on a metric in your text data.
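To make the mechanics above concrete, here is a minimal sketch of the data structure a word tree can be drawn from. It is not Concentrate's implementation. It assumes the text arrives as a list of lines, picks the most frequent word as the default center, and builds two frequency-counted prefix trees: one for the words that follow the center word and one, read backwards, for the words that precede it. The helper names (`tokenize`, `most_common_word`, `build_word_tree`, `show`) and the `depth` cutoff are hypothetical, and the sample lines are toy data.

```python
import re
from collections import Counter


def tokenize(line):
    """Hypothetical helper: lowercase a line and split it into word tokens."""
    return re.findall(r"[a-z']+", line.lower())


def most_common_word(lines, stopwords=frozenset()):
    """Default center word: the most frequent token not in `stopwords` (None if no words)."""
    counts = Counter(w for line in lines for w in tokenize(line) if w not in stopwords)
    top = counts.most_common(1)
    return top[0][0] if top else None


def new_node():
    # Each node stores how many phrases pass through it and its child words.
    return {"count": 0, "children": {}}


def insert(node, words, depth):
    """Walk down the tree along `words` (at most `depth` levels), incrementing counts."""
    for w in words[:depth]:
        node = node["children"].setdefault(w, new_node())
        node["count"] += 1


def build_word_tree(lines, center, depth=3):
    """Return (left, right) trees of the words before and after each `center` occurrence."""
    left, right = new_node(), new_node()
    for line in lines:
        tokens = tokenize(line)
        for i, w in enumerate(tokens):
            if w == center:
                left["count"] += 1
                right["count"] += 1
                insert(right, tokens[i + 1:], depth)
                # Preceding words are stored nearest-first, so the left tree grows outward.
                insert(left, tokens[:i][::-1], depth)
    return left, right


def show(node, indent=0):
    """Print a tree with the most frequent branches first (ties keep insertion order)."""
    for word, child in sorted(node["children"].items(), key=lambda kv: -kv[1]["count"]):
        print("  " * indent + f"{word} ({child['count']})")
        show(child, indent + 1)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the quick red fox", "a quick brown dog"]
    center = most_common_word(lines)  # "quick"
    left, right = build_word_tree(lines, center)
    show(right)
    # brown (2)
    #   fox (1)
    #   dog (1)
    # red (1)
    #   fox (1)
```

In a visual version, each node's `count` would drive the font size, re-centering amounts to calling `build_word_tree` again with a new center word, and opening or closing a branch simply shows or hides a node's children.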
While these more advanced visualizations are a start, I suspect there is a lot of room for other tools and techniques to visually explore text data. I’d be curious to hear about other tools you’ve seen along these lines.