Looking at Trees to Understand the Forest

David Simon (of The Wire fame) has sucked me into another brilliant television series with Generation Kill. It is the story of a Marine recon unit at the beginning of the Iraq war. At the heart of all the action, the seven-part miniseries offers intimate and honest profiles of individual Marines.

The characters don’t so much displace stereotypes as reveal texture and insight about the unique qualities of individual Marines.

The series got me thinking once again about different ways to analyze data. Almost four years ago, I wrote a couple of blog posts (Part 1 and Part 2) making a case for analyzing and visualizing data at a granular level to uncover patterns and behaviors. Generation Kill is a case study in looking closely at the individual trees to understand the forest.

Analytics is a journey of exploration--a continuous series of iterations with the goal of deeper understanding based on better questions and more targeted analyses. Einstein said:

"To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science."

How to arrive at new questions?

In the previous blog post, I described examples from online learning, credit card usage, and football film study to show how granular analysis can spur new questions. I’ve stumbled across a series of new examples recently:

Surveys. Survey analysis is hard work--just ask Ken, who recently presented results from Juice’s survey on the practice of information visualization in organizations. If a survey is mostly about understanding your audience, rolling up responses by question can’t be the only approach (though it is the most common). Cross tabs ("displays the joint distribution of two or more variables") are one direction to go. Another approach is to look for people who share common characteristics or patterns in their responses.
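Both ideas are easy to try on a small scale. Here is a minimal sketch in plain Python, assuming each respondent is stored as a dictionary of answers. The data, the field names (role, uses_dashboards), and the helper functions cross_tab and group_by_pattern are all made up for illustration; they do not come from Juice's survey or from any particular tool.

```python
from collections import Counter

# Hypothetical survey responses, one dict per respondent (made-up data).
responses = [
    {"role": "analyst", "uses_dashboards": "yes"},
    {"role": "analyst", "uses_dashboards": "no"},
    {"role": "manager", "uses_dashboards": "yes"},
    {"role": "analyst", "uses_dashboards": "yes"},
    {"role": "manager", "uses_dashboards": "yes"},
]


def cross_tab(rows, row_key, col_key):
    """Count respondents for each (row_key answer, col_key answer) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def group_by_pattern(rows, keys):
    """Group respondent positions by their exact combination of answers."""
    groups = {}
    for i, r in enumerate(rows):
        pattern = tuple(r[k] for k in keys)
        groups.setdefault(pattern, []).append(i)
    return groups


table = cross_tab(responses, "role", "uses_dashboards")
patterns = group_by_pattern(responses, ["role", "uses_dashboards"])

for (role, answer), count in sorted(table.items()):
    print(role, answer, count)
```

The cross tab gives the rolled-up counts, while the pattern groups keep a pointer back to each individual respondent, so you can move from a cell in the table to the people behind it.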

Macrofocus’ SurveyVisualizer is the most innovative survey analysis tool I’ve seen, and it emphasizes data at a granular level.

"All the analysis elements are always shown as grey lines in the background. This provides an overview of the ranges and spreads of the individual values for each node, and facilitates the detection of outliers." (from Visualization of Large-Scale Customer Satisfaction Surveys Using a Parallel Coordinate Tree)

Medical research. Research studies are conducted against carefully defined target and control populations, and their conclusions rest on aggregate statistics across those populations. However, the ability to review the patterns of diagnoses and procedures at the individual patient level can help test assumptions about the target population and refine the parameters of a study. Better model inputs; better results.

Speech analytics. Michel Guillet at Nexidia recently told me about their approach to speech data:

Nexidia’s speech analytics can mine thousands of hours of audio to categorize, correlate or spot trends. However, it is quite often in identifying and listening to a lone outlier that the application provides its most valuable insights. Some examples of outliers can be the very long call of a particular call type, the extremely abrupt one, the one with the most languages spoken or the one where no one is speaking at all. An outlier can change your hypotheses and put you in a different direction…perhaps a better one. Nexidia’s reporting and analysis tools offer many different methodologies including histograms, analysis of means charts and flexible filtration by meta-data to identify outliers in large amounts of data. In addition, Nexidia’s ad-hoc search functionality allows users to search an entire body of audio content at any time, which is often helpful to find the “smoking gun” or a single recording which can make or break an argument.
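To make the idea of "the very long call of a particular call type" concrete, here is a rough sketch in plain Python that pulls out the shortest and longest call within each call type. This is not Nexidia's method or API; the call records, the field names (id, type, seconds), and the function extremes_by_type are hypothetical, and picking the minimum and maximum is just the simplest stand-in for a real outlier test.

```python
# Hypothetical call metadata (made-up data, not from any real system).
calls = [
    {"id": "c1", "type": "billing", "seconds": 310},
    {"id": "c2", "type": "billing", "seconds": 295},
    {"id": "c3", "type": "billing", "seconds": 2400},
    {"id": "c4", "type": "support", "seconds": 12},
    {"id": "c5", "type": "support", "seconds": 540},
    {"id": "c6", "type": "support", "seconds": 610},
]


def extremes_by_type(records):
    """Return {call type: (shortest call, longest call)} by duration."""
    by_type = {}
    for call in records:
        by_type.setdefault(call["type"], []).append(call)
    return {
        call_type: (
            min(group, key=lambda c: c["seconds"]),
            max(group, key=lambda c: c["seconds"]),
        )
        for call_type, group in by_type.items()
    }


extremes = extremes_by_type(calls)
for call_type, (shortest, longest) in extremes.items():
    print(call_type, "shortest:", shortest["id"], "longest:", longest["id"])
```

The output is a short list of individual recordings worth listening to, which is the point: the aggregate tells you where to look, and the single call tells you what is going on.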

Of course, you can’t be assured of a full or accurate picture when looking at granular data, but somewhere between standard aggregation-based analysis and granular views lies the truth.