More meaningful Big Data

From time to time we call in some data muscle to help on a project or to brainstorm about a problem. Harold, Amanda and Sean at Five x Five have been awesome data resources, colleagues, and a valuable part of our little Atlanta Data Village.

Here is a link to their website as well as the meaning of Five x Five in case you don’t know.  Great name for a data company.  We finally convinced Amanda to write a blog and share some thoughts.  Enjoy!


Needle In A Haystack

“I have all of this data, but nobody knows what to do with it.”

We hear some version of this phrase, often with a heavy dose of exasperation, during preliminary meetings with nearly all of our clients.

The vast amount of data – be it transactional, customer-level, SKU, property, etc. – available these days can be as much a source of stress for marketers as it is a source of value. “Big Data” is now practically a household term, but what do we DO with it? How do we make it manageable?

Smaller, manageable and meaningful

For us at FIVExFIVE, data is all about delivering meaningful insights through segmentation, prediction, and optimization. The reality is that before we can get to any “fun analytical stuff”, we spend about 80% of our time on exploration and cleansing – discovering the “smaller, manageable, and meaningful” Big Data.

Here is a small taste of the steps involved in getting your data ready.

Data Validation

  • What is a unit? (e.g. transaction, customer, product sku)
  • How many records do we see vs. expect to see? (Important when importing/exporting)
  • Is our data unique by unit? Unit + time? Unit + time + space? Etc.
  • What level do we need for our analysis? E.g., should transaction-level data be aggregated to person? Store? Product? Property?
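To make the validation questions above a little more concrete, here is a minimal Python sketch. It assumes records arrive as a list of dictionaries with hypothetical fields customer_id, date and amount; the helper names are ours, made up for illustration. It compares the record count with what we expect, checks whether the data is unique at a given level (unit, or unit + time), and rolls transaction-level rows up to one total per customer.

```python
from collections import Counter

# Hypothetical transaction records; field names are illustrative assumptions.
records = [
    {"customer_id": "C1", "date": "2015-03-01", "amount": 20.0},
    {"customer_id": "C1", "date": "2015-03-02", "amount": 35.5},
    {"customer_id": "C2", "date": "2015-03-01", "amount": 12.0},
    {"customer_id": "C2", "date": "2015-03-01", "amount": 8.0},
]

def check_count(rows, expected):
    """Compare the number of records we see with the number we expect to see."""
    return len(rows) == expected

def duplicate_keys(rows, key_fields):
    """Return key combinations that occur more than once, i.e. where the data is NOT unique at this level."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

def aggregate(rows, key_field, value_field):
    """Roll row-level values up to one total per key (e.g. transaction level to customer level)."""
    totals = {}
    for r in rows:
        totals[r[key_field]] = totals.get(r[key_field], 0.0) + r[value_field]
    return totals

print(check_count(records, expected=4))                   # True
print(duplicate_keys(records, ["customer_id", "date"]))   # {('C2', '2015-03-01'): 2}
print(aggregate(records, "customer_id", "amount"))        # {'C1': 55.5, 'C2': 20.0}
```

Here the data is not unique by customer, and not even by customer + date, which is exactly the kind of finding that decides what "a unit" means before any modeling starts.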

Pattern discovery

  • Displaying distributions to identify outliers or vast differences in group sizes
  • Graphing networks to uncover clusters of data
  • Uncovering patterns in time, space, or multivariate relationships
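As a rough illustration of the first bullet, the sketch below prints a crude text histogram, flags outliers with Tukey's 1.5 × IQR fences (a common rule of thumb we picked for the example, not the only option), and measures how uneven group sizes are. The spend values and region labels are made up.

```python
import statistics
from collections import Counter

def text_histogram(values, bins=5):
    """Print a crude text histogram so the shape of the distribution is visible at a glance."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1          # avoid a zero step when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)   # the maximum lands in the last bin
        counts[idx] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.1f} | {'#' * c}")

def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def group_size_ratio(labels):
    """Largest group size divided by the smallest; a big ratio signals very uneven groups."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

spend = [10, 12, 11, 13, 12, 14, 11, 95]    # hypothetical order values
text_histogram(spend)
print(iqr_outliers(spend))                   # [95]

regions = ["East"] * 90 + ["West"] * 10      # hypothetical group labels
print(group_size_ratio(regions))             # 9.0
```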

Data Cleansing

  • Standardizing formats (e.g. date, region names)
  • Removing or replacing invalid characters, including stray non-numeric characters in numeric data
  • Determining the difference between “missing” and “zero” data
  • Code verification (e.g. frequency tables for dummy variable creation)
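Here is a minimal sketch of a few of those cleansing steps in plain Python. The accepted date formats (including the US month-first reading of slashed dates) are assumptions; swap in whatever your files actually contain. Note how clean_numeric keeps a genuine 0 as 0.0 but returns None for a blank field, so "missing" and "zero" stay distinct, and how a frequency table exposes stray code variants before dummy variables get built.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input formats (slashed dates read as US month-first); adjust to your own data.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"]

def standardize_date(raw):
    """Return the date as ISO 8601 text (YYYY-MM-DD), or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_numeric(raw):
    """Strip characters such as '$' and ','; keep a real 0 as 0.0 but return None for missing values."""
    if raw is None or raw.strip() == "":
        return None                              # missing: nothing was recorded
    cleaned = re.sub(r"[^0-9.\-]", "", raw)      # keep digits, decimal point and minus sign
    try:
        return float(cleaned)
    except ValueError:
        return None                              # no usable number survived cleaning

def code_frequencies(codes):
    """Frequency table of category codes after trimming and upper-casing, checked before creating dummies."""
    return Counter(c.strip().upper() for c in codes)

print(standardize_date("03/01/2015"))   # 2015-03-01
print(clean_numeric("$1,250.00"))       # 1250.0
print(clean_numeric("0"))               # 0.0 (a real zero)
print(clean_numeric(""))                # None (missing, not zero)
print(code_frequencies(["NE", "SE", "ne ", "SE", "MW"]))
```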

Data Analysis

  • Segmentation (grouping units into homogeneous groups)
  • Relationship between dependent variable and covariates, if prediction is the goal
  • Choosing statistical methods depending on relationships and distributions
  • Determining optimal mix of decisions to achieve goals (minimize, maximize, etc.)
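Segmentation can be done in many ways. As one illustration, here is a bare-bones k-means (Lloyd's algorithm) in plain Python, grouping hypothetical customers described by two made-up features: visits per year and average spend. k-means is simply a common technique chosen for this sketch, not necessarily the method we would pick for a given project, and the "first k points" initialization is an assumption that keeps the example reproducible.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(p) for p in points[:k]]   # naive init: first k points (assumption, for reproducibility)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the centroid at the smallest Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (visits per year, average spend in dollars).
customers = [(2, 20.0), (3, 25.0), (2, 22.0), (30, 210.0), (28, 190.0), (33, 205.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Because distance is computed on raw values, spend dominates visits here; in practice you would usually standardize each feature first so that no single variable drives the segments just because of its units.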

Variable Reduction

  • Correlations to discover redundant or nearly redundant variables
  • Factors/Proxies (summarizing a group of related variables with a few factors or one representative variable)
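For the correlation step, a minimal sketch: compute the Pearson correlation for every pair of columns and flag pairs above a cutoff as candidates for dropping one variable. The 0.9 threshold and the column names are assumptions for illustration, and the factor/proxy step is not shown.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (raises ZeroDivisionError for a constant column)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair whose absolute correlation meets the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Hypothetical columns: spend_cents is just spend_usd in different units, so it adds nothing new.
columns = {
    "spend_usd":   [100, 200, 300, 400, 500],
    "spend_cents": [10000, 20000, 30000, 40000, 50000],
    "visits":      [5, 3, 8, 1, 4],
}
print(redundant_pairs(columns))   # [('spend_usd', 'spend_cents', 1.0)]
```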

Once your data has been cleansed and processed, you can start trying to answer some of the BIG questions. What will be actionable, interpretable, and relevant? It's only after you figure out the relevant questions that you can begin to narrow down the sometimes thousands of columns to determine what really drives business results, satisfaction, or profitability.

But wait, there’s more.

Even after you’ve made sense of the data and developed an analytical solution, you’re not finished. You have to visualize and present the results so that decision-makers value the analysis, make a decision, and want more. Keeping it simple, saying as much as you can with as little clutter and extraneous display of data as possible, is an art. Trust us. It isn’t easy for statisticians to admit this, but delivering beautiful, much-appreciated visualizations is as much fun (and as valuable) as modeling terabytes of segmentation data.

Marrying all of these disciplines and steps is what it takes to turn your Big Data into the Best Data.

Have a perspective about “the process” that differs from ours? We’d love to hear your thoughts.  Drop us a note at