From time to time we call in some data muscle to help on a project or to brainstorm about a problem. Harold, Amanda and Sean at Five x Five have been awesome data resources, colleagues and a valuable part of our little Atlanta Data Village.
Here is a link to their website as well as the meaning of Five x Five in case you don’t know. Great name for a data company. We finally convinced Amanda to write a blog and share some thoughts. Enjoy!
“I have all of this data, but nobody knows what to do with it.”
We hear some version of this phrase, often with a heavy dose of exasperation, during preliminary meetings with nearly all of our clients.
The vast amount of data – be it transactional, customer-level, SKU, property, etc. – available these days can be as much a source of stress for marketers as a source of value. “Big Data” is now practically a household term, but what do we DO with it? How do we make it manageable?
Smaller, manageable and meaningful
For us at FIVExFIVE, data is all about delivering meaningful insights through segmentation, prediction, and optimization. The reality is that before we can get to any “fun analytical stuff”, we spend about 80% of our time on exploration and cleansing – discovering the “smaller, manageable, and meaningful” Big Data.
Here is a small flavor of the steps to get your data ready.
- What is a unit? (e.g. transaction, customer, product sku)
- How many records do we see vs. expect to see? (Important when importing/exporting)
- Is our data unique by unit? Unit + time? Unit + time + space? Etc.
- What level do we need for our analysis? E.g., should “transaction” level data be aggregated to person? Store? Product? Property?
- Displaying distributions to identify outliers, or vast differences in group sizes
- Graphing networks to uncover clusters of data
- Uncovering patterns in time, space, or multivariate relationships
- Standardizing formats (e.g. date, region names)
- Removing/replacing invalid characters, or characters in numeric data
- Determining the difference between “missing” and “zero” data
- Code verification (e.g. frequency tables for dummy variable creation)
- Segmentation (grouping units into homogeneous groups)
- Relationship between dependent variable and covariates, if prediction is the goal
- Choosing statistical methods depending on relationships and distributions
- Determining optimal mix of decisions to achieve goals (minimize, maximize, etc.)
- Correlations to discover redundant or nearly redundant variables
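
To make a few of these steps concrete, here is a minimal Python sketch of four of the checks above: uniqueness by unit, standardizing dates, separating “missing” from “zero”, and a correlation to spot redundant variables. It is only an illustration, not our production process. The sample records, the field names, the two date formats and all of the function names are assumptions made up for this example.

```python
from collections import Counter
from datetime import datetime
import math
import re

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"customer": "C1", "date": "2014-03-01", "amount": "$1,200.50", "visits": "3"},
    {"customer": "C2", "date": "03/02/2014", "amount": "0", "visits": "1"},
    {"customer": "C2", "date": "03/02/2014", "amount": "", "visits": "2"},
    {"customer": "C3", "date": "2014-03-05", "amount": "400", "visits": "1"},
]


def duplicate_keys(records, key_fields):
    """Return the key tuples that occur more than once (is the data unique by unit?)."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


def parse_date(text, formats=("%Y-%m-%d", "%m/%d/%Y")):
    """Standardize a date string, trying each assumed format in turn."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None  # unparseable: flag for review rather than guess


def to_number(text):
    """Strip currency symbols and separators; keep missing (None) distinct from zero."""
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None  # empty or malformed after cleaning: treat as missing


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (None if a list is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

On the sample above, `duplicate_keys(RECORDS, ["customer"])` flags customer C2, which is a hint that the rows are at transaction level rather than customer level. `to_number("")` returns `None` while `to_number("0")` returns `0.0`, so a blank amount is never silently treated as a zero sale. A `pearson` value close to 1 or -1 between two columns suggests that one of them may be redundant.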
Once your data has been cleansed and processed, you then try to answer some of the BIG questions. What will be actionable, interpretable, and relevant? It’s only after you figure out the relevant questions that you can begin to narrow down the sometimes thousands of columns to determine what really drives business results, satisfaction or profitability.
But wait, there’s more.
Even after you’ve made sense of the data and developed an analytical solution, you’re not finished. You have to visualize and present the results in a way that lets decision-makers value the analysis, make a decision, and want more. Keeping it simple, and saying as much as possible with as little clutter and extraneous data as you can, is an art. Trust us. It isn’t easy for statisticians to admit this, but delivering beautiful, much appreciated visualizations is as much fun (and as valuable) as modeling terabytes of segmentation data.
Marrying all these disciplines and steps is what has to be done to turn your Big Data into the Best Data.
Have a perspective about “the process” that differs from ours? We’d love to hear your thoughts. Drop us a note at firstname.lastname@example.org.