After all these years, you still don’t understand me

This past week, I caught an interview on The Daily Show with Robert O’Harrow Jr., author of "No Place to Hide." The book is a potentially frightening report on personal information collection by corporations and the federal government. Mr. O’Harrow offered a scary description of the dangers that await ordinary citizens caught in this shadowy experiment (e.g. jailed for a crime you have yet to commit, a la Minority Report).

As is typical, Jon Stewart asked a very insightful question (I paraphrase):

Amazon is always wrong with their recommendations; what makes us think that the government will be able to do anything with all this data?

That’s precisely the question that comes to my mind when I hear stories of data collection. From what I’ve seen, gathering data is easy enough. It is making sense of the data that is hard. The challenge is to find relevant patterns of behavior, then to determine whether those patterns actually cause the outcomes you care about.

Jeff Jonas, now chief scientist at IBM Entity Analytics, invented a data-mining technology used widely in the private sector and by the government. He sympathizes, he said, with an analyst facing an unknown threat who gathers enormous volumes of data "and says, ’There must be a secret in there.’ "

But pattern matching, he argued, will not find it. Techniques that "look at people’s behavior to predict terrorist intent," he said, "are so far from reaching the level of accuracy that’s necessary that I see them as nothing but civil liberty infringement engines." - from Hagerstown Free Army blog, Intercepting Irony
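To put some rough numbers behind the accuracy point, here is a back-of-the-envelope sketch of the base-rate arithmetic. Every figure in it (the population, the number of real threats, the detection and false-alarm rates) is a made-up assumption of mine for illustration, not something taken from Mr. Jonas or Mr. O’Harrow.

```python
def precision(base_rate, sensitivity, false_positive_rate):
    """Share of flagged people who are real threats (Bayes' rule)."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs, chosen only to illustrate the arithmetic.
population = 300_000_000
base_rate = 1_000 / population   # assume 1,000 real threats in the population
sensitivity = 0.99               # assume 99% of real threats get flagged
false_positive_rate = 0.01       # assume 1% of innocent people get flagged

p = precision(base_rate, sensitivity, false_positive_rate)
flagged = population * (
    base_rate * sensitivity + (1 - base_rate) * false_positive_rate
)

print(f"people flagged: {flagged:,.0f}")
print(f"share of flagged who are real threats: {p:.4%}")
```

Under these assumed numbers, a system that sounds 99% accurate flags about three million people, and well under 0.1% of them are real threats. Different assumptions shift the totals, but as long as real threats are rare the false alarms swamp the hits.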

Getting beyond gathering data to actual insight is a surprisingly common problem in the corporate world. I’ve seen the same progression play out again and again:

  • company wants to be data-driven,
  • company puts hooks into its customer-facing systems to gather data,
  • data piles into data warehouse,
  • first generation data warehouse proves unusable,
  • new, better data warehouse is commissioned,
  • new, better data warehouse comes online (a year and a few million dollars later),
  • value of new data warehouse is diminished by new business direction,
  • money runs out for analytics projects.

Even the companies that have the stamina to squeeze value from their customer data aren’t quite as sophisticated as we imagine (fear?) them to be. The reigning king of data-driven decision making, Capital One, drops a credit card mailing to me on a weekly basis even though I haven’t responded in 10 years.

Mr. O’Harrow is probably right to sound the alarms about what could be accomplished with the growing mounds of personal information -- or how personal data may be misinterpreted. That said, I’m skeptical that any organization, the US government in particular, is likely to make effective use of such a big pile of data.