
Our Blog

We don’t tend to agree with Microsoft when it comes to data analysis and presentations. In fact, we’ve even been critical of them for misrepresenting data, excessive visual “flair”, missed opportunities to improve Excel, forgetting their power users, subpar presentation tools, and wasteful slide masters.

With all these past differences, I was a little surprised to find that we do share some common ground. Check out the comments (from an article in Internet News) by Peter Klein, CFO for Microsoft’s Business Division, describing the world of business intelligence:

“I’ve talked to a lot of customers about business intelligence and the one thing that they tell me is it’s really hard to use,” said Peter Klein at the Credit Suisse conference.

“‘I’m not getting the value out of the investment that I made,’” Klein said customers had complained. “‘I have invested a lot in my back-end systems, and today 10 percent or less of my employees actually touch it, or get access to the data. I’ve got six different BI solutions across multiple different departments, none of which talk to each other. And they’re hard to use, so I’ve got to send people to training for two weeks to learn how to use it.’”

Finally, we are speaking the same language. Now, I’m curious to see what they are going to do about it.




There are two kinds of people in this world: those who put things into two categories and those who don’t. Maybe this isn’t the best representation of the complexities of the human race, but it does give me a cheap lead-in to compare two types of problem solutions: “high tech,” focused on tools, and “high touch,” focused on interpersonal communications.

I was reminded of these two approaches by a recent interesting article in Wired that expresses an opinion about why America’s performance in Iraq has been disappointing. The basic premise of this article is that America has entered into this engagement in a “technology networked” fashion, drowning it in technology; the more, the better.

The article suggests that the US forces would make more progress if they were to spend more time on a “socially networked” approach. For instance, instead of remote controlling a drone from 100 miles away, spend more time drinking chai with local leaders. Not the absence of technology, but the incorporation of technology into a socially based environment.

“If I know where the enemy is, I can kill it. My problem is I can’t connect with the local population.” This was a quote from one division commander. Change a couple of words and you end up with a statement that many of us would find all too familiar:

“If I know where the inefficiency is, I can fix it. My problem is I can’t connect with my data.”

Aren’t we witnessing this in spades right now in the BI space? There’s no shortage of tools, or of features in those tools. The challenge is figuring out who the real insurgents are and how you deal with them. If you’ve been reading the Juice blog for very long, you have a pretty good feeling for how we approach what we believe is a social problem (high touch) and not a technical one (high tech).

The good news is that the US forces are changing their approach to socialize more with the Iraqi people—hopefully leading to a better Iraq. Is there good news for the BI space? We’d like to hear from you on how you’re making sure you focus enough on the social “high touch” aspects of our space. What’s your insurgent data? How can you get to know it better?




Monte Carlo. It’s a car. It’s a song. It’s a casino. It’s a city in Monaco (near France, somewhere). It’s also a method of statistical simulation that is used to better understand the probability distribution of a known set of data. Great. So what?

I recently attended the Atlanta session of the FogBugz World Tour to hear from Joel Spolsky about the latest version of his bug tracking software, FogBugz 6.0. Joel Spolsky has been writing about software development, management, business, and the Internet at joelonsoftware.com since 2000. Three books have sprung from content on his site: User Interface Design for Programmers, Joel on Software, and Smart and Gets Things Done. They are good reads for software developers and business folks alike; smart, funny, and focused on the human face of software development.

Before starting his company, Joel was a Program Manager at Microsoft, where he worked on the Excel team to bring programmability to Excel.

The new release of FogBugz has quite a few nice features to make bug tracking much easier, but I was most intrigued by the new capability called Evidence-Based Scheduling, or EBS for short. This new capability uses a Monte Carlo simulation approach to determine probabilities associated with the expected “ship dates” of the software release project.

The core premise behind this capability is that unlike in the financial realm, in the software world, future results can accurately be based on past performance. FogBugz remembers the original estimate and the actuals for each task for each developer, and for all the projects they’ve been assigned to. Here’s where it gets interesting. Based on this information, FogBugz runs a series of Monte Carlo scenarios where it randomly generates different plausible results for each member of the development team. It then assembles the results to create a distribution for probability of completion for each developer—and more importantly, for the entire project. The result of this analysis is a “Ship Date” probability curve that shows potential ship dates on the X-axis and probabilities of achieving those dates on the Y-axis. Ship Date depends not only on estimates for remaining components, but also on the probability that the individual developers will be able to individually meet their commitments. A steeper curve means the developer estimates are more confident and a flatter curve means estimates are less confident.
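To make the idea concrete, here is a minimal Python sketch of this kind of simulation. It is not FogBugz’s actual implementation: the function names, the sample data, the use of hours instead of calendar dates, and the choice to resample each developer’s past actual-to-estimate ratios are all assumptions made for illustration.

```python
import random

def simulate_ship_hours(history, remaining, rounds=10_000, seed=0):
    """Monte Carlo sketch: return sorted simulated project durations in hours.

    history:   {developer: [actual / estimate ratios from past tasks]}
    remaining: {developer: [estimated hours for each open task]}
    Assumption: developers work in parallel, so the project is done
    when the slowest developer is done.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(rounds):
        per_developer = []
        for dev, estimates in remaining.items():
            # Scale each estimate by a ratio drawn from that developer's past.
            total = sum(est * rng.choice(history[dev]) for est in estimates)
            per_developer.append(total)
        outcomes.append(max(per_developer))
    return sorted(outcomes)

def probability_done_by(outcomes, hours):
    """Fraction of simulated runs that finish within `hours`."""
    return sum(1 for o in outcomes if o <= hours) / len(outcomes)

# Hypothetical team: alice estimates consistently, bob does not.
history = {"alice": [1.0, 1.1, 0.9, 1.2], "bob": [1.0, 2.5, 0.8, 3.0]}
remaining = {"alice": [8, 16, 4], "bob": [10, 6]}

outcomes = simulate_ship_hours(history, remaining)
for hours in (25, 30, 40, 50):
    print(hours, round(probability_done_by(outcomes, hours), 2))
```

Plotting `probability_done_by` against a range of deadlines gives a curve of the kind described above. Dropping a task from `remaining` and rerunning is the “what if we cut this feature” experiment.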

The classic software development process involves balancing three things: time, money, and features. Most projects of reasonable duration are not going to be able to effectively add staff mid-stream—at least not to accelerate delivery timeframes. If you assume the primary factor in determining “money” is tied to people, time and features are your only real variables. FogBugz 6.0 lets you experiment with changing these two factors. This is a useful tool. Consider a scenario where the business people come to the development lead because they need a project to be complete by a specific date. This tool gives the project lead enough information to understand the probability of making a specific date. Additionally, it lets you test out what will happen if you remove a particular feature. With this new information, the development lead has the visibility to provide back additional guidance to the project sponsor about the probability of making the requested date, as well as the effect that changing release date and scope have on the probability of on-time delivery.

So how does it know? The ship dates are based on each developer’s history of estimation accuracy supported by developer time sheets. Yes, that’s right. I said time sheets—the bane of all developers. But stick with me, it’s not as bad as you might think. Each developer (or responsible team lead) can turn on a timer that automatically tracks time spent on each of their cases.

FogBugz plots a linear fit through the data points for each developer and then uses the calculated R² value to determine how consistent the developer’s estimates have historically been. Based on this calculation, probability distributions for remaining tasks for each team member are determined, which lead to the ship date probabilities. It’s then easy to see the long pole in the project (Milton Ritchie in this example). This doesn’t necessarily mean the most work, but only reflects the probability of on-time completion and, correspondingly, the most risk to the project.
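For readers who want to see what a linear fit and its R² look like in practice, here is a small Python sketch. The ordinary least-squares formulas are standard, but the function name and the two sample developers are hypothetical, and this is only a guess at the general shape of the calculation, not Fog Creek’s code.

```python
def linear_fit_r2(estimates, actuals):
    """Fit actual = slope * estimate + intercept by least squares; return R^2 too."""
    n = len(estimates)
    mean_x = sum(estimates) / n
    mean_y = sum(actuals) / n
    sxx = sum((x - mean_x) ** 2 for x in estimates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimates, actuals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(estimates, actuals))
    ss_tot = sum((y - mean_y) ** 2 for y in actuals)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical history (hours): estimated vs. actual for four past tasks.
estimates = [2, 4, 6, 8]
consistent = linear_fit_r2(estimates, [3, 6, 9, 12])   # always 50% over
erratic = linear_fit_r2(estimates, [8, 3, 15, 5])      # all over the place

print("consistent R^2:", round(consistent[2], 3))
print("erratic R^2:", round(erratic[2], 3))
```

Note that the consistent developer is always wrong by the same factor, and that is fine: a high R² means the error is predictable, so the estimates can be corrected. A low R² means the estimates say little about the actuals.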

Anyone who has followed Joel’s writings knows that he is keen on the idea of improving the process of software development. And, we all know that estimating an accurate time to completion, or “ship date,” for any software project is a pain in the rear even after all these years of “practice.” The unique approach that Fog Creek takes is not to predict the specific completion time, but rather to predict the probability of completing the project across a range of dates.

So, project estimating will likely never be a simple turn and churn process. But with this release of FogBugz, Fog Creek Software has shown great outside-the-box thinking that could significantly improve how we deliver software projects.




Today’s post is brought to you by Andrew White of Gartner from an article in
their 2007 CRM conference brochure:

What’s the single biggest benefit of practicing MDM?

There are multiple drivers that help enterprises decide to
embark on an MDM [1] program. Implementing a CDI-focused [2] MDM program
will help implementations of CRM [3] achieve a higher return by
enabling better cross-marketing and selling.

Implementing PIM [4] within MDM will help supply chains fulfill
orders more timely [sic] and introduce new products more quickly. Embedding
MDM in an SOA [5] environment contributes to business (process) agility
through support of more rapidly developed composite applications; and
others help cut costs by supporting better procurement practices.

Way to cut through to the heart of the issue, guys. Let’s see if we
can decode what they’re saying:

Knowing more about your customers will help you find
more products that existing customers want. It will help develop those
products too. And let’s not forget your web apps. They’ll be easier to
develop and easier for other companies to integrate with if you have
your data well organized.

It’s nice to be able to decode this, but semantically, there’s
nothing there. This response amounts to “Trust us, it’s great!”

[1] Master Data Management is another salvo in the eternal battle between centralization and decentralization in organizations. The wheel turns; today it’s MDM, in 5 years it will be called Centralized Metadata Integration.

[2] Customer Data Integration means centralizing how you track customer-related information

[3] Customer Relationship Management systems track interactions with
your customers

[4] Product Information Management is CDI for products–see how easy
this is getting?

[5] Service Oriented Architecture is a way of building computer
services as little pieces rather than big integrated applications




I don’t know what you call it, but I know it when I see it. A couple months back I wrote about IBM’s sweet $80 million contract to develop ARIS (Achievement Reporting and Innovation System) for the New York City public schools. At the time I used some harsh words to describe this fleecing: swindle…preying on clients’ lack of expertise…Dr. Evil…wasted time and effort.

News comes to me from Leonie Haimson, Executive Director of Class Size Matters, that the $80 million price tag is, well, a starting point. She pointed me to a recent article that describes the creeping costs:

The education department’s new $80 million student-tracking computer system just got more expensive – and some parents are questioning whether that’s the best use of the money.

To ensure that children’s test scores and other private data don’t get into the wrong hands, the city began accepting bids this week from companies that specialize in safeguarding information, which experts say could add several million dollars to the system’s price.

“What’s not lost on parents of kids in overcrowded schools is that with the money being spent on this, we could build and staff several more schools,” said Tim Johnson, president of the Chancellor’s Parent Advisory Council.

Parents are also wondering whether the system’s mounting cost is worth it – and why education officials didn’t anticipate the extra cost sooner. —New York Daily News

It does seem odd that an $80 million system wouldn’t come pretty well stocked with security, particularly from a blue-chip vendor like IBM. On top of that, Leonie hints at other costs that aren’t being directly counted toward the implementation of this system:

This initiative has mushroomed into a huge expense that threatens to overwhelm the entire school system, with all the SAFS, data inquiry teams, tests, and even the community district superintendents gobbled up to interpret and try to “coach” schools in the use of the massive data that will be spewed out. The DOE wants to charge much of this to the “contracts for excellence” and our CFE dividend, though it’s a real stretch to see if any of this falls under the specific programs outlined by the state.

Good luck to Leonie, Patrick Sullivan and the others who are stepping up to question this white elephant project.




Google Analytics has been rebuilt and the result redefines the frontiers of doing analytics on the web. Avinash Kaushik has the definitive early review.

Google Analytics v2

I had the privilege of attending the launch and playing with the early release. Here are a few things I noticed.

  • Speak my language: Google has put a lot of effort into replacing specialized terms with everyday ones. This makes the application usable by a broad base of people and is one way to fight GUI Jock-itis.
  • Speed kills: The interface is easily reconfigurable and fast. I’ve long argued that interface speed is a substitute for configuration options. I’m curious to play with the tool and get a better sense if this is true.
  • Flex rules: Much of the componentry for viewing data in Google Analytics is built in Adobe Flex. This is similar to Google Finance, and not at all like GMail or Google Reader, which use the GWT. We believe this has profound implications for analytical tools on the web and will dig into this in later posts.



Swivel has been in the news lately–it’s “YouTube for data!”(tm). Michael Arrington, whose blog TechCrunch is the chronicle of Web 2.0, wrote a slavering piece.

Academic types are going to go nuts over this. I spent a summer in college running regression analysis models on economic data. Being able to simply upload data to Swivel and then begin to slice and dice the data would have saved a lot of time. …And being able to compare our data to what others were doing in related fields could have yielded results that we would never have aimed for. Big companies, small companies, thinktanks and non-classified government organizations are going to be similarly dazzled.

There is a big difference between marketing and delivery. The initial views of the site are discouraging. They encourage that great bug-a-boo of analytics, confounding correlation with causation. Having a bunch of data series up and generating correlations is not a recipe for insight. (Equally annoying are the persistent comments reminding people of this).
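A quick Python sketch shows how easy it is to manufacture a correlation like this. The two series below are made up (they are not Swivel’s data) and share nothing except an upward trend over time, yet they correlate strongly. Comparing year-over-year changes instead of raw levels is one simple, and by no means complete, sanity check.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(xs):
    """Year-over-year differences, which strip out a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

# Hypothetical data: both series simply grow over ten years, plus some wobble.
years = range(10)
wine = [10 + 2 * t + (-1) ** t for t in years]
crime = [500 + 30 * t + 5 * (t % 3 - 1) for t in years]

r_levels = pearson(wine, crime)
r_changes = pearson(changes(wine), changes(crime))
print("levels: ", round(r_levels, 2))
print("changes:", round(r_changes, 2))
```

The raw levels look tightly linked only because both series rise with time. Once the trend is removed, most of the relationship disappears.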

Even so, we should applaud the Swivel team for implementing a platform for sharing data. However, the success or mediocrity of Swivel may ultimately lie in the hands of the users. The early returns don’t look promising. Here are a few examples from the current state of Swivel analysis:

The most popular graphic to date relates wine sipping to violent crime. In fact, the top 24 most viewed graphs are permutations of this silly correlation analysis. If it is meant in jest, lots of Swivel users didn’t get the memo. A graph showing wine consumption vs. violent crime elicited a laundry list of theories about why this “phenomenon” occurs.

Wine and Crime

Here’s another graph that seems to maximize confusion. I read it that there was a lot of variety in ways to die back in the 70s, but in the mid-90s it was pretty much a crap-shoot. Maybe Swivel will challenge users to become better at conveying information–or it may validate crap like this.

Ways of Dying

Here’s what I like best about this one: it’s touted as a “great example of Swivel correlation meters.” The concept of “correlations of ways of dying” is humorous (personally, I imagine I will partake in just one of these ways of dying).

Ways of Dying Correlation

More importantly, Swivel allows for easy sharing of independent data sets:

* A public place for interesting data sets. Searchable, tagged…for our mockups

* Easy tool to overlay related data sets

* Popularization of data analysis

All the Swivel hype drowns out two important developments in the future of web-based analytics.

Counterpoints

Jon Udell on Google Spreadsheets and GFinance integration

PeterMe: Wherein I Finally Write About My Idea Conference

Fernanda Viegas – Democratizing Visualization. This might have been, for me, the single most exciting project discussed at IDEA (with the possible exception of StoryCorps). Fernanda gave a sneak peak at Many Eyes, a service soon to be released by IBM Research that allows people to visualize data, either their own or publicly accessible data. It also turns these visualizations into social artifacts: people can comment on one another’s visualizations. There is so much potential for this; my glib take on it is “It’s Youtube for Data Viz!”




Ok, “raging” is too strong a word. But there is a growing debate about how to transform companies into data-driven, analytics-led organizations. This debate is worth cherishing: strong opinions are novel for a community that is more comfortable relying on facts than philosophy.

I want to summarize the discussion and offer my impressions of it because I think it sheds light on many of the challenges facing organizations in this area. We can all agree on the importance of analytics to drive smarter decisions; there is less agreement on how to implement analytics.

The discussion was initiated by Tom Davenport’s article “Competing on Analytics,” published in the Harvard Business Review on Jan 1 2006 (spin-off article here). It received enough notice that there was a follow-on conference. Tom interviewed 32 companies that were relatively advanced in using analytics (defined as statistical and predictive analytics). Research was “carried out independently” but sponsored by SAS and Intel.

The first reaction of many involved in this industry was to appreciate the attention. A few bloggers/writers were happy to summarize his work:

Then came the trouble: a few practitioners of business analytics looked this gift horse in the mouth and questioned Davenport’s well-intentioned but ultimately misguided assumptions about what it takes to be an “analytics competitor.” In particular, there was a sense that he had lost touch (or perhaps never been in touch) with the realities of implementing analytics capabilities in a complex organization.

Neil Raden helps frame the essence of the debate:

There are two schools of thought when it comes to the value of BI in general. One is that it is best used by “quantitative” types and other analytical business people, who can spot trends and analyze patterns to assist in the big decisions and set and direct strategy. The other position is that BI is at its best when helping a broad range of people and processes at an operational level, marginally improving performance, repeatedly and often.

Here are some of the primary arguments provided by the competing schools of thought:

Centralized analytics. The Davenport camp of analytics focuses on centralization of resources and data, top-down decisions, and breadth of analytical capabilities.

* Top level commitment and vision. Davenport says you know you are competing on analytics when “your senior executive team not only recognizes the importance of analytics capabilities but also makes their development and maintenance a primary focus.”

* Centralized analytical capabilities ensure cross-organizational (and therefore balanced) analytical conclusions. Jim Novo forcefully (if a bit angrily) argues:

“if a silo wants to keep an analytical “lead” in it’s own little box to do the navel-gazing, silo-focused analysis that impacts it’s own little box, then that’s OK. Just know that this analysis, while meaningful to the little box, cannot be used or trusted anywhere else in the company and so is of very little value in a macro way.”

Furthermore, this centralization implies a team of quant experts who are responsible for analytics organization-wide.

* Required data centralization, standardization, control, and integration. Davenport argues that “the difficulty is primarily in ensuring data quality, integrating and reconciling it across different systems, and deciding what subsets of data to make easily available in data warehouses.”

* Omnipresence. A curious portion of the requirements for “analytics competing” relies on quantity-related phrases like: “copious data”, “seizing every opportunity to generate information”, hiring “a lot of people with the very best analytical skills”, “employ analytics in almost every function and department”, and “building your capabilities for several years.”

Decentralized. These people, Juice included, sense that building analytics capabilities is more about picking the high impact opportunities, scaling with proven value, and working through the organizational challenges that data-driven decisions can create.

* Good analytics is agile and local

[Centralized design] is another naïve assumption, because many organizations are not only decentralized—they’re dysfunctional. Separate units within organizations often need autonomy because they are just so different from the rest of the organization. In addition, as an organization becomes more “agile,” which is a definite trend, decision-making, even for the big decisions, will become more decentralized. Imagine how difficult it will be to buy or sell pieces of a company if the “brain,” the centralized analytical capability, stays with the parent and there is no local expertise?

Davenport admits that some of his not-yet-analytics-competitors face an environment with “very high levels of functional or business-unit autonomy, making it difficult to mount a cohesive approach to analytics across the enterprise.” Well, that structure likely makes sense for many reasons — and changing for analytics is letting the tail wag the dog.

* Focus analytics on the places that matter. Davenport pooh-poohs those companies whose “efforts have been primarily local—that is, limited to particular functions or units, such as marketing.” However, if you are targeting the right areas of your company (the areas that make a difference in your competitive environment), then targeted analytics are just what you need. Analytics should be built around the key leverage points of the organization. Breadth of analytics implies both lack of focus and wasted resources.

We made the point:

Analytics is hard. Analytics takes resources. It takes effort for an organization to create and assimilate learnings from analytics…UPS focuses their analytics on knowing where packages are, Marriott focuses on revenue management. If you try to do everything, you won’t do anything well.

* Simplicity. Analytics doesn’t have to be complex. In fact, analytics are often better when they are simple and accessible so audiences at many levels absorb and integrate the meaning into their decisions. Raden puts it another way:

“When it comes to quantitative modeling in business, there is a recurrent paradox—the more complex the model, the less faith people put in it. People take advice from people like themselves.”

* Culture matters most. The biggest challenge is building a culture that embraces (and even demands) data to support decisions. This seems ignored by the Davenport crowd. For some executives, there is a visceral reaction to tools that appear to displace their years of hard-won expertise. For those of us who have been working on the ground helping companies move in this direction,

I’d be remiss to leave out another school of thought: the relativists. They recognize that it all depends on the unique situation of the organization and that there are important and valid points on all sides. These people (like Nishith from Open Source Analytics) would rather find the common ground. They recognize the role of centralized analytics (Raden: “Centralized data mining/predictive modeling groups are capable of discovering valuable insights that can then be encapsulated into reusable algorithms, scores, or rules”) but recognize it isn’t practical or realistic for most businesses. But if we listen to them, our best debate in this business goes away.




A friend of mine with years of analytics and management experience at big companies wrote recently. He puts his finger squarely on a real issue with enterprise data warehouses.

“I wanted to provide some comments on the enterprise wide data warehouse and the challenges it presents at large corporations. Jim Novo certainly seems to support the roll up approach (I’m on his mailing list) but I agree with Juice that it is too slow, too costly, and results in restricted analytics the way most large companies build them. Most of the large data warehouses I’ve seen only include data variables that are key to managing a business TODAY as the warehouses are too big and costly to store data variables with a low usage frequency. They also attempt to cleanse the data by classifying. This makes life easier for an analyst with statistical experience but a limited knowledge of the business. However you’re losing information. Problem: You do not know what will be important in the future. Distributed databases at a line of business or product level tend to store more raw data. Sure, the amount of space used would be the same if you simply put it into the warehouse but that is not the way decisions are made. Decision makers look at the frequency of use of the data variables (TODAY) and the cost to include them. Also, the analysts who are disconnected from the business lines do not understand the raw data.”

“Let me give you a real world example. Our data classifies claims into a limited number of claim reason categories. When a new type of claim is developing, the person classifying the claims (claims rep) does not have a category to select so they just select what works best to fit into the pre-defined categories. Information is lost due to the restrictions of the allowed categories within the data warehouse. If the notations from the claims system had been stored (an unforeseen variable) in the warehouse and text mining analytics had been run on them, the word “mold” would have been found associated with claims at an alarming rate. This would have allowed for early recognition of the issue. It cost us a lot of money in mold claims due to the missing data but who would have thought to include the notes due to the size and costs? Well, we have them now.”
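The kind of check he describes does not need heavy machinery once the raw notes are kept. Here is a minimal Python sketch, assuming the notes are available as (period, free text) pairs; the function name and the sample claims are invented for illustration and are not his company’s data.

```python
import re
from collections import Counter

def term_rate_by_period(notes, term):
    """Share of notes in each period that mention `term` as a whole word.

    notes: iterable of (period, free_text) pairs.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    totals, hits = Counter(), Counter()
    for period, text in notes:
        totals[period] += 1
        if pattern.search(text):
            hits[period] += 1
    return {period: hits[period] / totals[period] for period in sorted(totals)}

# Hypothetical claim notes, all filed under the same pre-defined category.
notes = [
    ("Q1", "Water damage in basement, pipe burst"),
    ("Q1", "Roof leak after storm"),
    ("Q2", "Mold found behind drywall"),
    ("Q2", "Pipe burst in kitchen"),
    ("Q3", "Black mold in attic"),
    ("Q3", "Mold remediation after leak"),
]

print(term_rate_by_period(notes, "mold"))
# → {'Q1': 0.0, 'Q2': 0.5, 'Q3': 1.0}
```

This sketch assumes you already suspect the word. Spotting an unforeseen term would mean running the same comparison across every word in the notes, which is only possible if the notes were stored in the first place.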




Yesterday I presented to a B-eye-network audience our perspective on why business intelligence is broken and what can be done to fix it. The full PDF version (4 MB) of the presentation can be downloaded.

A sampling of the fun:

“Chart-based encryption — data goes in, no information comes out”

Chart-based encryption

On the excessive emphasis on reporting over analysis…

Herding

“Technologists are looking to build an atomic-baloney slicer” … “Nobody ever got fired for adding more requirements”

Waiting

“Data analysis isn’t just for the data analysts anymore”

Typing is to...

“Have you ever worked with a reporting tool that outputted to PDF?”

Sheep

Hopefully we stirred the pot a little with this presentation. A recording of the B-eye-network event should be available soon.



