America's Next Top Predictive Modeler
By Zach Gemignani
October 5, 2006
A new dramality series features NetFlix super-model Cinematch and gives real people the opportunity to prove they can make it in the high-stress, high-stakes world of super-predictive modeling. The contest follows a group of data nerds of various backgrounds, shapes, and sizes who vie for a $1 million prize. (With thanks to America's Next Top Model.)
NetFlix, the online DVD rental service, has put together a contest that is the analytics equivalent of the X Prize. Finally, a chance for data junkies to step out of their windowless offices and onto the proverbial catwalk.
The contest asks participants to try to improve the accuracy of the company's existing Cinematch recommendation engine by 10%. According to the NetFlix prize web site, the Cinematch recommendation engine is designed to:
"...predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer's unique tastes."
Entering is simple: sign up your "team" and download the anonymized data set of NetFlix users and their ratings by movie. From what I read on the forums, the data set is actually quite shallow, so the initial thinking is that an improved system will require linking to additional data sources like IMDB.com to add richness to the modeling.
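To make the 10% target concrete, here is a minimal sketch of how such an improvement could be scored. It assumes accuracy is measured as root-mean-squared error (RMSE) between predicted and actual star ratings, which is my understanding of the contest rules rather than something spelled out above. The ratings, the `rmse` and `improvement` helpers, and the variable names are all made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root-mean-squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, new_rmse):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Made-up ratings on a 1-5 star scale, purely for illustration.
actual = [4, 3, 5, 2, 1]
baseline = [3, 3, 4, 3, 2]  # hypothetical stand-in for the existing engine
entry = [4, 3, 4, 2, 2]     # hypothetical stand-in for a contestant's model

base_err = rmse(baseline, actual)
new_err = rmse(entry, actual)
print(f"baseline RMSE: {base_err:.3f}")
print(f"entry RMSE:    {new_err:.3f}")
print(f"improvement:   {improvement(base_err, new_err):.1%}")
```

Under this reading, an entry would hit the target once `improvement` reaches 0.10 on the held-out ratings. Note that a lower RMSE is better, so "10% more accurate" means 10% less error.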
4 comments
Pete said:
The response from Netflix on the outside-data question is a bit strange. They've encouraged people to use outside data if it is openly available, but decline to let people use data from the Netflix web site itself. The way I read this, Wikipedia is in, while IMDB and even Netflix itself are out as data sources.
http://www.netflixprize.com/community/viewtopic.php?id=98
Chris said:
This is a little bit peculiar as well. http://www.netflixprize.com/community/viewtopic.php?id=74.
The part that stuck out for me was: "The source code is helpful in demonstrating how a system arrives at its answers <i>but is rarely sufficient.</i>[emphasis mine] And we'll probably want to see the system run on a much smaller dataset just to understand its function. We assume that most systems will have a learning and a prediction component though that need not be the case."
I think if you win the prize with a non-standard prediction approach you've basically signed yourself up for a couple of weeks teaching the team and helping them reimplement your approach in the language of their choice. Not the world's worst problem. ;-)
Pete said:
I finally started a blog, and the first post covers some statistics on the netflix prize leaderboard:
http://www.datawrangling.com/
The blog itself is a bit clunky right now; I'll try to install all the usual plugins and clean the formatting up later this week.
My test Netflix submission just used the basic SVD approach. I'll start posting leaderboard results from algorithms I've been toying with to the blog as I get them submitted.
Rick McCoy said:
I have to say, there are many ways of piracy. With so many people lurking in the wind from all over the globe on this project, it shows that the industry will have problems ahead. It's a shame to make people work for free in an industry that makes billions of dollars.
"AI" (artificial intelligence) cannot determine the human factor, and I'm not impressed with the idea that a program profiles me when others make so many other choices.
When you go to a movie, you only have a few choices, not unlimited ones. Many people watch every movie and have no opinion.
Even more so, NFLX will have trouble in a couple of short years as online movies overwhelm the rental industry.
So even if there was a program that does or could command a high respect of the human element of the idea of profiling, an award of final appeal will probably not be paid.
The only thing that can happen is that the world is scammed and billions of man-hours are lost when in fact billions of dollars should be paid, and any part of anyone's ideas should be rewarded with payment, not a pat on the back.
Imagine this: 20,000 teams and only $50,000 paid out so far. Even if the distribution was equal, that's less than $2.50 per person per year. The final award should be at least ten times that amount and more widely distributed among the many teams achieving higher than 8 percent.
New ideas are needed, but not like this.