Tom H. C. Anderson - Next Gen Market Research

More Than Market Research - Gain The Information Advantage


Netflix Model Improvement Prize Awarded

September 23rd, 2009 · 3 Comments

What Can Be Learned From the Netflix Crowdsourcing Experiment?

On Monday, Belkor’s Pragmatic Chaos was awarded the $1,000,000 Netflix prize after a three-year global competition to improve Netflix’s movie picking model. Interestingly, team Belkor, led by AT&T Research engineers, was in the end a merger of three previously separate teams with members from the US, Austria, Canada and Israel.

Belkor was one of over 50,000 contestants from 186 countries. Contestants worked from the raw data, 100 million movie ratings by Netflix users, and the goal was to improve the accuracy of Netflix’s Cinematch model by at least 10%.
The rules themselves, which included provisions like a 30-day window for opposing teams to make a last-ditch effort to beat Belkor’s final submitted solution, lent themselves to some game theory.
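
The 10% figure is a relative improvement, not an absolute one. As far as the published contest rules go, submissions were scored by root mean squared error (RMSE) against held-out ratings, and the target was an RMSE 10% lower than Cinematch's. Here is a minimal Python sketch of that arithmetic. The ratings are made up and the function names are hypothetical, so treat it as an illustration rather than the contest's actual scoring code:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up ratings on a 1-5 star scale, for illustration only.
actual = [4, 3, 5, 2, 1]
baseline = [3.0, 3.0, 4.0, 3.0, 2.0]    # hypothetical baseline model predictions
candidate = [3.5, 3.0, 4.5, 2.5, 1.5]   # hypothetical improved model predictions

baseline_score = rmse(baseline, actual)
candidate_score = rmse(candidate, actual)
print(f"baseline RMSE:  {baseline_score:.4f}")
print(f"candidate RMSE: {candidate_score:.4f}")
print(f"improvement:    {percent_improvement(baseline_score, candidate_score):.1f}%")
```

A lower RMSE is better, so a "10% improvement" here means the candidate's error is at most 90% of the baseline's error on the same held-out ratings.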

This meant that after a three-year competition, the most activity came in the final hours of the contest. As Netflix’s CEO pointed out, “Teams that had previously battled it out independently joined forces to surpass the 10% barrier. New submissions arrived fast and furious in the closing hours and the competition had more twists and turns than ‘The Crying Game,’ ‘The Usual Suspects’ and all the ‘Bourne’ movies wrapped into one.” In the end it came down to submission timing, with team Belkor beating team Ensemble (also a composite team) by only 10 minutes.

We at Anderson Analytics had originally formed a team to compete, but after taking a closer look at the rules and data (we spent some time analyzing the likelihood of winning vs. time invested), we decided to pass. However, I am very curious to see what the winning formula looked like. I believe that using text mining as part of model improvement could probably have increased accuracy by well over 10%, if the contest rules had allowed it.

Here’s what I feel are the key takeaways from the contest/crowd sourcing experiment:

  • Composite teams/Collaboration - When team Belkor won the progress prize several other teams joined together to make a run at them. The success of these composite teams forced Belkor to also merge with other teams in order to win.
  • Cultural Diversity - Different ideas from different people in different countries. This may have been key for those small changes/improvements in the end.
  • Virtual Works - Team members tended to work from different locations, communicating mainly via email. Belkor team members had never met in person before accepting the prize Monday.
  • Transparency - Importantly, results were published at prize milestones.
  • Private - Looks like private business has better thinkers than academia.

I do wonder why the improvement was only about 10%. Was it a self-fulfilling prophecy, a result of the game rules, or both? In part, I believe that by setting the rules and limiting the data to what its own model used, Netflix also limited the possible improvement in the model.

Anyway, apparently it’s not too late if you’re interested in competing. Netflix announced a new challenge, asking coders to predict movie preferences of users who don’t rate a lot of movies, or don’t rate movies at all. This time, contestants will have to construct “taste profiles” for users based on age, gender, zip codes and past movie selections. The contest pays $500,000 to the leader after six months, and another $500,000 to the leader after 18 months.

@TomHCAnderson


Tags: Academia · Anderson Analytics · Datamining · Modeling · Text Analytics · Tom H. C. Anderson · innovation

3 responses so far ↓

  • 1 Tom H C Anderson // Sep 23, 2009 at 12:15 pm

    Here is a link to Team Belkor’s site: http://www.research.att.com/~volinsky/netflix/bpc.html

  • 2 ubu.roi // Sep 25, 2009 at 6:16 am

    The thing is that they improved accuracy by 10%. But what does that mean? What is the absolute number? And how did they gauge accuracy in the first place?

  • 3 J C // Nov 10, 2009 at 12:20 pm

    I am curious about the measure of accuracy also… How was the baseline measure calculated and how did they determine there was a 10% improvement?
