Friday, January 09, 2009

The "Harry Potter" problem in recommendations

Greg Linden covered the Harry Potter problem in a blog post on recommendation technologies a few years ago:
A very sharp and experienced developer named Eric wrote the first version of similarities that made it out to the Amazon website. It was great working with Eric. I learned much from him over the years.

The first version of similarities was quite popular. But it had a problem, the Harry Potter problem.

Oh, yes, Harry Potter. Harry Potter is a runaway bestseller. Kids buy it. Adults buy it. Everyone buys it.

So, take a book, any book. If you look at all the customers who bought that book, then look at what other books they bought, rest assured, most of them have bought Harry Potter.


-http://glinden.blogspot.com/2006/03/early-amazon-similarities.html
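To make Linden's description concrete, here's a minimal sketch on made-up purchase data. It is not Amazon's actual similarity algorithm. It compares a raw "customers who bought X also bought Y" count against cosine similarity of binary purchase vectors, which is one common way to divide out an item's overall popularity. The titles, baskets, and the `similar_items` helper are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Made-up purchase histories: one set of titles per customer.
BASKETS = [
    {"Harry Potter", "Python Cookbook"},
    {"Harry Potter", "Python Cookbook"},
    {"Harry Potter", "Python Cookbook", "Regex Pocket Reference"},
    {"Harry Potter", "Gardening Basics"},
    {"Harry Potter", "Knitting 101"},
    {"Harry Potter", "Cookie Baking"},
    {"Harry Potter", "Birdwatching"},
    {"Harry Potter", "Gardening Basics"},
    {"Harry Potter"},
    {"Python Cookbook", "Regex Pocket Reference", "Ruby Cookbook"},
    {"Python Cookbook", "Ruby Cookbook"},
]

def pair_stats(baskets):
    """Count how many customers bought each item and each unordered pair of items."""
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        item_counts.update(basket)
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return item_counts, pair_counts

def co_count(pair_counts, a, b):
    """Number of customers who bought both a and b."""
    return pair_counts[tuple(sorted((a, b)))]

def similar_items(item, baskets, normalize):
    """Rank every other item by similarity to `item` (ties broken alphabetically)."""
    item_counts, pair_counts = pair_stats(baskets)

    def score(other):
        together = co_count(pair_counts, item, other)
        if not normalize:
            return together  # raw co-purchase count: bestsellers win
        # Cosine similarity of 0/1 purchase vectors: divides out each item's popularity.
        return together / sqrt(item_counts[item] * item_counts[other])

    others = [o for o in item_counts if o != item]
    return sorted(others, key=lambda o: (-score(o), o))

print(similar_items("Python Cookbook", BASKETS, normalize=False)[:3])
# ['Harry Potter', 'Regex Pocket Reference', 'Ruby Cookbook']
print(similar_items("Python Cookbook", BASKETS, normalize=True)[:3])
# ['Regex Pocket Reference', 'Ruby Cookbook', 'Harry Potter']
```

With raw counts Harry Potter tops the list for a Python book simply because nearly everyone bought it; after normalizing, the programming books move ahead of it.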


When I worked on the personalization team, we were still struggling with this problem. There are definite ways to identify a Harry Potter problem, but you have to remember to apply them. On top of that, within certain genres there are Harry Potter books and music albums that are runaway successes only within those genres. If you compared those books to the general list of books that Amazon sells, they wouldn't look like books that everyone has bought. Narrow the scope to just the related books, though, and you'll find that they are crazy popular.
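Here's a rough sketch of the genre-level version of the check, on made-up data. It assumes each title belongs to exactly one genre (real catalogs are messier), and the 0.4 threshold is an arbitrary choice for illustration. An item counts as a "genre Harry Potter" if a large share of the customers who bought anything in its genre bought it, even though its share of all customers looks unremarkable.

```python
from collections import Counter

# Made-up data: one genre per title (a simplifying assumption).
GENRE = {
    "Harry Potter": "fantasy",
    "Anathem": "sf",
    "Snow Crash": "sf",
    "Python Cookbook": "programming",
    "Ruby Cookbook": "programming",
    "Regex Pocket Reference": "programming",
}

BASKETS = [
    {"Harry Potter"},
    {"Harry Potter"},
    {"Harry Potter"},
    {"Harry Potter", "Anathem"},
    {"Harry Potter"},
    {"Harry Potter", "Python Cookbook"},
    {"Anathem", "Python Cookbook"},
    {"Anathem", "Ruby Cookbook"},
    {"Anathem", "Snow Crash"},
    {"Regex Pocket Reference"},
    {"Ruby Cookbook"},
    {"Regex Pocket Reference"},
]

def blockbusters(baskets, genre_of, threshold):
    """Return (global_hits, genre_hits).

    global_hits: items bought by at least `threshold` of all customers.
    genre_hits: items bought by at least `threshold` of the customers who
    bought anything in that item's genre.
    """
    item_counts = Counter(item for basket in baskets for item in basket)
    # Each customer counts once per genre they bought from.
    genre_customers = Counter(
        g for basket in baskets for g in {genre_of[i] for i in basket}
    )
    global_hits = {i for i, c in item_counts.items() if c / len(baskets) >= threshold}
    genre_hits = {
        i for i, c in item_counts.items()
        if c / genre_customers[genre_of[i]] >= threshold
    }
    return global_hits, genre_hits

global_hits, genre_hits = blockbusters(BASKETS, GENRE, threshold=0.4)
print(global_hits)               # {'Harry Potter'}
print(genre_hits - global_hits)  # {'Anathem'}
```

Anathem never shows up as a bestseller against the whole store, but every science fiction buyer in this toy data bought it, which is exactly the kind of item the global check misses.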

The biggest side effect of the Harry Potter problem is that it weakens recommendations. For instance, I've bought the O'Reilly regex pocket book, plus the O'Reilly Python Cookbook and Ruby Cookbook. From those three books, you could pretty easily peg me as a web nerd and safely recommend Steve Souders's book on website performance. Those are strongly correlated purchases in a narrow band of interest. However, because I'm a geek, I've also bought Neal Stephenson's latest book, Anathem. So have a few hundred thousand OTHER geeks. You could say that Anathem is a nerd's Harry Potter.
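One way to soften this, sketched below, is to weight each of a customer's purchases (the "seeds" that recommendations are based on) by an IDF-style factor, log(total customers / buyers of that seed), so that a widely bought seed like Anathem contributes less than a niche one. This is an illustrative technique, not a description of what Amazon does; the buyer counts and similarity scores are invented, and `recommend` is a hypothetical helper.

```python
from collections import defaultdict
from math import log

# Invented numbers, for illustration only.
N_CUSTOMERS = 1_000_000
BUYERS = {  # how many customers bought each seed title
    "Regex Pocket Reference": 2_000,
    "Python Cookbook": 3_000,
    "Ruby Cookbook": 1_500,
    "Anathem": 300_000,
}
SIMILAR = {  # precomputed item-to-item similarities (made up)
    "Regex Pocket Reference": {"High Performance Web Sites": 0.6},
    "Python Cookbook": {"High Performance Web Sites": 0.5, "Learning Perl": 0.4},
    "Ruby Cookbook": {"High Performance Web Sites": 0.5},
    "Anathem": {"Cryptonomicon": 0.9, "Snow Crash": 0.9, "The Diamond Age": 0.8},
}

def recommend(history, similar, buyers, n_customers, k=3, idf=True):
    """Sum similarities over the customer's purchases, optionally weighting each
    seed by log(N / buyers) so widely bought seeds count for less."""
    scores = defaultdict(float)
    for seed in history:
        weight = log(n_customers / buyers[seed]) if idf else 1.0
        for candidate, sim in similar.get(seed, {}).items():
            if candidate not in history:  # don't recommend what they already own
                scores[candidate] += weight * sim
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]

history = ["Regex Pocket Reference", "Python Cookbook", "Ruby Cookbook", "Anathem"]
print(recommend(history, SIMILAR, BUYERS, N_CUSTOMERS, idf=False))
# ['High Performance Web Sites', 'Cryptonomicon', 'Snow Crash']
print(recommend(history, SIMILAR, BUYERS, N_CUSTOMERS, idf=True))
# ['High Performance Web Sites', 'Learning Perl', 'Cryptonomicon']
```

Without weighting, the Anathem-driven picks crowd out everything but the strongest web-nerd match; with it, the narrow-interest purchases dominate and the nerd's Harry Potter drops toward the bottom.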

So today I received an email from Amazon with a list of recommended books, most of which were based on Anathem and Daniel Silva's latest book, Moscow Rules (a great book, but also a bit of a widely bought Harry Potter). As you might guess, the recommendations were really bad. I wish that email had a link I could click that said "never recommend any of these books to me again, please." I could go to each detail page and mark that, but it would take a massive amount of time.
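The one-click link I'm wishing for could look something like this hypothetical sketch. The `RecPreferences` class and its fields are invented for illustration and are not Amazon's data model. Each emailed recommendation is treated as a (recommended title, "because you bought") pair; one click rejects every title in the email and, optionally, stops using the seed purchases behind them.

```python
from dataclasses import dataclass, field

@dataclass
class RecPreferences:
    """Hypothetical per-customer settings (not Amazon's actual data model)."""
    never_recommend: set = field(default_factory=set)  # titles the customer rejected
    ignored_seeds: set = field(default_factory=set)    # purchases not to base recs on

    def reject_email(self, email_items, also_ignore_seeds=True):
        """One click: reject every (recommended_title, because_you_bought) pair."""
        for recommended, seed in email_items:
            self.never_recommend.add(recommended)
            if also_ignore_seeds:
                self.ignored_seeds.add(seed)

    def filter_candidates(self, candidates):
        """Drop candidates that were rejected or that come from an ignored seed."""
        return [
            (rec, seed) for rec, seed in candidates
            if rec not in self.never_recommend and seed not in self.ignored_seeds
        ]

prefs = RecPreferences()
prefs.reject_email([("Cryptonomicon", "Anathem"), ("The Secret Servant", "Moscow Rules")])
later = [
    ("Snow Crash", "Anathem"),
    ("High Performance Web Sites", "Python Cookbook"),
    ("Cryptonomicon", "Ruby Cookbook"),
]
print(prefs.filter_candidates(later))
# [('High Performance Web Sites', 'Python Cookbook')]
```

Whether rejecting an email should also retire its seeds is a judgment call: it kills the Anathem-driven picks in one step, but it would be too aggressive if the email mixed good and bad recommendations from the same purchase, hence the `also_ignore_seeds` switch.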


1 Comment:

Blogger Jerr Dunlap said...

Interesting problem - If I could paraphrase, it's the relevance of six degrees of freedom problem. I landed here from your sailing interest and love your photography but I've not seen sailing here. Thanks for bringing up the question I had in mind. I'm looking forward to following your blog - Great discussion!
- Jerr

Sunday, November 01, 2009 10:21:00 AM  
