Chapter 10 | k-nearest neighbors

Picking good features

To figure out recommendations, you had users rate
categories of movies. What if you had them rate pictures
of cats instead? Then you’d find users who rated those
pictures similarly. This would probably be a worse recommendations
engine, because the “features” don’t have a lot to do with taste in
movies!
Or suppose you ask users to rate movies so you can give them
recommendations, but you only ask them to rate Toy Story,
Toy Story 2, and Toy Story 3. This won’t tell you a lot about the
users’ movie tastes!
When you’re working with KNN, it’s really important to pick the right
features to compare against. Picking the right features means choosing
• Features that directly correlate to the movies you’re trying to
recommend
• Features that don’t have a bias (for example, if you ask the users to
only rate comedy movies, that doesn’t tell you whether they like
action movies)
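To see how much the choice of features matters, here is a minimal sketch in Python. It assumes users are compared by plain Euclidean distance over their ratings. The users (you, alex, blake), the feature names, and all the ratings are made up for illustration. The same three people end up with different "nearest neighbors" depending on which features you compare.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length lists of ratings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(target, others, features):
    """Name of the user in `others` closest to `target` on the chosen features."""
    def vec(user):
        return [user[f] for f in features]
    return min(others, key=lambda name: distance(vec(target), vec(others[name])))

# Hypothetical ratings on a 1-5 scale; every name and number is invented.
you = {"comedy": 3, "action": 4, "drama": 4, "cat_pic_1": 5, "cat_pic_2": 1}
others = {
    "alex":  {"comedy": 4, "action": 3, "drama": 5, "cat_pic_1": 1, "cat_pic_2": 5},
    "blake": {"comedy": 2, "action": 5, "drama": 1, "cat_pic_1": 5, "cat_pic_2": 1},
}

# Features tied to movie taste: alex is the closest match.
print(nearest(you, others, ["comedy", "action", "drama"]))  # alex

# Features unrelated to movies: blake now looks like the closest match.
print(nearest(you, others, ["cat_pic_1", "cat_pic_2"]))     # blake

# A biased feature set (comedy only) cannot tell alex and blake apart:
# both are exactly 1 point away from you.
print(distance([you["comedy"]], [others["alex"]["comedy"]]) ==
      distance([you["comedy"]], [others["blake"]["comedy"]]))  # True
```

With the cat-picture features, you would get recommendations from blake, whose movie ratings are actually much further from yours than alex's.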
Do you think ratings are a good way to recommend movies? Maybe I
rated The Wire more highly than House Hunters, but I actually spend
more time watching House Hunters. How would you improve this
Netflix recommendations system?
Going back to the bakery: can you think of two good and two bad
features you could have picked for the bakery? Maybe you need to make
more loaves after you advertise in the paper. Or maybe you need to
make more loaves on Mondays.
There’s no one right answer when it comes to picking good features. You
have to think about all the different things you need to consider.
EXERCISE

10.3 Netflix has millions of users. The earlier example looked at the five
closest neighbors for building the recommendations system. Is this
too low? Too high?