Feature extraction
In the grapefruit example, you compared fruit based on how
big they are and how red they are. Size and color are the
features you’re comparing. Now suppose you have three fruit.
You can extract the features.
Then you can graph the three fruit.
From the graph, you can tell visually that fruits A and B are similar.
Let’s measure how close they are. To find the distance between two
points, you use the Pythagorean formula.
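As a reminder, the Pythagorean formula for two points in the plane can be written as below. The symbols are an assumption for illustration: (x_1, y_1) and (x_2, y_2) stand for the (size, redness) coordinates of the two fruits being compared.

```latex
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
```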
Here’s the distance between A and B, for example.
The distance between A and B is 1. You can find the rest of the
distances, too.
The distance formula confirms what you saw visually: fruits A and B
are similar.
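Here is a minimal sketch of that calculation in Python. The coordinates are hypothetical (size, redness) values, chosen only so that A and B come out 1 apart as stated above. The name `distance` and the `fruits` dictionary are illustrative too.

```python
import math

# Hypothetical (size, redness) coordinates for the three fruits.
# These values are assumptions, picked so that A and B are 1 apart.
fruits = {"A": (2, 2), "B": (2, 1), "C": (4, 5)}


def distance(p, q):
    """Straight-line distance between two 2D points (Pythagorean formula)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


print(distance(fruits["A"], fruits["B"]))  # 1.0
print(distance(fruits["A"], fruits["C"]))  # larger: C is far from A
print(distance(fruits["B"], fruits["C"]))  # larger: C is far from B
```

With these assumed values, A and B are the closest pair, which matches what the graph suggests.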
Suppose you’re comparing Netflix users, instead. You need some
way to graph the users. So, you need to convert each user to a set of
coordinates, just as you did for fruit.
Once you can graph users, you can measure the distance between them.
Here’s how you can convert users into a set of numbers. When users
sign up for Netflix, have them rate some categories of movies based on
how much they like those categories. For each user, you now have a set
of ratings!
Priyanka and Justin like Romance and hate Horror. Morpheus likes
Action but hates Romance (he hates when a good action movie gets
ruined by a cheesy romantic scene). Remember how in oranges versus
grapefruit, each fruit was represented by a set of two numbers? Here,
each user is represented by a set of five numbers.
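One way to picture this in code is sketched below. Only Romance, Horror, and Action are named in the text. The other two categories (Comedy and Drama), the 1-to-5 scale, and every rating value are assumptions made up for illustration.

```python
# Category order is fixed so that every user's numbers line up.
# Comedy and Drama are assumed; the text names only three categories.
CATEGORIES = ("Comedy", "Action", "Drama", "Horror", "Romance")

# Hypothetical ratings on an assumed 1-to-5 scale.
ratings = {
    "Priyanka": {"Comedy": 3, "Action": 4, "Drama": 4, "Horror": 1, "Romance": 4},
    "Justin": {"Comedy": 4, "Action": 3, "Drama": 5, "Horror": 1, "Romance": 5},
    "Morpheus": {"Comedy": 2, "Action": 5, "Drama": 1, "Horror": 3, "Romance": 1},
}


def to_vector(user_ratings):
    """Turn one user's ratings into a tuple of numbers in CATEGORIES order."""
    return tuple(user_ratings[category] for category in CATEGORIES)


vectors = {name: to_vector(r) for name, r in ratings.items()}
for name, vector in vectors.items():
    print(name, vector)
```

Keeping the category order fixed matters: the first number has to mean the same thing for every user, or the comparison is meaningless.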
A mathematician would say, instead of calculating the distance in two
dimensions, you’re now calculating the distance in five dimensions. But
the distance formula remains the same.
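The sketch below shows why the formula carries over: you square the difference in each dimension, add the squares up, and take the square root, however many dimensions there are. The rating values are hypothetical, and the function name `distance` is illustrative.

```python
import math


def distance(p, q):
    """Straight-line distance between two points with any number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical ratings for five movie categories (assumed 1-to-5 scale).
priyanka = (3, 4, 4, 1, 4)
justin = (4, 3, 5, 1, 5)
morpheus = (2, 5, 1, 3, 1)

print(distance(priyanka, justin))    # 2.0
print(distance(priyanka, morpheus))  # larger: their tastes differ more
```

With these assumed ratings, Priyanka is closer to Justin than to Morpheus. The same function works for two dimensions, so it covers the fruit example as well. Python 3.8 and later also provide `math.dist`, which does the same calculation.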