Building a spam filter
Spam filters use another simple algorithm called the Naive Bayes
classifier. First, you train your Naive Bayes classifier on some data.
Suppose you get an email with the subject “collect your million dollars
now!” Is it spam? You can break this sentence into words. Then, for
each word, see what the probability is for that word to show up in a
spam email. For example, in this very simple model, the word “million”
only appears in spam emails. Naive Bayes figures out the probability
that something is spam. It has applications similar to KNN.
For example, you could use Naive Bayes to categorize fruit: you have
a fruit that’s big and red. What’s the probability that it’s a grapefruit?
It’s another simple algorithm that’s fairly effective. We love those
algorithms!
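Here's a minimal sketch of the word-by-word idea in Python. The training subjects, the names tokenize, train, and spam_probability, and the add-one (Laplace) smoothing are all assumptions made for illustration; they aren't taken from any real spam filter. The sketch counts how often each word shows up in spam and non-spam ("ham") subjects, then combines those per-word probabilities into one overall score.

```python
import math
from collections import Counter

# Hypothetical training data, invented for illustration: (subject, label).
TRAINING = [
    ("collect your million dollars", "spam"),
    ("win a million now", "spam"),
    ("claim your prize now", "spam"),
    ("lunch meeting at noon", "ham"),
    ("your report is due now", "ham"),
    ("notes from the meeting", "ham"),
]

def tokenize(text):
    """Break a sentence into lowercase words, dropping edge punctuation."""
    words = (word.strip("!?.,") for word in text.lower().split())
    return [word for word in words if word]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, label_counts, vocab

def spam_probability(text, model):
    """Estimate P(spam | words), treating words as independent."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        # Start from the prior: how common this label is overall.
        score = math.log(label_counts[label] / total_examples)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # Ignore words never seen in training.
            # Add-one smoothing (an assumption of this sketch) keeps an
            # unseen word from forcing the whole probability to zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocab)))
        log_scores[label] = score
    # Turn the log scores back into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / sum(weights.values())

model = train(TRAINING)
print(spam_probability("collect your million dollars now!", model))
print(spam_probability("lunch meeting", model))
```

With this toy data, the first subject scores above 0.5 and the second below it. A real filter would train on far more email and would probably look at more than the subject line.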
Predicting the stock market
Here’s something that’s hard to do with machine learning: accurately
predicting whether the stock market will go up or down. How do
you pick good features in a stock market? Suppose you say that if the
stock went up yesterday, it will go up today. Is that a good feature? Or
suppose you say that the stock will always go down in May. Will that
work? There’s no guaranteed way to use past numbers to predict future
performance. Predicting the future is hard, and it’s almost impossible
when there are so many variables involved.
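One way to check a feature like "it went up yesterday, so it will go up today" is to measure how often the rule is right. The sketch below does that on simulated prices, not real market data. It assumes each day's move is random and independent of the day before, which is a big simplifying assumption, and the function names are made up for this example.

```python
import random

def simulate_prices(n_days, seed=0):
    """Generate a synthetic price series with independent daily moves."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days):
        # Each day's return is random noise with no memory of the past.
        prices.append(prices[-1] * (1 + rng.gauss(0, 0.01)))
    return prices

def yesterday_rule_accuracy(prices):
    """Fraction of days where today's direction matches yesterday's."""
    moves = [later > earlier for earlier, later in zip(prices, prices[1:])]
    hits = sum(today == yesterday
               for yesterday, today in zip(moves, moves[1:]))
    return hits / (len(moves) - 1)

prices = simulate_prices(10_000)
accuracy = yesterday_rule_accuracy(prices)
print(f"Rule was right {accuracy:.1%} of the time")
```

On a series like this the rule lands near 50 percent, which is no better than a coin flip. That says nothing definite about real stocks; it only shows how you might test whether a feature carries any signal before trusting it.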
Recap
I hope this gives you an idea of all the different things you can do with
KNN and with machine learning! Machine learning is an interesting
field that you can go pretty deep into if you decide to:
• KNN is used for classification and regression and involves looking
at the k-nearest neighbors.
• Classification = categorization into a group.
• Regression = predicting a response (like a number).
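To make the recap concrete, here's a small sketch of KNN doing both jobs. The data points and the function names (k_nearest, classify, regress) are hypothetical, and it assumes plain straight-line distance between points. Classification takes a vote among the k nearest neighbors; regression averages their values.

```python
import math
from collections import Counter

def k_nearest(points, query, k):
    """Return the k (features, target) pairs closest to query."""
    return sorted(points, key=lambda p: math.dist(p[0], query))[:k]

def classify(points, query, k=3):
    """Classification: the most common label among the k neighbors."""
    neighbors = k_nearest(points, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def regress(points, query, k=3):
    """Regression: the average value of the k neighbors."""
    neighbors = k_nearest(points, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Hypothetical fruit data: (size, redness) -> label.
FRUITS = [
    ((1, 1), "orange"), ((2, 1), "orange"), ((1, 2), "orange"),
    ((4, 5), "grapefruit"), ((5, 4), "grapefruit"), ((5, 5), "grapefruit"),
]

# Hypothetical numeric data: (x,) -> y.
VALUES = [((1,), 10), ((2,), 20), ((3,), 30), ((10,), 100)]

print(classify(FRUITS, (4, 4)))  # grapefruit
print(regress(VALUES, (2,)))     # 20.0
```

The same neighbor lookup powers both functions; only the last step changes, from voting to averaging.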