Bayes classifier: The Bayes rule is used to model the probability of each value of the target variable for a given set of feature variables. Similar to mixture modeling in clustering (cf. Sect. 6.5 in Chap. 6), it is assumed that the data points within a class are generated from a specific probability distribution such as the Bernoulli distribution or the multinomial distribution. A naive Bayes assumption of class-conditioned feature independence is often (but not always) used to simplify the modeling.
Logistic regression: The target variable is assumed to be drawn from a Bernoulli distribution whose mean is defined by a parameterized logit function on the feature variables. Thus, the probability distribution of the class variable is a parameterized function of the feature variables. This is in contrast to the Bayes model that assumes a specific generative model of the feature distribution of each class.
The first type of classifier is referred to as a generative classifier, whereas the second is referred to as a discriminative classifier. Both classifiers will be studied in detail in the following; the sketch below illustrates the contrast between the two paradigms.
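As a purely illustrative contrast between the two paradigms (an addition to the text, not the book's own example), the following Python sketch fits a generative Bernoulli naive Bayes model and a discriminative logistic regression model to a small synthetic binary data set. The scikit-learn estimators BernoulliNB and LogisticRegression and the toy data are assumptions of this sketch.

```python
# Minimal sketch: generative vs. discriminative classification on toy binary data.
# Assumes scikit-learn is installed; the data below is synthetic and illustrative.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression

# Toy binary feature matrix (rows = individuals, columns = binary features)
# and binary class labels (1 = donor, 0 = non-donor).
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])

# Generative: model P(features | class) with class-conditioned Bernoulli
# distributions, then invert via the Bayes theorem to get P(class | features).
generative = BernoulliNB().fit(X, y)

# Discriminative: model P(class | features) directly as a logit function
# of the features, without modeling the feature distribution at all.
discriminative = LogisticRegression().fit(X, y)

x_new = np.array([[1, 0, 1]])
print(generative.predict_proba(x_new))      # posterior via the generative route
print(discriminative.predict_proba(x_new))  # posterior modeled directly
```

Both estimators output the posterior probability of each class; the difference lies only in how that posterior is obtained, which is exactly the generative/discriminative distinction described above.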
10.5.1 Naive Bayes Classifier
The Bayes classifier is based on the Bayes theorem for conditional probabilities. This theorem quantifies the conditional probability of a random variable (the class variable), given known observations about the values of another set of random variables (the feature variables). The Bayes theorem is used widely in probability and statistics. To understand the Bayes theorem, consider the following example, based on Table 10.1:
Example 10.5.1 A charitable organization solicits donations from individuals in the population, of which 6/11 have age greater than 50. The organization has a success rate of 6/11 in soliciting donations, and among the individuals who donate, the probability that the age is greater than 50 is 5/6. Given an individual with age greater than 50, what is the probability that he or she will donate?
Consider the case where the event E corresponds to (Age > 50), and the event D corresponds to an individual being a donor. The goal is to determine the posterior probability P(D|E). This quantity is referred to as the “posterior” probability because it is conditioned on the observation of the event E that the individual has age greater than 50. The “prior” probability P(D), before observing the age, is 6/11. Clearly, knowledge of an individual’s age influences posterior probabilities because of the obvious correlations between age and donor behavior.
The Bayes theorem is useful when P(D|E) is hard to estimate directly from the training data, but other conditional and prior probabilities such as P(E|D), P(D), and P(E) can be estimated more easily. Specifically, the Bayes theorem states the following:
P(D|E) = P(E|D) P(D) / P(E).    (10.16)
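Plugging the quantities from Example 10.5.1 into Eq. 10.16 yields P(D|E) = (5/6) · (6/11) / (6/11) = 5/6. The short Python check below is an illustrative addition that verifies this arithmetic with exact rational numbers, using only the standard-library fractions module.

```python
# Illustrative check of Example 10.5.1 via Eq. 10.16 (not part of the original text).
from fractions import Fraction

p_D = Fraction(6, 11)         # prior probability of being a donor, P(D)
p_E = Fraction(6, 11)         # probability of age > 50, P(E)
p_E_given_D = Fraction(5, 6)  # probability of age > 50 among donors, P(E|D)

# Bayes theorem: P(D|E) = P(E|D) * P(D) / P(E)
p_D_given_E = p_E_given_D * p_D / p_E
print(p_D_given_E)  # prints 5/6
```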