Data Mining: The Textbook



$$ y_i \sim \text{Probability distribution with mean } f(W \cdot X_i) \quad \forall i \in \{1 \ldots n\} \qquad (11.12) $$

This function f(·) is referred to as the mean function, and its inverse f⁻¹(·) is referred to as the link function. Although the same mean/link function can be used with different probability distributions, the selected mean/link functions and probability distributions are usually paired carefully to maximize effectiveness and interpretability of the model. If the observed responses are discrete (e.g., binary), it is possible to use a discrete probability distribution for y_i (e.g., Bernoulli), as long as its mean is f(W · X_i). An example of this scenario is logistic regression. Some common examples of mean functions with their associated probability distribution assumptions are illustrated in the table below:


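| Link function | Mean function         | Distribution assumption |
|---------------|-----------------------|-------------------------|
| Identity      | W · X                 | Normal                  |
| Inverse       | −1/(W · X)            | Exponential, Gamma      |
| Log           | exp(W · X)            | Poisson                 |
| Logit         | 1/[1 + exp(−W · X)]   | Bernoulli, Categorical  |
| Probit        | Φ(W · X)              | Bernoulli, Categorical  |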

The link function regulates the nature of the response variable and its usability in a specific application. For example, the log, logit, and probit link functions are typically used to model the relative frequency of a discrete or categorical outcome. Because of the probabilistic modeling of the response variable, a maximum likelihood approach is used to determine the optimal parameter set W, where the product of the probabilities (or probability densities) of the response variable outcomes is maximized. After estimating the parameters in W, the expected response value of a test instance T is estimated as f(W · T). Furthermore, the probability distribution of the response variable (with mean f(W · T)) may be used for detailed analysis.
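The maximum likelihood procedure described above can be sketched for one row of the table, the log link with a Poisson distribution. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `fit_poisson_glm`, the synthetic data, the learning rate, and the use of plain gradient ascent are all choices made here for illustration.

```python
import numpy as np

def fit_poisson_glm(X, y, lr=0.1, n_iter=5000):
    """Gradient ascent on the Poisson log-likelihood with mean exp(W . X_i)."""
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(n_iter):
        mean = np.exp(X @ W)           # f(W . X_i) for every training point
        grad = X.T @ (y - mean) / n    # gradient of the average log-likelihood
        W += lr * grad
    return W

# Synthetic training data (assumption: a bias column plus one feature).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(2000), rng.uniform(-1, 1, size=2000)])
W_true = np.array([0.5, 1.0])          # hypothetical ground-truth parameters
y = rng.poisson(np.exp(X @ W_true))

W_hat = fit_poisson_glm(X, y)

# Expected response of a test instance T is f(W . T) = exp(W . T).
T = np.array([1.0, 0.3])
expected_response = np.exp(W_hat @ T)
print(W_hat, expected_response)
```

With enough data, `W_hat` should land close to `W_true`. Swapping the mean function and the distribution changes only the two lines that compute `mean` and `grad`.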

An important special case of GLM is least-squares regression. In this case, the probability distribution of the response y_i is the normal distribution with mean f(W · X_i) = W · X_i and constant variance σ². The relationship f(W · X_i) = W · X_i follows from the fact that the link function is the identity function. The likelihood of the training data is as follows:

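$$
\begin{aligned}
\text{Likelihood}(\{y_1 \ldots y_n\}) &= \prod_{i=1}^{n} \text{Probability}(y_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - f(W \cdot X_i))^2}{2\sigma^2}\right) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - W \cdot X_i)^2}{2\sigma^2}\right) \\
&\propto \exp\left(-\frac{\sum_{i=1}^{n} (y_i - W \cdot X_i)^2}{2\sigma^2}\right).
\end{aligned}
$$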
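The proportionality above suggests that, for fixed σ, the W maximizing the likelihood is the W minimizing the sum of squared residuals. The following sketch checks this numerically on synthetic data. The helper name `gaussian_neg_log_likelihood`, the data, and the perturbation test are illustrative assumptions.

```python
import numpy as np

def gaussian_neg_log_likelihood(W, X, y, sigma):
    """Negative log of the product of normal densities with mean W . X_i."""
    resid = y - X @ W
    n = len(y)
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
W_true = np.array([2.0, -1.0, 0.5])    # hypothetical parameters
sigma = 0.7
y = X @ W_true + rng.normal(scale=sigma, size=200)

# Ordinary least-squares solution.
W_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Every random perturbation of W_ls should have a larger negative
# log-likelihood, i.e., a smaller likelihood.
nll_ls = gaussian_neg_log_likelihood(W_ls, X, y, sigma)
perturbed = [W_ls + 0.05 * rng.normal(size=3) for _ in range(100)]
all_worse = all(gaussian_neg_log_likelihood(Wp, X, y, sigma) > nll_ls
                for Wp in perturbed)
print(W_ls, all_worse)
```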

In this special case, the maximum likelihood approach can be shown to be equivalent to the least-squares approach because the logarithm of the likelihood yields the scaled objective function of linear regression. Another specific example of the process of maximum likelihood estimation with the logit function and Bernoulli distribution is discussed in detail in Sect. 10.6 of Chap. 10. In this case, the discrete binary variable y_i is modeled from a Bernoulli distribution with mean function f(W · X_i) = 1/[1 + exp(−W · X_i)]:

$$
y_i = \begin{cases} 1 & \text{with probability } 1/[1 + \exp(-W \cdot X_i)] \\ 0 & \text{with probability } 1/[1 + \exp(W \cdot X_i)]. \end{cases} \qquad (11.13)
$$
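Equation 11.13 can be checked numerically: the two probabilities sum to 1, and the expected value of y_i equals the probability of the outcome 1. This is a minimal sketch; the function name `bernoulli_logit_pmf` and the sample values of W · X_i are illustrative assumptions.

```python
import math

def bernoulli_logit_pmf(w_dot_x):
    """Return (P(y=1), P(y=0)) from Eq. 11.13 for a given value of W . X_i."""
    p_one = 1.0 / (1.0 + math.exp(-w_dot_x))
    p_zero = 1.0 / (1.0 + math.exp(w_dot_x))
    return p_one, p_zero

for z in (-2.0, 0.0, 0.5, 3.0):
    p_one, p_zero = bernoulli_logit_pmf(z)
    mean = 1 * p_one + 0 * p_zero      # expected value of the binary response
    print(z, p_one + p_zero, mean)
```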

Note that³ the mean of y_i still satisfies the mean function according to the table above. This special case of GLMs is referred to as logistic regression. Logistic regression can also be used for k-way categorical response values. In that case, a k-way categorical distribution is used, and its mean function maps to a k-dimensional vector, with one component for each outcome of the categorical variable. An added restriction is that the components of the k-dimensional vector must add to 1. Probit regression is a sister family of models to logit regression, in which the cumulative distribution function (CDF) Φ(·) of a standard normal distribution is used instead of the logistic function. Ordered probit regression can model ordered integer values within a range (e.g., ratings) for the response variable by using the quantiles of a standard normal distribution. The key insight of GLM is to choose the link function and distribution assumption judiciously depending on the nature of the observed response in a specific application. Generalized linear models can be viewed as a unification of large classes of regression models, such as linear regression, logistic regression, probit regression, and Poisson regression.

³A slightly different convention of y_i ∈ {−1, +1} is used in Chap. 10 for notational convenience. In that case, the mean function would need to be adjusted to [1 − exp(−W · X)]/[1 + exp(−W · X)].
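To make the ordered probit idea mentioned above more concrete, the sketch below assumes one common formulation that the text does not spell out: k − 1 increasing cut points c_1 < ... < c_{k−1} partition the real line, and the probability that the response is at most j is Φ(c_j − W · X). The cut-point values and function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Phi(x), the CDF of a standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(w_dot_x, cutpoints):
    """Probabilities of ordered outcomes 1..k, given W . X and k-1 increasing cut points."""
    # Cumulative probabilities P(y <= j); the last outcome closes the range at 1.
    cdf = [std_normal_cdf(c - w_dot_x) for c in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for value in cdf:
        probs.append(value - prev)     # P(y = j) = P(y <= j) - P(y <= j-1)
        prev = value
    return probs

cutpoints = [-1.0, 0.0, 1.0, 2.0]      # hypothetical thresholds for 5 rating levels
probs = ordered_probit_probs(0.8, cutpoints)
expected_rating = sum(k * p for k, p in enumerate(probs, start=1))
print(probs, expected_rating)
```

With a single cut point at 0, the same function reduces to binary probit regression, where the probability of the higher outcome is Φ(W · X).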

11.5.4 Nonlinear and Polynomial Regression

Linear regression cannot capture nonlinear relationships such as those in Fig. 11.1b. The basic linear regression approach can be used for nonlinear regression by using derived input features. For example, consider a new set of m features denoted by h₁(X_j) ... h_m(X_j) for the jth data point. Here, h_i(·) represents a nonlinear transformation function from the d-dimensional input feature space to 1-dimensional space. This results in a new n × m input data matrix. By applying linear regression on this derived data matrix, one is able to model relationships of the following form:
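The derived-feature idea can be sketched for the simplest case of a one-dimensional input (d = 1) with polynomial transformations h_i(x) = x^i. The function name `derived_features`, the choice of m = 2, the added bias column, and the synthetic data are assumptions made only for illustration.

```python
import numpy as np

def derived_features(x, m):
    """Map scalar inputs to m derived features h_1(x) = x, ..., h_m(x) = x^m."""
    return np.column_stack([x**p for p in range(1, m + 1)])

# Synthetic data with a quadratic relationship (hypothetical coefficients).
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=300)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.1, size=300)

H = derived_features(x, m=2)                    # n x m derived data matrix
H1 = np.column_stack([np.ones(len(x)), H])      # prepend a bias column
coef, *_ = np.linalg.lstsq(H1, y, rcond=None)   # ordinary linear regression on H1

# Baseline for comparison: linear regression on the raw feature only.
X1 = np.column_stack([np.ones(len(x)), x])
coef_lin, *_ = np.linalg.lstsq(X1, y, rcond=None)
sse_poly = np.sum((y - H1 @ coef) ** 2)
sse_lin = np.sum((y - X1 @ coef_lin) ** 2)
print(coef, sse_poly, sse_lin)
```

The model stays linear in the coefficients, so the solver is unchanged; only the input matrix differs.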
