
E[X] ≥ αP(X > α)                                        (12.4)

The above inequality can be rearranged to obtain the final result.
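Concretely, dividing both sides of Equation 12.4 by α > 0 yields P(X > α) ≤ E[X]/α, which is the usual statement of the Markov inequality. The following snippet is a quick empirical sanity check of this bound; the exponential distribution and its mean are arbitrary illustrative choices, not taken from the text:

import random

# Empirical check of the Markov inequality P(X > alpha) <= E[X]/alpha
# for a nonnegative random variable X (here: exponential with mean 2.0,
# an arbitrary choice for illustration).
random.seed(0)
n = 100_000
samples = [random.expovariate(1 / 2.0) for _ in range(n)]
mean = sum(samples) / n
for alpha in (1.0, 5.0, 10.0):
    tail = sum(x > alpha for x in samples) / n
    print(f"alpha={alpha}: P(X > alpha) = {tail:.4f} <= E[X]/alpha = {mean / alpha:.4f}")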


The Markov inequality is defined only for probability distributions of nonnegative values and provides a bound only on the upper tail. In practice, it is often desired to bound both tails of probability distributions over both positive and negative values.


Consider the case where X is a random variable that is not necessarily nonnegative. The Chebychev inequality is a useful approach to derive symmetric tail bounds on X. The Chebychev inequality is a direct application of the Markov inequality to a nonnegative (square deviation-based) function of X.





Theorem 12.2.2 (Chebychev Inequality) Let X be an arbitrary random variable. Then, for any constant α, the following is true:

P(|X − E[X]| > α) ≤ Var[X]/α²                           (12.5)

Proof: The inequality |X − E[X]| > α is true if and only if (X − E[X])² > α². By defining Y = (X − E[X])² as a (nonnegative) random variable derived from X, it is easy to see that E[Y] = Var[X]. Then, the expression on the left-hand side of the theorem statement is the same as the probability P(Y > α²). By applying the Markov inequality to the random variable Y, one can obtain the desired result.
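Written out explicitly, the proof chains the substitution Y = (X − E[X])² with the Markov inequality:

\[
P\bigl(|X - E[X]| > \alpha\bigr)
  = P\bigl(Y > \alpha^2\bigr)
  \le \frac{E[Y]}{\alpha^2}
  = \frac{\mathrm{Var}[X]}{\alpha^2}.
\]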

The main trick used in the aforementioned proof was to apply the Markov inequality to a nonnegative function of the random variable. This technique can generally be very useful for proving other kinds of bounds, when the distribution of X has a specific form (such as the sum of Bernoulli random variables). In such cases, a parameterized function of the random variable can be used to obtain a parameterized bound. The underlying parameter can then be optimized for the tightest possible bound. Several well-known bounds such as the Chernoff bound and the Hoeffding inequality are derived with the use of this approach. Such bounds are significantly tighter than the (much weaker) Markov and Chebychev inequalities. This is because the parameter optimization process implicitly creates a bound that is optimized for the special form of the corresponding probability distribution of the random variable X.
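As a sketch of this parameterization trick (the standard moment generating function argument, which the text alludes to rather than derives here): for any parameter t > 0, the function e^(tX) is nonnegative, so the Markov inequality applies to it:

\[
P(X > a) = P\bigl(e^{tX} > e^{ta}\bigr) \le \frac{E\bigl[e^{tX}\bigr]}{e^{ta}},
\qquad t > 0.
\]

Minimizing the right-hand side over t then yields the tightest bound of this family; carrying out this optimization for a sum of Bernoulli variables gives the Chernoff bound.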


Many practical scenarios can be captured with the use of special families of random variables. A particular case is one in which a random variable X may be expressed as a sum of other independent bounded random variables. For example, consider the case where the data points have binary class labels associated with them and one wishes to use a stream sample to estimate the fraction of examples belonging to each class. While the fraction of points in the sample belonging to a class provides an estimate, how can one bound the probabilistic accuracy of this estimate? Note that the estimated fraction can be expressed as a (scaled) sum of independent and identically distributed (i.i.d.) binary random variables, depending on the binary class associated with each sample instance. The Chernoff bound provides an excellent bound on the accuracy of the estimate.
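The following is a minimal sketch of this scenario with made-up numbers, anticipating the lower-tail bound stated in Theorem 12.2.3 below; the sample size and label count are hypothetical:

import math

# Hypothetical reservoir sample: n points, of which k carry the class
# label of interest. The count X of labeled points is a sum of n i.i.d.
# Bernoulli variables, so the Chernoff bound applies to it.
n, k = 10_000, 2_600
p_hat = k / n  # estimated class fraction

# Lower-tail Chernoff bound: P(X < (1 - delta) * E[X]) <= exp(-E[X] * delta^2 / 2).
# Using n * p_hat as a stand-in for E[X], the probability that the count
# undershoots its expectation by 5% or more is at most:
delta = 0.05
bound = math.exp(-(n * p_hat) * delta**2 / 2)
print(f"estimated fraction: {p_hat:.3f}")
print(f"P(count < (1 - {delta}) * E[X]) <= {bound:.2e}")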


A second example is one where the underlying random variables are not necessarily binary, but are bounded. For example, consider the case where the stream data points correspond to individuals of a particular age. The average age of the individuals is estimated with the use of an average of the points in the reservoir sample. Note that the age can be (realistically) assumed to be a bounded random variable from the range (0, 125). In such cases, the Hoeffding bound can be used to determine a tight bound on the accuracy of the estimate.
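A minimal sketch of the age example, using the standard two-sided Hoeffding bound for variables bounded in [a, b] (the bound itself is introduced formally later; the sample size and error tolerance below are hypothetical):

import math

# Hypothetical illustration of the Hoeffding bound for the age example.
# Ages are bounded in [a, b] = [0, 125]; the sample mean of n reservoir
# points estimates the true mean age.
n = 5_000                # assumed reservoir size
a, b = 0.0, 125.0        # assumed bounds on age
eps = 2.0                # tolerated error in years

# Two-sided Hoeffding bound:
# P(|mean_est - mean| >= eps) <= 2 * exp(-2 * n * eps^2 / (b - a)^2)
bound = 2 * math.exp(-2 * n * eps**2 / (b - a) ** 2)
print(f"P(error >= {eps} years) <= {bound:.3f}")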


First, the Chernoff bound will be introduced. Because the expressions for the lower tail and upper tail are slightly different, they will be addressed separately. The lower-tail Chernoff bound is introduced below.


Theorem 12.2.3 (Lower-Tail Chernoff Bound) Let X be a random variable that can be expressed as the sum of n independent binary (Bernoulli) random variables, each of which takes on the value of 1 with probability pᵢ. Then, for any δ ∈ (0, 1), the following is true:

P(X < (1 − δ)E[X]) < e^(−E[X]δ²/2)                      (12.6)
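A quick simulation check of this bound with illustrative parameters (here all pᵢ are set equal, a special case of the theorem):

import math
import random

# Simulation check of the lower-tail Chernoff bound with illustrative
# parameters: X is a sum of n Bernoulli(p) variables (all p_i equal here).
random.seed(1)
n, p, delta, trials = 1_000, 0.3, 0.1, 5_000
mu = n * p                      # E[X]
threshold = (1 - delta) * mu
undershoots = sum(
    sum(random.random() < p for _ in range(n)) < threshold
    for _ in range(trials)
)
empirical = undershoots / trials
bound = math.exp(-mu * delta**2 / 2)
print(f"empirical P(X < (1 - delta)E[X]) = {empirical:.4f}")
print(f"Chernoff bound                   = {bound:.4f}")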




