Data Mining: The Textbook




8.2. EXTREME VALUE ANALYSIS








[Figure 8.2: Tails of a symmetric and asymmetric distribution. Panels: (a) Symmetric distribution; (b) Asymmetric distribution. Horizontal axis: VALUE; vertical axis: PROBABILITY DENSITY. Annotations: LOWER TAIL, UPPER TAIL, DENSITY THRESHOLD, OUTLIERS BUT NOT EXTREME VALUES.]


distributions. Some asymmetric distributions, such as an exponential distribution, may not even have a tail at one end of the distribution.


A model distribution is selected for quantifying the tail probability. The most commonly used model is the normal distribution. The density function $f_X(x)$ of the normal distribution with mean $\mu$ and standard deviation $\sigma$ is defined as follows:





$$f_X(x) = \frac{1}{\sigma \cdot \sqrt{2 \cdot \pi}} \cdot e^{\frac{-(x-\mu)^2}{2 \cdot \sigma^2}}. \tag{8.1}$$
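As a minimal sketch, the normal density of Eq. 8.1 can be evaluated with Python's standard library alone. The function name `normal_pdf` and the sample inputs are illustrative assumptions, not part of the original text.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma (Eq. 8.1)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Normalizing constant 1 / (sigma * sqrt(2 * pi)).
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    # Exponent -(x - mu)^2 / (2 * sigma^2).
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Illustrative inputs: a standard normal (mu = 0, sigma = 1).
peak = normal_pdf(0.0, 0.0, 1.0)      # density at the mean
far_tail = normal_pdf(4.0, 0.0, 1.0)  # density four standard deviations out
print(peak, far_tail)
```

The density is largest at the mean and falls off symmetrically on both sides, which is why both tails of a normal model are treated alike.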
















A standard normal distribution is one in which the mean $\mu$ is 0 and the standard deviation $\sigma$ is 1. In some application scenarios, the mean $\mu$ and standard deviation $\sigma$ of the distribution may be known through prior domain knowledge. Alternatively, when a large number of data samples is available, the mean and standard deviation may be estimated very accurately from the data. These can be used to compute the Z-number for a random variable. The Z-number $z_i$ of an observed value $x_i$ can be computed as follows:



$$z_i = \frac{x_i - \mu}{\sigma}. \tag{8.2}$$
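The Z-number computation of Eq. 8.2 might be sketched as follows. The helper name `z_numbers` and the sample data are hypothetical. When the mean and standard deviation are not supplied from domain knowledge, this sketch estimates them from the data, using the population formula for the standard deviation as an assumption.

```python
import statistics


def z_numbers(values, mu=None, sigma=None):
    """Return z_i = (x_i - mu) / sigma for each observed value (Eq. 8.2)."""
    # Fall back to estimates from the data when no prior values are given.
    if mu is None:
        mu = statistics.fmean(values)
    if sigma is None:
        sigma = statistics.pstdev(values, mu)  # population standard deviation
    if sigma == 0:
        raise ValueError("standard deviation is zero; Z-numbers are undefined")
    return [(x - mu) / sigma for x in values]


# Hypothetical sample with mean 5 and population standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_numbers(data)
print(z)
```

Values above the mean receive positive Z-numbers and values below it receive negative ones, so the sign indicates which tail a value lies toward.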

Large positive values of $z_i$ correspond to the upper tail, whereas large negative values correspond to the lower tail. The normal distribution can be expressed directly in terms of the Z-number because it corresponds to a scaled and translated random variable with a mean of 0 and a standard deviation of 1. The normal distribution of Eq. 8.1 can be written directly in terms of the Z-number, with the use of a standard normal distribution, as follows:






$$f_X(x_i) = \frac{1}{\sigma \cdot \sqrt{2 \cdot \pi}} \cdot e^{\frac{-z_i^2}{2}}. \tag{8.3}$$
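Substituting Eq. 8.2 into the normal density means the exponent depends on the data only through the Z-number. The following sketch checks numerically that both forms agree. The function names and the values of `mu`, `sigma`, and `x` are illustrative assumptions.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density written in terms of x, mu, and sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_pdf_from_z(z: float, sigma: float) -> float:
    """The same density, with the exponent written as -z^2 / 2."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-(z ** 2) / 2.0)


# Hypothetical parameters and observation.
mu, sigma, x = 10.0, 2.0, 13.0
z = (x - mu) / sigma                  # Z-number of x, here 1.5
direct = normal_pdf(x, mu, sigma)     # density computed from x
via_z = normal_pdf_from_z(z, sigma)   # density computed from z
print(direct, via_z)
```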










