Data Mining: The Textbook




Accuracy: The accuracy is the fraction of test instances in which the predicted value matches the ground-truth value.




  1. Cost-sensitive accuracy: Not all classes are equally important in every scenario when comparing accuracy. This is particularly relevant in imbalanced-class problems, which are discussed in more detail in the next chapter. For example, consider an application in which tumors are classified as malignant or nonmalignant, where the former is much rarer than the latter. In such cases, misclassifying a malignant tumor is usually far more costly than misclassifying a nonmalignant one. This is frequently quantified by imposing differential costs c1 . . . ck on the misclassification of the different classes. Let n1 . . . nk be the number of test instances belonging to each class, and let a1 . . . ak be the accuracies (expressed as fractions) on the subsets of test instances belonging to each class. Then, the overall accuracy A can be computed as a weighted combination of the accuracies over the individual labels.



A = \frac{\sum_{i=1}^{k} c_i n_i a_i}{\sum_{i=1}^{k} c_i n_i}    (10.77)

The cost-sensitive accuracy is the same as the unweighted accuracy when all costs c1 . . . ck are identical.
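As a concrete illustration, Equation 10.77 can be sketched in a few lines of Python. The class costs, counts, and per-class accuracies below are invented placeholders for the tumor example, not values from the text:

```python
# Cost-sensitive accuracy (Equation 10.77): a weighted combination of the
# per-class accuracies a_i, with class i weighted by cost c_i times size n_i.

def cost_sensitive_accuracy(costs, counts, accuracies):
    """Return A = sum(c_i * n_i * a_i) / sum(c_i * n_i)."""
    num = sum(c * n * a for c, n, a in zip(costs, counts, accuracies))
    den = sum(c * n for c, n in zip(costs, counts))
    return num / den

# Hypothetical two-class tumor example: malignant (rare, high cost)
# versus nonmalignant (common, low cost).
costs = [10.0, 1.0]         # c_1, c_2: misclassifying malignant is costlier
counts = [20, 980]          # n_1, n_2: test instances per class
accuracies = [0.70, 0.99]   # a_1, a_2: per-class accuracy fractions

A = cost_sensitive_accuracy(costs, counts, accuracies)
```

With equal costs, the function reduces to the plain unweighted accuracy, matching the remark above; the differential costs pull the overall score toward the accuracy on the rare, expensive class.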


Aside from the accuracy, the statistical robustness of a model is also an important issue. For example, if two classifiers are trained over a small number of test instances and compared, the difference in accuracy may be a result of random variations, rather than a truly statistically significant difference between the two classifiers. Therefore, it is important to design statistical measures to quantify the specific advantage of one classifier over the other.


Most statistical methodologies such as holdout, bootstrap, and cross-validation use b > 1 different randomly sampled rounds to obtain multiple estimates of the accuracy. For the purpose of discussion, let us assume that b different rounds (i.e., b different m-way partitions) of cross-validation are used. Let M1 and M2 be two models. Let Ai,1 and Ai,2 be the respective accuracies of the models M1 and M2 on the partitioning created by the ith round of cross-validation. The corresponding difference in accuracy is δai = Ai,1 − Ai,2. This results in b estimates δa1 . . . δab. Note that δai might be either positive or negative, depending on which classifier provides superior performance on a particular round of cross-validation. Let the average difference in accuracy between the two classifiers be ΔA.








\Delta A = \frac{\sum_{i=1}^{b} \delta a_i}{b}    (10.78)
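The paired comparison described above can be sketched as follows. The per-round accuracies for the two models are illustrative placeholders, not results from the text:

```python
# Paired comparison of two models M1 and M2 over b rounds of
# cross-validation, following the notation of Equation 10.78.
# The accuracy values below are hypothetical.

acc_m1 = [0.81, 0.79, 0.84, 0.80, 0.82]  # A_{i,1} for model M1, b = 5 rounds
acc_m2 = [0.78, 0.80, 0.81, 0.77, 0.80]  # A_{i,2} for model M2

# delta a_i = A_{i,1} - A_{i,2}; the sign shows which model won round i.
deltas = [a1 - a2 for a1, a2 in zip(acc_m1, acc_m2)]

# Equation 10.78: the mean difference in accuracy, Delta A.
b = len(deltas)
delta_A = sum(deltas) / b
```

Because the two models are evaluated on the same partitions in each round, the differences δa_i are paired, which removes much of the round-to-round variance from the comparison.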



















The standard deviation σ of the difference in accuracy may be estimated as follows:
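The formula itself is missing from this extract. Consistent with the notation above, the standard sample estimator over the b observed differences δa1 . . . δab would be:

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{b} (\delta a_i - \Delta A)^2}{b - 1}}
```

This is the usual unbiased-variance form (division by b − 1 rather than b); ΔA and σ together allow a t-statistic to be formed to test whether the observed advantage of one classifier is statistically significant.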








