Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə128/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   124   125   126   127   128   129   130   131   ...   423
1-Data Mining tarjima

Sim(Ti, Tj ) =

|Ti ∩ Tj |

.

(7.1)










|Ti ∪ Tj |







Subsequently, two data points Ti and Tj are defined to be neighbors, if the similarity Sim(Ti, T j ) between them is greater than a threshold θ. Thus, the concept of neighbors implicitly defines a graph structure on the data items, where the nodes correspond to


210 CHAPTER 7. CLUSTER ANALYSIS: ADVANCED CONCEPTS


the data items, and the links correspond to the neighborhood relations. The notation Link(Ti , T j ) denotes a shared nearest-neighbor similarity function, which is equal to the number of shared nearest neighbors between Ti and Tj .


The similarity function Link(T i, Tj ) provides a merging criterion for agglomerative algo-rithms. The algorithm starts with each data point (from the initially chosen sample) in its own cluster and then hierarchically merges clusters based on a similarity criterion between clusters. Intuitively, two clusters C1 and C2 should be merged, if the cumulative number of shared nearest neighbors between objects in C1 and C2 is large. Therefore, it is possible to generalize the notion of link-based similarity using clusters as arguments, as opposed to individual data points:





GroupLink(Ci, Cj ) =

Link(Tu, Tv).

(7.2)




Tu∈Ci,Tv ∈Cj




Note that this criterion has a slight resemblance to the group-average linkage criterion discussed in the previous chapter. However, this measure is not yet normalized because the expected number of cross-links between larger clusters is greater. Therefore, one must normalize by the expected number of cross-links between a pair of clusters to ensure that the merging of larger clusters is not unreasonably favored. Therefore, the normalized linkage criterion V (Ci, Cj ) is as follows:



V (

Ci

,

Cj

) =


Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   124   125   126   127   128   129   130   131   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin