Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə89/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   85   86   87   88   89   90   91   92   ...   423
1-Data Mining tarjima

SUPER COORDINATE1

ITEMSETS MAPPING TO SUPER COORDINATE1

SUPER COORDINATE2

ITEMSETS MAPPING TO SUPER COORDINATE2

SUPER COORDINATE3

ITEMSETS MAPPING TO SUPER COORDINATE3

SUPER COORDINATEr

ITEMSETS MAPPING TO SUPER COORDINATEr

Figure 5.3: Illustration of the signature table


approximate pattern of buying behavior for that itemset. Thus, it is crucial to perform the clustering of the items into signatures so that two criteria are satisfied:



  1. The items within a cluster Si are correlated. This ensures a more discriminative mapping, which provides better indexing performance.




  1. The aggregate support of the items within each cluster is similar. This is necessary to ensure that the signature table is balanced.

To construct the signature table, a graph is constructed so that each node of the graph corresponds to an item. For every pair of items that is frequent, an edge is added to the graph, and the weight of the edge is a function of the support of that pair of items. In addition, the weight of a node is the support of a particular item. It is desired to deter-mine a clustering of this graph into K partitions so that the cumulative weights of edges across the partitions is as small as possible and the partitions are well balanced. Reduc-ing the weights of edges across partitions ensures that correlated sets of items are grouped together. The partitions should be as well balanced as possible so that the itemsets mapping to each super-coordinate are as well balanced as possible. Thus, this approach transforms the items into a similarity graph, which can be clustered into partitions. A variety of clus-tering algorithms can be used to partition the graph into groups of items. Any of the graph clustering algorithms discussed in Chap. 19, such as METIS, may be used for this pur-pose. The bibliographic notes also contain pointers to some methods for signature table construction.


After the signatures have been determined, the partitions of the data may be defined by using the super-coordinates of the itemsets. Each itemset belongs to the partition that is defined by its super-coordinate. Unlike the case of inverted lists, the itemsets are explicitly stored within this list, rather than just their identifiers. This ensures that the secondary data structure does not need to be accessed to explicitly recover the itemsets. This is the reason that the signature table can be used to recover the itemsets themselves, rather than only the identifiers of the itemsets.


The signature table is capable of handling general similarity queries that cannot be efficiently addressed with inverted lists. Let x be the number of items in which an itemset matches with a target Q, and y be the number of items in which it differs with the target Q. The signature table is capable of handling similarity functions of the form f (x, y) that





  1. CHAPTER 5. ASSOCIATION PATTERN MINING: ADVANCED CONCEPTS

satisfy the following two properties, for a fixed target record Q:





Δf (x, y)

≥ 0

(5.2)










Δx




Δf (x, y)

≤ 0

(5.3)










Δy




This is referred to as the monotonicity property. These intuitive conditions on the function ensure that it is an increasing function in the number of matches and decreasing in the hamming distance. While the match function and the hamming distance obviously satisfy these conditions, it can be shown that other functions for set-wise similarity, such as the cosine and the Jaccard coefficient, also satisfy them. For example, let P and Q be the sets of items in two itemsets, where Q is the target itemset. Then, the cosine function can be expressed as follows, in terms of x and y:






Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   85   86   87   88   89   90   91   92   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin