Data Mining: The Textbook


Partition is fundamentally no different from that of



Yüklə 17,13 Mb.
səhifə86/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   82   83   84   85   86   87   88   89   ...   423
1-Data Mining tarjima

Partition is fundamentally no different from that of Eclat. The Eclat algorithm partitions the lattice based on common prefixes, calling them equivalence classes, and then uses a breadth -first approach [537] over each of these smaller sublattices in main mem-ory. This type of lattice partitioning was adopted from parallel versions of Apriori, such as the Candidate Distribution algorithm [54], where a similar choice exists between lattice partitioning and data partitioning. Because a breadth-first approach is used for search on each sublattice, such an approach has significantly higher memory requirements than a pure depth-first approach. As stated in [534 ], Eclat explicitly decouples the lattice decomposition phase from the pattern search phase. This is different from a pure depth -first strategy in which both are tightly integrated. Depth-first algorithms do not require an explicitly decou-pled approach for reduction of memory requirements. Therefore, the lattice- partitioning in Eclat, which was motivated by the Candidate Distribution algorithm [54], seems to have been specifically designed with a breadth-first approach in mind for the second (pattern search) phase. Both the conference [537] and journal versions [534] of the Eclat algorithm state that a breadth-first (bottom-up) procedure is used in the second phase for all experi-ments. FP-growth [252] and DepthProject [4] were independently proposed as the first depth-first algorithms for frequent pattern mining. MAFIA was the first vertical method to use a pure depth-first approach [123]. Other later variations of vertical algorithms, such as GenMax and dEclat [233 , 538 ], also incorporated the depth-first approach. The notion of diffsets [538, 233], which uses incremental vertical lists along the enumeration tree hierar-chy, was also proposed in these algorithms. The approach provides memory and efficiency advantages for certain types of data sets.

Numerous measures for finding interesting frequent patterns have been proposed. The χ2 measure was one of the first such tests, and was discussed in [113]. This measure satisfies the upward closure property. Therefore, efficient pattern mining algorithms can be devised. The use of the min-hashing technique for determining interesting patterns without support counting was discussed in [180]. The impact of skews in the support of individual items has been addressed in [517]. An affinity-based algorithm for mining interesting patterns in data with skews has been proposed in the same work. A common scenario in which there is significant skew in support distributions is that of mining negative association rules [447]. The collective strength model was proposed in [16], and a level-wise algorithm for finding all strongly collective itemsets was discussed in the same work. The collective strength model can also discover negative associations from the data. The work in [486] addresses the problem of selecting the right measure for finding interesting association rules.


Sampling is a popular approach for finding frequent patterns in an efficient way with memory-resident algorithms. The first sampling approach was discussed in [ 493], and theo-retical bounds were presented. The work in [446 ] enables the application of memory-based frequent pattern mining algorithms on large data sets by using ensembles on data partitions. The problem of finding quantitative association rules, and different kinds of patterns from quantitative data is discussed in [476]. The CLIQUE algorithm can also be considered an association pattern mining algorithm on quantitative data [58].


132 CHAPTER 4. ASSOCIATION PATTERN MINING





4.9 Exercises




1. Consider the transaction database in the table below:































tid

Items


































1


Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   82   83   84   85   86   87   88   89   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin