Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	78/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 74 75 76 77 78 79 80 81 ... 423

1-Data Mining tarjima

4.4.	FREQUENT ITEMSET MINING ALGORITHMS
119
Algorithm FP-growth(FP-Tree of frequent items: F PT , Minimum Support: minsup,
	Current Suﬃx: P )
begin
if	F PT is a single path
	then determine all combinations C of nodes on the
	path, and report C ∪ P as frequent;

else (Case when F PT is not a single path)

for each item i in F PT do begin

report itemset P_i = {i} ∪ P as frequent;
Use pointers to extract conditional prefix paths from F PT containing item i;
Readjust counts of prefix paths and remove i;

Remove infrequent items from prefix paths and reconstruct conditional FP-Tree F PT _i;

if (F PT _i = φ) then FP-growth(F PT _i, minsup, P_i); end

end

Figure 4.12: The FP-growth algorithm with an FP-Tree representation of the transaction database expressed in terms of frequent 1-items

level of consolidation at higher level nodes in the trie-like FP -Tree structure for a particular data set. Diﬀerent data structures may be more suitable for diﬀerent data sets.

Because projected databases are repeatedly constructed and scanned during recursive calls, it is crucial to maintain them in main memory. Otherwise, drastic disk-access costs will be incurred by the potentially exponential number of recursive calls. The sizes of the projected databases increase with the original database size. For certain kinds of databases with limited consolidation of repeated transactions, the number of distinct transactions in the projected database will always be approximately proportional to the number of transactions in the original database, where the proportionality factor f is equal to the (fractional) minimum support. For databases that are larger than a factor 1/f of the main memory availability, projected databases may not fit in main memory either. Therefore, the limiting factor on the use of the approach is the size of the original transaction database. This issue is specific to almost all projection-based methods and vertical counting methods. Memory is always at a premium in such methods and therefore it is crucial for projected transaction data structures to be designed as compactly as possible. As we will discuss later, the Partition framework of Savasere et al. [446] provides a partial solution to this issue at the expense of running time.

4.4.4.5 Relationship Between FP-Growth and Enumeration-Tree Methods

FP-growth is popularly believed to be radically diﬀerent from enumeration-tree methods. This is, in part, because FP-growth was originally presented as a method that extracts frequent patterns without candidate generation. However, such an exposition provides an incomplete understanding of how the search space of patterns is explored. FP-growth is an instantiation of enumeration-tree methods. All enumeration-tree methods generate candi-date extensions to grow the tree. In the following, we will show the equivalence between enumeration-tree methods and FP-growth.

120 CHAPTER 4. ASSOCIATION PATTERN MINING

Figure 4.13: Enumeration trees are identical to FP-growth recursion trees with reverse lex-icographic ordering

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 74 75 76 77 78 79 80 81 ... 423