4.4.
|
FREQUENT ITEMSET MINING ALGORITHMS
|
119
|
Algorithm FP-growth(FP-Tree of frequent items: F PT , Minimum Support: minsup,
|
|
|
Current Suffix: P )
|
|
begin
|
|
if
|
F PT is a single path
|
|
|
then determine all combinations C of nodes on the
|
|
|
path, and report C ∪ P as frequent;
|
|
else (Case when F PT is not a single path)
for each item i in F PT do begin
report itemset Pi = {i} ∪ P as frequent;
Use pointers to extract conditional prefix paths from F PT containing item i;
Readjust counts of prefix paths and remove i;
Remove infrequent items from prefix paths and reconstruct conditional FP-Tree F PT i;
if (F PT i = φ) then FP-growth(F PT i, minsup, Pi); end
end
Figure 4.12: The FP-growth algorithm with an FP-Tree representation of the transaction database expressed in terms of frequent 1-items
level of consolidation at higher level nodes in the trie-like FP -Tree structure for a particular data set. Different data structures may be more suitable for different data sets.
Because projected databases are repeatedly constructed and scanned during recursive calls, it is crucial to maintain them in main memory. Otherwise, drastic disk-access costs will be incurred by the potentially exponential number of recursive calls. The sizes of the projected databases increase with the original database size. For certain kinds of databases with limited consolidation of repeated transactions, the number of distinct transactions in the projected database will always be approximately proportional to the number of transactions in the original database, where the proportionality factor f is equal to the (fractional) minimum support. For databases that are larger than a factor 1/f of the main memory availability, projected databases may not fit in main memory either. Therefore, the limiting factor on the use of the approach is the size of the original transaction database. This issue is specific to almost all projection-based methods and vertical counting methods. Memory is always at a premium in such methods and therefore it is crucial for projected transaction data structures to be designed as compactly as possible. As we will discuss later, the Partition framework of Savasere et al. [446] provides a partial solution to this issue at the expense of running time.
4.4.4.5 Relationship Between FP-Growth and Enumeration-Tree Methods
FP-growth is popularly believed to be radically different from enumeration-tree methods. This is, in part, because FP-growth was originally presented as a method that extracts frequent patterns without candidate generation. However, such an exposition provides an incomplete understanding of how the search space of patterns is explored. FP-growth is an instantiation of enumeration-tree methods. All enumeration-tree methods generate candi-date extensions to grow the tree. In the following, we will show the equivalence between enumeration-tree methods and FP-growth.
120 CHAPTER 4. ASSOCIATION PATTERN MINING
Figure 4.13: Enumeration trees are identical to FP-growth recursion trees with reverse lex-icographic ordering
Dostları ilə paylaş: |