
k = 1;
Fk = { All frequent 1-item elements };
while Fk is not empty do begin
    Generate Ck+1 by joining pairs of sequences in Fk, such that
        removing an item from the first element of one sequence matches
        the sequence obtained by removing an item from the last element of the other;
    Prune sequences from Ck+1 that violate downward closure;
    Determine Fk+1 by support counting on (Ck+1, T) and retaining
        sequences from Ck+1 with support at least minsup;
    k = k + 1;
end;
return(F1 ∪ F2 ∪ . . . ∪ Fk);
Figure 15.1: The GSP algorithm is related to the Apriori algorithm. The reader is encouraged to compare this pseudocode with the Apriori algorithm described in Fig. 4.2 of Chap. 4.
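To make the support-counting step of Fig. 15.1 concrete, the following Python sketch tests whether a candidate sequence occurs in a data sequence: each element of the candidate must be a subset of a distinct element of the data sequence, and the matched elements must appear in the same order. The tuple-of-frozensets representation and the names occurs_in and support are illustrative assumptions, not part of the original algorithm specification.

from typing import FrozenSet, List, Tuple

Element = FrozenSet[str]
Seq = Tuple[Element, ...]   # a sequence is an ordered tuple of itemsets

def occurs_in(candidate: Seq, data_seq: Seq) -> bool:
    """True if each element of `candidate` is a subset of a distinct element
    of `data_seq`, with the matched elements appearing in the same order."""
    pos = 0
    for elem in candidate:
        # Scan forward for the next data element that contains this candidate element.
        while pos < len(data_seq) and not elem <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1   # later candidate elements must match strictly later positions
    return True

def support(candidate: Seq, database: List[Seq]) -> int:
    """Support = number of data sequences that contain the candidate."""
    return sum(occurs_in(candidate, s) for s in database)

# Small illustration with two data sequences.
db = [
    (frozenset({"Bread", "Butter", "Cheese"}), frozenset({"Cheese", "Eggs"})),
    (frozenset({"Bread"}), frozenset({"Milk", "Cheese", "Eggs"})),
]
cand = (frozenset({"Bread"}), frozenset({"Cheese", "Eggs"}))
print(support(cand, db))   # 2: the candidate occurs in both data sequences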

The later part of this section provides a broad overview of how enumeration tree algorithms can be generalized to sequential pattern mining.


The GSP and Apriori algorithms are similar, except that the former needs to be designed for finding frequent sequences rather than sets. First, the notion of the length of candidates needs to be defined in sequential pattern mining. This notion needs to be defined more carefully, because the individual elements in the sequence are sets rather than items. The length of a candidate or a frequent sequence is equal to the number of items (not elements) in the candidate. In other words, a k-sequence ⟨Y1 . . . Yr⟩ is a sequence with length $\sum_{i=1}^{r} |Y_i| = k$.




Thus, ⟨{Bread, Butter, Cheese}, {Cheese, Eggs}⟩ is a 5-candidate, even though it contains only 2 elements. This is because this sequence contains 5 items in total, including a repetition of “Cheese” in two distinct elements. A (k − 1)-subsequence of a k-candidate can be generated by removing an item from any element in the k-sequence. The Apriori property continues to hold for sequences because any (k − 1)-subsequence of a k-sequence will have support at least equal to that of the latter. This sets the stage for a candidate generate-and-test approach, together with downward closure pruning, which is analogous to Apriori.
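To make these definitions concrete, the following Python sketch computes the length of a sequence as the total number of items over all of its elements, and enumerates the (k − 1)-subsequences obtained by deleting one item at a time. The data representation (tuples of frozensets) and the helper names are illustrative assumptions.

from typing import FrozenSet, List, Tuple

Element = FrozenSet[str]
Seq = Tuple[Element, ...]

def seq_length(seq: Seq) -> int:
    """Length of a sequence = total number of items over all of its elements."""
    return sum(len(element) for element in seq)

def k_minus_one_subsequences(seq: Seq) -> List[Seq]:
    """All (k-1)-subsequences obtained by deleting one item from one element,
    dropping an element entirely if it becomes empty."""
    result = []
    for i, element in enumerate(seq):
        for item in element:
            reduced = element - {item}
            if reduced:
                result.append(seq[:i] + (reduced,) + seq[i + 1:])
            else:
                result.append(seq[:i] + seq[i + 1:])
    return result

# The 5-candidate from the text: <{Bread, Butter, Cheese}, {Cheese, Eggs}>
s = (frozenset({"Bread", "Butter", "Cheese"}), frozenset({"Cheese", "Eggs"}))
print(seq_length(s))                                          # 5
print([seq_length(t) for t in k_minus_one_subsequences(s)])   # [4, 4, 4, 4, 4]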

The GSP algorithm starts by generating all frequent 1-item sequences by straightforward counting of individual items. This set of frequent 1-sequences is represented by F1. Subsequent iterations construct Ck+1 by joining pairs of sequence patterns in Fk. The join process is different from association pattern mining because of the greater complexity in the definition of sequences. Any pair of frequent k-sequences S1 and S2 can be joined if the sequence obtained by removing an item from the first element of S1 is identical to the sequence obtained by removing an item from the last element of S2. For example, the two 5-sequences S1 = ⟨{Bread, Butter, Cheese}, {Cheese, Eggs}⟩ and S2 = ⟨{Bread, Butter}, {Milk, Cheese, Eggs}⟩ can be joined because removing “Cheese” from the first element of S1 will result in an identical sequence to that obtained by removing “Milk” from the last element of S2. Note that if S2 were a 5-candidate with 3 elements corresponding to S2 = ⟨{Bread, Butter}, {Cheese, Eggs}, {Milk}⟩, then a join can also be performed. This is because removing the last item from S2 creates a sequence with 2 elements that is identical to the sequence obtained by removing “Cheese” from the first element of S1.
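The join condition described above can be expressed directly in code. The sketch below, which assumes the same tuple-of-frozensets representation as earlier, tests whether two k-sequences are joinable by trying every item of the first element of one sequence against every item of the last element of the other; the helper names are hypothetical, and the construction of the resulting (k + 1)-candidate is omitted.

from typing import FrozenSet, Tuple

Element = FrozenSet[str]
Seq = Tuple[Element, ...]

def remove_item(seq: Seq, idx: int, item: str) -> Seq:
    """Delete `item` from element `idx`, dropping the element if it becomes empty."""
    reduced = seq[idx] - {item}
    if reduced:
        return seq[:idx] + (reduced,) + seq[idx + 1:]
    return seq[:idx] + seq[idx + 1:]

def can_join(s1: Seq, s2: Seq) -> bool:
    """GSP join condition: removing some item from the first element of s1
    yields the same sequence as removing some item from the last element of s2."""
    for x in s1[0]:
        left = remove_item(s1, 0, x)
        for y in s2[-1]:
            if left == remove_item(s2, len(s2) - 1, y):
                return True
    return False

# The examples from the text.
s1 = (frozenset({"Bread", "Butter", "Cheese"}), frozenset({"Cheese", "Eggs"}))
s2 = (frozenset({"Bread", "Butter"}), frozenset({"Milk", "Cheese", "Eggs"}))
s2_alt = (frozenset({"Bread", "Butter"}), frozenset({"Cheese", "Eggs"}), frozenset({"Milk"}))
print(can_join(s1, s2))      # True: drop "Cheese" from s1, drop "Milk" from s2
print(can_join(s1, s2_alt))  # True: dropping "Milk" removes s2_alt's last element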



