Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə64/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   60   61   62   63   64   65   66   67   ...   423
1-Data Mining tarjima

{Bread, M ilk}, {Eggs, M ilk}, {Cheese, M ilk}, {Eggs, Y ogurt}, and {M ilk, Y ogurt}. The only 3-itemset reported at a support level of 0.3 is {Eggs, M ilk, Y ogurt}. On the other hand, if the minimum support level is set to 0.2, it corresponds to an absolute support value of only 1. In such a case, every subset of every transaction will be reported. Therefore, the use of lower minimum support levels yields a larger number of frequent patterns. On the other hand, if the support level is too high, then no frequent patterns will be found. Therefore, an appropriate choice of the support level is crucial for discovering a set of frequent patterns with meaningful size.

When an itemset I is contained in a transaction, all its subsets will also be contained in the transaction. Therefore, the support of any subset J of I will always be at least equal to that of I. This property is referred to as the support monotonicity property.


Property 4.2.1 (Support Monotonicity Property) The support of every subset J of





  • is at least equal to that of the support of itemset I.




sup(J) ≥ sup(I) ∀J ⊆ I

(4.1)

The monotonicity property of support implies that every subset of a frequent itemset will also be frequent. This is referred to as the downward closure property.


Property 4.2.2 (Downward Closure Property) Every subset of a frequent itemset is also frequent.


The downward closure property of frequent patterns is algorithmically very convenient because it provides an important constraint on the inherent structure of frequent patterns. This constraint is often leveraged by frequent pattern mining algorithms to prune the search process and achieve greater efficiency. Furthermore, the downward closure property can be used to create concise representations of frequent patterns, wherein only the maximal frequent subsets are retained.


Definition 4.2.4 (Maximal Frequent Itemsets) A frequent itemset is maximal at a given minimum support level minsup, if it is frequent, and no superset of it is frequent.

In the example of Table 4.1, the itemset {Eggs, M ilk, Y ogurt} is a maximal frequent item-set at a minimum support level of 0.3. However, the itemset {Eggs, M ilk} is not maxi-mal because it has a superset that is also frequent. Furthermore, the set of maximal fre-quent patterns at a minimum support level of 0.3 is {Bread, M ilk}, {Cheese, M ilk}, and {Eggs, M ilk, Y ogurt}. Thus, there are only 3 maximal frequent itemsets, whereas the num-ber of frequent itemsets in the entire transaction database is 11. All frequent itemsets can be derived from the maximal patterns by enumerating the subsets of the maximal frequent



4.3. ASSOCIATION RULE GENERATION FRAMEWORK

97

Figure 4.1: The itemset lattice


patterns. Therefore, the maximal patterns can be considered condensed representations of the frequent patterns. However, this condensed representation does not retain information about the support values of the subsets. For example, the support of {Eggs, M ilk, Y ogurt} is 0.4, but it does not provide any information about the support of {Eggs, M ilk}, which is 0.6. A different condensed representation, referred to as closed frequent itemsets , is able to retain support information as well. The notion of closed frequent itemsets will be studied in detail in Chap. 5.

An interesting property of itemsets is that they can be conceptually arranged in the form of a lattice of itemsets. This lattice contains one node for each of the 2|U| sets drawn from the universe of items U . An edge exists between a pair of nodes, if the corresponding sets differ by exactly one item. An example of an itemset lattice of size 25 = 32 on a universe of 5 items is illustrated in Fig. 4.1. The lattice represents the search space of frequent patterns. All frequent pattern mining algorithms, implicitly or explicitly, traverse this search space to determine the frequent patterns.


The lattice is separated into frequent and infrequent itemsets by a border, which is illus-trated by a dashed line in Fig. 4.1. All itemsets above this border are frequent, whereas those below the border are infrequent. Note that all maximal frequent itemsets are adjacent to this border of itemsets. Furthermore, any valid border representing a true division between frequent and infrequent itemsets will always respect the downward closure property.


4.3 Association Rule Generation Framework


Frequent itemsets can be used to generate association rules, with the use of a measure known as the confidence. The confidence of a rule X ⇒ Y is the conditional probability that a transaction contains the set of items Y , given that it contains the set X. This probability is estimated by dividing the support of itemset X ∪ Y with that of itemset X.


Definition 4.3.1 (Confidence) Let X and Y be two sets of items. The confidence conf (X ∪ Y ) of the rule X ∪ Y is the conditional probability of X ∪ Y occurring in a


98 CHAPTER 4. ASSOCIATION PATTERN MINING





Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   60   61   62   63   64   65   66   67   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin