Data Mining: The Textbook
In all the previous methods, a preprocess-once query-many paradigm is used; therefore, the querying process is limited by the initial minimum support chosen during the preprocessing phase. Although such an approach has the advantage of providing online capabilities for query responses, it is sometimes not effective when the constraints result in removal of most of the itemsets. In such cases, a much lower level of minimum support may be required than could be reasonably selected during the initial preprocessing phase. The advantage of pushing the constraints into the mining process is that the constraints can be used to prune many of the intermediate itemsets during the execution of the frequent pattern mining algorithms. This allows the use of much lower minimum support levels. The price for this flexibility is that the resulting algorithms can no longer be considered truly online algorithms when the data sets are very large.

Consider, for example, a scenario where the different items are tagged into different categories, such as snacks, dairy, baking products, and so on. It is desired to determine patterns in which all items belong to the same category. Clearly, this is a constraint on the discovery of the underlying patterns. Although it is possible to first mine all the patterns and then filter down to the relevant ones, this is not an efficient solution. If no more than 10^6 patterns are mined during the preprocessing phase and the constraint is so selective that it retains fewer than a 10^{-6} fraction of them, then the expected number of surviving patterns is below one, and the final set returned may be empty or too small.

Numerous methods have been developed in the literature to address such constraints directly in the mining process. These constraints are classified into different types, depending upon their impact on the mining algorithm. Some examples of well-known types of constraints include succinct, monotonic, antimonotonic, and convertible.
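As a rough illustration of pushing a constraint into the mining process, the sketch below applies the same-category constraint inside a simple level-wise (Apriori-style) miner. It relies on the observation that this particular constraint is antimonotonic: once an itemset mixes categories, every superset does too, so the candidate can be discarded before its support is counted. The function name `mine_same_category`, the `category` mapping, and the toy transactions are hypothetical and chosen only for this example; this is not one of the specific algorithms from the literature.

```python
from itertools import combinations

def mine_same_category(transactions, category, min_support):
    """Level-wise frequent itemset mining in which the same-category
    constraint is checked on each candidate before support counting."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        # Fraction of transactions that contain the itemset.
        return sum(itemset <= t for t in tsets) / n

    def same_category(itemset):
        # True when all items carry one and the same category tag.
        return len({category[i] for i in itemset}) == 1

    singles = {frozenset([i]) for t in tsets for i in t}
    level = {s: support(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Join step: unions of two frequent k-itemsets that have size k + 1.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        next_level = {}
        for c in candidates:
            if not same_category(c):
                continue  # constraint pruning: no support count needed
            if any(frozenset(sub) not in level for sub in combinations(c, k)):
                continue  # downward-closure pruning
            sup = support(c)
            if sup >= min_support:
                next_level[c] = sup
        level = next_level
        k += 1
    return frequent

# Hypothetical toy data: four baskets over two categories.
transactions = [
    {"chips", "salsa", "milk"},
    {"chips", "salsa", "cheese"},
    {"milk", "cheese", "chips"},
    {"milk", "cheese"},
]
category = {"chips": "snacks", "salsa": "snacks",
            "milk": "dairy", "cheese": "dairy"}

patterns = mine_same_category(transactions, category, min_support=0.5)
```

In this toy data the pair {chips, milk} reaches the support threshold but is never counted, because it mixes snacks and dairy. On real data, skipping such candidates early is what makes lower minimum support levels affordable.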
A detailed description of these methods is beyond the scope of this book. The bibliographic section contains pointers to many of these algorithms.

CHAPTER 5. ASSOCIATION PATTERN MINING: ADVANCED CONCEPTS

5.4 Putting Associations to Work: Applications

Association pattern mining has numerous applications in a wide variety of real scenarios. This section will discuss some of these applications briefly.

5.4.1 Relationship to Other Data Mining Problems

The association model is intimately related to other data mining problems such as classification, clustering, and outlier detection. Association patterns can be used to provide effective solutions to these data mining problems. This section will explore these relationships briefly. Many of the relevant algorithms are also discussed in the chapters on these different data mining problems.

5.4.1.1 Application to Classification

The association pattern mining problem is closely related to that of classification. In particular, rule-based classifiers are closely related to association rule mining. These types of classifiers are discussed in detail in Sect. 10.4 of Chap. 10, and a brief overview is provided here.

Consider the rule X ⇒ Y, where X is the antecedent and Y is the consequent. In associative classification, the consequent Y is a single item corresponding to the class variable, and the antecedent contains the feature variables. These rules are mined from the training data. Typically, the rules are not determined with the traditional support and confidence measures. Rather, the most discriminative rules with respect to the different classes need to be determined. For example, consider an itemset X and two classes c1 and c2. Intuitively, the itemset X is discriminative between the two classes if the absolute difference in the confidence of the rules X ⇒ c1 and X ⇒ c2 is as large as possible. Therefore, the mining process should determine such discriminative rules.
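The discriminative criterion described above can be sketched in a few lines. The snippet computes the confidence of X ⇒ c as the fraction of training rows containing X that carry label c, and then takes the absolute difference between the two classes. The helper names `confidence` and `discriminative_gap` and the toy rows are assumptions made for illustration; this is only the scoring measure, not the CBA procedure or any other complete classifier.

```python
def confidence(rows, itemset, label):
    """Confidence of the rule itemset => label over labeled rows.

    rows is a list of (items, class_label) pairs; returns 0.0 when
    no row contains the itemset.
    """
    covered = [lab for items, lab in rows if itemset <= items]
    if not covered:
        return 0.0
    return sum(lab == label for lab in covered) / len(covered)

def discriminative_gap(rows, itemset, c1, c2):
    """Absolute difference in confidence of itemset => c1 and itemset => c2."""
    return abs(confidence(rows, itemset, c1) - confidence(rows, itemset, c2))

# Hypothetical training data: (feature items, class label).
rows = [
    ({"a", "b"}, "c1"),
    ({"a", "b"}, "c1"),
    ({"a", "c"}, "c1"),
    ({"a", "c"}, "c2"),
    ({"b", "c"}, "c2"),
    ({"c"}, "c2"),
]

gap_ab = discriminative_gap(rows, frozenset({"a", "b"}), "c1", "c2")
gap_a = discriminative_gap(rows, frozenset({"a"}), "c1", "c2")
```

Here {a, b} appears only in rows of class c1, so its gap is the maximum possible, whereas {a} alone is spread over both classes and scores lower. A miner driven by this score would prefer the former as a rule antecedent.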
Interestingly, it has been discovered that even a relatively straightforward modification of the association framework to the classification problem is quite effective. An example of such a classifier is the CBA framework for Classification Based on Associations. More details on rule-based classifiers are discussed in Sect. 10.4 of Chap. 10.

5.4.1.2 Application to Clustering

Because association patterns determine highly correlated subsets of attributes, they can be applied to quantitative data after discretization to determine dense regions in the data. The CLIQUE algorithm, discussed in Sect. 7.4.1 of Chap. 7, uses discretization to transform quantitative data into binary attributes. Association patterns are discovered on the transformed data. The data points that overlap with the regions defined by these patterns are reported as subspace clusters. This approach, of course, reports clusters that are highly overlapping with one another. Nevertheless, the resulting groups correspond to the dense regions in the data, which provide significant insights about the underlying clusters.

5.4.1.3 Applications to Outlier Detection

Association pattern mining has also been used to determine outliers in market basket data. The key idea here is that the outliers are defined as transactions that are not "covered" by most of the association patterns in the data. A transaction is said to be covered by an association pattern when the corresponding association pattern is contained in the transaction. This approach is particularly useful in scenarios where the data is high dimensional and traditional distance-based algorithms cannot be easily used. Because transaction data is inherently high dimensional, such an approach is particularly effective. This approach is discussed in detail in Sect. 9.2.3 of Chap. 9.

5.4.2 Market Basket Analysis

The prototypical problem for which the association rule mining problem was first proposed is that of market basket analysis.
In this problem, it is desired to determine rules relating to the buying behavior of customers. The knowledge of such rules can be very useful for a retailer. For example, if an association rule reveals that the sale of beer implies a sale of diapers, then a merchant may use this information to optimize his or her shelf placement and promotion decisions. In particular, rules that are interesting or unexpected are the most informative for market basket analysis. Many of the traditional and alternative models for market basket analysis are focused on such decisions.

5.4.3 Demographic and Profile Analysis

A closely related problem is that of using demographic profiles to make recommendations. An example is the rule discussed in Sect. 4.6.3 of Chap. 4.