CHAPTER 5. ASSOCIATION PATTERN MINING: ADVANCED CONCEPTS
5.4.7 Other Applications for Complex Data Types
Frequent pattern mining algorithms have been generalized to more complex data types such as temporal data, spatial data, and graph data. This book contains different chapters for these complex data types. A brief discussion of these more complex applications is provided here:
Temporal Web log analytics: The use of temporal information from Web logs greatly enriches the analysis process. For example, certain patterns of accesses may occur frequently in the logs and these can be used to build event prediction models in cases where future events may be predicted from the current pattern of events.
Spatial co-location patterns: Spatial co-location patterns provide useful insights about the spatial correlations among different individuals. Frequent pattern mining algo-rithms have been generalized to such scenarios. Refer to Chap. 16.
Chemical and biological graph applications: In many real scenarios, such as chemical and biological compounds, the determination of structural patterns provides insights about the properties of these molecules. Such patterns are also used to create classi-fication models. These methods are discussed in Chap. 17.
Software bug analysis: The structure of computer programs can often be represented as call graphs. The analysis of the frequent patterns in the call graphs and key deviations from these patterns provides insights about the bugs in the underlying software.
Many of the aforementioned applications will be discussed in later chapters of this book.
5.5 Summary
In order to use frequent patterns effectively in data-driven applications, it is crucial to create concise summaries of the underlying patterns. This is because the number of returned patterns may be very large and difficult to interpret. Numerous methods have been designed to create a compressed summary of the frequent patterns. Maximal patterns provide a concise summary but are lossy in terms of the support of the underlying patterns. They can often be determined effectively by incorporating different kinds of pruning strategies in frequent pattern mining algorithms.
Closed patterns provide a lossless description of the underlying frequent itemsets. On the other hand, the compression obtained from closed patterns is not quite as significant as that obtained from the use of maximal patterns. The concept of “almost closed” itemsets allows good compression, but there is some degree of information loss in the process. A different way of compressing itemsets is to cluster itemsets so that all itemsets can be expressed within a prespecified distance of particular representatives.
Query processing of itemsets is important in the context of many applications. For example, the itemset lattice can be used to resolve simple queries on itemsets. In some cases, the lattice may not fit in main memory. For these cases, it may be desirable to use disk resident data structures such as the inverted index or the signature table. In cases where the constraints are arbitrary or have a high level of selectivity, it may be desirable to push the constraints directly into the mining process.
Frequent pattern mining has many applications, including its use as a subroutine for other data mining problems. Other applications include market basket analysis, profile
5.6. BIBLIOGRAPHIC NOTES
|
151
|
analysis, recommendations, Web log analysis, spatial data, and chemical data. Many of these applications are discussed in later chapters of this book.
5.6 Bibliographic Notes
The first algorithm for maximal pattern mining was proposed in [82]. Subsequently, the DepthProject [4] and GenMax [233] algorithms were also designed for maximal pattern min-ing. DepthProject showed that the depth-first method has several advantages for determining maximal patterns. Vertical bitmaps were used in MAFIA [123] to compress the sizes of the underlying tid lists. The problem of closed pattern mining was first proposed in [417] in which an Apriori-based algorithm, known as A-Close, was presented. Subsequently, numer-ous algorithms such as CLOSET [420], CLOSET+ [504], and CHARM [539] were proposed for closed frequent pattern mining. The last of these algorithms uses the vertical data for-mat to mine long patterns in a more efficient way. For the case of very high-dimensional data sets, closed pattern mining algorithms were proposed in the form of CARPENTER and COBBLER, respectively [413, 415]. Another method, known as pattern-fusion [553], fuses the different pattern segments together to create a long pattern.
The work in [125] shows how to use deduction rules to construct a minimal representa-tion for all frequent itemsets. An excellent survey on condensed representations of frequent itemsets may be found in [126]. Numerous methods have subsequently been proposed to approximate closures in the form of δ -freesets [107]. Information-theoretic methods for item-set compression have been discussed in [470].
The use of clustering-based methods for compression focuses on the itemsets rather than the transactions. The work in [515] clusters the patterns on the basis of their similarity and frequency to create a condensed representation of the patterns. The submodularity property used in the greedy algorithm for finding the best set of covering itemsets is discussed in [403].
The algorithm for using the itemset lattice for interactive rule exploration is discussed in [37]. The concepts of simple redundancy and strict redundancy are also discussed in this work. This method was also generalized to the case of profile association rules [38]. The inverted index, presented in this chapter, may be found in [441]. A discussion of a market basket-specific implementation, together with the signature table, may be found in [41]. A compact disk structure for storing and querying frequent itemsets has been studied in [359].
A variety of constraint -based methods have been developed for pattern mining. Succinct constraints are the easiest to address because they can be pushed directly into data selection. Monotonic constraints need to be checked only once to restrict pattern growth [406, 332], whereas antimonotonic constraints need to be pushed deep into the pattern mining process. Another form of pattern mining, known as convertible constraints [422], can be addressed by sorting items in ascending or descending order for restraining pattern growth.
The CLIQUE algorithm [58 ] shows how association pattern mining methods may be used for clustering algorithms. The CBA algorithm for rule-based classification is dis-cussed in [358]. A survey on rule-based classification methods may be found in [115]. The frequent pattern mining problem has also been used for outlier detection in very long transactions [ 263]. Frequent pattern mining has also been used in the field of bioinformat-ics [413, 415]. The determination of localized associations [27] is very useful for the problem of recommendations and collaborative filtering. Methods for mining long frequent patterns in the context of bioinformatics applications may be found in [413, 415, 553]. Association rules can also be used to discover spatial co-location patterns [388]. A detailed discussion
Dostları ilə paylaş: |