A
|
C
|
|
A
|
C
|
|
A
|
|
|
A
|
|
|
B
|
|
|
B
|
C
|
|
+
|
|
JOIN
|
POSSIBILITY 1
|
|
|
|
|
|
|
|
|
|
|
|
A
|
|
|
A
|
C
|
|
A
|
|
|
A
|
|
|
B
|
C
|
|
B
|
C
|
|
|
|
|
POSSIBILITY 2
|
|
|
Figure 17.12: Candidates generated using node-based join of two graphs
578 CHAPTER 17. MINING GRAPH DATA
17.4.1 Node-Based Join Growth
In the case of node-based joins, the size of a frequent subgraph in Fk refers to the number of nodes k in it. The singleton graphs in F1 contain a single node. These are node labels that are present in at least minsup graphs in the graph database G. For two graphs from Fk to be joined, a matching subgraph with (k − 1) nodes must exist between the two graphs. This matching subgraph is also referred to as the core. When two k-subgraphs with (k − 1) common nodes are joined to create a candidate with (k + 1) nodes, an ambiguity exists, as to whether or not an edge exists between the two non-matching nodes. Therefore, two possible graphs are generated, depending on whether or not an edge exists between the nodes that are not common between the two. An example of the two possibilities for generating candidate subgraphs is illustrated in Fig. 17.12. While this chapter does not assume that edge labels are associated with graphs, the number of possible joins will be even larger when labels are associated with edges. This is because each possible edge label must be associated with the newly created edge. This will result in a larger number of candidates. Furthermore, in cases where there are isomorphic matchings of size (k − 1) between the two frequent subgraphs, candidates may need to be generated for each such mapping (see Exercise 8). Thus, all possible (k − 1) common subgraphs need to be discovered between a pair of graphs, in order to generate the candidates. Thus, the explosion in the number of candidate patterns is usually more significant in the case of frequent subgraph discovery, than in the case of frequent pattern discovery.
17.4.2 Edge-Based Join Growth
In the case of edge-based joins, the size of a frequent subgraph in Fk refers to the number of edges k in it. The singleton graphs in F1 contain a single edge. These correspond to edges between specific node labels that are present in at least minsup graphs in the database G. In order for two graphs from Fk to be joined, a matching subgraph with (k − 1) edges needs to be present in the two graphs. The resulting candidate will contain exactly ( k + 1) edges. Interestingly, the number of nodes in the candidate may not necessarily be greater than that in the individual subgraphs that are joined. In Fig. 17.13, the two possible candidates that are constructed using edge-based joins are illustrated. Note that one of the generated candidates has the same number of nodes as the original pair of graphs. As in the case of node-based joins, one needs to account for isomorphism in the process of candidate gener-ation. Edge-based join growth tends to generate fewer candidates in total and is therefore generally more efficient. The bibliographic notes contain pointers to more details about these methods.
17.4.3 Frequent Pattern Mining to Graph Pattern Mining
The similarity between the aforementioned approach and Apriori is quite striking. The join-based growth strategy can also be generalized to an enumeration tree -like strategy. However, the analogous candidate tree can be generated in two different ways, corresponding to node-and edge-based extensions, respectively. Furthermore, tree growth is more complex because of isomorphism. GraphApriori uses a breadth-first candidate-tree generation approach as in all Apriori-like methods. It is also possible to use other strategies, such as depth-first methods, to grow the tree of candidates. As discussed in Chap. 4, almost all frequent pattern mining algorithms, including1 Apriori and FP-growth, should be considered enumeration-
See the discussion in Sect. 4.4.4.5 of Chap. 4.
17.5. GRAPH CLUSTERING
|
|
|
|
579
|
|
|
Dostları ilə paylaş: |