Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə337/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   333   334   335   336   337   338   339   340   ...   423
1-Data Mining tarjima

17.2. MATCHING AND DISTANCE COMPUTATION IN GRAPHS

563

Algorithm SubgraphMatch(Query Graph: Gq, Data Graph: G,


Current Partially Matched Node Pairs: M)

begin
if (|M| = |Nq|) then return successful match M; (Case when |M| < |Nq |)



  • = Set of all label matching node pairs from (Gq, G) not in M;

Prune C using heuristic methods; (Optional efficiency optimization) for each pair (iq, i) ∈ C do


if M ∪ {(iq , i)} is a valid partial matching then SubgraphMatch(Gq , G, M ∪ {(iq, i)});


endfor

end

Figure 17.4: The basic template of Ullman’s algorithm

step, all possible label matching node-pairs between Gq and G, which are not already in M, are used to construct the candidate match set C.


Because the number of candidate match extensions can be large, it is often desirable to prune them heuristically by using specific properties of the data graph and query graph. Some examples of such heuristics will be presented later. After the pruned set C has been generated, node-pairs (iq , i) ∈ C are selected one by one, and it is checked whether they can be added to M to create a valid (partial) matching between the two graphs. For M ∪ {(iq , i)} to be a valid partial matching, if iq ∈ Nq is incident on any already matched node jq in Gq, then i must also be incident on the matched counterpart j of jq in G, and vice versa. If a valid partial matching exists, then the procedure is called recursively with the partial matching M ∪ {(iq , i)}. After iterating through all such candidate extensions with corresponding recursive calls, the algorithm backtracks to the next higher level of the recursion.


It is not difficult to see that the procedure has exponential complexity in terms of its input size, and it is especially sensitive to the query graph size. This high complexity is because the depth of the recursion can be of the order of the query graph size, and the number of recursive branches at each level is equal to the number of matching node-pairs. Clearly, unless the number of candidate extensions is carefully controlled with more effective candidate generation and pruning, the approach will be extremely slow.


17.2.1.1 Algorithm Variations and Refinements


Although the basic matching algorithm was originally proposed by Ullman, this template has been used extensively by different matching algorithms. The different algorithms vary from one another in terms of how the size of the candidate matched pairs is restricted with careful pruning. The use of carefully selected candidate sets has a significant impact on the efficiency of the algorithm. Most pruning methods rely on a number of natural constraints that are always satisfied by two graphs in a subgraph isomorphism relationship. Some common pruning rules are as follows:





  1. Ullman’s algorithm: This algorithm uses a simple pruning rule. All node-pairs (iq, i) are pruned from C in the pruning step if the degree of i is less than iq. This is because the degree of every matching node in the query subgraph needs to be no larger than the degree of its matching counterpart in the data graph.

564 CHAPTER 17. MINING GRAPH DATA






  1. Yüklə 17,13 Mb.

    Dostları ilə paylaş:
1   ...   333   334   335   336   337   338   339   340   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin