Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə339/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   335   336   337   338   339   340   341   342   ...   423
1-Data Mining tarjima

17.2. MATCHING AND DISTANCE COMPUTATION IN GRAPHS

565

Algorithm MCG(Graphs: G1, G2, Current Partially Matched Pairs: M,


Current Best Match: Mbest)
begin

  • = Set of all label matching node pairs from (G1, G2) not in M; Prune C using heuristic methods; (Optional efficiency optimization) for each pair (i1, i2) ∈ C do

if M ∪ {(i1, i2)} is a valid matching


then Mbest =MCG(G1, G2, M ∪ {(i1, i2)}, Mbest);


endfor
if (|M| > |Mbest|) then return(M) else return(Mbest);


end

Figure 17.5: Maximum common subgraph (MCG) algorithm


As in the case of the subgraph isomorphism algorithm, the candidate matching node-pairs are explored recursively. The same steps of candidate extension and pruning are used in the MCG algorithm, as in the case of the subgraph isomorphism problem. However, some of the pruning steps used in the subgraph isomorphism algorithm, which are based on subgraph assumptions, can no longer be used. For example, in the MCG algorithm, a matching node-pair (i1 , i2) in M no longer needs to satisfy the constraint that the degree of a node in one graph is greater or less than that of its matching node in the other. Because of the more limited pruning in the maximum common subgraph problem, it will explore a larger search space. This is intuitively reasonable, because the maximum common subgraph problem is a more general one than subgraph isomorphism. However, some optimizations such as expanding only to connected nodes, and sequencing optimizations such as processing rare labels earlier, can still be used.

The largest common subgraph found so far is tracked in Mbest. At the end of the procedure, the largest matching subgraph found so far is returned by the algorithm. It is also relatively easy to modify this algorithm to determine all possible MCGs. The main difference is that all the current MCGs can be dynamically tracked instead of tracking a single MCG.


17.2.3 Graph Matching Methods for Distance Computation

Graph matching methods are closely related to distance computation between graphs. This is because pairs of graphs that share large subgraphs in common are likely to be more similar. A second way to compute distances between graphs is by using the edit distance. The edit distance in graphs is analogous to the notion of the edit distance in strings. Both these methods will be discussed in this section.


17.2.3.1 MCG-based Distances


When two graphs share a large subgraph in common, it is indicative of similarity. There are several ways of transforming the MCG size into a distance value. Some of these distance definitions have also been demonstrated to be metrics because they are nonnegative, sym-metric, and satisfy the triangle inequality. Let the MCG of graphs G1 and G2 be denoted by M CS(G1, G2) with a size of |M CS(G1, G2)|. Let the sizes of the graphs G1 and G2


566 CHAPTER 17. MINING GRAPH DATA


be denoted by |G1| and |G2|, respectively. The various distance measures are defined as a function of these quantities.





  1. Unnormalized non-matching measure: The unnormalized non-matching distance mea-sure U (G1, G2) between two graphs is defined as follows:





Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   335   336   337   338   339   340   341   342   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin