Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	339/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 335 336 337 338 339 340 341 342 ... 423

1-Data Mining tarjima

17.2. MATCHING AND DISTANCE COMPUTATION IN GRAPHS

565

Algorithm MCG(Graphs: G₁, G₂, Current Partially Matched Pairs: M,

Current Best Match: M_best)
begin

= Set of all label matching node pairs from (G₁, G₂) not in M; Prune C using heuristic methods; (Optional eﬃciency optimization) for each pair (i₁, i₂) ∈ C do

if M ∪ {(i₁, i₂)} is a valid matching

then M_best =MCG(G₁, G₂, M ∪ {(i₁, i₂)}, M_best);

endfor
if (|M| > |M_best|) then return(M) else return(M_best);

end

Figure 17.5: Maximum common subgraph (MCG) algorithm

As in the case of the subgraph isomorphism algorithm, the candidate matching node-pairs are explored recursively. The same steps of candidate extension and pruning are used in the MCG algorithm, as in the case of the subgraph isomorphism problem. However, some of the pruning steps used in the subgraph isomorphism algorithm, which are based on subgraph assumptions, can no longer be used. For example, in the MCG algorithm, a matching node-pair (i₁ , i₂) in M no longer needs to satisfy the constraint that the degree of a node in one graph is greater or less than that of its matching node in the other. Because of the more limited pruning in the maximum common subgraph problem, it will explore a larger search space. This is intuitively reasonable, because the maximum common subgraph problem is a more general one than subgraph isomorphism. However, some optimizations such as expanding only to connected nodes, and sequencing optimizations such as processing rare labels earlier, can still be used.

The largest common subgraph found so far is tracked in M_best. At the end of the procedure, the largest matching subgraph found so far is returned by the algorithm. It is also relatively easy to modify this algorithm to determine all possible MCGs. The main diﬀerence is that all the current MCGs can be dynamically tracked instead of tracking a single MCG.

17.2.3 Graph Matching Methods for Distance Computation

Graph matching methods are closely related to distance computation between graphs. This is because pairs of graphs that share large subgraphs in common are likely to be more similar. A second way to compute distances between graphs is by using the edit distance. The edit distance in graphs is analogous to the notion of the edit distance in strings. Both these methods will be discussed in this section.

17.2.3.1 MCG-based Distances

When two graphs share a large subgraph in common, it is indicative of similarity. There are several ways of transforming the MCG size into a distance value. Some of these distance definitions have also been demonstrated to be metrics because they are nonnegative, sym-metric, and satisfy the triangle inequality. Let the MCG of graphs G₁ and G₂ be denoted by M CS(G₁, G₂) with a size of |M CS(G₁, G₂)|. Let the sizes of the graphs G₁ and G₂

566 CHAPTER 17. MINING GRAPH DATA

be denoted by |G₁| and |G₂|, respectively. The various distance measures are defined as a function of these quantities.

Unnormalized non-matching measure: The unnormalized non-matching distance mea-sure U (G₁, G₂) between two graphs is defined as follows:

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 335 336 337 338 339 340 341 342 ... 423