Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	90/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 86 87 88 89 90 91 92 93 ... 423

1-Data Mining tarjima

Cosine(P, Q) =

|P| · |Q|

(2 · x + y − |Q|) · |Q|

x
J accard(P, Q) = _x ₊ _y

These functions are increasing in x and decreasing in y. These properties are important because they allow bounds to be computed on the similarity function in terms of bounds on the arguments. In other words, if γ is an upper bound on the value of x and θ is a lower bound on the value of y, then it can be shown that f (γ, θ) is an upper (optimistic) bound on the value of the function f (x, y). This is useful for implementing a branch-and-bound method for similarity computation.

Let Q be the target itemset. Optimistic bounds on the match and hamming distance from Q to the itemsets within each super-coordinate are computed. These bounds can be shown to be a function of the target Q, the activation threshold, and the choice of signatures. The precise method for computing these bounds is described in the pointers found in the bibliographic notes. Let the optimistic bound on the match be x_i and that on distance be y_i for the ith super-coordinate. These are used to determine an optimistic bound on the similarity f (x, y) between the target and any itemset indexed by the ith super-coordinate. Because of the monotonicity property, this optimistic bound for the ith super-coordinate is B_i = f (x_i, y_i). The super-coordinates are sorted in decreasing (worsening) order of the optimistic bounds B_i. The similarity of Q to the itemsets that are pointed to by these super-coordinates is computed in this sorted order. The closest itemset found so far is dynamically maintained. The process terminates when the optimistic bound B_i to a super-coordinate is lower (worse) than the similarity value of the closest itemset found so far to the target. At this point, the closest itemset found so far is reported.

5.3.2 Pushing Constraints into Pattern Mining

The methods discussed so far in this chapter are designed for retrieval queries with item-specific constraints. In practice, however, the constraints may be much more general and cannot be easily addressed with any particular data structure. In such cases, the constraints may be need to be directly pushed into the mining process.

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 86 87 88 89 90 91 92 93 ... 423