Data Mining: The Textbook



Yüklə 17,13 Mb.
səhifə90/423
tarix07.01.2024
ölçüsü17,13 Mb.
#211690
1   ...   86   87   88   89   90   91   92   93   ...   423
1-Data Mining tarjima

Cosine(P, Q) =







x




















































|P| · |Q|










=










x




















































(2 · x + y − |Q|) · |Q|










x
J accard(P, Q) = x + y

These functions are increasing in x and decreasing in y. These properties are important because they allow bounds to be computed on the similarity function in terms of bounds on the arguments. In other words, if γ is an upper bound on the value of x and θ is a lower bound on the value of y, then it can be shown that f (γ, θ) is an upper (optimistic) bound on the value of the function f (x, y). This is useful for implementing a branch-and-bound method for similarity computation.


Let Q be the target itemset. Optimistic bounds on the match and hamming distance from Q to the itemsets within each super-coordinate are computed. These bounds can be shown to be a function of the target Q, the activation threshold, and the choice of signatures. The precise method for computing these bounds is described in the pointers found in the bibliographic notes. Let the optimistic bound on the match be xi and that on distance be yi for the ith super-coordinate. These are used to determine an optimistic bound on the similarity f (x, y) between the target and any itemset indexed by the ith super-coordinate. Because of the monotonicity property, this optimistic bound for the ith super-coordinate is Bi = f (xi, yi). The super-coordinates are sorted in decreasing (worsening) order of the optimistic bounds Bi. The similarity of Q to the itemsets that are pointed to by these super-coordinates is computed in this sorted order. The closest itemset found so far is dynamically maintained. The process terminates when the optimistic bound Bi to a super-coordinate is lower (worse) than the similarity value of the closest itemset found so far to the target. At this point, the closest itemset found so far is reported.


5.3.2 Pushing Constraints into Pattern Mining


The methods discussed so far in this chapter are designed for retrieval queries with item-specific constraints. In practice, however, the constraints may be much more general and cannot be easily addressed with any particular data structure. In such cases, the constraints may be need to be directly pushed into the mining process.




Yüklə 17,13 Mb.

Dostları ilə paylaş:
1   ...   86   87   88   89   90   91   92   93   ...   423




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©azkurs.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin