Data Mining: The Textbook

Yüklə 17,13 Mb.

səhifə	247/423
tarix	07.01.2024
ölçüsü	17,13 Mb.
	#211690

1 ... 243 244 245 246 247 248 249 250 ... 423

1-Data Mining tarjima

E(y)	≤	G(y) +	n_f · e	.	(12.23)

		m

Here, e represents the base of the natural logarithm.

12.2. SYNOPSIS DATA STRUCTURES FOR STREAMS

405

Proof: The expected number of spurious items hashed to the cells belonging to item y is about3 n_f /m, if all spurious items are hashed uniformly at random to the diﬀerent cells. This result uses pairwise independence of hash functions because it relies on the fact that the mapping of y to a cell does not aﬀect the distribution of another spurious item in its cells. The probability of the number of spurious items exceeding n_f · e/m in any of the w cells belonging to y is given by at most e−1 by the Markov inequality. For E(y) to exceed the upper bound of Eq. 12.23, this violation needs to be repeated for all the w cells to which y is mapped. The probability of a violation of Eq. 12.23 is therefore e−w. The result follows.

In many cases, it is desirable to directly control the error level and the error probability

δ. By setting m = e/ and w = ln(1/δ), it is possible to bound the error with a user-defined tolerance n_f · and probability at least 1 − δ. Two natural generalizations of the point query can be implemented as follows:

If the stream elements have arbitrary positive frequencies associated with them, the only change required is to the update operation, where the counts are incremented by the relevant frequency. The frequency bound is identical to Eq. 12.23, with n_f representing the sum of the frequencies of the stream items.

If the stream elements have either positive or negative frequencies associated with them, then a further change is required to the query procedure. In this case, the median of the counts is reported. The corresponding error bound of Eq. 12.23 now needs to be modified. With a probability of at least 1−e−w/4, the estimated frequency E(y) of item y lies in the following ranges:

G(y)

−

3n_f

· e

≤

E(y)

≤

Yüklə 17,13 Mb.

Dostları ilə paylaş:

1 ... 243 244 245 246 247 248 249 250 ... 423