213
The SHA algorithms
HyperLogLog
Along the same lines is another algorithm called HyperLogLog.
Suppose Google wants to count the number of
unique
searches
performed by its users. Or suppose Amazon wants to count the number
of unique items that users looked at today. Answering these questions
takes a lot of space! With Google, you’d have to keep a log of all the
unique searches. When a user searches for something, you have to see
whether it’s already in the log. If not, you have to add it to the log. Even
for a single day, this log would be massive!
HyperLogLog approximates the number of unique elements in a set.
Just like bloom filters, it won’t give you an exact answer, but it comes
very close and uses only a fraction of the memory a task like this would
otherwise take.
If you have a lot of data and are satisfied with approximate answers,
check out probabilistic algorithms!
Dostları ilə paylaş: