2.2 Extracting Information from the Web using Patterns
Currently, both Google and Yahoo! allow only a limited number of automated queries per day. At the time of writing this thesis, Google allows only 1,000 queries a day, where each query returns at most 10 search results. Hence, if the maximum of 1,000 search results is available for a given query expression, we need to formulate 100 queries using the Google API to retrieve them all. Yahoo! is currently more generous, allowing 5,000 automated queries per day, where at most 100 search results are returned per query.
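The arithmetic behind these limits can be made explicit with a minimal sketch. The figures are the ones quoted above. The names `LIMITS`, `queries_needed` and `max_results_per_day` are hypothetical helpers introduced only for illustration and are not part of either search engine's API.

```python
import math

# Daily limits as quoted in the text (valid at the time of writing only).
LIMITS = {
    "Google": {"queries_per_day": 1000, "results_per_query": 10},
    "Yahoo!": {"queries_per_day": 5000, "results_per_query": 100},
}


def queries_needed(total_results, results_per_query):
    """Number of queries required to page through total_results hits."""
    return math.ceil(total_results / results_per_query)


def max_results_per_day(engine):
    """Upper bound on the search results retrievable in one day."""
    limit = LIMITS[engine]
    return limit["queries_per_day"] * limit["results_per_query"]


for engine, limit in LIMITS.items():
    n = queries_needed(1000, limit["results_per_query"])
    print(engine, n, max_results_per_day(engine))
```

Under these assumptions, retrieving all 1,000 results for one query expression takes 100 queries with Google but only 10 with Yahoo!, so a single expression can consume a tenth of Google's daily allowance.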
Hence, this restriction on search engine use requires us to analyze our approach not only in terms of time and space complexity, but also in terms of the order of the number of queries, which we term the Google complexity.
Definition [Google Complexity]. For a web information extraction algorithm using a search engine, we refer to the required number of queries as the Google complexity.
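One way to measure this quantity in practice is to count the queries an algorithm issues. The following is a rough sketch under stated assumptions: `CountingSearchEngine`, `fake_search` and `extract_instances` are hypothetical names, the search function is a stand-in that returns canned text rather than calling a real engine, and the toy extraction loop is not the method developed in this thesis.

```python
class CountingSearchEngine:
    """Wraps a search function and counts how many queries are issued."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self.query_count = 0

    def search(self, expression):
        self.query_count += 1
        return self._search_fn(expression)


def fake_search(expression):
    # Stand-in for a real search engine call (assumption for illustration).
    return [f"snippet for {expression}"]


def extract_instances(engine, patterns, terms):
    """Toy extraction: one query per (pattern, term) combination."""
    snippets = []
    for pattern in patterns:
        for term in terms:
            snippets.extend(engine.search(pattern.format(term)))
    return snippets


engine = CountingSearchEngine(fake_search)
extract_instances(engine, ["{} is a", "such as {}"], ["Amsterdam", "Paris", "Rome"])
print(engine.query_count)  # 2 patterns x 3 terms = 6 queries
```

In this toy setting the number of queries grows with the product of the number of patterns and the number of terms, which is the kind of dependence the Google complexity is meant to capture.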