[Figure: schematic of a search result of the form "... [placeholder] [pattern] [known instance] ...", with braces marking the query and the search result.]
Figure 3.1. Query, pattern, search result and placeholder.
identification phase requires 2|T| queries, as each pair in T is queried twice.
Each pattern evaluated in the second phase requires |V| queries to determine the
score. Hence, assuming that the m most frequently identified patterns are
evaluated, the total Google complexity of the algorithm is 2|T| + m|V|.
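The query count above can be checked with a small calculation. The following is only a sketch: the function name and the example sizes are hypothetical, and T and V are treated simply as sets whose sizes are known.

```python
def google_complexity(num_pairs: int, num_evaluated_patterns: int, num_values: int) -> int:
    """Number of search-engine queries, 2|T| + m|V|.

    num_pairs              -- |T|, the number of pairs queried in the identification phase
    num_evaluated_patterns -- m, the number of most frequent patterns that are evaluated
    num_values             -- |V|, the number of queries needed to score one pattern
    """
    identification_queries = 2 * num_pairs                      # each pair in T is queried twice
    evaluation_queries = num_evaluated_patterns * num_values    # |V| queries per evaluated pattern
    return identification_queries + evaluation_queries


# Hypothetical sizes, chosen only for illustration.
total = google_complexity(num_pairs=100, num_evaluated_patterns=20, num_values=50)
print(total)  # 2*100 + 20*50 = 1200
```

The count grows linearly in |T| for a fixed m, so the number of evaluated patterns is the main lever for limiting queries in the second phase.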
3.2 Identifying Instances
As discussed in the first chapters of this thesis, the ontology population task
differs from the general information extraction task in a number of aspects. These
differences also affect the choice of a strategy for identifying instances in
texts.
As we use the web as a corpus, we can assume that the instances are redundantly
available. For our task it is therefore not necessary to recognize each of the
encountered occurrences [McDowell & Cafarella, 2006]. In recognizing instances, the
focus should be on precision. If we extract erroneous instances and use them in
newly constructed queries, this will potentially lead to the extraction of more
erroneous instances. Hence, we opt for a strategy with high precision, while the
bootstrapping mechanisms should lead to a high recall for the ontology considered.
As we opt for a pattern-based approach, we know the context of the potential
instance (i.e. the queried expression) and its placeholder (either preceding or
following the search query). We define the maximal distance to the query in terms
of the number of words. Figure 3.1 and Tables 3.4 and 3.5 illustrate this task.
Within the search results in Table 3.4, the task is to identify professions like
scientist and chemist at a placeholder following the queried expression. In
Table 3.5 a challenge is to recognize Rachel Carson as a Person, in contrast to
Project Manager.
Now the problem is to identify instances at the placeholder.
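To make the setting concrete, the sketch below collects candidate word sequences at the placeholder, either preceding or following the queried expression, up to a maximal distance in words. It is an illustration under assumptions and not the identification method of this thesis: the function name, the simple word tokenization and the example snippet are hypothetical, and deciding which candidate is an actual instance is left open.

```python
import re


def candidate_spans(snippet: str, query: str, side: str = "following", max_words: int = 3) -> list:
    """Return candidate word sequences adjacent to the query in a snippet.

    side      -- "following" or "preceding": where the placeholder lies
    max_words -- maximal distance to the query, counted in words
    Candidates are ordered from shortest to longest and always touch the query.
    """
    position = snippet.lower().find(query.lower())
    if position < 0:
        return []  # the queried expression does not occur in this snippet

    if side == "following":
        context = snippet[position + len(query):]
        words = re.findall(r"[\w'-]+", context)[:max_words]
        return [" ".join(words[:n]) for n in range(1, len(words) + 1)]

    context = snippet[:position]
    words = re.findall(r"[\w'-]+", context)[-max_words:]
    return [" ".join(words[i:]) for i in range(len(words) - 1, -1, -1)]


# Made-up snippet, used only for illustration.
snippet = "Rachel Carson was born in Springdale"
print(candidate_spans(snippet, "was born in", side="preceding", max_words=2))
# ['Carson', 'Rachel Carson']
```

The sketch ignores sentence boundaries and punctuation, so a candidate may run across a sentence end. A precision-oriented strategy would still have to filter these candidates, for example to accept Rachel Carson but reject Project Manager as a Person.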
The Instance Identification Problem. Given is an initial ontology with relation r
on the classes c