Class    Regular Expression                       Examples
Year     (1|2) · (0-9)^3                          1992, 2345
Gender   he | she | son | ..                      male, female
Person   ((A-Z) · (a-z)+)^2                       Johnny Cash, George Baker
Person   (A-Z) · (a-z)+ (A-Z). (A-Z) · (a-z)+     George W. Bush, Anton F. Philips
Table 3.8. Classes and possible recognition rules
Widmer, 2007]. Notably, such a knowledge-driven approach to recognizing
instances is class-dependent. For example, instances of the class Movie are
recognized differently from instances of the class Year.
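One way to picture this class-dependence is a lookup from class name to pattern. The sketch below is a minimal, hypothetical Python transcription of the Year and Person rows of Table 3.8; the names `PATTERNS` and `find_candidates` are illustrative and do not come from the text. It assumes that words in a name are separated by a single space and adds word boundaries, neither of which the table states explicitly. The Gender row, which maps cue words to instances rather than matching the instance itself, is left out of this sketch.

```python
import re

# Hypothetical transcription of Table 3.8 into Python regex syntax.
# The table's concatenation dot is implicit here, and a superscript
# such as (0-9)^3 becomes the quantifier {3}.
PATTERNS = {
    # (1|2) followed by three digits
    "Year": re.compile(r"\b[12][0-9]{3}\b"),
    # two capitalized words (assumed to be separated by one space)
    "Person": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
    # capitalized word, middle initial with a period, capitalized word
    "PersonWithInitial": re.compile(r"\b[A-Z][a-z]+ [A-Z]\. [A-Z][a-z]+\b"),
}


def find_candidates(class_name, text):
    """Return all substrings of text that match the rule for class_name."""
    return [m.group(0) for m in PATTERNS[class_name].finditer(text)]
```

Because each class has its own entry, adding a class means adding a rule; nothing in the lookup generalizes from one class to another.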
When designing rules to recognize instances at the placeholders in the search
results, we focus on the structure of the instances and their context [De Meulder &
Daelemans, 2003].
- Context. The left and right context for a term can be expressed as regular
expressions. For example, a term in an enumeration may have a comma as
its left context and the word "and" as its right context.
- Structure. Rules describing the structure focus on the number of words and
the use of capitals and punctuation marks. For example, a person’s name can
be recognized as two or three capitalized words.
The rules describing the structure of instances can be expressed as a regular
expression.
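The two kinds of rule can be combined into a single pattern. The following is a minimal sketch, assuming Python's `re` module and the enumeration example above: a comma as left context, the word "and" as right context, and two or three capitalized words as structure. The names `STRUCTURE`, `LEFT_CONTEXT`, `RIGHT_CONTEXT` and `extract_from_enumeration` are hypothetical.

```python
import re

# Structure rule: two or three capitalized words (a person's name).
STRUCTURE = r"[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}"
# Context rules for a term inside an enumeration.
LEFT_CONTEXT = r",\s*"      # a comma to the left of the term
RIGHT_CONTEXT = r"\s+and\b"  # the word "and" to the right of the term

ENUMERATION_RULE = re.compile(
    LEFT_CONTEXT + "(" + STRUCTURE + ")" + RIGHT_CONTEXT
)


def extract_from_enumeration(text):
    """Return terms that satisfy both the structure and the context rule."""
    return ENUMERATION_RULE.findall(text)
```

For the text "singers such as Johnny Cash, George Baker and Elvis Presley", this rule picks out only "George Baker", since it is the only name with a comma on its left and "and" on its right. Other positions in an enumeration would need their own context rules.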
We formulate regular expressions and a maximum distance from the queried
expression to identify instances in texts. Table 3.8 gives example regular
expressions to recognize the structure of instances. Instances of the class
Year are, for example, specified as a four-digit term preceded by the name of a
month. Instances of the class Gender are recognized indirectly: the text is,
for example, scanned for the word "son", which corresponds to the instance male.
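As a rough illustration of these three ideas (a context requirement, a maximum distance and indirect recognition), consider the sketch below. It rests on several assumptions that the text does not fix: distance is measured in characters, only the first occurrence of the queried expression is used, and the cue words "he" and "she" map to male and female (the text confirms only "son" for male). The function names `years_near` and `infer_gender` are hypothetical.

```python
import re

MONTHS = "|".join([
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
])
# A four-digit year preceded by the name of a month.
YEAR_AFTER_MONTH = re.compile(rf"\b(?:{MONTHS})\s+([12][0-9]{{3}})\b")

# Cue word -> instance. Only "son" -> "male" is taken from the text;
# the other two entries are assumptions made for this sketch.
GENDER_CUES = {"son": "male", "he": "male", "she": "female"}


def years_near(text, query, max_distance=40):
    """Return years found within max_distance characters of query."""
    pos = text.find(query)
    if pos < 0:
        return []
    end = pos + len(query)
    hits = []
    for m in YEAR_AFTER_MONTH.finditer(text):
        # Gap between the match and the queried expression, on either side.
        gap = m.start() - end if m.start() >= end else pos - m.end()
        if gap <= max_distance:
            hits.append(m.group(1))
    return hits


def infer_gender(text):
    """Return the Gender instance implied by the first cue word, if any."""
    m = re.search(r"\b(he|she|son)\b", text, re.IGNORECASE)
    return GENDER_CUES[m.group(1).lower()] if m else None
```

With the query "Johnny Cash was born in", a match such as "February 1932" directly after the query is accepted, while a month and year mentioned much later in the same text fall outside the default distance and are ignored.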
The algorithm to identify instances using such a rule-based approach is
sketched in Table 3.9. We first scan the text for an occurrence of the instance
(described by M) encapsulated by a left context (c