D3.3 Very basic grammar for I Revision 0.1
_________________________________________________________________________________
DeepThought IST-2000-30161 Page 5 (of 55)
0. INTRODUCTION
The version of the Italian grammar referred to in this report is v. 0.3, updated as of 31/07/2003.
Version 0.3 of the Italian grammar uses the most recent version of the Matrix, 0.4.
Types and rules quoted in this report have been partially modified to highlight only the elements relevant in context; non-relevant elements are simply omitted.
0.1. SPPP
A short preliminary remark is needed about the integration of the LKB system with a shallow pre-processor. Since the first week of March, a protocol developed by Stephan Oepen (SPPP, the “Simple Pre-Processing Protocol”) has interfaced the LKB system with the pre-processing modules (tokenizer, morphological analyzer and POS tagger) provided by Celi - Sophia2.1:
• the Italian grammar points to an external executable that communicates with the LKB through standard input and output; as part of grammar loading, that process is created and connected to a Lisp stream.
• preprocess-sentence-string() is extended to use the external engine if *sppp-stream* is non-nil.
• all communication is in XML: the LKB sends a string (one sentence at a time, for now) as an XML document and reads the preprocessor output as another XML document from the stream. Using XML in both directions has, if nothing else, the advantage of declaring which encoding is used (the default is UTF-8).
• .user-input. in parse() is now a list of morphologically analyzed and POS-tagged tokens; modifying add-morphs-to-morphs() to make this work was the only change in the parser proper.
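The exchange described in the points above can be sketched roughly as follows, in Python rather than in the Lisp of the LKB. This is only an illustration under stated assumptions: the element and attribute names (text, segment, token, analysis, form, pos, stem), the POS tags, and the helper names build_request() and parse_response() are all hypothetical and are not taken from the SPPP definition or from the Sophia output format. The sketch shows one sentence being wrapped in an XML document that declares its encoding, and an XML reply being turned into a list of analyzed, POS-tagged tokens.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str                                   # surface form in the sentence
    pos: str                                    # tag assigned by the POS tagger
    stems: list = field(default_factory=list)   # stems from the morphological analyzer


def build_request(sentence: str) -> bytes:
    """Wrap one sentence in an XML document that declares its encoding."""
    root = ET.Element("text")                   # hypothetical element name
    root.text = sentence
    # xml_declaration=True (Python 3.8+) makes the encoding explicit.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_response(document: bytes) -> list:
    """Turn an XML reply into a list of analyzed, POS-tagged tokens."""
    root = ET.fromstring(document)
    tokens = []
    for node in root.iter("token"):             # hypothetical element name
        stems = [a.get("stem") for a in node.iter("analysis")]
        tokens.append(Token(node.get("form"), node.get("pos"), stems))
    return tokens


# Hand-written reply standing in for the external engine's output;
# its layout and tag names are invented for this sketch.
reply = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<segment>'
    b'<token form="il" pos="ART"><analysis stem="il"/></token>'
    b'<token form="letto" pos="NOUN"><analysis stem="letto"/></token>'
    b'<token form="cigola" pos="VERB"><analysis stem="cigolare"/></token>'
    b'</segment>'
)

request = build_request("il letto cigola")
tokens = parse_response(reply)
```

In the setup described above, the request would be written to the standard input of the external executable and the reply read back from its standard output; the sketch leaves that transport out and keeps only the two XML documents.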
An example of the Sophia-LKB interface is the following (for the sentence “il letto cigola”, “the bed is squeaking”):