D3.3 Very basic grammar for I Revision 0.1
_________________________________________________________________________________
DeepThought IST-2000-30161 Page 6 (of 55)
Generally:
input is a string (terminated by formfeed)
output is a sequence of tokens with
• id: token id (optional; irrelevant, but could be returned later);
• from: beginning character position (zero-based);
• to: ending character position;
• form: actual surface form;
• path: identifier of path through lattice (optional; not used yet);
• format: surface string properties: InitialCapital, AllLower, UpperAndLower, AllUpper.
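As an illustration only, the token fields listed above could be represented as follows. This is a minimal sketch and not the project's implementation. The names Token, surface_format and tokenize are hypothetical; "from" and "to" are renamed to start and end because "from" is a reserved word in Python. The sketch also makes assumptions the text does not state: tokens are split on whitespace, "to" is treated as an exclusive end offset, forms without letters get no format value, and a single capital letter counts as InitialCapital.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def surface_format(form: str) -> Optional[str]:
    """Map a surface form to one of the four format values named in the text."""
    if not any(ch.isalpha() for ch in form):
        return None  # assumption: digits and punctuation carry no format value
    if form.islower():
        return "AllLower"
    if form[0].isupper() and (len(form) == 1 or form[1:].islower()):
        return "InitialCapital"  # assumption: a lone capital letter lands here
    if form.isupper():
        return "AllUpper"
    return "UpperAndLower"


@dataclass
class Token:
    start: int                    # "from": zero-based beginning character position
    end: int                      # "to": assumed exclusive end position
    form: str                     # actual surface form
    format: Optional[str]         # surface string property, if any
    id: Optional[int] = None      # optional token id
    path: Optional[int] = None    # optional lattice path id (not used yet)


def tokenize(text: str) -> List[Token]:
    """Split the input on whitespace, stopping at the terminating form feed."""
    text = text.split("\f", 1)[0]  # everything after the form feed is ignored
    return [
        Token(start=m.start(), end=m.end(), form=m.group(),
              format=surface_format(m.group()), id=i)
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]
```

For example, tokenize("Deep thought WORKS\f") yields three tokens whose start and end offsets index back into the input string, so text[tok.start:tok.end] equals tok.form under the exclusive-end assumption.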
each token has one or more morphological analyses with:
• stem