Title of the article

Date: 02.01.2022

3.4. Negative Data

We noted above that negative data is also necessary for evaluation. Since we are interested in models that discriminate the strings of L (the Dutch syllables) more precisely, the negative data for the following experiments is biased toward L.

Three groups of negative test sets were generated and used. The first is a set R_M containing random strings of the syllabic form C^{0..3} V C^{0..4}, based on the empirical observation that Dutch monosyllables have up to three onset (word-initial) consonants and up to four coda (word-final) consonants. The second group consists of three subsets of R_M, {R¹_M, R²_M, R_M^{3+}}, in which the random strings lie at a fixed edit distance of 1, 2, and 3 or more phonemes, respectively, from the closest existing Dutch word (edit distance as in Nerbonne, Heeringa & Kleiweg, 1999). Controlling for the distance to the training words allows us to assess the performance of the model more precisely. The third group consists of random strings built by concatenating n-grams picked randomly from Dutch monosyllables. In particular, two sets, R²_N and R³_N, were generated from bigrams and trigrams, respectively.
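The construction of R_M and its distance-controlled subsets can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the study: the phoneme inventories are toy placeholders, each character is assumed to stand for one phoneme, and the edit distance uses unit costs, which may differ from the measure in the cited work. The names `random_syllable`, `edit_distance`, and `split_by_distance` are hypothetical.

```python
import random

# Toy phoneme inventories (assumption: one character per phoneme).
# These are placeholders, not the Dutch inventory used in the study.
CONSONANTS = list("ptkbdfsxmnlr")
VOWELS = list("aeiou")


def random_syllable(rng, max_onset=3, max_coda=4):
    """Draw one random string of the form C^{0..3} V C^{0..4}."""
    onset = [rng.choice(CONSONANTS) for _ in range(rng.randint(0, max_onset))]
    coda = [rng.choice(CONSONANTS) for _ in range(rng.randint(0, max_coda))]
    return "".join(onset + [rng.choice(VOWELS)] + coda)


def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumed cost scheme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def split_by_distance(candidates, lexicon):
    """Bucket candidates by distance to the closest lexicon word.

    Key 1 and key 2 hold strings at distance 1 and 2; key 3 holds
    strings at distance 3 or more. Real words (distance 0) are dropped.
    """
    buckets = {1: [], 2: [], 3: []}
    for s in candidates:
        d = min(edit_distance(s, w) for w in lexicon)
        if d == 0:
            continue  # an existing word is not negative data
        buckets[min(d, 3)].append(s)
    return buckets
```

With a real lexicon of Dutch monosyllables, one would draw many strings with `random_syllable` and pass them through `split_by_distance` to obtain the three subsets.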

The latter sets are the most "difficult" ones, especially R³_N, because it consists of the strings that are closest to Dutch. They are also useful for comparing SRN methods to n-gram modeling: since the strings are assembled from n-grams attested in Dutch monosyllables, the n-gram model of the corresponding order will always wrongly recognize them as words of the language. Wherever the connectionist predictor recognizes them as non-words, it outperforms the corresponding n-gram model, and n-gram models are considered benchmark models for prediction tasks such as phonotactics learning.
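The argument about n-gram models can be made concrete with a small sketch. It rests on two assumptions that the text does not spell out: the random strings are built by chaining overlapping n-grams (so that every n-gram inside the result is attested), and the n-gram model is treated as an unsmoothed membership test over attested n-grams. The boundary symbol `#` and all function names are hypothetical.

```python
import random
from collections import defaultdict


def ngrams(word, n):
    """Return the n-grams of a word padded with the boundary symbol '#'."""
    padded = "#" * (n - 1) + word + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(lexicon, n):
    """Map each (n-1)-symbol context to the symbols attested after it."""
    table = defaultdict(list)
    for word in lexicon:
        for gram in ngrams(word, n):
            table[gram[:-1]].append(gram[-1])
    return table


def accepts(table, n, s):
    """Unsmoothed n-gram 'recognizer': accept iff every n-gram is attested."""
    return all(g[-1] in table.get(g[:-1], ()) for g in ngrams(s, n))


def sample(table, n, rng, max_len=8):
    """Chain overlapping attested n-grams until the end boundary is drawn."""
    context = "#" * (n - 1)
    out = []
    for _ in range(max_len + 1):
        nxt = rng.choice(table[context])
        if nxt == "#":
            return "".join(out)
        out.append(nxt)
        context = (context + nxt)[1:]
    return None  # string grew too long; caller should retry


def ngram_negatives(table, lexicon, n, count, rng, max_tries=10000):
    """Collect up to `count` sampled strings that are not lexicon words."""
    known, found = set(lexicon), []
    for _ in range(max_tries):
        if len(found) == count:
            break
        s = sample(table, n, rng)
        if s is not None and s not in known and s not in found:
            found.append(s)
    return found
```

Under these assumptions, every string returned by `ngram_negatives(table, lexicon, n, ...)` satisfies `accepts(table, n, s)` by construction, so an n-gram model of order n cannot reject it. Any such string that another predictor rejects is therefore a case where that predictor does better than the n-gram model.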
