Using an SRN trained on phoneme prediction as a word-recognizing device shifts the focus from phoneme prediction to sequence classification. We wish to see whether it can classify sequences of phonemes into well-formed words on the one hand and ill-formed non-words on the other. To do this we need to translate the phoneme (prediction) values into sequence values. We do this by taking the sum of the phoneme error values for the sequence of phonemes in the string, normalized to correct for length effects. To translate this sum into a classification, we again need to determine an acceptability threshold, and we use a variant of the same empirical optimization described above. The threshold arrived at for this purpose is slightly lower than the optimal threshold from the previous algorithm. This means that the network accepts more phonemes, which is compensated for by the fact that a string is accepted only if all its phonemes are predicted. In string recognition it is better to increase the phoneme acceptance rate, because the chance of detecting a non-word is larger when more tokens are tested.
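The acceptance rule stated at the end of this paragraph (a string is accepted only if all its phonemes are predicted) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that for each position we already have the network's activation for the phoneme that actually occurs next, and that a phoneme counts as predicted when this activation is at or above the threshold. The function name `accepts_string` and the example activations are hypothetical. The length-normalized error sum is not reproduced here, because the error measure is not specified in this section.

```python
from typing import Sequence


def accepts_string(activations: Sequence[float], theta: float) -> bool:
    """Return True if every phoneme in the string is predicted.

    `activations` holds, for each position, the network's output for the
    phoneme that actually follows (assumed to be precomputed elsewhere).
    A phoneme counts as predicted when its activation is at or above
    `theta` (assumption: the text does not say whether the bound is strict).
    """
    return all(a >= theta for a in activations)


# Hypothetical activations for two strings, using the threshold 0.016
# reported for sequence acceptance.
word_like = [0.20, 0.05, 0.03]
non_word_like = [0.20, 0.01, 0.30]  # one phoneme falls below the threshold

print(accepts_string(word_like, 0.016))      # True
print(accepts_string(non_word_like, 0.016))  # False
```

Because a single rejected phoneme rejects the whole string, lowering the per-phoneme threshold slightly costs little on non-words while reducing false rejections of real words, which is the trade-off the paragraph describes.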
Figure 2. SRN error (in %) as a function of the threshold θ. The False Negative Error increases as the threshold increases because more and more admissible phonemes are incorrectly rejected. At the same time, the False Positive Error decreases because fewer unwanted successors are falsely accepted. The mean of those two errors is the network error, which reaches its minimum of 6.0% at threshold θ* = 0.0175. Notice that the optimal threshold is confined to a small range. This illustrates how critical the exact setting of the threshold is for good performance.
Since the performance measure here is the mean percentage of correctly recognized monosyllables and correctly rejected random strings, we incorporate both in seeking the optimal threshold. The negative data are as described above in 3.4. For the positive data, this approach allows us to test the generalization capacity of the model, so both the training subset L1M and the testing subset L2M are used here: the first for training the model and evaluating it during training, and the second for testing the generalization capacity of the trained network.
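The threshold search described here can be sketched as a simple sweep over candidate values, scoring each by the mean of the error on positive strings (words wrongly rejected) and the error on negative strings (non-words wrongly accepted). This is a hedged illustration: the candidate grid, the function names, and the toy activations are assumptions, and the string-acceptance rule is the "all phonemes predicted" rule assumed above, not a confirmed detail of the authors' procedure.

```python
from typing import Iterable, List, Sequence


def accepts_string(activations: Sequence[float], theta: float) -> bool:
    # Assumed rule: accept only if every phoneme activation reaches theta.
    return all(a >= theta for a in activations)


def mean_error(positive: List[Sequence[float]],
               negative: List[Sequence[float]],
               theta: float) -> float:
    """Mean of the false-negative rate on words and the
    false-positive rate on non-words at threshold theta."""
    fn = sum(not accepts_string(w, theta) for w in positive) / len(positive)
    fp = sum(accepts_string(s, theta) for s in negative) / len(negative)
    return (fn + fp) / 2


def best_threshold(positive: List[Sequence[float]],
                   negative: List[Sequence[float]],
                   candidates: Iterable[float]) -> float:
    """Pick the candidate threshold with the lowest mean error."""
    return min(candidates, key=lambda t: mean_error(positive, negative, t))


# Toy, hypothetical activations (not data from the study).
positive = [[0.20, 0.05], [0.30, 0.02]]
negative = [[0.20, 0.001], [0.012, 0.40]]
candidates = [0.005, 0.016, 0.03]

theta_star = best_threshold(positive, negative, candidates)
print(theta_star, mean_error(positive, negative, theta_star))
```

On this toy data the middle candidate separates the two sets completely, while the lower one lets a non-word through and the higher one rejects a word, mirroring the opposing trends of the two error curves.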
Once we determine the optimal sequence-acceptance threshold (0.016), we obtain 5% error on the positive training dataset L1M and the negative strings from RM, where the error varied by 0.5% depending on the random data set generated.
The model was tested further on the second group of negative data sets. As expected, strings that are less like Dutch resulted in smaller error. Performance on random strings from RN3+ is almost perfect. Conversely, strings close to real words (from R1N) resulted in larger error.
The generalization capabilities of the network were tested on the L2M positive data, unseen during training. The error on this test set was about 6%. An explanation of this increase in error is presented later, where the error is studied as its properties are varied.
Another interesting issue is how SRN performance compares to other known models, e.g. n-grams. The trained SRN clearly outperformed bigrams and trigrams, as shown by testing the trained SRNs on the non-words from the R2N and R3N sets, which yielded 19% and 35% error, respectively. This means that the SRN correctly rejected four out of five non-word strings composed of correct bigrams and two out of three non-word strings composed of correct trigrams. To clarify, note that bigram models would have 100% error on R2N, and trigram models 100% error on R3N.
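The 100% error of a bigram model on R2N follows from how such a model decides: it accepts any string all of whose bigrams are attested, so a non-word assembled from attested bigrams is necessarily accepted. The sketch below illustrates this with a toy lexicon. It is an assumption-laden illustration: letters stand in for phonemes, the three-word lexicon is hypothetical, and the use of `#` as a word-boundary marker is our choice, not something stated in the text.

```python
from typing import Iterable, List, Set


def bigrams(string: str) -> List[str]:
    """Return the bigrams of a string padded with boundary markers '#'."""
    padded = f"#{string}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train_bigrams(lexicon: Iterable[str]) -> Set[str]:
    """Collect every bigram attested in the lexicon."""
    attested: Set[str] = set()
    for word in lexicon:
        attested.update(bigrams(word))
    return attested


def accepts_bigram(string: str, attested: Set[str]) -> bool:
    """A bigram model accepts a string iff all its bigrams are attested."""
    return all(b in attested for b in bigrams(string))


# Toy lexicon (hypothetical; letters stand in for phonemes).
lexicon = ["kat", "tak", "pak"]
model = train_bigrams(lexicon)

# "tat" is not in the lexicon, but every bigram (#t, ta, at, t#) is
# attested, so the bigram model cannot reject it.
print(accepts_bigram("tat", model))  # True
# "ktp" contains the unattested bigram "kt" and is rejected.
print(accepts_bigram("ktp", model))  # False
```

The same argument applies one order higher to trigram models and R3N, whereas the SRN rejects a large share of such strings.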