2.1. Learning Phonotactics with SRNs
Dell, Juliano & Govindjee (1993) showed that words could be described not only with symbolic approaches, using word structure and content, but also by a connectionist approach. In this early study of learning word structure with neural networks (NNs), the authors trained SRNs to predict the phoneme that follows the current input phoneme, given context information. The data sets contained 100–500 English words. An important contribution of their paper is the analysis and modeling of a number of speech-error phenomena, which were taken as strong support for parallel distributed processing (PDP) models, in particular SRNs. Among these phenomena were phonological movement errors (reading list - leading list), manner errors (department - jepartment), phonotactic regularity violations (dorm - dlorm), consonant-vowel category confusions, and initial consonant omissions (the cluster-initial consonant dropping, as when `stop' is mispronounced `top').
Aiming at the segmentation of continuous phonetic input, Shillcock et al. (1997) and Cairns et al. (1997) trained SRNs with a version of the BPTT learning algorithm on English phonotactics. They used 2 million phonological segments derived from a transcribed speech corpus and encoded with a vector of nine phonological features. The network was presented with a single phoneme at a time and was trained to produce the previous, the current and the next phoneme. The output corresponding to the predicted phoneme was matched against the following phoneme using cross-entropy; this produced a varying error signal with occasional peaks corresponding to word boundaries. The SRN reportedly learned to reproduce the current and the previous phoneme, but was poor at predicting the following one. Correspondingly, segmentation performance was quite modest - only about one-fifth of the word boundaries were predicted correctly - although the network was more successful at predicting syllable boundaries. Segmentation improved significantly when other cues, such as prosodic information, were added. This suggests that phonotactics alone may suffice for syllable detection, but that detecting polysyllabic words requires extra cues.
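The boundary-detection idea can be sketched in a few lines: treat local peaks in the per-phoneme prediction error as hypothesized word boundaries. The Python fragment below illustrates only that peak-picking step, with invented error values; it is not the authors' implementation.

```python
# Sketch of error-peak segmentation (illustrative only, not the original code).
def boundary_candidates(errors, threshold):
    """Return positions whose prediction error is a local maximum above `threshold`."""
    return [i for i in range(1, len(errors) - 1)
            if errors[i] > threshold
            and errors[i] >= errors[i - 1]
            and errors[i] >= errors[i + 1]]

# Toy cross-entropy trace over a phoneme sequence; the peaks at positions 3
# and 7 would be proposed as word boundaries.
toy_errors = [0.4, 0.5, 0.3, 1.8, 0.6, 0.4, 0.5, 2.1, 0.7]
print(boundary_candidates(toy_errors, threshold=1.0))  # -> [3, 7]
```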
In another connectionist study of phonological regularities, Rodd (1997) trained SRNs on 602 Turkish words; the networks were trained to predict the following phoneme. Analyzing the hidden-layer representations developed during training, the author found that hidden units came to act as graded detectors for natural phonological classes such as vowels, consonants, voiced stops, and front and back vowels. This is further evidence that NN models can capture important properties of the data they are trained on without any prior knowledge, relying only on statistical co-occurrences.
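The kind of analysis Rodd performed can be illustrated schematically: collect the hidden activation vector obtained for each input phoneme and compare each unit's mean activation on vowels with its mean activation on consonants; a large difference marks that unit as a graded vowel/consonant detector. The sketch below uses randomly generated stand-in activations and an invented phoneme set; it illustrates only the comparison, not Rodd's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
VOWELS = set("aeiou")
phonemes = list("abdeikmnostu")

# Stand-in for hidden activations collected from a trained SRN: one 10-unit
# vector per phoneme (invented here; a real analysis would read these off
# the trained network).
hidden = {p: rng.random(10) for p in phonemes}

vowel_mean = np.mean([hidden[p] for p in phonemes if p in VOWELS], axis=0)
cons_mean = np.mean([hidden[p] for p in phonemes if p not in VOWELS], axis=0)

# Units with a large absolute difference respond selectively to one class.
selectivity = vowel_mean - cons_mean
print(np.round(selectivity, 2))
```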
Learning the graphotactics and phonotactics of Dutch monosyllables with connectionist models was first explored by Tjong Kim Sang (1995) and Tjong Kim Sang & Nerbonne (1999), who trained SRNs to predict graphemes/phonemes based on the preceding segments. The data was orthogonally encoded, that is, for each phoneme or grapheme exactly one neuron was activated at the input and output layers (see 3.1 below). To test the knowledge learned by the networks, Tjong Kim Sang and Nerbonne checked whether the activations of the neurons corresponding to the expected symbols exceed a threshold, determined as the lowest activation found for a correct sequence in the training data. This resulted in almost perfect acceptance of unseen Dutch words (generalization), but also in negligible discrimination with respect to (ill-formed) random strings. The authors concluded that “SRNs are unfit for processing our data set” (Tjong Kim Sang & Nerbonne, 1999).
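The orthogonal encoding and the acceptance criterion can be made concrete with a short sketch. The alphabet, the `network_outputs` argument and the helper names below are our own illustrative assumptions, not Tjong Kim Sang and Nerbonne's code: a string is accepted if, at every position, the output neuron for the symbol that actually follows is activated above the threshold.

```python
import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + ["#"]   # '#' = end-of-word marker (assumed)
INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}

def one_hot(symbol):
    """Orthogonal encoding: exactly one active neuron per symbol."""
    vector = np.zeros(len(ALPHABET))
    vector[INDEX[symbol]] = 1.0
    return vector

def accepts(network_outputs, word, threshold):
    """Accept `word` if every expected next symbol is activated above `threshold`.

    `network_outputs[t]` stands for the SRN's output vector after it has
    consumed word[:t+1]; the threshold would be the lowest activation a
    correct continuation received on the training data.
    """
    targets = list(word[1:]) + ["#"]
    return all(outputs[INDEX[target]] > threshold
               for outputs, target in zip(network_outputs, targets))
```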
These early works on learning phonotactics with SRNs prompted the work reported here. First, Stoianov et al. (1998) demonstrated that the SRNs in Tjong Kim Sang and Nerbonne's work were learning phonotactics rather better than those authors had realized. By analyzing the error as a function of the acceptance threshold, Stoianov et al. (1998) were able to demonstrate the existence of thresholds successful both at accepting well-formed data and at rejecting ill-formed data (see 3.6.2 below for a description of how we determine such thresholds). The interval of high-performing thresholds is narrow, which is why earlier work had not identified it (see Figure 2, which shows how narrow this window is). More recently, Stoianov & Nerbonne (2000) have studied the performance of SRNs from a cognitive perspective, attending to the errors produced by the network and the extent to which these correlate with human performance on related lexical decision tasks. The current article ties these two strands of work together and presents them systematically.
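A threshold sweep of the kind used by Stoianov et al. (1998) can be sketched as follows: for each candidate acceptance threshold, measure the false-rejection rate on well-formed words and the false-acceptance rate on random strings, and look for thresholds where both are low. The per-string scores below are simulated; the sketch illustrates only the analysis, not the original experiment.

```python
import numpy as np

def error_rates(word_scores, random_scores, thresholds):
    """For each threshold, return (threshold, false-rejection, false-acceptance)."""
    return [(t,
             float(np.mean(word_scores < t)),      # well-formed items rejected
             float(np.mean(random_scores >= t)))   # ill-formed items accepted
            for t in thresholds]

# Simulated acceptance scores: real words tend to score higher than random strings.
rng = np.random.default_rng(0)
word_scores = rng.normal(0.60, 0.08, 1000)
random_scores = rng.normal(0.45, 0.08, 1000)

for t, fr, fa in error_rates(word_scores, random_scores, np.linspace(0.35, 0.70, 8)):
    print(f"threshold={t:.2f}  false-reject={fr:.2f}  false-accept={fa:.2f}")
# Only a narrow band of thresholds keeps both error rates low at once.
```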