5. Conclusions
Phonotactic constraints restrict the ways phonemes combine to form words. These constraints are empirical and can be abstracted from the lexicon - either by extracting rules directly, or via models of that lexicon. Existing language models are usually based on abstract symbolic methods, which provide good tools for studying such knowledge. But linguistic research from a connectionist perspective can offer a fresh view of language, because the brain and artificial neural networks share principles of computation and data representation.
Connectionist language modeling, however, is a challenging task. Neural networks use distributed processing and continuous computations, while languages have a discrete, symbolic nature. This means that special tools are necessary if one is to model linguistic problems with connectionist models. The research reported in this paper attempted to answer two basic questions: first, whether phonotactic learning is possible at all in connectionist systems, which earlier work had doubted (Tjong Kim Sang, 1995; Tjong Kim Sang, 1998). Given a positive answer, the second question is how NN performance compares to human ability. To draw this comparison, we needed to extract the phonotactic knowledge from a network that has learned the sequential structure, and we proposed several ways of doing so.
Section 3 addressed the first question. Although theoretical results demonstrate that NNs have the finite-state capacity needed for phonotactic processing, practical limitations remain, so experimental support was required to demonstrate that SRNs can learn phonotactics in practice. A key to solving the problems of earlier investigators was to search for a threshold that optimally discriminates the continuous neuron activations with respect to phoneme acceptance and rejection simultaneously. The threshold range at which the network achieves good discrimination is very narrow (see Figure 2), which illustrates how critical the exact setting of the threshold is. We also suggested that this threshold might be computed interactively, after each symbol is processed, which is cognitively plausible, but we postpone a demonstration of this to another paper.
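The idea of such a threshold search can be illustrated with a small sketch. This is not the paper's code: the score distributions below are purely hypothetical stand-ins for the network's per-string activations, and all names are ours. The sketch sweeps candidate thresholds and keeps the one that best balances word acceptance against random-string rejection.

```python
# Illustrative sketch (hypothetical data, not the paper's implementation):
# sweep a decision threshold over network output activations to find the
# value that best separates trained words from random strings.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-string scores, e.g. the minimum activation of the
# correct successor phoneme over the string: words score high, random
# strings score low.
word_scores = rng.normal(0.05, 0.02, 1000)
random_scores = rng.normal(0.005, 0.004, 1000)

def discrimination(threshold, words, randoms):
    """Mean of the word-acceptance and random-string-rejection rates."""
    acceptance = np.mean(words >= threshold)
    rejection = np.mean(randoms < threshold)
    return (acceptance + rejection) / 2.0

# Search a fine grid of candidate thresholds for the best discriminator.
thresholds = np.linspace(0.0, 0.1, 1001)
best = max(thresholds, key=lambda t: discrimination(t, word_scores, random_scores))
print(best, discrimination(best, word_scores, random_scores))
```

Because the acceptance and rejection curves move in opposite directions as the threshold rises, the region where both stay high is narrow, which mirrors the sensitivity reported for Figure 2.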
The network's performance on word recognition - a word acceptance rate of 95% and a random-string rejection rate of 95% at a threshold of 0.016 - is competitive with symbolic techniques such as Inductive Logic Programming and Hidden Markov Models (Tjong Kim Sang, 1998), both of which reflect low-level human processing architecture with less fidelity.
Section 4 addressed the second question: how can other linguistic knowledge encoded in the networks be extracted? Two approaches were used. Section 4.1 clustered the weights of the network, revealing that the network had independently become sensitive to established phonetic categories.
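The intuition behind this weight analysis can be sketched as follows. The data here are synthetic, not the trained network's actual weights: each phoneme is given a weight vector drawn around a class prototype, and we check that same-class vectors lie closer together than cross-class ones, which is what a clustering of genuinely learned weights would reveal.

```python
# Illustrative sketch (synthetic data, not the paper's analysis): if a
# network has become sensitive to phonetic categories, the weight vectors
# of phonemes in the same class (e.g. vowels) should cluster together.
import numpy as np

rng = np.random.default_rng(1)
classes = {"vowel": ["a", "e", "i", "o"], "consonant": ["p", "t", "k", "s"]}
# Synthetic 10-dimensional "weight vectors": each class shares a prototype,
# and individual phonemes deviate from it by small noise.
prototypes = {name: rng.normal(0.0, 1.0, 10) for name in classes}
weights = {ph: prototypes[name] + rng.normal(0.0, 0.3, 10)
           for name, phonemes in classes.items() for ph in phonemes}

def mean_distance(group_a, group_b):
    """Average Euclidean distance between weight vectors of two groups."""
    return np.mean([np.linalg.norm(weights[x] - weights[y])
                    for x in group_a for y in group_b if x != y])

within = mean_distance(classes["vowel"], classes["vowel"])
between = mean_distance(classes["vowel"], classes["consonant"])
print(within < between)  # same-class vectors should be the closer ones
```

Any standard agglomerative clustering applied to such vectors would group the vowels into one subtree and the consonants into another, which is the kind of structure Section 4.1 reports.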
We went on to analyze how various factors that have been shown to play a role in human performance find counterparts in the network's performance. Psycholinguistic research has shown, for example, that the ease and speed with which spoken words are recognized are monotonically related to the frequency of those words in language experience (Luce et al., 1990). The model likewise reflected the importance of neighborhood density in facilitating word recognition, which we speculated stems from the supporting evidence that similar patterns lend to the words in their neighborhood. Whenever the network and human subjects exhibit a similar sensitivity to well-established parameters, we see a confirmation of the plausibility of the architecture chosen.
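Neighborhood density itself is easy to make concrete. In the standard psycholinguistic definition, a word's neighbors are the lexicon entries that differ from it by a single substitution, insertion, or deletion. The sketch below computes this over a toy lexicon of our own invention; it is not the Dutch lexicon used in the experiments.

```python
# Illustrative sketch (toy lexicon, not the paper's data): count a word's
# phonological neighbors, i.e. lexicon entries at edit distance exactly 1.
def neighbors(word, lexicon):
    """Return lexicon entries one substitution, insertion, or deletion away."""
    alphabet = set("".join(lexicon))
    candidates = set()
    for i in range(len(word) + 1):
        for ch in alphabet:
            candidates.add(word[:i] + ch + word[i:])        # insertion
            if i < len(word):
                candidates.add(word[:i] + ch + word[i + 1:])  # substitution
    for i in range(len(word)):
        candidates.add(word[:i] + word[i + 1:])              # deletion
    candidates.discard(word)  # a word is not its own neighbor
    return sorted(candidates & set(lexicon))

lexicon = {"cat", "bat", "rat", "cab", "cast", "dog"}
print(neighbors("cat", lexicon))  # -> ['bat', 'cab', 'cast', 'rat']
```

A dense neighborhood (many such entries) corresponds to many similar training patterns lending support to the word, which is the mechanism we speculated underlies the facilitation effect.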
Finally, the distribution of errors within words showed another linguistically interesting result. In particular, the network tended to err more often at the onset-nucleus transition - a position which is also typical of transitions between adjacent words in the speech stream and which is exploited for speech segmentation. By analogy, we can conclude that the nucleus-coda unit - the rhyme - is a significant linguistic unit for Dutch, a result suggested earlier for English (Kessler & Treiman, 1997).
We conclude with one disclaimer and a restatement of the central claim. We have not claimed that SRNs are the only connectionist model capable of dynamic processing, nor that they are the most biologically plausible neural network. Our central claim is to have demonstrated that relatively simple connectionist mechanisms have the capacity to model and learn phonotactic structure.