
3. Experiments


The challenge in connectionist modeling lies not only in developing theoretical frameworks, but also in getting the most out of the network models during experimentation. This section focuses on experiments in learning the phonotactics of Dutch syllables with Simple Recurrent Networks and discusses a number of related problems. It is followed by a study of the network's behavior from a linguistic point of view.

3.1. Some implementation decisions


SRNs were presented in section 2. A first implementation decision concerns how sounds are to be represented. A simple orthogonal strategy is to choose a vector of n neurons to represent n phonemes, to assign each phoneme to a neuron (e.g., neuron 5 in a sequence of 45), and then to activate that one neuron and deactivate all the others whenever the phoneme is to be represented (so the phoneme assigned to neuron 5 is represented by four deactivated neurons, a single activated one, and then forty more deactivated neurons). This orthogonal strategy makes no assumptions about phonemes being naturally grouped into classes on the basis of linguistic features such as consonant/vowel status, voicing, place of articulation, etc. An alternative strategy exploits such features by assigning each feature to a neuron and then representing a phoneme by translating its feature description into a sequence of corresponding neural activations.
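As a minimal illustration of the orthogonal strategy, the encoding can be sketched as a one-hot vector over the phoneme inventory; the function name and the four-symbol toy inventory below are hypothetical:

```python
import numpy as np

def one_hot_encode(phoneme, inventory):
    """Orthogonal encoding: one neuron per phoneme; exactly one is active."""
    vec = np.zeros(len(inventory))
    vec[inventory.index(phoneme)] = 1.0
    return vec

# Hypothetical 4-symbol inventory for illustration; the paper uses 45 symbols.
inventory = ['n', 'e', 't', '#']
print(one_hot_encode('t', inventory))  # [0. 0. 1. 0.]
```

A feature-based encoding would instead map each phoneme to a fixed vector of feature values (voicing, place, etc.), so several neurons can be active at once.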

In phonotactics learning, the input encoding may be either feature-based or orthogonal, but the output decoding should be orthogonal, both to obtain a simple prediction of successors and to avoid bias induced by the peculiarities of the particular feature-encoding scheme used. The input encoding chosen here was also orthogonal, which requires that the network discover the natural classes of phonemes by itself.

The orthogonal encoding implies that we need as many neurons as there are phonemes, plus one for the end-of-word '#' symbol. That is, the input and output layers have 45 neurons each. It is usually difficult, however, to choose the right size of the hidden layer for a particular learning problem. That size is only indirectly related to the learning task and to the encoding chosen (as a subcomponent of the learning task). A linguistic bias in the encoding scheme, e.g., feature-based encoding, would simplify the learning task and decrease the number of hidden neurons required to learn it (Stoianov, 2001). Intuition tells us that hidden layers that are too small lead to an overly crude representation of the problem and larger error. Overly large hidden layers, on the other hand, increase the chance that the network wanders aimlessly because the space of possibilities it must traverse is too large. Therefore, we sought an effective size in a pragmatic fashion. Starting with a plausible size, we compared its performance to nets with double and half the number of hidden neurons. We then repeated this step in the direction of the better-performing size, keeping track of earlier bounds in order to home in on an appropriate size. In this way we settled on a range of 20-80 neurons in the hidden layer, and we continued experimentation on phonotactic learning using only nets of this size.
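The pragmatic size search described above can be sketched as follows; `evaluate` is a hypothetical placeholder that would train an SRN of the given hidden-layer size and return a validation score (higher is better):

```python
def search_hidden_size(evaluate, start=40, max_steps=5):
    """
    Pragmatic search for a hidden-layer size: compare the current size
    against double and half, move in the direction of the better score,
    and stop when neither direction improves on the best size seen.
    """
    best_size, best_score = start, evaluate(start)
    size = start
    for _ in range(max_steps):
        candidates = {size * 2: evaluate(size * 2),
                      max(size // 2, 1): evaluate(max(size // 2, 1))}
        cand_size, cand_score = max(candidates.items(), key=lambda kv: kv[1])
        if cand_score <= best_score:
            break  # neither doubling nor halving helps: stop
        best_size, best_score = cand_size, cand_score
        size = cand_size
    return best_size
```

In practice each call to `evaluate` is expensive (a full training run), which is why only a coarse doubling/halving schedule is used rather than a fine-grained sweep.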

However, even given the right size of the hidden layer, training will not always produce an optimal weight set W*, since network learning is nondeterministic: each training run depends on a number of stochastic variables, e.g., the initial network weights and the order in which examples are presented. Therefore, to improve the chances of successful learning, several SRNs with different initial weights were trained as a pool (group).
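A minimal sketch of such pool training, assuming a hypothetical `train_fn` that trains one SRN from a given random generator (which fixes the initial weights) and returns its weights and final error:

```python
import numpy as np

def train_pool(train_fn, n_nets=5, seed0=0):
    """
    Train several networks that differ only in their random initial
    weights, then keep the one with the lowest final error.
    `train_fn(rng)` is a placeholder returning (weights, final_error).
    """
    results = [train_fn(np.random.default_rng(seed0 + i))
               for i in range(n_nets)]
    return min(results, key=lambda wr: wr[1])
```

The pool can also be kept whole and its members' predictions averaged; the sketch above shows only the simpler best-of-pool selection.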

The back-propagation learning algorithm is controlled by two main parameters: a learning coefficient η and a smoothing (momentum) parameter α. The first controls the speed of learning and is usually set within the range (0.1…0.3); it is advisable to choose a smaller value when the hidden layer is larger. This parameter may also vary in time, starting with a larger initial value that decreases progressively (as suggested by Kuan, Hornik & White (1994), to improve the algorithm's chances of attaining a global minimum in error). Intuitively, such a schedule first helps the network to locate approximately the region containing the global minimum and later to take more precise steps in searching for it (Haykin, 1994; Reed & Marks II, 1999). The smoothing parameter α was set to 0.7, which also allows the network to escape from local minima during its search over the error surface.
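The update described here can be sketched as a standard momentum update with a decaying learning rate; the geometric decay factor below is an assumption for illustration, not a value taken from the text:

```python
def sgd_momentum_step(w, grad, velocity, eta, alpha=0.7):
    """
    One back-propagation weight update with smoothing (momentum) alpha:
    the previous update is blended into the current one, which helps the
    search roll out of shallow local minima on the error surface.
    """
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

def eta_schedule(eta0=0.3, decay=0.99, step=0):
    """Learning rate that starts large and decreases progressively."""
    return eta0 * decay ** step
```

With `alpha = 0.0` this reduces to plain gradient descent; `alpha = 0.7` gives each past gradient a geometrically decaying influence on later steps.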

The training process also depends on the initial values of the weights, which are set to random values drawn from an interval (-r…+r). It is important to find a proper value for r, since large initial weights produce chaotic network behavior, impeding the training. We used r = 0.1.

The SRNs used for this problem are schematically represented in Fig. 1, which shows the reaction of an SRN to the input sequence /n/ after training on an exemplary set containing the sequences /nt#/, /nts#/, /ntrk#/. For this particular database, the network has encountered the tokens '#', /s/ and // as possible successors of /n/ during training, and it therefore activates them in response to this input sequence.
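A single step of an SRN of this kind can be sketched as follows, with the layer sizes (45 input/output, 20 hidden) and the initial weight range (-0.1, +0.1) taken from the text; the function and variable names are illustrative, as is the choice of a logistic activation:

```python
import numpy as np

def srn_forward(x, context, Wxh, Whh, Why):
    """
    One step of a Simple Recurrent (Elman) Network: the hidden layer sees
    the current input plus its own previous state (the context layer),
    and the output layer gives activations over possible successors.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(Wxh @ x + Whh @ context)
    output = sigmoid(Why @ hidden)
    return output, hidden  # hidden becomes the next step's context

# 45 input/output neurons, 20 hidden, weights drawn from (-0.1, +0.1).
rng = np.random.default_rng(0)
n_in, n_hid = 45, 20
Wxh = rng.uniform(-0.1, 0.1, (n_hid, n_in))
Whh = rng.uniform(-0.1, 0.1, (n_hid, n_hid))
Why = rng.uniform(-0.1, 0.1, (n_in, n_hid))

x = np.zeros(n_in)
x[5] = 1.0  # one-hot encoding of the current phoneme
out, ctx = srn_forward(x, np.zeros(n_hid), Wxh, Whh, Why)
```

After training, the most active output neurons mark the phonemes (and '#') that the network has learned as legal successors of the input seen so far.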

