
2. Simple Recurrent Networks


This section will briefly present Simple Recurrent Networks (Elman, 1988; Robinson & Fallside, 1988) and will review earlier studies of sequential learning, especially phonotactic learning. Detailed descriptions of the SRN processing mechanisms and of the Back-Propagation Through Time learning algorithm used to train the model are available elsewhere (Stoianov, 2001; Haykin, 1994), and will be reviewed here only superficially.

Figure 1. Learning phonotactics with SRNs. If the training data set contains the words /nt#/, /nts#/ and /ntrk#/, then after the network has processed the left context /n/, the reaction to the input token /t/ will be active neurons corresponding to the symbol '#' and the phonemes /s/ and /r/.
Simple Recurrent Networks (SRNs) were invented to encode simple artificial grammars. They extend the Multilayer Perceptron (Rumelhart, Hinton & Williams, 1986) with an extra input - a context layer that holds the hidden-layer activations from the previous processing cycle. After training, Elman (1988) investigated how the context evolves over time. The analysis showed a graded encoding of the input sequence: similar input items were clustered at close, but distinct, shifting positions. That is, the network discovered, and implicitly represented in a distributed way, the rules of the grammar generating the training sequences. This is noteworthy because the rules were not explicitly encoded, but rather acquired through experience. The capacity of SRNs to learn simple artificial languages was further explored in a number of studies (Cleeremans, Servan-Schreiber & McClelland, 1989; Gasser, 1992).

SRNs have the structure shown in Figure 1. They operate as follows: input sequences SI are presented to the input layer, one element SI(t) at a time. The purpose of the input layer is simply to transfer activation to the hidden layer through a weight matrix. The hidden layer in turn copies its activations after every step to the context layer, which provides an additional input to the hidden layer - that is, information about the past, after a brief delay. Finally, the hidden-layer neurons send their signal through a second weight matrix to the output-layer neurons, whose activation is interpreted as the product of the network. Since the activation of the hidden layer depends both on its previous state (the context) and on the current input, SRNs have the theoretical capacity to be sensitive to the entire history of the input sequence. In practice, however, limitations restrict the time span of the context information to at most 10-15 steps (Christiansen & Chater, 1999). The size of the layers does not restrict the range of temporal sensitivity.
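The processing cycle described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's implementation: the layer sizes, weight initialization, and one-hot input coding are all assumptions made for the example.

```python
import numpy as np

# Illustrative layer sizes; the article does not specify these.
n_in, n_hid, n_out = 30, 20, 30

rng = np.random.default_rng(0)
W_ih = rng.normal(0, 0.1, (n_hid, n_in))    # input  -> hidden weights
W_ch = rng.normal(0, 0.1, (n_hid, n_hid))   # context -> hidden weights
W_ho = rng.normal(0, 0.1, (n_out, n_hid))   # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srn_step(x, context):
    """One SRN processing cycle: the hidden state depends on both the
    current input and the context (the previous hidden state)."""
    hidden = sigmoid(W_ih @ x + W_ch @ context)
    output = sigmoid(W_ho @ hidden)
    return output, hidden    # hidden is copied into the context layer

# Process a short sequence, one one-hot element S_I(t) at a time.
context = np.zeros(n_hid)    # empty context at the start of a sequence
outputs = []
for t in range(3):
    x = np.zeros(n_in)
    x[t] = 1.0               # stand-in one-hot symbol S_I(t)
    y, context = srn_step(x, context)
    outputs.append(y)
```

Because `context` is threaded through the loop, the output at step t is a function of the whole input prefix, which is exactly the source of the SRN's temporal sensitivity.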

The network operates in two regimes - supervised training and network use. In the latter, the network is presented with the sequential input data SI(t) and computes the output N(t) using contextual information. The training regime involves the same processing as network use, plus a second, training step: the network reaction N(t) is compared to the desired output ST(t), and the difference is used to adjust the network's behavior in a way that improves future performance on the same data.
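The distinction between the two regimes can be made concrete with a small sketch. The network here is a stand-in function with fixed random weights; the names `net`, `S_I`, and `S_T` are illustrative, chosen to mirror the notation in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(0, 0.1, (4, 4))   # fixed stand-in weights

def net(x, context):
    """Stand-in for one SRN step: returns (reaction N(t), new context)."""
    h = np.tanh(W @ x + context)
    return h, h

S_I = [np.eye(4)[i] for i in (0, 1, 2)]   # input sequence S_I(t)
S_T = [np.eye(4)[i] for i in (1, 2, 3)]   # desired outputs S_T(t)

# Use regime: only compute the reactions N(t), step by step.
context = np.zeros(4)
N = []
for x in S_I:
    y, context = net(x, context)
    N.append(y)

# Training regime: the same forward processing, plus an error step that
# compares N(t) to S_T(t); this error would then drive weight adjustment.
errors = [np.sum((y - tgt) ** 2) for y, tgt in zip(N, S_T)]
total_error = sum(errors)
```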

The two most popular supervised learning algorithms used to train SRNs are the standard Back-Propagation algorithm (Rumelhart et al., 1986) and the Back-Propagation Through Time (BPTT) algorithm (Haykin, 1994). While the former is simpler, because it uses information from one previous time step only (the context activation, the current network activations, and the error), the latter trains the network faster, because it collects errors from all time steps during which the network processes the current sequence and therefore adjusts the weights more precisely. However, the BPTT learning algorithm is also cognitively less plausible, since collecting information that spans time requires mechanisms specific to symbolic methods. Nevertheless, this compromise allows more extensive research: without it, the problems discussed below would require much longer training times on standard computers. Therefore, the BPTT learning algorithm is used in the experiments reported here. In brief, it works as follows: the network reaction to a given input sequence is compared to the desired target sequence at every time step, and an error is computed. The network activations and errors at each step are kept on a stack. When the whole sequence has been processed, the error is propagated back through space (the layers) and time, and weight-update values are computed. Finally, the network weights are adjusted with the values computed in this way.
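The procedure just described - stack the activations over the whole sequence, then propagate the error back through layers and time - can be sketched as follows. This is a toy illustration under stated assumptions (sigmoid units, squared error, tiny layer sizes, plain gradient descent), not the article's implementation.

```python
import numpy as np

n_in, n_hid, n_out = 5, 4, 5   # illustrative sizes, not from the article
rng = np.random.default_rng(1)
W_ih = rng.normal(0, 0.5, (n_hid, n_in))
W_ch = rng.normal(0, 0.5, (n_hid, n_hid))
W_ho = rng.normal(0, 0.5, (n_out, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_error(inputs, targets):
    """Forward-only pass: total squared error of N(t) against S_T(t)."""
    context, err = np.zeros(n_hid), 0.0
    for x, tgt in zip(inputs, targets):
        context = sigmoid(W_ih @ x + W_ch @ context)
        err += np.sum((sigmoid(W_ho @ context) - tgt) ** 2)
    return err

def bptt_update(inputs, targets, lr=0.1):
    """One BPTT pass: stack activations over the whole sequence, then
    propagate the error back through space (layers) and time."""
    global W_ih, W_ch, W_ho
    contexts, hiddens, outputs = [np.zeros(n_hid)], [], []
    for x in inputs:                              # forward, step by step
        h = sigmoid(W_ih @ x + W_ch @ contexts[-1])
        hiddens.append(h)
        contexts.append(h)                        # context for the next step
        outputs.append(sigmoid(W_ho @ h))
    dW_ih, dW_ch, dW_ho = (np.zeros_like(W) for W in (W_ih, W_ch, W_ho))
    dh_next = np.zeros(n_hid)                     # error flowing back in time
    for t in reversed(range(len(inputs))):
        dy = (outputs[t] - targets[t]) * outputs[t] * (1 - outputs[t])
        dW_ho += np.outer(dy, hiddens[t])
        dh = (W_ho.T @ dy + dh_next) * hiddens[t] * (1 - hiddens[t])
        dW_ih += np.outer(dh, inputs[t])
        dW_ch += np.outer(dh, contexts[t])        # the context that fed step t
        dh_next = W_ch.T @ dh
    W_ih -= lr * dW_ih
    W_ch -= lr * dW_ch
    W_ho -= lr * dW_ho

# Train on one short sequence: predict the next one-hot symbol.
seq = [np.eye(n_in)[i] for i in (0, 1, 2, 3)]
err_before = run_error(seq[:-1], seq[1:])
for _ in range(500):
    bptt_update(seq[:-1], seq[1:])
err_after = run_error(seq[:-1], seq[1:])
```

The backward loop is where BPTT differs from standard back-propagation: the hidden-layer error at step t is fed back through the context weights into step t-1, so the weight updates reflect the entire sequence rather than a single time step.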


