3.5. Training
This section reports on network training. We first give some additional details of the training procedure, then present pilot experiments aimed at determining the hidden layer size. The later parts analyze the network performance.
3.5.1. Procedure
The networks were trained in a pool on the same problem, independently of each other, with the BPTT learning algorithm. The training of each individual network was organized in epochs, during each of which the whole training data set was presented in accordance with the word frequencies. The sum of the logarithms of the word frequencies in the training database L1M is about 11,000, which is therefore also the number of sequence presentations per epoch; the sequences were drawn in random order. For each word, the corresponding sequence of phonemes was presented to the input layer one phoneme at a time, followed by the end-of-sequence marker '#'. Each time step was completed by copying the hidden-layer activations to the context layer, where they were used in the following step.
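As an illustration, one epoch might be organized as in the following Python sketch. This is a minimal outline under stated assumptions, not the implementation used here: the network object net, its methods initial_context, forward, and bptt_update, and the encoding of each word as a string of phoneme symbols are all hypothetical.

    import math
    import random

    END_MARKER = '#'   # end-of-sequence marker

    def run_epoch(net, lexicon, eta, alpha):
        # lexicon: hypothetical mapping from word (phoneme string) to
        # its corpus frequency in the training database.
        # Each word is scheduled a number of times proportional to the
        # logarithm of its frequency, so the schedule length matches
        # the ~11,000 presentations per epoch reported above.
        schedule = []
        for word, freq in lexicon.items():
            schedule.extend([word] * round(math.log(freq)))
        random.shuffle(schedule)              # random presentation order

        for word in schedule:
            context = net.initial_context()   # reset context per sequence
            for phoneme in list(word) + [END_MARKER]:
                # one phoneme per time step, then the end marker
                hidden = net.forward(phoneme, context)
                context = hidden              # copy hidden layer to context
            net.bptt_update(eta, alpha)       # BPTT update for the sequence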
The parameters of the learning algorithm were as follows: the learning coefficient η started at 0.3 and was reduced by 30% after each epoch, down to a final value of 0.001; the momentum (smoothing) term was α = 0.7. The networks required 30 epochs to complete training; beyond this point very little improvement was observed.
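The reported schedule can be written compactly as in the sketch below. Note that 0.3 × 0.7^n reaches 0.001 at about n = 16, so treating 0.001 as a floor for the remaining epochs is our assumption, made only to reconcile the 30% decay with the stated final value.

    ETA_START, DECAY, ETA_FLOOR = 0.3, 0.7, 0.001
    ALPHA = 0.7                       # momentum (smoothing) term

    def learning_rate(epoch):
        # 30% reduction per epoch, floored at 0.001 (assumed floor)
        return max(ETA_START * DECAY ** epoch, ETA_FLOOR)

    for epoch in range(30):           # training completed in 30 epochs
        eta = learning_rate(epoch)
        # ... one epoch of BPTT updates with rate eta and momentum ALPHA ...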