3.5.2. Pilot experiments


Pilot experiments aimed at finding the most appropriate hidden-layer size were run with 20, 40, and 80 hidden neurons. To avoid the additional nondeterminism introduced by the random selection of negative data, during the pilot experiments the networks were evaluated solely on their ability to distinguish admissible from inadmissible successors. These experiments used a small pool of three networks, each trained for 30 epochs, which amounted to approximately 330,000 word presentations, or 1,300,000 segments. The number of presentations of an individual word ranged from 30 to 300, according to its frequency. The training results are given in Table 1, under the column group "Optimal phonotactics". During training, the networks typically showed a sharp initial drop in error to about 13%, which soon turned into a very slow decrease (see Table 2).
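For concreteness, the following is a minimal sketch (written in PyTorch, not the implementation used in the study) of how an SRN can be trained on the next-phoneme prediction task described above. The alphabet size, the one-hot encoding, and the toy word are illustrative assumptions; only the hidden-layer size of 80 and the 30 training epochs follow the pilot setup.

    # Minimal SRN (Elman network) sketch for next-phoneme prediction.
    # ALPHABET_SIZE and the toy word below are hypothetical; HIDDEN_SIZE = 80
    # corresponds to the size settled on in the pilot experiments.
    import torch
    import torch.nn as nn

    ALPHABET_SIZE = 45   # hypothetical phoneme inventory (incl. word-boundary symbol 0)
    HIDDEN_SIZE = 80

    class SRN(nn.Module):
        def __init__(self, n_symbols, n_hidden):
            super().__init__()
            self.rnn = nn.RNN(n_symbols, n_hidden, batch_first=True)  # Elman-style recurrence
            self.out = nn.Linear(n_hidden, n_symbols)                 # scores for the next phoneme

        def forward(self, x):
            hidden_states, _ = self.rnn(x)
            return self.out(hidden_states)

    net = SRN(ALPHABET_SIZE, HIDDEN_SIZE)
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    word = torch.tensor([3, 17, 5, 0])                         # toy word: phoneme indices ending in a boundary
    inputs = torch.eye(ALPHABET_SIZE)[word[:-1]].unsqueeze(0)  # one-hot inputs, shape (1, T-1, A)
    targets = word[1:].unsqueeze(0)                            # target is the next phoneme at every position

    for epoch in range(30):                                    # 30 epochs, as in the pilot study
        logits = net(inputs)
        loss = loss_fn(logits.reshape(-1, ALPHABET_SIZE), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()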

Training the three pools with hidden-layer sizes of 20, 40, and 80 produced networks with similar performance, with the largest networks performing best. Additional experiments with SRNs of 100 hidden neurons yielded larger errors than the networks with 80 hidden neurons, so we settled experimentally on 80 hidden neurons as the likely optimal size. This procedure is admittedly rough, and one needs to guard against premature commitment to a single model size.
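The selection procedure just described can be summarised by the sketch below, which is not the authors' code: for each candidate hidden-layer size, a small pool of independently initialised networks is trained and the size with the lowest mean prediction error is kept. The `evaluate` argument is a hypothetical stand-in for the full train-and-test pipeline (e.g. a wrapper around the SRN sketch above).

    # Rough hidden-layer size selection over small pools of networks.
    # `evaluate(hidden_size, seed)` is a hypothetical callable returning the
    # prediction error of one trained network; candidate sizes follow the pilot study.
    from statistics import mean
    from typing import Callable, Iterable

    def select_hidden_size(evaluate: Callable[[int, int], float],
                           candidate_sizes: Iterable[int] = (20, 40, 80, 100),
                           pool_size: int = 3) -> int:
        """Return the candidate size whose pool achieves the lowest mean error."""
        mean_errors = {
            size: mean(evaluate(size, seed) for seed in range(pool_size))
            for size in candidate_sizes
        }
        return min(mean_errors, key=mean_errors.get)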


Table 1. Results of a pilot study on phonotactics learning by SRNs with 20, 40, and 80 hidden neurons (rows). Each network is trained independently on language LM three times (columns). Performance is measured by the error in predicting the next phoneme (left three columns) and by the L2 (semi-Euclidean) distance between the empirical context-dependent predictions and the network predictions for each context in the tree (right three columns). Neither measure depends on randomly chosen negative data.




                      Optimal Phonotactics              ||SRN_L, T_L||_L2
Hidden layer size     SRN1      SRN2      SRN3          SRN1      SRN2      SRN3
20                    10.57%    10.65%    10.57%        0.0643    0.0642    0.0642
40                    10.44%    10.51%    10.44%        0.0637    0.0637    0.0637
80                    10.00%     9.97%    10.02%        0.0634    0.0634    0.0632
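As a reading aid for the right three columns of Table 1, the following is a minimal sketch, under assumptions, of an L2-style distance between the empirical context-dependent next-phoneme distributions and the corresponding SRN outputs; the exact "semi-Euclidean" normalisation used in the study may differ, and the averaging over contexts is an assumption.

    # Sketch of an L2-style distance between empirical and predicted next-phoneme
    # distributions, one row per context in the tree. Averaging over contexts is
    # an assumption; the study's exact "semi-Euclidean" definition may differ.
    import numpy as np

    def l2_distance(empirical, predicted):
        """empirical, predicted: arrays of shape (n_contexts, alphabet_size)."""
        per_context = np.sqrt(((empirical - predicted) ** 2).sum(axis=1))
        return per_context.mean()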


Table 2. Typical shape of the SRN error curve during training. The error drops sharply at the beginning and then decreases slowly to convergence.

Epoch       1      2-4    5-10   11-15   16-30
Error (%)   15.0   12.0   10.8   10.7    10.5
