3.5.2. Pilot experiments


Pilot experiments aimed at finding the most appropriate hidden-layer size were run with 20, 40, and 80 hidden neurons. To avoid the additional nondeterminism introduced by the random selection of negative data, during the pilot experiments the networks were evaluated solely on their ability to distinguish admissible from inadmissible successors. These experiments used a small pool of three networks, each trained for 30 epochs, which amounted to approximately 330,000 word presentations, or 1,300,000 segments. The number of presentations of an individual word ranged from 30 to 300, according to its frequency. The training results are given in Table 1, under the column group "Optimal phonotactics". During training, the networks typically showed a sharp initial drop in error to about 13%, which soon turned into a very slow decrease (see Table 2).
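For concreteness, the following is a minimal sketch (written in PyTorch, not the implementation used in the study) of how an SRN can be trained on the next-phoneme prediction task described above. The alphabet size, the one-hot encoding, and the toy word are illustrative assumptions; only the hidden-layer size of 80 and the 30 training epochs follow the pilot setup.

    # Minimal SRN (Elman network) sketch for next-phoneme prediction.
    # ALPHABET_SIZE and the toy word below are hypothetical; HIDDEN_SIZE = 80
    # corresponds to the size settled on in the pilot experiments.
    import torch
    import torch.nn as nn

    ALPHABET_SIZE = 45   # hypothetical phoneme inventory (incl. word-boundary symbol 0)
    HIDDEN_SIZE = 80

    class SRN(nn.Module):
        def __init__(self, n_symbols, n_hidden):
            super().__init__()
            self.rnn = nn.RNN(n_symbols, n_hidden, batch_first=True)  # Elman-style recurrence
            self.out = nn.Linear(n_hidden, n_symbols)                 # scores for the next phoneme

        def forward(self, x):
            hidden_states, _ = self.rnn(x)
            return self.out(hidden_states)

    net = SRN(ALPHABET_SIZE, HIDDEN_SIZE)
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    word = torch.tensor([3, 17, 5, 0])                         # toy word: phoneme indices ending in a boundary
    inputs = torch.eye(ALPHABET_SIZE)[word[:-1]].unsqueeze(0)  # one-hot inputs, shape (1, T-1, A)
    targets = word[1:].unsqueeze(0)                            # target is the next phoneme at every position

    for epoch in range(30):                                    # 30 epochs, as in the pilot study
        logits = net(inputs)
        loss = loss_fn(logits.reshape(-1, ALPHABET_SIZE), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()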

Training the three pools with hidden-layer sizes of 20, 40, and 80 produced networks with similar performance, with the largest networks performing best. Additional experiments with SRNs of 100 hidden neurons yielded larger errors than the networks with 80 hidden neurons, so we settled experimentally on 80 hidden neurons as the likely optimal size. This procedure is admittedly rough, and one needs to guard against premature commitment to a single model size.
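The selection procedure just described can be summarised by the sketch below, which is not the authors' code: for each candidate hidden-layer size, a small pool of independently initialised networks is trained and the size with the lowest mean prediction error is kept. The `evaluate` argument is a hypothetical stand-in for the full train-and-test pipeline (e.g. a wrapper around the SRN sketch above).

    # Rough hidden-layer size selection over small pools of networks.
    # `evaluate(hidden_size, seed)` is a hypothetical callable returning the
    # prediction error of one trained network; candidate sizes follow the pilot study.
    from statistics import mean
    from typing import Callable, Iterable

    def select_hidden_size(evaluate: Callable[[int, int], float],
                           candidate_sizes: Iterable[int] = (20, 40, 80, 100),
                           pool_size: int = 3) -> int:
        """Return the candidate size whose pool achieves the lowest mean error."""
        mean_errors = {
            size: mean(evaluate(size, seed) for seed in range(pool_size))
            for size in candidate_sizes
        }
        return min(mean_errors, key=mean_errors.get)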


Table 1. Results of a pilot study on phonotactics learning by SRNs with 20, 40, and 80 hidden neurons (rows). Each network is trained independently on language LM three times (columns). Performance is measured by the error in predicting the next phoneme (left three columns) and by the L2 (semi-Euclidean) distance between the empirical context-dependent predictions and the network predictions for each context in the tree (right three columns). Neither measure depends on randomly chosen negative data.




                      Optimal Phonotactics              ||SRN_L, T_L||_L2
Hidden layer size     SRN1      SRN2      SRN3          SRN1      SRN2      SRN3
20                    10.57%    10.65%    10.57%        0.0643    0.0642    0.0642
40                    10.44%    10.51%    10.44%        0.0637    0.0637    0.0637
80                    10.00%     9.97%    10.02%        0.0634    0.0634    0.0632
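As a reading aid for the right three columns of Table 1, the following is a minimal sketch, under assumptions, of an L2-style distance between the empirical context-dependent next-phoneme distributions and the corresponding SRN outputs; the exact "semi-Euclidean" normalisation used in the study may differ, and the averaging over contexts is an assumption.

    # Sketch of an L2-style distance between empirical and predicted next-phoneme
    # distributions, one row per context in the tree. Averaging over contexts is
    # an assumption; the study's exact "semi-Euclidean" definition may differ.
    import numpy as np

    def l2_distance(empirical, predicted):
        """empirical, predicted: arrays of shape (n_contexts, alphabet_size)."""
        per_context = np.sqrt(((empirical - predicted) ** 2).sum(axis=1))
        return per_context.mean()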


Table 2. Typical shape of the SRN error curve during training. The error drops sharply at the beginning and then decreases slowly to convergence.

Epoch       1      2-4    5-10   11-15   16-30
Error (%)   15.0   12.0   10.8   10.7    10.5
