
4.2. Functional analysis


We may also study NNs by examining their performance as a function of factors such as word frequency, similarity neighborhood, and word length. Such an analysis relates computational language modeling to psycholinguistics, and we submit that it is useful to compare the models' performance with humans'. In this section we introduce several factors which have played a role in psycholinguistic theorizing. We then examine the performance of our model as a function of these factors.

4.2.1. Psycholinguistic Factors


Frequency is one of the most thoroughly investigated characteristics of words that affect performance. Numerous studies have demonstrated that the ease and speed with which spoken words are recognized are monotonically related to the experienced frequency of words in the language environment (Luce, Pisoni & Goldinger, 1990; Plaut et al., 1996). The general tendency is that the more frequent a word is, the faster and more accurately it is recognized.

Our perception of a word is likewise known to depend on its similarity to other words. The similarity neighborhood of a word is defined as the collection of words that are phonetically similar to it. Some neighborhoods are dense with many phonetically similar words while others are sparse with few.

The so-called Coltheart-N measure of a word w counts the number of words that can be produced by replacing a single letter of w with another. We modify this concept slightly to make it sensitive to the similarity of sub-syllabic elements: we regard two words as similar when they share two of the three sub-syllabic elements (onset, nucleus and coda). Empty onsets or codas are counted as the same. The neighborhood size of a word is the number of words similar to it. Computed precisely, this measure is expensive, so we reduce the cost by probing for sub-syllables rather than for units of variable size starting from a single phoneme. This simplifies and speeds up processing. Neighborhood size in the corpus we used ranged from 0 to 77, with mean μ = 30 and standard deviation σ = 13.
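The modified neighborhood measure can be illustrated with a minimal Python sketch. It assumes each monosyllabic word has already been split into an (onset, nucleus, coda) triple, and it reads "share two of the sub-syllabic elements" as "agree in exactly two positions"; both are assumptions of this sketch. The names `shared_elements`, `neighborhood_sizes` and `TOY_LEXICON` are hypothetical, and the toy lexicon is only a handful of the Dutch words used as examples in this section (plus koek as a non-neighbor of broeds), not the corpus used in the study. The sketch compares all pairs directly and does not reproduce the faster sub-syllable probing mentioned above.

```python
from typing import Dict, Tuple

# A syllable is an (onset, nucleus, coda) triple; "" marks an empty onset or coda,
# so two empty onsets (or codas) compare as equal.
Syllable = Tuple[str, str, str]

# Toy lexicon for illustration only (assumed segmentations).
TOY_LEXICON: Dict[str, Syllable] = {
    "broeds": ("br", "u", "ts"),
    "broods": ("br", "o", "ts"),
    "broek": ("br", "u", "k"),
    "koets": ("k", "u", "ts"),
    "toets": ("t", "u", "ts"),
    "koek": ("k", "u", "k"),
}


def shared_elements(a: Syllable, b: Syllable) -> int:
    """Count the positions (onset, nucleus, coda) in which two syllables agree."""
    return sum(x == y for x, y in zip(a, b))


def neighborhood_sizes(lexicon: Dict[str, Syllable]) -> Dict[str, int]:
    """For each word, count the other words that agree in exactly two elements."""
    sizes = {}
    for word, syllable in lexicon.items():
        sizes[word] = sum(
            1
            for other, other_syllable in lexicon.items()
            if other != word and shared_elements(syllable, other_syllable) == 2
        )
    return sizes


if __name__ == "__main__":
    for word, size in neighborhood_sizes(TOY_LEXICON).items():
        print(word, size)
```

The all-pairs comparison is quadratic in lexicon size, which is acceptable for a toy example; for a full lexicon one would index words by pairs of sub-syllabic elements instead.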

For example, the phonological neighborhood of the Dutch word broeds /bruts/ is given below. Note that the neighborhood contains only Dutch words.


/brts/, /brots/, /bruj/, /brujt/, /bruk/, /brur/, /brus/, /brut/, /buts/, /kuts/, /puts/, /tuts/
These represent the pronunciations of Brits `British', broods `bread' (gen.sg.), broei `brew', broeit `brew' (3rd. sg.), broek `pants', broer `brother', broes `spray nozzle', broed `brood', boots `boots' (Eng. loan), koets `coach', poets `clean' and toets `test'. Among the words with very sparse neighborhoods are // schwung, /brts/ boards, /jnt/ joint, and /skrs/ squares, all of which are of foreign origin. Words such as /hk/ hek, /bs/ bas, /lxt/ lacht, and /bkt/ bakt have large neighborhoods.

It is still controversial how similarity neighborhood influences cognitive processes (Balota, Paul & Spieler, 1999). Intuitively, it seems likely that words with larger neighborhoods are easier to access due to many similar items, but from another perspective these words might be more difficult to access due to the nearby competitors and longer selection process. However, in the more specific lexical decision task, the overall activity of many candidates has been shown to facilitate lexical decisions, so we will look for the same effect here.

Word length might affect performance in the lexical decision task in two different ways. On the one hand, longer words provide more evidence, since more phonemes are available for deciding whether the input sequence is a word; we therefore expect higher precision for longer words and lower precision for particularly short words. On the other hand, network error accumulates over iterations and increases the error in phoneme predictions at later positions, which in turn increases the overall error for longer words. For these reasons we expect a U-shaped pattern of error as word length increases. Such a pattern was observed in a study on modeling grapheme-to-phoneme conversion with SRNs (Stoianov et al., 1999). Static NNs are less likely than dynamic models such as SRNs to produce such patterns.

So far we have presented three main characteristics of individual words that we expect to affect the performance of the model. However, a statistical correlation analysis (bivariate Spearman test) showed that they are not independent, which means that an analysis of the influence of any single factor should control for the others. In particular, there is a relatively high negative correlation between neighborhood size and word length (r = -0.476), a smaller positive correlation between neighborhood size and frequency (r = 0.223), and a very small negative correlation between frequency and word length (r = -0.107). Because of the large amount of data, all these coefficients are significant at the 0.001 level.
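For readers unfamiliar with the Spearman test, the coefficient is simply the Pearson correlation computed on ranks. The following sketch is a plain-Python illustration of that definition, not the statistical package used in the study; the function names and the four-word toy data are invented for the example.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented toy data: longer words with smaller neighborhoods.
    word_length = [3, 4, 5, 6]
    neighborhood = [40, 35, 20, 5]
    print(spearman(word_length, neighborhood))
```

On this toy data the two rankings are exactly reversed, so the coefficient is at its negative extreme; real lexical data would give intermediate values such as those reported above.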



Finally, it will be useful to seek a correlate in the simulation for reaction time, which psycholinguists are particularly fond of using as a probe for understanding linguistic structure. A possible SRN correlate of reaction time (RT) in the lexical decision task is network confidence, i.e., the amount of evidence that the test string is a word from the training language. The less confident the network, the slower the reaction, which can be implemented with lateral inhibition (Haykin, 1994; Plaut et al., 1996). The network confidence for a given word might be expressed as the product of the activations of the neurons corresponding to the phonemes of that word. A similar measure, which we call uncertainty U, is the negative sum of the logarithms of the (output) neuron activations, normalized with respect to word length |w| (2). Note that U varies inversely with confidence: less certain sequences get higher (positive) scores.
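\[ U(w) = -\frac{1}{|w|} \sum_{i=1}^{|w|} \log a_i(w_i) \qquad (2) \]
Equation 2. Uncertainty U of a word w, where a_i(w_i) is the activation of the output neuron corresponding to the i-th phoneme w_i of w.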
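The relation between confidence and uncertainty described above can be sketched in a few lines of Python. The sketch assumes the activations of the output neurons for the phonemes of a word have already been collected into a list, that all of them are strictly positive, and that the logarithm is natural; the base is not stated in the text, so this is an assumption. The function names and the example activations are hypothetical.

```python
import math
from typing import Sequence


def confidence(activations: Sequence[float]) -> float:
    """Product of the output activations for the phonemes of a word."""
    return math.prod(activations)


def uncertainty(activations: Sequence[float]) -> float:
    """Negative sum of log activations, divided by the word length |w|.

    Natural logarithm is an assumption of this sketch.
    """
    if not activations:
        raise ValueError("a word needs at least one phoneme")
    return -sum(math.log(a) for a in activations) / len(activations)


if __name__ == "__main__":
    well_predicted = [0.9, 0.8, 0.9]    # invented activations
    poorly_predicted = [0.3, 0.1, 0.2]  # invented activations
    print(confidence(well_predicted), uncertainty(well_predicted))
    print(confidence(poorly_predicted), uncertainty(poorly_predicted))
```

Because the logarithm turns the product into a sum, U equals the negative log of the confidence divided by |w|, so the two measures order words of equal length in exactly opposite ways; the normalization by length keeps long words from being penalized merely for having more factors.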

To analyze the influence of these parameters, the network scores and U-values were recorded for each monosyllabic word at the optimal threshold θ* = 0.016. The data were then submitted to the statistical package SPSS for analysis of variance using its General Linear Model (GLM). For network score, the analysis revealed main effects of all three parameters discussed above: neighborhood size (F = 18.4; p < 0.0001), word frequency (F = 19.2; p < 0.0001), and word length (F = 11.5; p < 0.0001). There were also interactions between neighborhood size and the other parameters: the interaction with word frequency had an F-score of 6.6 and the interaction with word length had an F-score of 4.9, both significant at the 0.0001 level. Table 3 summarizes the findings. Error decreases as neighborhood size and frequency increase, and the dependence of error on length shows the predicted U-shaped form (Table 3c).
Table 3. Effect of (a) frequency, (b) neighborhood density and (c) length on word uncertainty U and word error.

(a) Frequency       Low     Mid     High
    U               2.30    2.20    2.18
    Error (%)       8.6     4.1     1.5

(b) Neighb. size    Low     Mid     High
    U               2.62    2.30    2.21
    Error (%)       12.7    3.9     0.8

(c) Length          Low     Mid     High
    U               2.63    2.20    2.13
    Error (%)       5.2     4.4     13.1
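A summary of this Low/Mid/High kind can be produced by grouping words on one factor and averaging within each group. The sketch below shows one way to do so; the text does not say how the three levels were delimited, so the equal-sized thirds used here are an assumption, and the function name, record layout and data are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Each record: (factor value, uncertainty U, whether the word was misclassified).
Record = Tuple[float, float, bool]


def tercile_summary(records: List[Record]) -> Dict[str, Tuple[float, float]]:
    """Split records into Low/Mid/High thirds by factor value.

    Returns (mean U, error in %) per level. Equal-sized thirds are an
    assumption of this sketch; at least three records are required.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    cuts = [0, n // 3, 2 * n // 3, n]
    summary = {}
    for label, lo, hi in zip(("Low", "Mid", "High"), cuts, cuts[1:]):
        group = ordered[lo:hi]
        mean_u = mean(u for _, u, _ in group)
        error_pct = 100.0 * sum(err for _, _, err in group) / len(group)
        summary[label] = (mean_u, error_pct)
    return summary


if __name__ == "__main__":
    # Invented records: (frequency, U, misclassified).
    toy = [
        (1, 2.6, True), (2, 2.5, False),
        (3, 2.3, False), (4, 2.2, False),
        (5, 2.1, False), (6, 2.0, False),
    ]
    for level, (mean_u, error_pct) in tercile_summary(toy).items():
        print(level, round(mean_u, 2), error_pct)
```

Such a table describes each factor on its own; since the factors are correlated, it complements rather than replaces the analysis of variance.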

Analysis of variance on the U-values revealed similar dependencies. There were main effects of word neighborhood size (F = 58.2; p < 0.0001), word frequency (F = 45.9; p < 0.0001), word length (F = 137.5; p < 0.0001), as well as the earlier observed interactions between neighborhood density and the other two variables: word length (F = 10.4; p < 0.001) and frequency (F = 5.235; p < 0.005).



The frequency pattern of error and uncertainty variance was expected, given the increased evidence available to the network for more frequent words. The observed length effect showed that the influence of error accumulated in recursion is weaker than the effect of stronger evidence for longer words. Also, the pattern of performance when varying neighborhood density confirmed the hypothesis from the lexical decision literature that larger neighborhoods make it easier for words to be recognized as such.
