Figure 7.5.
Propagation of [i] from cochlea to primary (A₁) and tertiary (A₃) auditory neocortex.
For example, in figure 7.6, a variant acoustic pattern, [I], is input to the
mature system after it has already learned the normal phoneme /i/ at A₃. Even
though this new, variant input has its F₂ peak at x₂, feedback is stronger around
the learned A₁–A₃ loop at x₃ because of the learned and heavily weighted
long-term memory trace at z₃. Thus, activity at x₃ becomes greater than at x₂, even
though the phonetic input to x₂ is greater than the input to x₃. As a result, the
phone [I] is deformed and perceived in the phonemic category /i/.
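The deformation just described can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the four sites, the input values, the trace values, and the update rule (bottom-up input plus feedback gated by the long-term memory trace, followed by normalization to stand in for off-surround competition) are all hypothetical and are not taken from the figures.
```python
def settle(inputs, ltm, steps=10):
    """Toy A1-A3 loop: bottom-up input plus LTM-gated feedback, then normalization."""
    total = sum(inputs)
    x = [v / total for v in inputs]           # start from the normalized input pattern
    for _ in range(steps):
        # each site receives its input plus top-down feedback gated by its LTM trace
        raw = [inp + z * act for inp, z, act in zip(inputs, ltm, x)]
        total = sum(raw)
        x = [v / total for v in raw]          # normalization stands in for the off-surround
    return x


# Hypothetical sites x1..x4: the variant [I] peaks at x2, but /i/ was learned at x3.
inputs = [0.1, 1.0, 0.8, 0.1]
ltm = [0.1, 0.1, 1.0, 0.1]                    # assumed trace strengths z1..z4

activity = settle(inputs, ltm)
strongest_input = inputs.index(max(inputs)) + 1
strongest_activity = activity.index(max(activity)) + 1
print(f"strongest input at x{strongest_input}, strongest activity at x{strongest_activity}")
# prints "strongest input at x2, strongest activity at x3"
```
With all four traces set equal, the same loop leaves x2 in control, so in this sketch the shift to x3 comes from the learned trace alone.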
Elsewhere in psychology, this deforming-to-match is known as “feature filling.”
By the same kind of on-center off-surround polypole mechanisms, we can
recognize a partially masked face or read a poor-quality photocopy of a
document, filling in missing features from long-term memory. In all such cases,
information that is missing in the sensory input is reconstructed from memory.
In ART, these “top-down” signals shaped by (learned) long-term memory traces
are the same expectancies we discussed in chapter 5.
Perceptual Interference and Learning New Patterns
The ability to fill in features has obvious survival value. In certain circumstances,
however, it can give rise to the phenomenon known as interference. For example,
in Spanish, the [i] of figure 7.5 and the [I] of figure 7.6 are allophones, that is,
variants that both match the Spanish phonemic category /i/. The Spanish speaker’s
long-term memory array at A₃ in figure 7.6 has learned to deform the lower
second formant of [I] to match the higher second formant of /i/, so she does
not distinguish between bit and beet.¹ This learned equivalence works fine in
Spain, but when a Spanish speaker attempts to use the same circuits to process
English, interference can arise. Like Spanish, English maps [i] onto the phoneme
/i/, but unlike Spanish, English maps [I] onto a distinct phoneme, /I/. Thus,
beet (/bit/) and bit (/bIt/) are two distinct words in English, but they would
simply be different ways of pronouncing the same word in Spanish.
SPEECH PERCEPTION • 117
Figure 7.6.
Phonemic normalization.
In the 1950s, the contrastive analysis hypothesis proposed that the more
different two languages were, the more mutually difficult they would be to learn.
So, for example, Italian speakers should find it easier to learn Spanish than
Chinese since Italian and Spanish are both Romance languages and totally
unrelated to Chinese. In general, the contrastive analysis hypothesis held up
fairly well, but under close examination it was found to break down. Sometimes,
second-language learners found it most difficult to learn things that were only
minimally different from their first language. This interference caused a serious
conceptual problem that behaviorism was unable to solve. Why should both
maximally different and minimally different structures be more difficult to learn
than structures that were only moderately different? Figure 7.6 explains what
behaviorism could not: previously learned features only interfere with new
features within their off-surround. The polypole simply resolves this contradiction
between interference theory and contrastive analysis.
In the early stages of learning a second language, gross and confusing
miscategorization of speech sounds is a familiar experience, but it can be
relatively quickly overcome. To say that our Spanish speaker does not distinguish
beet and bit is not to say that she cannot learn to do so. But how does she learn?
If x₃ dominates x₂ in figure 7.6, how can x₂ ever activate in response to [I]? For
the answer, recall how a flash of white light rebounded a green percept to red
in figure 5.9. White light accomplished this rebound because it contains all
colors, including green and red, and so it stimulated both poles of the retinal
dipole nonspecifically. The flash of white light was an example of nonspecific
arousal (NSA).
118 • HOW THE BRAIN EVOLVED LANGUAGE
Like the flash of white light in the red-green dipole of chapter 5, any
surprising, arousing event tends to elicit NSA and so has the capacity to rebound
cortical activity and initiate the learning of new information at contextually
inactive sites. To understand how this learning begins, let us continue the
preceding example by imagining that our Spanish speaker has been wondering at
the force with which English speakers use the word sheet. Suddenly she realizes
that they are not saying /∫it/ at all; they are saying /∫It/. This shocking
development unleashes a neocortical wave of NSA, which rebounds the polypole of
figure 7.6 as depicted in figure 7.7.
In figure 7.7, as in figure 7.6, the phone [I] is being presented to A₁. In
figure 7.7, however, a burst of NSA has rebounded the A₁ polypole. In A₁, x₂
and x₄, which had previously been dominated by x₃ and its strong long-term
memory trace, have been nonspecifically aroused and have wrested control
from x₃. Now x₂ is active and its long-term memory trace at z₂ can grow in
response to the bottom-up input of [I].
Bilingualism
In a polypole like A₃ of figure 7.7, NSA turns off the sites that were on and
turns on the sites that were off. But now what is to prevent z₂ and z₃ in A₃ from
equilibrating? In figure 7.8, the A₃ long-term memory traces encoding [I] as
/i/ (Spanish) and /I/ (English) have reached equilibrium. In this state, how
could such a “balanced” bilingual ever know which language is being spoken?
Why don’t balanced bilinguals randomly perceive [I] as either /i/ or /I/ (or,
mutatis mutandis, produce /I/ as either [i] or /I/)? Worse, how can the
balance be maintained? If such a bilingual moves to a community where
Spanish is never spoken, why doesn’t he or she forget Spanish promptly and utterly?
Why are learning and unlearning not strictly governed by overall input
frequency?²
Figure 7.7.
Rebound across the vowel polypole of /i/.
Figure 7.8.
A dipole enables bilingual code switching.
One answer, which we discovered first in chapter 5, is that a rebound
complements memory into active and inactive sites, so that the new input
becomes remembered at sites which have been inactive in the current context.
No brain cell is ever activated without activating other neurons, so these
inactive sites encode the new memory in a new, contextually modulated
subnetwork. So our answer to the long-term invariance of language learning lies in a
contextual Spanish-English dipole like that of figure 7.8. When the balanced
bilingual is in a Spanish context, the Spanish pole of the contextual
subnetwork is active. This biases the A₃ polypole toward interpreting [I] as /i/. But
when the balanced bilingual is in an English context, the English pole of the
dipole is active, and the phonemicization network is biased toward the English
phoneme /I/. In mixed contexts, the dipole can oscillate, and the balanced
bilingual can “code-switch” in centiseconds between Spanish and English and
between /I/ and /i/.
This code-switching dipole was predicted in Loritz 1990. In 1994, Klein
et al. reported a PET study of bilinguals that appears to have located part of
just such a dipole in the left putamen. They analyzed cerebral blood flow when
sequential bilinguals, who had learned a second language after age five,
repeated words in both languages. There was a significant increase in blood flow
in the left putamen when the second language was spoken. Considering that
the putamen and the other basal ganglia are also implicated in parkinsonism,
a disorder of tonic muscular control, a plausible hypothesis is that the left
putamen is implicated in maintaining articulatory posture.³
Vowel Normalization
In the last chapter we observed that, because vocal tracts are all of different
lengths, the infant language learner faces the daunting task of phonemic
normalization. That is, in the clear case of vowels, how is a child to learn that
[i_mommy], [i_daddy], and [i_baby] are all allophones of /i/ when mommy, daddy,
and baby all have vocal tracts of different lengths and therefore vowel formants
at different frequencies? Yet Kuhl (1983) established that infants learn that
mommy’s and daddy’s vowel sounds are equivalent in the first year of life!
The first part of the answer is to be found in a classic study by Peterson
and Barney (1952). They asked seventy-six men, women, and children to record
the vowels [hid], [hId], [hId], etc., and spectrographically measured their
formant values. The results are presented in figure 7.9.
The various English vowels clustered along axes in a 2-space defined by F₁
and F₂, joined at the origin. Within each phoneme, male vowels were located
toward the low-frequency pole of the cluster while children’s vowels were located
toward the high-frequency pole, with female vowels in between. Rauschecker
et al. (1995) found just such an array in A₂ of rhesus monkey cortex: two tonotopic
maps, joined at the origin. Many monkeys have two types of calls which can be
broadly classed as /i/- and /u/-calls, and these calls raise the same basic
problem as human vowels. To determine if the call is the call of mommy, daddy, or
child, the calls must somehow be perceptually normalized. The hypothesis that
these two rhesus maps project to an A₃ like the Peterson and Barney vowel chart
has not been tested, but logically such a process must intervene between
audition and final phoneme perception in the human case. Positing such a
normalization mechanism in A₂, I omitted A₂ from figures 7.5–7.8.
Tonotopic Organization
Having now discussed the dynamics of polypoles at some length, we can
return to the question first raised in chapter 5 in connection with the topographic
organization of striate cortex. The existence of retinotopic organization in
vision and tonotopic organization in audition lends itself naturally to the theory
that the brain is genetically preprogrammed in exquisite detail. But as noted
in chapter 5, with some 10⁸ rod cells in the retina alone and only 10⁵ genes in
the entire genome, this interpretation had to be less than half the answer. Most
of the answer apparently has to do with the on-center off-surround anatomy
of the afferent visual and auditory pathways. To illustrate how polypoles
enforce tonotopic organization, figure 7.10 follows an “F#” afferent from the
cochlear keyboard to the cochlear nucleus.
Figure 7.9.
English vowels of male, female, and child speakers. (Peterson and
Barney 1952. Reprinted by permission of the American Institute of Physics.)
In figure 7.10, four axon collaterals leave the cochlea, C, encoding the
frequency F#. At the cochlear nucleus (CN), three arrive at a common site
(F#_CN), but one goes astray to G. As F# is experienced repeatedly, long-term
memory traces in the F#-F# pathway will develop, and at CN, F# will inhibit G.
By equation 5.2, the long-term memory trace from F#_C to G_CN will not develop.
With experience, the tonotopic resolution of C-CN pathways will become
contrast-enhanced and sharpened.
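The sharpening argument can be traced in a few lines of code. Equation 5.2 is not reproduced in this section, so the growth rule below (a trace grows in proportion to postsynaptic activity while the presynaptic F# site is on) is an assumption, as are the starting trace values, the inhibition strength, and the ceiling on trace size. It is a rough sketch of the competition, not the model of figure 7.10.
```python
def train(trials=50, rate=0.05, inhibition=0.5, ceiling=1.0):
    """Present F# repeatedly and let the F# and G sites at CN compete."""
    z_f, z_g = 0.1, 0.1                   # assumed initial traces: F#_C -> F#_CN, F#_C -> G_CN
    for _ in range(trials):
        in_f = 3 * z_f                    # three collaterals converge on F#_CN
        in_g = 1 * z_g                    # one stray collateral reaches G_CN
        # off-surround: each CN site is inhibited by the signal reaching the other
        x_f = max(0.0, in_f - inhibition * in_g)
        x_g = max(0.0, in_g - inhibition * in_f)
        # assumed gated growth: a trace grows only while its CN site is active
        z_f = min(ceiling, z_f + rate * x_f)
        z_g = min(ceiling, z_g + rate * x_g)
    return z_f, z_g


z_f, z_g = train()
print(z_f, z_g)
```
Under these assumed values the G site is silenced on every trial, so its trace never leaves its starting value while the F#-F# trace grows to the ceiling.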
Figure 7.10.
Tonotopic organization enforced by polypoles.
In this chapter, we have seen how dipoles and polypoles can account for the
phonemic perception of voice onset time, the phonemic categorization of
vowels, feature completion, phonemic interference, tonotopic organization,
and vowel normalization. These are all low-level features of speech and
audition, and for the most part, they find analogues in the more widely studied
visual system. The fact that these auditory cases are rather simpler than
comparable cases in the visual system makes them a better starting point for
understanding the essentials of cognitive organization. In the next chapter, however,
audition finds its own complexity in the fact that speech is a serial behavior,
indeed the most complex serial behavior known.
• EIGHT •
One, Two, Three
Pooh and Piglet were lost. “How many pebbles are in the sock?”
Pooh asked.
“One,” Piglet said.
“Are you sure?” Pooh said. “You’d better count again,
carefully.”
Piglet counted very slowly.
“One.”
A. A. Milne
One, two, three, four, five, six, seven, eight, nine, ten. This seems to form a
simple and perfectly natural sequence. And since the microscope had revealed
that one neuron connected to the next, behaviorists were quick to fasten on
the notion that these neural connections formed “stimulus-response chains.”
In such a chained sequence, the neuron for one could be thought to stimulate
the neuron for two, which stimulated the neuron for three, and so on, like the
crayfish tail in figure 2.6.
Although generative philosophy seemed to reject behaviorism after Chomsky’s
review of Skinner’s Verbal Behavior, it did not reject behaviorism’s belief that
the brain is a serial processor. In a serial computer program, one machine
instruction follows another. Generative philosophy and artificial-intelligence
theory merely replaced the notion that one mental stimulus follows another
with the notion that one mental instruction follows another. Like behaviorism,
this serial theory yielded superficially satisfying initial results, but the effort
ultimately failed to solve many of the same cognitive and linguistic problems
that behaviorism had failed to solve.
Bowed Serial Learning
In the first place, serial theories could not account for the fact that children,
when learning to count to ten, go through a stage in which they count one, two,
three, eight, nine, ten. Explanations invoking the child’s “limited attention span”
or “limited memory span” do not come to the crux of the matter. Such
explanations just mask the behaviorist assumption that serial processing must
underlie serial performance. The middle of the list gets lost. If stimulus-response
chains were really responsible for such serial learning, one would expect the
end to be forgotten. Why is it that the end is remembered?
Nor is learning to count an isolated case. Difficulty with the middles of lists
appears ubiquitously in the experimental psychology literature under the
rubric of the “bowed learning curve” (figure 8.1; see Crowder 1970 for a
paradigmatic example). The bowed learning curve describes a pattern of results in
which items at the beginning of a list and at the end of a list are remembered
better (or learned faster) than items in the middle. But why? To understand
the bowed learning curve, consider figure 8.2, which illustrates how a
competitive, parallel anatomy learns to count to three.
In figure 8.2, we look more closely at how xⱼ, a node in a parallel, on-center
off-surround cerebral anatomy, can learn to count to three. That is, xⱼ must
somehow faithfully remember the order of the three xᵢ motor patterns x₁, x₂,
and x₃, which correspond to the English words one, two, and three. In an ART
anatomy, xⱼ must remember this at its three long-term memory (LTM) sites,
zⱼ₁, zⱼ₂, and zⱼ₃.
Figure 8.1.
Bowed learning curve.
Figure 8.2.
Learning to count to three.
Recall now that any zⱼᵢ can grow only when both sites xᵢ and xⱼ are “on” (see
table 5.1 and equation 5.2). Then, at time t = 1, x₁ is active, and zⱼ₁ grows.¹
At t = 2, x₂ will be activated, but it will be inhibited by the persistent, lateral
inhibitory surround of x₁. At t = 3, x₃ will be activated, but it will be inhibited
by both x₁ and x₂, so the trace zⱼ₃ cannot grow as rapidly as zⱼ₂, much less zⱼ₁.
In time, with repeated rehearsal, the gradient of LTM strengths in figure 8.2 will
become zⱼ₁ > zⱼ₂ > zⱼ₃, and xⱼ will remember the serial order one, two, three.
Thereafter, activation of xⱼ will cause the remembered serial pattern to be “read
out” across x₁–x₃: x₁ will be gated by the largest LTM trace, zⱼ₁, so it will receive
the largest signal from xⱼ. The first motor control site to reach threshold will
therefore be x₁ and the system will perform the word one. After x₁ is produced,
x₂, gated by the next-largest LTM trace, zⱼ₂, will be the next site to reach
threshold, and it will perform two. Finally, x₃ will perform three. In this manner,
the serial behavior one, two, three is learned and performed by a parallel,
cerebral architecture.
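The learning and readout described above can be sketched as a short simulation. Since table 5.1 and equation 5.2 are not reproduced here, the growth rule is an assumption: each trace grows in proportion to its item's activity, and that activity is divided down by the summed activity of the items presented before it. The function names and parameter values are hypothetical.
```python
def rehearse(words, trials=20, rate=0.1, inhibition=0.5):
    """Grow one LTM trace per word; later words are damped by earlier, still-active ones."""
    ltm = [0.0] * len(words)
    for _ in range(trials):
        active = []                       # STM activities of the items presented so far
        for i in range(len(words)):
            # lateral inhibition from every earlier item that is still active
            x_i = 1.0 / (1.0 + inhibition * sum(active))
            ltm[i] += rate * x_i          # assumed gated growth; x_j is taken to be on
            active.append(x_i)
    return ltm


def read_out(words, ltm):
    """Perform items in order of trace size: the largest trace reaches threshold first."""
    order = sorted(range(len(words)), key=lambda i: ltm[i], reverse=True)
    return [words[i] for i in order]


words = ["one", "two", "three"]
ltm = rehearse(words)
print([round(z, 2) for z in ltm])
print(read_out(words, ltm))
```
In this sketch the traces come out in decreasing order, so reading them out from largest to smallest reproduces one, two, three even though nothing in the stored pattern is a chain from one item to the next.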
There are, however, problems and limits to this simple parallel
architecture. Figure 8.3 depicts the first such problem, which is encountered when
learning the end of a list. In figure 8.3, when nine is learned, x₉ inhibits the
next item at x₁₀. But inhibition from x₉ also works backward, inhibiting x₈! When
x₁₀ is learned, it will likewise inhibit x₉ and x₈. But if x₁₀ is the last element of
the list, there will be no x₁₁ or x₁₂ to inhibit it! Accordingly, an x₈ < x₉ < x₁₀
short-term memory (STM) activity gradient will develop. With time, this will
translate into a zⱼ₈ < zⱼ₉ < zⱼ₁₀ LTM gradient.
This kind of backward learning defied explanation under serial theories,
but ART still has some explaining to do, too. Otherwise, it would imply that
children learn to count backward when they learn to count forward! Before
addressing this problem, let us see how the LTM gradients solve the problem
of the lost middle.
If we combine figures 8.2 and 8.3 in figure 8.4, the LTM gradient zⱼ₁ > zⱼ₂ > zⱼ₃
creates a “primacy effect” whereby earlier elements of a list are learned
better and faster. At the same time, the LTM gradient zⱼ₁₀ > zⱼ₉ > zⱼ₈ creates a
“recency effect” whereby later elements of a list are learned better and faster.
The middle of the list is inhibited by both of these effects. That is why it is
learned worst and last.
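A rough way to see the bow emerge is to count, for each list position, how many neighbors fall inside its inhibitory surround. The sketch below is a deliberately simplified, symmetric reading of the argument: the radius of three nodes, the inhibition strength, and the rule that strength falls off with the number of inhibiting neighbors are assumptions made for illustration.
```python
def list_strengths(n, radius=3, inhibition=0.5):
    """Relative learning strength per position under a two-sided inhibitory surround."""
    strengths = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n - 1, i + radius)
        neighbors = hi - lo               # neighbors that actually exist within the radius
        strengths.append(1.0 / (1.0 + inhibition * neighbors))
    return strengths


strengths = list_strengths(10)
for position, s in enumerate(strengths, start=1):
    print(f"{position:2d} {'#' * round(s * 40)}")   # crude bar chart of the bowed curve
```
Positions near either end have fewer neighbors to inhibit them and so come out stronger, while the middle positions receive the full surround from both sides.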
So why don’t children learn to count to ten in the fashion of one, two, three,
ten, nine, eight? In order to completely account for serial learning, we must first
differentiate between short and long lists. Short lists like one, two, three can hardly
be said to have a middle. They tend to exhibit primacy effects and are not prone
to bowing and recency effects. These latter effects only begin to appear in
longer lists.² To achieve a reliable performance, the child must “chunk” this
long list into several shorter sublists, each organized by the primacy effect, for
example, (one two three four) (five six seven) (eight nine ten).
Figure 8.3.
Learning eight, nine, ten.
Figure 8.4.
Primacy and recency effects produce bowing.
Unitization
In a famous paper, “The Magical Number Seven,” George Miller (1956) reviewed
the serial-learning literature and concluded that seven, “plus or minus two,”
was an apparent limit on the length of lists which could be learned. Miller
argued that any longer list would normally be “chunked” and memorized as a
list of several smaller sublists. Note, however, that even lists of length seven,
like U.S. phone numbers, tend to be broken into smaller chunks of three or
four items.
From figures 8.2–8.4, we can observe that serial bowing depends largely
upon the extent of the inhibitory surround. For example, if the radius of
inhibition in figure 8.4 were only two nodes, then the absence of x₁₁ would cause a
recency effect to appear at x₉. If, however, the inhibitory surround extended
three nodes left and right, the absence of x₁₁ would create a recency effect at
x₈. Accordingly, we may take the extent of inhibitory axons to provide a
physiological basis for the “magic number.” A typical, inhibitory, cortical basket cell
axon collateral might have a radial extent of 0.5 mm and synapse with some
300 target neurons (Douglas and Martin 1990).³ Along a single polypole
radius within 5 degrees of arc, a single collateral would therefore synapse with
about 4 target neurons, making four items a reasonable biological upper limit
on the transient memory span. We therefore take the magic number to be more
on the order of “four, plus or minus two.” Following Grossberg, we will call this
the transient, or immediate, memory span, and we will refer to the “chunking”
process as unitization.
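The two numerical claims in this paragraph can be checked with simple arithmetic. The sketch assumes that the 300 targets are spread evenly around the full 360 degrees of the collateral's field, which is one plausible reading of how the figure of about four is reached, and that a node shows a recency effect as soon as one neighbor on its right is missing. Both function names are hypothetical.
```python
def targets_per_radius(total_targets=300, arc_degrees=5.0):
    """Targets falling within a narrow arc, assuming an even spread over 360 degrees."""
    return total_targets * arc_degrees / 360.0


def recency_onset(list_length, radius):
    """First node (1-indexed) missing a right-hand neighbor; assumes list_length > radius."""
    return list_length - radius + 1


print(round(targets_per_radius(), 2))                  # prints 4.17
print(recency_onset(10, 2), recency_onset(10, 3))      # prints 9 8
```
The first result rounds to the four targets cited in the text, and the second reproduces the onset of the recency effect at x₉ for a radius of two nodes and at x₈ for a radius of three.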
Perseveration
The preceding discussion explains how serial behavior like one, two, three can
be learned by a parallel brain, but it raises yet another critical question. Since
one is performed first because it dominates two and succeeding items, why
doesn’t one tyrannically maintain that domination? Why doesn’t the anatomy
perseverate and count one one one one one . . . ? In fact, this is very nearly what
happens when one stutters, but why don’t we stutter all the time?
Following Cohen and Grossberg (1986), we can solve this problem by
simply attaching an inhibitory feedback loop to each node in figure 8.4, as in
figure 8.5. Now, when one completes its performance, it inhibits itself, thereby
allowing two to take the stage.
This is a simple solution to the stuttering problem, but it is not without its
own complications. The inhibitory feedback loops in figure 8.5 are “suicide
loops.” If they inhibit the xᵢ sites as soon as the xᵢ are stimulated, then no
learning could ever occur! Each xᵢ would also immediately cease inhibiting its
neighbors, and no serial order gradient could be learned either!
Grossberg (1986) suggested that an (inhibitory) “rehearsal wave” could
turn these suicide loops off while the system was learning. One would thereby
be allowed to perseverate during learning and so inhibit two, three, etc., long