Figure 4.11.
Columnar organization in neocortex. (Eccles 1977, after Szentágothai
1969. Reprinted by permission of McGraw-Hill Book Company.)
Specific afferents (Spec. aff. on figure 4.11) arise into the neocortical sheet, innervating smallish stellate cells (S_n) and defining a column. One supposes that it is difficult for a small synaptic input from the afferent axon collateral of a distant neuron to trigger a response in a single large pyramidal cell. Rather, the small input triggers a chain reaction among the smaller stellate cells. This chorus then excites the large pyramidal cell of the column. When a column becomes thus innervated, the pyramidal cell eventually reaches threshold and generates the column’s output: a volley of spikes is sent along the axon to many other distant columns.
Szentágothai’s schematic of this organization (figure 4.11) was developed
after experiments by Mountcastle (1957) which demonstrated that neocortex
responded better to electrodes placed perpendicular to the cortical sheet (line
P in figure 4.12) than to electrodes inserted obliquely (O in figure 4.12).
Independently of the work of Mountcastle and Szentágothai, Hubel and
Wiesel popularized use of the term “column” in another sense (which we will
encounter in chapter 5, figure 5.4f), so researchers began instead to use the term
“barrel” to refer to the columns of figure 4.11 (Welker et al. 1996). In this metaphor, we think of a single, afferent axon as defining the center of a neural barrel. Within the barrel, a number of excitatory pyramidal and stellate cells become activated by the input, as well as some number of basket (large dark cells in Szentágothai’s drawing) and chandelier cells (absent in Szentágothai’s drawing), which inhibit surrounding barrels.⁶ In this view, the barrel is more of a statistical entity, a kind of distribution of the probability of an afferent axon innervating excitatory and inhibitory cells.
72 •
HOW THE BRAIN EVOLVED LANGUAGE
Figure 4.12.
Perpendicular, not oblique, stimuli activate neocortex.
In either view, we pause to ask what stops the inhibitory cells “in the barrel” from inhibiting the excitatory cells in the barrel. That is, what stops the barrel from committing neural suicide? The answer lies in inspection of the lateral extent of the axon collaterals of the inhibitory cells (figure 4.8). If we stipulate that these collaterals cannot consummate the act of synapsing until they reach a kind of neural puberty, then they can be prevented from synapsing with pyramidal cells in their own barrel. This leads directly to the “planar” view of cortex.
Planar Organization
In the planar view of cortex, we look down upon the cortical sheet as in the surgical view, but we look more closely. Each afferent input defines the on-center of a barrel, and surrounding that on-center are two concentric rings. Like a pebble dropped in a still pool, there is an on-center peak at the point of impact, and waves ripple out from it. The innermost, inhibitory wave follows a Gaussian probability distribution: it peaks at the radius where axons of most of the barrel’s inhibitory cells “reached puberty” and began to form synapses.⁷ The outermost, excitatory wave follows a Gaussian probability distribution that peaks at the radius where the barrel’s excitatory cells reached puberty. These waves do not simply spread and dissipate, however. They interact in complex patterns with the waves of other barrels.
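The two Gaussian waves can be sketched numerically. The following is an illustrative sketch, not the book’s actual model: ring_weight stands for the probability that an axon of a given cell class synapses at radius r, and the peak radii and widths chosen here are hypothetical.

```python
import math

def ring_weight(r, peak_r, sigma=0.5, amplitude=1.0):
    """Gaussian probability that an axon of this cell class synapses at radius r."""
    return amplitude * math.exp(-((r - peak_r) ** 2) / (2 * sigma ** 2))

def net_lateral_influence(r, inhib_peak=1.5, excit_peak=3.5):
    """Excitatory outer ring minus inhibitory inner ring (hypothetical radii)."""
    return ring_weight(r, excit_peak) - ring_weight(r, inhib_peak)

# Near the inner ring inhibition dominates; near the outer ring excitation dominates.
inner = net_lateral_influence(1.5)   # negative
outer = net_lateral_influence(3.5)   # positive
```

The difference of the two rings gives the inverted "Mexican hat" profile of influence around each barrel.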
One of the first researchers to take this planar view and explore these complex patterns was von der Malsburg (1973). Using a variant of the modeling equations developed by Grossberg (1972a), von der Malsburg constructed a planar computer model of striate cortex (figure 4.13). Von der Malsburg’s simulation used an on-center off-surround architecture to recognize inputs. Early neural network models simply sent the excitatory output of some individual “neurode” directly to some other neurode. Von der Malsburg essentially added the off-surround, inhibitory cells that were missing in figure 4.11. When stimulated, each barrel now increased its own activity and decreased that of its neighbor.

THE SOCIETY OF BRAIN • 73

Figure 4.13.
Von der Malsburg’s planar cortex. (Von der Malsburg 1973. Reprinted
by permission of Springer-Verlag.)
Missing, however, from von der Malsburg’s model was the fact that in neocortex, barrels also send long-distance, excitatory pyramidal cell output to many other barrels. They also receive reciprocal excitatory feedback from those other barrels. In the next chapter we will build and test a neocortical model that adds these missing elements to the planar model.
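Von der Malsburg’s core dynamic, in which each barrel excites itself and inhibits its neighbor, can be caricatured in a few lines. This is a toy sketch with hypothetical rate constants, and with activities clipped to [0, 1]; it is not his actual system of equations.

```python
def update(a, b, excite=0.2, inhibit=0.1, decay=0.1):
    """Each barrel boosts itself and suppresses its neighbor; activity clipped to [0, 1]."""
    new_a = a + excite * a - inhibit * b - decay * a
    new_b = b + excite * b - inhibit * a - decay * b
    clip = lambda v: min(max(v, 0.0), 1.0)
    return clip(new_a), clip(new_b)

a, b = 1.0, 0.9          # slightly unequal initial stimulation
for _ in range(60):
    a, b = update(a, b)
# The small initial advantage is amplified into winner-take-all competition.
```

The clipping stands in for the natural bounds of real membranes; without some bound, this kind of mutual competition grows without limit.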
• FIVE •
Adaptive Resonance
One cannot step into the same river twice.
Heraclitus
In chapters 3 and 4, we glimpsed the marvelous biochemical and anatomical
complexity of the human brain. But in a single breath of a summer wind, a
million leaves turn and change color in a single glance. The mind need not
read meaning into every turning leaf of nature, but neither the hundreds of
neurochemical messengers of chapter 3 nor the forty-odd Brodmann areas of
chapter 4 can begin to tally the infinite complexity of an ever-changing environment. To gain even the smallest evolutionary advantage in the vastness of nature, a brain must combinatorially compute thousands and millions of patterns from millions and billions of neurons. In the case of Homo loquens, as we estimated in chapter 1, the competitive brain must be capable of computing something on the order of 10^7,111,111 patterns.
But how can we begin to understand a brain with 10^7,111,111 possible configurations? As the reader by now suspects, our technique will be to study minimal anatomies—primitive combinations of small numbers of synapses.
First we will model the behavior of these minimal anatomies. Then we will
see how, grown to larger but self-similar scales, they can explain thought and
language.
We have already seen several minimal anatomies. In chapter 2 we somewhat fancifully evolved a bilaterally symmetrical protochordate with a six-celled brain. Then, in chapter 4, we touched upon Hartline’s work detailing the horseshoe crab’s off-center off-surround retina and sketched a preview of the on-center off-surround anatomy of the cerebrum. Learning by on-center off-surround anatomies has been the focus of Grossberg’s adaptive resonance theory (ART), and it is from this theory that we now begin our approach to language.
From Neocortex to Diagram: Resonant On-Center
Off-Surround Anatomies
Figure 5.1a is a reasonably faithful laminar diagram of neocortex, but for simplicity each barrel is modeled by a single excitatory pyramidal cell and a single inhibitory cell. Afferent inputs arise from the white matter beneath the cortex and innervate the barrels. A single fine afferent axon collateral cannot by itself depolarize and fire a large pyramidal cell. So figure 5.1a has the afferent fiber fire smaller, stellate cells first. These stellate cells then fire a few more stellate cells, which each innervate a few more stellate cells, and so on. Eventually, by this kind of nonlinear mass action, an activated subnetwork of stellate cells fires the barrel’s large pyramidal and inhibitory cells. The on-center pyramidal cell sends long-distance outputs, while the inhibitory cell creates an off-surround.
Figure 5.1.
Three schematics of on-center off-surround anatomies. (a) is a biologically faithful schematic detailing pyramidal cells and inhibitory basket cells. (b) and (c) abstract essential design elements.
In figure 5.1b, we abstract away from figure 5.1a, and we no longer explicitly diagram inhibitory cells. Following White 1989, we also treat stellate cells as small pyramidal cells, so each node in F_2 of figure 5.1b can be interpreted as either a local subnetwork of stellate-pyramidal cells or a distal network of pyramidal cells. In either case, F_1 remains an on-center off-surround minimal anatomy.
Figure 5.1c abstracts still further, no longer explicitly diagramming the on-center resonance of F_2 nodes. In the diagrams of minimal anatomies that follow, it is important that the reader understand that a circle can stand for one cell or many, while “on-center loops” like those in figure 5.1c can represent entire, undiagrammed fields of neurons. Since we will focus almost exclusively on cerebral anatomies, and since the on-center off-surround anatomy is ubiquitous in neocortex, I will often omit even the on-center loops and off-surround axons from the diagrams.
Gated Dipole Rebounds
In a series of papers beginning in 1972, Grossberg reduced the on-center off-surround architecture of figure 5.1 to the gated dipole minimal anatomy. This, in turn, led to a series of remarkable insights into the structure and functioning of mind.
Consider, for example, the rather familiar example at the top of figure 5.2: stare at the black circles for fifteen seconds (longer if the lighting is dim). Then close your eyes. An inverse “retinal afterimage” appears: white circles in a black field!¹

Figure 5.2.
A McCollough rebound occurs by switching the gaze to the lower pane after habituating to the upper pane.
Although this percept is often called a “retinal afterimage,” it arises mainly in the lateral geniculate nucleus of the thalamus and neocortex (Livingstone and Hubel 1987). If, while staring at figure 5.2, a flashbulb suddenly increases the illumination, an inverse image also appears—and it can occur during as well as after image presentation. (If you don’t have a flashbulb handy, you can simulate this effect by staring at figure 5.2 and then abruptly shifting your gaze to the focusing dot in the center of the all-white pane at the bottom of figure 5.2.) Both decreasing illumination (closing the eyes) and increasing illumination (the flashbulb effect) can create inverse percepts, and this can happen during, as well as after, a sensation. We can account for all of these effects with the minimal anatomy in figure 5.3.
Figure 5.3.
The McCollough effect. A red-green gated dipole: (a) With white-light input, both poles are active. (b) With red input, the red pole is active and neurotransmitter depletes at the r_0–r_1 synapse. (c) Closing the eyes allows background activity in the green pole to dominate competition and produce a retinal afterimage. (d) Alternatively, NSA (e.g., a flash of white light) can produce an afterimage rebound, even while red input is maintained.
In the human visual system, black and white, blue and yellow, and red and green response cells are all arrayed in gated dipoles. This leads to a group of phenomena collectively known as the McCollough effect (McCollough 1965; see also Livingstone and Hubel 1987). Under white light, as schematized in figure 5.3a, red and green receptor cells compete to a standoff. White is perceived, but no red or green color percept is independently output. In figure 5.3b, red light illuminates the dipole. The red pole inhibits the green pole via the inhibitory interneuron i_rg, so only the red pole responds, and a red percept is output from r_2. After protracted viewing under intense illumination, however, neurotransmitter becomes depleted at the r_0–r_1 synapse, and the dipole becomes unbalanced. Neurons maintain a low level of random, background firing even in the absence of specific inputs, so if specific inputs are shut off (e.g., if the eyes are closed), as in figure 5.3c, then the green pole will come to dominate the depleted red pole in response to background activation. On the other hand, in figure 5.3d, a burst of white light stimulates the unbalanced dipole. Because this burst of white light contains equal amounts of red and green light, it is an example of what ART calls nonspecific arousal (NSA). Even if the original red input is maintained during this burst, so more red than green remains in the total stimulus spectrum, the green pole still gains control because of its greater neurotransmitter reservoir² at synapse g_0–g_1.
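The rebound of figure 5.3 can be mimicked with a few lines of arithmetic. In this sketch the recovery and depletion rates are hypothetical, and the output rule (input gated by remaining transmitter, plus background activity) is a simplification of the dipole’s actual circuitry.

```python
def step(n, signal, recover=0.05, deplete=0.4):
    """One habituation step: transmitter n recovers toward 1.0 and depletes with use."""
    return n + recover * (1.0 - n) - deplete * n * signal

n_red, n_green = 1.0, 1.0
baseline = 0.1                           # random background firing

# Protracted red viewing: the red pole's transmitter depletes.
for _ in range(100):
    n_red = step(n_red, baseline + 1.0)  # red input plus background
    n_green = step(n_green, baseline)    # background only

# A burst of white light (nonspecific arousal) drives both poles equally,
# even while the red input is maintained.
nsa = 1.0
red_out = (baseline + 1.0 + nsa) * n_red
green_out = (baseline + nsa) * n_green
# green_out exceeds red_out: the depleted dipole rebounds to green.
```

Even with the red input still present, the green pole’s fuller transmitter reservoir wins the competition, which is the antagonistic rebound described above.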
One of Sherrington’s many contributions to the understanding of the
nervous system was his description of neuromuscular control in terms of agonist-antagonist competition. After Sherrington, “antagonistic rebounds,” in
which an action is reflexively paired with its opposite reaction, began to be
found everywhere in neurophysiology. Accordingly, Grossberg referred to
events like the red-green reversal of figure 5.3 as “antagonistic dipole rebounds.”
Mathematical Models of Cerebral Mechanics
Grossberg analyzed the gated dipole and many related neural models mathematically, first as a theory of “embedding fields” and then as the more fully developed Adaptive Resonance Theory. Our purpose in the following chapters will be to analyze similar neural models linguistically, but the preceding example offers an opportunity to make a simplified presentation of Grossberg’s mathematical models. One of the advantages of mathematical analysis is that, although it is not essential to an understanding of adaptive grammar, it can abstract from the flood of detail we encountered in the previous chapters and bring important principles of system design into prominence. A second reason to develop some mathematical models at this point is that in the second half of this chapter we will use them to build a computer model of a patch of neocortex. We will then be better prepared to explore the question of how language could arise through and from such a patch of neocortex. Finally, many of the leading mathematical ideas of ART are really quite simple, and they can give the nonmathematical reader a helpful entry point into the more mathematical ART literature.
ART Equations
The central equations of Grossberg’s adaptive resonance theory model a short-term memory trace, x, and a long-term memory trace, z. Using Grossberg’s basic notation, equations 5.1 and 5.2 are differential equations that express the rate of change of short-term memory (x in 5.1) and long-term memory (z in 5.2):

    ẋ_j = –Ax_j + Bx_i z_ij        (5.1)

    ż_ij = –Dz_ij + Ex_i x_j        (5.2)
Equation 5.1 is a differential equation in dot notation. It describes the rate of change of a short-term memory trace, x_j. We can think of this short-term memory trace as the percentage of Na+ gates that are open in a neuron’s membrane. In the simplest model, x_j decreases at some rate A. We can say that A is the rate at which the neuron (or neuron population) x_j “forgets.” Equation 5.1 also states that x_j increases at some rate B. The quantity B is one determinant of how fast x_j depolarizes, or “activates.” We can think of this as the rate at which x_j “learns,” but we must remember that here we are talking about short-term learning. Perhaps we should say B determines how fast x_j “catches on.”
The rate at which x_j catches on also depends upon a second factor, z_ij, the long-term memory trace. We can think of this long-term memory trace as the size of the synapse from x_i to x_j (cf. the synapses in figure 5.3). Changes in z_ij are modeled by equation 5.2. Equation 5.2 says that z_ij decreases (or forgets) at some rate D, and that z_ij also increases at some rate E, a function of x_i times x_j. We can say that z_ij “learns” (slowly) at the rate E.
It is important to note that A, B, D, and E are all shorthand abbreviations for what, in vivo, are very complex and detailed functions. The rate B, for example, lumps all NMDA and non-NMDA receptor dynamics, all glutamate, aspartate, and GABA neurotransmitters, all retrograde neurotransmitters and messengers, all neurotransmitter release, reuptake, and manufacture processes, membrane spiking thresholds, and who knows what else into a single abstract function relating barrel x_j to barrel x_i across synapse z_ij. This may make B seem crude (and undoubtedly it is), but it is the correct level of abstraction from which to proceed.
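Equations 5.1 and 5.2 can be stepped forward in time with a simple forward-Euler integration. The rate constants below are arbitrary illustrative choices, not values from the text.

```python
# Forward-Euler sketch of equations 5.1 and 5.2.
A, B = 1.0, 1.0        # STM decay ("forgetting") and gain ("catching on")
D, E = 0.1, 0.5        # LTM decay and (slow) learning rates
dt = 0.01

x_i = 1.0              # presynaptic activity, held constant here
x_j = 0.0              # postsynaptic STM trace
z_ij = 0.1             # LTM trace (synaptic strength)

for _ in range(500):   # five time units
    dx_j = -A * x_j + B * x_i * z_ij       # equation 5.1
    dz_ij = -D * z_ij + E * x_i * x_j      # equation 5.2
    x_j += dt * dx_j
    z_ij += dt * dz_ij
# Correlated activity in x_i and x_j has strengthened z_ij.
```

Because each trace feeds the other, the pair grows resonantly; in full ART models the sigmoidal bounding discussed below keeps this positive feedback from running away.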
It is also important to note that, by self-similarity, ART intends x_j and z_ij to be interpretable on many levels. Thus, when speaking of an entire gyrus, x_j might correlate with the activation that is displayed in a brain scan. When speaking of a single neuron, x_j can be interpreted as a measure of the neuron’s activation level above or below its spiking threshold. When speaking of signal propagation in the dendritic arborization of a receptor neuron, x_j can be interpreted as the membrane polarization of a dendritic branch. In these last two cases, a mathematical treatment might explicitly separate a threshold function Γ out from equation 5.1, changing +Bx_i z_ij into something like +BΓ(x_i, z_ij). Usually, however, ART equations like 5.1 and 5.2 describe neural events on the scale of the barrel or of larger, self-similar subnetworks. At these scales, x_i may
have dozens or thousands of pyramidal output neurons, and at any particular moment, 3 or 5 or 5000 of them may be above spiking threshold. The subnetwork as a whole will have a quenching threshold, above which the activity of neurons in the subnetwork will be amplified and below which the activity of neurons in the subnetwork will be attenuated. But the subnetwork as a whole need not exhibit the kind of “all-or-none” discrete spiking threshold that has been claimed for individual neurons, so ART equations do not usually elaborate a term for thresholds. Instead, they use nonlinear gating functions.
Nonlinearity
There is only one way to draw a straight line, but there are many ways to be nonlinear, and nonlinearity holds different significance for different sciences. ART equations are nonlinear in two ways that are especially important to cognitive modeling: they are (1) sigmoidal and (2) resonant.
The curves described by ART equations and subfunctions (like A, B, D, and E above) are always presumed to be sigmoidal (∫-shaped). That is to say, they are naturally bounded. For example, a neuron membrane can only be activated until every Na+ channel is open; it cannot become more activated. At the lower bound, a membrane can be inhibited only until every Na+ channel is closed; it cannot become more inhibited.³ So x_j has upper and lower limits, and a graph of its response function is sigmoidal. In equation 5.2, z_ij is similarly bounded by the sigmoidal functions D and E.
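A logistic function is one standard way to realize such a bounded, ∫-shaped response. This is a generic sketch, not a specific ART equation.

```python
import math

def sigmoid(s, gain=1.0):
    """Bounded response in (0, 1): all Na+ channels closed -> 0, all open -> 1."""
    return 1.0 / (1.0 + math.exp(-gain * s))

# Saturation at both extremes: strong input cannot push the response past its bounds.
lo = sigmoid(-10)    # near 0
mid = sigmoid(0)     # exactly 0.5
hi = sigmoid(10)     # near 1
```

However strong the drive s becomes, the response merely approaches its ceiling or floor, which is the boundedness the text describes.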
Equations 5.1 and 5.2 also form a nonlinear, resonant system. The LTM trace z_ij influences x_j, and x_j influences z_ij. Both feedforward and feedback circuits exist almost everywhere in natural neural systems, and feedforward and feedback circuits are implicit almost everywhere in ART systems: for every equation 5.1 describing x_j’s response to x_i, there is a complementary equation 5.1' describing x_i’s reciprocal, resonant, response to x_j.⁴ This is the same kind of nonlinearity by which feedback causes a public address system to screech out of control, but equations 5.1 and 5.2 are bounded, and in the neural systems they describe, this kind of feedback makes rapid learning possible.
Shunting
The fact that the terms of ART equations are multiplicative is an important
detail which was not always appreciated in earlier neural network models.
Imagine that table 5.1 presents the results of four very simple paired associate
learning experiments in which we try to teach a parrot to say ij (as in “h-i-j-k”).
In experiment A, we teach the parrot i and then we teach it j. The parrot learns
ij. In experiment B, we teach the parrot neither i nor j. Not surprisingly, the
parrot does not learn ij. In experiment C, we teach the parrot i, but we do not
teach it j. Again, it does not learn ij. In experiment D, we do not teach the
parrot i, but we do teach it j. Once again, it does not learn ij.
ADAPTIVE RESONANCE
•
81
TABLE 5.1. Truth table for logical AND (multiplication).

                 A    B    C    D
    i            1    0    1    0
    j            1    0    0    1
    Learned?     1    0    0    0
The not-very-surprising results of our parrot experiment clearly reflect the truth table for multiplication. So ART computes learning by multiplying x_i by x_j in equation 5.2. Similarly, in equation 5.1, ART multiplies x_i by z_ij. In the literature on artificial neural networks and elsewhere, engineers often refer to such multiplicative equations as shunting equations.
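The multiplicative learning term of equation 5.2 reproduces table 5.1 directly; E here is an arbitrary positive rate.

```python
E = 1.0   # illustrative learning rate

def learning_increment(x_i, x_j):
    """The growth term E * x_i * x_j of equation 5.2."""
    return E * x_i * x_j

# The four parrot experiments of table 5.1: i and j active (1) or inactive (0).
experiments = {"A": (1, 1), "B": (0, 0), "C": (1, 0), "D": (0, 1)}
learned = {name: learning_increment(xi, xj) > 0
           for name, (xi, xj) in experiments.items()}
# learned -> {"A": True, "B": False, "C": False, "D": False}
```

Only experiment A, with both traces active, yields a nonzero increment, just as only the parrot taught both i and j learns ij.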
Habituation
In the psychological literature habituation is said to occur when a stimulus ceases to elicit its initial response. The rebound described in figure 5.3 is a common example of a habituation effect. Although the term is widely and imprecisely used, we will say that habituation occurs whenever neurotransmitter becomes depleted in a behavioral subcircuit. Neurotransmitter depletion can be described with a new equation:

    ṅ_ij = +Kz_ij – Fn_ij x_i        (5.3)
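Stepped numerically under constant z_ij and constant presynaptic activity, equation 5.3 settles at the equilibrium Kz_ij/(Fx_i), so stronger presynaptic drive leaves less transmitter available. The constants K and F below are illustrative.

```python
K, F = 0.5, 2.0        # transmitter manufacture and depletion rates (hypothetical)
dt = 0.01
z_ij = 1.0             # fixed LTM capacity for this sketch

def equilibrium_n(x_i, steps=5000, n=0.0):
    """Euler-integrate equation 5.3 under constant presynaptic activity x_i."""
    for _ in range(steps):
        n += dt * (K * z_ij - F * n * x_i)   # equation 5.3
    return n

n_weak = equilibrium_n(x_i=0.5)    # settles near 0.5: transmitter stays high
n_strong = equilibrium_n(x_i=2.0)  # settles near 0.125: habituation depletes it
```

This is the depletion that unbalances the gated dipole of figure 5.3: the more a pole is driven, the less transmitter remains at its gate.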
Equation 5.3 states that the amount of neurotransmitter n at synapse ij grows at some rate K, a function of z_ij, the capacity of the long-term memory (LTM) trace. By equation 5.3, n_ij is also depleted at a rate F, proportional to the presynaptic stimulation from x_i. Put differently, z_ij represents the potential LTM trace, and n_ij represents the actual LTM trace. Put concretely, at the scale of the single synapse, z_ij can be taken to represent (among other factors) the available NMDA receptors on the postsynaptic membrane, while n_ij represents (among other factors) the amount of presynaptic neurotransmitter available to activate those receptors. Given equation 5.3, equation 5.1 can now be elaborated as 5.4:
    ẋ_j = –Ax_j + B ∑_{i≠j} n_ij x_i – C ∑_{k≠j} n_kj x_k        (5.4)
Equation 5.4 substitutes actual neurotransmitter, n_ij, into the original Bx_i z_ij term of equation 5.1. It then elaborates this term into two terms, one summing multiple excitatory inputs, +B∑n_ij x_i, and the second summing multiple inhibitory inputs, –C∑n_kj x_k. This makes explicit the division between the long-distance, on-center excitatory inputs and the local, off-surround inhibitory inputs diagrammed in figure 5.1.
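A single-barrel sketch of equation 5.4 shows the off-surround term at work. Here the barrel receives one excitatory and one inhibitory input with fixed transmitter levels and activities; all constants are illustrative.

```python
A, B, C = 1.0, 1.0, 1.0
dt = 0.01

def dx(x_j, excit, inhib):
    """Right-hand side of equation 5.4. excit: (n_ij, x_i) pairs; inhib: (n_kj, x_k) pairs."""
    on_center = B * sum(n * x for n, x in excit)
    off_surround = C * sum(n * x for n, x in inhib)
    return -A * x_j + on_center - off_surround

x_with = 0.0       # barrel receiving both excitation and off-surround inhibition
x_without = 0.0    # same barrel with the inhibitory input removed
for _ in range(1000):   # ten time units, ample to reach equilibrium
    x_with += dt * dx(x_with, excit=[(0.8, 1.0)], inhib=[(0.5, 0.4)])
    x_without += dt * dx(x_without, excit=[(0.8, 1.0)], inhib=[])
# x_with settles near 0.8 - 0.2 = 0.6; x_without settles near 0.8.
```

The inhibitory input lowers the barrel’s equilibrium activation, which is precisely the suppression that the off-surround exerts on a barrel’s neighbors.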
Habituation, specific and nonspecific arousal, and lateral inhibition, as described in equations 5.2–5.4, give rise to a computer model of cerebral cortex and a range of further cognitive phenomena, including noise suppression, contrast enhancement, edge detection, normalization, rebounds, long-term memory invariance, opportunistic learning, limbic parameters of cognition, P300 and N400 evoked potentials, and sequential parallel memory searches. These, in turn, form the cognitive basis of language.
A Quantitative Model of a Cerebral Gyrus

Figure 5.4a–e shows what happens when equations 5.2–5.4 are used to create a computer model of a cerebral gyrus.⁵ The model gyrus in figure 5.4 is twenty-three barrels high by forty-eight barrels wide. Each barrel forms forty-eight excitatory synapses with other barrels at radius 3–4 and twenty-four inhibitory synapses with twenty-four barrels at radius 1–2. The gyrus is modeled as a closed system in the shape of a torus: barrels at the top edge synapse with the bottom edge, and barrels at the left edge synapse with the right edge. Each synapse’s z_ij and n_ij are modeled by equations 5.2 and 5.3, while each barrel’s x_j is modeled by equation 5.4. Figure 5.4 displays the activation level of each barrel (its x_j value) according to the gray scale at the bottom: black is least active and white is most active.
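The toroidal wiring can be sketched with modular arithmetic. Measuring “radius” as Chebyshev (chessboard) distance is an assumption of this sketch; it reproduces the twenty-four barrels at radius 1–2 exactly, though it yields fifty-six candidate cells at radius 3–4, among which the model’s forty-eight excitatory synapses would be distributed.

```python
WIDTH, HEIGHT = 48, 23   # the model gyrus: forty-eight barrels wide, twenty-three high

def ring_neighbors(x, y, r_min, r_max):
    """All barrels whose wrapped Chebyshev distance from (x, y) lies in [r_min, r_max]."""
    cells = []
    for dx in range(-r_max, r_max + 1):
        for dy in range(-r_max, r_max + 1):
            if r_min <= max(abs(dx), abs(dy)) <= r_max:
                # Modular arithmetic closes the sheet into a torus:
                # edges wrap to the opposite side.
                cells.append(((x + dx) % WIDTH, (y + dy) % HEIGHT))
    return cells

inhibitory = ring_neighbors(0, 0, 1, 2)   # wraps across all four edges
excitatory = ring_neighbors(0, 0, 3, 4)
```

A corner barrel such as (0, 0) thus inhibits cells like (47, 22) on the far edges, exactly as a closed toroidal system requires.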
At time t = 0 in figure 5.4a, the gyrus is an inactive, deep-gray tabula rasa.
Specific inputs I are applied to target nodes at [ x y] coordinates [10 9], [10
11], [10 13], and [10 15], and a black, inhibited surround begins to form. At
t = 1 after another application of specific inputs to the target field, resonant
activation begins to appear at radius 3–4 (figure 5.4b).
Noise suppression
At time t = 1 (figure 5.4b), the target nodes are activated above the level of the
rest of the gyrus, and a black, inhibitory surround has formed around them.
The inhibitory surround is a graphic illustration of how noise suppression arises as
an inherent property of an on-center off-surround system: any noise in the
surround that might corrupt the on-center signal is suppressed.
Contrast enhancement
Figure 5.4b also illustrates contrast enhancement, a phenomenon similar to noise suppression. The target nodes at t = 1 are light gray; that is, their activity is contrastively “enhanced” above the background and the target nodes at t = 0 (figure 5.4a). This enhancement is not only due to repeated, additive specific inputs; it is also due to resonant feedback to the target node field from the emerging active fields at radius 3–4 and the lateral inhibition just described under noise suppression.