The Journal of Neurophysiology Vol. 84 No. 3 September 2000, pp. 1224-1239
Copyright ©2000 by the American Physiological Society
1Department of Psychiatry, 2Department of Physiology, 3W. M. Keck Center for Integrative Neuroscience, and 4Sloan Center for Theoretical Neurobiology at UCSF, University of California, San Francisco, California 94143-0444
ABSTRACT
Troyer, Todd W. and
Allison J. Doupe.
An Associational Model of Birdsong Sensorimotor Learning II.
Temporal Hierarchies and the Learning of Song Sequence.
J. Neurophysiol. 84: 1224-1239, 2000.
Understanding the neural mechanisms underlying serially ordered
behavior is a fundamental problem in motor learning. We present a
computational model of sensorimotor learning in songbirds that is
constrained by the known functional anatomy of the song circuit. The
model subsumes our companion model for learning individual song
"syllables" and relies on the same underlying assumptions. The extended model addresses the problem of learning to produce syllables in the correct sequence. Central to our approach is the
hypothesis that the Anterior Forebrain Pathway (AFP) produces signals
related to the comparison of the bird's own vocalizations and a
previously memorized "template." This "AFP comparison
hypothesis" is challenged by the lack of a direct projection from the
AFP to the song nucleus HVc, a candidate site for the generator
of song sequence. We propose that sequence generation in
HVc results from an associative chain of motor and sensory
representations (motor → sensory → next motor ...) encoded
within the two known populations of HVc projection neurons. The sensory
link in the chain is provided, not by auditory feedback, but by a
centrally generated efference copy that serves as an internal
prediction of this feedback. The use of efference copy as a substitute
for the sensory signal explains the ability of adult birds to produce normal song immediately after deafening. We also predict that the AFP
guides sequence learning by biasing motor activity in nucleus RA, the premotor nucleus downstream of HVc. Associative learning then remaps the output of the HVc sequence generator. By
altering the motor pathway in RA, the AFP alters the correspondence between HVc motor commands and the resulting sensory feedback and
triggers renewed efference copy learning in HVc. Thus, auditory feedback-mediated efference copy learning provides an indirect pathway
by which the AFP can influence sequence generation in HVc. The model
makes predictions concerning the role played by specific neural
populations during the sensorimotor phase of song learning and
demonstrates how simple rules of associational plasticity can
contribute to the learning of a complex behavior on multiple time scales.
INTRODUCTION
Like many complex
behaviors, birdsong is arranged in a temporal hierarchy. In zebra
finches, song consists of a few short introductory notes, followed by
several repetitions of a stereotyped sequence of vocal gestures, or
"syllables," separated by brief periods of silence (Sossinka
and Böhner 1980). Song is learned in two phases. First,
birds listen to and memorize a tutor song, or "template"
(Konishi 1965; Marler 1964). Later,
during sensorimotor learning, birds use auditory feedback
from their own vocalizations to gradually match their vocal output to
the template. In the companion paper (Troyer and Doupe 2000), we focused on one level of the hierarchy for song and
showed how simple associational (Hebbian) learning rules could be used
to learn the motor representations for individual tutor syllables. The
syllable learning model addresses the important problem of feedback
delay and demonstrates that associational plasticity naturally leads to
the learning of an efference copy, or internal prediction, of the
auditory feedback. This internal prediction can then be compared with
the memorized tutor song to guide sensorimotor learning.
In this paper, we address a second fundamental problem in motor
learning, the question of serial order in behavior (Lashley 1951), by extending our syllable learning model to account for the learning of syllable sequence. As in our companion paper, we use
simple rules of associational plasticity and assume that the template
comparison signals that guide learning are provided by the Anterior
Forebrain Pathway (AFP), a circuit that passes through avian basal
ganglia, thalamic, and cortex-like nuclei before projecting back onto
the motor pathway (see Troyer and Doupe 2000; Fig. 1). We also assume a functional
segregation between the two known populations of projection neurons in
song nucleus HVc (Nordeen and Nordeen 1988; HVc used as
proper name, Margoliash et al. 1994), with
AFP-projecting HVc neurons (HVc_AFP) receiving auditory feedback and
encoding signals in sensory coordinates, and HVc neurons projecting to
the robust nucleus of the archistriatum (RA; HVc_RA) more closely tied
to a motor code (Troyer and Doupe 2000). The main
biological constraint addressed in this paper is the hierarchical
organization of the motor pathway (Fig. 1): the detailed motor programs
for individual syllables are believed to be contained in nucleus RA
(Vu et al. 1994; Yu and Margoliash 1996),
whereas the central pattern generator for song sequence is likely
to be found upstream of RA, perhaps within the song nucleus HVc
(Vu et al. 1994). Our sequence learning model addresses two key questions left unanswered by current experimental data: what is
the mechanism for sequence generation in HVc, and how can signals from
the AFP guide sequence learning given that there are no known
connections from the AFP to HVc?
We propose that sequence generation results from a reciprocal chaining
of motor and sensory representations [motor (HVc_RA) → sensory
(HVc_AFP) → next motor (HVc_RA) → next sensory
(HVc_AFP) ...] between the two populations of HVc projection
neurons. Our model differs from classic "associative chaining"
models (James 1983) in that the "sensory" component
in this chain is actually an efference copy, a motor signal that serves
as a prediction of the expected sensory feedback (Sperry
1950; von Holst and Mittelstaedt 1980).
We also propose that AFP-guided teaching signals act to remap the
connections from HVc to RA, so that the output of the HVc pattern
generator maps onto the sequence of motor features (encoded in RA) that
matches the memorized tutor song (cf. Doya and Sejnowski 1998). However, simply remapping HVc outputs cannot explain
AFP-guided learning within HVc. In our model, auditory feedback-driven efference copy learning provides the crucial link between the AFP and HVc. By altering the HVc outflow tract, the AFP
alters the association between HVc_RA motor activity and the auditory
feedback received by HVc_AFP. The resulting efference copy learning
then changes the motor-sensory interaction underlying sequence
generation in HVc.
Our model demonstrates that associational learning, distributed throughout the motor pathway, is sufficient for learning both individual syllables and their proper sequence. The model provides a specific hypothesis for how basal ganglia-forebrain loops could contribute to learning a sequential behavior and highlights key computational problems imposed by the functional anatomy of the song circuit. More generally, the model provides a framework that relates the neural mechanisms underlying song learning to fundamental problems in motor learning and speech production.
Model and approach
In this paper, we extend our previous model for learning
individual syllables (Troyer and Doupe 2000) to address
the learning of syllable sequence. Our sequence learning model subsumes
our syllable model, accomplishing syllable learning as well as the learning of syllable sequence. The structure of this paper mirrors that
of the preceding companion paper (Troyer and Doupe 2000) and relies on the same underlying biological assumptions. We present our results in the form of two closely related models: a "conceptual model" containing a self-consistent set of functional hypotheses, and
a "computational model" that incorporates these hypotheses into a
working computer algorithm. In this section, we describe the functional
problems addressed by our sequence learning model and outline the key
elements of our proposed solutions. Then, we present our conceptual
model, which describes our functional hypotheses in greater detail.
Quantitative results from our computational model are presented in the
RESULTS section. Because our model is relatively abstract
at the level of local circuits, implementation of these hypotheses was
governed chiefly by considerations of computational simplicity. Related
issues are described in the METHODS but are not crucial for
understanding the main functional implications of the model. The
details of our computer algorithm are confined to an APPENDIX.
Problems addressed
Our model explores how song learning can result from associational
learning, guided by template comparison signals transmitted by the AFP.
We do not address learning the detailed temporal structure within each
syllable, nor learning the length of syllables and intersyllable gaps.
Timing of song syllables is provided by a rhythmically clocked premotor
drive arriving in HVc_RA (Troyer and Doupe 2000). While
the timing of this drive is fixed, its pattern is
completely random; the magnitude of each component of the premotor
input is generated independently for each vocalization produced by the
model. The model's task is to take this unstructured premotor timing
signal and convert it to a sequence of syllables matched to the tutor template.
The learning of motor representations for individual song syllables was
addressed in the preceding companion paper (Troyer and Doupe
2000). This model contained three key functional elements. By
associating premotor commands in HVc_RA with auditory feedback arriving
in HVc_AFP, a motor → sensory efference copy mapping develops between
the two populations of HVc projection neurons (Fig. 2, marked 1). After this
mapping develops, HVc_AFP activity driven by a given HVc_RA motor
command encodes a sensory prediction of the vocal output resulting from
that command. This prediction is then compared with the template in the
AFP, resulting in a global reinforcement signal that modulates
plasticity in all RA neurons (Fig. 2, marked 2). This
reinforcement learning leads to a pattern of connectivity in RA in
which neurons encoding the same tutor syllable become strongly
connected (Fig. 2, marked 3). As a result, RA has a strong
tendency to produce coherent patterns of motor activity matched to the
syllables in the tutor template.
Given our adoption of the AFP comparison hypothesis, the most difficult problem regarding sequence learning is the following: how can the AFP guide learning given that 1) the only known output from the AFP projects to RA, and 2) the site of sequence generation is likely to be upstream of RA? Our solution involves the concerted action of multiple associational mechanisms acting at different levels of the motor hierarchy. For ease of presentation, we will break this problem into three smaller problems, described below (see Conceptual model). However, our choice of solution to each individual problem is affected by the other two, as well as constraints imposed by our solution to the problem of syllable learning. The key to our model is the concept of efference copy, which serves to link all model components into a coherent hypothesis regarding the multiple sensory-motor interactions involved in song learning.
The first problem we address is the problem of sequence generation,
i.e., what is the nature of the central pattern generator for song? We
propose that sequence generation results from a reciprocal interaction
between the two populations of HVc projection neurons (Table 1, number 1). The solution naturally
incorporates the mechanism of efference copy, which contributes one
half of this interaction by providing a motor → sensory mapping from
HVc_RA → HVc_AFP. The other half of the interaction depends on
connections from HVc_AFP → HVc_RA. These are hypothesized to provide
slow signals carrying information from one syllable to the next (Fig. 2, marked 4). We call such signals "context" signals.
Thus, sequences are generated as a chain of mappings from motor → sensory → next motor → next sensory, etc. This hypothesis borrows
from classical chaining ideas (James 1983), as well as
more recent computational models (Kleinfeld and Sompolinsky 1988) of sequence generation.
The second problem we address is the problem of how AFP signals guide
sequence learning at the level of RA. The most straightforward method
of directing associational learning toward a desired goal is to bias
the pattern of neural activity toward the desired state. Associational
plasticity then strengthens the connections consistent with this
pattern. In our model, we assume that the AFP generates an expectation
of the next syllable in the tutor sequence and uses this expectation to
bias RA activity (Table 1, number 2; Fig. 2, marked 5).
Associational plasticity then changes the pattern of connections
between HVc and RA so that syllables are produced in the proper
sequence (Fig. 2, marked 6). Note that this solution gives
rise to an additional problem to be solved before the AFP can bias RA
activity in the proper direction: template information is stored
in sensory coordinates, but the required bias must be in motor
coordinates. We propose that a sensory → motor mapping is learned
between the AFP and RA soon after the initial period of efference copy
learning (Table 1, number 3; see Conceptual model).
The third problem we address is the problem of sequence learning at the
level of HVc. While the mechanism outlined above is sufficient for a
rudimentary form of sequence learning, it fails as a complete model. In
particular, it fails to account for any learned changes in the number
or sequence of premotor commands formed upstream of RA. In our model,
the efference copy provides the key link between learning at the level
of RA and learning upstream of RA, in HVc. In particular, by altering
connections between HVc and RA, the AFP changes the pattern of vocal
output and hence auditory reafference. This in turn induces new
efference copy learning in HVc (Table 1, number 4; Fig. 2, marked
7) via the same mechanism described in our syllable learning
model (Troyer and Doupe 2000). Since efference copy
mapping plays a key role in the HVc pattern generator, the new
efference copy learning alters the sequence of HVc outputs (see
Conceptual model). In addition to providing a
specific mechanism for how the AFP affects sequence generation in HVc,
the need for ongoing efference copy learning is consistent with
experiments demonstrating that auditory feedback is required throughout
development (Price 1979).
In addressing the problem of sequence learning, we have added two new
sets of connections to our model for syllable learning (Fig. 2).
The connections from HVc_AFP → HVc_RA are necessary for
sequence generation. Without the context signals carried by these connections, activity within HVc_RA would not be affected by
activity related to the previous syllable and the sequence of HVc
outputs would be random (Troyer and Doupe 2000).
Patterned connections from the AFP → RA are necessary for
sequence learning in our model. Without these connections,
information stored in the AFP related to the tutor sequence cannot be
used to guide learning in the motor pathway.
Conceptual model
PROBLEM 1: SEQUENCE GENERATION.
We propose that sequences of song syllables are generated by a
reciprocal interaction between motor (HVc_RA) and sensory/efference copy (HVc_AFP) activity within HVc (Table 1, number 1): motor → sensory prediction → next motor → next sensory prediction
... (Fig. 3A). The
motor → sensory component of this interaction is subserved by the
efference copy mapping between HVc_RA and HVc_AFP. This mapping is
learned early in development by associating HVc_RA motor commands with
auditory feedback arriving back in HVc_AFP, as described in our model
for syllable learning (Troyer and Doupe 2000). Figure
3B shows how these mappings result in the reproduction
of the tutor song after learning is complete, using the
transition from syllable A to syllable B as an example. Let
SenA denote the sensory representation for
syllable A in HVc_AFP. This representation is elicited by the
efference copy mapping during production of A. Via the connections from
HVc_AFP → HVc_RA, SenA elicits a
context signal CtxtA that drives activity in
HVc_RA during the syllable following syllable A. CtxtA maps onto the motor representation
MotB in RA, and the model produces syllable B after syllable A. This is the sensory prediction → next motor component of the interaction. With an accurate efference copy mapping,
CtxtA also elicits an efference copy
representation SenB in HVc_AFP. This
motor → sensory prediction component of the interaction completes the cycle. Thus, correct sequence learning in our model depends on learning the chain of mappings
SenA → (CtxtA → MotB) → SenB → ... . Note that our
implementation of this functional circuit is highly simplified: HVc_RA → HVc_AFP connections transmit only fast motor → sensory (efference copy) signals, whereas HVc_AFP → HVc_RA
connections transmit only slow sensory → next motor (context) signals. More realistic circuit models of HVc will be required to
explore possible local circuit mechanisms subserving this reciprocal flow of activity.
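The chain of mappings described above can be sketched with three lookup tables standing in for the learned synaptic connections. This is only an illustration of the conceptual model, not the authors' algorithm: the names `context`, `hvc_to_ra`, `efference_copy`, and `generate`, the four-syllable tutor song, and the wrap-around from the last syllable back to the first are all assumptions made for the example.

```python
# Toy sketch of the HVc sequence generator after learning is complete.
# Dictionaries stand in for learned synaptic mappings; all names are hypothetical.
TUTOR_SONG = ["A", "B", "C", "D"]  # assumed four-syllable tutor sequence


def successor(syllable):
    """Next syllable in the tutor song (wrap-around is an assumption)."""
    i = TUTOR_SONG.index(syllable)
    return TUTOR_SONG[(i + 1) % len(TUTOR_SONG)]


# Slow HVc_AFP -> HVc_RA connections: sensory representation -> context signal.
context = {f"Sen{s}": f"Ctxt{s}" for s in TUTOR_SONG}
# HVc_RA -> RA connections: context signal -> motor representation of the next syllable.
hvc_to_ra = {f"Ctxt{s}": f"Mot{successor(s)}" for s in TUTOR_SONG}
# Fast HVc_RA -> HVc_AFP connections: context signal -> predicted sensory feedback.
efference_copy = {f"Ctxt{s}": f"Sen{successor(s)}" for s in TUTOR_SONG}


def generate(start_sensory, n_syllables):
    """Follow Sen -> (Ctxt -> Mot) -> Sen ... and return the syllables produced."""
    sensory = start_sensory
    produced = []
    for _ in range(n_syllables):
        ctxt = context[sensory]               # context carried over to the next syllable
        produced.append(hvc_to_ra[ctxt][-1])  # RA produces the syllable mapped from the context
        sensory = efference_copy[ctxt]        # internal prediction, no auditory feedback used
    return produced


song = generate("SenA", 4)  # syllables that follow A
```

Because `generate` never consults auditory feedback, the sketch also reflects the point that a learned efference copy can sustain the sequence on its own.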
PROBLEM 2: SEQUENCE LEARNING IN RA.
In our model, the AFP uses template information to generate "sequence
teaching" signals that bias RA activity toward the proper tutor
sequence (Table 1, number 2). The details of how these signals
reorganize the motor pathway to produce correct sequence transitions
are illustrated in Fig. 4, using the
transition from syllable A to syllable B as an example. In our model,
the efference copy representation, SenA,
that is registered in HVc_AFP during the production of syllable A,
generates two distinct signals during the vocalization that follows
syllable A. First, in HVc, due to the slow connections from HVc_AFP → HVc_RA, SenA results in a context signal,
CtxtA, that is input to HVc_RA. Second, the
AFP receives the efference copy SenA from
HVc_AFP and generates the sequence teaching signal for syllable B,
after an appropriate delay. This signal is input to RA and biases RA
activity toward the next motor representation in the tutor sequence,
MotB. Since both of these signals exert
their effects with a one syllable delay, during the syllable following
A, neurons in HVc_RA that are part of the context representation
CtxtA tend to be co-active with RA neurons
comprising the motor representation MotB.
Associational learning then strengthens the connections between these
sets of neurons (Fig. 4, white arrow). In this way, the context
representation CtxtA gets mapped onto
MotB, and the model learns the transition SenA → CtxtA → MotB.
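A minimal sketch of this bias-then-associate idea follows. It is a deliberately simplified stand-in, not the plasticity rule from the APPENDIX: RA is reduced to three assemblies with winner-take-all competition, and the names `ra_winner` and `train`, the initial weights, the bias strength, and the learning rate are hypothetical values chosen for illustration.

```python
# Toy sketch: an AFP bias plus a co-activity (Hebbian) rule remaps CtxtA onto MotB.
RA_UNITS = ["MotA", "MotB", "MotC"]


def ra_winner(hvc_weights, afp_bias):
    """Index of the most strongly driven RA assembly (winner-take-all is a simplification)."""
    drive = [w + b for w, b in zip(hvc_weights, afp_bias)]
    return drive.index(max(drive))


def train(hvc_weights, afp_bias, n_trials=10, rate=0.1):
    """On each trial, strengthen the connection from CtxtA to the co-active RA assembly."""
    weights = list(hvc_weights)
    for _ in range(n_trials):
        winner = ra_winner(weights, afp_bias)
        weights[winner] += rate
    return weights


# Weights from the CtxtA neurons in HVc_RA to each RA assembly.
# Assumed starting point: CtxtA initially favours MotC.
initial = [0.1, 0.1, 0.5]
teaching_bias = [0.0, 1.0, 0.0]  # AFP sequence teaching signal favouring MotB
no_bias = [0.0, 0.0, 0.0]

before = RA_UNITS[ra_winner(initial, no_bias)]
learned = train(initial, teaching_bias)
after = RA_UNITS[ra_winner(learned, no_bias)]
```

In this toy setting the AFP bias is needed only during training; once the CtxtA to MotB weight exceeds its competitors, HVc input alone selects MotB.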
SENSORY → MOTOR MAPPING FROM THE AFP → RA.
If the sequence teaching signal for syllable B, which we assume to be
encoded in sensory coordinates in the AFP, is to bias RA motor activity
toward syllable B, a sensory → motor mapping between the AFP and RA
is required (Table 1, number 3). In our sequence learning model, the
required map develops soon after the initial period of efference copy
learning, and before syllable learning is complete. With an accurate
efference copy, HVc_RA excites a sensory representation in the output
neurons of the AFP (via HVc_AFP) that corresponds to the motor activity
in RA. For example, if HVc_RA drives motor activity in RA that is
relatively well matched to tutor syllable A, it will also drive an
efference copy within HVc_AFP that leads to excitation within the AFP
output neurons encoding tutor syllable A (Fig. 5). Associative learning then strengthens
connections between the AFP neurons encoding syllable A in sensory
coordinates and the RA neurons encoding A in motor coordinates. Note
that to develop the appropriate mapping between the AFP and
RA, the output neurons in the AFP must encode a sensory representation
of the current syllable. To use the map to bias
RA activity toward the tutor sequence, these same AFP output neurons
must encode a representation of the next syllable. Our model
simply assumes that AFP efferents contain a combination of these
signals. Possible explanations for how the components of this mixed
signal could exert distinct functional influences in RA are described
in the METHODS.
PROBLEM 3: SEQUENCE LEARNING IN HVC.
Even though the model has learned the correct efference copy → next
motor transition, SenA → CtxtA → MotB,
sequence learning is not yet complete. This is because by altering
synapses in RA, the AFP has perturbed the motor → sensory matching
necessary for an accurate efference copy in HVc. In particular, HVc_RA
neurons belonging to the representation for
CtxtA originally mapped onto some
particular combination of motor representations in RA. For example,
perhaps CtxtA originally mapped most
strongly onto syllable D. With an accurate efference copy, these same
HVc_RA neurons were mapped onto the corresponding combination of
sensory representations in HVc_AFP,
SenD. Remapping
CtxtA onto MotB
in RA alters this correspondence, and the HVc sequence generator
produces the following set of mappings:
SenA → CtxtA → SenD → CtxtD. Presumably, the context signal from
syllable D, CtxtD, is mapped onto
MotE in RA. Therefore, syllable B (produced
by CtxtA) will be followed, not by C, but
by E. However, such errors in the efference copy component of the HVc
sequence generator are continually corrected by renewed auditory
feedback-driven learning in the HVc_RA → HVc_AFP connections (Table
1, number 4): CtxtA excites
MotB in RA, leading to an auditory feedback
signal SenB arriving in HVc_AFP (Fig. 6). Therefore, HVc_RA → HVc_AFP
connections between HVc_RA neurons belonging to
CtxtA and HVc_AFP neurons belonging to
SenB are strengthened (Fig. 6, white
arrow), supplanting the "old" connections from CtxtA → SenD.
In this way, the HVc sequence generator is able to track the
AFP-induced changes in RA. By combining the appropriate sensory → motor and motor → sensory mappings, the model learns the chain of
sensory-motor associations that reproduces the tutor sequence:
SenA → (CtxtA → MotB) → SenB ... .
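The worked example in this subsection (syllable B followed by E instead of C, then corrected by renewed efference copy learning) can be traced with a small sketch. The lookup tables, the five-syllable song, the wrap-around from E to A, and the one-shot `relearn` step are assumptions for illustration; in the model the correction is gradual and driven by associational plasticity.

```python
# Toy sketch of Problem 3: a stale efference copy derails the sequence until
# auditory feedback relearns it. All names are hypothetical.
SYLLABLES = "ABCDE"

# Slow HVc_AFP -> HVc_RA connections: sensory representation -> context signal.
context = {f"Sen{s}": f"Ctxt{s}" for s in SYLLABLES}
# HVc_RA -> RA after AFP-guided remapping (CtxtA now drives MotB).
hvc_to_ra = {"CtxtA": "MotB", "CtxtB": "MotC", "CtxtC": "MotD",
             "CtxtD": "MotE", "CtxtE": "MotA"}


def run(efference_copy, start_sensory, n_syllables):
    """Return the syllables produced when the chain uses the given efference copy."""
    sensory, produced = start_sensory, []
    for _ in range(n_syllables):
        ctxt = context[sensory]
        produced.append(hvc_to_ra[ctxt][-1])
        sensory = efference_copy[ctxt]
    return produced


def relearn(efference_copy):
    """Associate each context signal with feedback from the syllable it now produces."""
    updated = dict(efference_copy)
    for ctxt, motor in hvc_to_ra.items():
        updated[ctxt] = "Sen" + motor[-1]  # auditory feedback of the produced syllable
    return updated


# Stale mapping: CtxtA still predicts SenD, as it did before the remapping in RA.
stale = {"CtxtA": "SenD", "CtxtB": "SenC", "CtxtC": "SenD",
         "CtxtD": "SenE", "CtxtE": "SenA"}

before = run(stale, "SenA", 2)           # syllables following A with the stale copy
after = run(relearn(stale), "SenA", 2)   # syllables following A after relearning
```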
METHODS
The model presented in this paper is an extension of the
syllable learning model described in the preceding companion paper (Troyer and Doupe 2000). To account for the generation
and learning of song sequence, we added two new sets of synaptic
connections to this model (Fig. 2B). Because our model is
relatively abstract at the level of local circuits, the choice of how
these connections were embedded in our computer algorithm was governed
chiefly by considerations of computational simplicity (a variety of
biological mechanisms could contribute to their functionality). An
understanding of the theoretical issues related to our implementation
is not necessary to understand our simulation results. Most features of
the model are described in detail in Troyer and Doupe
(2000). We discuss here only new additions to the model. The
final subsection in the METHODS describes the method we
used for quantifying the time course of model development.
Most simulations of the complete model contained 25,000 syllables, over
5,000 more than were typically needed for model output to become
stereotyped (see APPENDIX). Computer simulations were written using the MATLAB simulation environment (version 5.3; The
Mathworks, Natick, MA). Typical simulations took ~3 h when run using
a 400-MHz Pentium II processor. Details regarding simulations and
parameters are contained in the APPENDIX.
HVc_AFP → HVc_RA connections
To account for sequence generation, connections from
HVc_AFP to HVc_RA were added (Fig. 2B).
These connections are assumed to be functionally "slow synapses"
that carry information from one syllable to the next (cf.
Kleinfeld and Sompolinsky 1988). For computational
simplicity, the functional separation of HVc connections was strict:
HVc_RA → HVc_AFP connections carried only efference copy
information related to the current syllable, and the HVc_AFP → HVc_RA connections broadcast signals that affected only the
subsequent syllable. However, our general approach requires only a
functional imbalance between the two populations of HVc projection
neurons. A strict separation is not crucial. To match the
functional delay in the HVc_AFP → HVc_RA pathway (~50 ms), a corresponding delay was introduced in the
time window for synaptic plasticity in these connections (see
APPENDIX). In general, we followed the principle that
the time window for synaptic plasticity should be roughly
proportional to the time scale of encoding for the information
passed over that synapse. RA connections, which encode the
detailed motor programs within each syllable, had the shortest plasticity window, and the HVc_AFP → HVc_RA context synapses had the longest.
Since it relies on reciprocal excitatory connections, the pattern
generator within HVc tended to be unstable. To help control this
positive feedback, we 1) normalized the size of the context signal during each syllable (see APPENDIX), and
2) included "adaptation" in the HVc_RA assemblies.
HVc_RA adaptation was of the same form as the HVc_AFP adaptation
included to cancel the delayed auditory feedback (Troyer and
Doupe 2000). However, because HVc_RA adaptation was included to
counteract an overall build up of HVc activity, its decay time (225 ms)
was considerably longer than the decay time of HVc_AFP adaptation (115 ms).
AFP → RA connections and signals
The circuitry within the three song nuclei that make up the AFP could, in principle, subserve a variety of complex processing tasks. Our model treats the entire AFP as a "black box" performing the necessary calculations related to template comparison (see APPENDIX for details). Our algorithm was governed chiefly by computational simplicity, but most calculations could be implemented relatively easily by a variety of biologically plausible circuits.
Processing within the AFP is shown in Fig. 7. Each AFP "input assembly" receives
input from the HVc_AFP assemblies encoding sensory features related to
the corresponding tutor syllable (the nature of the encoding scheme
used in our model is described in Troyer and Doupe 2000;
Fig. 6). Input is also received by a single inhibitory unit that
broadcasts its output to all input assemblies. This "feedforward
inhibition" implements a form of competition in which the only active
AFP assemblies are those that receive significantly more input than
average.
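One simple way to picture this competition is subtractive inhibition followed by rectification. The sketch below is an assumption about form only (the actual computation is specified in the APPENDIX): the function name `afp_input_competition`, the `inhibition_gain` parameter, and the example input values are hypothetical.

```python
def afp_input_competition(inputs, inhibition_gain=1.0):
    """Subtract a shared inhibitory signal proportional to the mean input, then rectify.

    A gain above 1 would demand that an assembly exceed the average by a wider
    margin before it becomes active.
    """
    inhibition = inhibition_gain * sum(inputs) / len(inputs)
    return [max(0.0, x - inhibition) for x in inputs]


# One input assembly per tutor syllable; the third receives far more input than average.
activity = afp_input_competition([0.2, 0.3, 1.5, 0.4])
active = [i for i, a in enumerate(activity) if a > 0.0]
```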
The main difficulty for our model is that the AFP is assumed to
simultaneously broadcast three distinct signals that are important for
separate aspects of sensorimotor learning. Each of these calculations is represented by a separate box in the middle of Fig. 7: 1)
to guide syllable learning, the AFP transmits a nonspecific
reinforcement signal that modulates plasticity in RA; 2) to
organize a sensory → motor mapping between the AFP and RA, the AFP
forms a sensory representation related to the current syllable;
3) to guide sequence learning, the AFP must generate, with a
one syllable delay, a sequence teaching signal that biases RA activity
toward the next syllable in the tutor sequence. A possible neural
substrate for this delayed sequence teaching signal is the axon
collaterals that transmit information from the lateral portion of the
magnocellular nucleus of the anterior neostriatum (LMAN), the
output nucleus of the AFP, to area X, the input nucleus of
the AFP (see Fig. 1C, Troyer and Doupe 2000).
The appropriate delay is roughly 75 ms, the length of a typical song
syllable (~115 ms) minus the processing delay contributed by the AFP
(~40 ms). Note that signals 1 and 2 are used to guide plasticity in
RA but are not required to influence RA activity. In contrast, the
purpose of signal 3 is to guide activity, but in principle it could
disrupt learning in the AFP → RA pathway.
In our implementation, the three signals are not segregated at
the level of AFP outputs: the activity within the AFP output assemblies
is just a summation of signals 1-3. The input to each RA assembly is
then calculated as a sum of AFP outputs, weighted by the pattern of
synaptic strengths from the AFP → RA. This input serves both as a
source of additive external input summed with RA input coming from HVc,
and as a modulatory term in the RA plasticity rule (see
APPENDIX). The modulation of RA plasticity in our
model is completely phenomenological. Candidate mechanisms include
release of trophic factors by AFP efferents (Johnson et al. 1997) or downstream effects of calcium entering through AFP
glutamatergic synapses, which are dominated by NMDA receptors
(Mooney and Konishi 1991).
How does the superposition of signals 1-3 in AFP output neurons exert separate effects in RA? The nonspecific reinforcement component of the AFP activity (signal 1) is separated from the two patterned components by its magnitude: we assume that the reinforcement signal contributes 75% of the input to AFP output assemblies. AFP output is then dominated by this reinforcement signal, and the resulting modulation of RA plasticity can be used to guide syllable learning. To allow the two patterned signals to play their role in song learning, we assume that the AFP also excites a population of inhibitory interneurons local to RA (Fig. 7, filled circle, bottom). This feedforward inhibition counteracts the nonspecific (reinforcement) component of the AFP input to RA, causing this nonspecific input to have little effect on spiking activity in RA. However, inhibition would not be expected to cancel trophic effects of AFP inputs and hence would not block reinforcement mediated by neurotrophins. In an alternative scenario, inhibition that is proximal to the cell body might eliminate spiking but not prevent the depolarization within distal dendrites by inputs from HVc_RA or other RA neurons. Thus, calcium entry through NMDA receptors at AFP synapses could still be used to modulate plasticity within the dendritic tree, even though the currents flowing through these receptors are counteracted by inhibition arriving at the soma.
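The separation argument can be illustrated numerically. In the sketch below, which is an interpretation rather than the published implementation, each AFP output assembly carries a uniform reinforcement term weighted 0.75 plus a patterned term weighted 0.25, AFP → RA weights are taken as one-to-one, and feedforward inhibition in RA is assumed to equal the mean AFP output. The function names and the example pattern are hypothetical.

```python
def afp_output(reinforcement, patterned, reinforcement_share=0.75):
    """Mix a uniform reinforcement signal with patterned signals in each output assembly."""
    patterned_share = 1.0 - reinforcement_share
    return [reinforcement_share * reinforcement + patterned_share * p for p in patterned]


def ra_spiking_drive(afp_out):
    """AFP excitation minus feedforward inhibition driven by the mean AFP output.

    AFP -> RA weights are assumed one-to-one to keep the example short.
    """
    inhibition = sum(afp_out) / len(afp_out)
    return [a - inhibition for a in afp_out]


patterned = [0.0, 1.0, 0.0, 0.0]  # e.g. a teaching signal favouring syllable B
low = ra_spiking_drive(afp_output(0.0, patterned))   # no reinforcement
high = ra_spiking_drive(afp_output(1.0, patterned))  # strong reinforcement
```

Under these assumptions the spiking drive is the same at both reinforcement levels, while the raw AFP output, which is what would modulate plasticity, rises with reinforcement.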
In addition to explaining how the nonspecific reinforcement component
of the AFP activity is prevented from disrupting patterns of RA
activity, we must explain how to prevent it from disrupting the learning in the AFP → RA pathway. By definition, a
large reinforcement signal that is expressed as high activity in all
AFP output assemblies will also lead to increased plasticity
within all RA assemblies. This correlation between nonspecific
presynaptic firing in the AFP and nonspecific modulation of plasticity
in RA tends to strengthen all synapses from the AFP → RA.
To counteract this tendency, AFP → RA synapses were assigned a higher
plasticity threshold (see APPENDIX).
The action of the AFP activity related to the current efference copy
(signal 2) is straightforward: after the efference copy mapping from
HVc_RA to HVc_AFP gives an accurate prediction of the motor input from
HVc_RA to RA (Troyer and Doupe 2000), the AFP assembly
corresponding to the current syllable will be most active when RA
assemblies corresponding to that syllable are also active. Sensory → motor associational learning follows, causing AFP assemblies encoding a
particular tutor syllable to project most strongly to RA assemblies
encoding the same syllable (Fig. 5). After the sensory → motor
matching is accomplished, the input from the AFP activity related to
signal 2 will be redundant with the (stronger) input to RA from HVc.
Our functional requirements for the sequence teaching signal (signal 3)
are that it biases RA activity toward the next syllable in the tutor
sequence, but does not disrupt the learning in the AFP → RA pathway
driven by signal 2. To implement the proper bias, the processing box
marked "Sequence Template" in Fig. 7 accepts a pattern of input,
waits for one syllable, and then excites AFP output assemblies in a
pattern that is shifted one syllable forward in the tutor sequence.
Since the AFP → RA connections perform a sensory → motor mapping,
this signal will bias RA toward the next motor command in the tutor
sequence (Fig. 4). The reason that this signal does not disrupt the
associations necessary to develop a sensory → motor mapping to RA is
that, before sequence learning is accomplished, the inputs from HVc_RA
to RA are strong and their sequence is random. Therefore, AFP activity
for the subsequent syllable (signal 3) will not be strongly correlated with RA activity and hence will not contribute significantly to plasticity in the AFP → RA connections. After the model begins to
produce the proper sequence, the motor patterns in RA driven by HVc_RA
will be matched to the sequence teaching signal (signal 3).
Hence, the associational plasticity related to signal 3 will simply
reinforce the sensory → motor mapping originally organized by signal 2.
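The delay-and-shift operation attributed to the "Sequence Template" box can be sketched as follows. This is only an illustration under stated assumptions: the class name `SequenceTemplate`, the binary activity patterns, and the implementation of the shift as a cyclic rotation by one syllable's worth of features are hypothetical, and the dimensions (five syllables of eight features) are those of the tutor song used in most simulations.

```python
N_SYLLABLES = 5              # tutor song length used in most simulations
FEATURES_PER_SYLLABLE = 8
N_ASSEMBLIES = N_SYLLABLES * FEATURES_PER_SYLLABLE


def syllable_pattern(index):
    """Binary pattern over assemblies for tutor syllable `index` (0 = A)."""
    start = index * FEATURES_PER_SYLLABLE
    return [1.0 if start <= k < start + FEATURES_PER_SYLLABLE else 0.0
            for k in range(N_ASSEMBLIES)]


class SequenceTemplate:
    """Hypothetical delay-and-shift box.

    Holds its input for one syllable, then emits that input shifted one
    syllable forward (cyclically) in the tutor order.
    """

    def __init__(self):
        self._held = [0.0] * N_ASSEMBLIES

    def step(self, pattern):
        shift = FEATURES_PER_SYLLABLE
        # Output assembly k receives the held activity of assembly k - shift.
        out = [self._held[(k - shift) % N_ASSEMBLIES]
               for k in range(N_ASSEMBLIES)]
        self._held = list(pattern)
        return out


box = SequenceTemplate()
first = box.step(syllable_pattern(0))   # nothing held yet: silent output
second = box.step(syllable_pattern(3))  # teaching signal derived from syllable A
```

In this sketch, presenting syllable A produces, one syllable later, an output pattern matching syllable B regardless of which syllable is actually being sung at that time, which is the sense in which the signal biases RA toward the next syllable of the tutor sequence.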
Our implementation represents only one of many plausible ways in which
different signals could exert different effects in RA. A conceptually
simple solution to the problem of segregation would be to have
different functional signals carried by distinct classes of AFP
projection neurons. However, developing such a separation could be
difficult. Another alternative is for different signals to be encoded
in different temporal patterns of AFP activity (e.g., bursting versus
tonic). These could preferentially excite separate receptors in RA
and/or trigger different plasticity mechanisms in RA. Finally, since
the three signals make crucial contributions to learning at different
times during song learning (see Fig. 11 in RESULTS), their
functions could be subserved by mechanisms tied to developmental
critical periods. Our model makes predictions regarding the functional
information carried by the AFP → RA pathway. Further experiments will
be required to determine the possible neural substrate for these signals.
Quantifying learning time course
To obtain quantitative results regarding the time course of
learning in the model, we measured how closely the statistics of RA
motor output matched the statistics of the tutor song, as well as
measuring how closely important patterns of connectivity matched the
properties of an "ideal" model that would accurately reproduce the
tutor song. The measure used to compute these matches was the
correlation coefficient (CC) applied to the elements of the relevant
connection matrices (see METHODS in Troyer and Doupe 2000). Syllable-related activity was quantified as in
Troyer and Doupe (2000). Sequence-related activity was
quantified by dividing the model output into 250 syllable epochs and
constructing $M^{next}$, the matrix of
co-fluctuations between patterns of RA activity for a given syllable
and the patterns of RA activity for the next syllable:
$$M^{next}_{ij} = \left\langle \left[r_i(n) - \bar{r}(n)\right]\left[r_j(n+1) - \bar{r}(n+1)\right] \right\rangle_n ,$$
where $r_i(n)$ is the activity of RA assembly $i$ during syllable $n$, the angle brackets denote an average over the syllables in an epoch, and $\bar{r}(n)$ is the average activity across
assemblies during syllable $n$. We used the CC to compare
$M^{next}$ to an ideal syllable
transition matrix, $M^{seq}$:
$M^{seq}_{ij} = 4$ if
assembly $j$ forms part of the representation for the syllable following the syllable coded by assembly $i$;
$M^{seq}_{ij} = -1$
otherwise. Diagonal entries were included.
In addition to monitoring patterns of RA activity, we monitored
development in four sets of connections. 1) The accuracy of the efference copy map was quantified by calculating the correlation coefficient between the pattern of HVc_RA → motor connections (HVc_RA → RA) and HVc_RA → sensory connections (HVc_RA → HVc_AFP). 2) To quantify the development of the sensory → motor
mapping (Fig. 5), we computed the CC between the pattern of AFP → RA
connection strengths and the ideal pattern of connectivity, in which
the AFP assembly representing a given tutor syllable would have
connections only onto RA assemblies encoding the motor features
belonging to that syllable. 3) To quantify the progress of
syllable learning, we computed the CC between the ideal syllable
correlation matrix, $M^{syl}$, and the
pattern of intrinsic RA connections as in Troyer and Doupe (2000): $M^{syl}_{ij} = 4$ if assembly $j$ forms part of the representation for the same syllable as assembly $i$;
$M^{syl}_{ij} = -1$
otherwise. Diagonal entries were excluded. 4) To evaluate sequence-related connectivity, we multiplied the HVc_AFP → HVc_RA and
HVc_RA → RA connection matrices. The resulting matrix represents the
influence of each HVc_AFP assembly on each RA assembly via the context
signal in HVc (Fig. 3). The correlation coefficient between this matrix
and $M^{seq}$ was used to measure the
development of sequence-related connectivity.
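As a minimal sketch of the fourth measure, the composite HVc_AFP → HVc_RA → RA matrix can be formed by matrix multiplication and compared with the ideal transition matrix using the CC. Several choices here are assumptions made only for illustration: the toy "ideal" pathway uses one hypothetical HVc_RA unit per syllable, rows index the source assembly and columns the target, and the ideal matrices are built with entries of 4 and −1. Because the CC is invariant to shifting and rescaling, only the two-valued block pattern of the ideal matrix matters.

```python
import math

N_SYL, N_FEAT = 5, 8     # five syllables of eight features each
N = N_SYL * N_FEAT       # 40 sensory (HVc_AFP) and 40 motor (RA) assemblies


def syllable_of(assembly):
    return assembly // N_FEAT


def ideal_matrix(offset):
    """Entry (i, j) is 4 if assembly j codes the syllable `offset` steps after
    the syllable coded by assembly i, and -1 otherwise (offset=1: M^seq)."""
    return [[4.0 if syllable_of(j) == (syllable_of(i) + offset) % N_SYL else -1.0
             for j in range(N)] for i in range(N)]


def cc(a, b, include_diagonal=True):
    """Pearson correlation coefficient over corresponding matrix entries."""
    pairs = [(a[i][j], b[i][j]) for i in range(len(a))
             for j in range(len(a[0])) if include_diagonal or i != j]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Toy pathway: each sensory assembly excites the (hypothetical) HVc_RA unit
# for the next syllable, which excites all RA assemblies of that syllable.
afp_to_hvc = [[1.0 if k == (syllable_of(i) + 1) % N_SYL else 0.0
               for k in range(N_SYL)] for i in range(N)]
hvc_to_ra = [[1.0 if syllable_of(j) == k else 0.0
              for j in range(N)] for k in range(N_SYL)]

influence = matmul(afp_to_hvc, hvc_to_ra)       # 40 x 40 composite matrix
cc_seq = cc(influence, ideal_matrix(1))         # compare with M^seq
cc_same = cc(influence, ideal_matrix(0))        # compare with same-syllable pattern
```

In this toy case the composite matrix differs from the ideal transition matrix only by an affine rescaling, so `cc_seq` equals 1 to within rounding error, whereas the comparison with the same-syllable pattern gives a negative value.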
RESULTS
Our model explores how song learning can result from associational
plasticity, guided by template comparison signals transmitted by the
AFP. The representation of the sensory and motor aspects of song in our
model is described in detail in our companion paper (Fig. 6 in
Troyer and Doupe 2000). Briefly, the information encoded within each neural population (HVc_RA, HVc_AFP, RA, and the AFP) is
represented by the activation value of a number of processing units,
each meant to capture the average level of activity within a connected
set of neurons or "cell assembly" (Hebb 1949). For most simulations, the tutor song contains five syllables, with each
syllable composed of eight abstract vocal features. The features encoding different syllables are assumed to be unique, so we number the
features according to tutor syllable (syllable A, features 1-8;
syllable B, features 9-16; etc.). Each of 40 RA assemblies encodes the
motor aspect of one vocal feature, and each of 40 HVc_AFP assemblies
encodes the sensory aspect of one feature. The template for syllables
is stored in the connections from HVc_AFP → AFP, and the template for
tutor sequence is stored by circuitry internal to the AFP (see
METHODS).
Sensorimotor learning is accomplished in three stages. The first two
stages were explored in our companion paper (Troyer and Doupe 2000). At the beginning of the simulation, all connections in
the motor pathway are unstructured, and the premotor drive initiating
each syllable drives unorganized patterns of RA activity (Fig. 8A). During the initial,
efference copy learning stage, associations between the HVc_RA motor
activity and the resulting auditory feedback input to HVc_AFP cause a
motor → sensory efference copy mapping to develop between these two
populations (stage 1; Figs. 4A, 8 in Troyer and Doupe 2000). In the second, syllable learning stage, the AFP
evaluates the efference copy signals and broadcasts template matching
"reinforcement" signals that reorganize synaptic strengths in RA so
that assemblies corresponding to individual tutor syllables are
co-active (stage 2; Fig. 8B; Figs. 4A, 10 in Troyer and Doupe 2000). In this paper, we focus on the
final, sequence learning stage, in which "sequence teaching"
signals from the AFP act in concert with the sequence generation
mechanism in HVc so that syllable representations are produced in the
correct order, A → B → C → D → E → A ... (stage 3; Fig. 8C). It is important to note that a segregation between
developmental stages is not embedded within our learning rule or
network architecture. Rather, all synapses in HVc and RA are plastic,
and this plasticity lasts throughout the simulation. Thus, development
is driven by interdependent patterns of association that emerge during
song learning.
Sequence learning
The key to sequence learning in the model is the ability of
signals from the AFP to bias RA activity toward the proper syllable transitions (Fig. 9A, arrows).
Acting over multiple syllables, this in turn biases the association
between HVc_RA and RA activity. The resulting change in
HVc_RA → RA connectivity leads to the production of appropriate
syllable transitions (Fig. 4). Auditory feedback ensures that an
accurate efference copy mapping is maintained (Fig. 6). The gradual
improvement of syllable transitions is shown in Fig. 9B.
Time course of learning
To examine the time course of learning, we considered the
properties of an "ideal" solution, in which patterns of
connectivity were set so that this ideal model would accurately
reproduce the tutor song (see METHODS for detailed
definitions). We then quantified how closely important sets of
connections matched the ideal model. The match was calculated using the
correlation coefficient, a method that gives a value of one for
identical connection patterns and values near zero for connection
patterns that are uncorrelated. We measured four sets of connections:
the efference copy map from HVc_RA → HVc_AFP, the sensory → motor map from the AFP → RA, syllable storage in the RA → RA
connections, and the sensory → next motor pathway from HVc_AFP → HVc_RA → RA. We also measured how closely the motor output from
RA matched the tutor song. These calculations were performed for
"epochs" consisting of 250 consecutive syllables produced by the
model. To quantify the development of tutor syllables, we calculated
the matrix of co-fluctuations, whose ijth entry indicates
whether assembly i and assembly j have similar
patterns of activity. To quantify the development of tutor sequence, we calculated a similar matrix, except that the ijth entry
indicates whether activity in RA assembly i during syllable
n co-fluctuated with the activity in assembly j
during syllable n + 1. These matrices were matched to the
corresponding matrices computed from the tutor song, again using the
correlation coefficient (see METHODS).
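The co-fluctuation measure can be made concrete with a small sketch. It assumes idealized binary RA activity (all assemblies of the sung syllable fully active, all others silent) and an epoch that yields 250 consecutive transitions; the function names and the 4 and −1 entries of the reference matrix are illustrative choices, and the CC does not depend on the particular pair of values.

```python
import math

N_SYL, N_FEAT = 5, 8
N = N_SYL * N_FEAT


def activity(syllable):
    """Idealized RA activity: assemblies of the sung syllable on, others off."""
    return [1.0 if k // N_FEAT == syllable else 0.0 for k in range(N)]


def fluctuation(r):
    mean = sum(r) / len(r)   # average activity across assemblies
    return [x - mean for x in r]


def cofluctuation(epoch, lag):
    """Average over n of fluct_i(n) * fluct_j(n + lag).

    lag=0 gives the syllable matrix, lag=1 the sequence matrix.
    """
    f = [fluctuation(r) for r in epoch]
    steps = len(f) - lag
    return [[sum(f[n][i] * f[n + lag][j] for n in range(steps)) / steps
             for j in range(N)] for i in range(N)]


def cc(a, b):
    """Pearson correlation coefficient over corresponding matrix entries."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Reference transition matrix: 4 where j codes the syllable after i's, else -1.
m_seq = [[4.0 if j // N_FEAT == (i // N_FEAT + 1) % N_SYL else -1.0
          for j in range(N)] for i in range(N)]

tutor_order = [activity(n % N_SYL) for n in range(251)]     # A B C D E A ...
reverse_order = [activity(-n % N_SYL) for n in range(251)]  # A E D C B A ...

cc_tutor = cc(cofluctuation(tutor_order, 1), m_seq)
cc_reverse = cc(cofluctuation(reverse_order, 1), m_seq)
```

For a perfectly ordered song, the sequence matrix has one value for every pair in which assembly j codes the syllable following that of assembly i and a second, lower value elsewhere, so its CC with the reference matrix is 1 to within rounding error. A song sung in reverse order places the high values in the wrong blocks and yields a negative CC.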
The developmental time courses of the multiple, interacting
associations underlying model development are summarized in Fig. 10A. Figure 10B
shows which connections are most important during each of the song
learning stages traced in Fig. 10A. Initially, the only
consistent pattern of association in the network is between motor
activity and delayed auditory feedback, and the corresponding efference
copy mapping develops rapidly (stage 1, dotted line). As accurate
efference copies are passed on to the AFP, a sensory → motor mapping
also develops between the AFP and RA (stage 1a, dashed-dotted line; see
Fig. 5). An accurate efference copy also causes the AFP to produce
consistent reinforcement signals, which reorganize intrinsic RA
connections so that RA assemblies corresponding to the same tutor
syllable begin to receive common patterns of synaptic input (stage 2, thin solid line). As this happens, the model begins to produce RA
activity patterns matched to the tutor syllables (thin dashed line). As
syllables are learned, efference copy activity in HVc_AFP becomes
increasingly confined to patterns matched to the relatively small
number of tutor syllables. These aspects of the model (with the
exception of stage 1a) were described in detail in our companion paper
(Troyer and Doupe 2000). As syllable learning proceeds,
clearly defined sequence teaching signals begin to be produced by the
AFP. These begin to bias RA activity toward the tutor sequence (stage
3, thick solid line; see Fig. 9A). This altered activity
then remaps the connections from HVc_RA to RA, so that the polysynaptic
pathway from HVc_AFP → HVc_RA → RA (thick dashed line) yields
correct sensory → next motor syllable transitions. Note that
improvement in the sequencing of RA activity happens before
the learning of the appropriate connectivity from HVc_AFP → HVc_RA → RA, since AFP-driven sequence transitions are necessary to drive
sequence-related learning. The reorganization of the HVc_RA → RA
pathway disrupts the efference copy mapping, which begins to degrade
slightly during the period of sequence learning (dotted line, syllables
8,000-17,000). This tension between AFP-guided changes in the motor
pathway and renewed efference copy learning continues until both are in
rough agreement. This agreement causes a transient decline in the
efference copy match (near syllable 16,000), since the HVc → RA
connection races ahead to the final solution. The efference copy makes
a final recovery, and the model produces a stereotyped sequence of song
syllables.
Range of model behavior
By presenting results from a single representative simulation, we
have demonstrated the plausibility of our core hypothesis that
associational learning, distributed widely throughout the song system,
is sufficient for sensorimotor matching to a previously memorized
template stored in the AFP. Because each stage of the learning is
dependent on previously developed associations, a complete assessment
of the reaction of our model to changes in model parameters is beyond
the scope of this paper (see Troyer and Doupe 2000 for
some important manipulations).
Overall, sequence learning was significantly less robust than syllable
learning, since it results from continual interplay between the changes
in the HVc to RA projection and the efference copy mapping in HVc. The
robustness of model behavior at the default set of parameters was
assessed by running 10 simulations, each with different random seeds
determining the initial pattern of synaptic connectivity and the
sequence of premotor drives. All simulations eventually learned the
tutor song perfectly. Nine of these simulations followed a similar time
course, completing sequence learning near syllable 17,000 (Fig.
11A). However, in one of the
simulations, correct learning took significantly longer and was not
complete until syllable 25,000 (Fig. 11B). Examination of
the output of this simulation reveals that during the period between
syllable 15,000 and 20,000 when the other simulations were stringing
together series of transitions to match the tutor song, this simulation
began to repeat the subsequence A-D, omitting syllable E (Fig.
11C). Since the strong homeostatic mechanisms in the model
prevent any RA assemblies from becoming permanently inactive, the model
compromised, occasionally inserting a strong version of syllable E in
place of syllable D. However, by syllable 23,000, the model began to
insert syllable E in its proper place in the sequence, but
sometimes syllable E was repeated and sometimes syllable A was dropped.
By syllable 25,000, the model had converged on the correct sequence.
Personal observation of many simulations revealed that such temporary
"compromise" solutions to the competing requirements of
associational change in the HVc_RA → RA projection and the
maintenance of an accurate efference copy mapping within HVc were not
uncommon.
To further assess the range of model behavior, we increased the number
of syllables to eight, thereby increasing the range of possible
sequence transitions. The number of vocal features in each syllable was
reduced to five, so that the simulations contained the same number of
RA assemblies as before (8 × 5 = 40). AFP circuitry was
adjusted for the different template, and AFP → RA learning was
slightly adjusted to ensure that an accurate sensory → motor mapping
was learned (see APPENDIX). To push the model to make
mistakes, all learning rate parameters were increased by a factor of 5. No other parameters were readjusted. The range of RA output for a set
of 10 simulations is shown in Fig. 12.
Perfect learning occurred in six of the ten simulations. An example is shown in Fig. 12A. In one simulation, the model produced a
stereotyped sequence of eight syllables, but this motif consisted of
two "chunks" of appropriately copied song, separated by a string of
three syllables sung in reverse order (Fig. 12B). In the
three other simulations, the full sequence was broken into two repeated
subsequences (Fig. 12, C-E). These were sung in
alternation, with the rate of alternation controlled by the interaction
between associational learning and homeostatic mechanisms that prevent
the elimination of either subsequence. In versions of the model with
weaker homeostatic mechanisms, syllables outside of the most commonly
sung subsequence were simply dropped (not shown).
DISCUSSION
Principal findings and predictions
By constructing a computational model, we have demonstrated that simple rules of associational plasticity, operating throughout the song system, are sufficient to support sensorimotor learning at multiple levels of the temporal hierarchy for song. Learning proceeds in a series of stages, with efference copy learning followed by syllable learning and then sequence learning. These developmental stages are not predetermined by our learning rule, but follow a cascade of interrelated associations that are guided by template matching signals from the AFP.
In this paper, we focused on the problem of learning song
sequence. We propose that sequence generation results from a reciprocal sensory-motor interaction between the two populations of HVc projection neurons: the motor component is encoded primarily in RA-projecting HVc
neurons, whereas the sensory component is encoded primarily in
AFP-projecting neurons (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996;
Saito and Maekawa 1993). This mechanism predicts that
the participation of neurons in both populations is required for normal
sequence generation. We also predict that the slow "context"
signals linking one syllable to the next flow primarily from
AFP-projecting to RA-projecting neurons. While we have not explored
possible neural substrates for this functionally slow connection,
Kubota and Taniguchi (1998) have reported that
RA-projecting neurons possess an ionic current that delays the
initiation of action potentials.
The absence of a direct projection from the AFP to nuclei upstream of
RA, the likely site of sequence generation (Vu et al. 1994), poses a significant challenge to the hypothesis that the AFP guides learning of song sequence. One strategy for overcoming this
challenge is for the AFP to guide learning within the connections from
HVc to RA, so that the outputs from the pattern generator are mapped
onto the appropriate sequence of syllable representations in RA
(Doya and Sejnowski 1998). Viewed in isolation,
this hypothesis predicts the existence of an autonomous pattern
generator that is unaffected by outputs from the AFP. In our model,
however, a motor