The Journal of Neurophysiology Vol. 84 No. 3 September 2000, pp. 1204-1223
Copyright ©2000 by the American Physiological Society
¹Department of Psychiatry, ²Department of Physiology, ³W. M. Keck Center for Integrative Neuroscience, and ⁴Sloan Center for Theoretical Neurobiology at UCSF, University of California, San Francisco, California 94143-0444
ABSTRACT
Troyer, Todd W. and Allison J. Doupe. An Associational Model of Birdsong Sensorimotor Learning I. Efference Copy and the Learning of Song Syllables. J. Neurophysiol. 84: 1204-1223, 2000. Birdsong learning provides an ideal model system for studying temporally complex motor behavior. Guided by the well-characterized functional anatomy of the song system, we have constructed a computational model of the sensorimotor phase of song learning. Our model uses simple Hebbian and reinforcement learning rules and demonstrates the plausibility of a detailed set of hypotheses concerning sensory-motor interactions during song learning. The model focuses on the motor nuclei HVc and robust nucleus of the archistriatum (RA) of zebra finches and incorporates the long-standing hypothesis that a series of song nuclei, the Anterior Forebrain Pathway (AFP), plays an important role in comparing the bird's own vocalizations with a previously memorized song, or "template." This "AFP comparison hypothesis" is challenged by the significant delay that would be experienced by presumptive auditory feedback signals processed in the AFP. We propose that the AFP does not directly evaluate auditory feedback, but instead, receives an internally generated prediction of the feedback signal corresponding to each vocal gesture, or song "syllable." This prediction, or "efference copy," is learned in HVc by associating premotor activity in RA-projecting HVc neurons with the resulting auditory feedback registered within AFP-projecting HVc neurons. We also demonstrate how negative feedback "adaptation" can be used to separate sensory and motor signals within HVc. The model predicts that motor signals recorded in the AFP during singing carry sensory information and that the primary role for auditory feedback during song learning is to maintain an accurate efference copy. 
The simplicity of the model suggests that associational efference copy learning may be a common strategy for overcoming feedback delay during sensorimotor learning.
INTRODUCTION
The combination of a well-characterized, stereotyped behavior and specialized anatomy makes birdsong an ideal system in which to study the neural basis of motor learning. Moreover, song learning shares important similarities with human speech learning (Doupe and Kuhl 1999). In birds, vocal learning is accomplished in two phases. During an initial, sensory phase, birds listen to and memorize a tutor song, often called the "template" (Konishi 1965; Marler 1964). In a later, sensorimotor phase, birds gradually match their vocalizations to the memorized song, using auditory feedback from their own vocalizations (Fig. 1, A and B). We have constructed a computational model demonstrating that simple associational (Hebbian) learning rules are sufficient to address important problems related to the sensorimotor learning of song. Our model focuses on the zebra finch, a species commonly used in physiological investigations of song learning. Zebra finch song consists of a stereotyped sequence of vocal gestures or "syllables." In this paper, we focus on the learning of the individual syllables. In the following companion paper (Troyer and Doupe 2000), we extend our model to include sequence learning.
The likely neural substrate for sensorimotor learning is the song system, a set of brain nuclei specialized for vocal learning and production (Nottebohm et al. 1976) (Fig. 1C). The motor pathway for song includes the direct projection from nucleus HVc (used as a proper name; Margoliash et al. 1994) to the robust nucleus of the archistriatum (RA). Both nuclei display neural activity time-locked to song production (McCasland 1987; Yu and Margoliash 1996), and lesions in either nucleus disrupt normal song production at all stages of development (Nottebohm et al. 1976; Simpson and Vicario 1990). HVc and RA are also connected by an indirect pathway, the Anterior Forebrain Pathway (AFP). Lesion studies indicate that the AFP is crucial for song learning, but is not necessary for normal song production in adults (Bottjer et al. 1984; Scharff and Nottebohm 1991; Sohrabji et al. 1990). These and other data (see Biologically supported assumptions) have led to the "AFP comparison hypothesis," in which the AFP guides sensorimotor learning by transmitting a comparison between auditory feedback from the bird's own vocalizations and the memorized template (Bottjer and Arnold 1986; Doupe 1993; Mooney 1992; Nordeen and Nordeen 1988; Saito and Maekawa 1993). These comparison signals are used to guide learning in the motor pathway at the level of RA (Fig. 1, C and D).
The AFP comparison hypothesis is challenged by a fundamental problem in motor learning, the problem of feedback delay (Lashley 1951; Miall and Wolpert 1996; Miles and Evarts 1979). In zebra finches, the 100-ms estimated latency (see Fig. 2) for presumptive AFP comparison signals to arrive in the motor pathway after a motor command is nearly as long as a typical song syllable. This delay would cause comparison signals for one syllable to have greatest overlap with the neural activity for the subsequent syllable and poses a significant challenge to the notion that AFP comparison signals guide learning in RA (see Bottjer and Arnold 1986). In our model, we retain the hypothesis that the AFP plays an important role in template comparison but propose that instead of waiting for the actual auditory feedback, an internal prediction or "efference copy" of the auditory feedback is generated within HVc to guide song learning. Therefore, we predict that the signals recorded in the AFP during singing (Hessler and Doupe 1999a,b) are motor signals that also carry sensory information. Furthermore, our model suggests a functional reason for why the AFP is located downstream of the motor nucleus HVc (Fig. 1, C and D): use of an efference copy requires that brain areas involved in template comparison receive motor efferents.
Preliminary versions of this work have been presented in conference proceedings (Troyer et al. 1996a,b).
Model and approach
Over the past 25 years, anatomical, lesion, and in vivo physiology studies have yielded a wealth of data concerning the functional anatomy of the song system. However, current hypotheses regarding the sensory-motor interactions during song learning lack detail. To explore these issues, we set out to build a computational model of the sensorimotor phase of song learning. Our goal was to determine if basic theoretical problems in sensorimotor learning could be solved using simple rules of associational plasticity, constrained by the known anatomy of the song circuit. We hoped to direct future experiments by identifying important gaps in our knowledge, as well as to evaluate previous experimental results from a computational point of view.
Our efforts resulted in two closely related models, addressing the problem of song learning at different levels of abstraction. The first model is a purely "conceptual model," i.e., a self-consistent set of functional hypotheses conforming to a wide range of experimental results. The functional hypotheses contained in this model constitute the core contribution of our research. The second model is a true "computational model" that incorporates these hypotheses into a working computer algorithm. Due to the very limited knowledge of the song system at the level of local circuits, implementing this algorithm required a number of specific assumptions that reach beyond current experimental knowledge. As a result, several aspects of the computational model are not well-constrained by biology. Moreover, we made a number of simplifying assumptions to ensure that simulations could be run in a reasonable amount of time. However, the computational model played an important role in exploring our initial functional ideas and serves to illustrate our core conceptual hypotheses. Perhaps more importantly, the construction of a working computational algorithm demonstrates the mutual consistency of our hypotheses, as well as providing a theoretical demonstration that they are sufficient to account for important aspects of song learning. This dual approach not only highlights general problems of sensorimotor learning and generates testable predictions at a functional level, it also provides a framework for understanding how specific biological mechanisms may contribute to their solution. These models are only a first step, and, of necessity, contain many simplifications. However, taken together, they constitute the most detailed set of hypotheses to date regarding the interaction of sensory and motor signals during the sensorimotor phase of song learning.
In this section, we present the justification for our working biological assumptions. We then describe the main problems addressed by our model and outline the key elements of our proposed solution. Finally, we present our conceptual model, which describes our functional hypotheses in greater detail. In the METHODS section, we outline the theoretical assumptions incorporated into our working computational model, including a description of the network architecture and the simple encoding scheme used to represent song. In the RESULTS section, we present quantitative results generated by our computational model. Details of the computer algorithm are confined to an APPENDIX.
Biologically supported assumptions
Although the nature of template memorization is largely unknown, various lines of evidence suggest that the AFP may transmit a comparison between the bird's own vocalizations and the memorized tutor song. We call such signals "template comparison signals." Initial evidence suggesting a role for the AFP in template comparison came from lesion experiments: AFP lesions in juvenile zebra finches disrupt song learning, whereas lesions in adult birds have little effect on normal song production (Bottjer et al. 1984; Nottebohm et al. 1976; Scharff and Nottebohm 1991; Sohrabji et al. 1990). Further experiments have shown that the lateral portion of the magnocellular nucleus of the anterior neostriatum (LMAN), the output nucleus of the AFP, appears to be necessary any time the song changes, even in adulthood (Brainard and Doupe 2000; Morrison and Nottebohm 1993; Williams and Mehta 1999). Other experiments suggest that circuitry within the AFP may function as a template: AFP neurons develop song selective auditory responses during song learning (Doupe 1997; Solis and Doupe 1997), and a subset of these neurons respond vigorously to the tutor song (Solis and Doupe 1997, 1999). Using a more direct approach, Basham et al. (1996) showed that local blockade of N-methyl-D-aspartate (NMDA) receptors in the AFP specifically during song memorization disrupts normal song learning.
Within the framework of our model, the simplest hypothesis is that the AFP not only transmits a template comparison signal, but that it also computes the match between the efference copy and the memorized template, i.e., the AFP is the storage site for the tutor template. We did not attempt to model the AFP circuitry that subserves template comparison but rather viewed the AFP as a "black box" that performs the necessary calculations. An alternative hypothesis that is still consistent with the basic structure of our model is that the AFP transmits a template comparison signal, but that memorized template information is stored closer to the auditory periphery than the AFP (see DISCUSSION).
Additional studies into the functional anatomy of the song system have shown that the neurons that project to RA and those that project to the AFP form distinct populations within HVc (Nordeen and Nordeen 1988). We denote these two populations HVc_RA and HVc_AFP. While the evidence is indirect, these two populations are likely to be highly interconnected (Fortune and Margoliash 1995; Vu and Lewicki 1994). Various data suggest that activity within HVc_RA neurons is more closely tied to motor behavior, whereas activity within HVc_AFP neurons is more closely tied to auditory input (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993; but see Doupe and Konishi 1991; Vicario and Yohay 1993). Moreover, experiments in singing birds suggest that the motor pathway is arranged hierarchically, with RA encoding the detailed motor program for each song syllable, and the central pattern generator for song sequence lying upstream of RA, perhaps in HVc (Vu et al. 1994; Yu and Margoliash 1996).
The main biologically supported assumptions that are incorporated into the model are summarized in Table 1.
The final data included in the model were the estimated latencies between various song nuclei (Fig. 2A). We included only the best studied neural pathways in the song system, as the functional significance of other signaling pathways remains unclear (see Foster and Bottjer 1998; Foster et al. 1997; Striedter and Vu 1998; Vates et al. 1997). We used 50 ms for the latency from HVc premotor activity to vocal output (McCasland 1987; McCasland and Konishi 1981), and 15 ms for auditory latencies to HVc (Margoliash and Fortune 1992). Estimating the processing time through the AFP during song was more problematic, since activity in LMAN, the output nucleus of this pathway, is quite variable. We used 45 ms for the latency to LMAN (A. J. Doupe 1997; personal observations). Subtracting 15 ms for the latency to HVc and adding 10 ms for the delay between LMAN and RA, we obtained a processing time through the AFP of roughly 40 ms. Simulated syllables were 80-ms long with a 35-ms gap between syllables (Fig. 2B), typical of mean values for zebra finch song (M. Brainard, personal communication; Scharff and Nottebohm 1991; Zann 1993). These timing data suggest that, on average, presumptive template comparison signals from the AFP will have the greatest overlap with motor activity for the subsequent syllable (Fig. 2C, dotted box).
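The latency arithmetic above can be checked with a short Python sketch. It uses only the numbers quoted in the text; the variable names are illustrative, and it makes the simplifying assumption that all times are measured from the onset of HVc premotor activity, ignoring the (unstated) conduction time from HVc to RA.

```python
# Latencies and durations quoted in the text, in ms.
HVC_TO_VOCAL_OUTPUT = 50
AUDITORY_TO_HVC = 15
AUDITORY_TO_LMAN = 45
LMAN_TO_RA = 10
SYLLABLE = 80
GAP = 35


def overlap(a, b):
    """Length (ms) of the intersection of two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# AFP processing time as derived in the text: 45 - 15 + 10.
afp_processing = AUDITORY_TO_LMAN - AUDITORY_TO_HVC + LMAN_TO_RA
# Auditory feedback delay back to HVc, then the full loop back to RA.
feedback_to_hvc = HVC_TO_VOCAL_OUTPUT + AUDITORY_TO_HVC
total_delay = feedback_to_hvc + afp_processing

# Assumption: premotor activity for a syllable occupies [0, 80) ms and the
# next syllable starts one period (syllable + gap) later.
period = SYLLABLE + GAP
current_syllable = (0, SYLLABLE)
next_syllable = (period, period + SYLLABLE)
comparison_signal = (total_delay, total_delay + SYLLABLE)

overlap_current = overlap(comparison_signal, current_syllable)
overlap_next = overlap(comparison_signal, next_syllable)
print(afp_processing, feedback_to_hvc, total_delay, overlap_current, overlap_next)
```

Under these assumptions the comparison signal for a syllable does not overlap that syllable's own premotor activity at all, and overlaps most of the next one, which is the feedback-delay problem the model sets out to solve.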
Problems addressed
In this paper, we address the problem of learning a collection of motor representations corresponding to song syllables stored within a memorized template. For simplicity, we do not address learning the detailed temporal structure within each syllable, nor learning the length of syllables and inter-syllable gaps. Our model rests on two key assumptions: 1) song learning is accomplished using simple associational learning rules and 2) the AFP guides song learning by transmitting a signal that carries information about the match between the bird's auditory feedback and a stored template. Here, we present a brief outline of the main problems addressed by our model and the key functional hypotheses that underlie our solutions (see Table 2). More detail regarding our hypothesized solutions is presented in the form of a conceptual model (see Conceptual model) and a computational model (see RESULTS). The presentation of both models is structured according to the following outline.
The first problem we address is the important problem of auditory feedback delay: presumptive AFP comparison signals would arrive in RA during the neural activity for the next syllable (Fig. 2C). We hypothesize that the AFP does not directly evaluate auditory feedback, but instead, receives an internally generated prediction of the sensory feedback resulting from song-related motor activity (Table 2, number 1). Such an internal prediction requires a transformation from motor to sensory coordinates and has been termed efference copy (Sperry 1950), "corollary discharge" (von Holst and Mittelstaedt 1980), or the result of a "forward model" (reviewed in Jordan 1995; Miall and Wolpert 1996). We will use the term efference copy. Sensory signals resulting from motor behavior have been termed sensory "reafference" (von Holst and Mittelstaedt 1980). We further hypothesize that the motor → sensory efference copy develops between the two populations of HVc projection neurons (Table 2, number 2). To learn this mapping, it is important that our associational plasticity rule is "temporally asymmetric," i.e., presynaptic activity must be followed by postsynaptic activity to induce plasticity (Table 2, number 3).
The second problem we address is the nature of AFP-guided syllable learning in RA. We make two functional hypotheses. First, we hypothesize that syllable learning is guided by nonspecific reinforcement signals provided by the AFP that modulate the degree of ongoing associational plasticity throughout RA (Table 2, number 4; see Sutton and Barto 1998, for an overview of reinforcement learning). This hypothesis is motivated by the fact that nonspecific reinforcement signals, while generated by a match to a sensory template, do not have to be directed toward specific patterns of RA motor neurons. As a result, no sensory → motor mapping is required to guide learning. Second, we hypothesize that synapses intrinsic to RA play an important role in storing syllable representations (Table 2, number 5). This hypothesis was motivated by the need to learn a number of discrete patterns of neural activity corresponding to the syllables in the tutor template and is consistent with estimates that up to 85% of synapses in RA come from local collaterals of other RA neurons (Herrmann and Arnold 1991). Theoretical models have shown that recurrent activity is ideal for stabilizing such patterns (e.g., Hopfield 1984). Moreover, if the representation for individual syllables is encoded in the pattern of intrinsic RA synapses, plasticity in the synapses connecting HVc and RA can alter the sequence of syllables produced, with only minor disruption to the representation for each individual syllable (see Troyer and Doupe 2000).
The third problem we address results from the competing requirements of both learning and using the efference copy signal. Learning an efference copy mapping by associating motor activity with delayed auditory feedback implies that auditory inputs induce significant levels of activity. However, when using the short-latency efference copy signals to guide syllable learning, the strong auditory inputs will interfere with the efference copy signal. We address this problem by assuming that the auditory feedback signal is relatively weak and/or that the response of HVc_AFP neurons is strongly adapting (Table 2, number 6).
Conceptual model
Our model focuses on four neural populations (Fig. 3): nucleus RA in the motor pathway, separate populations of HVc projection neurons projecting to RA and the AFP (Nordeen and Nordeen 1988), and a single population representing the output of the AFP. Because we do not explicitly model nuclei downstream of RA, activity in RA represents the motor output of the model. In this paper, we explore the functional consequences of associational plasticity in three sets of connections: HVc_RA → HVc_AFP, HVc_RA → RA, and intrinsic RA → RA connections.
Our model does not address the learning of syllable timing. We assume that timing is provided by rhythmically clocked bursts of premotor activity arriving in HVc_RA, with the duration of each burst controlling the duration of premotor activity and hence the length of song syllables (Fig. 2B). While the source of the premotor drive is not explicitly modeled, the song nuclei nucleus uvaeformis (Uva) and/or nucleus interfacialis (NIf) are likely candidates (McCasland 1987; Striedter and Vu 1998; Williams and Vicario 1993). Input from the forebrain nucleus medial MAN is also a possible source. Although the timing of this drive is fixed, we assume that HVc_RA neurons receive varying magnitudes of drive, and these magnitudes are generated independently for each HVc_RA neuron and each vocalization produced by the model. Thus, HVc_RA produces random patterns of premotor activity that are independent from one syllable to the next. The model's task is to use template comparison signals generated by the AFP to reorganize the connections in the motor pathway so that 1) random HVc_RA activity is converted into a handful of stereotyped patterns of RA motor activity, and 2) these stereotyped patterns of RA activity lead to vocal output matched to the memorized template. Note that HVc_RA activity becomes ordered when we address the problem of sequence learning (Troyer and Doupe 2000).
PROBLEM 1: AUDITORY FEEDBACK DELAY.
To address the problem of feedback delay, we hypothesize that an efference copy mapping is learned between the two populations of HVc projection neurons (Table 2, numbers 1 and 2). Since the connections in the motor pathway are initially unstructured, the random patterns of HVc_RA activity lead to a random exploration of motor space (cf. Bullock et al. 1993; Kuperstein 1988; Salinas and Abbott 1995). Activity flows down the motor pathway (McCasland 1987) and returns to HVc_AFP as auditory feedback (Fig. 4A, dark lines). While the exact form of the learning is not crucial for our model, it is important that associational learning is temporally asymmetric (Table 2, number 3), i.e., synaptic strengths increase only when presynaptic activity precedes postsynaptic activity (Bi and Poo 1998; Debanne et al. 1998; Gustafsson et al. 1987; Hebb 1949; Markram et al. 1997). By strengthening synapses onto neurons that are likely to fire in the near future, temporally asymmetric "Hebbian" learning strengthens synaptic inputs that "anticipate" any postsynaptic activity that regularly follows presynaptic spiking (cf. Blum and Abbott 1996; Gerstner and Abbott 1997). In our model, auditory feedback to HVc_AFP neurons encoding the sensory aspects of a particular vocal gesture will follow spiking in HVc_RA neurons encoding motor aspects of that gesture. Associational learning then strengthens the synapses from those (presynaptic) HVc_RA neurons onto the corresponding (postsynaptic) neurons in HVc_AFP (Fig. 4A, white arrow). After this motor → sensory mapping is learned, activity within HVc_RA motor neurons will drive, with short latency, the HVc_AFP neurons encoding the corresponding sensory representation. This short-latency motor activity in HVc_AFP constitutes a sensory prediction of the auditory reafference. This efference copy can then be passed on to the AFP and used to guide learning in RA. Note that efference copy learning occurs within HVc and proceeds without reference to the tutor template stored in the AFP. Using efference copy in this way splits the total feedback delay for AFP comparison signals to return to RA into two shorter delays: the auditory feedback delay of 65 ms to HVc (Fig. 4A) and the 40-ms processing delay from HVc through the AFP (Fig. 4B).
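As a rough illustration of how an associational rule could learn such a motor → sensory mapping, the following Python sketch pairs each premotor pattern only with the feedback that follows it. It is a toy under stated assumptions, not the algorithm in the APPENDIX: the population sizes, the linear motor pathway (`motor_map`), the zero-mean Gaussian activity (standing in for deviations from a mean firing rate), and the Hebbian rule with decay implemented as a running average are all hypothetical choices made for brevity.

```python
import random

rng = random.Random(0)
N_PRE, N_POST = 6, 4   # toy numbers of HVc_RA and HVc_AFP units (hypothetical)

# Fixed, unstructured motor pathway: HVc_RA activity -> vocal features.
motor_map = [[rng.random() for _ in range(N_PRE)] for _ in range(N_POST)]


def feedback_for(pre):
    """Auditory feedback reaching HVc_AFP some time after motor command `pre`."""
    return [sum(w * x for w, x in zip(row, pre)) for row in motor_map]


# HVc_RA -> HVc_AFP weights that will come to carry the efference copy.
weights = [[0.0] * N_PRE for _ in range(N_POST)]

for n in range(1, 5001):
    pre = [rng.gauss(0.0, 1.0) for _ in range(N_PRE)]  # premotor burst (earlier)
    post = feedback_for(pre)                           # its feedback (later)
    # Temporally asymmetric pairing: presynaptic activity is associated only
    # with the postsynaptic activity that follows it. The decay term turns
    # the Hebbian product into a running average (an assumption).
    for i in range(N_POST):
        for j in range(N_PRE):
            weights[i][j] += (post[i] * pre[j] - weights[i][j]) / n

# After learning, `weights` predicts the feedback directly from motor activity.
max_error = max(abs(weights[i][j] - motor_map[i][j])
                for i in range(N_POST) for j in range(N_PRE))
print(f"largest weight error: {max_error:.3f}")
```

Because the premotor patterns are independent from one syllable to the next, the averaged products converge toward the motor map itself, so the learned weights let HVc_RA activity predict its own sensory consequences without waiting for the feedback.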
PROBLEM 2: SYLLABLE LEARNING IN RA.
To guide syllable learning, the AFP evaluates the efference copy and transmits a reinforcement signal to RA (Table 2, number 4). This nonspecific reinforcement signal is assumed to modulate the degree of ongoing associational plasticity throughout RA. An efference copy that is well-matched to the tutor song results in a large plasticity signal in RA neurons that are significantly activated, leading to a potentiation of recently activated synapses; a poor match evokes small potentiation or depression. Since a good match to the tutor song occurs when the RA neurons that encode a single tutor syllable are co-active, reinforcement leads to the development of strong connections between RA neurons encoding the same tutor syllable (Table 2, number 5). Reinforcement also reorders the connections from HVc_RA → RA (see RESULTS). These patterns of connectivity result in a strong tendency for RA to produce coherent patterns of motor activity matched to the template, i.e., the tutor syllables have become "attractors" for the neural dynamics within RA (see, e.g., Amit 1989).
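The idea of a single, nonspecific reinforcement signal gating Hebbian plasticity can be sketched in a few lines of Python. This is a minimal illustration, not the rule used in the simulations: the two-syllable toy template, the match measure in `reinforcement`, and the learning rate are assumptions introduced here.

```python
# Toy template: two tutor syllables, three RA units each (hypothetical sizes).
SYLLABLES = {"A": [0, 1, 2], "B": [3, 4, 5]}
N_RA = 6


def reinforcement(activity):
    """Scalar match signal: best (inside - outside) mean activity over syllables."""
    best = float("-inf")
    for members in SYLLABLES.values():
        inside = sum(activity[i] for i in members) / len(members)
        others = [i for i in range(N_RA) if i not in members]
        outside = sum(activity[i] for i in others) / len(others)
        best = max(best, inside - outside)
    return best


def gated_hebbian_update(weights, activity, rate=0.1):
    """Apply one Hebbian step to RA -> RA weights, scaled by a global reward."""
    reward = reinforcement(activity)
    for i in range(N_RA):
        for j in range(N_RA):
            if i != j:
                # The same scalar multiplies every synapse: the signal carries
                # no information about which units should change.
                weights[i][j] += rate * reward * activity[i] * activity[j]
    return reward


weights = [[0.0] * N_RA for _ in range(N_RA)]
good = gated_hebbian_update(weights, [1, 1, 1, 0, 0, 0])   # matches syllable A
mixed = gated_hebbian_update(weights, [1, 0, 0, 1, 0, 0])  # straddles A and B
print(good, mixed, weights[0][1], weights[0][3])
```

In this toy, only the pattern that matches a tutor syllable strengthens connections among its co-active units, while the mixed pattern leaves the weights unchanged, which is the sense in which no sensory → motor mapping is needed to direct the learning.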
PROBLEM 3: SEPARATING MOTOR AND SENSORY SIGNALS IN HVC.
In our model, HVc_AFP neurons receive two distinct inputs: auditory feedback, which drives efference copy learning, and motor input from HVc_RA, which carries the efference copy used for AFP-driven song learning. While necessary for efference copy learning, the delayed auditory signal can interfere with the efference copy signal used to guide learning. We propose two strategies for separating sensory and motor signals within HVc_AFP (Table 2, number 6). First, the auditory feedback signal is set significantly weaker than the efference copy signal. Hence, auditory feedback only weakly perturbs the efference copy, which can remain sufficiently accurate to guide syllable learning. However, weak auditory feedback is able to guide efference copy learning by providing, over the course of multiple syllables, a consistent association between HVc_RA motor activity and the resulting weak sensory activation. The second strategy is based on the cancellation of auditory feedback signals in HVc_AFP by "adaptation." Specifically, adaptation in the HVc_AFP circuitry results in a "negative after-image" of any given pattern of HVc_AFP activity (Fig. 5), which has a decay time (100 ms) similar to the length of a typical song syllable (see APPENDIX for implementation). A variety of biological mechanisms could provide this kind of adaptation, e.g., spike-triggered or voltage-dependent intrinsic currents and/or slow feedback inhibition. Such mechanisms have been shown to be present within HVc (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998; Schmidt and Perkel 1998). Because the efference copy arrives in HVc_AFP with a shorter delay than the auditory feedback, the after-image of the efference copy will counteract the corresponding auditory reafference. That is, HVc_AFP neurons strongly activated by efference copy input from HVc_RA will be in an adapted state by the time that the corresponding patterns of delayed auditory feedback arrive in HVc_AFP. Note that an inaccurate efference copy will lead to an incomplete cancellation of auditory feedback, and interference from this delayed feedback will create an inaccurate efference copy. However, associations between the uncanceled feedback signal and the HVc_RA motor activity that gave rise to it will lead to new plasticity that improves the quality of future efference copy predictions. Details of how this cancellation mechanism works in the context of our computer algorithm are presented in the RESULTS.
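The cancellation idea can be illustrated with a single adapting unit in Python. This sketch is not the implementation in the APPENDIX: the subtractive adaptation variable, its gain, the input amplitudes, and the Euler time step are assumptions chosen for illustration; only the 100-ms decay time, the 80-ms syllable, the 35-ms gap, and the 65-ms feedback delay come from the text.

```python
TAU = 100.0            # adaptation decay time in ms, as quoted in the text
DT = 1.0               # Euler time step in ms (assumption)
EFFERENCE = (0, 80)    # efference copy input window for one syllable (ms)
FEEDBACK = (65, 145)   # the same syllable's auditory feedback, 65 ms later
GAP = (80, 115)        # epoch in which only auditory input is present


def mean_gap_response(efference_gain, feedback_gain, adapt_gain=1.0):
    """Mean firing rate of one HVc_AFP unit during the gap epoch."""
    adaptation = 0.0
    total = 0.0
    for step in range(150):
        t = step * DT
        drive = 0.0
        if EFFERENCE[0] <= t < EFFERENCE[1]:
            drive += efference_gain
        if FEEDBACK[0] <= t < FEEDBACK[1]:
            drive += feedback_gain
        # The adaptation variable acts as a "negative after-image" that is
        # subtracted from the current drive.
        rate = max(0.0, drive - adaptation)
        adaptation += DT / TAU * (adapt_gain * rate - adaptation)
        if GAP[0] <= t < GAP[1]:
            total += rate * DT
    return total / (GAP[1] - GAP[0])


with_copy = mean_gap_response(efference_gain=1.0, feedback_gain=0.3)
without_copy = mean_gap_response(efference_gain=0.0, feedback_gain=0.3)
print(f"gap response with efference copy:    {with_copy:.3f}")
print(f"gap response without efference copy: {without_copy:.3f}")
```

With these hypothetical parameters, a unit that was driven by the efference copy responds much less to the delayed feedback than an unadapted unit does, which is the qualitative effect the second strategy relies on.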
METHODS
The main assumptions that were necessary to construct our computational algorithm are summarized in Table 3 and are discussed below. Only the subsections explaining our method of neural encoding (see Neural encoding, Fig. 6) and the nature of HVc_AFP activity (see Tonic activity patterns, Fig. 7) are necessary for understanding the main computational results presented in the RESULTS. Other subsections describe issues of mainly theoretical interest. In the final subsection of the METHODS, we provide formulas for our method for characterizing the developmental time course in the model. Details of the computational algorithm are presented in the APPENDIX. The assumptions outlined in Table 3 are not crucial for the main predictions of our model; alternative algorithms that implement our functional hypotheses for song learning are possible. Our particular algorithm should be seen as a first approximation, one that allows us to explore associational learning between patterns of sensory and motor activity on the time scale of tens to hundreds of milliseconds.
Each simulation consisted of repeated iterations of a computer subroutine that 1) calculated activity patterns related to a single syllable output by the model, 2) applied our synaptic plasticity rule, and 3) updated the various homeostatic mechanisms in the model. The details of the algorithm and the specification of model parameters are given in the APPENDIX. In most simulations, the subroutine was iterated for 25,000 syllables, ~5,000 more than were typically needed for model output to become stereotyped. When performance was degraded by changing parameters (see APPENDIX), simulations were extended to 50,000 syllables, but output sometimes lacked stereotypy. Computer simulations were written using the MATLAB simulation environment (version 5.3; The Mathworks, Natick, MA). Typical simulations took ~2 h when run using a 400-MHz Pentium II processor.
Neural encoding
Activity in the model was represented by the output of a number of neural "units." Each of these units is meant to represent the activity within a network of connected neurons or "cell assembly" (Hebb 1949). Hereafter, we will use the term "assemblies." Given the lack of data concerning the neural code for vocal gestures in the song system, we sought the simplest encoding scheme that could support associational learning (Table 3, number 1). Each vocal gesture produced by the model is viewed as a combination of 40 abstract "vocal features," with each RA assembly representing motor-related aspects of one feature, and each HVc_AFP assembly representing sensory-related aspects of one feature. Because of this one-to-one mapping of auditory and motor features, motor activity in a given RA assembly leads to auditory feedback input to the unique corresponding assembly in HVc_AFP. The tutor song consisted of five syllables, within the normal range for zebra finch song (3-9; Price 1979). We denote these syllables by the letters A-E and assumed that each tutor syllable was encoded by a distinct set of assemblies, allowing us to number vocal features consecutively, i.e., tutor syllable A contains vocal features 1-8, tutor syllable B contains features 9-16, etc. (Fig. 6A). The tutor template is stored in the AFP, with tutor syllables encoded in the connections from HVc_AFP: each AFP assembly corresponds to a single tutor syllable and receives input from the HVc_AFP assemblies representing the auditory features comprising that syllable (Fig. 6B). Connections related to syllable B are shown as an example. Our choice of this very simple representation was guided by the following considerations: 1) due to the complexity of the network and finite computational resources, our model contains only a limited number of assemblies; 2) since learning correlated patterns with Hebbian learning rules is a largely unsolved theoretical problem, we chose an encoding scheme in which uncorrelated patterns of motor activity result in uncorrelated patterns of sensory feedback; 3) our encoding scheme ensures decorrelation in the motor → sensory mapping even for assemblies using nonlinear input-output functions.
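The encoding scheme can be written down compactly. The Python sketch below builds the feature numbering and a template in which each AFP assembly receives input only from the HVc_AFP assemblies of its own syllable. The binary (0/1) connection weights and the helper names are assumptions made for illustration; the text specifies which connections exist, not their strengths.

```python
N_FEATURES = 40
SYLLABLE_NAMES = "ABCDE"
FEATURES_PER_SYLLABLE = N_FEATURES // len(SYLLABLE_NAMES)  # 8 features each


def syllable_features(name):
    """1-based vocal feature numbers of a tutor syllable, as numbered in the text."""
    k = SYLLABLE_NAMES.index(name)
    start = k * FEATURES_PER_SYLLABLE + 1
    return list(range(start, start + FEATURES_PER_SYLLABLE))


# HVc_AFP -> AFP connections: one AFP assembly per tutor syllable.
# Binary weights are an illustrative assumption.
template = {
    name: [1 if f in syllable_features(name) else 0
           for f in range(1, N_FEATURES + 1)]
    for name in SYLLABLE_NAMES
}


def afp_input(hvc_afp_activity):
    """Summed input to each AFP assembly from a 40-entry HVc_AFP activity vector."""
    return {name: sum(w * x for w, x in zip(row, hvc_afp_activity))
            for name, row in template.items()}


print(syllable_features("B"))
print(afp_input(template["B"]))
```

Feeding the model an HVc_AFP pattern that contains exactly the features of syllable B drives only the AFP assembly for B, which is the sense in which the template is stored in these connections.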
Initially, all connections in the motor pathway are unstructured. Thus, random activity in HVc_RA leads to random motor activity in RA (Fig. 6C). The model's task is to 1) compare sensory signals with the template stored in the AFP, and 2) use the resulting comparison signals to guide plasticity in the motor pathway so that random HVc_RA activity is converted to stereotyped patterns of RA activity matched to the tutor song.
Tonic activity patterns
For simplicity, we assume that song-related activity is encoded
by the neural firing rates averaged over the course of each song
syllable. Thus, the activity within each of the four neural populations
is modeled as a vector of firing rates, with one entry for each
assembly in the population. For all populations except HVc_AFP, firing
rates are assumed to be constant during the period of premotor drive
for each syllable and zero during the gap between syllables. In
HVc_AFP, we divided each syllable into four time epochs depending on
the combination of efference copy (related to the current syllable) and
auditory feedback input received during that syllable (Fig. 7). During
the early part of each syllable (marked E), HVc_AFP receives
efference copy input from HVc_RA that relates to the current syllable,
while the sensory input is due to delayed auditory feedback from the
previous syllable. The middle portion of each syllable (marked
M) corresponds to the period of silence in the delayed
feedback. During this period, HVc_AFP receives efference copy input
only. During the late part of the syllable (marked L), the
efference copy and auditory inputs correspond to the same syllable.
Finally, during the "gap" period between bursts of HVc_RA activity
(marked G), HVc_AFP receives only auditory input. During the
epochs when efference copy and auditory feedback inputs overlap, the
two sources of input were simply summed. For computational and
conceptual simplicity, we chose not to propagate this subdivision of
activity to the AFP. The efference copy activity that was passed on to
the AFP was calculated from the average activity in HVc_AFP during the
early and middle portion of the syllable. Late and gap portions were excluded for the following reasons. RA activity generated during the
current syllable contributes to the late and gap portion of HVc_AFP
activity. In our sequence learning model (Troyer and Doupe 2000), the AFP not only provides a reinforcement signal to RA, but also affects the pattern of RA activity. Excluding the late and gap
portions of HVc_AFP activity from the efference copy prevents RA output
from contributing to RA input during the same syllable via the RA → HVc_AFP → AFP → RA feedback loop. It also prevents auditory
feedback from the current syllable from contributing acutely to the AFP
reinforcement signal. We will view the combined early and middle
activity signal as the efference copy passed on to the AFP, although it
may include auditory feedback from the previous syllable.
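The epoch bookkeeping described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names, the example vectors, and the equal weighting of the early and middle epochs are assumptions. Only the summation of overlapping inputs and the use of the early and middle epochs for the efference copy come from the text.

```python
def hvc_afp_epochs(ec_current, aud_previous, aud_current):
    """Return the summed HVc_AFP input in each epoch of one syllable.

    ec_current   : efference copy input for the current syllable
    aud_previous : delayed auditory feedback from the previous syllable
    aud_current  : auditory feedback from the current syllable
    """
    def add(a, b):
        return [x + y for x, y in zip(a, b)]

    return {
        "E": add(ec_current, aud_previous),  # efference copy + previous feedback
        "M": list(ec_current),               # efference copy only
        "L": add(ec_current, aud_current),   # efference copy + current feedback
        "G": list(aud_current),              # auditory feedback only
    }


def efference_copy_to_afp(epochs):
    """Average the early and middle epochs (equal weighting is an assumption)."""
    return [(e + m) / 2.0 for e, m in zip(epochs["E"], epochs["M"])]


# Hypothetical two-assembly example.
epochs = hvc_afp_epochs([1.0, 0.0], [0.0, 0.5], [0.75, 0.0])
afp_input = efference_copy_to_afp(epochs)
```

In this toy example the second entry of `afp_input` is nonzero only because of feedback from the previous syllable, which is the contamination the text acknowledges.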
Plasticity rule
We used a simple model of associational learning. Synaptic
projections are in principle "all-to-all," i.e., associational learning takes place between all relevant combinations of pre- and
postsynaptic assemblies. Assemblies become functionally
disconnected when associational learning drives connection strengths to
zero. While our learning rule is meant to encompass the many potential mechanisms of associational plasticity in the song system, the form of our learning rule is based on analogies with NMDA
receptor-dependent long-term potentiation (LTP; Malenka and Nicoll 1993; Table 3, number 3). We use r_pre(t) and r_post(t) to denote the
activity levels of the pre- and postsynaptic assemblies at time
t. Each presynaptic spike (at time t_pre) was assumed to give rise to a
postsynaptic "plasticity trace," a function of the elapsed time t − t_pre that is analogous to the amount of
NMDA-receptor binding. The shape of this function determines the
time window for neural plasticity (see APPENDIX). The
plasticity trace is multiplied by the postsynaptic activity r_post(t) to yield a
"plasticity signal," analogous to postsynaptic calcium concentration. Input from the AFP is
assumed to give a reinforcement signal R that modulates the
plasticity signal in all RA assemblies. (R is set to a
constant value of 1 in HVc.) Plasticity signals above a threshold value
increase synaptic strength (LTP); signals below the threshold give rise to
long-term depression (LTD; Cummings et al. 1996; Hansel et al. 1997; Lisman 1989). The threshold is
a "sliding threshold" that depends on the average amount of
activity in the postsynaptic cell (Abraham and Bear 1996; Bienenstock et al. 1982; Sejnowski 1977). Thus, the change in synaptic strength resulting from
postsynaptic activity at time t and presynaptic activity at
time t_pre depends on the presynaptic activity, the reinforcement-modulated plasticity signal, and the sliding threshold (see APPENDIX).
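One expression consistent with the verbal description of the plasticity rule is sketched here. The symbols $\rho$ for the plasticity trace and $\theta$ for the sliding threshold are assumed names, and placing $R$ inside the bracket is an assumption about how reinforcement enters; the APPENDIX remains the authority on the exact form.

$$\Delta w \;\propto\; r_{\mathrm{pre}}(t_{\mathrm{pre}}) \left[\, R\, \rho(t - t_{\mathrm{pre}})\, r_{\mathrm{post}}(t) - \theta \,\right]$$

Under this reading, the bracket is positive (LTP) when the reinforcement-modulated plasticity signal exceeds $\theta$ and negative (LTD) otherwise.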
Local circuit mechanisms
Activity within each neural population was based on very simple
local circuitry. The output of each excitatory cell assembly was
computed as a linear function of its input after subtracting a
threshold value. RA included intrinsic excitatory connections that were
used to store syllable representations in a manner analogous to other
associative memory models or so-called attractor networks (Table 3,
number 4A; Amit 1989). To minimize computation, only RA
includes such connections. Each population also includes a single
inhibitory assembly that is connected to all assemblies within the
corresponding population (Table 3, number 4B). Inhibition is of two
basic types. HVc_RA, HVc_AFP, and the AFP use "feedforward inhibition," in which inhibitory activity is equal to the average afferent input received by the population, minus a
threshold. RA uses "feedback inhibition," in which inhibitory
activity is driven by the average activity within the local
population. Feedback inhibition allows tighter control of the activity
in the local network but is computationally more expensive. Within a
given population, all excitatory assemblies receive a similar level of
inhibition. Since the only assemblies that get strongly activated are
those that receive enough input to overcome this inhibition, inhibition
mediates a form of "competition" among excitatory assemblies.
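The two inhibition schemes can be contrasted with a small sketch. It assumes threshold-linear units, a rectified inhibitory assembly, unit inhibitory weights, and a simple relaxation loop for the feedback case; none of these parameter values or function names come from the paper.

```python
def relu(x):
    """Threshold-linear output: zero below threshold, linear above."""
    return x if x > 0.0 else 0.0


def feedforward_population(inputs, exc_threshold=0.0, inh_threshold=0.0,
                           inh_weight=1.0):
    """Inhibition is set by the average afferent input minus a threshold."""
    inhibition = relu(sum(inputs) / len(inputs) - inh_threshold)
    return [relu(x - inh_weight * inhibition - exc_threshold) for x in inputs]


def feedback_population(inputs, exc_threshold=0.0, inh_weight=1.0,
                        steps=500, dt=0.1):
    """Inhibition is driven by the average activity of the local population.

    The rates are relaxed toward a fixed point, which is why this variant
    costs more computation than the feedforward one.
    """
    rates = [0.0] * len(inputs)
    for _ in range(steps):
        inhibition = sum(rates) / len(rates)
        target = [relu(x - inh_weight * inhibition - exc_threshold)
                  for x in inputs]
        rates = [r + dt * (t - r) for r, t in zip(rates, target)]
    return rates


# Hypothetical input pattern: one strongly driven assembly among four.
ff_rates = feedforward_population([1.0, 3.0, 0.5, 0.5])
fb_rates = feedback_population([1.0, 3.0, 0.5, 0.5])
```

In both cases weakly driven assemblies are silenced while the strongly driven one stays active, which is the "competition" referred to above.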
Decorrelating initial connectivity
Our simulations were designed to determine whether associational
learning, guided by template matching signals from the AFP, could
organize initially unstructured connections in the motor pathway to
produce the stored tutor song. The dominant computational problem
encountered in building the model was the positive feedback inherent in
associational learning rules: correlated activity increases synaptic
strength, which tends to further strengthen the correlation. Left
unchecked, this learning will continually amplify initially weak
associations, even spurious associations resulting from chance events.
One of the most important factors contributing to spurious correlations
was the limited size of our network simulations. The strength of random
correlations is highly dependent on network size, decreasing roughly
as the inverse square root of the number of network units. Because of the
computational expense of simulating intrinsic feedback dynamics within
RA, we limited the number of RA assemblies to 40 (Fig. 6B).
Independently choosing each connection within such a network will
result in correlations that are an order of magnitude stronger than
those expected in a more realistically sized network containing 4000 assemblies. The calculation of HVc_RA activity was computationally less
expensive, and a larger number (5 × 40 = 200) of HVc_RA
assemblies was included. While reducing correlations to some degree,
these numbers still do not approach physiologically realistic numbers. We note here that the greater storage capacity of larger networks resulting from a reduction in random correlations (Amit 1989) may relate to reports of a relationship between the size
of various song nuclei and the number of song syllables learned
(reviewed in Brenowitz 1997; Nordeen and Nordeen 1997).
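As a worked check of the order-of-magnitude statement, assume that the typical strength of a chance correlation, written here as $c(N)$ (an assumed symbol), scales as the inverse square root of the number of assemblies $N$. Comparing 40 assemblies with 4000 then gives

$$\frac{c(40)}{c(4000)} \approx \sqrt{\frac{4000}{40}} = \sqrt{100} = 10$$

so chance correlations in the 40-assembly network are roughly ten times stronger under this scaling.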
To address the problem of correlated connections, we chose initial
patterns of connectivity specifically aimed at minimizing these
correlations (Table 3, number 5). Initial connection strengths were
chosen according to two basic strategies. For HVc_RA → RA and HVc_RA → HVc_AFP connections, we used a "single-projection" strategy, in
which each presynaptic assembly connects with a single postsynaptic
assembly. This ensures that the levels of input received by any two
assemblies in the postsynaptic population are independent. However, the
single-projection strategy does not prevent correlations arising from
polysynaptic pathways within the recurrent circuitry in RA. For these
intrinsic RA connections, we used a "uniform" strategy, in which
each presynaptic assembly connects with all postsynaptic assemblies
with equal strength. This ensures that all correlations result from a
global signal shared by all assemblies. While such a signal will
increase overall synaptic strengths, it will not lead to spurious
patterns of correlations within the network. To ensure that
our model was robust to some degree of correlation, zero-mean Gaussian
perturbations were added to all plastic connections during the
initialization process. The standard deviation of the perturbations was
set to 10% of the strength of the nonzero synapses. After the
perturbation, negative strengths were set to zero. Noise was not added
to the three projections that did not undergo plasticity (the premotor
drive, auditory feedback, and template storage connections from HVc_AFP
to the AFP).
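The two initialization strategies and the perturbation step can be sketched as follows. The function names, the round-robin choice of postsynaptic target, the unit baseline strength, and the random seed are all assumptions made for illustration; the population sizes (200 and 40), the 10% standard deviation, the clipping of negative strengths, and the absence of RA self-connections follow the text.

```python
import random


def single_projection(n_pre, n_post, strength=1.0):
    """Each presynaptic assembly contacts exactly one postsynaptic assembly.

    Weights are stored as w[post][pre]. Assigning targets round-robin is an
    assumption; any one-target-per-source assignment fits the description.
    """
    w = [[0.0] * n_pre for _ in range(n_post)]
    for pre in range(n_pre):
        w[pre % n_post][pre] = strength
    return w


def uniform(n, strength=1.0):
    """Equal-strength all-to-all connections without self-connections."""
    return [[0.0 if i == j else strength for j in range(n)] for i in range(n)]


def perturb(w, rng, strength=1.0, rel_sd=0.10):
    """Add zero-mean Gaussian noise, then set negative strengths to zero."""
    sd = rel_sd * strength
    return [[max(0.0, x + rng.gauss(0.0, sd)) for x in row] for row in w]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
w_hvc_ra_to_ra = perturb(single_projection(200, 40), rng)
w_ra_to_ra = perturb(uniform(40), rng)
for i in range(40):
    w_ra_to_ra[i][i] = 0.0  # keep self-connections absent after the noise
```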
Homeostatic mechanisms
In addition to decorrelating initial connectivity patterns, we
include two sources of homeostatic negative feedback to
counteract the positive feedback inherent in associational learning
(Table 3, number 6). The first is a normalization of synaptic strength: after applying associational change for each simulated syllable, the
strengths of all synapses onto (or from) a given assembly are
multiplied by a single number so that the total amount of postsynaptic
(or presynaptic) strength for any one assembly remains nearly constant
(see APPENDIX). This kind of multiplicative normalization controls total synaptic strength without altering the relative magnitude of the individual connections. Presynaptic normalization was
applied before postsynaptic normalization (see APPENDIX).
The strengths to which synaptic connections were normalized were chosen by hand so that 1) intrinsic RA circuitry contributed a
large component (50%) of the input to RA assemblies, and 2)
auditory feedback contributed a modest portion (20%) of the input to
HVc_AFP. The mechanisms underlying homeostasis are just now beginning
to receive focused attention. Multiplicative normalization of synaptic strength has been shown by Turrigiano et al. (1998) and
was hypothesized to depend on mean levels of activity. An approximation
to our postsynaptic normalization rule follows if mean levels of
activity (calculated on long time scales) are related to total
excitatory strength synapsing on that neuron. Mechanisms such as
conservation of transmitter released and/or retrograde trophic factors
could underlie presynaptic normalization.
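A minimal sketch of the normalization step follows. It rescales exactly to a target total, whereas the text only says that totals remain "nearly constant", so exact rescaling, the target values, and the function names are assumptions; the order (presynaptic first, then postsynaptic) is taken from the text.

```python
def normalize_presynaptic(w, target):
    """Scale all synapses from each presynaptic assembly to sum to target.

    Weights are stored as w[post][pre], so this rescales columns.
    """
    n_post, n_pre = len(w), len(w[0])
    out = [row[:] for row in w]
    for pre in range(n_pre):
        total = sum(w[post][pre] for post in range(n_post))
        if total > 0.0:
            for post in range(n_post):
                out[post][pre] = w[post][pre] * target / total
    return out


def normalize_postsynaptic(w, target):
    """Scale all synapses onto each postsynaptic assembly to sum to target."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x * target / total for x in row] if total > 0.0 else row[:])
    return out


def normalize(w, pre_target, post_target):
    """Presynaptic normalization first, then postsynaptic normalization."""
    return normalize_postsynaptic(normalize_presynaptic(w, pre_target),
                                  post_target)


# Hypothetical 2 x 2 weight matrix.
w_norm = normalize([[1.0, 3.0], [1.0, 1.0]], pre_target=1.0, post_target=1.0)
```

Because the postsynaptic step is applied last, the row sums match the target exactly while the column sums are only approximately preserved, which is one way to read "nearly constant". Within each row, the ratios between connections are unchanged by the final scaling.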
The second source of negative feedback is inhibitory plasticity that is
homeostatic, i.e., if an excitatory assembly becomes too active, the
inhibitory connection onto that assembly is strengthened (Rutherford et al. 1997; see APPENDIX). We
note that controlling feedback in the model was not always
straightforward, since oscillatory instability results if negative and
positive feedback mechanisms operate on similar time scales.
Quantifying learning time course
To quantify the learning time course, we divided the model
output into epochs of 250 syllables and computed the matrix
Mact of co-fluctuations in activity
between each pair of RA assemblies over each epoch. During the
mth epoch

$$M^{\mathrm{act}}_{ij}(m) = \frac{1}{250} \sum_{n \,\in\, \text{epoch } m} \left[ r_i(n) - \bar{r}(n) \right] \left[ r_j(n) - \bar{r}(n) \right]$$

where $r_i(n)$ is the activity of RA assembly i during syllable n and $\bar{r}(n)$ is the average activity across
assemblies during syllable n. We compared
Mact to an ideal syllable matrix,
Msyl, describing the groupings of
assemblies that define syllables in the tutor song. Following
Fig. 6, the 40 vocal features were grouped into five syllables, indexed
as follows: syllable A, 1-8; B, 9-16; C, 17-24; D, 25-32; E,
33-40. $M^{\mathrm{syl}}_{ij} = 4$ if
i and j belong to the same syllable;
$M^{\mathrm{syl}}_{ij} = -1$ if
i and j belong to different syllables. This is
the matrix of co-fluctuations that would be obtained from the tutor song
depicted in Fig. 6A, if each assembly had an average
activity level of 1. Comparison between matrices was done by taking the
correlation coefficient (CC) between the entries in two matrices. The
CC between any two N × M dimensional
matrices A and B was defined as follows. First
the mean value is subtracted from each element in the matrix: $\hat{A}_{ij} = A_{ij} - (1/NM) \sum_{kl} A_{kl}$; $\hat{B}_{ij} = B_{ij} - (1/NM) \sum_{kl} B_{kl}$. Then

$$\mathrm{CC} = \frac{\sum_{ij} \hat{A}_{ij} \hat{B}_{ij}}{\sqrt{\sum_{ij} \hat{A}_{ij}^{2} \; \sum_{ij} \hat{B}_{ij}^{2}}}$$

For comparisons involving Msyl, the sums excluded the diagonal entries, i = j.
The CC was also used to monitor the connectivity appropriate for the
efference copy mapping and for the intrinsic RA connectivity that
underlies syllable encoding. For the efference copy, we measured the CC
between the matrix of motor → sensory connection strengths from
HVc_RA → HVc_AFP and the HVc_RA → RA connections in the motor pathway. To quantify the development of syllable-based connectivity, we
calculated the CC between Msyl and the
matrix of intrinsic RA connection strengths, again excluding diagonal entries.
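The measures defined in this section can be checked with a short sketch. It assumes that co-fluctuations are averaged over the syllables in an epoch, that the matrix mean is taken over the included entries when the diagonal is skipped, and that a perfect tutor rendition drives the eight assemblies of one syllable at a rate of 5 (so that each assembly averages 1 over the five syllables). Function and variable names are hypothetical.

```python
import math

N_ASSEMBLIES = 40
PER_SYLLABLE = 8  # five syllables (A-E) of eight assemblies each


def ideal_matrix():
    """Msyl: 4 for same-syllable pairs, -1 for different-syllable pairs."""
    return [[4.0 if i // PER_SYLLABLE == j // PER_SYLLABLE else -1.0
             for j in range(N_ASSEMBLIES)] for i in range(N_ASSEMBLIES)]


def cofluctuation_matrix(epoch):
    """Mact for one epoch; `epoch` is a list of per-syllable rate vectors."""
    n = len(epoch[0])
    total = [[0.0] * n for _ in range(n)]
    for rates in epoch:
        mean = sum(rates) / n  # average activity across assemblies
        dev = [r - mean for r in rates]
        for i in range(n):
            for j in range(n):
                total[i][j] += dev[i] * dev[j]
    return [[x / len(epoch) for x in row] for row in total]


def correlation_coefficient(a, b, skip_diagonal=False):
    """CC between the entries of two equally sized matrices."""
    pairs = [(a[i][j], b[i][j])
             for i in range(len(a)) for j in range(len(a[0]))
             if not (skip_diagonal and i == j)]
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    den = math.sqrt(sum((x - mean_a) ** 2 for x, _ in pairs)
                    * sum((y - mean_b) ** 2 for _, y in pairs))
    return num / den


def tutor_syllable(s, rate=5.0):
    """Rate vector for a perfect rendition of syllable s (0 = A, ..., 4 = E)."""
    return [rate if i // PER_SYLLABLE == s else 0.0
            for i in range(N_ASSEMBLIES)]


perfect_song = [tutor_syllable(s) for s in range(5)]
m_act = cofluctuation_matrix(perfect_song)
cc = correlation_coefficient(m_act, ideal_matrix(), skip_diagonal=True)
```

Under these assumptions a perfect rendition reproduces the values 4 and -1 given for Msyl, which supports reading the co-fluctuation as an average rather than a sum over syllables.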
| |
RESULTS |
|---|
|
|
|---|
In presenting the results of our computational model, we focus on the quantitative data produced by a single representative simulation. This allows a step-by-step illustration of song development and demonstrates the mutual consistency of the functional hypotheses described above. After presenting these results, we show how the model reacts to changes in important parameters.
Problem 1: Auditory feedback delay
The first step in our proposed solution to the problem of feedback
delay is the learning of an efference copy mapping. At the beginning of
each simulation, connections in the motor pathway are unstructured and
random HVc_RA activity leads to random patterns of RA output (Fig.
6C). Efference copy learning results from associations between the random patterns of HVc_RA activity and HVc_AFP activity induced by auditory reafference (Fig. 4A). We examined the
development of an efference copy map in two ways (Fig.
8). First, an accurate map should cause
efference copy activity to match the auditory reafference. Figure
8A shows the pattern of vocal output (left column of
each pair, marked V) and HVc_AFP efference copy activity (right column of each pair, marked EC) for five syllables
spanning the period of initial efference copy learning. Note that,
because of our simple encoding scheme, vocal output, RA motor activity, and auditory feedback have equivalent representations (see
METHODS). Initially, both patterns of activity are highly
distributed and unrelated (Fig. 8A, left pairs).
As efference copy learning progresses, the activity remains distributed
(significant syllable learning has not taken place), but the efference
copy activity is highly correlated with the vocal output (Fig.
8A, right pair). Note that a perfect match is not
required (Jordan and Rumelhart 1992); the efference copy
estimate only has to be accurate enough so that, on average, the AFP
will reinforce the proper correlations in RA (see Fig.
9, bottom). Second, the accuracy of the
efference copy mapping can be measured by determining the
similarity between the mapping of HVc_RA onto motor features in RA and
the efference copy mapping of HVc_RA onto sensory features in HVc_AFP.
Figure 8B shows the correlation coefficient (see
METHODS) between the connection strengths from HVc_RA → RA and those from HVc_RA → HVc_AFP. By syllable 500, efference copy
correlation has reached 0.81, 84% of the maximum value (0.96) reached
during the simulation.
Problem 2: Syllable learning in RA
CALCULATION OF THE REINFORCEMENT SIGNAL. The AFP guides syllable learning by transmitting a nonspecific reinforcement signal that uniformly modulates plasticity in all RA assemblies. To calculate the match to the template, each AFP assembly sums the input from HVc_AFP assemblies encoding a distinct tutor syllable (Fig. 6B). The competition mediated by mutual inhibition in the AFP ensures that significant activation of the AFP occurs only if HVc_AFP activity is mostly confined to assemblies corresponding to one (or a few) tutor syllables. The final reinforcement value was obtained by thresholding each AFP assembly's output and summing these thresholded outputs (see APPENDIX for details).
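The reinforcement calculation just described can be sketched as follows. The threshold values, the unit template weights, the feedforward form of the AFP inhibition, and the example activity patterns are assumptions chosen for illustration; the APPENDIX gives the actual procedure.

```python
def reinforcement(hvc_afp, per_syllable=8, inh_threshold=0.0,
                  out_threshold=10.0):
    """Reinforcement from an HVc_AFP efference copy pattern.

    Each AFP assembly sums the HVc_AFP assemblies of one tutor syllable,
    shared inhibition subtracts the average drive, and the thresholded
    outputs are summed into a single scalar.
    """
    n_syllables = len(hvc_afp) // per_syllable
    drive = [sum(hvc_afp[s * per_syllable:(s + 1) * per_syllable])
             for s in range(n_syllables)]
    inhibition = max(0.0, sum(drive) / n_syllables - inh_threshold)
    output = [max(0.0, d - inhibition) for d in drive]
    return sum(max(0.0, o - out_threshold) for o in output)


def pattern(active, rate, n=40):
    """Activity vector with `rate` on the active indices and 0 elsewhere."""
    return [rate if i in active else 0.0 for i in range(n)]


# Three hypothetical patterns with the same total activity.
r_one = reinforcement(pattern(set(range(24, 32)), 5.0))   # syllable D only
r_two = reinforcement(pattern(set(range(0, 4)) | set(range(8, 12)), 5.0))
r_spread = reinforcement([1.0] * 40)                      # fully distributed
```

With these illustrative settings, activity confined to one syllable yields the largest signal, activity split across two syllables yields a smaller one, and fully distributed activity yields none.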
The outcome of this procedure is shown in Fig. 9. Figure 9A shows the vocal output (marked V) and efference copy (marked EC) for 11 consecutive syllables sung during the period of syllable learning. The black bars show the reinforcement signal. This reinforcement is obtained from evaluating the HVc_AFP efference copy activity on the right of each column but is used to modulate associational learning for the RA motor activity generating the vocal output shown on the left. Large reinforcement is obtained when efference copy activity is concentrated within assemblies encoding a single tutor syllable (e.g., syllables 11,006 and 11,009). Smaller reinforcement signals are computed when HVc_AFP activity is distributed among assemblies encoding two syllables (e.g., syllables 11,000 and 11,003). Note that the 11,007th syllable produced by the model was dominated by the motor assemblies encoding D, but the AFP signaled minimal reinforcement because of an inaccurate efference copy representation.
SYNAPTIC REORGANIZATION. Reinforcement-guided syllable learning is shown in Fig. 10. Initially, RA → RA connection
strengths were set to be nearly equal (A,
middle), minimizing the presence of randomly correlated
connections that would have to be "unlearned" (see
METHODS). Note that self-connections are not included in
our model (diagonal entries are zero), since strong self-correlations
would tend to dominate associational learning. Unstructured input from
HVc_RA (A, left) resulted in random patterns of
RA activity (A, right). Because AFP-mediated reinforcement 1) is greatest when assemblies corresponding
to a common tutor syllable are co-active, and 2) results in
large increases in synaptic strength onto active RA assemblies, RA
assemblies began to develop strong connections with other RA assemblies
encoding the same syllable (B, middle).
Reinforcement also guided learning within the projection from HVc → RA,
causing RA assemblies encoding the same tutor syllable to receive
input from similar sets of HVc_RA assemblies and thus to receive
correlated patterns of HVc input (B, left). Both
the recurrent circuitry and HVc_RA input led to RA activity partially
matched to the tutor syllables (B, right). After
learning was complete, HVc_RA input was a mixture of tutor syllable
representations (C, left). Strong intrinsic circuitry (C, middle) amplifies the activity
within assemblies encoding the most strongly driven syllable, and
inhibitory competition suppresses other responses (see
METHODS and APPENDIX). As a result, the model
produced motor output perfectly matched to the syllables in the tutor
song (C, right). Because HVc_RA continues to be
driven by the random premotor drive, syllables are produced in a random sequence. Sequence learning will be addressed in our companion paper
(Troyer and Doupe 2000).
TIME COURSE OF LEARNING. The developmental time course of song learning in our model is shown in Fig. 11. To quantify convergence toward the tutor song, we first computed an "ideal" syllable covariance matrix, Msyl. This is a 40 × 40 matrix containing the covariance in the level of activity between each pairing of the 40 RA assemblies (sampled over a number of consecutive syllables), where it is assumed that the model is producing a perfect rendition of the tutor song. Msyl has strong positive entries for pairs of assemblies belonging to the same tutor syllable and negative entries for pairs belonging to different syllables. We then divided the model output into 250 syllable epochs and computed the matrix of co-fluctuations in activity between each pair of RA assemblies over each epoch. Convergence toward the tutor song was quantified by computing the correlation coefficient between the entries in Msyl and those in the co-fluctuation matrix (Fig. 11, solid line). For detailed definitions of these calculations, see METHODS, Quantifying learning time course. We also computed the correlation coefficient between the pattern of RA connectivity and Msyl (Fig. 11, dashed line). The development of intrinsic RA connectivity is mirrored by the appearance of the corresponding correlations in RA activity.
Problem 3: Separating motor and sensory signals in HVc
HVc_AFP receives two functionally distinct sets of inputs: efference copy inputs from HVc_RA and auditory feedback (Fig. 7). The unmixing of signals is addressed in our model by 1) using weak feedback, and 2) including "adaptation" in HVc_AFP (Fig. 5). The action of the HVc_AFP adaptation mechanism is shown in Fig. 12. Excitation within HVc_AFP assemblies recruits a negative current that decays exponentially (Fig. 12A, bottom). When the efference copy input from HVc_RA correctly predicts the pattern of audit