J Neurophysiol 84: 1204-1223, 2000;
0022-3077/00 $5.00

The Journal of Neurophysiology Vol. 84 No. 3 September 2000, pp. 1204-1223
Copyright ©2000 by the American Physiological Society

An Associational Model of Birdsong Sensorimotor Learning I. Efference Copy and the Learning of Song Syllables

Todd W. Troyer (1,3) and Allison J. Doupe (1,2,3,4)

(1) Department of Psychiatry, (2) Department of Physiology, (3) W. M. Keck Center for Integrative Neuroscience, and (4) Sloan Center for Theoretical Neurobiology at UCSF, University of California, San Francisco, California 94143-0444


    ABSTRACT

Troyer, Todd W. and Allison J. Doupe. An Associational Model of Birdsong Sensorimotor Learning I. Efference Copy and the Learning of Song Syllables. J. Neurophysiol. 84: 1204-1223, 2000. Birdsong learning provides an ideal model system for studying temporally complex motor behavior. Guided by the well-characterized functional anatomy of the song system, we have constructed a computational model of the sensorimotor phase of song learning. Our model uses simple Hebbian and reinforcement learning rules and demonstrates the plausibility of a detailed set of hypotheses concerning sensory-motor interactions during song learning. The model focuses on the motor nuclei HVc and robust nucleus of the archistriatum (RA) of zebra finches and incorporates the long-standing hypothesis that a series of song nuclei, the Anterior Forebrain Pathway (AFP), plays an important role in comparing the bird's own vocalizations with a previously memorized song, or "template." This "AFP comparison hypothesis" is challenged by the significant delay that would be experienced by presumptive auditory feedback signals processed in the AFP. We propose that the AFP does not directly evaluate auditory feedback, but instead, receives an internally generated prediction of the feedback signal corresponding to each vocal gesture, or song "syllable." This prediction, or "efference copy," is learned in HVc by associating premotor activity in RA-projecting HVc neurons with the resulting auditory feedback registered within AFP-projecting HVc neurons. We also demonstrate how negative feedback "adaptation" can be used to separate sensory and motor signals within HVc. The model predicts that motor signals recorded in the AFP during singing carry sensory information and that the primary role for auditory feedback during song learning is to maintain an accurate efference copy. 
The simplicity of the model suggests that associational efference copy learning may be a common strategy for overcoming feedback delay during sensorimotor learning.


    INTRODUCTION

The combination of a well-characterized, stereotyped behavior and specialized anatomy makes birdsong an ideal system in which to study the neural basis of motor learning. Moreover, song learning shares important similarities with human speech learning (Doupe and Kuhl 1999). In birds, vocal learning is accomplished in two phases. During an initial, sensory phase, birds listen to and memorize a tutor song, often called the "template" (Konishi 1965; Marler 1964). In a later, sensorimotor phase, birds gradually match their vocalizations to the memorized song, using auditory feedback from their own vocalizations (Fig. 1, A and B). We have constructed a computational model demonstrating that simple associational (Hebbian) learning rules are sufficient to address important problems related to the sensorimotor learning of song. Our model focuses on the zebra finch, a species commonly used in physiological investigations of song learning. Zebra finch song consists of a stereotyped sequence of vocal gestures or "syllables." In this paper, we focus on the learning of the individual syllables. In the following companion paper (Troyer and Doupe 2000), we extend our model to include sequence learning.



Fig. 1. The song system. A: developmental time course. During sensory learning, birds memorize a song from their tutor. Our model assumes that this process has already been completed. During sensorimotor learning, birds use auditory feedback from their own vocalizations to match their song to the memorized template. These stages of learning may overlap. After learning, song "crystallizes," becoming more stable and less dependent on auditory feedback. B: behavioral schematic of sensorimotor learning (cf. Konishi 1965). C: song system anatomy: Anterior Forebrain Pathway (AFP) (gray); motor pathway (white). Field L (black) receives input from auditory thalamus and provides direct and/or indirect auditory input to HVc (Fortune and Margoliash 1995; Janata and Margoliash 1999; Vates et al. 1996). D: schematic of the "AFP comparison hypothesis." Note that the 100-ms estimated latency (see Model and approach) for motor signals to leave the robust nucleus of the archistriatum (RA), return as auditory feedback via Field L, and then be processed in the AFP is nearly as long as a typical song syllable. Thus, the evaluation of auditory feedback from one syllable would arrive in RA during the motor activity for the subsequent syllable.

The likely neural substrate for sensorimotor learning is the song system, a set of brain nuclei specialized for vocal learning and production (Nottebohm et al. 1976) (Fig. 1C). The motor pathway for song includes the direct projection from nucleus HVc (used as a proper name; Margoliash et al. 1994) to the robust nucleus of the archistriatum (RA). Both nuclei display neural activity time-locked to song production (McCasland 1987; Yu and Margoliash 1996), and lesions in either nucleus disrupt normal song production at all stages of development (Nottebohm et al. 1976; Simpson and Vicario 1990). HVc and RA are also connected by an indirect pathway, the Anterior Forebrain Pathway (AFP). Lesion studies indicate that the AFP is crucial for song learning, but is not necessary for normal song production in adults (Bottjer et al. 1984; Scharff and Nottebohm 1991; Sohrabji et al. 1990). These and other data (see Biologically supported assumptions) have led to the "AFP comparison hypothesis," in which the AFP guides sensorimotor learning by transmitting a comparison between auditory feedback from the bird's own vocalizations and the memorized template (Bottjer and Arnold 1986; Doupe 1993; Mooney 1992; Nordeen and Nordeen 1988; Saito and Maekawa 1993). These comparison signals are used to guide learning in the motor pathway at the level of RA (Fig. 1, C and D).

The AFP comparison hypothesis is challenged by a fundamental problem in motor learning, the problem of feedback delay (Lashley 1951; Miall and Wolpert 1996; Miles and Evarts 1979). In zebra finches, the 100-ms estimated latency (see Fig. 2) for presumptive AFP comparison signals to arrive in the motor pathway after a motor command is nearly as long as a typical song syllable. This delay would cause comparison signals for one syllable to have greatest overlap with the neural activity for the subsequent syllable and poses a significant challenge to the notion that AFP comparison signals guide learning in RA (see Bottjer and Arnold 1986). In our model, we retain the hypothesis that the AFP plays an important role in template comparison but propose that instead of waiting for the actual auditory feedback, an internal prediction or "efference copy" of the auditory feedback is generated within HVc to guide song learning. Therefore, we predict that the signals recorded in the AFP during singing (Hessler and Doupe 1999a,b) are motor signals that also carry sensory information. Furthermore, our model suggests a functional reason for why the AFP is located downstream of the motor nucleus HVc (Fig. 1, C and D): use of an efference copy requires that brain areas involved in template comparison receive motor efferents.

Preliminary versions of this work have been presented in conference proceedings (Troyer et al. 1996a,b).

Model and approach

Over the past 25 years, anatomical, lesion, and in vivo physiology studies have yielded a wealth of data concerning the functional anatomy of the song system. However, current hypotheses regarding the sensory-motor interactions during song learning lack detail. To explore these issues, we set out to build a computational model of the sensorimotor phase of song learning. Our goal was to determine if basic theoretical problems in sensorimotor learning could be solved using simple rules of associational plasticity, constrained by the known anatomy of the song circuit. We hoped to direct future experiments by identifying important gaps in our knowledge, as well as to evaluate previous experimental results from a computational point of view.

Our efforts resulted in two closely related models, addressing the problem of song learning at different levels of abstraction. The first model is a purely "conceptual model," i.e., a self-consistent set of functional hypotheses conforming to a wide range of experimental results. The functional hypotheses contained in this model constitute the core contribution of our research. The second model is a true "computational model" that incorporates these hypotheses into a working computer algorithm. Due to the very limited knowledge of the song system at the level of local circuits, implementing this algorithm required a number of specific assumptions that reach beyond current experimental knowledge. As a result, several aspects of the computational model are not well-constrained by biology. Moreover, we made a number of simplifying assumptions to ensure that simulations could be run in a reasonable amount of time. However, the computational model played an important role in exploring our initial functional ideas and serves to illustrate our core conceptual hypotheses. Perhaps more importantly, the construction of a working computational algorithm demonstrates the mutual consistency of our hypotheses, as well as providing a theoretical demonstration that they are sufficient to account for important aspects of song learning. This dual approach not only highlights general problems of sensorimotor learning and generates testable predictions at a functional level, it also provides a framework for understanding how specific biological mechanisms may contribute to their solution. These models are only a first step, and, of necessity, contain many simplifications. However, taken together, they constitute the most detailed set of hypotheses to date regarding the interaction of sensory and motor signals during the sensorimotor phase of song learning.

In this section, we present the justification for our working biological assumptions. We then describe the main problems addressed by our model and outline the key elements of our proposed solution. Finally, we present our conceptual model, which describes our functional hypotheses in greater detail. In the METHODS section, we outline the theoretical assumptions incorporated into our working computational model, including a description of the network architecture and the simple encoding scheme used to represent song. In the RESULTS section, we present quantitative results generated by our computational model. Details of the computer algorithm are confined to an APPENDIX.

Biologically supported assumptions

Although the nature of template memorization is largely unknown, various lines of evidence suggest that the AFP may transmit a comparison between the bird's own vocalizations and the memorized tutor song. We call such signals "template comparison signals." Initial evidence suggesting a role for the AFP in template comparison came from lesion experiments: AFP lesions in juvenile zebra finches disrupt song learning, whereas lesions in adult birds have little effect on normal song production (Bottjer et al. 1984; Nottebohm et al. 1976; Scharff and Nottebohm 1991; Sohrabji et al. 1990). Further experiments have shown that the lateral portion of the magnocellular nucleus of the anterior neostriatum (LMAN), the output nucleus of the AFP, appears to be necessary any time the song changes, even in adulthood (Brainard and Doupe 2000; Morrison and Nottebohm 1993; Williams and Mehta 1999). Other experiments suggest that circuitry within the AFP may function as a template: AFP neurons develop song selective auditory responses during song learning (Doupe 1997; Solis and Doupe 1997), and a subset of these neurons respond vigorously to the tutor song (Solis and Doupe 1997, 1999). Using a more direct approach, Basham et al. (1996) showed that local blockade of N-methyl-D-aspartate (NMDA) receptors in the AFP specifically during song memorization disrupts normal song learning.

Within the framework of our model, the simplest hypothesis is that the AFP not only transmits a template comparison signal, but that it also computes the match between the efference copy and the memorized template, i.e., the AFP is the storage site for the tutor template. We did not attempt to model the AFP circuitry that subserves template comparison but rather viewed the AFP as a "black box" that performs the necessary calculations. An alternative hypothesis that is still consistent with the basic structure of our model is that the AFP transmits a template comparison signal, but that memorized template information is stored closer to the auditory periphery than the AFP (see DISCUSSION).

Additional studies into the functional anatomy of the song system have shown that the neurons that project to RA and those that project to the AFP form distinct populations within HVc (Nordeen and Nordeen 1988). We denote these two populations HVc_RA and HVc_AFP. While the evidence is indirect, these two populations are likely to be highly interconnected (Fortune and Margoliash 1995; Vu and Lewicki 1994). Various data suggest that activity within HVc_RA neurons is more closely tied to motor behavior, whereas activity within HVc_AFP neurons is more closely tied to auditory input (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993; but see Doupe and Konishi 1991; Vicario and Yohay 1993). Moreover, experiments in singing birds suggest that the motor pathway is arranged hierarchically, with RA encoding the detailed motor program for each song syllable, and the central pattern generator for song sequence lying upstream of RA, perhaps in HVc (Vu et al. 1994; Yu and Margoliash 1996).

The main biologically supported assumptions that are incorporated into the model are summarized in Table 1.


                              
Table 1. Biologically supported assumptions

The final data included in the model were the estimated latencies between various song nuclei (Fig. 2A). We included only the best studied neural pathways in the song system, as the functional significance of other signaling pathways remains unclear (see Foster and Bottjer 1998; Foster et al. 1997; Striedter and Vu 1998; Vates et al. 1997). We used 50 ms for the latency from HVc premotor activity to vocal output (McCasland 1987; McCasland and Konishi 1981), and 15 ms for auditory latencies to HVc (Margoliash and Fortune 1992). Estimating the processing time through the AFP during song was more problematic, since activity in LMAN, the output nucleus of this pathway, is quite variable. We used 45 ms for the latency to LMAN (A. J. Doupe 1997; personal observations). Subtracting 15 ms for the latency to HVc and adding 10 ms for the delay between LMAN and RA, we obtained a processing time through the AFP of roughly 40 ms. Simulated syllables were 80-ms long with a 35-ms gap between syllables (Fig. 2B), typical of mean values for zebra finch song (M. Brainard, personal communication; Scharff and Nottebohm 1991; Zann 1993). These timing data suggest that, on average, presumptive template comparison signals from the AFP will have the greatest overlap with motor activity for the subsequent syllable (Fig. 2C, dotted box).
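The latency bookkeeping in this paragraph can be checked with a few lines of arithmetic. The Python sketch below is an illustration only and is not part of the published model: the variable names and the interval-overlap helper are hypothetical, and it assumes that the 100-ms loop delay is measured from the onset of RA motor activity, as in Fig. 2C.

```python
# Latency estimates quoted in the text (ms); variable names are hypothetical.
LATENCY_TO_LMAN = 45      # auditory latency to LMAN
LATENCY_TO_HVC = 15       # auditory latency to HVc
LMAN_TO_RA = 10           # delay between LMAN and RA
HVC_TO_VOCAL = 50         # HVc premotor activity to vocal output
SYLLABLE = 80             # simulated syllable length
GAP = 35                  # gap between syllables
LOOP_DELAY = 100          # RA -> auditory feedback -> AFP -> RA (Fig. 2C)

afp_processing = LATENCY_TO_LMAN - LATENCY_TO_HVC + LMAN_TO_RA   # 40 ms
feedback_to_hvc = HVC_TO_VOCAL + LATENCY_TO_HVC                  # 65 ms


def overlap(a, b):
    """Length (ms) of the intersection of two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# RA motor activity for syllable n and n+1, with syllable n starting at t = 0.
current = (0, SYLLABLE)
following = (SYLLABLE + GAP, 2 * SYLLABLE + GAP)
# Comparison signal for syllable n, shifted by the loop delay.
comparison = (LOOP_DELAY, LOOP_DELAY + SYLLABLE)

overlap_current = overlap(comparison, current)       # 0 ms
overlap_following = overlap(comparison, following)   # 65 ms
```

Under these numbers the delayed comparison signal does not overlap the syllable that produced it at all, and overlaps the following syllable for 65 of its 80 ms.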



Fig. 2. Timing within the song system. A: numbers represent estimated latencies between song nuclei (see RESULTS); 40 ms represents the entire processing time for signals passing through the AFP. B: the length of model syllables (M. Brainard, personal communication; Scharff and Nottebohm 1991; Zann 1993). C: time delay (100 ms) for motor activity to return to RA via auditory feedback (45 + 15 ms) and the AFP (40 ms). A signal transmitted by the AFP that carries the match between the syllable just sung and the memorized template will arrive in RA during the motor activity for the next syllable (dotted box).

Problems addressed

In this paper, we address the problem of learning a collection of motor representations corresponding to song syllables stored within a memorized template. For simplicity, we do not address learning the detailed temporal structure within each syllable, nor learning the length of syllables and inter-syllable gaps. Our model rests on two key assumptions: 1) song learning is accomplished using simple associational learning rules and 2) the AFP guides song learning by transmitting a signal that carries information about the match between the bird's auditory feedback and a stored template. Here, we present a brief outline of the main problems addressed by our model and the key functional hypotheses that underlie our solutions (see Table 2). More detail regarding our hypothesized solutions is presented in the form of a conceptual model (see Conceptual model) and a computational model (see RESULTS). The presentation of both models is structured according to the following outline.


                              
Table 2. Functional hypotheses for syllable learning

The first problem we address is the important problem of auditory feedback delay: presumptive AFP comparison signals would arrive in RA during the neural activity for the next syllable (Fig. 2C). We hypothesize that the AFP does not directly evaluate auditory feedback, but instead, receives an internally generated prediction of the sensory feedback resulting from song-related motor activity (Table 2, number 1). Such an internal prediction requires a transformation from motor to sensory coordinates and has been termed efference copy (Sperry 1950), "corollary discharge" (von Holst and Mittelstaedt 1980), or the result of a "forward model" (reviewed in Jordan 1995; Miall and Wolpert 1996). We will use the term efference copy. Sensory signals resulting from motor behavior have been termed sensory "reafference" (von Holst and Mittelstaedt 1980). We further hypothesize that the motor → sensory efference copy develops between the two populations of HVc projection neurons (Table 2, number 2). To learn this mapping, it is important that our associational plasticity rule is "temporally asymmetric," i.e., presynaptic activity must be followed by postsynaptic activity to induce plasticity (Table 2, number 3).

The second problem we address is the nature of AFP-guided syllable learning in RA. We make two functional hypotheses. First, we hypothesize that syllable learning is guided by nonspecific reinforcement signals provided by the AFP that modulate the degree of ongoing associational plasticity throughout RA (Table 2, number 4; see Sutton and Barto 1998, for an overview of reinforcement learning). This hypothesis is motivated by the fact that nonspecific reinforcement signals, while generated by a match to a sensory template, do not have to be directed toward specific patterns of RA motor neurons. As a result, no sensory → motor mapping is required to guide learning. Second, we hypothesize that synapses intrinsic to RA play an important role in storing syllable representations (Table 2, number 5). This hypothesis was motivated by the need to learn a number of discrete patterns of neural activity corresponding to the syllables in the tutor template and is consistent with estimates that up to 85% of synapses in RA come from local collaterals of other RA neurons (Herrmann and Arnold 1991). Theoretical models have shown that recurrent activity is ideal for stabilizing such patterns (e.g., Hopfield 1984). Moreover, if the representation for individual syllables is encoded in the pattern of intrinsic RA synapses, plasticity in the synapses connecting HVc and RA can alter the sequence of syllables produced, with only minor disruption to the representation for each individual syllable (see Troyer and Doupe 2000).
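The claim that recurrent connections can stabilize a small set of discrete activity patterns can be illustrated with a toy network. The Python sketch below is not the published algorithm: it assumes binary (0/1) units, hand-wired excitation between units belonging to the same syllable, and a global inhibition term whose strength (0.4) is chosen purely for illustration. The 5 x 8 layout mirrors the encoding used later in the paper.

```python
N_SYLLABLES, FEATURES_PER_SYLLABLE = 5, 8
N_UNITS = N_SYLLABLES * FEATURES_PER_SYLLABLE


def syllable_of(unit):
    """Index of the tutor syllable that a unit's feature belongs to."""
    return unit // FEATURES_PER_SYLLABLE


def step(activity, inhibition=0.4):
    """One synchronous update: same-syllable excitation minus global inhibition."""
    total = sum(activity)
    new = []
    for i in range(N_UNITS):
        excitation = sum(
            activity[j] for j in range(N_UNITS)
            if j != i and syllable_of(j) == syllable_of(i)
        )
        new.append(1 if excitation - inhibition * total > 0 else 0)
    return new


# Noisy start: 5 of the 8 units for syllable index 1, plus two stray units.
start = [0] * N_UNITS
for unit in (8, 9, 10, 11, 12, 0, 30):
    start[unit] = 1

state = step(step(start))
active_units = [i for i, a in enumerate(state) if a]   # units 8..15
```

Starting from an incomplete and contaminated pattern, the recurrent dynamics fill in the missing units of one syllable and silence the stray units, which is the sense in which a stored pattern acts as an attractor.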

The third problem we address results from the competing requirements of both learning and using the efference copy signal. Learning an efference copy mapping by associating motor activity with delayed auditory feedback implies that auditory inputs induce significant levels of activity. However, when using the short-latency efference copy signals to guide syllable learning, the strong auditory inputs will interfere with the efference copy signal. We address this problem by assuming that the auditory feedback signal is relatively weak and/or that the response of HVc_AFP neurons is strongly adapting (Table 2, number 6).

Conceptual model

Our model focuses on four neural populations (Fig. 3): nucleus RA in the motor pathway, separate populations of HVc projection neurons projecting to RA and the AFP (Nordeen and Nordeen 1988), and a single population representing the output of the AFP. Because we do not explicitly model nuclei downstream of RA, activity in RA represents the motor output of the model. In this paper, we explore the functional consequences of associational plasticity in three sets of connections: HVc_RA → HVc_AFP, HVc_RA → RA, and intrinsic RA → RA connections.



Fig. 3. Network architecture. Black arrows: plastic connections. Gray arrows: nonplastic connections. AFP → RA connections transmit a reinforcement signal that modulates plasticity in RA but does not affect RA activity patterns. Plastic connections from HVc_AFP → HVc_RA and from AFP → RA (not shown) are considered in the following companion paper (Troyer and Doupe 2000).

Our model does not address the learning of syllable timing. We assume that timing is provided by rhythmically clocked bursts of premotor activity arriving in HVc_RA, with the duration of each burst controlling the duration of premotor activity and hence the length of song syllables (Fig. 2B). While the source of the premotor drive is not explicitly modeled, the song nuclei nucleus uvaeformis (Uva) and/or nucleus interfacialis (NIf) are likely candidates (McCasland 1987; Striedter and Vu 1998; Williams and Vicario 1993). Input from the forebrain nucleus medial MAN is also a possible source. Although the timing of this drive is fixed, we assume that HVc_RA neurons receive varying magnitudes of drive, and these magnitudes are generated independently for each HVc_RA neuron and each vocalization produced by the model. Thus, HVc_RA produces random patterns of premotor activity that are independent from one syllable to the next. The model's task is to use template comparison signals generated by the AFP to reorganize the connections in the motor pathway so that 1) random HVc_RA activity is converted into a handful of stereotyped patterns of RA motor activity, and 2) these stereotyped patterns of RA activity lead to vocal output matched to the memorized template. Note that HVc_RA activity becomes ordered when we address the problem of sequence learning (Troyer and Doupe 2000).

PROBLEM 1: AUDITORY FEEDBACK DELAY. To address the problem of feedback delay, we hypothesize that an efference copy mapping is learned between the two populations of HVc projection neurons (Table 2, numbers 1 and 2). Since the connections in the motor pathway are initially unstructured, the random patterns of HVc_RA activity lead to a random exploration of motor space (cf. Bullock et al. 1993; Kuperstein 1988; Salinas and Abbott 1995). Activity flows down the motor pathway (McCasland 1987) and returns to HVc_AFP as auditory feedback (Fig. 4A, dark lines). While the exact form of the learning is not crucial for our model, it is important that associational learning is temporally asymmetric (Table 2, number 3), i.e., synaptic strengths increase only when presynaptic activity precedes postsynaptic activity (Bi and Poo 1998; Debanne et al. 1998; Gustafsson et al. 1987; Hebb 1949; Markram et al. 1997). By strengthening synapses onto neurons that are likely to fire in the near future, temporally asymmetric "Hebbian" learning strengthens synaptic inputs that "anticipate" any postsynaptic activity that regularly follows presynaptic spiking (cf. Blum and Abbott 1996; Gerstner and Abbott 1997). In our model, auditory feedback to HVc_AFP neurons encoding the sensory aspects of a particular vocal gesture will follow spiking in HVc_RA neurons encoding motor aspects of that gesture. Associational learning then strengthens the synapses from those (presynaptic) HVc_RA neurons onto the corresponding (postsynaptic) neurons in HVc_AFP (Fig. 4A, white arrow). After this motor → sensory mapping is learned, activity within HVc_RA motor neurons will drive, with short latency, the HVc_AFP neurons encoding the corresponding sensory representation. This short-latency motor activity in HVc_AFP constitutes a sensory prediction of the auditory reafference. This efference copy can then be passed on to the AFP and used to guide learning in RA.
Note that efference copy learning occurs within HVc and proceeds without reference to the tutor template stored in the AFP. Using efference copy in this way splits the total feedback delay for AFP comparison signals to return to RA into two shorter delays: the auditory feedback delay of 65 ms to HVc (Fig. 4A) and the 40-ms processing delay from HVc through the AFP (Fig. 4B).
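The way a temporally asymmetric rule can recover a motor-to-sensory mapping from delayed feedback can be sketched as follows. This Python example is a simplified illustration, not the algorithm described in the APPENDIX: the 100-ms plasticity window, the learning rate, the mean-subtracted random drive, and the one-to-one toy mapping between motor and feedback units are all assumptions made for the sketch. Only the 65-ms feedback delay is taken from the text.

```python
import random

FEEDBACK_DELAY_MS = 65.0   # HVc premotor activity -> auditory feedback in HVc (text)


def asymmetric_hebb(pre, post, dt_ms, rate=0.05, window_ms=100.0):
    """Weight change for one pairing, where dt_ms = t_post - t_pre.

    Potentiation occurs only when presynaptic activity precedes postsynaptic
    activity within the window; the window length is an assumed parameter.
    """
    if 0.0 < dt_ms <= window_ms:
        return rate * pre * post
    return 0.0


def learn_efference_copy(n_units=5, n_syllables=500, seed=1):
    rng = random.Random(seed)
    # Toy motor -> sensory map, unknown to the learner: feedback unit i
    # reports the activity of motor unit true_map[i].
    true_map = list(range(n_units))
    rng.shuffle(true_map)
    weights = [[0.0] * n_units for _ in range(n_units)]   # weights[post][pre]
    for _ in range(n_syllables):
        # Mean-subtracted random premotor drive (an assumption of this sketch).
        motor = [rng.random() - 0.5 for _ in range(n_units)]
        feedback = [motor[true_map[i]] for i in range(n_units)]
        for i in range(n_units):
            for j in range(n_units):
                weights[i][j] += asymmetric_hebb(
                    motor[j], feedback[i], FEEDBACK_DELAY_MS
                )
    # For each feedback unit, the motor unit with the strongest learned synapse.
    learned_map = [max(range(n_units), key=lambda j: row[j]) for row in weights]
    return true_map, learned_map
```

Because feedback always follows the motor activity that caused it, the strongest learned synapse onto each feedback unit comes from the motor unit that drives it, so the learned weights can serve as a short-latency prediction of that feedback. Reversing the timing (a negative `dt_ms`) produces no weight change.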



Fig. 4. Two-step solution to the problem of feedback delay. A: step 1: efference copy learning. Each syllable is initiated by a random premotor drive to HVc_RA. This signal travels through the motor and auditory feedback pathways (black arrows) arriving in HVc_AFP with a delay of 65 ms. Motor nuclei downstream of RA are not explicitly modeled. Associational learning (white arrow) between premotor HVc_RA activity and HVc_AFP activity driven by auditory feedback results in an efference copy mapping. B: step 2: learning syllable representations. The efference copy is passed on to the AFP, and the match with the stored template serves as a reinforcement signal (line with round end) that modulates plasticity signals in RA. This modulation reorganizes intrinsic connections within RA, as well as the projection from HVc (white arrows).

PROBLEM 2: SYLLABLE LEARNING IN RA. To guide syllable learning, the AFP evaluates the efference copy and transmits a reinforcement signal to RA (Table 2, number 4). This nonspecific reinforcement signal is assumed to modulate the degree of ongoing associational plasticity throughout RA. An efference copy that is well-matched to the tutor song results in a large plasticity signal in RA neurons that are significantly activated, leading to a potentiation of recently activated synapses; a poor match evokes small potentiation or depression. Since a good match to the tutor song occurs when the RA neurons that encode a single tutor syllable are co-active, reinforcement leads to the development of strong connections between RA neurons encoding the same tutor syllable (Table 2, number 5). Reinforcement also reorders the connections from HVc_RA → RA (see RESULTS). These patterns of connectivity result in a strong tendency for RA to produce coherent patterns of motor activity matched to the template, i.e., the tutor syllables have become "attractors" for the neural dynamics within RA (see, e.g., Amit 1989).
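A nonspecific reinforcement signal that scales ongoing Hebbian plasticity can be written in a few lines. The Python sketch below is a hedged illustration rather than the published update rule: the scalar `match` stands in for the output of the AFP "black box," and the `baseline` and `rate` parameters are hypothetical values chosen so that a good match potentiates and a poor match depresses.

```python
def reinforced_update(weights, activity, match, baseline=0.5, rate=0.1):
    """Return new RA -> RA weights after one syllable.

    `match` is a scalar in [0, 1] standing in for the AFP's template
    comparison; `baseline` sets the match level below which co-active
    synapses are depressed. Both are assumptions of this sketch.
    """
    n = len(activity)
    gain = rate * (match - baseline)        # one global signal for all synapses
    return [
        [
            weights[i][j] + (gain * activity[i] * activity[j] if i != j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


zeros = [[0.0] * 3 for _ in range(3)]
ra_activity = [1.0, 1.0, 0.0]               # units 0 and 1 co-active, unit 2 silent

after_good = reinforced_update(zeros, ra_activity, match=0.9)
after_poor = reinforced_update(zeros, ra_activity, match=0.1)
```

The same scalar is applied to every synapse, so no sensory-to-motor mapping is needed: only synapses between co-active units change, and the sign of the change depends on the quality of the match.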

PROBLEM 3: SEPARATING MOTOR AND SENSORY SIGNALS IN HVC. In our model, HVc_AFP neurons receive two distinct inputs: auditory feedback, which drives efference copy learning, and motor input from HVc_RA, which carries the efference copy used for AFP-driven song learning. While necessary for efference copy learning, the delayed auditory signal can interfere with the efference copy signal used to guide learning. We propose two strategies for separating sensory and motor signals within HVc_AFP (Table 2, number 6). First, the auditory feedback signal is set significantly weaker than the efference copy signal. Hence, auditory feedback only weakly perturbs the efference copy, which can remain sufficiently accurate to guide syllable learning. However, weak auditory feedback is able to guide efference copy learning by providing, over the course of multiple syllables, a consistent association between HVc_RA motor activity and the resulting weak sensory activation. The second strategy is based on the cancellation of auditory feedback signals in HVc_AFP by "adaptation." Specifically, adaptation in the HVc_AFP circuitry results in a "negative after-image" of any given pattern of HVc_AFP activity (Fig. 5), which has a decay time (100 ms) similar to the length of a typical song syllable (see APPENDIX for implementation). A variety of biological mechanisms could provide this kind of adaptation, e.g., spike-triggered or voltage-dependent intrinsic currents and/or slow feedback inhibition. Such mechanisms have been shown to be present within HVc (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998; Schmidt and Perkel 1998). Because the efference copy arrives in HVc_AFP with a shorter delay than the auditory feedback, the after-image of the efference copy will counteract the corresponding auditory reafference. 
That is, HVc_AFP neurons strongly activated by efference copy input from HVc_RA will be in an adapted state by the time that the corresponding patterns of delayed auditory feedback arrive in HVc_AFP. Note that an inaccurate efference copy will lead to an incomplete cancellation of auditory feedback, and interference from this delayed feedback will create an inaccurate efference copy. However, associations between the uncanceled feedback signal and the HVc_RA motor activity that gave rise to it will lead to new plasticity that improves the quality of future efference copy predictions. Details of how this cancellation mechanism works in the context of our computer algorithm are presented in the RESULTS.
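The cancellation argument can be made concrete with a small numerical sketch. The Python example below is an illustration under stated assumptions, not the implementation given in the APPENDIX: it models the negative after-image as an exponentially decaying copy of the efference copy drive, takes the 100-ms decay time and the 60-ms lag from the text and Fig. 5, and assumes a hypothetical feedback gain of 0.55 so that weak auditory feedback roughly matches the decayed after-image.

```python
import math

ADAPTATION_TAU_MS = 100.0   # decay time of the negative after-image (text)
FEEDBACK_LAG_MS = 60.0      # feedback arrives this long after the copy (Fig. 5)


def residual_feedback(copy, feedback, feedback_gain=0.55):
    """Activity left in each unit when delayed feedback meets the after-image.

    `copy` is the efference copy drive, `feedback` the sensory pattern actually
    produced. `feedback_gain` < 1 encodes the assumption that auditory input
    is weaker than the efference copy; its value is chosen for illustration.
    """
    decay = math.exp(-FEEDBACK_LAG_MS / ADAPTATION_TAU_MS)   # about 0.55
    return [feedback_gain * f - decay * c for c, f in zip(copy, feedback)]


def size(vector):
    """Euclidean length of a list of numbers."""
    return math.sqrt(sum(v * v for v in vector))


sensory = [1.0, 0.0, 0.5]
accurate = residual_feedback(copy=[1.0, 0.0, 0.5], feedback=sensory)
inaccurate = residual_feedback(copy=[0.0, 1.0, 0.0], feedback=sensory)
```

With an accurate prediction the residual is close to zero, whereas an inaccurate prediction leaves a large difference signal, which in the model is what drives further efference copy learning.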



Fig. 5. Separating efference copy and delayed auditory feedback. Given our estimates (Fig. 2A), auditory feedback will reach HVc_AFP 60 ms after the efference copy input from HVc_RA. Activity in HVc_AFP results from a mixture of these two signals (see Fig. 7 below). Adaptation mechanisms in HVc_AFP produce a delayed, negative image of HVc_AFP activity, which is subtracted from the auditory feedback. Accurate efference copy predictions can cancel auditory feedback; inaccurate predictions yield a difference signal that drives new efference copy learning.


    METHODS

The main assumptions that were necessary to construct our computational algorithm are summarized in Table 3 and are discussed below. Only the subsections explaining our method of neural encoding (see Neural encoding, Fig. 6) and the nature of HVc_AFP activity (see Tonic activity patterns, Fig. 7) are necessary for understanding the main computational results presented in the RESULTS. Other subsections describe issues of mainly theoretical interest. In the final subsection of the METHODS, we provide formulas for our method for characterizing the developmental time course in the model. Details of the computational algorithm are presented in the APPENDIX. The assumptions outlined in Table 3 are not crucial for the main predictions of our model; alternative algorithms that implement our functional hypotheses for song learning are possible. Our particular algorithm should be seen as a first approximation, one that allows us to explore associational learning between patterns of sensory and motor activity on the time scale of tens to hundreds of milliseconds.


                              
Table 3. Theoretical assumptions



Fig. 6. Encoding the problem of sensorimotor learning. A: representation of the tutor song. Ten consecutive syllables in the tutor song ( ... ABCDE ... ). For simplicity, we assume that each tutor syllable contains a nonoverlapping set of vocal features. These are numbered according to tutor syllable (features in syllable A numbered 1-8, features in B numbered 9-16, etc.). B: neural encoding and template storage. HVc_AFP and RA contain 40 assemblies, one for each of the 40 vocal features in the tutor song. The auditory feedback pathway connects each RA assembly (motor representation) with its corresponding HVc_AFP assembly (sensory representation). The AFP contains 5 assemblies, 1 for each tutor syllable. The connections from HVc_AFP to the AFP determine how vocal features are matched to tutor syllables, i.e., these connections store the template information. Connections to syllable B are shown as an example. C: motor output of the model (RA activity) for the first 10 syllables produced. Each column shows the pattern of RA activity for one particular syllable. Each row represents the activity of a particular RA assembly over the 10 syllables shown. Since connections in HVc and RA are unstructured, random patterns of premotor drive lead to RA activity that is initially unstructured. Using reinforcement signals, the model must "transfer" template information stored in sensory coordinates in the AFP to the motor pathway.



Fig. 7. Mixing of signals in HVc_AFP. Due to the 60-ms delay between direct input from HVc_RA (5 ms) and auditory feedback (5 + 45 + 15 ms), separate calculations of HVc_AFP activity were made for the early (E), middle (M), late (L), and gap (G) portions of the premotor activity corresponding to each syllable. The efference copy output of HVc_AFP compared with the template in the AFP was calculated as the average value of HVc_AFP activity over the early and middle portions of the syllable. This activity will reach RA before the onset of the next syllable.

Each simulation consisted of repeated iterations of a computer subroutine that 1) calculated activity patterns related to a single syllable output by the model, 2) applied our synaptic plasticity rule, and 3) updated the various homeostatic mechanisms in the model. The details of the algorithm and the specification of model parameters are given in the APPENDIX. In most simulations, the subroutine was iterated for 25,000 syllables, ~5,000 more than were typically needed for model output to become stereotyped. When performance was degraded by changing parameters (see APPENDIX), simulations were extended to 50,000 syllables, but output sometimes lacked stereotypy. Computer simulations were written using the MATLAB simulation environment (version 5.3; The Mathworks, Natick, MA). Typical simulations took ~2 h when run using a 400-MHz Pentium II processor.

Neural encoding

Activity in the model was represented by the output of a number of neural "units." Each of these units is meant to represent the activity within a network of connected neurons or "cell assembly" (Hebb 1949). Hereafter, we will use the term "assemblies." Given the lack of data concerning the neural code for vocal gestures in the song system, we sought the simplest encoding scheme that could support associational learning (Table 3, number 1). Each vocal gesture produced by the model is viewed as a combination of 40 abstract "vocal features," with each RA assembly representing motor-related aspects of one feature, and each HVc_AFP assembly representing sensory-related aspects of one feature. Because of this one-to-one mapping of auditory and motor features, motor activity in a given RA assembly leads to auditory feedback input to the unique corresponding assembly in HVc_AFP. The tutor song consisted of five syllables, within the normal range for zebra finch song (3-9; Price 1979). We denote these syllables by the letters A-E and assumed that each tutor syllable was encoded by a distinct set of assemblies, allowing us to number vocal features consecutively, i.e., tutor syllable A contains vocal features 1-8, tutor syllable B contains features 9-16, etc. (Fig. 6A). The tutor template is stored in the AFP, with tutor syllables encoded in the connections from HVc_AFP: each AFP assembly corresponds to a single tutor syllable and receives input from the HVc_AFP assemblies representing the auditory features comprising that syllable (Fig. 6B). Connections related to syllable B are shown as an example. 
Our choice of this very simple representation was guided by the following considerations: 1) due to the complexity of the network and finite computational resources, our model contains only a limited number of assemblies; 2) since learning correlated patterns with Hebbian learning rules is a largely unsolved theoretical problem, we chose an encoding scheme in which uncorrelated patterns of motor activity result in uncorrelated patterns of sensory feedback; 3) our encoding scheme ensures decorrelation in the motor → sensory mapping even for assemblies using nonlinear input-output functions.

Initially, all connections in the motor pathway are unstructured. Thus, random activity in HVc_RA leads to random motor activity in RA (Fig. 6C). The model's task is to 1) compare sensory signals with the stored template in the AFP, and 2) use the outcome of this comparison to guide plasticity in the motor pathway so that random HVc_RA activity is converted to stereotyped patterns of RA activity matched to the tutor song.

Tonic activity patterns

For simplicity, we assume that song-related activity is encoded by the neural firing rates averaged over the course of each song syllable. Thus, the activity within each of the four neural populations is modeled as a vector of firing rates, with one entry for each assembly in the population. For all populations except HVc_AFP, firing rates are assumed to be constant during the period of premotor drive for each syllable and zero during the gap between syllables. In HVc_AFP, we divided each syllable into four time epochs depending on the combination of efference copy (related to the current syllable) and auditory feedback input received during that syllable (Fig. 7). During the early part of each syllable (marked E), HVc_AFP receives efference copy input from HVc_RA that relates to the current syllable, while the sensory input is due to delayed auditory feedback from the previous syllable. The middle portion of each syllable (marked M) corresponds to the period of silence in the delayed feedback. During this period, HVc_AFP receives efference copy input only. During the late part of the syllable (marked L), the efference copy and auditory inputs correspond to the same syllable. Finally, during the "gap" period between bursts of HVc_RA activity (marked G), HVc_AFP receives only auditory input. During the epochs when efference copy and auditory feedback inputs overlap, the two sources of input were simply summed. For computational and conceptual simplicity, we chose not to propagate this subdivision of activity to the AFP. The efference copy activity that was passed on to the AFP was calculated from the average activity in HVc_AFP during the early and middle portion of the syllable. Late and gap portions were excluded for the following reasons. RA activity generated during the current syllable contributes to the late and gap portion of HVc_AFP activity. 
In our sequence learning model (Troyer and Doupe 2000), the AFP not only provides a reinforcement signal to RA, but also affects the pattern of RA activity. Excluding the late and gap portions of HVc_AFP activity from the efference copy prevents RA output from contributing to RA input during the same syllable via the RA → HVc_AFP → AFP → RA feedback loop. It also prevents auditory feedback from the current syllable from contributing acutely to the AFP reinforcement signal. We will view the combined early and middle activity signal as the efference copy passed on to the AFP, although it may include auditory feedback from the previous syllable.
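The four-epoch bookkeeping described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the E and M epochs are given equal weight in the average (the text does not state the weighting), and adaptation and thresholds are ignored.

```python
def hvc_afp_epochs(ec_current, fb_previous, fb_current):
    """Summed HVc_AFP input per assembly in the E, M, L and G epochs."""
    silent = [0.0] * len(ec_current)

    def mix(efference, feedback):
        # Overlapping efference copy and auditory inputs are simply summed.
        return [e + f for e, f in zip(efference, feedback)]

    return {
        "E": mix(ec_current, fb_previous),  # current copy + previous syllable's feedback
        "M": mix(ec_current, silent),       # current copy only
        "L": mix(ec_current, fb_current),   # copy and feedback for the same syllable
        "G": mix(silent, fb_current),       # feedback only
    }


def efference_copy_to_afp(epochs):
    """Average of the E and M epochs (equal weighting is an assumption)."""
    return [(e + m) / 2.0 for e, m in zip(epochs["E"], epochs["M"])]


# Two assemblies: the current syllable drives assembly 0, the previous one drove assembly 1.
epochs = hvc_afp_epochs([1.0, 0.0], [0.0, 0.2], [0.2, 0.0])
afp_input = efference_copy_to_afp(epochs)
```

In this toy case the signal passed to the AFP is dominated by the current syllable's efference copy, with a small contribution from the previous syllable's feedback and none from the current syllable's feedback.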

Plasticity rule

We used a simple model of associational learning. Synaptic projections are in principle "all-to-all," i.e., associational learning takes place between all relevant combinations of pre- and postsynaptic assemblies. Assemblies become functionally disconnected when associational learning drives connection strengths to zero. While our learning rule is meant to encompass the many potential mechanisms of associational plasticity in the song system, the form of our learning rule is based on analogies with NMDA receptor-dependent long-term potentiation (LTP; Malenka and Nicoll 1993; Table 3, number 3). In the equation below we use $r^{\mathrm{pre}}(t)$ and $r^{\mathrm{post}}(t)$ to denote the activity level of the pre- and postsynaptic assemblies at time $t$. Each presynaptic spike (at time $t^{\mathrm{pre}}$) was assumed to give rise to a postsynaptic "plasticity trace," $\alpha$, analogous to the amount of NMDA-receptor binding. The shape of the function $\alpha$ determines the time window for neural plasticity (see APPENDIX). This plasticity trace is multiplied by postsynaptic activity to yield a "plasticity signal," $\alpha(t - t^{\mathrm{pre}})\,r^{\mathrm{post}}(t)$, analogous to postsynaptic calcium concentration. Input from the AFP is assumed to give a reinforcement signal $R$ that modulates the plasticity signal in all RA assemblies. ($R$ is set to a constant value of 1 in HVc.) Plasticity signals above a threshold value $\psi$ increase synaptic strength (LTP); signals below $\psi$ give rise to long-term depression (LTD; Cummings et al. 1996; Hansel et al. 1997; Lisman 1989). $\psi$ is a "sliding threshold" that depends on the average amount of activity in the postsynaptic cell (Abraham and Bear 1996; Bienenstock et al. 1982; Sejnowski 1977). Thus, the change in synaptic strength resulting from postsynaptic activity at time $t$ and presynaptic activity at time $t^{\mathrm{pre}}$ is proportional to the following quantity (see APPENDIX)
\[
(\text{reinforcement} \times \text{plasticity trace} \times \text{post} - \text{threshold}) \times \text{pre}
= \left[ R\,\alpha(t - t^{\mathrm{pre}})\,r^{\mathrm{post}}(t) - \psi \right] r^{\mathrm{pre}}(t^{\mathrm{pre}})
\]
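A minimal sketch of how this quantity might be evaluated is given below. The exponential shape and 50 ms time constant of the plasticity trace, the learning rate, and the function names are all assumptions made for illustration; the actual trace shape and parameters are specified in the APPENDIX.

```python
import math


def plasticity_trace(dt_ms, tau_ms=50.0):
    """Hypothetical trace alpha(t - t_pre): exponential decay, zero if post precedes pre."""
    return math.exp(-dt_ms / tau_ms) if dt_ms >= 0 else 0.0


def weight_change(r_pre, r_post, dt_ms, reinforcement, threshold, rate=0.01):
    """rate * [R * alpha(t - t_pre) * r_post - psi] * r_pre."""
    signal = reinforcement * plasticity_trace(dt_ms) * r_post
    return rate * (signal - threshold) * r_pre


ltp = weight_change(r_pre=1.0, r_post=1.0, dt_ms=0.0, reinforcement=1.0, threshold=0.5)
ltd = weight_change(r_pre=1.0, r_post=0.2, dt_ms=0.0, reinforcement=1.0, threshold=0.5)
silent_pre = weight_change(r_pre=0.0, r_post=1.0, dt_ms=0.0, reinforcement=1.0, threshold=0.5)
```

With these illustrative values, a plasticity signal above the threshold strengthens the synapse, a signal below it weakens the synapse, and a silent presynaptic assembly produces no change.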

Local circuit mechanisms

Activity within each neural population was based on very simple local circuitry. The output of each excitatory cell assembly was computed as a linear function of its input after subtracting a threshold value. RA included intrinsic excitatory connections that were used to store syllable representations in a manner analogous to other associative memory models or so-called attractor networks (Table 3, number 4A; Amit 1989). To minimize computation, only RA includes such connections. Each population also includes a single inhibitory assembly that is connected to all assemblies within the corresponding population (Table 3, number 4B). Inhibition is of two basic types. HVc_RA, HVc_AFP, and the AFP use "feedforward inhibition," in which inhibitory activity is equal to the average afferent input received by the population, minus a threshold. RA uses "feedback inhibition," in which inhibitory activity is driven by the average activity within the local population. Feedback inhibition allows tighter control of the activity in the local network but is computationally more expensive. Within a given population, all excitatory assemblies receive a similar level of inhibition. Since the only assemblies that get strongly activated are those that receive enough input to overcome this inhibition, inhibition mediates a form of "competition" among excitatory assemblies.

Decorrelating initial connectivity

Our simulations were designed to determine whether associational learning, guided by template matching signals from the AFP, could organize initially unstructured connections in the motor pathway to produce the stored tutor song. The dominant computational problem encountered in building the model was the positive feedback inherent in associational learning rules: correlated activity increases synaptic strength, which tends to further strengthen the correlation. Left unchecked, this learning will continually amplify initially weak associations, even spurious associations resulting from chance events. One of the most important factors contributing to spurious correlations was the limited size of our network simulations. The strength of random correlations is highly dependent on network size, roughly decreasing with the square root of the number of network units. Because of the computational expense of simulating intrinsic feedback dynamics within RA, we limited the number of RA assemblies to 40 (Fig. 6B). Independently choosing each connection within such a network will result in correlations that are an order of magnitude stronger than those expected in a more realistically sized network containing 4000 assemblies. The calculation of HVc_RA activity was computationally less expensive, and a larger number (5 × 40 = 200) of HVc_RA assemblies was included. While reducing correlations to some degree, these numbers still do not approach physiologically realistic numbers. We note here that the greater storage capacity of larger networks resulting from a reduction in random correlations (Amit 1989) may relate to reports of a relationship between the size of various song nuclei and the number of song syllables learned (reviewed in Brenowitz 1997; Nordeen and Nordeen 1997).
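The "order of magnitude" figure follows directly from the square-root scaling stated above. As a worked example, writing $c_N$ (a symbol introduced here for illustration) for the typical strength of a random correlation in a network of $N$ units:

```latex
c_N \propto \frac{1}{\sqrt{N}}
\quad\Longrightarrow\quad
\frac{c_{40}}{c_{4000}} \approx \sqrt{\frac{4000}{40}} = \sqrt{100} = 10
```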

To address the problem of correlated connections, we chose initial patterns of connectivity specifically aimed at minimizing these correlations (Table 3, number 5). Initial connection strengths were chosen according to two basic strategies. For HVc_RA → RA and HVc_RA → HVc_AFP connections, we used a "single-projection" strategy, in which each presynaptic assembly connects with a single postsynaptic assembly. This ensures that the levels of input received by any two assemblies in the postsynaptic population are independent. However, the single-projection strategy does not prevent correlations arising from polysynaptic pathways within the recurrent circuitry in RA. For these intrinsic RA connections, we used a "uniform" strategy, in which each presynaptic assembly connects with all postsynaptic assemblies with equal strength. This ensures that all correlations result from a global signal shared by all assemblies. While such a signal will increase overall synaptic strengths, it will not lead to spurious patterns of correlations within the network. To ensure that our model was robust to some degree of correlation, zero-mean Gaussian perturbations were added to all plastic connections during the initialization process. The standard deviation of the perturbations was set to 10% of the strength of the nonzero synapses. After the perturbation, negative strengths were set to zero. Noise was not added to the three projections that did not undergo plasticity (the premotor drive, auditory feedback, and template storage connections from HVc_AFP to the AFP).
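The two initialization strategies and the perturbation step can be sketched as below. This is an illustration under stated assumptions: the round-robin choice of which postsynaptic assembly each presynaptic assembly targets, the unit connection strength, and the function names are hypothetical and are not taken from the APPENDIX.

```python
import random


def single_projection(n_pre, n_post, strength=1.0):
    """w[post][pre]: every presynaptic assembly targets exactly one postsynaptic one."""
    w = [[0.0] * n_pre for _ in range(n_post)]
    for pre in range(n_pre):
        w[pre % n_post][pre] = strength  # round-robin target choice is an assumption
    return w


def uniform(n, strength=1.0):
    """All-to-all recurrent weights of equal strength, with no self-connections."""
    return [[0.0 if i == j else strength for j in range(n)] for i in range(n)]


def perturb(w, strength=1.0, rel_sd=0.10, seed=0):
    """Add zero-mean Gaussian noise (SD = 10% of strength), then clip at zero."""
    rng = random.Random(seed)
    return [[max(0.0, x + rng.gauss(0.0, rel_sd * strength)) for x in row] for row in w]


clean = single_projection(200, 40)  # 200 HVc_RA assemblies onto 40 RA assemblies
hvc_to_ra = perturb(clean)
ra_to_ra = uniform(40)
```

With 200 presynaptic and 40 postsynaptic assemblies, each RA assembly starts with five nonzero inputs before noise is added, and no two RA assemblies share a presynaptic source.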

Homeostatic mechanisms

In addition to decorrelating initial connectivity patterns, we include two sources of homeostatic negative feedback to counteract the positive feedback inherent in associational learning (Table 3, number 6). The first is a normalization of synaptic strength: after applying associational change for each simulated syllable, the strengths of all synapses onto (or from) a given assembly are multiplied by a single number so that the total amount of postsynaptic (or presynaptic) strength for any one assembly remains nearly constant (see APPENDIX). This kind of multiplicative normalization controls total synaptic strength without altering the relative magnitude of the individual connections. Presynaptic normalization was applied before postsynaptic normalization (see APPENDIX). The strengths to which synaptic connections were normalized were chosen by hand so that 1) intrinsic RA circuitry contributed a large component (50%) of the input to RA assemblies, and 2) auditory feedback contributed a modest portion (20%) of the input to HVc_AFP. The mechanisms underlying homeostasis are just now beginning to receive focused attention. Multiplicative normalization of synaptic strength has been shown by Turrigiano et al. (1998) and was hypothesized to depend on mean levels of activity. An approximation to our postsynaptic normalization rule follows if mean levels of activity (calculated on long time scales) are related to total excitatory strength synapsing on that neuron. Mechanisms such as conservation of transmitter released and/or retrograde trophic factors could underlie presynaptic normalization.
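The multiplicative normalization can be sketched as follows, assuming that each total is rescaled exactly to a fixed target on every step; the APPENDIX may use a softer form, and the function name and targets here are hypothetical.

```python
def normalize(w, pre_total, post_total):
    """Rescale w[post][pre] in place: presynaptic totals first, then postsynaptic."""
    n_post, n_pre = len(w), len(w[0])
    for pre in range(n_pre):  # total strength sent out by each presynaptic assembly
        column_sum = sum(w[post][pre] for post in range(n_post))
        if column_sum > 0.0:
            for post in range(n_post):
                w[post][pre] *= pre_total / column_sum
    for row in w:  # total strength received by each postsynaptic assembly
        row_sum = sum(row)
        if row_sum > 0.0:
            scale = post_total / row_sum
            row[:] = [x * scale for x in row]
    return w


weights = normalize([[1.0, 3.0], [2.0, 2.0]], pre_total=1.0, post_total=1.0)
```

Because the postsynaptic step is applied last, postsynaptic totals are met exactly while presynaptic totals are only approximately preserved (55/56 and 57/56 in this toy case), which is consistent with the totals remaining "nearly constant." Within each step, all synapses sharing an assembly are scaled by the same factor, so their relative magnitudes are unchanged.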

The second source of negative feedback is inhibitory plasticity that is homeostatic, i.e., if an excitatory assembly becomes too active, the inhibitory connection onto that assembly is strengthened (Rutherford et al. 1997; see APPENDIX). We note that controlling feedback in the model was not always straightforward, since oscillatory instability results if negative and positive feedback mechanisms operate on similar time scales.

Quantifying learning time course

To quantify the learning time course, we divided the model output into 250 syllable epochs and computed the matrix $M^{\mathrm{act}}$ of co-fluctuations in activity between each pair of RA assemblies over each epoch. During the $m$th epoch
\[
M^{\mathrm{act}}_{ij} = \frac{1}{250} \sum_{n=1+250(m-1)}^{250m} \left[ r_i(n) - \bar{r}(n) \right] \left[ r_j(n) - \bar{r}(n) \right]
\]
where $r_i(n)$ is the activity level in the $i$th RA assembly, and $\bar{r}(n)$ is the average activity across assemblies during syllable $n$. We compared $M^{\mathrm{act}}$ to an ideal syllable matrix, $M^{\mathrm{syl}}$, characterizing the groupings of assemblies that characterize syllables in the tutor song. Following Fig. 6, the 40 vocal features were grouped into five syllables, indexed as follows: syllable A, 1-8; B, 9-16; C, 17-24; D, 25-32; E, 33-40. $M^{\mathrm{syl}}_{ij} = 4$, if $i$ and $j$ belong to the same syllable; $M^{\mathrm{syl}}_{ij} = -1$, if $i$ and $j$ belong to different syllables. This is the matrix of co-fluctuations that would be obtained from tutor song depicted in Fig. 6A, if each assembly had an average activity level of 1. Comparison between matrices was done by taking the correlation coefficient (CC) between the entries in two matrices. The CC between any two $N \times M$ dimensional matrices $A$ and $B$ was defined as follows. First the mean value is subtracted from each element in the matrix: $\hat{A}_{ij} = A_{ij} - (1/NM) \sum A_{ij}$; $\hat{B}_{ij} = B_{ij} - (1/NM) \sum B_{ij}$. Then
\[
\mathrm{CC}(A, B) = \frac{\sum_{ij} \hat{A}_{ij} \hat{B}_{ij}}{\left[ \left( \sum_{ij} \hat{A}_{ij}^{2} \right) \left( \sum_{ij} \hat{B}_{ij}^{2} \right) \right]^{1/2}}
\]
Diagonal entries were excluded, i.e., all summations were taken over indices where $i \neq j$.
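These definitions can be checked numerically with a short sketch. The code below is illustrative rather than the original MATLAB implementation: the function names are hypothetical, the epoch length is taken from the input rather than fixed at 250, and the mean used in the correlation coefficient is computed over off-diagonal entries only, which is one reading of "all summations."

```python
import math

N_FEATURES = 40
PER_SYLLABLE = 8


def cofluctuation(epoch):
    """M_act for one epoch; epoch is a list of per-syllable RA activity vectors."""
    n = len(epoch[0])
    m = [[0.0] * n for _ in range(n)]
    for r in epoch:
        mean = sum(r) / n  # average activity across assemblies for this syllable
        dev = [x - mean for x in r]
        for i in range(n):
            for j in range(n):
                m[i][j] += dev[i] * dev[j] / len(epoch)
    return m


def ideal_syllable_matrix():
    """M_syl: 4 within a tutor syllable, -1 across syllables."""
    return [[4.0 if i // PER_SYLLABLE == j // PER_SYLLABLE else -1.0
             for j in range(N_FEATURES)] for i in range(N_FEATURES)]


def cc(a, b):
    """Correlation coefficient over off-diagonal entries of two square matrices."""
    idx = [(i, j) for i in range(len(a)) for j in range(len(a)) if i != j]
    mean_a = sum(a[i][j] for i, j in idx) / len(idx)
    mean_b = sum(b[i][j] for i, j in idx) / len(idx)
    num = sum((a[i][j] - mean_a) * (b[i][j] - mean_b) for i, j in idx)
    den_a = sum((a[i][j] - mean_a) ** 2 for i, j in idx)
    den_b = sum((b[i][j] - mean_b) ** 2 for i, j in idx)
    return num / math.sqrt(den_a * den_b)


# Perfect tutor song: each syllable drives its 8 assemblies at level 5 (mean activity 1).
tutor_epoch = [[5.0 if i // PER_SYLLABLE == s else 0.0 for i in range(N_FEATURES)]
               for s in range(5)]
m_act = cofluctuation(tutor_epoch)
match = cc(m_act, ideal_syllable_matrix())
```

For a perfect rendition in which each assembly is active at level 5 during its own syllable and silent otherwise (average activity 1), the co-fluctuation entries come out as 4 within a syllable and -1 across syllables, so the correlation with the ideal matrix is 1 up to rounding.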

The CC was also used to monitor the connectivity appropriate for the efference copy mapping and for the intrinsic RA connectivity that underlies syllable encoding. For the efference copy, we measured the CC between the matrix of motor → sensory connection strengths from HVc_RA → HVc_AFP and the HVc_RA → RA connections in the motor pathway. To quantify the development of syllable-based connectivity, we calculated the CC between Msyl and the matrix of intrinsic RA connection strengths, again excluding diagonal entries.


    RESULTS

In presenting the results of our computational model, we focus on the quantitative data produced by a single representative simulation. This allows a step-by-step illustration of song development and demonstrates the mutual consistency of the functional hypotheses described above. After presenting these results, we show how the model reacts to changes in important parameters.

Problem 1: Auditory feedback delay

The first step in our proposed solution to the problem of feedback delay is the learning of an efference copy mapping. At the beginning of each simulation, connections in the motor pathway are unstructured and random HVc_RA activity leads to random patterns of RA output (Fig. 6C). Efference copy learning results from associations between the random patterns of HVc_RA activity and HVc_AFP activity induced by auditory reafference (Fig. 4A). We examined the development of an efference copy map in two ways (Fig. 8). First, an accurate map should cause efference copy activity to match the auditory reafference. Figure 8A shows the pattern of vocal output (left column of each pair, marked V) and HVc_AFP efference copy activity (right column of each pair, marked EC) for five syllables spanning the period of initial efference copy learning. Note that, because of our simple encoding scheme, vocal output, RA motor activity, and auditory feedback have equivalent representations (see METHODS). Initially, both patterns of activity are highly distributed and unrelated (Fig. 8A, left pairs). As efference copy learning progresses, the activity remains distributed (significant syllable learning has not taken place), but the efference copy activity is highly correlated with the vocal output (Fig. 8A, right pair). Note that a perfect match is not required (Jordan and Rumelhart 1992); the efference copy estimate only has to be accurate enough so that, on average, the AFP will reinforce the proper correlations in RA (see Fig. 9, bottom). An accurate efference copy mapping can also be measured by determining the similarity between the mapping of HVc_RA onto motor features in RA and the efference copy mapping of HVc_RA onto sensory features in HVc_AFP. Figure 8B shows the correlation coefficient (see METHODS) between the connection strengths from HVc_RA → RA and those from HVc_RA → HVc_AFP. 
By syllable 500, efference copy correlation has reached 0.81, 84% of the maximum value (0.96) reached during the simulation.



Fig. 8. Efference copy learning. A: vocal output (V, equivalent to RA activity) and efference copy activity (EC) for syllables 1, 251, 501, 1001, and 2001. Efference copy activity is determined as the average of HVc_AFP activity over the early and middle portions of each syllable (Fig. 7). Initially, vocal output and efference copy activity are uncorrelated. By syllable 2000, activity is still not organized according to tutor syllable (syllable learning has not taken place), but efference copy activity and vocal output are similar. B: development of efference copy connectivity. Correlation coefficient between the matrix of HVc_RA projections onto motor features in RA and onto sensory features in HVc_AFP (see METHODS, Quantifying learning time course, for definition).



Fig. 9. Reinforcement signal. A: vocal output (V) and efference copy (EC) for syllables 11,000-11,010. B: reinforcement signal calculated from efference copy shown in A. EC activity concentrated within assemblies encoding a single tutor syllable led to large reinforcement signals (syllable 11,006, D; syllable 11,009, B). Activity shared by two tutor syllables led to smaller reinforcement (11,000, D/C; 11,003, B/A). During syllable 11,007, the model produced a reasonably good rendition of tutor syllable D, but minimal reinforcement was given because the efference copy prediction was inaccurate.

Problem 2: Syllable learning in RA

CALCULATION OF THE REINFORCEMENT SIGNAL. The AFP guides syllable learning by transmitting a nonspecific reinforcement signal that uniformly modulates plasticity in all RA assemblies. To calculate the match to the template, each AFP assembly sums the input from HVc_AFP assemblies encoding a distinct tutor syllable (Fig. 6B). The competition mediated by mutual inhibition in the AFP ensures that significant activation of the AFP occurs only if HVc_AFP activity is mostly confined to assemblies corresponding to one (or a few) tutor syllables. The final reinforcement value was obtained by thresholding each AFP assembly's output and summing these thresholded outputs (see APPENDIX for details).
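One way the reinforcement calculation could be sketched is shown below. The details are assumptions made for illustration: inhibition is modeled as proportional to the mean drive across AFP assemblies, and the inhibition gain, output threshold, and function name are hypothetical. The actual computation is given in the APPENDIX.

```python
def reinforcement(ec_activity, per_syllable=8, inhibition_gain=1.0, threshold=1.0):
    """Scalar reinforcement from a vector of HVc_AFP efference copy activity."""
    n_syllables = len(ec_activity) // per_syllable
    # Each AFP assembly sums the HVc_AFP assemblies belonging to one tutor syllable.
    drive = [sum(ec_activity[s * per_syllable:(s + 1) * per_syllable])
             for s in range(n_syllables)]
    # Shared inhibition (assumed proportional to mean drive) mediates competition.
    inhibition = inhibition_gain * sum(drive) / n_syllables
    afp = [max(0.0, d - inhibition) for d in drive]
    # Threshold each AFP output, then sum the thresholded outputs.
    return sum(max(0.0, a - threshold) for a in afp)


concentrated = [1.0] * 8 + [0.0] * 32  # activity confined to one tutor syllable
split = [0.5] * 16 + [0.0] * 24        # same total activity shared by two syllables
distributed = [0.2] * 40               # same total activity spread over all features

r_concentrated = reinforcement(concentrated)
r_split = reinforcement(split)
r_distributed = reinforcement(distributed)
```

Under these assumptions, the same total activity yields the largest reinforcement when concentrated in one syllable, a smaller value when split across two, and none when spread evenly, which matches the qualitative behavior described for Fig. 9.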

The outcome of this procedure is shown in Fig. 9. Figure 9A shows the vocal output (marked V) and efference copy (marked EC) for 11 consecutive syllables sung during the period of syllable learning. The black bars show the reinforcement signal. This reinforcement is obtained from evaluating the HVc_AFP efference copy activity on the right of each column but is used to modulate associational learning for the RA motor activity generating the vocal output shown on the left. Large reinforcement is obtained when efference copy activity is concentrated within assemblies encoding a single tutor syllable (e.g., syllables 11,006 and 11,009). Smaller reinforcement signals are computed when HVc_AFP activity is distributed among assemblies encoding two syllables (e.g., syllables 11,000 and 11,003). Note that the 11,007th syllable produced by the model was dominated by the motor assemblies encoding D, but the AFP signaled minimal reinforcement because of an inaccurate efference copy representation.

SYNAPTIC REORGANIZATION. Reinforcement-guided syllable learning is shown in Fig. 10. Initially, RA → RA connection strengths were set to be nearly equal (A, middle), minimizing the presence of randomly correlated connections that would have to be "unlearned" (see METHODS). Note that self-connections are not included in our model (diagonal entries are zero), since strong self-correlations would tend to dominate associational learning. Unstructured input from HVc_RA (A, left) resulted in random patterns of RA activity (A, right). Because AFP-mediated reinforcement 1) is greatest when assemblies corresponding to a common tutor syllable are co-active, and 2) results in large increases in synaptic strength onto active RA assemblies, RA assemblies began to develop strong connections with other RA assemblies encoding the same syllable (B, middle). Reinforcement also guided learning within the projection from HVc → RA, causing RA assemblies encoding the same tutor syllable to receive input from similar sets of HVc_RA assemblies and thus to receive correlated patterns of HVc input (B, left). Both the recurrent circuitry and HVc_RA input led to RA activity partially matched to the tutor syllables (B, right). After learning was complete, HVc_RA input was a mixture of tutor syllable representations (C, left). Strong intrinsic circuitry (C, middle) amplifies the activity within assemblies encoding the most strongly driven syllable, and inhibitory competition suppresses other responses (see METHODS and APPENDIX). As a result, the model produced motor output perfectly matched to the syllables in the tutor song (C, right). Because HVc_RA continues to be driven by the random premotor drive, syllables are produced in a random sequence. Sequence learning will be addressed in our companion paper (Troyer and Doupe 2000).



Fig. 10. Syllable learning. Left column: strength of synaptic input coming from HVc_RA for 10 consecutive syllables. Middle column: intrinsic connections within RA. The darkness of each square represents the connection strength from one presynaptic RA assembly (horizontal axis) to one postsynaptic RA assembly (vertical axis). Self-connections (diagonal entries) are set to zero to prevent domination of self-correlations. Right column: RA activity level for same syllables shown on left. A: at the start of the simulation, initial RA connectivity was nearly uniform, and HVc input and RA output were random. B: as development proceeded, assemblies encoding a single tutor syllable began to have similar patterns of connectivity. Because assemblies encoding the same tutor syllable are arranged next to each other, the pattern of RA → RA connections began to show "blocks" of strong connections along the diagonal (middle). These assemblies also began to receive similar patterns of input from HVc (left). C: after learning, HVc input was a random mixture of syllable representations, and RA assemblies were connected only with other RA assemblies encoding the same tutor syllable. This pattern of intrinsic RA connectivity, combined with global inhibition (see METHODS), resulted in the production of patterns of RA activity matched to the tutor template (right). Learning to produce these syllables in the proper sequence is addressed in the following companion paper (Troyer and Doupe 2000).

TIME COURSE OF LEARNING. The developmental time course of song learning in our model is shown in Fig. 11. To quantify convergence toward the tutor song, we first computed an "ideal" syllable covariance matrix, Msyl. This is a 40 × 40 matrix containing the covariance in the level of activity between each pairing of the 40 RA assemblies (sampled over a number of consecutive syllables), where it is assumed that the model is producing a perfect rendition of the tutor song. Msyl has strong positive entries for pairs of assemblies belonging to the same tutor syllable and negative entries for pairs belonging to different syllables. We then divided the model output into 250 syllable epochs and computed the matrix of co-fluctuations in activity between each pair of RA assemblies over each epoch. Convergence toward the tutor song was quantified by computing the correlation coefficient between the entries in Msyl and those in the co-fluctuation matrix (Fig. 11, solid line). For detailed definitions of these calculations, see METHODS, Quantifying learning time course. We also computed the correlation coefficient between the pattern of RA connectivity and Msyl (Fig. 11, dashed line). The development of intrinsic RA connectivity is mirrored by the appearance of the corresponding correlations in RA activity.



Fig. 11. Summary of learning time course. Solid line: correlation coefficient between entries of covariance matrix calculated from 250 syllable epochs of model output and the "ideal" syllable covariance matrix, Msyl, corresponding to the tutor song (see METHODS, Quantifying learning time course, for definition). Dashed line: correlation coefficient between pattern of RA connectivity and Msyl. Dotted line: time course of efference copy learning (Fig. 8B). Syllable learning begins soon after the development of an accurate efference copy at around syllable 1500 and is largely completed by syllable 10,000.

Syllable learning is complete by the time the model has produced 20,000 syllables. Since each syllable is assumed to be 115 ms long, this represents 2,300 s, or <40 min, of continuous singing. Although quantitative data are not available, this is likely to be up to several orders of magnitude less than the quantity of song produced by young zebra finches during the period of sensorimotor learning. Of course, the model is solving a highly simplified task.
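As a worked check of the singing-time estimate, using only the syllable count and the assumed syllable duration stated above:

```latex
20{,}000 \ \text{syllables} \times 0.115 \ \text{s/syllable} = 2{,}300 \ \text{s}
\approx 38.3 \ \text{min} < 40 \ \text{min}
```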

Problem 3: Separating motor and sensory signals in HVc

HVc_AFP receives two functionally distinct sets of inputs: efference copy inputs from HVc_RA and auditory feedback (Fig. 7). The unmixing of signals is addressed in our model by 1) using weak feedback, and 2) including "adaptation" in HVc_AFP (Fig. 5). The action of the HVc_AFP adaptation mechanism is shown in Fig. 12. Excitation within HVc_AFP assemblies recruits a negative current that decays exponentially (Fig. 12A, bottom). When the efference copy input from HVc_RA correctly predicts the pattern of audit