J Neurophysiol 84: 1224-1239, 2000;
0022-3077/00 $5.00

The Journal of Neurophysiology Vol. 84 No. 3 September 2000, pp. 1224-1239
Copyright ©2000 by the American Physiological Society

An Associational Model of Birdsong Sensorimotor Learning II. Temporal Hierarchies and the Learning of Song Sequence

Todd W. Troyer1,3 and Allison J. Doupe1,2,3,4

1Department of Psychiatry, 2Department of Physiology, 3W. M. Keck Center for Integrative Neuroscience, and 4Sloan Center for Theoretical Neurobiology at UCSF, University of California, San Francisco, California 94143-0444


    ABSTRACT

Troyer, Todd W. and Allison J. Doupe. An Associational Model of Birdsong Sensorimotor Learning II. Temporal Hierarchies and the Learning of Song Sequence. J. Neurophysiol. 84: 1224-1239, 2000. Understanding the neural mechanisms underlying serially ordered behavior is a fundamental problem in motor learning. We present a computational model of sensorimotor learning in songbirds that is constrained by the known functional anatomy of the song circuit. The model subsumes our companion model for learning individual song "syllables" and relies on the same underlying assumptions. The extended model addresses the problem of learning to produce syllables in the correct sequence. Central to our approach is the hypothesis that the Anterior Forebrain Pathway (AFP) produces signals related to the comparison of the bird's own vocalizations and a previously memorized "template." This "AFP comparison hypothesis" is challenged by the lack of a direct projection from the AFP to the song nucleus HVc, a candidate site for the generator of song sequence. We propose that sequence generation in HVc results from an associative chain of motor and sensory representations (motor → sensory → next motor ...) encoded within the two known populations of HVc projection neurons. The sensory link in the chain is provided, not by auditory feedback, but by a centrally generated efference copy that serves as an internal prediction of this feedback. The use of efference copy as a substitute for the sensory signal explains the ability of adult birds to produce normal song immediately after deafening. We also predict that the AFP guides sequence learning by biasing motor activity in nucleus RA, the premotor nucleus downstream of HVc. Associative learning then remaps the output of the HVc sequence generator.
By altering the motor pathway in RA, the AFP alters the correspondence between HVc motor commands and the resulting sensory feedback and triggers renewed efference copy learning in HVc. Thus, auditory feedback-mediated efference copy learning provides an indirect pathway by which the AFP can influence sequence generation in HVc. The model makes predictions concerning the role played by specific neural populations during the sensorimotor phase of song learning and demonstrates how simple rules of associational plasticity can contribute to the learning of a complex behavior on multiple time scales.


    INTRODUCTION

Like many complex behaviors, birdsong is arranged in a temporal hierarchy. In zebra finches, song consists of a few short introductory notes, followed by several repetitions of a stereotyped sequence of vocal gestures, or "syllables," separated by brief periods of silence (Sossinka and Böhner 1980). Song is learned in two phases. First, birds listen to and memorize a tutor song, or "template" (Konishi 1965; Marler 1964). Later, during sensorimotor learning, birds use auditory feedback from their own vocalizations to gradually match their vocal output to the template. In the companion paper (Troyer and Doupe 2000), we focused on one level of the hierarchy for song and showed how simple associational (Hebbian) learning rules could be used to learn the motor representations for individual tutor syllables. The syllable learning model addresses the important problem of feedback delay and demonstrates that associational plasticity naturally leads to the learning of an efference copy, or internal prediction, of the auditory feedback. This internal prediction can then be compared with the memorized tutor song to guide sensorimotor learning.

In this paper, we address a second fundamental problem in motor learning, the question of serial order in behavior (Lashley 1951), by extending our syllable learning model to account for the learning of syllable sequence. As in our companion paper, we use simple rules of associational plasticity and assume that the template comparison signals that guide learning are provided by the Anterior Forebrain Pathway (AFP), a circuit that passes through avian basal ganglia, thalamic, and cortex-like nuclei before projecting back onto the motor pathway (see Troyer and Doupe 2000; Fig. 1). We also assume a functional segregation between the two known populations of projection neurons in song nucleus HVc (Nordeen and Nordeen 1988; HVc used as proper name, Margoliash et al. 1994), with AFP-projecting HVc neurons (HVc_AFP) receiving auditory feedback and encoding signals in sensory coordinates, and HVc neurons projecting to the robust nucleus of the archistriatum (RA; HVc_RA) more closely tied to a motor code (Troyer and Doupe 2000). The main biological constraint addressed in this paper is the hierarchical organization of the motor pathway (Fig. 1): the detailed motor programs for individual syllables are believed to be contained in nucleus RA (Vu et al. 1994; Yu and Margoliash 1996), whereas the central pattern generator for song sequence is likely to be found upstream of RA, perhaps within the song nucleus HVc (Vu et al. 1994). Our sequence learning model addresses two key questions left unanswered by current experimental data: what is the mechanism for sequence generation in HVc, and how can signals from the AFP guide sequence learning given that there are no known connections from the AFP to HVc?



Fig. 1. Encoding of motor hierarchy within the song circuit. The fine temporal structure within individual song syllables is believed to be encoded in RA (Yu and Margoliash 1996). The pattern generator for song sequence is believed to be located upstream of RA, possibly within HVc (Vu et al. 1994). We have shown the two populations of HVc projection neurons (Nordeen and Nordeen 1988) in separate ovals, although these are intermixed and interconnected in HVc. We assume auditory feedback enters the song system via inputs to HVc_AFP.

We propose that sequence generation results from a reciprocal chaining of motor and sensory representations [motor (HVc_RA) → sensory (HVc_AFP) → next motor (HVc_RA) → next sensory (HVc_AFP) ...] between the two populations of HVc projection neurons. Our model differs from classic "associative chaining" models (James 1983) in that the "sensory" component in this chain is actually an efference copy, a motor signal that serves as a prediction of the expected sensory feedback (Sperry 1950; von Holst and Mittelstaedt 1980). We also propose that AFP-guided teaching signals act to remap the connections from HVc to RA, so that the output of the HVc pattern generator maps onto the sequence of motor features (encoded in RA) that matches the memorized tutor song (cf. Doya and Sejnowski 1998). However, simply remapping HVc outputs cannot explain AFP-guided learning within HVc. In our model, auditory feedback-driven efference copy learning provides the crucial link between the AFP and HVc. By altering the HVc outflow tract, the AFP alters the association between HVc_RA motor activity and the auditory feedback received by HVc_AFP. The resulting efference copy learning then changes the motor-sensory interaction underlying sequence generation in HVc.

Our model demonstrates that associational learning, distributed throughout the motor pathway, is sufficient for learning both individual syllables and their proper sequence. The model provides a specific hypothesis for how basal ganglia-forebrain loops could contribute to learning a sequential behavior and highlights key computational problems imposed by the functional anatomy of the song circuit. More generally, the model provides a framework that relates the neural mechanisms underlying song learning to fundamental problems in motor learning and speech production.

Model and approach

In this paper, we extend our previous model for learning individual syllables (Troyer and Doupe 2000) to address the learning of syllable sequence. Our sequence learning model subsumes our syllable model, accomplishing syllable learning as well as the learning of syllable sequence. The structure of this paper mirrors that of the preceding companion paper (Troyer and Doupe 2000) and relies on the same underlying biological assumptions. We present our results in the form of two closely related models: a "conceptual model" containing a self-consistent set of functional hypotheses, and a "computational model" that incorporates these hypotheses into a working computer algorithm. In this section, we describe the functional problems addressed by our sequence learning model and outline the key elements of our proposed solutions. Then, we present our conceptual model, which describes our functional hypotheses in greater detail. Quantitative results from our computational model are presented in the RESULTS section. Because our model is relatively abstract at the level of local circuits, implementation of these hypotheses was governed chiefly by considerations of computational simplicity. Related issues are described in the METHODS but are not crucial for understanding the main functional implications of the model. The details of our computer algorithm are confined to an APPENDIX.

Problems addressed

Our model explores how song learning can result from associational learning, guided by template comparison signals transmitted by the AFP. We do not address learning the detailed temporal structure within each syllable, nor learning the length of syllables and intersyllable gaps. Timing of song syllables is provided by a rhythmically clocked premotor drive arriving in HVc_RA (Troyer and Doupe 2000). While the timing of this drive is fixed, its pattern is completely random; the magnitude of each component of the premotor input is generated independently for each vocalization produced by the model. The model's task is to take this unstructured premotor timing signal and convert it to a sequence of syllables matched to the tutor template.

The learning of motor representations for individual song syllables was addressed in the preceding companion paper (Troyer and Doupe 2000). This model contained three key functional elements. By associating premotor commands in HVc_RA with auditory feedback arriving in HVc_AFP, a motor → sensory efference copy mapping develops between the two populations of HVc projection neurons (Fig. 2, marked 1). After this mapping develops, HVc_AFP activity driven by a given HVc_RA motor command encodes a sensory prediction of the vocal output resulting from that command. This prediction is then compared with the template in the AFP, resulting in a global reinforcement signal that modulates plasticity in all RA neurons (Fig. 2, marked 2). This reinforcement learning leads to a pattern of connectivity in RA in which neurons encoding the same tutor syllable become strongly connected (Fig. 2, marked 3). As a result, RA has a strong tendency to produce coherent patterns of motor activity matched to the syllables in the tutor template.



Fig. 2. Network architecture. White circles: functional connections addressed in our syllable learning model (Troyer and Doupe 2000). Black circles: functional connections important for sequence learning (Troyer and Doupe 2000). The association of HVc_RA premotor activity and auditory feedback input to HVc_AFP leads to a motor → sensory efference copy mapping between these neural populations (1). Reinforcement signals from the Anterior Forebrain Pathway (AFP) (2) are used to reorganize intrinsic RA connections so that they encode the motor representations for individual tutor syllables (3). Sequence generation results from a reciprocal interaction involving the motor → sensory efference copy mapping followed by a slow "context" signal that flows from HVc_AFP → HVc_RA (4). The AFP uses template information to bias RA toward the appropriate syllable transitions (5). This alters associations in the motor pathway so that the connections from HVc_RA → RA map the output of the HVc pattern generator onto the correct syllable representations in RA (6). Alterations in the motor pathway lead to renewed efference copy learning (7). Black arrows: plastic connections. Thick arrows: new connections added to address sequence learning. Gray arrows: connections not subject to associational plasticity.

Given our adoption of the AFP comparison hypothesis, the most difficult problem regarding sequence learning is the following: how can the AFP guide learning given that 1) the only known output from the AFP projects to RA, and 2) the site of sequence generation is likely to be upstream of RA? Our solution involves the concerted action of multiple associational mechanisms acting at different levels of the motor hierarchy. For ease of presentation, we will break this problem into three smaller problems, described below (see Conceptual model). However, our choice of solution to each individual problem is affected by the other two, as well as constraints imposed by our solution to the problem of syllable learning. The key to our model is the concept of efference copy, which serves to link all model components into a coherent hypothesis regarding the multiple sensory-motor interactions involved in song learning.

The first problem we address is the problem of sequence generation, i.e., what is the nature of the central pattern generator for song? We propose that sequence generation results from a reciprocal interaction between the two populations of HVc projection neurons (Table 1, number 1). The solution naturally incorporates the mechanism of efference copy, which contributes one half of this interaction by providing a motor → sensory mapping from HVc_RA → HVc_AFP. The other half of the interaction depends on connections from HVc_AFP → HVc_RA. These are hypothesized to provide slow signals carrying information from one syllable to the next (Fig. 2, marked 4). We call such signals "context" signals. Thus, sequences are generated as a chain of mappings from motor → sensory → next motor → next sensory, etc. This hypothesis borrows from classical chaining ideas (James 1983), as well as more recent computational models (Kleinfeld and Sompolinsky 1988) of sequence generation.


                              
Table 1. Functional hypotheses for sequence learning

The second problem we address is the problem of how AFP signals guide sequence learning at the level of RA. The most straightforward method of directing associational learning toward a desired goal is to bias the pattern of neural activity toward the desired state. Associational plasticity then strengthens the connections consistent with this pattern. In our model, we assume that the AFP generates an expectation of the next syllable in the tutor sequence and uses this expectation to bias RA activity (Table 1, number 2; Fig. 2, marked 5). Associational plasticity then changes the pattern of connections between HVc and RA so that syllables are produced in the proper sequence (Fig. 2, marked 6). Note that this solution gives rise to an additional problem to be solved before the AFP can bias RA activity in the proper direction: template information is stored in sensory coordinates, but the required bias must be in motor coordinates. We propose that a sensory → motor mapping is learned between the AFP and RA soon after the initial period of efference copy learning (Table 1, number 3; see Conceptual model).

The third problem we address is the problem of sequence learning at the level of HVc. While the mechanism outlined above is sufficient for a rudimentary form of sequence learning, it fails as a complete model. In particular, it fails to account for any learned changes in the number or sequence of premotor commands formed upstream of RA. In our model, the efference copy provides the key link between learning at the level of RA and learning upstream of RA, in HVc. In particular, by altering connections between HVc and RA, the AFP changes the pattern of vocal output and hence auditory reafference. This in turn induces new efference copy learning in HVc (Table 1, number 4; Fig. 2, marked 7) via the same mechanism described in our syllable learning model (Troyer and Doupe 2000). Since efference copy mapping plays a key role in the HVc pattern generator, the new efference copy learning alters the sequence of HVc outputs (see Conceptual model). In addition to providing a specific mechanism for how the AFP affects sequence generation in HVc, the need for ongoing efference copy learning is consistent with experiments demonstrating that auditory feedback is required throughout development (Price 1979).

In addressing the problem of sequence learning, we have added two new sets of connections to our model for syllable learning (Fig. 2). The connections from HVc_AFP → HVc_RA are necessary for sequence generation. Without the context signals carried by these connections, activity within HVc_RA would not be affected by activity related to the previous syllable and the sequence of HVc outputs would be random (Troyer and Doupe 2000). Patterned connections from the AFP → RA are necessary for sequence learning in our model. Without these connections, information stored in the AFP related to the tutor sequence cannot be used to guide learning in the motor pathway.

Conceptual model

PROBLEM 1: SEQUENCE GENERATION. We propose that sequences of song syllables are generated by a reciprocal interaction between motor (HVc_RA) and sensory/efference copy (HVc_AFP) activity within HVc (Table 1, number 1): motor → sensory prediction → next motor → next sensory prediction → ... (Fig. 3A). The motor → sensory component of this interaction is subserved by the efference copy mapping between HVc_RA and HVc_AFP. This mapping is learned early in development by associating HVc_RA motor commands with auditory feedback arriving back in HVc_AFP, as described in our model for syllable learning (Troyer and Doupe 2000). Figure 3B shows how these mappings result in the reproduction of the tutor song after learning is complete, using the transition from syllable A to syllable B as an example. Let SenA denote the sensory representation for syllable A in HVc_AFP. This representation is elicited by the efference copy mapping during production of A. Via the connections from HVc_AFP → HVc_RA, SenA elicits a context signal CtxtA that drives activity in HVc_RA during the syllable following syllable A. CtxtA maps onto the motor representation MotB in RA, and the model produces syllable B after syllable A. This is the sensory prediction → next motor component of the interaction. With an accurate efference copy mapping, CtxtA also elicits an efference copy representation SenB in HVc_AFP. This motor → sensory prediction component of the interaction completes the cycle. Thus, correct sequence learning in our model depends on learning the chain of mappings SenA → (CtxtA → MotB) → SenB → ...
Note that our implementation of this functional circuit is highly simplified: HVc_RA → HVc_AFP connections transmit only fast motor → sensory (efference copy) signals, whereas HVc_AFP → HVc_RA connections transmit only slow sensory → next motor (context) signals. More realistic circuit models of HVc will be required to explore possible local circuit mechanisms subserving this reciprocal flow of activity.
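As an informal illustration of the chain of mappings described above, the following Python sketch replaces the learned synaptic mappings with simple lookup tables. It is not the authors' algorithm: the table names, the function generate_sequence, and the wrap-around from syllable C back to syllable A are assumptions introduced only for this example.

```python
def generate_sequence(start_sen, n_syllables, sen_to_ctxt, ctxt_to_mot, ctxt_to_sen):
    """Follow the chain Sen -> Ctxt -> (Mot, next Sen) for n_syllables steps."""
    sen = start_sen
    produced = []
    for _ in range(n_syllables):
        ctxt = sen_to_ctxt[sen]             # slow context signal, HVc_AFP -> HVc_RA
        produced.append(ctxt_to_mot[ctxt])  # context drives a motor representation in RA
        sen = ctxt_to_sen[ctxt]             # same context elicits the next efference copy
    return produced


# Hypothetical lookup tables standing in for learned mappings.
# The wrap-around from C back to A is an assumption so that the motif repeats.
SEN_TO_CTXT = {"SenA": "CtxtA", "SenB": "CtxtB", "SenC": "CtxtC"}
CTXT_TO_MOT = {"CtxtA": "MotB", "CtxtB": "MotC", "CtxtC": "MotA"}
CTXT_TO_SEN = {"CtxtA": "SenB", "CtxtB": "SenC", "CtxtC": "SenA"}

song = generate_sequence("SenA", 4, SEN_TO_CTXT, CTXT_TO_MOT, CTXT_TO_SEN)
print(song)
```

Starting from SenA, the sketch steps through MotB, MotC, MotA, and MotB again. No auditory input appears anywhere in the loop, which mirrors the claim that the sensory link is an internally generated prediction.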



Fig. 3. Sequence generation. A: sequence generation results from a reciprocal interaction between representations in HVc_RA (motor) and HVc_AFP (sensory). The final motor output of the model depends on the mapping from HVc_RA → RA. B: schematic of the mappings necessary for correct reproduction of the tutor sequence ... A → B → C ... Suppose an efference copy, SenA, is represented in HVc_AFP (in sensory coordinates). This is followed by the HVc_RA context representation CtxtA, which is mapped onto MotB in RA. Syllable B follows syllable A. CtxtA also elicits SenB, the efference copy corresponding to MotB. SenB → CtxtB → MotC leads to the production of syllable C, etc.

PROBLEM 2: SEQUENCE LEARNING IN RA. In our model, the AFP uses template information to generate "sequence teaching" signals that bias RA activity toward the proper tutor sequence (Table 1, number 2). The details of how these signals reorganize the motor pathway to produce correct sequence transitions are illustrated in Fig. 4, using the transition from syllable A to syllable B as an example. In our model, the efference copy representation, SenA, that is registered in HVc_AFP during the production of syllable A, generates two distinct signals during the vocalization that follows syllable A. First, in HVc, due to the slow connections from HVc_AFP → HVc_RA, SenA results in a context signal, CtxtA, that is input to HVc_RA. Second, the AFP receives the efference copy SenA from HVc_AFP and generates the sequence teaching signal for syllable B, after an appropriate delay. This signal is input to RA and biases RA activity toward the next motor representation in the tutor sequence, MotB. Since both of these signals exert their effects with a one syllable delay, during the syllable following A, neurons in HVc_RA that are part of the context representation CtxtA tend to be co-active with RA neurons comprising the motor representation MotB. Associational learning then strengthens the connections between these sets of neurons (Fig. 4, white arrow). In this way, the context representation CtxtA gets mapped onto MotB, and the model learns the transition SenA → CtxtA → MotB.
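The way a bias on RA activity can redirect associational learning may be easier to see in a toy calculation. The sketch below is a minimal illustration under stated assumptions, not the model's update rule: RA activity is taken to be the sum of the CtxtA drive and an AFP bias, the presynaptic CtxtA activity is fixed at 1, and the weights are renormalized to a constant total after each step. The function name, learning rate, and initial weights are hypothetical.

```python
def hebbian_step(weights, afp_bias, rate=0.2):
    """One associational update of the CtxtA -> RA weights, followed by normalization.

    weights:  dict mapping each RA motor representation to its weight from CtxtA
    afp_bias: dict of extra RA drive supplied by the AFP teaching signal
    """
    # RA activity = drive from CtxtA plus AFP bias (presynaptic activity assumed to be 1).
    activity = {s: w + afp_bias.get(s, 0.0) for s, w in weights.items()}
    # Hebbian growth proportional to postsynaptic activity.
    updated = {s: weights[s] + rate * activity[s] for s in weights}
    # Keep the total outgoing weight constant (an assumption of this sketch).
    total = sum(updated.values())
    return {s: w / total for s, w in updated.items()}


initial = {"MotA": 0.1, "MotB": 0.2, "MotC": 0.1, "MotD": 0.6}  # CtxtA starts mapped onto D
teaching_bias = {"MotB": 1.0}                                    # AFP expects B after A

learned = dict(initial)
for _ in range(20):
    learned = hebbian_step(learned, teaching_bias)

print(max(initial, key=initial.get), "->", max(learned, key=learned.get))
```

With these hypothetical numbers the strongest target of CtxtA moves from MotD to MotB, because the bias makes MotB reliably co-active with CtxtA.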



Fig. 4. AFP-guided sequence learning. Schematic showing the learning of the transition from syllable A to B. The efference copy for A, SenA, results in a context signal, CtxtA, that arrives in HVc_RA after a delay. SenA is also passed on to the AFP. Using previously stored template information, the AFP generates, after an appropriate delay, the sensory representation for the next syllable in the tutor song, SenB. SenB biases RA toward the motor pattern MotB. Associational learning (white arrow) between the context signal CtxtA in HVc_RA and MotB in RA ensures that future productions of syllable A will evoke the composite mapping SenA → CtxtA → MotB, resulting in the transition from A to B.

SENSORY → MOTOR MAPPING FROM THE AFP → RA. If the sequence teaching signal for syllable B, which we assume to be encoded in sensory coordinates in the AFP, is to bias RA motor activity toward syllable B, a sensory → motor mapping between the AFP and RA is required (Table 1, number 3). In our sequence learning model, the required map develops soon after the initial period of efference copy learning, and before syllable learning is complete. With an accurate efference copy, HVc_RA excites a sensory representation in the output neurons of the AFP (via HVc_AFP) that corresponds to the motor activity in RA. For example, if HVc_RA drives motor activity in RA that is relatively well matched to tutor syllable A, it will also drive an efference copy within HVc_AFP that leads to excitation within the AFP output neurons encoding tutor syllable A (Fig. 5). Associative learning then strengthens connections between the AFP neurons encoding syllable A in sensory coordinates and the RA neurons encoding A in motor coordinates. Note that to develop the appropriate mapping between the AFP and RA, the output neurons in the AFP must encode a sensory representation of the current syllable. To use the map to bias RA activity toward the tutor sequence, these same AFP output neurons must encode a representation of the next syllable. Our model simply assumes that AFP efferents contain a combination of these signals. Possible explanations for how the components of this mixed signal could exert distinct functional influences in RA are described in the METHODS.
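The development of this map can be sketched as a plain Hebbian outer-product rule. The code below is only an illustration under simplifying assumptions: each syllable is represented by a single assembly with weak background activity elsewhere, and the function names, learning rate, and number of repetitions are hypothetical rather than taken from the model.

```python
def hebbian_outer(weights, pre, post, rate=0.1):
    """Strengthen weights[i][j] in proportion to pre[i] * post[j]."""
    return [[weights[i][j] + rate * pre[i] * post[j] for j in range(len(post))]
            for i in range(len(pre))]


def assembly_pattern(index, n, background=0.1):
    """One strongly active assembly with weak background activity elsewhere."""
    return [1.0 if k == index else background for k in range(n)]


def strongest_targets(weights):
    """For each presynaptic assembly, the index of its strongest target."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]


n_syllables = 3
afp_to_ra = [[0.0] * n_syllables for _ in range(n_syllables)]
for _ in range(10):                                  # repeated vocalizations
    for k in range(n_syllables):                     # syllable k is being produced
        sensory = assembly_pattern(k, n_syllables)   # AFP output: efference copy of k
        motor = assembly_pattern(k, n_syllables)     # RA: motor representation of k
        afp_to_ra = hebbian_outer(afp_to_ra, sensory, motor)

print(strongest_targets(afp_to_ra))
```

Because the sensory and motor assemblies for the same syllable are co-active, each AFP assembly ends up most strongly connected to the matching RA assembly, which is the sensory → motor map needed before a "next syllable" signal can bias RA.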



Fig. 5. Learning a sensory → motor mapping between AFP and RA. After the initial phase of efference copy learning, the HVc_RA activity that produces motor activity for syllable A in RA (MotA) will also produce a sensory prediction (SenA) of that motor activity in the AFP (black arrows). This leads to associational learning between AFP assemblies encoding syllable A in sensory coordinates and the RA assemblies encoding syllable A in motor coordinates (white arrow).

PROBLEM 3: SEQUENCE LEARNING IN HVC. Even though the model has learned the correct efference copy → next motor transition, SenA → CtxtA → MotB, sequence learning is not yet complete. This is because by altering synapses in RA, the AFP has perturbed the motor → sensory matching necessary for an accurate efference copy in HVc. In particular, HVc_RA neurons belonging to the representation for CtxtA originally mapped onto some particular combination of motor representations in RA. For example, perhaps CtxtA originally mapped most strongly onto syllable D. With an accurate efference copy, these same HVc_RA neurons were mapped onto the corresponding combination of sensory representations in HVc_AFP, SenD. Remapping CtxtA onto MotB in RA alters this correspondence, and the HVc sequence generator produces the following set of mappings: SenA → CtxtA → SenD → CtxtD. Presumably, the context signal from syllable D, CtxtD, is mapped onto MotE in RA. Therefore, syllable B (produced by CtxtA) will be followed, not by C, but by E. However, such errors in the efference copy component of the HVc sequence generator are continually corrected by renewed auditory feedback-driven learning in the HVc_RA → HVc_AFP connections (Table 1, number 4): CtxtA excites MotB in RA, leading to an auditory feedback signal SenB arriving in HVc_AFP (Fig. 6). Therefore, HVc_RA → HVc_AFP connections between HVc_RA neurons belonging to CtxtA and HVc_AFP neurons belonging to SenB are strengthened (Fig. 6, white arrow), supplanting the "old" connections from CtxtA → SenD. In this way, the HVc sequence generator is able to track the AFP-induced changes in RA. By combining the appropriate sensory → motor and motor → sensory mappings, the model learns the chain of sensory-motor associations that reproduces the tutor sequence: SenA → (CtxtA → MotB) → SenB ...
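The A → B → E error and its correction can be traced with a small lookup-table sketch. This is a schematic illustration rather than the model's learning rule: the tables, the helper names, and the assumption that renewed learning simply replaces each stale efference copy with the sensory representation of the syllable actually produced are all introduced here for the example.

```python
def produce(start_sen, ctxt_to_mot, ctxt_to_sen, n_syllables):
    """Run the HVc chain; each Sen is assumed to elicit the context of the same syllable."""
    sen, produced = start_sen, []
    for _ in range(n_syllables):
        ctxt = "Ctxt" + sen[len("Sen"):]    # SenX -> CtxtX (HVc_AFP -> HVc_RA)
        produced.append(ctxt_to_mot[ctxt])  # CtxtX -> motor representation in RA
        sen = ctxt_to_sen[ctxt]             # CtxtX -> efference copy in HVc_AFP
    return produced


def relearn_efference_copy(ctxt_to_mot):
    """Idealized outcome of feedback-driven learning: each context predicts what it produces."""
    return {ctxt: "Sen" + mot[len("Mot"):] for ctxt, mot in ctxt_to_mot.items()}


# After AFP-guided remapping in RA, CtxtA drives MotB (hypothetical tables).
ctxt_to_mot = {"CtxtA": "MotB", "CtxtB": "MotC", "CtxtD": "MotE"}
# The efference copy still reflects the old mapping of CtxtA onto syllable D.
stale_copy = {"CtxtA": "SenD", "CtxtB": "SenC", "CtxtD": "SenE"}

before = produce("SenA", ctxt_to_mot, stale_copy, 2)
after = produce("SenA", ctxt_to_mot, relearn_efference_copy(ctxt_to_mot), 2)
print(before, after)
```

With the stale efference copy, syllable B is followed by E. Once CtxtA predicts SenB, B is followed by C, as in the tutor sequence.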



Fig. 6. Renewed efference copy learning. Since AFP-guided sequence learning alters the projection from HVc_RA → RA, renewed efference copy learning (white arrow) is required so that CtxtA projects onto SenB in HVc_AFP (cf. Troyer and Doupe 2000, Fig. 4A). Thus, auditory feedback is necessary throughout development to maintain an accurate efference copy.


    METHODS

The model presented in this paper is an extension of the syllable learning model described in the preceding companion paper (Troyer and Doupe 2000). To account for the generation and learning of song sequence, we added two new sets of synaptic connections to this model (Fig. 2B). Because our model is relatively abstract at the level of local circuits, the choice of how these connections were embedded in our computer algorithm was governed chiefly by considerations of computational simplicity (a variety of biological mechanisms could contribute to their functionality). An understanding of the theoretical issues related to our implementation is not necessary to understand our simulation results. Most features of the model are described in detail in Troyer and Doupe (2000). We discuss here only new additions to the model. The final subsection in the METHODS describes the method we used for quantifying the time course of model development.

Most simulations of the complete model contained 25,000 syllables, over 5,000 more than were typically needed for model output to become stereotyped (see APPENDIX). Computer simulations were written using the MATLAB simulation environment (version 5.3; The Mathworks, Natick, MA). Typical simulations took approximately 3 h when run using a 400-MHz Pentium II processor. Details regarding simulations and parameters are contained in the APPENDIX.

HVc_AFP → HVc_RA connections

To account for sequence generation, connections from HVc_AFP to HVc_RA were added (Fig. 2B). These connections are assumed to be functionally "slow synapses" that carry information from one syllable to the next (cf. Kleinfeld and Sompolinsky 1988). For computational simplicity, the functional separation of HVc connections was strict: HVc_RA → HVc_AFP connections carried only efference copy information related to the current syllable, and the HVc_AFP → HVc_RA connections broadcast signals that affected only the subsequent syllable. However, our general approach requires only a functional imbalance between the two populations of HVc projection neurons. A strict separation is not crucial. To match the functional delay in the HVc_AFP → HVc_RA pathway (approximately 50 ms), a corresponding delay was introduced in the time window for synaptic plasticity in these connections (see APPENDIX). In general, we followed the principle that the time window for synaptic plasticity should be roughly proportional to the time scale of encoding for the information passed over that synapse. RA connections, which encode the detailed motor programs within each syllable, had the shortest plasticity window, and the HVc_AFP → HVc_RA context synapses had the longest.

Since it relies on reciprocal excitatory connections, the pattern generator within HVc tended to be unstable. To help control this positive feedback, we 1) normalized the size of the context signal during each syllable (see APPENDIX), and 2) included "adaptation" in the HVc_RA assemblies. HVc_RA adaptation was of the same form as the HVc_AFP adaptation included to cancel the delayed auditory feedback (Troyer and Doupe 2000). However, because HVc_RA adaptation was included to counteract an overall build up of HVc activity, its decay time (225 ms) was considerably longer than the decay time of HVc_AFP adaptation (115 ms).
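The two stabilizing measures can be sketched as follows. This is only a rough illustration: the exact form of normalization and adaptation is given in the APPENDIX and the companion paper, so the choice of rescaling to a fixed sum, the exponential decay, the 100-ms probe interval, and the function names below are all assumptions made for this example. Only the two decay times are taken from the text.

```python
import math

HVC_RA_DECAY_MS = 225.0   # decay time quoted in the text
HVC_AFP_DECAY_MS = 115.0  # decay time quoted in the text


def normalize_context(signal, target_total=1.0):
    """Rescale a context signal so its components sum to target_total (assumed form)."""
    total = sum(signal)
    if total == 0:
        return list(signal)
    return [target_total * x / total for x in signal]


def remaining_adaptation(elapsed_ms, decay_time_ms):
    """Fraction of adaptation left after elapsed_ms, assuming exponential decay."""
    return math.exp(-elapsed_ms / decay_time_ms)


context = normalize_context([2.0, 6.0, 0.0])
slow = remaining_adaptation(100.0, HVC_RA_DECAY_MS)   # HVc_RA adaptation
fast = remaining_adaptation(100.0, HVC_AFP_DECAY_MS)  # HVc_AFP adaptation
print(context, round(slow, 3), round(fast, 3))
```

Under these assumptions, more of the HVc_RA adaptation than of the HVc_AFP adaptation remains after the same interval, so it can oppose a build-up of activity that spans several syllables.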

AFP right-arrow RA connections and signals

The circuitry within the three song nuclei that make up the AFP could, in principle, subserve a variety of complex processing tasks. Our model treats the entire AFP as a "black box" performing the necessary calculations related to template comparison (see APPENDIX for details). Our algorithm was governed chiefly by computational simplicity, but most calculations could be implemented relatively easily by a variety of biologically plausible circuits.

Processing within the AFP is shown in Fig. 7. Each AFP "input assembly" receives input from the HVc_AFP assemblies encoding sensory features related to the corresponding tutor syllable (the nature of the encoding scheme used in our model is described in Troyer and Doupe 2000; Fig. 6). Input is also received by a single inhibitory unit that broadcasts its output to all input assemblies. This "feedforward inhibition" implements a form of competition in which the only active AFP assemblies are those that receive significantly more input than average.



Fig. 7. Processing in the AFP (see METHODS for details). Input from HVc_AFP excites "feedforward" inhibition (filled circle, top) that implements a competition between AFP input assemblies (only those assemblies receiving significantly more than the average amount of input will be active). Three different calculations are performed on the results of this competition. 1) The match between this efference copy activity and the tutor song is determined (see Troyer and Doupe 2000). This results in a single "reinforcement" value that is strongly broadcast to all AFP output assemblies, accounting for 75% of their activity. 2) Patterned activity related to the current syllable is passed on unchanged, accounting for 15% of AFP output assembly activity. 3) Patterned activity related to the current syllable is delayed for the duration of one syllable and then delivered to AFP output assemblies in a pattern that is shifted forward one syllable in the tutor sequence. This shifting mechanism is how tutor sequence is stored in the AFP, and the shifted signal accounts for 10% of AFP output assembly activity. Feedforward inhibition in RA (filled circle, bottom) counteracts the strong reinforcement signal, leaving the patterned signal to affect RA activity via the pattern of AFP right-arrow RA connections. After the initial period of sensory-motor matching (see RESULTS), signal (2) is redundant with the strong motor input from HVc_RA, leaving signal (3) to be the main contribution to altering activity in RA.

The main difficulty for our model is that the AFP is assumed to simultaneously broadcast three distinct signals that are important for separate aspects of sensorimotor learning. Each of these calculations is represented by a separate box in the middle of Fig. 7: 1) to guide syllable learning, the AFP transmits a nonspecific reinforcement signal that modulates plasticity in RA; 2) to organize a sensory right-arrow motor mapping between the AFP and RA, the AFP forms a sensory representation related to the current syllable; 3) to guide sequence learning, the AFP must generate, with a one-syllable delay, a sequence teaching signal that biases RA activity toward the next syllable in the tutor sequence. A possible neural substrate for this delayed sequence teaching signal is the axon collaterals that transmit information from the lateral portion of the magnocellular nucleus of the anterior neostriatum (LMAN), the output nucleus of the AFP, to area X, the input nucleus of the AFP (see Fig. 1C, Troyer and Doupe 2000). The appropriate delay is roughly 75 ms, the length of a typical song syllable (approx 115 ms) minus the processing delay contributed by the AFP (approx 40 ms). Note that signals 1 and 2 are used to guide plasticity in RA but are not required to influence RA activity. In contrast, the purpose of signal 3 is to guide activity, but in principle it could disrupt learning in the AFP right-arrow RA pathway.

In our implementation, the three signals are not segregated at the level of AFP outputs: the activity within the AFP output assemblies is just a summation of signals 1-3. The input to each RA assembly is then calculated as a sum of AFP outputs, weighted by the pattern of synaptic strengths from the AFP right-arrow RA. This input serves both as a source of additive external input summed with RA input coming from HVc, and as a modulatory term in the RA plasticity rule (see APPENDIX). The modulation of RA plasticity in our model is completely phenomenological. Candidate mechanisms include release of trophic factors by AFP efferents (Johnson et al. 1997) or downstream effects of calcium entering through AFP glutamatergic synapses, which are dominated by NMDA receptors (Mooney and Konishi 1991).

How does the superposition of signals 1-3 in AFP output neurons exert separate effects in RA? The nonspecific reinforcement component of the AFP activity (signal 1) is separated from the two patterned components by its magnitude: we assume that the reinforcement signal contributes 75% of the input to AFP output assemblies. AFP output is then dominated by this reinforcement signal, and the resulting modulation of RA plasticity can be used to guide syllable learning. To allow the two patterned signals to play their role in song learning, we assume that the AFP also excites a population of inhibitory interneurons local to RA (Fig. 7, filled circle, bottom). This feedforward inhibition counteracts the nonspecific (reinforcement) component of the AFP input to RA, causing this nonspecific input to have little effect on spiking activity in RA. However, inhibition would not be expected to cancel trophic effects of AFP inputs and hence would not block reinforcement mediated by neurotrophins. In an alternative scenario, inhibition that is proximal to the cell body might eliminate spiking but not prevent the depolarization within distal dendrites by inputs from HVc_RA or other RA neurons. Thus, calcium entry through NMDA receptors at AFP synapses could still be used to modulate plasticity within the dendritic tree, even though the currents flowing through these receptors are counteracted by inhibition arriving at the soma.
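The superposition of signals 1-3 and their separation in RA can be sketched as follows. The 75/15/10 proportions come from the text; the identity weight matrix, the unit-sized patterns, and the use of mean subtraction as a stand-in for feedforward inhibition in RA are assumptions made for illustration.

```python
W_REINFORCE, W_CURRENT, W_NEXT = 0.75, 0.15, 0.10  # proportions stated in the text


def afp_output(reinforcement, current, shifted):
    """Superimpose signals 1-3 in each AFP output assembly."""
    return [W_REINFORCE * reinforcement + W_CURRENT * c + W_NEXT * s
            for c, s in zip(current, shifted)]


def ra_input(weights, afp_out):
    """Weighted sum of AFP outputs for each RA assembly (rows of `weights`)."""
    return [sum(w * a for w, a in zip(row, afp_out)) for row in weights]


def remove_nonspecific(drive):
    """Assumed stand-in for RA feedforward inhibition: subtract the mean."""
    mean = sum(drive) / len(drive)
    return [d - mean for d in drive]


# Three syllables, one assembly each, identity AFP -> RA weights (hypothetical).
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
current = [1.0, 0.0, 0.0]   # efference copy indicates syllable A
shifted = [0.0, 1.0, 0.0]   # sequence template points to syllable B
out = afp_output(reinforcement=1.0, current=current, shifted=shifted)
modulation = ra_input(weights, out)             # gates plasticity in RA
spiking_drive = remove_nonspecific(modulation)  # biases RA activity
```

In this sketch the large common term survives in `modulation`, where it can gate plasticity, while `spiking_drive` retains only the patterned differences between assemblies.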

In addition to explaining how the nonspecific reinforcement component of the AFP activity is prevented from disrupting patterns of RA activity, we must explain how to prevent it from disrupting the learning in the AFP right-arrow RA pathway. By definition, a large reinforcement signal that is expressed as high activity in all AFP output assemblies will also lead to increased plasticity within all RA assemblies. This correlation between nonspecific presynaptic firing in the AFP and nonspecific modulation of plasticity in RA tends to strengthen all synapses from the AFP right-arrow RA. To counteract this tendency, AFP right-arrow RA synapses were assigned a higher plasticity threshold (see APPENDIX).

The action of the AFP activity related to the current efference copy (signal 2) is straightforward: after the efference copy mapping from HVc_RA to HVc_AFP gives an accurate prediction of the motor input from HVc_RA to RA (Troyer and Doupe 2000), the AFP assembly corresponding to the current syllable will be most active when RA assemblies corresponding to that syllable are also active. Sensory right-arrow motor associational learning follows, causing AFP assemblies encoding a particular tutor syllable to project most strongly to RA assemblies encoding the same syllable (Fig. 5). After the sensory right-arrow motor matching is accomplished, the input from the AFP activity related to signal 2 will be redundant with the (stronger) input to RA from HVc.

Our functional requirements for the sequence teaching signal (signal 3) are that it biases RA activity toward the next syllable in the tutor sequence, but does not disrupt the learning in the AFP right-arrow RA pathway driven by signal 2. To implement the proper bias, the processing box marked "Sequence Template" in Fig. 7 accepts a pattern of input, waits for one syllable, and then excites AFP output assemblies in a pattern that is shifted one syllable forward in the tutor sequence. Since the AFP right-arrow RA connections perform a sensory right-arrow motor mapping, this signal will bias RA toward the next motor command in the tutor sequence (Fig. 4). The reason that this signal does not disrupt the associations necessary to develop a sensory right-arrow motor mapping to RA is that, before sequence learning is accomplished, the inputs from HVc_RA to RA are strong and their sequence is random. Therefore, AFP activity for the subsequent syllable (signal 3) will not be strongly correlated with RA activity and hence will not contribute significantly to plasticity in the AFP right-arrow RA connections. After the model begins to produce the proper sequence, the motor patterns in RA driven by HVc_RA will be matched to the syllable specified by the sequence teaching signal (signal 3). Hence, the associational plasticity related to signal 3 will simply reinforce the sensory right-arrow motor mapping originally organized by signal 2.
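The delay-and-shift operation of the "Sequence Template" box can be sketched as a one-step buffer. The class name, the one-unit-per-syllable representation, and the wrap-around from the last syllable back to the first are assumptions made for illustration; the model's actual implementation is described in the APPENDIX.

```python
class SequenceTemplate:
    """Hold a per-syllable pattern for one syllable, then emit it shifted
    one position forward in the tutor sequence (hypothetical sketch)."""

    def __init__(self, n_syllables):
        self.held = [0.0] * n_syllables  # pattern seen during the previous syllable

    def step(self, pattern):
        # Move activity at index i to index i + 1, wrapping the last to the first.
        out = self.held[-1:] + self.held[:-1]
        self.held = list(pattern)
        return out


template = SequenceTemplate(5)                         # syllables A-E
first = template.step([1.0, 0.0, 0.0, 0.0, 0.0])       # A sung; nothing held yet
second = template.step([0.0, 0.0, 1.0, 0.0, 0.0])      # C sung; output points to B
third = template.step([0.0, 0.0, 0.0, 0.0, 1.0])       # E sung; output points to D
fourth = template.step([0.0, 0.0, 0.0, 0.0, 0.0])      # output points to A (wrap)
```

In the second step the model has produced C after A, yet the teaching signal points to B, the syllable that follows A in the tutor sequence.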

Our implementation represents only one of many plausible ways in which different signals could exert different effects in RA. A conceptually simple solution to the problem of segregation would be to have different functional signals carried by distinct classes of AFP projection neurons. However, developing such a separation could be difficult. Another alternative is for different signals to be encoded in different temporal patterns of AFP activity (e.g., bursting versus tonic). These could preferentially excite separate receptors in RA and/or trigger different plasticity mechanisms in RA. Finally, since the three signals make crucial contributions to learning at different times during song learning (see Fig. 11 in RESULTS), their functions could be subserved by mechanisms tied to developmental critical periods. Our model makes predictions regarding the functional information carried by the AFP right-arrow RA pathway. Further experiments will be required to determine the possible neural substrate for these signals.

Quantifying learning time course

To obtain quantitative results regarding the time course of learning in the model, we measured how closely the statistics of RA motor output matched the statistics of the tutor song, as well as measuring how closely important patterns of connectivity matched the properties of an "ideal" model that would accurately reproduce the tutor song. The measure used to compute these matches was the correlation coefficient (CC) applied to the elements of the relevant connection matrices (see METHODS in Troyer and Doupe 2000). Syllable-related activity was quantified as in Troyer and Doupe 2000. Sequence-related activity was quantified by dividing the model output into 250-syllable epochs and constructing $M^{\mathrm{next}}$, the matrix of co-fluctuations between patterns of RA activity for a given syllable and the patterns of RA activity for the next syllable
\[
M^{\mathrm{next}}_{ij} = \frac{1}{250} \sum_{n=250(m-1)+1}^{250m} \left[ r_i(n-1) - \bar{r}(n-1) \right] \left[ r_j(n) - \bar{r}(n) \right]
\]
where $r_i(n)$ is the activity level in the ith RA assembly, and $\bar{r}(n)$ is the average activity across assemblies during syllable n. We used the CC to compare $M^{\mathrm{next}}$ to an ideal syllable transition matrix, $M^{\mathrm{seq}}$: $M^{\mathrm{seq}}_{ij} = 4$, if assembly j forms part of the representation for the syllable following the syllable coded by assembly i; $M^{\mathrm{seq}}_{ij} = -1$, otherwise. Diagonal entries were included.
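The co-fluctuation measure and its comparison with the ideal transition matrix can be sketched in a few lines. This is an illustrative reconstruction, not the authors' MATLAB code: the function names are hypothetical, the toy example uses three syllables with one assembly each, and the average is taken over however many transitions are supplied rather than a fixed 250.

```python
import math


def next_cofluctuation(r):
    """r[n][i] is the activity of RA assembly i during syllable n.
    Returns M[i][j], averaged over consecutive syllable pairs (n - 1, n)."""
    n_assemblies = len(r[0])
    n_transitions = len(r) - 1
    # Deviation of each assembly from the across-assembly mean, per syllable.
    dev = [[x - sum(row) / n_assemblies for x in row] for row in r]
    M = [[0.0] * n_assemblies for _ in range(n_assemblies)]
    for n in range(1, len(r)):
        for i in range(n_assemblies):
            for j in range(n_assemblies):
                M[i][j] += dev[n - 1][i] * dev[n][j] / n_transitions
    return M


def ideal_transition_matrix(syllable_of, n_syllables):
    """4 where assembly j codes the syllable after assembly i's, else -1."""
    return [[4.0 if sj == (si + 1) % n_syllables else -1.0
             for sj in syllable_of] for si in syllable_of]


def correlation(A, B):
    """Correlation coefficient between the elements of two matrices."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


one_hot = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
M_seq = ideal_transition_matrix([0, 1, 2], n_syllables=3)
cc_correct = correlation(next_cofluctuation([one_hot[s] for s in "ABCABCA"]), M_seq)
cc_reversed = correlation(next_cofluctuation([one_hot[s] for s in "CBACBAC"]), M_seq)
```

In this toy case, a song that follows the tutor order gives a CC close to 1, whereas the reversed order gives a negative CC.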

In addition to monitoring patterns of RA activity, we monitored development in four sets of connections. 1) The accuracy of the efference copy map was quantified by calculating the correlation coefficient between the pattern of HVc_RA right-arrow motor connections (HVc_RA right-arrow RA) and HVc_RA right-arrow sensory connections (HVc_RA right-arrow HVc_AFP). 2) To quantify the development of the sensory right-arrow motor mapping (Fig. 5), we computed the CC between the pattern of AFP right-arrow RA connection strengths and the ideal pattern of connectivity, in which the AFP assembly representing a given tutor syllable would have connections only onto RA assemblies encoding the motor features belonging to that syllable. 3) To quantify the progress of syllable learning, we computed the CC between the ideal syllable correlation matrix, $M^{\mathrm{syl}}$, and the pattern of intrinsic RA connections as in Troyer and Doupe 2000. $M^{\mathrm{syl}}_{ij} = 4$, if assembly j forms part of the representation for the same syllable as assembly i; $M^{\mathrm{syl}}_{ij} = -1$, otherwise. Diagonal entries were excluded. 4) To evaluate sequence-related connectivity, we multiplied the HVc_AFP right-arrow HVc_RA and HVc_RA right-arrow RA connection matrices. The resulting matrix represents the influence of each HVc_AFP assembly on each RA assembly via the context signal in HVc (Fig. 3). The correlation coefficient between this matrix and $M^{\mathrm{seq}}$ was used to measure the development of sequence-related connectivity.
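The composite influence in item 4 is an ordinary matrix product. The sketch below assumes a rows-as-source, columns-as-target convention and uses small hypothetical weight matrices; neither the convention nor the values are taken from the paper.

```python
def matmul(A, B):
    """Plain matrix product: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical weights for three syllables (A, B, C), one assembly each.
# Rows index the source assembly, columns the target assembly.
hvc_afp_to_hvc_ra = [[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]]
hvc_ra_to_ra = [[0.0, 1.0, 0.0],   # context from A drives the motor pattern for B
                [0.0, 0.0, 1.0],   # context from B drives C
                [1.0, 0.0, 0.0]]   # context from C drives A

composite = matmul(hvc_afp_to_hvc_ra, hvc_ra_to_ra)
# For each HVc_AFP assembly, the RA assembly it influences most strongly.
next_syllable = [row.index(max(row)) for row in composite]
```

Here each sensory assembly most strongly influences the RA assembly for the following syllable, which is the pattern that the comparison with the ideal transition matrix is meant to detect.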


    RESULTS

Our model explores how song learning can result from associational plasticity, guided by template comparison signals transmitted by the AFP. The representation of the sensory and motor aspects of song in our model is described in detail in our companion paper (Fig. 6 in Troyer and Doupe 2000). Briefly, the information encoded within each neural population (HVc_RA, HVc_AFP, RA, and the AFP) is represented by the activation value of a number of processing units, each meant to capture the average level of activity within a connected set of neurons or "cell assembly" (Hebb 1949). For most simulations, the tutor song contains five syllables, with each syllable composed of eight abstract vocal features. The features encoding different syllables are assumed to be unique, so we number the features according to tutor syllable (syllable A, features 1-8; syllable B, features 9-16; etc.). Each of 40 RA assemblies encodes the motor aspect of one vocal feature, and each of 40 HVc_AFP assemblies encodes the sensory aspect of one feature. The template for syllables is stored in the connections from HVc_AFP right-arrow AFP, and the template for tutor sequence is stored by circuitry internal to the AFP (see METHODS).
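The feature numbering described above amounts to a simple index mapping, sketched here for the default configuration of five syllables with eight features each. The function names are hypothetical.

```python
FEATURES_PER_SYLLABLE = 8
SYLLABLES = "ABCDE"


def syllable_of_feature(feature):
    """Map a 1-based feature number (1-40) to its tutor syllable label."""
    if not 1 <= feature <= FEATURES_PER_SYLLABLE * len(SYLLABLES):
        raise ValueError("feature number out of range")
    return SYLLABLES[(feature - 1) // FEATURES_PER_SYLLABLE]


def features_of_syllable(label):
    """Return the 1-based feature numbers belonging to a tutor syllable."""
    start = SYLLABLES.index(label) * FEATURES_PER_SYLLABLE + 1
    return list(range(start, start + FEATURES_PER_SYLLABLE))
```

For example, `features_of_syllable("B")` covers features 9-16, matching the numbering given in the text.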

Sensorimotor learning is accomplished in three stages. The first two stages were explored in our companion paper (Troyer and Doupe 2000). At the beginning of the simulation, all connections in the motor pathway are unstructured, and the premotor drive initiating each syllable drives unorganized patterns of RA activity (Fig. 8A). During the initial, efference copy learning stage, associations between the HVc_RA motor activity and the resulting auditory feedback input to HVc_AFP cause a motor right-arrow sensory efference copy mapping to develop between these two populations (stage 1; Figs. 4A, 8 in Troyer and Doupe 2000). In the second, syllable learning stage, the AFP evaluates the efference copy signals and broadcasts template matching "reinforcement" signals that reorganize synaptic strengths in RA so that assemblies corresponding to individual tutor syllables are co-active (stage 2; Fig. 8B; Figs. 4A, 10 in Troyer and Doupe 2000). In this paper, we focus on the final, sequence learning stage, in which "sequence teaching" signals from the AFP act in concert with the sequence generation mechanism in HVc so that syllable representations are produced in the correct order, A right-arrow B right-arrow C right-arrow D right-arrow E right-arrow A ... (stage 3; Fig. 8C). It is important to note that a segregation between developmental stages is not embedded within our learning rule or network architecture. Rather, all synapses in HVc and RA are plastic, and this plasticity lasts throughout the simulation. Thus, development is driven by interdependent patterns of association that emerge during song learning.



Fig. 8. Overview of model behavior. RA assemblies (40 total) are grouped along the vertical axis according to the tutor syllable to which they correspond (labeled A-E). Bar shows color scale for this and subsequent figures. A: RA activity during the first 10 simulated syllables (numbered from start of simulation). RA activity is unorganized and random. B: syllable learning. RA activity during each syllable is well-matched to one of the tutor syllables, but syllables are produced in a nearly random order. C: sequence learning. By syllable 25,000, activity is matched to the tutor representation, with syllables produced in the proper sequence.

Sequence learning

The key to sequence learning in the model is the ability of signals from the AFP to bias RA activity toward the proper syllable transitions (Fig. 9A, arrows). Acting over multiple syllables, this in turn biases the association between HVc_RA and RA activity. The resulting change in HVc_RA right-arrow RA connectivity leads to the production of appropriate syllable transitions (Fig. 4). Auditory feedback ensures that an accurate efference copy mapping is maintained (Fig. 6). The gradual improvement of syllable transitions is shown in Fig. 9B.



Fig. 9. Sequence learning. A: AFP-guided syllable transitions. HVc input to RA (H), AFP input to RA (A), and RA activity (R) for syllables 14,001-14,007. For syllables 14,002 and 14,006, the input from the AFP ensures proper syllable transitions, overriding "incorrect" input from HVc (arrows). To emphasize differences in the input to various RA assemblies, the density of shading for H and A represents the amount of input that exceeds the mean for that pathway; inputs weaker than the mean are not shown. B: convergence toward proper sequence. Model output for 51 consecutive syllables is shown at 5 different developmental time points. Syllable transitions are initially random but eventually begin to be produced in small strings matching the tutor song. Eventually the entire sequence is learned.

Time course of learning

To examine the time course of learning, we considered the properties of an "ideal" solution, in which patterns of connectivity were set so that this ideal model would accurately reproduce the tutor song (see METHODS for detailed definitions). We then quantified how closely important sets of connections matched the ideal model. The match was calculated using the correlation coefficient, a method that gives a value of one for identical connection patterns and values near zero for connection patterns that are uncorrelated. We measured four sets of connections: the efference copy map from HVc_RA right-arrow HVc_AFP, the sensory right-arrow motor map from the AFP right-arrow RA, syllable storage in the RA right-arrow RA connections, and the sensory right-arrow next motor pathway from HVc_AFP right-arrow HVc_RA right-arrow RA. We also measured how closely the motor output from RA matched the tutor song. These calculations were performed for "epochs" consisting of 250 consecutive syllables produced by the model. To quantify the development of tutor syllables, we calculated the matrix of co-fluctuations, whose ijth entry indicates whether assembly i and assembly j have similar patterns of activity. To quantify the development of tutor sequence, we calculated a similar matrix, except that the ijth entry indicates whether activity in RA assembly i during syllable n co-fluctuated with the activity in assembly j during syllable n + 1. These matrices were matched to the corresponding matrices computed from the tutor song, again using the correlation coefficient (see METHODS).

The developmental time courses of the multiple, interacting associations underlying model development are summarized in Fig. 10A. Figure 10B shows which connections are most important during each of the song learning stages traced in Fig. 10A. Initially, the only consistent pattern of association in the network is between motor activity and delayed auditory feedback, and the corresponding efference copy mapping develops rapidly (stage 1, dotted line). As accurate efference copies are passed onto the AFP, a sensory right-arrow motor mapping also develops between the AFP and RA (stage 1a, dashed-dotted line; see Fig. 5). An accurate efference copy also causes the AFP to produce consistent reinforcement signals, which reorganize intrinsic RA connections so that RA assemblies corresponding to the same tutor syllable begin to receive common patterns of synaptic input (stage 2, thin solid line). As this happens, the model begins to produce RA activity patterns matched to the tutor syllables (thin dashed line). As syllables are learned, efference copy activity in HVc_AFP becomes increasingly confined to patterns matched to the relatively small number of tutor syllables. These aspects of the model (with the exception of stage 1a) were described in detail in our companion paper (Troyer and Doupe 2000). As syllable learning proceeds, clearly defined sequence teaching signals begin to be produced by the AFP. These begin to bias RA activity toward the tutor sequence (stage 3, thick solid line; see Fig. 9A). This altered activity then remaps the connections from HVc_RA to RA, so that the polysynaptic pathway from HVc_AFP right-arrow HVc_RA right-arrow RA (thick dashed line) yields correct sensory right-arrow next motor syllable transitions. 
Note that improvement in the sequencing of RA activity happens before the learning of the appropriate connectivity from HVc_AFP right-arrow HVc_RA right-arrow RA, since AFP-driven sequence transitions are necessary to drive sequence related learning. The reorganization of the HVc_RA right-arrow RA pathway disrupts the efference copy mapping, which begins to degrade slightly during the period of sequence learning (dotted line, syllables 8,000-17,000). This tension between AFP-guided changes in the motor pathway and renewed efference copy learning continues until both are in rough agreement. This agreement causes a transient decline in the efference copy match (near syllable 16,000), since the HVc right-arrow RA connection races ahead to the final solution. The efference copy makes a final recovery, and the model produces a stereotyped sequence of song syllables.



Fig. 10. Summary of developmental time course. A: 3 basic stages of development. The initial stage of efference copy learning is nearly complete by syllable 1000 (stage 1, dotted line). As accurate efference copy signals are passed on to the AFP, a sensory right-arrow motor mapping is learned in the connections from the AFP right-arrow RA (stage 1a, dashed-dotted line; see Fig. 5). Accurate efference copy signals also allow the onset of syllable learning (stage 2). The development of motor activity matched to the tutor song (thin solid line) mirrors the development of appropriate connectivity intrinsic to RA (thin dashed line). Because sequence learning (stage 3) is driven by correct transitions guided by the AFP (Fig. 4), correct sequence activity (thick solid line) occurs before the development of the appropriate composite mapping (HVc_AFP right-arrow HVc_RA right-arrow RA) in the motor pathway (thick dashed line). Note that reorganization in the HVc_RA right-arrow RA pathway that underlies sequence learning disrupts the efference copy match during syllables 8,000-17,000. The correlation coefficients computed are defined in the RESULTS. B: involvement of connections in the different stages of learning shown in A.

Range of model behavior

By presenting results from a single representative simulation, we have demonstrated the plausibility of our core hypothesis that associational learning, distributed widely throughout the song system, is sufficient for sensorimotor matching to a previously memorized template stored in the AFP. Because each stage of the learning is dependent on previously developed associations, a complete assessment of the reaction of our model to changes in model parameters is beyond the scope of this paper (see Troyer and Doupe 2000 for some important manipulations).

Overall, sequence learning was significantly less robust than syllable learning, since it results from continual interplay between the changes in the HVc to RA projection and the efference copy mapping in HVc. The robustness of model behavior at the default set of parameters was assessed by running 10 simulations, each with different random seeds determining the initial pattern of synaptic connectivity and the sequence of premotor drives. All simulations eventually learned the tutor song perfectly. Nine of these simulations followed a similar time course, completing sequence learning near syllable 17,000 (Fig. 11A). However, in one of the simulations, correct learning took significantly longer and was not complete until syllable 25,000 (Fig. 11B). Examination of the output of this simulation reveals that during the period between syllable 15,000 and 20,000 when the other simulations were stringing together series of transitions to match the tutor song, this simulation began to repeat the subsequence A-D, omitting syllable E (Fig. 11C). Since the strong homeostatic mechanisms in the model prevent any RA assemblies from becoming permanently inactive, the model compromised, occasionally inserting a strong version of syllable E in place of syllable D. However, by syllable 23,000, the model began to insert syllable E in its proper place in the sequence, but sometimes syllable E was repeated and sometimes syllable A was dropped. By syllable 25,000, the model had converged on the correct sequence. Personal observation of many simulations revealed that such temporary "compromise" solutions to the competing requirements of associational change in the HVc_RA right-arrow RA projection and the maintenance of an accurate efference copy mapping within HVc were not uncommon.



Fig. 11. Variability of learning time course. A: Development of syllable-related and sequence-related activity for 9 of 10 repeated simulations. Parameters were fixed at their default values and simulations were run using different random seeds to determine the initial connectivity and the sequence of premotor drives. Output was quantified as in Fig. 10A. B: convergence in 1 of the 10 simulations was not complete until syllable 25,000 (solid lines). The average time course of the 9 simulations shown in A is plotted for comparison (dotted lines). C: model output for simulation plotted in B. The model first converged on a suboptimal solution by repeating syllables A-D and occasionally substituting syllable E for D. Due to homeostatic mechanisms that act to keep average activity in all assemblies constant, syllable E had large activity (black rectangles). By syllable 23,000, E was inserted in the proper position but was often repeated 2-3 times. Syllable A was sometimes dropped. Repetitions eventually ceased and by syllable 25,000 the model produced the proper sequence.

To further assess the range of model behavior, we increased the number of syllables to eight, thereby increasing the range of possible sequence transitions. The number of vocal features in each syllable was reduced to five, so that the simulations contained the same number of RA assemblies as before (8 × 5 = 40). AFP circuitry was adjusted for the different template, and AFP right-arrow RA learning was slightly adjusted to ensure that an accurate sensory right-arrow motor mapping was learned (see APPENDIX). To push the model to make mistakes, all learning rate parameters were increased by a factor of 5. No other parameters were readjusted. The range of RA output for a set of 10 simulations is shown in Fig. 12. Perfect learning occurred in six of the ten simulations. An example is shown in Fig. 12A. In one simulation, the model produced a stereotyped sequence of eight syllables, but this motif consisted of two "chunks" of appropriately copied song, separated by a string of three syllables sung in reverse order (Fig. 12B). In the three other simulations, the full sequence was broken into two repeated subsequences (Fig. 12, C-E). These were sung in alternation, with the rate of alternation controlled by the interaction between associational learning and homeostatic mechanisms that prevent the elimination of either subsequence. In versions of the model with weaker homeostatic mechanisms, syllables outside of the most commonly sung subsequence were simply dropped (not shown).



Fig. 12. Imperfect sequence learning. A-E: outcomes of 10 simulations with increased numbers of syllables. Perfect learning (A) occurred in 6 simulations. In one simulation (B), a full sequence of 8 syllables was produced, but the sequence was broken into three "chunks" of syllables. In 2 of these chunks (syllables A-C and G-H), syllables were sung in the proper order. In the 3 other simulations (C-E), the sequence was broken into two subsequences, with subsequences sung in alternation. Transition times between subsequences are determined by the interaction of learning with slow homeostatic mechanisms.


    DISCUSSION

Principal findings and predictions

By constructing a computational model, we have demonstrated that simple rules of associational plasticity, operating throughout the song system, are sufficient to support sensorimotor learning at multiple levels of the temporal hierarchy for song. Learning proceeds in a series of stages, with efference copy learning followed by syllable learning and then sequence learning. These developmental stages are not predetermined by our learning rule, but follow a cascade of interrelated associations that are guided by template matching signals from the AFP.

In this paper, we focused on the problem of learning song sequence. We propose that sequence generation results from a reciprocal sensory-motor interaction between the two populations of HVc projection neurons: the motor component is encoded primarily in RA-projecting HVc neurons, whereas the sensory component is encoded primarily in AFP-projecting neurons (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993). This mechanism predicts that the participation of neurons in both populations is required for normal sequence generation. We also predict that the slow "context" signals linking one syllable to the next flow primarily from AFP-projecting to RA-projecting neurons. While we have not explored possible neural substrates for this functionally slow connection, Kubota and Taniguchi (1998) have reported that RA-projecting neurons possess an ionic current that delays the initiation of action potentials.

The absence of a direct projection from the AFP to nuclei upstream of RA, the likely site of sequence generation (Vu et al. 1994), poses a significant challenge to the hypothesis that the AFP guides learning of song sequence. One strategy for overcoming this challenge is for the AFP to guide learning within the connections from HVc to RA, so that the outputs from the pattern generator are mapped onto the appropriate sequence of syllable representations in RA (Doya and Sejnowski 1998). Viewed in isolation, this hypothesis predicts the existence of an autonomous pattern generator that is unaffected by outputs from the AFP. In our model, however, a motor