A key discovery that has emerged from studies of the vocal system in songbirds is that neurons in these regions respond preferentially to playback of the bird's own song (BOS). This BOS selectivity is not a general property of neurons in primary and secondary auditory forebrain regions, field L and caudolateral mesopallium (CLM). Moreover, anatomical studies have been unable to conclusively define a direct projection from field L and/or CLM to HVC, a central structure for integrating sensory and motor information in the vocal system. To examine the communication between these regions, we used simultaneous dual-electrode recording in anesthetized male zebra finches and cross-correlation analysis to estimate the functional connectivity between auditory areas, field L and CLM, and HVC. We found that ≥18% of neurons in field L and 33% of neurons in CLM are functionally connected to HVC, most with auditory forebrain leading-HVC latencies ranging from 0.5 to 15 ms. These results indicate that field L and CLM communicate extensively with HVC through both direct and indirect anatomical connections. To further explore the role of the auditory forebrain cells that are functionally connected with HVC, we assessed their responsiveness and selectivity for a variety of natural and synthetic auditory stimuli. We found that field L and CLM neurons that are functionally connected to HVC exhibit generic auditory forebrain properties including the lack of BOS selectivity. This finding puts further constraints on the neural architecture and the nature of the nonlinearity that leads to BOS-selective auditory responses in the vocal control nuclei.
Songbirds learn to produce complex vocalizations in a manner that is reminiscent of some aspects of speech learning in humans. Song learning is a two-stage process in which a juvenile songbird listens to and memorizes the song of a tutor bird (sensory phase), begins practicing his own vocalizations, and then uses auditory feedback to adjust his copy of the tutor's song until he is able to produce a stable song that is similar but not identical to the tutor's song (sensory-motor phase; Marler 1981). The system of interconnected brain nuclei specialized for the learning and production of song has been well characterized over the past several decades (Brenowitz et al. 1997; Nottebohm et al. 1976). This “song system” is composed of two pathways: the motor pathway, essential for song production, and the anterior forebrain pathway, important for learning and maintaining song (Fig. 1). HVC, a song system structure, is situated at the junction of both pathways and functions in both sensory and motor processing. Electrophysiological recording studies have found that in anesthetized, sedated, and sleeping birds, HVC neurons respond selectively to playback of the bird's own song (BOS) over other auditory stimuli (sedated: Cardin and Schmidt 2003; anesthetized: Margoliash 1983; sleeping: Nick and Konishi 2001). This selectivity emerges during vocal development (Doupe 1993; Solis and Doupe 1995; Volman 1993) and is also found in HVC's efferent targets (Doupe and Konishi 1991). In contrast, during wakefulness, the responses of HVC neurons to passive presentation of sounds diminish, become more variable, and are less selective (Cardin and Schmidt 2003, 2004; Rauske et al. 2003). At the same time, in awake birds, HVC neurons exhibit robust premotor activity during active singing (McCasland and Konishi 1981; Nick and Konishi 2001; Rauske et al. 2003; Yu and Margoliash 1996).
These findings suggest that HVC may play a key role in modulating the bird's motor activity based on auditory feedback during the sensory-motor phase of song learning. To understand how this high degree of selectivity arises and what its role could be in shaping song learning behavior, recent studies have focused on areas presynaptic to HVC. Such studies have centered on two main lines of inquiry: determining the primary source of auditory input to HVC, and assessing the level of stimulus specificity found in that auditory area.
The field L complex is a large auditory region afferent to HVC (Kelley and Nottebohm 1979). This avian analog of primary auditory cortex contains several subregions (L1, L2a, L2b, and L3) that can be distinguished based on cytoarchitecture and connectivity (see Fig. 1; Fortune and Margoliash 1992, 1995; Vates et al. 1996). Auditory nucleus ovoidalis in the thalamus projects to subregions L2a and L2b, both of which project to subregions L1 and L3. L1 and L3 make bidirectional connections with two secondary auditory areas in the pallium: nidopallium caudal medial (NCM) and caudolateral mesopallium (CLM) (Vates et al. 1996). NCM projects to CLM by way of caudal medial mesopallium (CMM). Ascending auditory information then passes from CLM to HVC through nucleus interfacialis (NIf) (Vates et al. 1996). To date, no direct connections between field L and HVC have been observed, but fibers of passage in the HVC shelf region have made such observations extremely difficult (Vates et al. 1996). Tentative observations indicate that sparse connections may exist between field L and HVC shelf, and between HVC shelf and HVC proper, but confirmation is still pending (Fortune and Margoliash 1995; Gurney 1981; Katz and Gurney 1981; Vates et al. 1996). Physiological studies of field L have found that whereas units in this region are responsive to complex auditory stimuli, such as conspecific song (Con), field L neurons are not selective for the BOS over other conspecific song (Amin et al. 2004; Bonke et al. 1979; Grace et al. 2003; Leppelsack and Vogt 1976; Lewicki and Arthur 1996). The four subregions of field L show little difference in stimulus specificity (Amin et al. 2004; Grace et al. 2003).
Simultaneous dual-electrode recording studies in NIf and HVC have found that NIf neurons respond to the BOS, the bird's own song played in reverse (Rev), and conspecific song (Con) stimuli, but show a stronger response to the BOS than to any other natural stimulus (Cardin and Schmidt 2004; Coleman and Mooney 2004; Janata and Margoliash 1999). This type of BOS selectivity is not as strong as that seen in HVC, where cells typically respond nearly exclusively to the BOS. Dual-electrode studies of NIf and HVC have also measured auditory neural activity in HVC before and after deactivation of NIf. The results show that deactivating NIf with γ-aminobutyric acid (GABA) or muscimol greatly reduces or completely abolishes auditory activity in HVC (Cardin and Schmidt 2004; Coleman and Mooney 2004). Such results point to NIf as the primary source of auditory input to HVC.
Although NIf's contribution to the selectivity seen in HVC is becoming clear, some questions remain regarding the extent to which field L and CLM also contribute. In particular, because connectivity with HVC was not explored in previous electrophysiological studies of field L and CLM, it is possible that there exists a subset of cells within these areas that are connected with HVC and display a preference for the BOS. This hypothesis was proposed by Amin et al. (2004) and is consistent with the idea that such cells could represent the putative sparse connections between field L, HVC shelf, and HVC discussed earlier, or a class of cells that have not yet been characterized anatomically. The present study sought to investigate these possibilities. Our goal was twofold. First, we set out to estimate the degree of functional connectivity between the field L complex and HVC, and between CLM and HVC, by measuring the cross-correlation between spike trains recorded simultaneously from pairs of cells or cell clusters in each of these areas. A pair of cells or cell clusters was considered functionally connected if there was a significant peak in the normalized cross-correlation (see methods). We eliminated the possibility of finding significant cross-correlations that resulted exclusively from stimulus-driven activity through a rigorous normalization procedure (see methods); we therefore use the term functional connectivity to refer to a pair of cells or cell clusters that are likely to be anatomically connected through one or a small number of synapses. Second, we sought to evaluate stimulus selectivity for the BOS in field L and/or CLM cells or cell clusters that were functionally connected to HVC.
The Animal Care and Use Committee at the University of California, Berkeley approved all animal procedures. Adult male zebra finches (Taeniopygia guttata) were raised in social/family contexts within a multifamily colony room at UC Berkeley. Song learning was assessed in a subset of the colony families and found to be normal (Amin et al. 2004). Adult males over 100 days of age were used for all experiments.
Two days before the acute physiological recording experiments, birds were anesthetized with intramuscular injections of 0.03–0.04 ml Equithesin (0.85 g of chloral hydrate, 0.21 g of pentobarbital, 0.42 g of MgSO4, 8.6 ml of propylene glycol, 2.2 ml of 100% ethanol, with a total volume of 20 ml H2O) and immobilized in a stereotax with ear bars and a beak holder. Small sections of the outer layer of skull were removed above the two areas we wished to explore in each experiment. In most experiments, the two areas that we targeted were HVC and field L/CLM and ink dots were made on the lower layer of skull at the precise coordinates for these regions (2.4 mm lateral from the dorsal bifurcation point of the midsagittal sinus for HVC and 1.2 mm lateral and 1.2 mm rostral from the dorsal bifurcation point of the midsagittal sinus for the field L complex and for CLM). For comparative purposes, we also targeted HVC and NIf in five separate experiments and in these instances, ink dots were placed at 1.7 mm lateral and 1 mm rostral from the dorsal bifurcation point of the midsagittal sinus for NIf, and at the HVC coordinates mentioned earlier. A metal post was then glued to the skull with dental cement. After completion of the surgical procedure and a period of monitored recovery, the bird was placed in its own recovery cage in the breeding colony until the experiment was to take place.
On the day of the experiment, the bird was anesthetized intramuscularly with three doses (25–35 μl) of 20% urethane administered at 0.5-h intervals. The bird was stabilized by affixing the metal head post to a stereotaxic device. The lower layer of skull and the dura were removed from the area surrounding the ink-marked locations. Tungsten extracellular electrodes of resistance 1–4 MΩ were lowered into each of the two exposed brain regions using a microdrive. The bird was then placed in a double-walled anechoic sound-attenuated chamber at a distance of approximately 20 cm from the speaker used to present all stimuli during recording sessions. The volume of the speaker was set to deliver zebra finch song at peak levels of 65 to 80 dB SPL (B&K Sound Level Meter, RMS weighting type B). Body temperature was monitored and adjusted to about 37°C using a heating pad and a thermometer placed under the bird's wing.
The stimulus ensemble consisted of both natural and synthetic sounds. The natural sounds were 1) the bird's own song (BOS), 2) the bird's own song played in reverse (Rev), 3) the bird's own song played in reverse syllable order (Revorder), and 4) conspecific song (Con). The synthetic stimuli were 1) random pure tones (Pips); 2) compound pure tones in which 20 samples from the random pure tones group were added together (Tones); and 3) broadband white noise (WN). All synthetic stimuli were exactly 2 s in length and all natural stimuli were chosen to be approximately 2 s in length. We used 20 different samples from each synthetic stimulus class. The overall peak power of the synthetic stimuli was matched to the overall peak power of the natural stimuli. Further details about the design of the synthetic stimuli can be found in Grace et al. (2003).
The BOS was recorded and digitized using TDT2 hardware (Tucker-Davis Technologies, Alachua, FL) with a custom-made graphical interface several days before the experiment. Song recordings were obtained by placing the bird in a sound-attenuated chamber equipped with a microphone for several hours until many song renditions were collected. If the bird did not sing over a 24-h period, a female was then introduced into the cage with the male for anywhere from a few minutes to 1 h to encourage song production. If a bird still did not sing, he was excluded from the study. Thereafter, the various song renditions were played and viewed in spectrographic form. The song containing introductory notes from the two to three most frequently sung motifs, and lasting approximately 2 s, was chosen as the playback stimulus. Undirected song was used in the majority of subjects, but directed song was also used in some experiments. The conspecific songs consisted of a set of 20 unrelated undirected adult male zebra finch songs. It has been shown that such a sample is a good representation of the spectral and temporal patterns occurring in zebra finch song (Singh and Theunissen 2003).
Experimental protocol and extracellular recording
Two different experimental procedures were implemented in this study. In procedure A, we ran a search stimulus set followed by a cross-correlation stimulus set. In procedure B, we ran a combined search–cross-correlation stimulus set and followed this, in a small number of cases, with a selectivity protocol (see following text). We first describe the details of procedure A. In procedure A, the search stimulus set was used to probe for auditory units or unit clusters (two to five units) in each area. The cross-correlation stimulus set was played after the search protocol and was used to collect data for later cross-correlation analysis. All stimuli in this procedure were interleaved and randomly presented with an interstimulus interval of 7 to 8 s. Two seconds of spontaneous activity were recorded before each stimulus to establish a baseline firing rate for the neuron or neuron cluster. In addition, 2 to 3 s of spontaneous activity were recorded after the stimulus and 1 to 3 s (uniform random distribution) was added between stimuli. Ten to 20 trials of two to four of the following stimuli were used as search stimuli: the BOS, the Rev, Revorder, Con, and WN. If the neuron or neuron cluster in one of the two regions was found to show either an increase or a decrease in spiking activity to either the BOS or WN compared with its spontaneous activity as determined by an on-line t-test, then the cross-correlation stimulus set was presented. In certain cases, we opted to present the cross-correlation stimulus set even though the on-line t-test was not significant. Visual inspection of the responses in these instances suggested a change in the spike patterning even though the mean rate was the same as the spontaneous rate. The cross-correlation stimulus set consisted of 50 trials of the BOS and 50 trials of a 2-s-silence stimulus interleaved and randomly presented with an interstimulus interval of 7 to 8 s. 
The silence stimulus gave us the opportunity to collect additional spontaneous activity necessary for calculating the cross-correlation (see Data analysis). On completion of the cross-correlation stimulus set, the cross-correlation during stimulus presentation and the cross-correlation during spontaneous activity were analyzed separately, off-line.
After analyzing the data collected from the cross-correlation data sets in procedure A, we found that we had obtained more than enough spikes to reliably calculate cross-correlations at each recording site. Consequently, to maximize our sample size and to minimize adaptation to any one particular stimulus in HVC, we changed the experimental protocol as follows. We consolidated the search stimulus set and the cross-correlation stimulus set, changed the number of trials, and increased the duration of the interstimulus interval. These changes allowed us to run experiments more efficiently and to maximize the number of field L–HVC and CLM–HVC paired sites recorded during each individual experiment. It also increased our yield of cells in HVC that were selective for the BOS. In this procedure (procedure B), 15 trials of the BOS, Rev, and WN were interleaved and randomly presented with an interstimulus interval of 8 to 9 s. Two seconds of spontaneous activity were recorded before each stimulus, 4 to 5 s of silence were tagged on after the stimulus, and an additional 1 to 3 s (uniform random distribution) was added between stimuli. This interstimulus interval change served two main functions. We have observed and it has been reported elsewhere (Margoliash et al. 1994; Sutter and Margoliash 1994) that HVC neurons tend to integrate over long periods of time; therefore increasing the interstimulus interval provided time for each HVC cell or cell cluster to complete its response to one stimulus before being presented with another. Additionally, this interval change permitted us to eliminate the use of silence as a stimulus because we were able to obtain enough spikes to calculate the cross-correlation using the spontaneous activity obtained during the 2 s preceding stimulus presentation. On completion of the 15 trials, we assessed the cells for auditory activity using an on-line t-test. 
If one of the units or unit clusters in each of the two areas was considered auditory, then the data were analyzed on-line for cross-correlation.
For a subset of the paired sites, we played another set of stimuli—the selectivity protocol—following the search–cross-correlation stimulus sets, to assess the selectivity of the auditory forebrain cells. In this stimulus set, 10 trials of the following were played: the BOS; the Rev; Revorder; three examples each of Con, Pips, and Tones; and two examples of WN. As in procedure A, these stimuli were interleaved and randomly presented with an interstimulus interval of 7 to 8 s. We opted to use a different interstimulus interval for the search–cross-correlation stimulus set and the selectivity stimulus set because each stimulus set focused on a different region. In particular, the search–cross-correlation stimulus set was used mainly to target HVC cells, whereas the selectivity set was used to target the auditory forebrain areas field L and CLM. As a result of their shorter integration times, auditory forebrain neurons can be presented with stimuli more rapidly than HVC neurons. This reduces the amount of time spent at each recording site and increases the data yield per experiment.
Window discriminators were used to obtain the arrival times for spikes recorded simultaneously from each of the two electrodes. Using a digital oscilloscope with memory and average functions, we estimated, based on visual inspection of spike shapes, whether our recordings were from single units, small multiunit clusters (two to five units), or an unclassifiable number of units in each area. Neural activity was systematically sampled by moving the electrode through 50-micron (in HVC or NIf) or 100-micron (in field L or CLM) interval depths until one or a small cluster of units was isolated in each area. We collected cross-correlation data from a single unit or a multiunit cluster in field L or CLM only once before moving to the next unit or set of units 100 microns away. However, for HVC, we often collected data from a single unit or a multiunit cluster for several hours at a time while we continued to move the other electrode through CLM or field L. We adopted this strategy for practical reasons because, given the relatively small depth of HVC, a systematic search in that nucleus would require a series of electrode penetrations that would quickly lead to tissue damage. Our estimates of connectivity are therefore estimates of connectivity between units in the auditory system and one particular region in HVC. Given the range and complexity of responses found in HVC (Leonardo and Fee 2005; Mooney et al. 2002), the actual percentage of functionally connected neurons between the two areas will therefore be higher than what can be assessed with this approach. On the other hand, extracellular recordings in HVC in anesthetized birds have also revealed a high degree of interconnectivity (Margoliash et al. 1994; Sutter and Margoliash 1994), justifying both the feasibility of the approach and the validity of the lower-bound estimate. Typically between one and six electrode penetrations in each area were made per recording day. 
At the end of each recording pass, two electrolytic lesions (100 μA for 5 s) spaced 200–400 microns apart and placed well beyond the last recording site were made to aid in the later reconstruction of the recording sites in each area.
Histology and anatomical reconstructions
At the end of each recording experiment, the bird was killed with 0.06 ml Equithesin and then transcardially perfused with 0.9% saline followed by 3.7% formalin in 0.25 M phosphate buffer. After perfusion, the brain was postfixed in 3.7% formalin overnight or for several days. Forty- to 50-micron parasagittal sections were prepared using a freezing microtome. Alternate sections were stained with cresyl violet and silver stain to aid in the visualization process. The borders of the relevant regions of the auditory forebrain and HVC, electrode tracks, and lesion sites in each area were viewed at 10× magnification through a dual-frequency interferometric confocal microscope (DICM) and drawn using a drawing tube (courtesy of J. Winer, University of California, Berkeley). Unless stated otherwise, all recordings reported here were determined to have taken place in HVC and either field L, CLM, or NIf on inspection of the histology.
ANALYSIS FOR RESPONSIVENESS, EXCITATION AND INHIBITION, AND SELECTIVITY.
The analysis used for determining the responsiveness of a particular recording site and classifying it as either stimulus-excited or stimulus-inhibited has been described in a previous study (Amin et al. 2004). Briefly, recording sites in each area were considered responsive to auditory stimuli if the firing rate to the BOS or WN was significantly different from the spontaneous firing rate (P < 0.05, two-tailed paired t-test). To classify each site as stimulus excited or stimulus inhibited, we then calculated the Z scores for all responsive sites. The Z score represents the normalized difference between the stimulus-driven mean firing rate and the baseline spontaneous firing rate collected in the 2 s preceding stimulus presentation. We averaged all responses to a particular stimulus class (e.g., all three exemplars of Pips) to calculate the Z score. If a recording site's Z score to the BOS was >0, then the site was considered stimulus excited. Conversely, if a recording site's Z score to the BOS was <0, then the site was considered stimulus inhibited.
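The Z-score classification above can be illustrated with a short sketch. The text describes the Z score only as a normalized difference between stimulus-driven and baseline rates, so the normalizer used here (the standard deviation of the per-trial difference) is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def z_score(stim_rates, base_rates):
    """Normalized difference between stimulus-driven and baseline firing rates.

    stim_rates[i] and base_rates[i] are the rates (spikes/s) on trial i.
    Assumption: the difference is normalized by the standard deviation of
    the per-trial differences; the text does not spell out the normalizer.
    """
    diffs = [s - b for s, b in zip(stim_rates, base_rates)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("per-trial differences have zero variance")
    return mean(diffs) / sd


def classify(z):
    """Label a responsive site by the sign of its Z score to the BOS."""
    if z > 0:
        return "stimulus excited"
    if z < 0:
        return "stimulus inhibited"
    return "unclassified"


# Illustrative (made-up) rates for four trials of one stimulus.
z = z_score([12.0, 15.0, 11.0, 14.0], [5.0, 6.0, 5.0, 4.0])
print(classify(z))
```

Only sites that already passed the responsiveness test would be classified this way; the sign of the Z score then separates excited from inhibited sites.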
Finally, the psychophysical d′ measure was used to quantify the selectivity of each recording site for one stimulus class over the other. This measure has been used previously to quantify neural selectivity in the avian brain (Amin et al. 2004; Janata and Margoliash 1999; Solis and Doupe 1997; Theunissen and Doupe 1998). The d′ measure for preference between two stimuli, A and B, is calculated as follows

$$d'_{A-B} = \frac{2(\mu_A - \mu_B)}{\sqrt{\sigma_A^2 + \sigma_B^2}} \tag{1}$$

where μA and μB are the mean responses to stimulus A and stimulus B, respectively, and σA² and σB² are the variances of the responses. A d′ value was calculated for all pairwise comparisons before averaging a unit's response to one particular stimulus comparison. For example, we obtained responses to one exemplar of the BOS and three exemplars of Con, yielding three d′ values for the BOS–Con comparison. To obtain one final d′ value for all BOS–Con comparisons, we averaged the three d′ values originally calculated for that particular unit. A group of cells in our study was considered selective for the BOS if the average d′ to the BOS was significantly >0 (P < 0.05, one-tailed paired t-test). A criterion of d′ >0.5 has sometimes been used to classify single units as selective (Solis and Doupe 1997). However, when a mean d′ is calculated for a small number of single units that are considered selective on their own, it is possible that the statistics will show that the mean d′ of the group is not significantly different from zero. The statistics reported here are with respect to the subset of cells that are functionally connected with HVC and not the individual units within the group.
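As an illustration of the d′ averaging described above, the following sketch assumes the commonly used form d′ = 2(μA − μB)/√(σA² + σB²) and computes one BOS–Con value per conspecific exemplar before averaging. The function names and the response values are hypothetical.

```python
from statistics import mean, variance


def d_prime(resp_a, resp_b):
    """d' between per-trial responses to stimulus A and stimulus B.

    Assumed form: 2 * (mean_A - mean_B) / sqrt(var_A + var_B).
    """
    denom = (variance(resp_a) + variance(resp_b)) ** 0.5
    if denom == 0:
        raise ValueError("responses have zero variance")
    return 2 * (mean(resp_a) - mean(resp_b)) / denom


def mean_d_prime(bos_resp, con_exemplars):
    """Average the pairwise d' values, one per conspecific exemplar."""
    return mean([d_prime(bos_resp, con) for con in con_exemplars])


# Made-up responses (spikes/s): one BOS exemplar, three Con exemplars.
bos = [10.0, 12.0, 14.0]
cons = [[4.0, 6.0, 8.0], [9.0, 12.0, 15.0], [11.0, 13.0, 12.0]]
print(mean_d_prime(bos, cons))
```

A positive average indicates a preference for the BOS; whether a group of sites is selective is then decided by the t-test on these per-site averages.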
Data collected from the presentation of each cross-correlation stimulus set were analyzed for synchronized activity by calculating the coherency (Rosenberg et al. 1989), which is based on the cross-covariance function (Aertsen et al. 1989; Perkel et al. 1967), normalized by the product of the autocovariance. The cross-correlation of a spike train rB(t) relative to a second spike train rA(t) as a function of τ [time delay relative to spikes in rA(t); we examined τ values of ≤100 ms] is given by

$$\mathrm{CC}_{A-B}(\tau) = \frac{1}{T}\left\langle \int_0^T r_A(t)\, r_B(t+\tau)\, dt \right\rangle \tag{2}$$

where T is the duration of the signal being analyzed and 〈 〉 indicates that the measure is averaged across all trials. A schematic representation of this calculation can be seen in Fig. 1B.
The cross-covariance corrects for mean firing rates in each neuron, effectively measuring how deviations in firing rate from the expected mean in one recording site are correlated with deviations in firing rate from the expected mean in another recording site. The cross-covariance between neurons A and B is given by

$$C_{A-B}(\tau) = \frac{1}{T}\left\langle \int_0^T r_A(t)\, r_B(t+\tau)\, dt \right\rangle - \frac{1}{T}\int_0^T \bar{r}_A(t)\, \bar{r}_B(t+\tau)\, dt \tag{3}$$

where r̄A(t) and r̄B(t) are the time-varying mean firing rates of the neurons. Both the cross-correlation and the cross-covariance are in units of (spikes/s)², and their absolute values depend on the firing rates of each cell (in the case of the cross-covariance, the mean firing rates). To obtain a normalized measure, the cross-covariance (or the cross-correlation) can be divided by the variance in the firing rates of each cell, effectively obtaining a cross-correlation coefficient measure. The cross-correlation coefficient is given by

$$\rho_{A-B}(\tau) = \frac{C_{A-B}(\tau)}{\sqrt{\sigma_A^2\, \sigma_B^2}} \tag{4}$$

where

$$\sigma_A^2 = \frac{1}{T}\left\langle \int_0^T \left[r_A(t) - \bar{r}_A(t)\right]^2 dt \right\rangle$$

and similarly for σB². This cross-correlation coefficient represents the probability of firing in one cell (the “target” neuron) relative to the firing in the “reference” cell, and varies between −1 and 1, with 1 reflecting perfect correlation and −1, anticorrelation. A cross-correlation coefficient of zero indicates zero linear correlation between the two trains of spikes.
When using cross-correlations to assess functional connectivity, it is critical to correct for correlated firing that results simply from direct stimulus effects causing correlated fluctuations in time-varying mean firing rates (e.g., neurons in two entirely unconnected brain areas might show a correlation if they both fired to BOS). The cross-covariance corrects for these fluctuations because it measures only how trial-to-trial deviations from the time-varying mean rates of each cell are correlated with each other. The cross-covariance can be estimated by calculating the shuffle-corrected cross-correlogram. We calculated the shuffle corrector by correlating the response from A during the ith trial (of N total trials) with the response from B during the (i + 1)th trial. For i = N, i + 1 is set to be 1. We also calculated the average of all permutations of the shuffle corrector and found that the resulting distribution of coherency peaks, quantified by their time delays, widths, and average strengths, was very similar to what we observed when we used only one shuffle permutation. We therefore used the single permutation of the shuffle corrector for the data here. This shuffle corrector is an estimate of how the mean time-varying rate in neuron A covaries with the mean time-varying rate in neuron B, across trials. In other words, it estimates the second term on the right side of Eq. 3.
In practice, the integrals in Eqs. 2 and 3 are estimated by summing over small time bin windows, dt. In our study we used dt values of 10 and 5 ms. The results observed for dt values of 10 ms were similar to those seen for dt values of 5 ms; thus we report only the results for dt values of 10 ms. Smaller dt values yield cross-covariance curves with higher resolution but require more data. Given the bin window dt, the number of trials N, and Tn, the length of the signal in integer units of dt, the shuffle-corrected cross-correlogram is

$$C^{\mathrm{SC}}_{A-B}(k) = \frac{1}{N\, T_n\, dt^2} \sum_{i=1}^{N} \sum_{j=1}^{T_n} \left[ r_A^i(j)\, r_B^i(j+k) - r_A^i(j)\, r_B^{i+1}(j+k) \right] \tag{5}$$

where rAi(j) is the number of spikes recorded from neuron A during trial i in the j th time bin and, similarly, rBi(j + k) in the (j + k) th time bin for neuron B. The shuffle-corrected cross-correlogram can then be normalized by the variance of spike firing rates as shown in Eq. 4, to provide a measure between −1 and 1.
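The binned shuffle correction can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name is hypothetical, bins that fall outside the trial at a given lag are simply dropped, and the result is scaled to (spikes/s)² by dividing by N·Tn·dt².

```python
def shuffle_corrected_xcorr(trials_a, trials_b, max_lag, dt):
    """Shuffle-corrected cross-correlogram from binned spike counts.

    trials_a[i][j] is the spike count of neuron A in trial i, bin j
    (likewise trials_b for neuron B). The shuffle corrector pairs trial i
    of A with trial i + 1 of B, wrapping the last trial around to the first.
    Returns a dict mapping lag k (in bins) to the corrected value.
    Assumption: bins with j + k outside the trial are ignored.
    """
    n_trials = len(trials_a)
    n_bins = len(trials_a[0])
    norm = n_trials * n_bins * dt * dt
    corrected = {}
    for k in range(-max_lag, max_lag + 1):
        raw = 0.0
        shuffled = 0.0
        for i in range(n_trials):
            a = trials_a[i]
            b_same = trials_b[i]
            b_next = trials_b[(i + 1) % n_trials]
            for j in range(n_bins):
                if 0 <= j + k < n_bins:
                    raw += a[j] * b_same[j + k]
                    shuffled += a[j] * b_next[j + k]
        corrected[k] = (raw - shuffled) / norm
    return corrected


# Made-up example: B fires one bin after A, at a time that varies by trial.
a = [[1, 0, 0, 0], [0, 0, 1, 0]]
b = [[0, 1, 0, 0], [0, 0, 0, 1]]
print(shuffle_corrected_xcorr(a, b, max_lag=1, dt=1.0))
```

When both neurons respond identically on every trial (purely stimulus-locked firing), the raw and shuffled terms cancel and the corrected correlogram is zero at all lags, which is the behavior the correction is meant to produce.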
Another possible source of cross-covariance between two neurons that does not reflect true neuronal interaction between these cells is the temporal structure of firing within each response. For instance, assume a spike in neuron A triggers a spike in neuron B; however, neuron A is a bursting neuron and has a high probability of firing again after it has fired once. Thus the second spike in A's burst will also be correlated to the spike in B, although it was actually triggered by the first spike in A. To correct for this type of correlation, we calculated the coherency function (Rosenberg et al. 1989). The coherency function extends the normalization by replacing the variance in the denominator of Eq. 4 by the autocovariance function of each of the two spike trains. This additional normalization takes into account bursting or other temporally structured behavior in either neuron A or B (or both) that would otherwise result in additional, or artificially large and wide peaks in the cross-covariance function. In practice the coherency is calculated in the frequency domain. The coherency is given by

$$\gamma_{A-B}(\omega) = \frac{C_{A-B}(\omega)}{\sqrt{C_{A-A}(\omega)\, C_{B-B}(\omega)}} \tag{6}$$

where CA−B(ω) is the Fourier transform of the cross-covariance between the responses from A and B, and CA−A(ω) and CB−B(ω) are the Fourier transforms of the autocovariance of activity from neurons A and B, respectively. For plotting purposes, the coherency in the time domain is then calculated by taking the inverse Fourier transform of Eq. 6.
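The frequency-domain normalization can be sketched as follows. This is a minimal illustration that assumes the coherency is the cross-covariance spectrum divided by the square root of the product of the two autocovariance spectra. It uses a naive discrete Fourier transform from the standard library to stay self-contained (an FFT routine would normally be used), and it assumes the covariance functions are supplied in circular lag order (0, 1, ..., K, −K, ..., −1). All names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short lag windows."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def coherency(cross_cov, auto_a, auto_b, eps=1e-12):
    """Time-domain coherency from cross- and autocovariance functions.

    Each spectrum of the cross-covariance is divided by the square root of
    the product of the two autocovariance spectra, then transformed back.
    Frequencies with (near-)zero power are set to zero to avoid division
    by zero; that guard is a choice made for this sketch.
    """
    c_ab, c_aa, c_bb = dft(cross_cov), dft(auto_a), dft(auto_b)
    gamma = []
    for ab, aa, bb in zip(c_ab, c_aa, c_bb):
        power = abs(aa) * abs(bb)
        gamma.append(ab / power ** 0.5 if power > eps else 0j)
    return [g.real for g in idft(gamma)]


# A bursty autocovariance (side lobes at lags +1 and -1) widens the raw
# cross-covariance; the coherency collapses it back to a single peak at lag 0.
bursty = [2.0, 1.0, 0.0, 0.0, 1.0]
print([round(v, 6) for v in coherency(bursty, bursty, bursty)])
```

In the example, the raw cross-covariance has the same three-bin-wide shape as the autocovariance, yet the coherency is nonzero only at lag 0, showing how the extra normalization removes peak broadening caused by bursting.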
STRENGTH OF CORRELATED ACTIVITY.
The peak amplitude or the area underneath the peak of the cross-correlation function is often used to estimate the strength of the correlation (Abeles et al. 1993; Bair et al. 2001; Brecht et al. 1998). However, a better estimate of the degree of association is to calculate the average strength across all time delays within the peak. Because correlations at different time delays are not independent, this is a complicated calculation in the time domain but it is relatively simple in the frequency domain. To calculate this average for the coherency, one takes the root mean square of the average coherency square in the frequency domain for frequencies below the Nyquist limit given by dt (the time bin window). From Parseval's theorem, however, the mean square of the coherency can also be obtained in the time domain by integrating the square of the coherency over the time bins. To estimate the mean square coherency for each peak, the area under the square of the coherency for that peak was divided by the time bin dt. The area under the coherency squared was estimated from the amplitude square of the peak multiplied by 2.5 times the width of the peak (the factor 2.5 is required to estimate the area under a Gaussian curve). Thus the average coherency strength represented by a peak of amplitude a and width w is

$$\bar{\gamma}_{\mathrm{peak}} = \sqrt{\frac{2.5\, a^2\, w}{dt}}$$

The average coherency strength as a measure of the association between two time series is essentially equivalent to the correlation coefficient between two variables and indicates the degree of linear relationship between the variability of two firing rates. Like correlation coefficients, this measure is unitless. It should be noted that, in general, measures of correlation strength are strongly dependent on the size of the time bin, and this must be taken into account when comparing such values across different studies.
The coherency for each pair was calculated across trials separately for spontaneous activity and stimulus-evoked activity during the cross-correlation stimulus set.
Finally, for all the cross-correlation measures, the sampling error was estimated using the jackknife resampling technique (Thomson and Chave 1991). In brief, for experimental data based on N trials, one estimates N values of the cross-correlation measures, each based on N − 1 trials. The variance in the estimate is then obtained using Tukey's formula, in which ϑ̂i is the estimate of the cross-correlation measure with the ith trial deleted and ϑ̂All is the estimate obtained with all the trials. Paired sites were considered to be significantly correlated if peaks in the cross-coherency exceeded 3 SE.
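A minimal Python sketch of a delete-one jackknife is given here for orientation. It uses the common pseudo-value form of the jackknife variance, which is an assumption, since the exact expression used in the study is not reproduced in this text; the function names and the toy data are hypothetical.

```python
import math

def jackknife_se(trials, estimator):
    """Delete-one jackknife estimate and standard error (pseudo-value form).

    trials:    list of per-trial data
    estimator: function mapping a list of trials to a single number
    """
    n = len(trials)
    theta_all = estimator(trials)
    # Estimate with the i-th trial deleted, for every i.
    theta_deleted = [estimator(trials[:i] + trials[i + 1:]) for i in range(n)]
    # Pseudo-values combine the all-trial and delete-one estimates.
    pseudo = [n * theta_all - (n - 1) * t for t in theta_deleted]
    mean_pseudo = sum(pseudo) / n
    variance = sum((p - mean_pseudo) ** 2 for p in pseudo) / (n * (n - 1))
    return mean_pseudo, math.sqrt(variance)

def exceeds_criterion(peak_value, standard_error, n_se=3.0):
    """Significance rule stated in the text: the peak must exceed 3 SE."""
    return peak_value > n_se * standard_error

# Toy check with the sample mean as estimator (hypothetical numbers).
estimate, se = jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0], lambda xs: sum(xs) / len(xs))
```

For the sample mean, the jackknife standard error reduces to the usual standard error of the mean, which makes this a convenient sanity check before applying the same routine to a coherency peak.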
Our goal in this study was to investigate the possibility that a subset of neurons in two regions of the auditory forebrain, the field L complex and caudolateral mesopallium (CLM), is functionally connected to HVC and that the cells in this subset have different vocalization response properties than other neurons in field L and CLM. In particular, we hypothesized that functionally connected auditory forebrain cells would show a preference for the BOS over Con. Previous electrophysiological studies of field L have shown that although neurons in this region are responsive to complex auditory stimuli including Con, they do not respond preferentially to the BOS over Con (Amin et al. 2004; Bonke et al. 1979; Grace et al. 2003; Lewicki and Arthur 1996; Muller and Leppelsack 1985). We estimated the degree of functional connectivity between HVC and the field L complex or CLM, by measuring the coherency between spike trains recorded simultaneously from HVC and from one of two auditory forebrain areas, field L or CLM (Fig. 1). Based on the coherency peaks, we evaluated the directionality and strength of connections between the two areas. Although the latencies of the coherency peaks do allow for assessment of the directionality of connectivity between the paired sites in our study, they do not allow for conclusive interpretations regarding direct or indirect anatomical connectivity. For this reason, apart from directionality, we limit our discussion of the timing information provided by our analysis to a comparison between short and long latencies and a speculation about how these differences might relate to anatomical connectivity. For a subset of the paired sites that were found to be functionally connected, we also assessed the BOS selectivity of auditory forebrain cells.
Field L and HVC and CLM and HVC show functional connectivity during spontaneous activity
We recorded simultaneously from 338 paired sites of cells or cell clusters in HVC and a putative field L or CLM area in 45 anesthetized male zebra finches. To assess functional connectivity, we first calculated the cross-covariance and then the coherency of HVC and field L/CLM activity (see methods). We calculated the cross-covariance and the coherency during both spontaneous and stimulus-evoked activity in all paired sites of cells or cell clusters. For simplicity, throughout the paper, we will use the terms “functional connectivity,” “correlation,” “correlated,” and “functionally connected” to describe the results of our coherency analysis. Unless specified otherwise, it should be assumed that we are using these terms to refer only to the coherency peaks and not to the raw cross-correlation or nonnormalized cross-covariance peaks. As will be further elaborated in the discussion, we consider functionally connected cells to be cells that are probably connected through one or a small number of synapses.
Fifty-nine of the 338 paired sites (17.45%) were found to be significantly correlated during spontaneous activity. Histological reconstruction confirmed that 54% of the correlated paired sites (32/59) were field L–HVC paired sites, 32% (19/59) were CLM–HVC paired sites, and 2% (1/59) were NIf–HVC paired sites (see Table 1). The remaining 12% (7/59) were HVC–unknown area paired sites and were excluded from further analysis. We recorded from a total of 181 field L–HVC paired sites, 57 CLM–HVC paired sites, and 2 NIf–HVC paired sites. Seven paired sites were found to be on the border of CLM and L1 (CLM/L1–HVC paired sites), one was on the border of NIf and L2a (NIf/L2a–HVC paired sites), and the remaining 89 paired sites (HVC–unknown area paired sites) could not be fully characterized as a result of the loss of the brain sections belonging to several subjects in which histological reconstruction had not yet been completed.1 Due to the small number of NIf–HVC paired sites, and paired sites where one of the cells or cell clusters was on the border of two different regions, CLM/L1–HVC and NIf/L2a–HVC, we chose to exclude these paired sites from our analyses and instead focus on the CLM–HVC and field L–HVC paired sites we found.
Typical examples of field L–HVC and CLM–HVC coherency functions can be seen in Fig. 2. In these sites (Fig. 2, A and B), we found significant coherency peaks during spontaneous activity but not during stimulus-evoked activity. This effect was typical of our data set (46 of the 59 correlated paired sites, or 78%). Significant spontaneous activity coherency peaks were well fit by a Gaussian function (mean r2 = 0.87 ± 0.13, n = 59). Figure 2C shows an instance in which we found a significant coherency peak during both spontaneous and pooled-stimulus–evoked activity. This result will be subsequently discussed in more detail.
The correlation strengths of field L–HVC and CLM–HVC paired sites did not differ significantly (Wilcoxon signed-rank test, P = 0.61). Strengths ranged from 0.025 to 0.447 for field L–HVC paired sites and from 0.029 to 0.376 for CLM–HVC paired sites, with most strength values clustering around the lower end of the range (Fig. 3). The correlation latencies of field L–HVC and CLM–HVC paired sites had a very wide range (−22.98 to 29.60 ms in field L–HVC paired sites and −20.04 to 14.65 ms in CLM–HVC paired sites), but most of the peaks fell within a few milliseconds of zero. Peaks displaced positively in time occurred 76% of the time. Positive time delays are consistent with the anatomical evidence, suggesting that field L and CLM project either directly or indirectly to HVC (Vates et al. 1996). In 4 of 51 cases, coherency peaks with long negative latencies (more negative than −5 ms) were observed in both field L–HVC and CLM–HVC paired sites. Peaks with these features are consistent with longer feedback loops through the outer areas of HVC, RA, and nucleus ovoidalis (Mello et al. 1998; Vates et al. 1996).
The effect of unit type on correlated activity in field L and CLM
We recorded from both single units and multiunit clusters in the auditory forebrain and HVC (see Table 1 for a detailed breakdown). In HVC, most of the recordings (263/338 or 78%) consisted of multiunit data. In the auditory forebrain 187/338 (55%) were multiunit, 101/338 (30%) were single units, and 50/338 (15%) were unclassified. Units were considered unclassified if visual inspection of spike shapes using window discriminators did not allow for the experimenter to clearly discriminate between one or more spike waveforms.
The nature (single unit vs. multiunit) of the recording has the potential to affect the results as a consequence of the correlation among the units in the cluster recorded in the multiunit data. In the case of a single direct connection between the units recorded from the two sites, the correlation signal is attenuated by multiunit recordings and this attenuation is greater if the local units are independent (Gerstein 2000). On the other hand, if the units in the cluster are not independent and if multiple direct or indirect connections between units in the two recording sites exist, then the correlation signal relative to the noise can add up and facilitate detection. In the first scenario, multiunit recordings would lead to an underestimate of functional connectivity, whereas in the second scenario it would facilitate the detection of true positives (or reduce the chance of a Type II error). Our data suggest that we might be dealing with the second scenario because the percentage of connected neurons is higher for the multiunit recordings than for the single-unit recordings: for the 253 paired recordings where we had multiunit activity in HVC, 8/79 (10%) recordings showed a significant correlation when single-unit activity was obtained in the forebrain and 34/174 (20%) showed a significant correlation when multiunit activity was obtained in the forebrain. A chi-square test for independence indicates that this difference is significant at the 5% level (χ2 = 3.84, df = 1, P = 0.05). An estimate of functional connectivity based solely on single-unit recordings might therefore be an underestimate of the actual number of functional connections because of Type II errors. 
For the cases in which we detected significant cross-correlations, we found no significant differences between either the strengths or the latencies of field L–HVC [strengths: F(3,27) = 0.835, P = 0.486; latencies: F(3,27) = 0.915, P = 0.447], or CLM–HVC sites [strengths: F(2,14) = 1.05, P = 0.377; latencies: F(2,14) = 1.05, P = 0.576] whether recordings were from single–single, multi–single, multi–multi, or unclassified–unclassified recordings.
The effect of stimulus excitation and inhibition on correlated activity in field L and CLM
At each recording site, a search or a search–cross-correlation stimulus set was played to assess the responsiveness of each cell or cell cluster in the pair. Because we were interested in assessing 1) the overall amount of functional connectivity between HVC and auditory forebrain areas field L and CLM, irrespective of stimulus preference, and 2) the BOS selectivity of auditory forebrain cells or cell clusters that are functionally connected with HVC cells or cell clusters, we did not limit our recordings to cells or cell clusters that were responsive to our particular stimulus set. As a result, many recordings were from putative auditory forebrain–HVC stimulus-excited–unresponsive paired sites (EN, 44; see Table 2). The greatest number of paired sites involved stimulus-excited units in both the putative auditory forebrain areas field L or CLM and HVC (EE, 155). Histological analysis confirmed that there were 92 field L–HVC EE paired sites, 44 field L–HVC EN paired sites, 30 CLM–HVC EE paired sites, and 12 CLM–HVC EN paired sites. Seventeen percent (16/92) of the field L–HVC EE paired sites and 27% (12/44) of the EN paired sites showed correlated activity. Higher percentages of correlations were found in CLM–HVC paired sites, with 37% (11/30) of EE paired sites and 42% (5/12) of EN paired sites showing significant correlations. There were no significant differences between the correlation strengths found in EE paired sites and EN paired sites in either field L–HVC [t(26) = 0.210, P = 0.836] or CLM–HVC correlations [t(14) = 0.565, P = 0.581]. There was also no difference between the latencies of EE or EN paired sites for field L–HVC [t(26) = −0.5315, P = 0.600] or CLM–HVC cases [t(14) = −1.4292, P = 0.1749].
Differences in correlations across subregions of the field L complex
Field L is a large auditory region with several distinct subregions (L1, L2a, L2b, and L3). To fully characterize field L–HVC functional connectivity, we sampled each of the individual subregions of field L over the course of our recordings. The highest percentage of significant correlations during spontaneous activity was in L1, with 13 of 49 (27%) L1–HVC paired sites showing functional connectivity. L1–HVC paired sites had latencies that were fairly evenly spread from 0.4 to 9 ms with a median value of 3.7 ms and one outlier on each end of the distribution (−2.2 and 10.6 ms; see Fig. 4). These L1–HVC paired sites also had a median coherency strength of 0.12 with a distribution skewed toward much higher values. These strength values were generally greater than those seen in field L–HVC paired sites localized to other subregions of field L.
We recorded from 61 L2b–HVC paired sites and found only 9 (15%) significant correlations. L2b–HVC paired sites had correlation latencies that completely overlapped with the L1–HVC latency distribution (median = 1.4 ms, range = −1.7 to 5.6 ms), although the distribution was less evenly spread. The strengths of these correlations ranged from 0.03 to 0.29 (median = 0.06) and were generally lower than those seen in L1–HVC paired sites.
Four of 37 (11%) L3–HVC paired recordings demonstrated functional connectivity. These L3–HVC paired sites generally showed longer positive latencies than those seen in any of the other field L–HVC paired sites (range = 5.6–29.6 ms; median = 8.36 ms). The strengths of these L3–HVC paired sites were typical of most of the field L–HVC paired sites we found (excluding L1–HVC paired sites), ranging from 0.06 to 0.14 with a median of 0.10. The longer latencies seen in these L3–HVC paired sites suggest a multistage route of information travel from field L to HVC.
Next to L1, L2a had the highest percentage of sites showing significant correlations (3 of 18, or 17%). The strengths of all three correlations were typical of those found in field L subregions L2b and L3 (range = 0.04–0.16). Two of the three L2a correlated sites had long negative latencies (−16.88 and −22.98 ms), whereas the third site had a latency fairly close to zero (0.013 ms). Thus only 1/18 (∼6%) of L2a cells could be considered as providing input to HVC. The long negative latencies were unexpected and could represent a feedback route that travels from HVC through the outer regions of HVC (shelf), RA (cup), and Ov (shell) before reaching field L (L1, L2b, or L3) and then, finally, L2a (see Fig. 1).
Our histological review revealed a small number of field L–HVC paired sites on the border of L1 and L2a and on the border of L2a and L3. These cases showed that one of three (33%) L1/L2a–HVC paired sites and two of five (40%) L2a/L3–HVC paired sites were significantly correlated. Because these border field L–HVC paired sites could not be localized to any particular subregion of field L, they were excluded from statistical analysis of field L subregions. However, because these cells were certainly in the field L complex, they were included in statistical analysis that considered the entirety of the field L complex.
Based on putative anatomical connections between L1 and HVC, and L3 and HVC (Fortune and Margoliash 1995; Vates et al. 1996), we expected to find more paired sites with correlations in these areas than in any of the other subareas of field L. Despite a trend toward a greater number of correlations in L1–HVC paired sites, statistical analysis indicated that there was no significant difference in the number of correlations across field L subregions [χ2 (3, n = 165) = 4.23, P > 0.05 when all three significant correlations in L2a are taken into account; χ2 (3, n = 165) = 6.19, P > 0.05 when only the positive delay correlation in L2a is counted].
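The subregion comparison can be retraced with a small Python sketch of a chi-square test of independence, built from the counts given in the preceding paragraphs (correlated paired sites out of all paired sites in L1, L2b, L3, and L2a). The function name and the arrangement of the counts into a 4 × 2 table are assumptions made for illustration; the critical value 7.815 is the standard 5% threshold for 3 degrees of freedom.

```python
def chi_square_independence(correlated, totals):
    """Pearson chi-square statistic for a k x 2 table (correlated vs. not correlated)."""
    n_all = sum(totals)
    n_corr = sum(correlated)
    chi2 = 0.0
    for corr, total in zip(correlated, totals):
        expected_corr = total * n_corr / n_all
        expected_uncorr = total * (n_all - n_corr) / n_all
        chi2 += (corr - expected_corr) ** 2 / expected_corr
        chi2 += ((total - corr) - expected_uncorr) ** 2 / expected_uncorr
    return chi2

# Paired sites per subregion, in the order L1, L2b, L3, L2a (counts from the text).
totals = [49, 61, 37, 18]
chi2_all = chi_square_independence([13, 9, 4, 3], totals)       # all three L2a correlations
chi2_positive = chi_square_independence([13, 9, 4, 1], totals)  # positive-delay L2a correlation only

CRITICAL_DF3_P05 = 7.815  # chi-square critical value for df = 3 at the 5% level
significant_all = chi2_all > CRITICAL_DF3_P05
significant_positive = chi2_positive > CRITICAL_DF3_P05
```

With these counts the two statistics come out close to the reported values of 4.23 and 6.19, and both fall below the 5% critical value, in line with the conclusion that the subregions do not differ significantly.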
Field L–HVC and CLM–HVC functional connectivity changes between spontaneous and stimulus-evoked firing
Of the 32 field L–HVC paired sites with significant correlations during spontaneous activity, only 9 were significantly correlated during stimulus-evoked activity when the data were pooled across stimuli in each cross-correlation set. Likewise, of the 19 CLM–HVC paired sites with significant correlations during spontaneous activity, only 4 of them showed a significant correlation during pooled-stimulus–evoked activity. There were no instances in which a paired site showed a significant correlation peak during pooled-stimulus–evoked activity and not during spontaneous activity. In general, significant spontaneous activity coherency peaks and stimulus-evoked activity coherency peaks closely resembled each other with no significant differences between the strengths or the latencies of the coherency functions in the two categories for field L–HVC [strengths: t(39) = 0.0241, P = 0.9809; latencies: t(39) = −0.3896, P = 0.6989] or CLM–HVC paired sites [strengths: t(21) = −0.2486, P = 0.8061; latencies: t(21) = −1.0230, P = 0.3180].
Figure 2C shows one case in which a CLM–HVC paired site was significantly correlated during both spontaneous and pooled-stimulus–evoked activity. Paired sites that showed a significant correlation during both spontaneous and stimulus-evoked activity exhibited stronger peaks in their spontaneous activity coherency functions than those that showed a significant correlation only during spontaneous activity (Wilcoxon signed-rank test, P = 1.98 × 10−4). There was no significant difference between the latencies (Wilcoxon signed-rank test, P = 0.56) or the spreads of the spontaneous activity coherency functions in each of the two groups (Wilcoxon signed-rank test, P = 0.95). Paired sites that demonstrated functional connectivity during both spontaneous activity and stimulus-evoked activity had peaks in their spontaneous activity coherency functions that were better fit by Gaussian functions than those that showed functional connectivity during spontaneous activity alone (Wilcoxon signed-rank test, P = 0.0012).
To test for the effect of stimulus type on functional connectivity, correlations were calculated by stimulus type for all paired sites presented with more than one stimulus type that were found to be significantly correlated during pooled-stimulus–evoked activity. We found that correlations that were significant during pooled-stimulus–evoked activity were not necessarily significant during playback of each individual stimulus. Three of the six field L–HVC paired sites and two of the four CLM–HVC paired sites that showed a significant correlation during pooled-stimulus–evoked activity also demonstrated a significant correlation during playback of the BOS. One of the six field L–HVC paired sites did not display a significant correlation during the BOS, but displayed one during playback of the Rev. None of the paired sites that were significantly correlated during pooled-stimulus–evoked activity demonstrated functional connectivity during playback of white noise. Moreover, only one field L–HVC pair revealed a significant coherency peak during playback of the BOS and the Rev stimulus.
Although not all paired sites that were significantly correlated during pooled-stimulus–evoked activity had coherency peaks that reached significance during playback of individual stimuli, many of them still contained peaks from which the latencies and strengths of the correlations could be measured. The average difference between the coherency strength of the correlation found during each individual stimulus (the BOS, the Rev, and WN) and the pooled-stimulus–evoked activity for eight paired sites that were played the same stimuli across 15 trials is shown in Fig. 5. The average BOS–pooled stimulus coherency strength difference was significantly different from the average Rev–pooled stimulus coherency strength difference and the average WN–pooled stimulus coherency strength difference [repeated-measures ANOVA: F(2,7) = 6.40, P = 0.011]. This suggests that for the small number of paired sites that maintain functional connectivity while processing auditory stimuli, this connectivity is stronger during processing of the BOS than during the processing of each of the other stimuli.
Assessing stimulus selectivity in HVC and field L
To assess the responsiveness of each particular cell or cell cluster in a pair, we played a search or a search–cross-correlation stimulus set at each recording site. We classified a cell as either responsive or unresponsive based on the difference between its response to either the bird's own song or white noise and its spontaneous activity (see methods). In 90% of the histologically identified auditory forebrain–HVC paired sites (222/248), at least one member of the pair was classified as responsive (see Table 2). The remaining 10% were classified as unresponsive in both regions and were excluded from further analysis. Of the 222 responsive paired sites, 99 were responsive in both the auditory forebrain and HVC, 89 were responsive in the auditory forebrain and unresponsive in HVC, and 44 were responsive in HVC and unresponsive in the auditory forebrain. We attribute this unusually high number of unresponsive HVC units to the fact that in procedure A, our interstimulus interval was not optimized for HVC's temporal integration time (see methods). Once we modified our protocol to improve stimulus timing, we saw a decrease in the number of unresponsive HVC cells; 78 of 157 or about 50% of the HVC cells that were part of procedure A were classified as responsive, whereas 165 of 181 or 91% of the HVC cells that were part of procedure B were classified as responsive. In general, responsiveness was usually indicative of excitation to the BOS. In only 17 cases did the auditory forebrain unit or unit cluster show stimulus inhibition and in only 7 instances did the HVC cell show stimulus inhibition. For this reason, our discussion will focus on the selectivity analysis of stimulus-excited cells.
A particularly robust example of an EE pair of simultaneously recorded auditory forebrain–HVC cell clusters is shown in Fig. 6. Qualitatively speaking, the auditory forebrain cell cluster responds to several different natural stimuli with very consistent trial-to-trial spiking. This particular unit cluster was localized to CLM, but similar responses were observed in the various subregions of field L. In contrast, by visual inspection, one notes that HVC cells respond vigorously only to the bird's own song and show greater trial-to-trial variability in spike timing. As mentioned earlier, auditory forebrain–HVC EE and EN paired sites accounted for the majority (44) of significant correlations found during spontaneous activity. More specifically, 16 field L–HVC and 11 CLM–HVC EE paired sites were significantly correlated during spontaneous activity, whereas 12 field L–HVC and 5 CLM–HVC EN paired sites also showed a significant correlation (see Table 2). The remaining correlations were distributed among the three categories of responsiveness (stimulus-excited, stimulus-inhibited, and unresponsive) and were excluded from further analysis.
Stimulus selectivity in HVC
We used the search and search–cross-correlation stimulus sets to identify BOS-selective cells in HVC. After the completion of each experiment, stimulus-excited HVC cells were analyzed for stimulus selectivity by calculating the d′ of pairwise comparisons between the BOS and the Rev, and the BOS and WN. The d′ measures the normalized difference between responses to two stimuli in pairwise comparisons. Twenty-seven HVC cells or cell clusters were part of a significantly correlated (during spontaneous activity) pair that was played the BOS, Rev, and WN during the search or search–cross-correlation stimulus set.
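As a reminder of how such a normalized difference is typically computed, the following Python sketch uses a form of d′ that is common in the songbird literature: twice the difference of the mean responses divided by the square root of the summed response variances. This form is an assumption here, since the exact definition is given in the methods rather than in this section, and the function name and response values are hypothetical.

```python
import math
import statistics

def d_prime(responses_a, responses_b):
    """Normalized difference between responses to stimulus A and stimulus B.

    responses_a, responses_b: per-trial response strengths (e.g., firing rate
    relative to baseline) for the two stimuli. Positive values mean A is preferred.
    """
    mean_difference = statistics.mean(responses_a) - statistics.mean(responses_b)
    summed_variance = statistics.variance(responses_a) + statistics.variance(responses_b)
    return 2.0 * mean_difference / math.sqrt(summed_variance)

# Hypothetical per-trial responses (spikes/s above baseline) to the BOS and the Rev.
bos_responses = [10.0, 12.0, 14.0]
rev_responses = [4.0, 6.0, 8.0]
bos_rev_d_prime = d_prime(bos_responses, rev_responses)
```

Because the measure is antisymmetric, swapping the two stimuli only flips the sign, so a d′ near zero indicates no preference in either direction.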
We calculated the mean d′ for the BOS–Rev comparison in these 27 HVC units and found a d′ significantly greater than zero (d′ = 1.68 ± 0.31, P < 0.05). We calculated the mean d′ for the BOS–WN comparison in these 27 cells as well as a cell presented with BOS and WN, but not Rev, and similarly found a significantly positive mean d′ (BOS–WN d′ = 1.96 ± 0.36, P < 0.05), indicating that, on average, an HVC cell or cell cluster preferred the bird's own song over either the Rev or WN (Fig. 7). This BOS selectivity is similar to that reported in previous studies (for a summary, see Theunissen et al. 2004). There was no difference between the BOS selectivity found in HVC cells or cell clusters with significant auditory forebrain correlations and those in HVC cells or cell clusters that had the same stimuli played to them and were not significantly correlated during spontaneous activity [t(95) = 1.0014, P = 0.3192].
Stimulus selectivity for natural over synthetic stimuli in field L and CLM
One of the goals of this study was to assess the selectivity of field L cells that were functionally connected to HVC. To accomplish this, we presented a selectivity stimulus set containing natural and synthetic stimuli, similar to that presented in previous studies (Amin et al. 2004; Grace et al. 2003), to a subset of the simultaneously recorded paired sites. This set contained the song stimuli BOS, the Rev, Revorder, and Con, and the synthetic stimuli Pips, Tones, and WN (see methods). The song stimuli can be used to directly measure selectivity for the BOS, whereas analysis of the responses to synthetic stimuli provides an additional assessment of neuronal tuning properties and selectivity for conspecific songs in general. We evaluated the preference for Con over synthetic stimuli, Pips, Tones, and WN, in stimulus-excited auditory forebrain cells or cell clusters that were significantly correlated with HVC by calculating the d′ relative to Con and comparing it to d′ values calculated for cells that were not significantly correlated with HVC (Fig. 8). For the most part, our results were similar to those reported in a previous study (Grace et al. 2003). We found a mean d′ significantly greater than zero for the Con-Pips comparison in both field L and CLM cells, regardless of functional connectivity (field L correlated, d′ = 3.46 ± 0.85 vs. uncorrelated, d′ = 2.31 ± 0.69; CLM correlated, d′ = 1.57 ± 0.59 vs. uncorrelated, d′ = 2.34 ± 0.88). We also found a preference for Con over Tones in field L cells that were part of a significantly correlated pair (d′ = 1.88 ± 0.45) and in CLM cells that were part of a pair that did not show a significant correlation (d′ = 2.27 ± 1.21). The mean d′ was similarly positive for the Con–Tones comparison in the other two groups (field L uncorrelated, d′ = 1.16 ± 0.57; CLM correlated, d′ = 1.55 ± 0.79), but was not significantly different from zero at the 1% level. 
There were no significant differences between the responses to Con and the responses to WN in any of the four groups (field L correlated, d′ = 0.54 ± 0.93 vs. field L uncorrelated, d′ = −0.30 ± 0.82; CLM correlated, d′ = 1.06 ± 1.29 vs. CLM uncorrelated, d′ = 0.55 ± 1.11). A similar result was reported previously (Grace et al. 2003) and was attributed mainly to the strong onset response characteristic of field L and CLM cells presented with white noise. Similar onset responses were also observed in the present study.
Stimulus selectivity for the bird's own song in field L and CLM
We calculated the mean d′ for the BOS–Rev and BOS–Con comparisons for all stimulus-excited field L (23 correlated, 21 uncorrelated) and CLM cells (13 correlated, 3 uncorrelated) that were played the stimulus selectivity set. The results showed that field L and CLM cells that were members of a significantly correlated auditory forebrain–HVC pair had a slight preference for responding to the BOS over the Rev (field L, d′ = 0.46 ± 0.16; CLM, d′ = 1.03 ± 0.36; Fig. 9). Field L stimulus-excited cells that were members of an uncorrelated auditory forebrain–HVC cell pair did not show this preference (d′ = −0.05 ± 0.15), whereas CLM stimulus-excited cells that were members of an uncorrelated pair had a positive mean d′ (d′ = 0.67 ± 0.33), but it was not significant at the 0.01 level. The d′ for the BOS–Con comparison was not significantly different from zero in any of the four groups (field L correlated, d′ = −0.66 ± 0.37; CLM correlated, d′ = −0.21 ± 0.61; field L uncorrelated, d′ = −0.19 ± 0.39; CLM uncorrelated, d′ = −0.99 ± 0.76), although the consistently negative means suggest a trend toward suppression of the response to the BOS relative to Con. This trend has been reported and discussed in a previous study (Amin et al. 2004). Although the slight preference for BOS over Rev may be interpreted as BOS selectivity in brain regions that do not respond well to conspecific songs other than the bird's own song played in the forward condition (e.g., the song system nuclei), in field L and CLM where strong responses to forward conspecific song are typical, this “preference” for the BOS over the Rev is more readily attributed to a preference for the natural order found in conspecific song (Amin et al. 2004; Woolley et al. 2006).
Fourteen field L cells and nine CLM stimulus-excited cells were part of a significantly correlated EE paired site. Mean d′ calculations for the BOS–Rev comparison in these instances yielded similar results to those of all stimulus-excited auditory forebrain cells with a slightly positive d′ for the BOS–Rev comparison in the field L and CLM cells that were part of a significant correlation (field L, d′ = 0.63 ± 0.15; CLM, d′ = 1.48 ± 0.40) and a d′ close to zero for the BOS–Rev comparison in uncorrelated cells (field L, d′ = −0.06 ±.015; CLM, d′ = 0.22 ± 0.0). The tendency to suppress the response to BOS relative to Con was also seen in all four groups (field L correlated, d′ = −0.88 ± 0.55; CLM correlated, d′ = −0.64 ± 0.82; field L uncorrelated, d′ = −0.44 ± 0.43; CLM uncorrelated, d′ = −0.56 ± 0.0). Because auditory forebrain–HVC paired sites of stimulus-excited–nonresponsive cells contained the second largest number of significant correlations, we calculated the d′ for the BOS–Rev and BOS–Con comparisons for the auditory forebrain cells in these groups (data not shown). We found that there was no significant difference between either the BOS and the Rev, or the BOS and Con comparison in either field L or CLM, correlated or uncorrelated cells (BOS–Rev; field L correlated, d′ = 0.27 ± 0.35; CLM correlated, d′ = 0.01 ± 0.50; field L uncorrelated, d′ = 0.03 ± 0.56; CLM uncorrelated, d′ = 0.90 ± 0.41; BOS–Con; field L correlated, d′ = −0.36 ± 0.50; CLM correlated, d′ = 0.76 ± 0.64; field L uncorrelated, d′ = 1.36 ± 0.0; CLM uncorrelated, d′ = −1.20 ± 1.26).
The key measure in our study is the extent to which auditory forebrain cells that are significantly correlated with BOS-selective HVC cells demonstrate BOS selectivity. We hypothesized that this subset of auditory forebrain cells might be more selective for the BOS than other auditory forebrain cells because these cells coordinate their response timings with BOS-selective HVC cells. This selectivity analysis was performed on six field L–HVC paired sites and seven CLM–HVC paired sites. Again, we found that there was a slight preference for the bird's own song over the bird's own song played in reverse in field L and CLM correlated but not uncorrelated units (field L correlated, d′ = 0.83 ± 0.23; CLM correlated, d′ = 1.63 ± 0.51; field L uncorrelated, d′ = −0.09 ± 0.22; CLM uncorrelated, d′ = 0.23 ± 0.0) and an insignificant negative d′ for the BOS–Con comparison in all four groups (field L correlated, d′ = −0.67 ± 1.05; CLM correlated, d′ = −0.55 ± 1.07; field L uncorrelated, d′ = −0.60 ± 0.39; CLM uncorrelated, d′ = −0.56 ± 0.0). This suggests that there is no difference between the BOS selectivity of the auditory forebrain cells or cell clusters that are functionally connected with BOS-selective HVC cells or cell clusters and those that are not.
Relationship between functional connectivity and stimulus preference for individual cells
Although we did not find any differences in the average selectivity between correlated and uncorrelated neurons in the auditory forebrain, it is possible that the degree of selectivity within correlated paired sites shares a functional relationship with the degree of correlation. To check for this possibility, we plotted the BOS–Rev d′ values of HVC cells that were members of a significantly correlated EE pair against the strengths of the correlations (Fig. 10). Visual inspection of the scatterplot did not show a linear relationship between these variables and this was verified by estimating the linear regression coefficient (r = 0.24, P = 0.28). We plotted the BOS–Con d′ values for field L and CLM cells that were members of a functionally correlated EE pair against these correlation strengths as well. Again, we found that there was no significant linear relationship between these two values (r = 0.19, P = 0.53). Finally, we wanted to assess the relationship between stimulus selectivity in individual HVC cells and stimulus selectivity in individual auditory forebrain cells. To this end, we plotted the relationship between the BOS–Rev d′ for auditory forebrain cells and the BOS–Rev d′ for HVC cells that were members of a significantly correlated EE pair. As seen in Fig. 10C, the BOS–Rev d′ values for HVC are high and are characteristic of the preference for BOS in that nucleus. The BOS–Rev d′ values in field L and CLM are much lower and, as explained earlier, characteristic of a preference for the natural order of sound in all conspecific songs. In this correlation analysis, we tested whether the subset of neurons that are functionally connected to HVC would show larger d′ values suggestive of a preference for the BOS and whether such putative selectivity would be correlated in field L or CLM and HVC. Visual inspection of Fig. 10C and statistical analysis both show that there was in fact no relationship between the selectivity for the BOS in the two areas (r = 0.30, P = 0.25).
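The linear-relationship tests described above can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation coefficient and the standard t statistic with n − 2 degrees of freedom; the function names and the example values are hypothetical, and the P value itself would be read from a t distribution (not computed here).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom.

    Only defined for |r| < 1.
    """
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values: correlation strength and d' for five paired sites.
strength = [0.02, 0.05, 0.03, 0.08, 0.04]
dprime = [0.9, 0.4, 1.2, 0.7, 0.3]
r = pearson_r(strength, dprime)
print(r, t_statistic(r, len(strength)))
```

A small |r| with a t statistic well inside the null distribution corresponds to the "no significant linear relationship" outcome reported in the text.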
We used simultaneous extracellular recordings and cross-correlation analysis to quantitatively determine the degree of functional connectivity between two auditory forebrain areas, field L and CLM, and song system nucleus HVC. When single-unit and multiunit data are analyzed together, functionally connected paired sites were found nearly 18% of the time in field L–HVC recordings and nearly 33% of the time in CLM–HVC recordings. When single-unit data are analyzed separately, the percentage of paired sites that were functionally connected, according to our statistical criterion, was smaller because of lower signal-to-noise levels (see RESULTS): about 10% of paired recordings are found to be functionally connected in field L–HVC and around 20% in CLM–HVC. Given the chance of Type II errors and the fact that these percentages were obtained for a single recording location in HVC, the actual percentage of neurons in field L or CLM that are functionally connected to HVC is bound to be higher than these estimates. Therefore neurons in HVC appear to receive convergent input from a significant fraction of auditory cells in field L and CLM.
Coherency strengths for field L–HVC and CLM–HVC paired sites were similar to each other and to those reported for the song system (Kimpo et al. 2003), where anatomical connectivity is well established. For a subset of the functionally connected auditory forebrain–HVC paired sites found in the present study, we investigated the responses of auditory forebrain cells to song, matched-synthetic, and white noise stimuli. In accordance with a model for the hierarchical processing of sounds that are important for song learning, we hypothesized that field L and CLM cells that are functionally connected to HVC would show differential processing of song stimuli relative to cells in these regions that are not connected to HVC. Our results suggest that that is not the case. In particular, auditory forebrain cells in field L and CLM that are functionally connected to HVC demonstrate similar stimulus preferences for natural stimuli over matched synthetic stimuli as those seen in cells that are not functionally connected to HVC.
In retrospect, the finding that auditory forebrain cells that are functionally connected with HVC are not specialized for processing the bird's own song might not be entirely surprising. Our study shows that the song system nucleus HVC appears to receive auditory input from a significant fraction of auditory neurons in CLM and field L. Given that these auditory areas are not specialized for processing the bird's own song, it is therefore somewhat expected to find that the extensive auditory input to the song system from field L and CLM is not particularly selective. These results, however, raise three related questions: How does the functional connectivity measured here relate to anatomical measures of connectivity? Where does the selectivity for the BOS arise? And what is the role of this nonselective auditory input?
Auditory forebrain connectivity with HVC
Studying the connectivity between the auditory forebrain and HVC at a functional level gives us the opportunity to compare our findings with previous studies of anatomical connectivity between these regions. Such anatomical studies have been unable to provide a clear picture of all the pathways that lead from the auditory forebrain to HVC. An indirect route from the field L complex to HVC through caudal lateral mesopallium (CLM) and nucleus interfacialis (NIf) has been well documented (Fortune and Margoliash 1995; Vates et al. 1996; see Fig. 1), but a direct route remains uncertain. Although our range of field L–HVC and CLM–HVC latencies is consistent with the accepted indirect pathway through NIf, it also provides support for the presence of a direct connection between these regions.
An early study using anterograde labeling in field L in canaries concluded that field L projects to a fibrous area surrounding HVC (HVC shelf) but not directly to HVC proper (Kelley and Nottebohm 1979). Other evidence from tracing experiments in zebra finches suggests that whereas many field L axons extend into HVC shelf rather than HVC proper, L1 and L3 may project a small number of axons into HVC proper (Fortune and Margoliash 1995; Vates et al. 1996). Fibers of passage in the HVC shelf region have precluded definitive conclusions from being drawn about whether L1 and L3 projections synapse in HVC. An additional indirect pathway that has been put forth involves contact between HVC spiny dendrites extending into HVC shelf, axonal projections from the field L complex to HVC shelf, and HVC shelf projections into HVC (Katz and Gurney 1981; Vates et al. 1996). It has been suggested that auditory information entering the shelf from field L (or CLM) could potentially be passed onto HVC after being processed in the shelf.
The majority (76%) of our functionally connected field L–HVC paired sites showed positive latencies or latencies close to zero, suggestive of a feedforward connection from field L to HVC. L1–HVC paired sites had strengths that were generally greater than those seen in field L–HVC paired sites localized to other field L subregions (see Fig. 5). Functionally connected L1–HVC paired sites with short positive latencies (≤5 ms) and high strength values could represent the class of L1 cells that have been proposed to pass auditory information directly onto HVC. Given the anatomical evidence for an indirect connection from CLM to HVC by NIf, it seems likely that L1–HVC paired sites with latencies at the higher end of the range pass information through a minimum of two processing stages (e.g., CLM and NIf) before reaching HVC. Although detecting a correlation between two areas across multiple synapses is not common, such correlations have already been reported in the zebra finch song system using very similar cross-correlation analyses (Kimpo et al. 2003). Furthermore, previous studies that have calculated cross-correlations or spike-triggered averages between NIf and HVC have found correlation latencies or spike-triggered average onsets that were nearly 0 ms, revealing a very tight correlation between the spike timing in these two areas (Coleman and Mooney 2004; Janata and Margoliash 1999). If this tight correlation is also indicative of travel across field L–CLM synapses and CLM–NIf synapses, then a correlation between field L and HVC that has a latency of 5–10 ms would be well within the range expected from such a polysynaptic connection.
L2b–HVC paired sites had similar latencies but lower strengths than those found in L1–HVC paired sites. These weaker connections could represent an anatomically unidentified class of L2b cells that project directly to HVC, but because it has been repeatedly shown that this region does not project directly to HVC through retrograde and anterograde tracing experiments (Fortune and Margoliash 1995; Vates et al. 1996), this possibility seems unlikely. As in the case of L1–HVC paired sites, it seems possible that L2b–HVC paired sites with positive latencies close to 5 ms reflect L2b cells that send information to HVC indirectly, by CLM and NIf.
The L3–HVC paired sites found in our data set generally showed longer positive latencies than those seen in any of the other field L–HVC paired sites, whereas their strengths were typical of most of the field L–HVC paired sites we found (excluding L1–HVC paired sites). Given the connectivity found within the field L complex and between field L, NCM, and CLM, it is possible that information traveling from L3 to HVC passes through many processing stages before reaching HVC (e.g., L3 → CLM → NIf → HVC as for L1–HVC paired sites but also L3 → NCM → CMM → CLM → NIf → HVC). The fact that no L3–HVC paired sites were found to have latencies very close to zero suggests that we did not find any functionally connected L3–HVC paired sites that have been proposed to send axons directly into HVC (Fortune and Margoliash 1995). Given our relatively small sample size in this subregion (37 cells or cell clusters), we cannot exclude the possibility that we did not record from such putative neurons.
Our study of functional connectivity between the auditory forebrain and HVC also revealed several correlated CLM–HVC paired sites of cells or cell clusters. The latencies and strengths of these CLM–HVC paired sites overlapped with those of the field L–HVC paired sites discussed earlier. Although there is no anatomical evidence suggesting that CLM projects directly to HVC, the very short latencies observed suggest that this possibility should be revisited. The longer latencies found between CLM and HVC are consistent with the known feedforward connections between CLM and HVC by NIf and potentially through some reciprocal connectivity involving field L, CMM, and/or NCM.
We found four functionally connected auditory forebrain–HVC paired sites within our data set that had long negative latencies. Histological analysis revealed that two of these paired sites were L2a–HVC paired sites, one was a field L–HVC pair identified as being on the border of L1 and L2a, and one was a CLM–HVC pair. The strengths of these paired sites did not differ from field L–HVC paired sites in other subregions (excluding L1–HVC paired sites). These long negative latencies were unexpected and could represent a feedback route that travels from HVC through the outer regions of HVC (shelf), RA (cup), and Ov (shell) before reaching field L (L1, L2b, or L3) and then, finally, L2a (see Fig. 1; Mello et al. 1998). For the CLM–HVC pair, this feedback route would likely be the same but with CLM rather than L2a as the endpoint. Given that these negative latency connections were very rarely detected, it is probable that with a much larger data set, we would find similar paired sites of cells in other field L subregions.
Our data set contained a small number of significantly correlated paired sites with latencies that were within 1 ms of zero (4 or 13% of field L–HVC paired sites and 3 or 16% of CLM–HVC paired sites). Correlation latencies close to zero indicate that the auditory forebrain and HVC cells fire simultaneously on a regular basis. Such closely timed spiking often implies a common source of input to the two regions. This could be true in our case, but because there are currently no candidate areas for this role, it seems more likely that using cross-correlation analysis on extracellular recordings from multiunit clusters of cells does not permit fine enough time resolution to distinguish between zero and slightly positive or slightly negative time delays. A future study using intracellular recording and cross-correlation analysis in these two areas might help resolve this issue.
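To make the latency conventions used above concrete, the following is a minimal sketch of a spike-time cross-correlogram and a peak-lag readout. It is a simplified time-domain illustration, not the coherency-based measure and significance test used in this study; the function names, the 1-ms bin width, the ±20-ms window, and the example spike times are all assumptions made for illustration.

```python
from collections import Counter


def cross_correlogram(ref_spikes, target_spikes, max_lag_ms=20.0, bin_ms=1.0):
    """Histogram of (target - reference) spike-time differences in ms.

    A positive lag means the reference spike (e.g., an auditory forebrain
    unit) precedes the target spike (e.g., an HVC unit).
    Returns a dict mapping bin-center lag (ms) to pair count.
    """
    counts = Counter()
    for t_ref in ref_spikes:
        for t_tgt in target_spikes:
            lag = t_tgt - t_ref
            if abs(lag) <= max_lag_ms:
                counts[round(lag / bin_ms)] += 1
    n_bins = int(max_lag_ms / bin_ms)
    return {k * bin_ms: counts.get(k, 0) for k in range(-n_bins, n_bins + 1)}


def peak_lag(correlogram):
    """Lag (ms) of the bin with the largest count."""
    return max(correlogram, key=correlogram.get)


def classify_latency(lag_ms, zero_tol_ms=1.0):
    """Label a peak lag using the sign convention described above."""
    if abs(lag_ms) <= zero_tol_ms:
        return "near zero"
    return "reference leads" if lag_ms > 0 else "target leads"


# Hypothetical spike times (ms): the target fires 3 ms after each reference spike.
ref = [10.0, 50.0, 90.0, 130.0]
tgt = [t + 3.0 for t in ref]
lag = peak_lag(cross_correlogram(ref, tgt))
print(lag, classify_latency(lag))
```

With 1-ms bins, lags within roughly a bin of zero cannot be separated from true zero-lag synchrony, which is the resolution limit raised in the paragraph above.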
One of the most surprising results of our study was the relatively large fraction of auditory neurons in field L and CLM that were functionally connected to neurons in the song system. These numbers show that, although direct anatomical connectivity between two regions can be small or absent (as is the case for the direct connection between field L or CLM and HVC), functional connectivity can remain high. We postulated earlier that this functional connectivity arises because it includes indirect pathways and because of a high degree of interconnectivity within each area. A similar observation was made regarding the functional connectivity in the song system (Kimpo et al. 2003). These facts exemplify the importance of using complementary methods to assess connectivity in neural circuits.
The absence of specialized processing in functionally connected auditory forebrain cells and the emergence of BOS selectivity in the song system
Previous studies have shown that cells in the field L complex do not respond preferentially to the BOS over Con and therefore the general pool of neurons in this region does not contribute to the BOS selectivity seen in NIf or HVC (Amin et al. 2004; Lewicki and Arthur 1996). Nucleus interfacialis (NIf) cells have been shown to coordinate their firing with cells in HVC (Coleman and Mooney 2004; Janata and Margoliash 1999), confirming the presence of a functional connection that corresponds to the well-established anatomical connection. Moreover, NIf cells exhibit a high degree of BOS selectivity with d′ values that were either slightly lower than those seen in HVC (Janata and Margoliash 1999) or equivalent (Coleman and Mooney 2004).
We hypothesized (Amin et al. 2004) that cells within the auditory forebrain could contribute to the selectivity seen in the song system through a subset of cells in field L and/or CLM that demonstrate selectivity for the bird's own song and are connected to HVC or NIf. Support for such an idea can be found in studies of other systems that have shown that functionally connected neurons share similar receptive field properties (Alonso et al. 2001; Usrey et al. 1999) and frequency ranges (Creutzfeldt et al. 1980; Miller et al. 2001). Our test for this putative subset of auditory forebrain cells revealed that there is a large subset of neurons dispersed throughout field L and CLM that are functionally connected with HVC neurons but that this subset is not more selective than the entire population of auditory neurons: functionally connected auditory forebrain cells show the same preferences for natural song stimuli over matched synthetic stimuli as those cells that are not functionally connected with HVC. There are two consequences to this finding. First, it appears that song representation, including representation of the BOS, at the level of the auditory system is still highly distributed, requiring a large number of neurons. Second, the construction of the BOS-selective responses at the single-neuron level must be built into either the song system nucleus, NIf, or the circuitry immediately presynaptic to NIf.
If BOS-selective responses are built into the circuitry immediately preceding NIf, then auditory forebrain cells that are functionally connected with BOS-selective song system cells could be selective for particular features of the BOS stimulus rather than the stimulus as a whole: these feature-selective neurons would still respond to many conspecific songs as well as to the song played in reverse and therefore not be distinguished in our analysis. A form of constructive convergence would then be present at the interface between CLM and NIf, leading to BOS-selective neurons either in NIf or in a still smaller subset of neurons in CLM that are directly connected to NIf neurons. A greater exploration of the response properties of functionally connected auditory forebrain cells and of the connectivity between CLM and NIf would be required to validate this hypothesis. In particular, calculating the spectrotemporal receptive fields of individual cells in CLM that are functionally connected with cells in NIf could tell us whether correlated auditory forebrain cells, although not selective for the BOS stimulus as a whole, are selective for certain features of the BOS that are not typically found in other stimuli. In addition, one could then attempt to find functionally and anatomically connected CLM–NIf paired sites that show direct monosynaptic connections and test their responses to the BOS among various other stimuli. But, although a subset of cells within CLM that are selective for the BOS and directly connected to NIf neurons remains a possibility, we believe that it is unlikely, given that we did not find BOS-selective cells in our functionally connected CLM–HVC subset. In addition, the projection from CLM to NIf appears, from tracing experiments, to be substantial and relatively distributed (Vates et al. 1996), thus decreasing the chances of finding a particular specialized subset of neurons within CLM that is connected to NIf.
It thus seems that BOS selectivity might first be observed in the song system and in NIf in particular. This suggests that the auditory input to the song system might be completely generic or unselective to BOS features. If true, then the selectivity for the BOS would be an exclusive signature of the song nuclei. As subsequently explained, an examination of the correlations during evoked activity also supports this hypothesis.
Implications of the decorrelations of functionally connected neurons during auditory stimulation
The correlations measured between field L and HVC and CLM and HVC in our study were found mainly during spontaneous activity and not during stimulus-evoked activity. For most of our functionally connected paired sites, it appears that the presence of a stimulus acted as a decorrelator, causing significant spontaneous activity coherency peaks to fall out of significance in the presence of a stimulus. Similar cases of stimulus-modulated functional connectivity have been reported elsewhere (Eggermont 1994; Frostig et al. 1983; Kvasnak et al. 2000; Wood and Glantz 1980).
Even though most of the individual field L and CLM cells or cell clusters in our functionally correlated pairs responded to one or more of the stimuli presented, on an individual basis, these functionally connected neurons were only weakly involved in generating the auditory responses observed in HVC. This means that either the song nuclei are receiving auditory input from other sources or, more probably, a high degree of nonlinear processing leads to the observed auditory responses in HVC. In this second scenario, a succession of a relatively small number of selective inputs from auditory areas might be sufficient to drive large responses in the song nuclei and effectively render ineffective any additional input (Doupe and Konishi 1991; Lewicki and Arthur 1996). Along these lines, it is possible that after being properly triggered, neurons in the song nuclei would be entrained to “play back” a sensory copy of the motor program and therefore be decoupled from additional sensory input. Indeed, the potential of neurons in HVC to spontaneously replay responses that resemble those during motor production and also passive BOS stimulation has been shown in zebra finches (Dave and Margoliash 2000). In either case, whether auditory input throughout the BOS is needed for generating auditory responses or whether auditory responses exhibit a form of entrainment, our data suggest that the strong nonlinearity that is responsible for BOS selectivity will be found in the song nuclei starting at the level of NIf and that very little selectivity for the BOS might be found in the auditory system. This hypothesis is also consistent with the idea that the BOS selectivity found in the song system is a result of motor learning and not of perceptual learning (Theunissen et al. 2004). Recent physiological and behavioral studies have also suggested that there might be a dichotomy between motor memories and perceptual memories such as those for the tutor song (Bao et al. 2003).
To further investigate whether the nonselective auditory input to NIf or HVC from field L or CLM is required to elicit auditory responses and BOS-selective responses in song nuclei, one could also attempt to inactivate CLM or subregions of field L by GABA or the GABA agonist muscimol. Similar experiments by Cardin and Schmidt (2004) and Coleman and Mooney (2004) have recently shown that inactivation of NIf greatly reduces or completely abolishes spontaneous, auditory-evoked, and subthreshold activity in HVC or HVC projection neurons. If inactivation of CLM does not abolish auditory responses in NIf, then the possibility of a second source of auditory input would become more probable.
In summary, the present study of functional connectivity between the auditory regions field L and CLM and the song system nucleus HVC has shown that the auditory and song systems show a high degree of functional connectivity. This connectivity is likely a reflection of the succession of anatomical connections between field L, CLM, NIf, and HVC and the high level of interconnectivity. A second connectivity path between field L and HVC is also consistent with our data set. The subset of cells in the auditory areas that are functionally connected to the song system do not appear to be different from nonconnected cells in their overall response properties to complex sounds. This result suggests that NIf plays a key role in the emergence of BOS selectivity and, by extension, in shaping the auditory feedback signal necessary for song learning. However, aside from learning and maintaining the quality of their own song throughout life, songbirds need to be able to listen to the sounds of other members of their own species and respond appropriately to those vocalizations. It is conceivable that a less-selective signal might be necessary for triggering responsive processes such as beginning to sing, modulating song level or rate, or interrupting song. For example, if a male is singing to attract a female, and another male approaches the female and begins to sing to her as well, the initial male must “decide” whether to modulate certain features of his song based on the approaching male's vocalizations and behaviors. This “decision” is probably based on auditory, visual, and social cues and therefore communication between the auditory and vocal systems is essential for the process. 
Zebra finches and other songbirds have been shown to modulate the amplitude of their song (Brumm and Slater 2006; Cynx and Gell 2004) and evidence from other systems suggests that auditory cortex cells alter their physiological properties in response to alterations of their vocal apparatus (Cheung et al. 2005; Eliades and Wang 2003), demonstrating that the auditory system is at least partially involved in triggering the modulation of song. Perhaps the connections seen here are also involved in this process. Future electrophysiological studies of the auditory and song systems in awake, behaving songbirds will further shed light on the role of this apparently nonselective auditory input to the song system for both song learning and song triggering.
This study was supported by National Institute of Mental Health Grant MH-051899 to F. E. Theunissen.
We thank N. Amin, R. Moore, T. Elliot, and P. Gill for invaluable comments on an earlier version of this manuscript and two anonymous reviewers for insightful comments and criticism. We also thank J. Winer for assistance with creating the histological reconstructions.
1 A backpack containing boxes of histological tissue was stolen from the laboratory.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2007 by the American Physiological Society