J Neurophysiol (January 1, 2003). 10.1152/jn.00088.2002
Submitted 7 February 2002; accepted in final form 3 September 2002
Department of Psychology, University of California, Berkeley, California 94720-1650
ABSTRACT
Grace, Julie A., Noopur Amin, Nandini C. Singh, and Frédéric E. Theunissen. Selectivity for Conspecific Song in the Zebra Finch Auditory Forebrain. J. Neurophysiol. 89: 472-487, 2003. The selectivity of neurons in the zebra finch auditory forebrain for natural sounds was investigated systematically. The principal auditory forebrain area in songbirds consists of the tonotopically organized field L complex, which, by its location in the auditory processing stream, can be compared with the auditory cortex of mammals. We also recorded from a secondary auditory area, cHV. Field L and cHV are auditory processing stages that are presynaptic to the specialized song system nuclei, where auditory neurons show an extremely selective response for the bird's own song but weak responses to almost all other sounds, including conspecific songs. In our study, we found that neurons in field L and cHV had stronger responses to conspecific song than to synthetic sounds that were designed to match the lower order acoustical properties of song, such as their overall power spectra and AM spectra. Such preferential responses to natural sounds cannot be explained by linear frequency tuning or simple nonlinear intensity tuning and require linear or nonlinear spectro-temporal neuronal transfer functions tuned to the acoustical properties of song. The selectivity for conspecific songs in field L and cHV might reflect an intermediate auditory processing stage for vocalizations that then contributes to the generation of the very specific selectivity for the bird's own song seen in the postsynaptic song system.
INTRODUCTION
Although our understanding of the neural basis of sound localization (Knudsen 1999; Konishi et al. 1988) and echolocation (Pollak et al. 1977; Suga et al. 1978) is relatively advanced, we know little about how complex sounds used in communication are processed by the auditory system. In particular, there is growing evidence that recognition of species-specific vocalizations could be mediated by high-level auditory neurons (Langner et al. 1981; Leppelsack and Vogt 1976; Newman and Wollberg 1978; Plummer and Striedter 2000; Rauschecker et al. 1995; Scheich and Bonke 1979; Wang et al. 1995), but we have only a limited understanding of the nature of the auditory processing in these selective high-level auditory areas.
To begin to look at the processing of behaviorally relevant complex sounds, we systematically quantified neural selectivity for conspecific vocalizations in primary and secondary auditory forebrain areas in a songbird, the male zebra finch (Taeniopygia guttata). The auditory forebrain of songbirds is particularly well suited to address this question for two reasons. First, acoustical perception in songbirds has been well characterized and appears to be particularly precise. Not only are songbirds able to discriminate among very similar behaviorally relevant sounds (Dooling et al. 1992; Lohr and Dooling 1998), but young male songbirds must also be able to store a template of a model song, the tutor song, that they then learn to match during the song learning process (Marler 1981). Second, songbirds have evolved a set of specialized sensorimotor areas called the song system (Nottebohm et al. 1976) where some of the most selective responses to natural sounds have been described. The auditory neurons in the song system of adult male songbirds respond selectively and preferentially to the sound of the bird's own song (Doupe 1997; Margoliash 1986; Theunissen and Doupe 1998).
It is believed that at least part of the auditory information coming from the auditory thalamus en route to the song system is first processed in the avian primary and secondary auditory forebrain. The avian auditory thalamo-recipient area is a tonotopically organized region of the neostriatum called field L (Zaretsky and Konishi 1976). Field L is further divided based on cytoarchitecture and connectivity into different sub-regions that are also presumably hierarchically organized (Fortune and Margoliash 1992; Vates et al. 1996). Tracing experiments suggest that auditory information from field L then enters the song system through the song nucleus Nif (Nucleus interfacialis) and potentially directly through HVc (Fortune and Margoliash 1995; Kelley and Nottebohm 1979; Vates et al. 1996). The auditory input from field L to Nif comes via a secondary auditory area found in the ventral caudal hyperstriatum, cHV (not to be confused with HVc). Field L also projects directly to the shelf area of HVc where dendrites from HVc neurons have been found. However, since HVc also receives strong input from Nif, most of the auditory input to HVc could be coming from Nif (see Fig. 1A).
In support of a theory of hierarchical processing of sounds, it was found that the extreme selectivity for the bird's own song found in the song system was absent in field L of songbirds (Janata and Margoliash 1999; Lewicki and Arthur 1996; Margoliash 1986). Nonetheless, subsets of neurons in avian field L show complex response properties. Some cells respond poorly to pure tones and are maximally responsive to temporally and spectrally complex sounds (Langner et al. 1981; Leppelsack and Vogt 1976; Scheich and Bonke 1979). These response properties result in a subset of neurons that are selective for particular conspecific sounds, either calls or parts of songs, relative to other conspecific sounds (Leppelsack and Vogt 1976; Scheich and Bonke 1979). These data raise the possibility that the responses of forebrain auditory neurons are matched to natural or, more specifically, conspecific sounds.
To test this hypothesis, we compared the responses to conspecific songs with those to matched synthetic sounds. The matched synthetic sounds consisted of simple and complex sounds commonly used in auditory research that matched different aspects of the spectral and temporal structure of the natural song. In particular, we generated narrow-band and wide-band synthetic sounds composed of tone pips that had the same power spectrum as song and a similar AM spectrum. If the response of the neurons can be explained by their frequency and AM tuning curves, then these synthetic stimuli would elicit firing rates similar to those elicited by the natural sounds. However, a difference in responses would indicate that the transfer function of the neurons cannot be simply reduced to their spectral and temporal tuning properties. Moreover, if the differences are systematically in favor of the conspecific songs (in the sense of higher responses), we could conclude that the actual transfer functions of the neurons are to some extent matched to the higher order spectral-temporal structure of the natural song. To further investigate the possible contribution of such higher order structure, we obtained responses to sounds composed of harmonic stacks designed to match the harmonic sounds found in song. Finally, to guarantee that we did not over-restrict the sampled acoustical space, we also obtained responses to band-passed white noise.
The second goal of our research was to assess whether there was hierarchical processing within the different sub-fields of field L. Within field L, L2a is the primary recipient of thalamic input and L2a neurons project to L1, L2b, and L3 (see Vates et al. 1996 and Fig. 1A). Previous physiological research also suggested that neurons in L1 and L3 were more selective than neurons in L2a (Langner et al. 1981; Leppelsack and Vogt 1976; Scheich and Bonke 1979). Additionally, we were able to measure any differences in selectivity between field L and the secondary auditory area cHV.
METHODS
Animal procedures
Animal procedures were approved by the Animal Care and Use
Committee at UC Berkeley. Adult male zebra finches (Taeniopygia guttata) of
100 days of age were used in all experiments. Two days prior to the physiological recording experiments, birds were anesthetized with intramuscular injections of 0.03-0.05 ml of modified
Equithesin (0.85 g of chloral hydrate, 0.21 g of pentobarbital, 0.42 g of MgSO4, 8.6 ml of propylene glycol,
and 2.2 ml of 100% ethanol, to a total volume of 20 ml with
H2O) for a surgical procedure that is required
for the physiological recordings. The Equithesin injection puts the
bird in a deep state of anesthesia that is required for the preparatory
surgery but also greatly suppresses auditory-evoked neural activity in
the forebrain. The preparation involved cutting the skin covering the
skull, marking the locations of electrode penetrations and gluing a
miniature holding steel post. More specifically, we first immobilized
the head of the bird in a stereotax with ear bars and a beak holder. A
small section of skin and the top layer of skull underneath were then
removed and a reference point for electrode penetrations was made in
ink at a distance of 1.5 mm lateral and 1.2 mm rostral from the
bifurcation point of the midsagittal sinus using a head angle of 40°.
A stainless steel post was then glued over or near the midsagittal
sinus with dental cement. The bird fully recovers from this surgical
procedure in 24 h and is then ready for the physiological experiment.
On the day of the experiment, the bird was anesthetized with three
injections of 20% urethane administered intramuscularly at half-hour
intervals (75 µl total). The bird's head was immobilized by
attaching the steel post to a frame located on the stereotax. The lower
layer of the skull and the dura were removed from the area surrounding
the ink-marked location. Tungsten extracellular electrodes of 1-4 MΩ
resistance were lowered into the brain using a microdrive. The bird was
then placed in a calibrated double-walled anechoic sound-attenuated
chamber facing the speaker that was used for sound presentation. The
speaker was 20 cm away from the beak, and the bird was elevated to be
at the same level as the center of the speaker cone. The sound at this
position was calibrated before the experiments to ensure that our sound
delivery system had a flat transform function (within ±5 dB).
Stimulus design
The stimulus repertoire consisted of conspecific songs and
synthetic sounds. The conspecific stimulus set consisted of the representative song of 20 unrelated adult male zebra finches. We have
shown that such an ensemble of songs is representative of the spectral
and temporal patterns occurring in zebra finch song, in the sense that
sampling error in both power spectrum and the modulation spectrum (see
following text) is minimal (Theunissen et al. 2000).
We used four different synthetic ensembles: a succession of pure tones (pips), combination tones (tones), spectrally modulated harmonic stacks (ripples), and band-pass white noise (white noise or WN). Spectrograms of exemplars from each ensemble are shown in Fig. 2. The synthetic stimuli were based on stimuli commonly used to characterize auditory neurons but with the additional constraint that they were designed using the power spectrum of songs as well as other parameters characterizing the AM and spectral modulation observed in the conspecific songs. In this way, we were able to directly compare the responses to song with those to the synthetic stimuli. To design and compare our stimulus ensembles, we calculated both the traditional power spectra (Fig. 3) and the power spectra of the spectrogram of the sounds (Fig. 4). The 2-D power spectrum shows the AM spectrum (projections on the x axis), the spectral modulation spectrum (projections on the y axis), and joint spectral-temporal structure (off axis). We refer to these 2-D spectra as the modulation spectra. The AM characterizes how the envelope of the sound changes as a function of time. The spectral modulation characterizes the spectral structure of multi-band sound signals such as harmonic stacks or speech formants. The joint spectral-temporal structure characterizes complex sounds with frequency modulations such as down-frequency sweeps (right quadrant) and up-frequency sweeps (left quadrant). In summary, the modulation power spectrum shows the amplitude values of a decomposition of the sound in terms of ripple stimuli (see Klein et al. 2000; Shamma 2001; Theunissen et al. 2001 for more technical details).
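The link between harmonic structure and the spectral-modulation (y) axis can be illustrated with a small sketch. It covers only the frequency axis of the modulation spectrum: a toy spectral slice with evenly spaced harmonics is Fourier transformed along frequency, and the dominant spectral modulation comes out at the inverse of the harmonic spacing. The bin width, the 0.8-kHz spacing, the Gaussian bump shape, and all function names are illustrative assumptions and not the analysis code or parameters of this study.

```python
import cmath
import math

def dft_power(values):
    """Power at each non-negative DFT bin of a real sequence (naive O(n^2) transform)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the DC component
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        power.append(abs(coeff) ** 2)
    return power

def harmonic_slice(n_bins, spacing_bins, width_bins=1.0):
    """Toy spectral slice: Gaussian bumps every `spacing_bins` frequency bins."""
    amplitudes = []
    for i in range(n_bins):
        d = i % spacing_bins
        d = min(d, spacing_bins - d)  # distance to the nearest harmonic
        amplitudes.append(math.exp(-d * d / (2.0 * width_bins ** 2)))
    return amplitudes

def peak_spectral_modulation(amplitudes, bin_khz):
    """Spectral modulation (cycles/kHz) that carries the most power."""
    power = dft_power(amplitudes)
    k_peak = max(range(1, len(power)), key=lambda k: power[k])
    return k_peak / (len(amplitudes) * bin_khz)

# 80 bins of 0.1 kHz with harmonics every 8 bins, i.e., a 0.8-kHz spacing (toy values).
toy = harmonic_slice(n_bins=80, spacing_bins=8)
peak = peak_spectral_modulation(toy, bin_khz=0.1)  # close to 1 / 0.8 kHz
```

A full modulation spectrum would apply the same transform along the time axis of the spectrogram as well, giving the AM axis and the off-axis joint structure.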
The power spectrum of song shows that songs are broadband, covering the
frequencies between 1 and 8 kHz and peaking around 4 kHz (Fig. 3, Con).
The modulation power spectrum (Fig. 4, Con) shows that the AM spectrum
of zebra finch song is mostly below 50 Hz (x axis). Also,
the zebra finch song has strong spectral modulations
1.5 cycles/kHz
(y axis). This energy peak (around 1.5 cycles/kHz)
corresponds to the harmonic stacks frequently present in zebra finch
songs with a fundamental around 700 Hz. The modulation power spectrum
of song is also slightly asymmetric with more energy in down-sweeps
than up-sweeps.
The pips ensemble was designed to have the same overall power spectrum as the conspecific song via random sampling of the power spectrum distribution of song to obtain frequencies of pure tones. The pips sound consisted of a succession of such pure tones in time. The length of the tone pips and the inter-pip silences were drawn from a gaussian distribution that approximated the distribution of the length of song syllables [95 ± 66 (SD) ms] and inter-syllable silences (37 ± 25 ms). The onset and offset ramp of each tone pip was a 25-ms cosine function, loosely matching the amplitude envelope of song syllables. The intensity of the individual pips was randomly varied uniformly over a logarithmic scale, and the overall power of the pips sound was matched to that of the conspecific songs. The result is a sound ensemble with the same power spectrum as song, within sampling error (Fig. 3, pips vs. con), and similar AM spectrum (Fig. 4 x axis, pips vs. con). In this sense, our pip ensemble is the closest sound to zebra finch song that could be made with simple tone pips such as those used in traditional auditory physiological experiments. However, since tone pips are narrow-band signals and the zebra finch song is broadband, the maximum intensity of individual tone pips at any particular frequency had to be greater than the intensity of sound in the corresponding narrow frequency band in any particular syllable of song. If the response of an auditory neuron can be explained by its frequency tuning and a simple linear energy response, then the mean response to our tone pip stimuli and to zebra finch song would be identical and the maximum response would be greater to tone pips than to songs. For nonlinear and saturating intensity-response curves, the mean response to the pips would be smaller and the maximal response could either be greater or equal. For nonmonotonic intensity response curves, both the mean and max rates could be smaller for the tone pip ensemble. 
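The construction of a pips stimulus can be sketched as follows. The sketch draws pip frequencies in proportion to a power spectrum, draws pip and gap durations from the Gaussian distributions given above, and applies a 25-ms raised-cosine ramp. Several details are assumptions made only for illustration: the toy power spectrum, the redrawing of durations that are too short (the handling of negative or very short draws is not described here), the 2-s total length taken from the stimulus duration, and all function names. Pip intensities are not modeled.

```python
import math
import random

RAMP_MS = 25.0  # cosine onset/offset ramp duration given in the text

def draw_at_least(rng, mean, sd, minimum):
    """Gaussian draw, redrawn until it reaches `minimum` (assumed truncation rule)."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= minimum:
            return x

def draw_frequency(rng, freqs_hz, power):
    """Sample one pip frequency in proportion to the given power spectrum."""
    return rng.choices(freqs_hz, weights=power, k=1)[0]

def cosine_ramp_gain(t_ms, duration_ms, ramp_ms=RAMP_MS):
    """Envelope gain at time t_ms within a pip: raised-cosine rise and fall."""
    if t_ms < 0.0 or t_ms > duration_ms:
        return 0.0
    if t_ms < ramp_ms:
        return 0.5 * (1.0 - math.cos(math.pi * t_ms / ramp_ms))
    if t_ms > duration_ms - ramp_ms:
        return 0.5 * (1.0 - math.cos(math.pi * (duration_ms - t_ms) / ramp_ms))
    return 1.0

def pip_schedule(rng, freqs_hz, power, total_ms=2000.0):
    """List of (onset_ms, duration_ms, frequency_hz) that fits within total_ms."""
    schedule, t = [], 0.0
    while True:
        dur = draw_at_least(rng, 95.0, 66.0, 2 * RAMP_MS)  # syllable-like length
        if t + dur > total_ms:
            break
        schedule.append((t, dur, draw_frequency(rng, freqs_hz, power)))
        t += dur + draw_at_least(rng, 37.0, 25.0, 0.0)     # inter-pip silence
    return schedule

# Toy power spectrum peaking near 4 kHz (illustrative values, not measured data).
toy_freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
toy_power = [1, 3, 6, 8, 6, 4, 2, 1]
pips = pip_schedule(random.Random(0), toy_freqs, toy_power)
```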
A more notable difference between the pips and the song ensemble is the lack of multi-band or broadband sounds in the pips stimuli. In particular, the pips lack the harmonic stacks found in zebra finch song. This effect can clearly be seen in the differences between the modulation power spectra of pips versus songs: the pips ensemble does not have any energy in the spectral modulation at 1.5 cycles/kHz along the y axis.
The tones ensemble is the broadband extension of the pips ensemble. The
tones were synthesized by adding 20 different pips sounds together and
normalizing the result to retain the same overall power as song. The
power spectra of tones and conspecific song are therefore also
identical (Fig. 3, tones vs. con). The tones stimuli are sometimes
called sparse noise, and in this case, it would be sparse-colored noise
in reference to their song-like power spectra. Similar sounds have been
used to obtain spectral-temporal receptive fields of cortical auditory
neurons (deCharms et al. 1998). For our tones ensemble,
the range of intensity in any narrow frequency band was similar to
that found in song. An auditory neuron whose response could be reduced
to its frequency tuning curve and a linear or nonlinear
stimulus-intensity response curve would then exhibit the same neural
responses, both in terms of mean and maximum rates, to tones and to
conspecific song. The modulation spectrum of tones (Fig. 4) shows that
it covers the entire range of spectral-temporal modulations observed in
song. As was the case for the pips, the AM spectrum is practically
identical to that of song. The tones ensemble, however, covers the
spectral-temporal space uniformly in a low-pass fashion, whereas the
conspecific sound ensemble particularly emphasizes sounds along the
x axis and y axis, corresponding respectively to
noisy sounds at a range of amplitude modulations and slow-varying
complex sounds with spectral structure (harmonics).
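The summation and power normalization used for the tones ensemble can be sketched in a few lines. The toy sinusoids below merely stand in for 20 pips sounds, and the sampling rate, target RMS value, and function names are assumptions for illustration.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_and_normalize(waveforms, target_rms):
    """Sum equal-length waveforms sample by sample, then rescale to target_rms."""
    mixed = [sum(column) for column in zip(*waveforms)]
    gain = target_rms / rms(mixed)
    return [gain * s for s in mixed]

# Toy stand-ins for 20 pips sounds: 10-ms sinusoids at different frequencies.
SAMPLE_RATE = 32000  # Hz, assumed
toy_pips = [
    [math.sin(2 * math.pi * (1000 + 300 * k) * n / SAMPLE_RATE) for n in range(320)]
    for k in range(20)
]
tones = mix_and_normalize(toy_pips, target_rms=0.1)
```

Because the rescaling is a single gain applied to the whole waveform, it changes the overall power without altering the shape of the power spectrum.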
To investigate the responses to the slow-varying spectral modulations,
we generated a synthetic sound ensemble composed of harmonic stacks
with amplitude modulations along the frequency axis (called ripples
here for short, although it spans only a small subset of all possible
ripple sounds). Voiced speech sounds, musical sounds, and many animal
vocalizations are characterized by harmonic sounds that will yield
particular peaks in power on the y axis of the modulation
spectra, corresponding to the fundamental frequency of the harmonic
sounds. Zebra finch song, as well as the song of other grass finches,
contains many song syllables composed of harmonically related frequency
components. The fundamental of the harmonic stacks in our ripple
ensemble was chosen from a Gaussian distribution with a mean of
700 ± 100 Hz to match the range of fundamental frequencies in the
harmonic stacks found in zebra finch song. The amplitude of each
harmonic was then modulated with a cosine function, enhancing
particular frequency components and suppressing others, thereby
effectively generating harmonic sounds with different timbre. The
period of the cosine function was also chosen from a Gaussian
distribution with a mean of 4,000 ± 3,000 Hz (0.25 cycles/kHz).
Similar enhancement and depression have been measured in zebra finch
song (Williams et al. 1989), albeit with a particular
distribution which we did not attempt to match. The spectral structure
of our synthetic ripple sound is clearly reflected in its modulation
spectrum (Fig. 2, Ripples). Most of the energy describing the spectral
structure is along the y axis with concentrations of power
below 0.7 cycles/kHz corresponding to the timbre and around 1.43 cycles/kHz (=1/0.7) corresponding to the fundamental. Like the pure
tone ensemble, the duration of the harmonic stacks and the inter-stacks
interval had the same mean and SD as zebra finch song syllable and
inter-syllable duration. In this manner, we obtained a similar AM
spectrum as found in song (Fig. 4, x axis). The overall
power spectrum of the ripples was flat from 700 Hz to 8 kHz (Fig. 3).
These sounds are similar to those used by Calhoun and Schreiner (1998)
to probe the sensitivity of auditory cortical neurons to
spectral modulation, but different in that the parameters were made to
match the harmonic stacks found in zebra finch song.
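A sketch of one ripple exemplar follows, together with the conversion between a spacing along the frequency axis and a spectral modulation in cycles/kHz (700 Hz gives about 1.43 cycles/kHz, and 4,000 Hz gives 0.25 cycles/kHz). The exact form of the cosine envelope is not specified above, so the raised cosine between 0 and 1, the random envelope phase, the redrawing of non-positive Gaussian draws, and the function names are all assumptions.

```python
import math
import random

def draw_positive(rng, mean, sd):
    """Gaussian draw, redrawn until positive (assumed handling of invalid draws)."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0.0:
            return x

def ripple_harmonics(rng, f_max_hz=8000.0):
    """Return (f0, period, [(frequency_hz, amplitude), ...]) for one harmonic stack."""
    f0 = draw_positive(rng, 700.0, 100.0)            # fundamental, Hz
    period_hz = draw_positive(rng, 4000.0, 3000.0)   # period of the spectral envelope, Hz
    phase = rng.uniform(0.0, 2.0 * math.pi)          # assumed: random envelope phase
    harmonics = []
    k = 1
    while k * f0 <= f_max_hz:
        f = k * f0
        # Assumed envelope: raised cosine between 0 and 1 along the frequency axis.
        amp = 0.5 * (1.0 + math.cos(2.0 * math.pi * f / period_hz + phase))
        harmonics.append((f, amp))
        k += 1
    return f0, period_hz, harmonics

def cycles_per_khz(spacing_hz):
    """Spectral modulation of a pattern that repeats every `spacing_hz` along frequency."""
    return 1000.0 / spacing_hz

f0, period_hz, stack = ripple_harmonics(random.Random(1))
```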
Finally, we used white noise stimuli band-passed from 16 Hz to 8 kHz
(Fig. 3). The modulation spectrum of the white noise ensemble
illustrates the fact that white noise covers spectral and temporal
structure uniformly (Fig. 4). The overall power of white noise was
matched to that of song and also to that of the other stimulus
ensembles. White noise stimuli have the advantage of sampling the
entire space of possible sounds and would therefore, in theory, be
useful to drive neurons that are not sensitive to the particular sounds
found in songs or in our matched synthetic stimuli. Conversely, because
of its large bandwidth, any arbitrary spectrally and temporally
complex sound is embedded in the white noise stimulus with low signal
intensity (Klein et al. 2000).
The synthetic stimuli lasted exactly 2 s, while the duration of the songs varied and had a mean of 2.08 ± 0.63 s. We used 40 different ripple stimuli, 20 different pips stimuli, 20 different tones stimuli, and 20 different white noise stimuli in total. The volume of the speaker was set to deliver song at peak levels of 85 dB SPL (B&K Sound Level Meter, RMS weighting type B).
Electrophysiology and experimental protocol
Spike arrival times were obtained by thresholding the extracellular recordings with a window discriminator. By visual comparison of the triggered spike shapes to an average or clearly isolated spike obtained using a digital oscilloscope, we classified our recording sites as either single units or multi-units. The multi-unit data were always obtained with a high window threshold relative to the noise level and consisted mostly of a small cluster of units. The data from single units and multi-units were analyzed separately. Since similar conclusions were obtained in both cases, we were able to combine the data for some of the analysis to increase statistical power. The neural activity was recorded in a systematic fashion at approximately 100-micron depth intervals in the zebra finch auditory forebrain to estimate the number of responsive versus nonresponsive recording sites. The position of the electrode was varied from its 100-micron step position if this repositioning allowed for better isolation of a single unit. Electrode penetrations in a given bird were 300 microns apart. Between 1 and 4 electrode penetrations were achieved per bird. The exact location of the electrode was chosen systematically to uniformly sample the auditory forebrain. Figure 1C shows all the recording sites from our data set.
The bird's own song and white noise were used as search stimuli to determine whether a particular unit appeared to be responsive to our auditory stimuli. If the response to either of these stimuli was significantly different from the baseline firing rate, as determined by an on-line t-test, then we acquired 10 trials each for the presentation of three conspecific songs, three pips stimuli, three tones stimuli, three ripple stimuli, and two white noise stimuli. The trials for each stimulus type were interleaved and the stimulus presentation order was randomized for each trial number. For each unit, the three conspecific songs, three pips, three tones, and three ripples, and the two white noise stimuli were picked from the entire ensemble of sounds in a systematic fashion. To control for a possible lack of stationarity in the responses, the responses to white noise during the search were not included in the selectivity analysis. Two seconds of background spontaneous activity were recorded before and after the presentation of each stimulus. A random inter-stimulus interval with a uniform distribution between 7 and 8 s was used.
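The interleaved presentation scheme can be sketched as a schedule builder: for each of the 10 trial numbers, all 14 stimuli are presented once in a freshly shuffled order, each followed by an inter-stimulus interval drawn uniformly between 7 and 8 s. The stimulus labels and function name are hypothetical, and the sketch is one reading of the protocol rather than the acquisition software that was used.

```python
import random
from collections import Counter

# Hypothetical labels: 3 conspecific songs, 3 pips, 3 tones, 3 ripples, 2 white noise.
STIMULI = (
    [f"con{i}" for i in range(1, 4)]
    + [f"pips{i}" for i in range(1, 4)]
    + [f"tones{i}" for i in range(1, 4)]
    + [f"ripple{i}" for i in range(1, 4)]
    + [f"wn{i}" for i in range(1, 3)]
)

def build_schedule(rng, n_trials=10):
    """Return a list of (trial_number, stimulus, inter_stimulus_interval_s)."""
    schedule = []
    for trial in range(1, n_trials + 1):
        order = STIMULI[:]   # every stimulus once per trial number
        rng.shuffle(order)   # new random order for each trial number
        for name in order:
            schedule.append((trial, name, rng.uniform(7.0, 8.0)))
    return schedule

schedule = build_schedule(random.Random(0))
counts = Counter(name for _, name, _ in schedule)
```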
At the end of the first recording pass, a single electrolytic lesion (100 µA for 5 s) was made to aid in the later reconstruction of the recording sites. The electrolytic lesions were made 400 microns after the last recording site and in regions well below the auditory forebrain. We did not observe any differences in response properties between recordings made before and after lesions. In the last recording pass of a typical experiment, two lesions 100-300 microns apart were created for calibrating our depth measures (see Histology and anatomical reconstructions).
Histology and anatomical reconstructions
At the end of the electrophysiological recordings, the bird was deeply anesthetized with either metofane or 0.08 ml of Equithesin. The bird was then transcardially perfused with 0.9% saline, followed by 3.7% formalin in 0.025 M phosphate buffer. Parasagittal 40-µm sections were prepared using a freezing microtome. One-half of the sections were later stained with cresyl violet and the other half with silver stain. Electrode tracks and electrolytic lesions could then be identified and were photographed using an Axiophot microscope (Zeiss) with an attached video camera (UC Berkeley, Biological Imaging Facility).
To account for histological shrinkage of the fixed brain, we calibrated
our depth measures by comparing the distances in the sections between
the top of the brain and the electrolytic lesion, as well as the
distance between successive lesions for the last recording pass with
the distances in microns given by our independently calibrated
microdrive used during the experiment. Using Scion Image, recording
sites were obtained with the help of the log kept during recording
sessions, denoting the distances measured (with the microdrive) between
subsequent recording sites. The borders of the sub-divisions of field L
and cHV were then determined using prominent landmarks such as the
dorsal medullary lamina (LMD) and the hyperstriatal lamina (LH), as
well as using the differences in cell size, shape, and density between
sub-areas of field L as described by Fortune and Margoliash (1992).
Recording sites falling into one of the regions of
interest were then documented and only those units falling into one of
these areas were used for the analysis.
Three standardized schematics of these regions of the brain were then created representing distances 1100-1300, 1300-1600, and 1600-1900 microns from the midline. The mapped photographs of the slices containing electrode tracks were then assigned to one of these three schematics based on the distance of the slice from the midline. The slice was then mapped (via rotation and scaling) onto the schematic using the scaled distance of the electrode track from its intersection with a line drawn from the "V" point of the LMD to the caudal most part of sub-field L2a, as well as the angle of this intersection. The locations of all recording sites could then be defined using a standard Cartesian coordinate system and these coordinates were then used to create functional maps of neural response and selectivity (Fig. 5). These maps were obtained by assigning a color value corresponding to the response strength or selectivity of each responsive unit shown in Fig. 1C. For visual purposes, the color value was spread around the exact location of the recording site, effectively making large dots. Overlapping values were then averaged taking into account the distance to the recording site with a Gaussian decay function. The resulting picture looks like a continuous functional map that enabled us to investigate whether we could detect functional sub-areas distinct from the anatomically defined sub-areas.
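The smoothing step behind the functional maps can be sketched as a Gaussian-weighted average of the values at nearby recording sites. The width of the Gaussian and the radius over which each site is spread are not given above, so the values below, like the site coordinates and function name, are placeholders.

```python
import math

def map_value(x, y, sites, sigma=100.0, radius=300.0):
    """Gaussian-weighted average of site values within `radius` of (x, y).

    `sites` is a list of (site_x, site_y, value). Returns None when no site
    is close enough to contribute. `sigma` and `radius` are assumed values.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for site_x, site_y, value in sites:
        dist = math.hypot(x - site_x, y - site_y)
        if dist <= radius:
            w = math.exp(-0.5 * (dist / sigma) ** 2)  # Gaussian decay with distance
            weighted_sum += w * value
            weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else None

# Two hypothetical sites 200 microns apart with different selectivity values.
toy_sites = [(0.0, 0.0, 1.0), (200.0, 0.0, 3.0)]
midpoint = map_value(100.0, 0.0, toy_sites)  # equal weights, so the plain average
```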
Data analysis
Neurons were first classified according to whether they were responsive or not. To be classified as responsive, a unit had to have an average firing rate for either white noise (WN) or bird's own song (bos) that was significantly different (P < 0.05, 2-tailed paired t-test) from its average spontaneous rate, in this case the mean firing rate during the 2 s preceding stimulus onset. WN and bos were selected as search stimuli for various reasons. First, these search stimuli include a prototypical synthetic sound used in auditory physiology and the prototypical natural sound used in songbirds. Second, WN and bos seem to elicit the most variable auditory responses from any given neuron. Third, in our experience, it is very rare to find auditory units that do not respond to WN and bos and do respond to other sounds.
The neural response of all units classified as responsive was expressed
as a z-score. This measure represents the normalized difference between
the firing rate during the stimulus and that during the 2-s baseline
period preceding the stimulus and is calculated as follows
where σ²_S is the variance of the response during the stimulus, and
σ²_BG is the variance of the response during baseline. When multiple exemplars of a particular
stimulus were presented, the average responses of the unit to all
presentations for that particular stimulus were used to calculate the
z-score.
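A sketch of a z-score of this kind is given below. It assumes a paired form in which the difference between the mean stimulus and mean baseline rates is divided by the standard deviation of the trial-by-trial difference, that is, the square root of the stimulus variance plus the baseline variance minus twice their covariance. The covariance term, the toy firing rates, and the function name are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

def z_score(stim_rates, base_rates):
    """Paired z-score from per-trial stimulus and baseline firing rates."""
    n = len(stim_rates)
    mu_s, mu_bg = mean(stim_rates), mean(base_rates)
    var_s = sum((s - mu_s) ** 2 for s in stim_rates) / (n - 1)
    var_bg = sum((b - mu_bg) ** 2 for b in base_rates) / (n - 1)
    cov = sum((s - mu_s) * (b - mu_bg)
              for s, b in zip(stim_rates, base_rates)) / (n - 1)
    # Assumed normalization: SD of the paired difference (stimulus minus baseline).
    return (mu_s - mu_bg) / math.sqrt(var_s + var_bg - 2.0 * cov)

# Toy rates (spikes/s) on five trials; not data from this study.
toy_stim = [12.0, 15.0, 11.0, 14.0, 13.0]
toy_base = [5.0, 6.0, 4.0, 7.0, 5.0]
z = z_score(toy_stim, toy_base)
diffs = [s - b for s, b in zip(toy_stim, toy_base)]
```

Under this assumption the z-score equals the mean of the per-trial differences divided by their standard deviation, and a positive value indicates a rate above baseline.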
The maximum firing rate and its variance were also calculated for each stimulus. We first obtained a post-stimulus time histogram (PSTH) by convolving the spikes with a 30-ms Gaussian window. The mean of the maximum firing rate and the time of maximum firing were obtained from the PSTH. To calculate the variance, the maximum firing rate for each trial in the 30-ms window around the time of maximum firing was determined. z-Scores for the mean and maximum firing rates were also calculated after excluding the first 100 ms of the response. As shown in the results, the majority of neurons showed a strong onset response to white noise that was absent in the other stimuli and we wanted to evaluate the response and selectivity of the neurons with and without this particular onset response.
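The peak-rate estimate can be sketched as follows: spikes from all trials are smoothed with a Gaussian kernel, the time of the maximum of the resulting PSTH is located, and per-trial rates are then counted in a 30-ms window centered on that time. How the 30 ms relates to the Gaussian's width is not stated, so treating it as the kernel standard deviation is an assumption, as are the 1-ms grid, the toy spike times, and the function names.

```python
import math

def psth(spike_trains_ms, duration_ms, sigma_ms=30.0, step_ms=1.0):
    """Trial-averaged rate (spikes/s) on a regular grid, Gaussian-smoothed."""
    n_trials = len(spike_trains_ms)
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))  # unit-area kernel (per ms)
    times = [i * step_ms for i in range(int(duration_ms / step_ms) + 1)]
    rates = []
    for t in times:
        total = sum(norm * math.exp(-0.5 * ((t - s) / sigma_ms) ** 2)
                    for train in spike_trains_ms for s in train)
        rates.append(1000.0 * total / n_trials)  # per ms to per s
    return times, rates

def peak_rate(times, rates):
    """Time (ms) and value (spikes/s) of the PSTH maximum."""
    i = max(range(len(rates)), key=rates.__getitem__)
    return times[i], rates[i]

def trial_rates_near_peak(spike_trains_ms, t_peak_ms, window_ms=30.0):
    """Per-trial rate (spikes/s) in a window centered on the peak time."""
    half = window_ms / 2.0
    return [1000.0 * sum(1 for s in train if abs(s - t_peak_ms) <= half) / window_ms
            for train in spike_trains_ms]

# Toy spike times (ms) on three trials with a burst near 500 ms.
toy_trains = [[100.0, 495.0, 500.0, 505.0], [498.0, 502.0, 1500.0], [500.0, 510.0]]
times, rates = psth(toy_trains, duration_ms=2000.0)
t_peak, r_peak = peak_rate(times, rates)
per_trial = trial_rates_near_peak(toy_trains, t_peak)
```

The variance of the maximum rate can then be taken across the per-trial values.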
Units in the auditory forebrain areas were further divided according to whether they were excitatory or inhibitory. A z-score value greater than zero to a particular stimulus represents an excitatory response whereas values less than zero represent inhibitory responses. The sign of the z-score for the bos stimulus was used to classify all auditory units in the study as excitatory or inhibitory. This choice is somewhat arbitrary because, for example, we found a few units that were excitatory to bos and were inhibitory to WN (14/352). However, for all cells, an excitatory response to bos correlated with excitatory responses to the majority of sounds in our ensemble.
The selectivity of each unit for one stimulus class over another
stimulus class was quantified using the psychophysical d' measure. This
measure has more recently been used to quantify neural selectivity in
the avian song system and auditory forebrain (Janata and Margoliash 1999; Solis and Doupe 1997; Theunissen and Doupe 1998). The d' measure for the
discriminability between two stimuli (A and B) is calculated as
where σ² is the variance of the response. If
the d' value is positive then stimulus A elicited a greater response,
if it is negative then stimulus B elicited a greater response. d'
values approximately zero indicate no difference in the response evoked
by the two stimuli. Since d' is sensitive to the sign of the difference
in magnitude of the absolute responses, it will also give negative values when stimulus A elicited a greater inhibition than the inhibition obtained to stimulus B. For this reason, when interpreting d' values for selectivity, it is necessary to distinguish the excitatory from the inhibitory units. For each unit and two stimulus types, a d' value is obtained for all pair-wise comparisons before averaging. For example, for most units, we obtained responses to three
conspecific sounds and two white noise sounds, yielding 6 d' measures
for the con-white noise comparison. An average d' value is then
obtained from these six values to measure the average con-white noise
selectivity for that particular unit. d' values were calculated for
both the mean firing rate during the entire stimulus and for the
maximum firing rate, estimated with a 30-ms window as explained above.
For the con-WN comparison, we also calculated d' values for mean and
max rates with and without the first 100 ms.
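The pair-wise averaging can be sketched as follows. The sketch assumes a commonly used form of the measure, d' = 2(mean_A - mean_B) / sqrt(var_A + var_B); the factor of 2, the toy firing rates, and the function names are assumptions made for illustration.

```python
import math
from itertools import product
from statistics import mean, variance

def d_prime(resp_a, resp_b):
    """Assumed form: twice the mean difference over the root of the summed variances."""
    return 2.0 * (mean(resp_a) - mean(resp_b)) / math.sqrt(
        variance(resp_a) + variance(resp_b))

def mean_pairwise_d_prime(class_a, class_b):
    """Average d' over every (exemplar of A, exemplar of B) pair."""
    values = [d_prime(a, b) for a, b in product(class_a, class_b)]
    return mean(values), len(values)

# Toy per-trial rates (spikes/s) for 3 conspecific songs and 2 white noise stimuli.
toy_con = [[20.0, 24.0, 22.0, 26.0], [30.0, 28.0, 33.0, 29.0], [18.0, 21.0, 19.0, 22.0]]
toy_wn = [[10.0, 12.0, 9.0, 13.0], [15.0, 14.0, 17.0, 14.0]]
avg_d, n_pairs = mean_pairwise_d_prime(toy_con, toy_wn)
```

With three exemplars of one class and two of the other, the average runs over six pair-wise values, and swapping the roles of A and B flips the sign.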
To determine whether the variances in the distributions of d' were significantly different from the null hypothesis, d' values for the con-con comparison were also calculated. In this comparison, three different conspecific songs were presented to each unit, resulting in three independent d' values for each unit, which were then averaged together. The mean of this distribution is necessarily 0 (for large numbers), but assessing its variance was informative in determining whether particularly high or low d' values found for other comparisons would be within the expected range.
RESULTS
The primary goal of our project was to compare neural responses in the auditory forebrain areas of songbirds elicited by natural and matched synthetic sounds. In particular, we wanted to investigate whether these brain areas were selective for conspecific song relative to three types of synthetic sounds: tone pips (pips) and combination tones (tones) with matched power spectra and similar AM spectra to song, and harmonic stacks (ripples) with a similar amplitude and spectral modulation spectrum (see METHODS). We also measured responses to white noise.
Overview of auditory responses
Responses to auditory stimuli were obtained in 647 recording sites
from 24 birds in a mixture of single-unit and multi-unit recordings.
Table 1 and Fig. 1C show the
total number and the exact location of the recording sites within each
of the anatomically defined sub-regions of the field L complex
(sub-fields L, L1, L2a, L2b, L3) and the caudal hyperstriatum ventrale
(cHV) (Fortune and Margoliash 1992, 1995; Vates et al. 1996). Based on their responses to bos or WN, we
classified sites as responsive or not responsive (see
METHODS). Table 1 and Fig. 1C also show the
number of responsive sites recorded within each area. Three hundred
fifty-two recording sites (approximately 55%) were found to be
responsive (Table 1). Although sub-fields L2a and L2b had the highest
percentage of responsive sites as might have been expected from their
location in the auditory processing stream (Fig. 1A), the
difference in numbers of responsive sites across the different
sub-areas was not statistically significant
(χ² test for independence: χ² = 4.5, df = 5, P > 0.1). In all areas, the most lateral section had a
smaller number of responsive sites, potentially because our lateral
section is at the edge of the auditory forebrain (Vates et al. 1996). The differences in percentages across the three sections
are not statistically significant (χ² test for independence: χ² = 3.23, df = 2, P > 0.1).
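The test for independence reported above can be sketched as a Pearson chi-square statistic on a table of responsive versus nonresponsive site counts per sub-area. The counts below are hypothetical placeholders, not the values of Table 1, and the helper name is an assumption made for illustration.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column factors were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts (NOT Table 1): rows are responsive / not responsive,
# columns are the six sub-areas.
counts = [
    [30, 80, 50, 40, 20, 40],
    [20, 50, 45, 40, 22, 38],
]
stat, df = chi_square_independence(counts)
print(f"chi-square = {stat:.2f}, df = {df}")
```

With two response categories and six sub-areas the table has 5 degrees of freedom, as in the reported test; for df = 5 a statistic below roughly 9.24 corresponds to P > 0.1.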
The sites that were classified as responsive were further broken down
according to whether their mean firing rate response to the bird's own
song was greater than or less than the baseline spontaneous firing
rate. According to this classification, 69% of the responsive units
were excitatory (91% or 30 units in L2a, 88% or 77 units in L2b, 66%
or 51 units in L3, 63% or 42 units in cHV, 61% or 19 units in L1, and
58% or 39 units in L). It should be noted, however, that the measured
percentage of excitatory neurons is affected by urethane anesthesia,
which has been shown to depress spontaneous activity (Capsius and Leppelsack 1996).
Of the 352 recording sites categorized as being responsive, 165 were
classified as single units (47%) and 187 were classified as
multi-unit. The percentage of single units was significantly lower than this average in L1 (7/31) and L2a (9/33)
(χ² = 12.15, df = 5, P = 0.03). The mean response for the excitatory sites to conspecific song, averaged across all areas, was 16.4 ± 14.2 spikes/s for the multi-unit data and 7.6 ± 9.1 spikes/s for
the single unit data. The mean background rate was 4.4 ± 4.7 spikes/s for the multi-units and 2.8 ± 2.7 spikes/s for the
single units. From these numbers and from our visual estimation of the distribution of spike shapes, we estimated that the multi-unit data
were composed of a small number of single units.
An example of responses to all stimuli is shown in Fig. 2 for a multi-unit recording site in L2b. In general, neurons responded to all stimuli although for each site, differences in mean responses could be observed for the different stimuli. Qualitatively the responses to conspecific song and the matched synthetic stimuli were characterized by reliable spike patterns from trial to trial. In contrast, the majority of the responses to white noise lacked any clear phase-locking. The onset response to white noise was also significantly different from the onset responses to the other stimuli. As shown in Fig. 6, both single- and multi-unit sites showed on average a strong onset response to WN that was absent for conspecific song. The matched synthetic stimuli showed weak (in the case of tones) or no onset responses. This strong onset to WN is presumably due to the fast onset of broadband energy that is present in WN and absent in the other stimuli. We also noted that other stimuli could elicit maximum firing rates that were similar in size to the onset response to white noise but different in that they could occur anywhere during the stimulus presentation, unlike white noise where the maximum response was always present at stimulus onset. For this study, we did not quantify the reliability of the spike patterns across trials for the different stimuli but we did want to compare the mean responses to conspecific song to those of white noise both including and excluding the strong onset response observed for WN. Thus we measured mean and max responses that also excluded the first 100 ms after onset time.
Response strength
The z-scores were calculated from mean firing rates during stimulus presentation and background to determine the strength of a unit's response to any given stimulus. The average z-scores obtained for all stimuli are shown in Fig. 7 for both excitatory and inhibitory units, both with and without the first 100 ms after stimulus onset. The stimuli have been ordered on the x axis in order of increasing response for the excitatory units. When the onset is included in the calculation of the response, the weakest response strength was obtained for pips, followed by tones, ripples, and white noise, and the strongest response for conspecific song. Without the onset, the response to white noise decreases relative to the other stimuli and approaches the response strength obtained for tones. The recording sites from multi-units gave higher z-scores than single units, but the trend in the response strength versus stimulus type was similar for both types of recordings. The higher z-scores for the multi-units are a reflection of the increase in signal-to-noise ratio, as is expected from summing responses from neighboring neurons with similar properties. For the excitatory units, a one-way ANOVA shows that differences in z-scores for the effect of stimulus type are significant both for single units and multi-units and whether or not the onset response is included. A similar trend was observed when the data were combined from both types of units. For the inhibitory units, the differences in average response strength across stimulus types were not statistically significant (Fig. 7). We did notice, however, that the inhibitory units had the most reliable inhibition to conspecific song and the strongest inhibition to white noise. In the selectivity analysis in the section below, we will directly assess the selectivity of individual units by comparing their response to conspecific song with their responses to the synthetic stimuli using the appropriate pair-wise measure of d'.
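A minimal sketch of a response-strength z-score is given below. It assumes one common definition, the mean of the trial-by-trial difference between stimulus and background rates divided by the standard deviation of that difference, which is algebraically the same as (mean_S - mean_BG) / sqrt(var_S + var_BG - 2 cov(S, BG)). The exact definition used here is the one given in METHODS; the function name and the rates are hypothetical.

```python
from statistics import mean, stdev


def response_z_score(stim_rates, background_rates):
    """z-score of a response from paired per-trial rates (spikes/s).

    Assumed form: mean(S - BG) / sd(S - BG), where each trial contributes one
    stimulus-period rate and one background rate. Positive values indicate
    excitation relative to background, negative values indicate inhibition.
    """
    if len(stim_rates) != len(background_rates):
        raise ValueError("stimulus and background rates must be paired by trial")
    diffs = [s - b for s, b in zip(stim_rates, background_rates)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("z-score is undefined when the differences do not vary")
    return mean(diffs) / sd


# Hypothetical trials for one excitatory and one inhibitory site.
z_exc = response_z_score([12.0, 15.0, 18.0, 15.0], [4.0, 5.0, 4.0, 3.0])
z_inh = response_z_score([1.0, 2.0, 1.0, 0.0], [4.0, 6.0, 3.0, 5.0])
print(f"excitatory z = {z_exc:.2f}, inhibitory z = {z_inh:.2f}")
```

Pairing the stimulus and background rates by trial accounts for slow drifts in excitability that affect both measurements on the same trial.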
By obtaining the precise anatomical location of our recording sites, we were able to obtain functional maps of z-scores in the avian forebrain and to test for differences in responses in the different anatomically defined sub-regions of field L. Figure 5A shows functional maps illustrating the anatomical distribution of z-scores for conspecific song and pips. The strongest excitatory responses were generally observed within the central region of field L (around L2a), while the weakest responses tended to be at the dorsal most extremes of cHV and the ventral most extremes of sub-field L. In general, the strength of the auditory response was correlated across different stimuli. For instance, units that responded strongly to conspecific song also responded strongly to synthetic sounds, with the possible exception of WN. The z-scores averaged over all stimuli and calculated for each sub-region of field L and cHV are shown in Fig. 8. A one-way ANOVA showed significant differences in mean z-scores across sub-regions for excitatory units but not for inhibitory units. In particular, the response strength in cHV was smaller than in any of the field L sub-regions for excitatory units.
Selectivity for natural versus synthetic sounds
To directly compare the responses to conspecific song with those obtained in response to our matched synthetic sounds, we quantified the selectivity of any given unit by calculating a d' value: the normalized difference between the neural responses to two stimuli being compared. d' values are calculated for each unit and each stimulus pair and allow for selectivity analysis using the equivalent of within-subject statistical measures. To correctly interpret the distribution of d' values, units were separated into excitatory and inhibitory groups. d' values were also calculated for both mean responses and max responses for each comparison, including the con-white noise comparison, for which the first 100 ms of the response is both included (labeled as WN) and excluded (labeled WN-T).
Figure 5B shows maps of the d' values for the comparisons between conspecific song and synthetic sounds for all the excitatory units for mean responses. These maps reveal that a majority of the neurons responded more strongly to conspecific song (con) than to pips, tones, and ripples. In addition, the units that preferred con over one of the synthetics also tended to prefer con over the other synthetics. The pattern of d' values in the con-WN map is unique in that most units appeared to strongly prefer con or strongly prefer white noise, with a much smaller number of units showing weak or no selectivity unlike the other con-synthetic selectivity maps. Also, some of the units that showed a preference for con over the other synthetics had even stronger responses to WN than to con.
To quantify the strength of the selectivity, we calculated average d'
values for both excitatory and inhibitory units and for both mean and
maximum rates (Figs. 9 and
10). For excitatory units, the
average d' value for the mean firing rate for con versus pips, tones,
and ripples was significantly >0 in all cases: for single units,
multi-units, and all units combined (2-tailed t-test, α = 0.01), as illustrated in Fig. 9 (left). These
data show that the auditory forebrain of zebra finches is selective for
conspecific song relative to our subset of matched synthetic sounds.
The strongest preference for con was over pips, followed by compound
tones and ripples. The mean d' value for con over white noise was also
positive (in favor of con) even when the strong onset response observed in white noise was included, but the d' comparison was not
significantly different from zero at the 1% level. However, this
difference became significant for multi-units and combined units when
the onset response was excluded from the calculation.
In the con versus matched synthetic comparisons, a similar pattern emerged using maximum rates with higher rates for con, but in this case the difference in responses between con and ripples was not statistically significant (Fig. 9, right). For the con-WN comparison, the maximum response, in the case where the onset response is included, is still significantly greater for con when all units are combined. Since the maximum response to WN always occurred at the onset, this analysis implies that maximum responses to conspecific song are of similar strength to (and slightly greater than) the onset response to WN. When the onset response to both con and WN is excluded from the analysis, the maximum response to con becomes much greater on average than the maximum response to WN (see WN-T, Fig. 9, right).
We also compared the overall variance of the con-synthetic d' distributions with that of the con-con distribution. If the sounds sampled by the conspecific ensemble and the synthetic ensembles were acoustically equivalent as far as the neural responses are concerned, then the variance of the con-synthetic distributions should be similar to that of the con-con distribution. Figure 11 shows the cumulative distribution plots for d' values for mean and max rates for the excitatory units (combined data). As expected, relative to the con-con d' distribution, the curves for the con-synthetic distributions are all shifted to the right, reflecting the positive values of the average d' as analyzed above. The variances of the distributions, reflected in the slope of the cumulative distribution plot, are also different. For the mean rates, the con-con distribution had the smallest variance (0.63), followed by the con-ripples (3.09), con-tones (6.02), con-WN (10.21), con-pips (11.90), and con-WN-T (12.85) distributions. The variances of the con-synthetic distributions were all significantly different from that of the con-con distribution (P < 0.0001, ANOVA for equal variances adjusted for multiple comparisons). For the max rates, the con-con distribution had the smallest variance (0.14), followed by the con-pips (0.17), con-tones (0.35), con-ripples (0.36), con-WN (0.91), and con-WN-T (1.85) comparisons. The variances of the con-tones, con-ripples, con-WN, and con-WN-T distributions were all significantly different from the con-con distribution (P < 0.0001, ANOVA for equal variances adjusted for multiple comparisons). The larger variances of the con-synthetic distributions are due to the larger absolute value of the response difference between a conspecific song and a matched synthetic stimulus versus the response difference between two different songs. In other words, the matched synthetic stimuli are not equivalent to songs as far as the neural responses are concerned. 
Also, the variances in con-WN and con-WN-T were larger than the variances for the other con-synthetic cases, and this was discernible in both mean and max rates, thereby quantifying the unique pattern seen in the con-white noise distributions of Fig. 11. Fewer neurons showed a similar response to con and WN than showed a similar response to con and the matched synthetics: neurons with the strongest responses to white noise tended to respond poorly to song and vice versa.
The cumulative distribution plots are also useful in illustrating the size of the effect. It is particularly interesting to look at the percentage of neurons that show reverse selectivity. For example, in the con versus pips case, approximately 15% of the neurons had a greater mean response to synthetic sounds than to songs and 25% had a greater max response. The effect size for the ripple sounds versus conspecific song is particularly small: 20% of neurons had greater mean rates and 45% of neurons attained greater max rates in response to the ripple sounds. The effect size for white noise for mean rates also shows that 40% of the neurons had a greater sustained response to WN than to song. However, since this sustained response was poorly locked to stimulus features, only 20% of neurons had a greater maximum response to WN than to song when the onset response was excluded.
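The quantities read off the cumulative distribution plots can be sketched as follows: the variance of a d' distribution, its empirical cumulative distribution, and the fraction of units with d' below zero (reverse selectivity). The d' values are hypothetical and the function names are assumptions for illustration only.

```python
from statistics import variance


def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs for a cumulative distribution plot."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def reverse_selectivity_fraction(d_primes):
    """Fraction of units with d' < 0, i.e. a larger response to the synthetic sound."""
    return sum(1 for d in d_primes if d < 0) / len(d_primes)


# Hypothetical per-unit average d' values (not data from this study).
con_con = [-0.8, -0.5, -0.2, -0.1, 0.0, 0.1, 0.2, 0.4, 0.5, 0.7]
con_pips = [-1.2, -0.3, 0.4, 0.9, 1.5, 2.1, 2.8, 3.6, 4.0, 5.2]

for name, dist in (("con-con", con_con), ("con-pips", con_pips)):
    print(
        f"{name}: variance = {variance(dist):.2f}, "
        f"reverse selectivity = {reverse_selectivity_fraction(dist):.0%}"
    )
```

A con-synthetic curve that is shifted to the right and has a shallower slope than the con-con curve corresponds to a positive mean d' and a larger variance, respectively.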
Compared with the excitatory units, the responses of the inhibitory units were weaker in general (Fig. 7). Nonetheless, when significant, the inhibition was stronger for the conspecific song than for the matched synthetics (Fig. 10). The d' measure for the con versus pips comparison for all units showed stronger mean inhibitory responses to conspecific song than to pips, but the d' values were not significantly different from zero for tones or ripples. In the combined analysis, the max rate was also lower for con relative to ripples at the 1% level and lower for con relative to pips and tones at the 2% level. Although the effect was very small in the inhibitory units, it is consistent with the effect seen in the excitatory units. Both show stronger responses to conspecific song than to the matched synthetic sounds: stronger excitations are elicited for the excitatory units and stronger inhibitions are elicited for the inhibitory units. On the other hand, although WN and con had similar overall inhibitory responses, the sustained inhibitory response to WN was even greater than that to con.
Mapping the neural selectivity
Our data also allowed us to ask whether the selectivity varied
across sub-areas. In particular, we were interested in the possibility
of a selective hierarchical discrimination of sounds that would
correlate with the anatomically determined hierarchy (see Fig. 4A and Vates et al. 1996). To test this
hypothesis statistically, we tested for the effect of sub-area in the
mean d' for each stimulus type using one-way ANOVAs. This analysis was
restricted to the excitatory units because we did not have sufficient
numbers of inhibitory units within each area.
The analysis showed little evidence for hierarchical processing within these areas (Fig. 12). Statistically, there was no effect of sub-area for any con-synthetic d' values based on mean rates. The d' values based on max rates showed a significant sub-regional effect only for the con-WN comparison. The effect was significant for both multi-units and when the data from single-units and multi-units were combined. Overall, however, there was no clear pattern of any particular sub-region being more or less selective than the others. To complete this analysis, we also wanted to test the possibility that units became more selective to a smaller subset of stimuli even though, on average, one might find similar mean selectivity as one moves up in the processing stream. For example, higher order areas might have similar average selectivity for con versus tones as lower areas but with some neurons responding only to tones and others responding only to conspecifics. We therefore also tested for a differential in the selectivity by calculating the variance in z-scores for mean firing rates (including the onset responses) obtained across all con and synthetic stimuli for each recording site. The mapping of this measure of selectivity is shown in Fig. 5C. The difference in variance of z-scores across areas was also not statistically significant [F(5,124) = 1.045, P = 0.4], suggesting that the range of response strengths obtained from different stimuli is similar in the different sub-regions. In other words, we did not find an increase in selectivity in the neural responses as one moved up the auditory processing stream within the auditory forebrain.
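The second selectivity measure described above, the variance of z-scores across stimuli at each recording site followed by a one-way ANOVA across sub-areas, can be sketched as below. The z-scores and the grouping are hypothetical, only three of the six areas are shown for brevity, and the helper name is an assumption; the F statistic is the standard ratio of between-group to within-group mean squares.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """F statistic with (between, within) degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variability")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical z-scores per site, one value per stimulus type
# (con, pips, tones, ripples, WN), grouped by sub-area.
sites_by_area = {
    "L2a": [[3.1, 1.9, 2.2, 2.8, 2.5], [2.4, 1.2, 1.6, 2.0, 2.6]],
    "L3": [[2.0, 0.9, 1.1, 1.7, 0.4], [1.8, 1.0, 1.3, 1.5, 2.1]],
    "cHV": [[1.2, 0.3, 0.6, 0.9, 0.2], [1.0, 0.4, 0.5, 0.8, 1.1]],
}

# Per-site selectivity measure: variance of the z-scores across stimuli.
selectivity = {
    area: [variance(site) for site in sites] for area, sites in sites_by_area.items()
}
f_stat, df_b, df_w = one_way_anova_f(list(selectivity.values()))
print(f"F({df_b},{df_w}) = {f_stat:.3f}")
```

With six areas and 130 sites the same function would return the degrees of freedom (5, 124) reported in the text.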
Finally, in examining the maps in Fig. 5, we were able to consider the possibility of a functional mapping that did not coincide with the anatomical sub-divisions. Although our number of units was not sufficient to give us a statistically reliable measure of response for any particular small region of the forebrain, we were able to look for possible large or coarse functional sub-regions. A visual examination of the maps of Fig. 5 does not reveal any noticeable structure within any particular sagittal slice. On the other hand, we noticed that the medial slice seemed to exhibit greater selectivity. We tested this hypothesis by looking at differences in the mean d' across the medial, central, and lateral axes. Figure 13 shows the results of that analysis. For both mean and max rates, a one-way ANOVA showed a significant effect for the medial-lateral location for the con-pips distribution, with the medial section being the most selective. The other comparisons did not reach significance but the con-tones comparison showed a similar trend.
DISCUSSION
Auditory neurons throughout the auditory forebrain areas of field
L and cHV of adult zebra finches were found to prefer conspecific song
over a variety of synthetic stimuli designed to match some of the
acoustical features of zebra finch song. It had been previously shown
in both avian field L (Bonke et al. 1979; Langner et al. 1981; Leppelsack 1978; Leppelsack and Vogt 1976; Muller and Leppelsack 1985) and
in mammalian auditory cortex (Newman and Wollberg 1978; Rauschecker et al. 1995; Wang et al. 1995) that particular groups of neurons do not respond well to
simple sounds such as white noise or pure tones. These high-level
auditory neurons, however, did respond selectively to a small subset of
conspecific vocalizations or to synthetic sounds that closely
reproduced the spectro-temporal patterns of the preferred natural
sounds. It was thus suggested that such neurons could be used to
distinguish among species-specific calls. Since conspecific
vocalizations were particularly efficient at eliciting responses, these
studies also suggest the possibility that high-level auditory neurons
are selective for conspecific sounds (or, more generally, behaviorally
relevant sounds) relative to synthetic sounds (or behaviorally
nonrelevant sounds).
To test such a hypothesis, one needs to be specific about what is meant
by "selectivity for conspecific sounds" and how it is different
from the "selective responses" found previously. The selectivity
found in auditory neurons in previous studies describes the phenomenon
that as one moves up in the auditory processing stream, higher-level
neurons respond to a smaller number of stimuli chosen among a large set
of either synthetic or natural sounds. This increase in selectivity,
however, does not mean that the response of the higher-level neurons is
unpredictable or, equivalently, that it cannot be explained by a
neuronal transfer function. If the neuron's response is predictable,
then the explanation for the observed increase in selectivity is simply
that the particular set of acoustical features required to elicit a
neural response becomes more complex and that, given any stimulus
ensemble, such a set of complex acoustical features is found in a
smaller number of stimuli. Animal vocalizations, which are often
complex, could thus be potentially more efficient at eliciting
responses (Schafer et al. 1992). To show "selectivity
for conspecific sounds," one would therefore need to show that the
acoustical features characterizing an ensemble of transfer functions
describing a population of neurons are particularly common in the
natural sound. Ultimately, therefore, one would like to characterize the
actual transfer functions of auditory neurons and compare the
properties of the transfer functions to the acoustical properties found
in behaviorally relevant sounds. This approach has been very successful
in characterizing the selective responses of cortical neurons in the
echolocating bat (Suga et al. 1978).
We began this process in the avian forebrain by systematically
investigating the form of the transfer function and the corresponding level of description of sound that would be necessary to characterize the stimulus-response functions of avian auditory forebrain neurons. To
do so, we generated synthetic stimulus ensembles that matched different
levels of the acoustical structure found in natural sounds. Our
synthetic tone pips and combination tones had the same power spectra
and AM as song and were designed to test whether the response
properties of auditory forebrain neurons could be described by a first
order transfer function. If the response of the neurons can be
explained by their frequency tuning and their amplitude-modulation
tuning, then the distribution of neural responses to conspecific song,
tones, and pips ensembles would be identical. On the contrary, we
discovered that the distributions of neural responses were not similar
as illustrated by a much larger variance in the distribution of d' for
the con-synthetic comparisons relative to the con-con one (Fig. 11).
Therefore to predict the full response of the neurons, the transfer
function must take into account joint spectral-temporal sound patterns or exhibit nonlinear response properties. A linear neuron that exhibits
sensitivity for joint spectral-temporal sound patterns would be fully
characterized by its spectral-temporal receptive field (STRF), but this
STRF would not be separable into the product of a function of frequency
(its frequency tuning curve) and a function of time (the impulse
response of its AM transfer function). A nonlinear neuron could for
example have responses to a harmonic stack that could not be explained
by the sum of its responses to the individual frequency components of
the stack. Neither of these possibilities is surprising, as both
nonlinear frequency tuning, such as two-tone
suppression (Sachs and Kiang 1968), and sensitivity
to nonseparable spectral-temporal patterns, such as tuning to frequency
sweeps (Suga and Schlegel 1973), have been described
previously in the auditory system. More to the point, we also found
that the average mean and maximum responses to the conspecific sounds
were greater than those to the matched synthetic sounds (Fig. 9). We
can therefore conclude that the joint spectral-temporal tuning or the
nonlinear properties observed in auditory neurons in the songbird
forebrain are to some extent tuned to the particular higher order
structures of sound found in conspecific songs, thereby yielding
greater responses for these natural sounds.
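The distinction drawn above between a separable and a nonseparable STRF can be written out explicitly. The notation below is an assumption introduced only for illustration: $r(t)$ is the firing rate, $r_0$ a baseline rate, $s(f,t)$ a spectrographic representation of the sound, and $G$ and $H$ the frequency tuning curve and the impulse response of the AM transfer function.

```latex
% Linear model: the response is the stimulus spectrogram filtered by the STRF.
r(t) = r_0 + \sum_{f} \int \mathrm{STRF}(f,\tau)\, s(f, t-\tau)\, d\tau

% Separable case: the neuron is fully described by its frequency tuning
% and its amplitude-modulation tuning.
\mathrm{STRF}(f,\tau) = G(f)\, H(\tau)

% Nonseparable case: no pair (G, H) satisfies the factorization above,
% e.g. a neuron tuned to frequency sweeps, whose preferred frequency
% changes with latency \tau.
```

Under the separable model, ensembles matched in power spectrum and AM spectrum would be expected to drive the same distribution of responses, which is the prediction that the con-tones and con-pips comparisons contradict.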
The comparisons between responses to songs and those to pips and tones allow for a starting point in evaluating the contribution of known or putative nonlinearities that give rise to such selectivity. Since the intensity range of the pips ensemble was by necessity greater than the intensity of sound in narrow frequency bands for song (see METHODS), the preference for natural sounds over the pure tone pips could be explained by the (unlikely) possibility that a majority of neurons have nonmonotonic intensity response curves. This sole explanation cannot be true, since the random combination of pips with identical intensity values and AM spectra to sound in narrow frequency bands of song (our tones) also elicited significantly lower responses than conspecific song. The preference for song over tones therefore requires nonlinear spectral interactions or linear sensitivity to particular joint spectral-temporal patterns that would favor the particular spectral-temporal structure found in zebra finch song.
The responses to the ripple ensemble are informative in this respect.
The ripple ensemble was made of harmonic stacks designed to match the
harmonic stacks found in song. Of the three matched synthetic ensembles
(tones, pips, and ripples), the ripple ensemble had a modulation power
spectrum, describing the power in joint spectral-temporal acoustical
patterns, which is the closest to that of the song. Interestingly, of
the three ensembles, the difference in responses between song and
synthetic sounds was also the smallest for ripples, where the effect
size was particularly small. These small differences were obtained even
though the ripple sounds did not have the same power spectrum as songs.
In other words, to obtain matched neural responses between the
synthetic and the natural sounds, it seems more important to match the
modulation spectrum than the power spectrum. The small differences in
response also suggest that most of the selectivity for natural sounds
could potentially be explained by linear STRFs that are matched to the spectral-temporal structure of song. On the other hand, in previous work, we have shown that the linear STRFs are not able to completely describe the transfer function of avian neurons (Theunissen et al. 2000) and therefore some of the selectivity might still
arise from nonlinear processing. By directly calculating the STRFs
obtained from song and synthetic ensembles with matched power and
modulation spectra, these two effects could be untangled.
The selectivity for conspecific song relative to matched synthetic
songs of the neurons in field L was also found to be heterogeneous. Although on average the neurons responded more strongly to conspecific song, we found a significant fraction of neurons with the opposite selectivity (see RESULTS). This range of responses and
corresponding selectivity is remarkably different from the selectivity
for bird's own song found in auditory neurons in HVc, where the entire
population of neurons shows the same preference (see for example Fig. 4 in Theunissen and Doupe 1998). We also found that 40%
of neurons in field L and cHV had a sustained response to white noise
that was similar to or greater than that to song. The response to white noise was also qualitatively different in that it was not phase-locked to the sound, resulting in much smaller maximum firing rates once the
onset response was excluded. Although our particular analysis did not
reveal two distinct populations of neurons, our observations are
consistent with the observations that neurons in the mammalian inferior
colliculus and auditory cortex can be divided into two groups: neurons
that respond strongly and phase-lock to slower AM sounds (such as
animal vocalizations) and neurons that respond strongly in a sustained
fashion but do not phase-lock to rapid spectral-temporal modulations
such as those found in broadband white noise (Escabi and Schreiner 2002; Lu et al. 2001). The
heterogeneous responses that we found in the auditory forebrain of zebra
finches support the idea that, in contrast to the song system, which might be solely involved in the sensorimotor processing of the bird's own
song or related conspecific songs, the auditory forebrain is
responsible for processing a much larger ensemble of sounds. However,
among the sounds that cover the frequency and temporal characteristics
of zebra finch song, neurons are selective for the particular
spectral-temporal patterns found in conspecific song. Finally, it
should be noted that the neural responses in the avian auditory
forebrain have been shown to be affected by urethane. In particular,
the relative strength of the phasic versus sustained responses as well
as the selectivity to complex sounds change for auditory forebrain
areas that do not receive direct auditory input from the thalamus
(Capsius and Leppelsack 1996). Therefore, to relate the
selectivity of these high level auditory neurons found in this study to
perceptual behavior, differences between the awake and the anesthetized
preparation will need to be examined.
The selectivity in the avian forebrain for conspecific song over
similar sounds supports a theory of hierarchical auditory processing of
conspecific vocalizations where the selectivity for more specific
acoustical patterns found in conspecific song emerges as one ascends
the auditory pathway. At a coarse level, the selectivity in the avian
auditory forebrain would be placed between the avian auditory
periphery, which is not specialized for processing specific complex
sounds (Sachs et al. 1980), and the song system where
highly specialized auditory neurons respond preferentially to the sound
of the bird's own song (Doupe 1997; Margoliash 1986; Theunissen and Doupe 1998). It remains to
be seen whether the selectivity for conspecific song is first observed in the auditory forebrain or whether it is also present in some form at
earlier stages of processing, either in the thalamus or in the
midbrain. Neurons with complex response properties similar to those of
field L have been described in the avian auditory midbrain
(Scheich et al. 1977); however, it is not known whether auditory neurons in the auditory midbrain preferentially encode conspecific sounds or other natural sounds. It is also possible that
the song system gets auditory input directly from the midbrain or
thalamus without the additional processing steps in field L and cHV. In
particular, it is known that both Nif and HVc receive strong input
from nucleus uvaeformis (Uva) in the thalamus (Fortune and Margoliash 1995; Nottebohm et al. 1982), and HVc
and RA receive cholinergic input from the ventral paleostriatum
(VP) in the basal forebrain. VP in turn receives auditory information
directly from the auditory thalamic relay nucleus Ovoidalis (Li et al. 1999).
The interpretation of our results in support of hierarchical
processing within the anatomically determined subdivisions of field L
and cHV is more complicated. A feedforward hierarchical processing
theory compatible with the anatomy (Fig. 1A and Vates et al. 1996) would suggest that selectivity increases as one
moves up in the connectivity stream from sub-fields L2a and L2b to L1, L3, and further along to cHV. Previous physiological data have also
supported this view, finding that neurons in sub-fields L1 and L3 have
more complex response properties and are more selective for conspecific
sounds than those in L2 (Bonke et al. 1979; Langner et al. 1981;
Lewicki and Arthur 1996; Sen et al. 2001). On the other hand, our
analysis of selectivity based on the d' measure showed no significant
differences across the different subdivisions. Notably, we found that
the neurons in L2, though responding robustly to both synthetic and
natural stimuli, showed an even greater response to the natural sounds.
The magnitude of such an enhanced response is similar to the magnitude
found in more complex neurons in L1 or L3. Therefore, according to this
measure of selectivity strength, there does not seem to be hierarchical
processing from L2 to the higher auditory areas. Since L2 receives
extensive feedback from L1 and L3 (Vates et al. 1996),
it is possible that the selectivity for natural sounds is an emergent
property of the entire auditory forebrain network. Also, a postsynaptic
area, such as the song system, could take advantage of the fact that
highly selective auditory neurons can be found in each sub-area by
selectively connecting to those specific neurons. This hypothesis could
be verified with dual recordings.
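The d' comparison discussed above can be illustrated with a minimal sketch. It assumes the commonly used form d' = 2(mean_A - mean_B) / sqrt(var_A + var_B); the exact definition applied in this study is not restated in this section, so the formula, the function name `d_prime`, and the example firing rates below are all illustrative assumptions rather than the authors' code or data.

```python
import math
from statistics import mean, variance


def d_prime(resp_a, resp_b):
    """Discriminability between two sets of response strengths.

    Assumed form: d' = 2 * (mean_a - mean_b) / sqrt(var_a + var_b),
    using the sample variance of each response set.
    """
    mu_a, mu_b = mean(resp_a), mean(resp_b)
    var_a, var_b = variance(resp_a), variance(resp_b)
    denom = math.sqrt(var_a + var_b)
    if denom == 0:
        raise ValueError("d' is undefined when both response sets have zero variance")
    return 2.0 * (mu_a - mu_b) / denom


# Hypothetical response strengths (spikes/s above background) for one neuron.
conspecific_song = [10.0, 12.0, 14.0]
matched_synthetic = [4.0, 6.0, 8.0]
print(d_prime(conspecific_song, matched_synthetic))
```

Under this assumed definition, a positive value indicates a stronger response to the first stimulus class, and comparing the d' distributions across L1, L2, L3, and cHV corresponds to the subdivision analysis described in the paragraph above.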
We did, however, observe differences in response strength between field
L (L1/L2a/L2b/L3/L) and cHV. Although the strength of the selectivity
for natural sounds was similar for field L and cHV, cHV neurons showed
weaker auditory responses to both synthetic and natural stimuli. In
some cases, only the natural stimuli could elicit significant neural
responses. In other words, our results are not incompatible with
changes in the encoding of complex sounds, from a more distributed
encoding in field L to a sparser representation in cHV. The nature of
the nonlinearity would therefore also be different in the different
areas. Additionally, it is possible that the functional and anatomical
subdivisions within field L do not coincide exactly, as has been
suggested in a functional mapping study based on the responses to pure
tones (Gehr et al. 1999). We observed a significant
difference in selectivity between the most medial part of the auditory
forebrain and the more central and lateral areas. Since the
medial-lateral axis runs approximately perpendicular to the major
dorsal/caudal and ventral/rostral tonotopic axes, we obtain a coarse
two-dimensional functional map with frequency tuning along one axis and
selectivity (or response complexity) along the other. Such organization
is reminiscent of the functional map of the mammalian auditory cortex (Schreiner 1995).
Behaviorally, the selectivity for conspecific song relative to
matched synthetic sounds could play a role in song learning. If this
selectivity is innate, the enhanced responses to conspecific sounds
could explain the innate preference for conspecific song as models in
naïve young birds (reviewed in Marler 1997). On the other hand, there are reasons to believe that this selectivity could be a result of experience. Previous experiments have shown that
the properties of auditory neurons in the caudal neostriatum are shaped
by the song-learning process (Gehr et al. 2000; Konishi 1978).
Also, there is no reason to believe that
the extensive plasticity observed in the auditory cortex would be
unique to the mammalian auditory forebrain (reviewed in
Rauschecker 1999). For these reasons, it is likely that
environmental factors play an essential role in shaping auditory
processing in the avian auditory forebrain. Characterization of
auditory selectivity throughout the song learning process in normal
birds and in birds that have been deprived of normal auditory
experience will enable us to address these important questions.
| |
ACKNOWLEDGMENTS |
|---|
We thank S. Woolley and T. Fremouw for valuable comments on the manuscript. We are also grateful to K. Hillman and C. Fry for invaluable experimental and technical support.
This work was supported by research grants from the National Institute of Mental Health, the Searle Foundation, and the Sloan Foundation to F. E. Theunissen.
| |
FOOTNOTES |
|---|
Address for reprint requests: F. Theunissen, Univ. of California, Berkeley, Dept of Psychology, 3210 Tolman Hall, Berkeley, CA 94720-1650 (E-mail: fet{at}socrates.berkeley.edu).
| |
REFERENCES |
|---|