1Department of Psychological and Brain Sciences, Center for Cognitive Neuroscience, Dartmouth College, Hanover, New Hampshire; and 2Department of Psychology, Helen Wills Neuroscience Institute, University of California at Berkeley, Berkeley, California
Submitted 25 July 2006; accepted in final form 26 November 2006
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
For example, auditory communication signals (i.e., species-specific vocalizations) are especially important in the socioecology of several species of nonhuman primates (Cheney and Seyfarth 1985; Eimas 1994; Eimas et al. 1971; Hauser 1997; Jusczyk 1997; Jusczyk et al. 1983; Miller and Eimas 1995), such as rhesus monkeys (Macaca mulatta). Vocalizations convey information about the identity and the age of the caller and often provide information about sex and emotional or motivational state (Cheney and Seyfarth 1990; Hauser 1997). Some vocalizations transmit information about objects and events in the environment (Gifford 3rd et al. 2003; Hauser 1998; Seyfarth and Cheney 2003).
In rhesus monkeys, the ventrolateral prefrontal cortex (vPFC) plays an important role in processing vocalizations (Hackett et al. 1999; Romanski and Goldman-Rakic 2002; Romanski et al. 1999, 2005). The vPFC is thought to be part of a circuit involved in representing auditory objects (Cohen et al. 2004b; Rauschecker 1998; Romanski et al. 1999, 2005). In particular, the vPFC may be part of a circuit that processes socially meaningful signals (Cohen et al. 2006; Deacon 1992; Gifford 3rd et al. 2005).
A fuller understanding of vocalization processing in the vPFC requires that we understand the acoustic features of rhesus vocalizations and how these features relate to neural activity. We first characterized the acoustic structure of rhesus vocalizations by calculating their modulation spectra; the modulation spectrum quantifies the spectral and temporal features of an auditory stimulus as seen in a spectrographic representation. The structure of these spectra was similar to that found in other ensembles of natural stimuli. Next, we tested whether the tuning of vPFC neurons is designed to maximize the acoustic differences that exist between vocalizations; this type of tuning is hypothesized to facilitate an animal's capacity to discriminate between different vocalizations (Woolley et al. 2005). Finally, using vocalizations, we estimated the spectrotemporal receptive field (STRF) of vPFC neurons to test whether the responses of vPFC neurons are modulated preferentially by the first-order (linear) acoustic features of an auditory stimulus. The results of these two neurophysiological studies suggest that vPFC neurons are not modulated preferentially by these features.
| METHODS |
|---|
Acoustic properties of rhesus vocalizations: modulation spectrum
The spectrotemporal modulations that are present in complex sounds, such as vocalizations, can be characterized by generating a modulation power spectrum (or modulation spectrum) (Singh and Theunissen 2003; Theunissen et al. 2004). Analogous to decomposing an acoustic waveform into a series of sine waves, a (log) spectrographic representation of an auditory stimulus can be decomposed into a series of sinusoidal gratings that characterizes the temporal modulations (in Hertz, Hz) and the spectral modulations (in cycles per Hz or octave) of the stimulus. The two-dimensional plot that illustrates the squared amplitude of the temporal- and spectral-modulation rates of a sound is the modulation spectrum. Modulation spectra can be calculated for a single stimulus or can be averaged with other stimuli to characterize the statistics of a particular class of stimuli.
In this study, we calculated the modulation spectra of rhesus vocalizations. The vocalizations were recorded and digitized as part of an earlier set of studies (Hauser 1998). Each vocalization was assigned to one of 10 major classes. These classes are defined based on both their acoustic similarities and their behavioral significance. Our data set contains exemplars from each of these classes: 57 aggressives, 23 coos, 32 copulation screams, 24 gekkers, 42 grunts, 25 girneys, 19 harmonic arches, 46 screams, 20 shrill barks, and 4 warbles. We did not attempt to cluster independently the sound exemplars using their acoustical signature, as defined by their modulation spectrum, because we wanted to preserve the behavioral information, although our acoustical analyses could in theory also be used for such a classification task.
The first step in the estimation of the modulation spectrum is to calculate the spectrographic representation for each vocalization exemplar. The spectrographic representation was obtained with a filter bank of Gaussian-shaped filters whose gain function had a bandwidth of 32 Hz (measured as a SD). We used 299 filters that had center frequencies ranging from 32 Hz to 10 kHz. The corresponding Gaussian-shaped windows in the time domain had a temporal bandwidth of 5 ms. These parameters defined the time-frequency scale of the spectrogram and the upper limits of the spectral- and temporal-modulation frequencies that could be characterized by the spectrogram: 16.25 cycles/kHz and 100.5 Hz, respectively (Singh and Theunissen 2003). These filter parameters resulted in very little energy at the edge of the modulation spectrum as determined by the time-frequency scale of the spectrogram. This observation suggests that we were able to capture most of both the temporal and spectral fluctuations in the sounds with a single time-frequency scale.
These time-frequency scales differ somewhat from our previous studies (Singh and Theunissen 2003). In our initial characterization of the statistics of natural sounds, we used time-frequency scales corresponding to wider frequency filters (62, 125, and 250 Hz). These scales are appropriate for characterizing sounds with fast temporal modulations, such as zebra finch song and other environmental sounds. However, these scales do not allow for a characterization of fine spectral modulations. In the current study, we used a narrower filter (32 Hz) because a finer spectral modulation resolution was needed to discriminate between the characteristic spectrotemporal features of rhesus vocalizations and also to compare them with those found in human speech.
To calculate the modulation spectrum of a vocalization, we calculated the two-dimensional Fourier transform of each vocalization's log spectrogram. If a vocalization was <1 s, it was zero padded until its length was 1 s. If a vocalization was >1 s, the two-dimensional Fourier transform was calculated for nonoverlapping 1-s segments. Each 1-s segment was windowed with a Hamming window (Oppenheim et al. 1983). The modulation spectrum was calculated by averaging the power (amplitude squared) of the two-dimensional Fourier transform. We report the class-based spectra, which were calculated by averaging the individual modulation spectra from each exemplar within each acoustic class. We also calculated the "composite" modulation spectrum, which is the average of the 10 class-based spectra, and the coefficient of variance between the class-based spectra.
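The segment-and-average procedure described above can be sketched as follows. This is an illustrative Python sketch, not the original MATLAB code: it assumes the log spectrogram is already available as a list of rows (frequency bands by time samples), uses a naive two-dimensional DFT for clarity rather than speed, applies the Hamming window along the time axis only, and zero pads the final partial segment of a long sound. Those last two choices, the function names, and the `seg_len` parameter (the number of time samples in 1 s) are assumptions that the text does not specify.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def segments(log_spec, seg_len):
    """Split a (bands x time) log spectrogram into nonoverlapping segments.
    Short or trailing segments are zero padded to seg_len (assumption)."""
    n_time = len(log_spec[0])
    out = []
    for start in range(0, n_time, seg_len):
        seg = [row[start:start + seg_len] for row in log_spec]
        out.append([row + [0.0] * (seg_len - len(row)) for row in seg])
    return out

def dft2_power(seg):
    """Power (squared amplitude) of the naive 2-D DFT of one segment."""
    n_f, n_t = len(seg), len(seg[0])
    power = [[0.0] * n_t for _ in range(n_f)]
    for u in range(n_f):          # spectral-modulation index
        for v in range(n_t):      # temporal-modulation index
            acc = 0j
            for f in range(n_f):
                for t in range(n_t):
                    phase = -2j * math.pi * (u * f / n_f + v * t / n_t)
                    acc += seg[f][t] * cmath.exp(phase)
            power[u][v] = abs(acc) ** 2
    return power

def modulation_spectrum(log_spec, seg_len):
    """Average the 2-D DFT power over Hamming-windowed segments."""
    win = hamming(seg_len)
    segs = segments(log_spec, seg_len)
    n_f = len(log_spec)
    total = [[0.0] * seg_len for _ in range(n_f)]
    for seg in segs:
        windowed = [[x * w for x, w in zip(row, win)] for row in seg]
        p = dft2_power(windowed)
        for u in range(n_f):
            for v in range(seg_len):
                total[u][v] += p[u][v] / len(segs)
    return total
```

A class-based spectrum would then be the element-wise average of `modulation_spectrum` outputs over the exemplars of one class. In practice a fast Fourier transform would replace the quadruple loop.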
For comparison, we calculated the modulation spectrum for zebra finch song, human speech, and environmental noise. Zebra finch song was recorded for adult zebra finch males in the laboratory of Dr. F. E. Theunissen. The human-speech exemplars were obtained from a database of 100 short English sentences spoken by native male and female American English speakers (Tyler and Preece 1990). Environmental sounds were natural sounds that were not produced by animal vocalizations but by weather, water, or fire; these sounds were obtained from commercial audio CD recordings. The same sounds were previously analyzed in more detail (Singh and Theunissen 2003). Similarities between modulation spectra were estimated using the correlation coefficient (r). Classical multidimensional scaling that used 1 - r as a distance metric was used to visualize the pairwise distances between the modulation spectra. All calculations were done in the MATLAB (The MathWorks, Natick, MA) programming environment.
Neural recordings and analysis
SUBJECTS. Two female rhesus monkeys (Macaca mulatta) were used in these experiments. Both monkeys were trained on the task described in this study. They weighed between 8.0 and 9.0 kg. All surgical, recording, and training sessions were in accordance with the National Institutes of Health's Guide for the Care and Use of Laboratory Animals and were approved by the Dartmouth Institutional Animal Care and Use Committee. Neither monkey had been operantly trained to make behavioral responses to auditory stimuli.
SURGICAL PROCEDURES. Surgical procedures were conducted under aseptic, sterile conditions, using general anesthesia (isoflurane). These procedures were performed in a dedicated surgical suite operated by the Animal Resource Center at Dartmouth College.
In the first procedure, titanium bone screws were implanted in the skull and a methylmethacrylate implant was constructed. A Teflon-insulated, 50-gauge stainless steel wire coil was also implanted between the conjunctiva and the sclera; the wire coil allowed us to monitor the monkey's eye position (Judge et al. 1980). Finally, a head-positioning cylinder (FHC-S2; Crist Instruments, Hagerstown, MD) was embedded in the implant. This cylinder connected to a primate chair and stabilized the monkey's head during behavioral-training and recording sessions.
After the monkeys learned the passive-listening task (see following text), a craniotomy was performed and a recording cylinder (ICO-J20, Crist Instruments) was implanted. This surgical procedure provided chronic access to the vPFC for neurophysiological recordings.
EXPERIMENTAL SETUP. Behavioral training and recording sessions were conducted in a room with sound-attenuating walls. The walls of the room were covered with anechoic foam insulation (Sonomatt; Auralex, Indianapolis, IN). While inside the room, the monkeys were seated in the primate chair and placed in front of a stimulus array; because the room was darkened, the speaker producing the auditory stimuli was not visible to the monkeys. The primate chair was placed in the center of a 1.2-m-diameter, two-dimensional, magnetic coil (C-N-C Engineering, Seattle, WA) that was part of the eye-position monitoring system (Judge et al. 1980). Eye position was sampled with an A/D converter (PXI-6052E; National Instruments, Austin, TX) at a rate of 1.0 kHz. The monkeys were monitored during all sessions with an infrared camera.
STIMULUS ARRAY. Auditory stimuli were presented from a speaker (PLX32; Pyle, Brooklyn, NY) that was 1.2 m in front of the monkey; the speaker was 1.2 m above the floor, which was at the monkeys' eye level. A red light-emitting diode (LED; model 276307; Radio Shack, Fort Worth, TX) was also mounted on the face of this speaker. This "central" LED served as a fixation point for the monkeys during the passive-listening task (see following text). The central LED subtended <0.2° of visual angle and had a luminance of 12.6 cd/m².
BEHAVIORAL (PASSIVE-LISTENING) TASK. During the passive-listening task, an auditory stimulus was presented from the speaker 1,000-1,500 ms after the monkey fixated the central LED. To minimize any potential changes in neural activity arising from changes in eye position (Cohen and Andersen 1998; Groh et al. 2001; Mullette-Gillman et al. 2005), the monkeys maintained their gaze at the central LED during auditory-stimulus presentation and for an additional 1,000-1,500 ms after auditory-stimulus offset to receive a juice reward.
AUDITORY STIMULI. The auditory exemplars from the Noise and the STRF procedures were recorded to disk. Each exemplar was filtered to compensate for the transfer-function properties of the speaker and the acoustics of the room. Each filtered exemplar was passed through a D/A converter (DA1; Tucker-Davis Technologies, Alachua, FL) and an amplifier (SA1, Tucker-Davis Technologies; MPA-250, Radio Shack) and was transduced by the speaker. Each exemplar was presented at an average sound level of 65 dB SPL (sound pressure level, relative to 20 µPa).
Noise procedures. The band-limited noise was designed to cover the range of spectrotemporal modulations found in the rhesus vocalizations (see RESULTS). To cover this range, we generated 16 classes of noise with different band-limited spectrotemporal modulations (see Fig. 1). Along the temporal-modulation axis, the band-limited noise covered the range from 0 to 20 Hz in 5-Hz steps. Along the spectral-modulation axis, the noise covered the range from 0 to 4 cycles/kHz in steps of 1 cycle/kHz. Note that all of the noise stimuli contained both positive and negative modulations corresponding to a mixture of upward and downward sweeps (Singh and Theunissen 2003). The band-pass filters were applied simultaneously in both the temporal and spectral dimensions. For example, a class of noise might consist of sounds with temporal modulations between 10 and 15 Hz (as well as between -10 and -15 Hz) and spectral modulations between 2 and 3 cycles/kHz (see Fig. 1G). Within each class of noise, we generated 10 exemplars. The duration of each exemplar was 500 ms, which approximated the mean duration of our digitized collection of rhesus vocalizations. Each of the 10 exemplars from each of the 16 noise classes was generated before the recording sessions and recorded to disk.
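The 16 noise classes follow from crossing four temporal-modulation bands with four spectral-modulation bands. The short Python sketch below only enumerates the band edges stated in the text; it does not synthesize the noise, and the function name and tuple layout are hypothetical.

```python
def noise_classes(t_max=20, t_step=5, s_max=4, s_step=1):
    """Enumerate (temporal band in Hz, spectral band in cycles/kHz) pairs.

    Each temporal band (lo, hi) is meant to stand for both +lo..+hi and
    -lo..-hi Hz, because every class mixes upward and downward sweeps.
    """
    classes = []
    for t_lo in range(0, t_max, t_step):
        for s_lo in range(0, s_max, s_step):
            classes.append(((t_lo, t_lo + t_step), (s_lo, s_lo + s_step)))
    return classes

classes = noise_classes()
# The example class from the text: 10-15 Hz and 2-3 cycles/kHz.
example = ((10, 15), (2, 3))
```

With 10 exemplars per class, the full stimulus set under this layout would contain 160 noise exemplars.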
The vPFC was identified by its anatomical location and its neurophysiological properties. The vPFC is located anterior to the arcuate sulcus and area 8a and lies below the principal sulcus (Cohen et al. 2004b; Romanski and Goldman-Rakic 2002). vPFC neurons were further characterized by their strong responses to auditory stimuli (Cohen et al. 2004b; Gifford 3rd et al. 2005; Newman and Lindsley 1976; Romanski and Goldman-Rakic 2002).
RECORDING STRATEGY. An electrode was lowered into the left vPFC; the left hemisphere of rhesus monkeys is thought to be specialized for processing vocalizations (Hauser and Andersson 1994; Poremba et al. 2004). To minimize sampling bias, any neuron that was isolated was tested.
Noise procedures. Once a neuron was isolated, we randomly chose three classes of the band-limited noise; in some of the earlier recording sessions, we used only two classes of noise. Next, the monkeys participated in blocks of trials of the passive-listening task. In each block, the noise exemplars were presented in a balanced pseudorandom order. The intertrial interval was 12 s. Neural data were collected from 10 presentations of each of the 10 exemplars in each class.
STRF procedures. Once a neuron was isolated, the monkeys participated in blocks of trials of the passive-listening task. In each block, the vocalization exemplars were presented in a balanced pseudorandom order. Because we needed to collect neural data from 40 s (Sen et al. 2001; Theunissen et al. 2000) of auditory-stimulus presentation, data were collected from 200 successful trials at an intertrial interval of 12 s.
NEURAL DATA ANALYSIS. Neural activity recorded during the passive-listening task was tested during the "baseline" and "stimulus" periods. The baseline period began 50 ms after the monkey fixated the central LED and ended 50 ms before auditory-stimulus onset; neural activity was aligned relative to the time when the monkey fixated the central LED. The stimulus period began at auditory-stimulus onset and ended at its offset; neural activity was aligned relative to the onset of the auditory stimulus. Data were analyzed in terms of a vPFC neuron's firing rate (i.e., the number of action potentials divided by task-period duration).
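As a minimal sketch of the firing-rate definition above (number of action potentials divided by task-period duration), the Python below computes rates for the baseline and stimulus periods. It assumes spike times and event times are given in seconds on a common clock; the function names are hypothetical and not part of the original analysis code.

```python
def firing_rate(spike_times, start, end):
    """Spikes per second within the half-open window [start, end)."""
    if end <= start:
        raise ValueError("window must have positive duration")
    n_spikes = sum(1 for t in spike_times if start <= t < end)
    return n_spikes / (end - start)

def baseline_window(fixation_time, stim_onset):
    """Baseline period: from 50 ms after fixation to 50 ms before stimulus onset."""
    return fixation_time + 0.050, stim_onset - 0.050

def stimulus_window(stim_onset, stim_offset):
    """Stimulus period: from auditory-stimulus onset to its offset."""
    return stim_onset, stim_offset
```

For a single trial, `firing_rate(spikes, *baseline_window(fix, onset))` and `firing_rate(spikes, *stimulus_window(onset, offset))` give the paired values that the subsequent comparisons operate on.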
A two-tailed t-test determined whether a vPFC neuron's response was modulated by the band-limited noise or the vocalizations. A neuron was classified as "auditory" if the results of the t-test indicated that the neuron's mean baseline- and stimulus-period firing rates were reliably different at a level of P < 0.05.
Noise procedures. The selectivity of a vPFC neuron to the spectrotemporal modulations of each noise exemplar was quantified with three analyses: a z-score (response-strength) analysis, an information analysis, and a response-selectivity analysis.
The z-score (Grace et al. 2003) was calculated on a neuron-by-neuron basis and class-by-class basis. This measure quantifies the normalized difference between a neuron's stimulus-period firing rate and the baseline-period firing rate. The z-score was defined as

$$z = \frac{\mu_s - \mu_b}{\sqrt{\sigma_s^2 + \sigma_b^2 - 2\,\mathrm{Cov}(s, b)}} \tag{1}$$

where $\mu_s$ and $\mu_b$ are the mean stimulus- and baseline-period firing rates, $\sigma_s^2$ is the variance of the response during the stimulus period, and $\sigma_b^2$ is the variance of the response during the baseline period. $\mathrm{Cov}(s, b)$ is the covariance between the stimulus- and baseline-period firing rates.

The amount of information was calculated on a neuron-by-neuron basis and a class-by-class basis. "Band-limited noise information" was the amount of information carried in the stimulus-period firing rates of vPFC neurons regarding differences between the noise exemplars.
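The z-score (response-strength) measure described above can be sketched in Python as follows. The sketch assumes paired per-trial firing rates for the stimulus and baseline periods and takes the normalizing term to be the square root of the stimulus variance plus the baseline variance minus twice their covariance, following the variance and covariance terms named in the text. The function name and the use of sample (n - 1) estimates are assumptions.

```python
import math

def z_score(stim_rates, base_rates):
    """Normalized difference between stimulus- and baseline-period rates.

    stim_rates[k] and base_rates[k] are assumed to come from the same trial.
    """
    n = len(stim_rates)
    if n != len(base_rates) or n < 2:
        raise ValueError("need at least two paired trials")
    mu_s = sum(stim_rates) / n
    mu_b = sum(base_rates) / n
    var_s = sum((s - mu_s) ** 2 for s in stim_rates) / (n - 1)
    var_b = sum((b - mu_b) ** 2 for b in base_rates) / (n - 1)
    cov = sum((s - mu_s) * (b - mu_b)
              for s, b in zip(stim_rates, base_rates)) / (n - 1)
    denom = math.sqrt(var_s + var_b - 2.0 * cov)
    if denom == 0.0:
        raise ValueError("zero variance in paired differences")
    return (mu_s - mu_b) / denom
```

Because the denominator equals the standard deviation of the trial-by-trial differences, this quantity is the mean paired difference expressed in units of its own variability.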
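We quantified the amount of information (Cover and Thomas 1991; Shannon 1948a,b) carried in the firing rate of vPFC neurons using a formulation analogous to the one we described previously (Cohen et al. 2004b; Gifford 3rd and Cohen 2004). Information (I) was defined as

$$I = \sum_{s,r} P(s, r)\,\log_2\!\left[\frac{P(s, r)}{P(s)\,P(r)}\right] \tag{2}$$

where $s$ indexes the stimulus, $r$ indexes the neural response, $P(s, r)$ is their joint probability, and $P(s)$ and $P(r)$ are the marginal probabilities.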
To eliminate biases in the amount of information arising from small sample sizes, information rates were bias-corrected (Cohen et al. 2004b; Gifford 3rd and Cohen 2004; Grunewald et al. 1999; Panzeri and Treves 1996). On a neuron-by-neuron basis, we first calculated the amount of information from the original data and from bootstrapped trials. In bootstrapped trials, the relationship between a neuron's firing rate and the noise exemplar was randomized and then the amount of information was calculated. This process was repeated 500 times and the median value from this distribution of values was determined. The bias-corrected information was calculated by subtracting the median amount of information obtained from bootstrapped trials from the value obtained from the original data.
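The bias-correction procedure above can be sketched as follows. This Python sketch assumes the standard Shannon mutual information between stimulus identity and a discretized response (for example, a spike count per trial); how the original analysis binned firing rates is not stated here, so the discretization, the function names, and the fixed random seed are all assumptions.

```python
import math
import random
from collections import Counter
from statistics import median

def mutual_information(stimuli, responses):
    """Mutual information (bits) between stimulus labels and discrete responses."""
    n = len(stimuli)
    count_s = Counter(stimuli)
    count_r = Counter(responses)
    count_sr = Counter(zip(stimuli, responses))
    info = 0.0
    for (s, r), c in count_sr.items():
        p_joint = c / n
        info += p_joint * math.log2(p_joint / ((count_s[s] / n) * (count_r[r] / n)))
    return info

def bias_corrected_information(stimuli, responses, n_shuffles=500, seed=0):
    """Raw information minus the median information from shuffled trials."""
    rng = random.Random(seed)
    raw = mutual_information(stimuli, responses)
    shuffled = list(responses)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # randomize the stimulus-response relationship
        null.append(mutual_information(stimuli, shuffled))
    return raw - median(null)
```

With 10 equally likely exemplars, the largest value `mutual_information` can return is log2(10), about 3.3 bits.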
The response-selectivity index (Grunewald and Skoumbourdis 2004; Rolls and Tovee 1995; Vinje and Gallant 2002) was calculated on a neuron-by-neuron basis and a class-by-class basis. The response-selectivity index (RI) was defined as
![]() | (3) |
STRF procedures. The STRF of a neuron was generated using STRFPAK (http://strfpak.berkeley.edu), a MATLAB toolbox developed by Theunissen and colleagues (Grace et al. 2003; Sen et al. 2001; Theunissen and Doupe 1998; Theunissen et al. 2000, 2001). A STRF was constructed from a neuron's responses to vocalizations.
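We defined the STRF as the first-order Volterra kernel that relates the spectrographic representation of an auditory-stimulus exemplar to a peristimulus time histogram (PSTH) (DePireux et al. 2001; Eggermont et al. 1983; Escabí and Schreiner 2002; Theunissen et al. 2000). A neuron's STRF is $h_i(t)$ such that

$$r_{pre}(t) = \sum_{i=1}^{n_f} \int h_i(\tau)\, s_i(t - \tau)\, d\tau + r_0(t) \tag{4}$$

where $r_{pre}(t)$ is the predicted firing rate, $s_i(t)$ is the spectrographic representation of the sound in frequency band $i$, and $n_f$ is the number of frequency bands.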
$r_0(t)$, the time-varying mean firing rate averaged across all stimuli, was obtained by smoothing, with a 40-ms Hamming window, the average PSTH to all stimuli except the sound being analyzed. $r_0(t)$ is thus the expected mean response to all sounds irrespective of the actual spectrotemporal content. The STRF model then captures the deviations in the response that result from the specific spectrotemporal features of the sound being processed. We called this model the "time-varying mean" STRF. We also estimated the STRF with a fixed mean rate, with $r_0(t)$ replaced by its mean value over time. We called this more classical model the "fixed-mean" STRF.

The estimated and predicted neural responses and the spectrographic representation of the sound $s_i(t)$ were all sampled at 1 kHz. This sampling rate was more than sufficient to capture the observed spiking precision as well as the temporal variations in the spectrographic representation, which are band-limited by the bandwidth of the filter or 62.5 Hz (Flanagan 1980).
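The linear STRF model can be illustrated with a discrete-time sketch: the predicted rate is the sum, over frequency bands, of the STRF convolved with the spectrogram, plus the mean-rate term. The Python below assumes causal delays (the response at time t depends only on the stimulus at t and earlier) and one sample per millisecond; the function names and array layout are hypothetical.

```python
def predict_rate(strf, spectrogram, r0):
    """Predicted firing rate from a discrete STRF.

    strf[i][k]        gain of frequency band i at delay k samples
    spectrogram[i][t] spectrographic value of band i at time sample t
    r0[t]             mean-rate term (time-varying, or constant for fixed-mean)
    """
    pred = []
    for t in range(len(r0)):
        acc = r0[t]
        for h_i, s_i in zip(strf, spectrogram):
            for k, gain in enumerate(h_i):
                if t - k >= 0:  # causal: no samples before stimulus start
                    acc += gain * s_i[t - k]
        pred.append(acc)
    return pred

def fixed_mean(r0):
    """Replace a time-varying mean rate by its mean value over time."""
    m = sum(r0) / len(r0)
    return [m] * len(r0)
```

Calling `predict_rate(strf, spec, r0)` corresponds to the "time-varying mean" model, and `predict_rate(strf, spec, fixed_mean(r0))` to the "fixed-mean" model. Rectification of the output is a separate step.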
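The STRF [$h_i(t)$] is estimated by minimizing the mean-square difference between the predicted [$r_{pre}(t)$] and the actual [$r_{act}(t)$] firing rates. The solution for $h_i(t)$ is given by a set of linear equations for each temporal frequency $\omega$ in the Fourier domain of $t$. In vector notation, this set of equations is $A_\omega \cdot \vec{H}_\omega = \vec{C}_\omega$. The autocorrelation matrix of the auditory stimulus ($A_\omega$) in the frequency domain is defined as

$$A_\omega = \begin{bmatrix} \langle S_1^*(\omega)\, S_1(\omega) \rangle & \cdots & \langle S_1^*(\omega)\, S_{n_f}(\omega) \rangle \\ \vdots & \ddots & \vdots \\ \langle S_{n_f}^*(\omega)\, S_1(\omega) \rangle & \cdots & \langle S_{n_f}^*(\omega)\, S_{n_f}(\omega) \rangle \end{bmatrix} \tag{5}$$

where $S_i(\omega)$ is the Fourier transform of $s_i(t)$ for frequency bands $i = 1, \ldots, n_f$, "*" denotes the complex conjugate, and $\langle \cdot \rangle$ denotes the cross-moments calculated by averaging over the samples. $\vec{H}_\omega$ is the Fourier transform of the set of $h_i(t)$

$$\vec{H}_\omega = \left[ H_1(\omega), \ldots, H_{n_f}(\omega) \right]^T \tag{6}$$

$\vec{C}_\omega$ is the Fourier transform of the cross-correlation between the stimulus envelope and the spike train

$$\vec{C}_\omega = \left[ \langle S_1^*(\omega)\, R(\omega) \rangle, \ldots, \langle S_{n_f}^*(\omega)\, R(\omega) \rangle \right]^T \tag{7}$$

where $R(\omega)$ is the Fourier transform of $r_{act}(t) - r_0(t)$.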
A neuron's STRF was calculated by first inverting $A_\omega$ at each temporal frequency $\omega$: $\vec{H}_\omega = A_\omega^{-1} \cdot \vec{C}_\omega$. This inversion was performed using the pseudoinverse methodology. That is, the cross-correlation vector is projected into the subspace spanned by the eigenvectors of the stimulus autocorrelation with the largest eigenvalues. In this fashion, the inversion performs both a normalization step (normalizing for nonuniform power distribution found in the vocalizations) and a regularization step (limiting the estimation of STRF parameters to the spectrotemporal modulations that have significant power). The actual extent of the stimulus subspace with significant power is determined by cross-validation: STRFs are estimated using different numbers of eigenvalues and the STRF that yields the best prediction on a validation trial is chosen (Theunissen et al. 2001; Woolley et al. 2006).
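Goodness-of-fit estimates were determined in two different ways: 1) coherence and 2) cross-correlation. The first goodness-of-fit estimate, I, was determined by integrating the coherence function ($\gamma^2$) over all $\omega$. Coherence was defined as

$$\gamma^2(\omega) = \frac{\left| \langle R_{act}(\omega)\, R_{pre}^*(\omega) \rangle \right|^2}{\langle R_{act}(\omega)\, R_{act}^*(\omega) \rangle \, \langle R_{pre}(\omega)\, R_{pre}^*(\omega) \rangle} \tag{8}$$

where $R_{act}(\omega)$ is the Fourier transform of $r_{act}(t)$ and $R_{pre}(\omega)$ is the Fourier transform of $r_{pre}(t)$. The closer that $\gamma^2$ is to a value of 1, the more linearly related are $R_{act}$ and $R_{pre}$. The second goodness-of-fit estimate was calculated by the cross-correlation coefficient (CC) between the predicted and actual firing rates

$$CC = \frac{\langle (r_{pre}(t) - \bar{r}_{pre})\,(r_{act}(t) - \bar{r}_{act}) \rangle}{\sqrt{\langle (r_{pre}(t) - \bar{r}_{pre})^2 \rangle \, \langle (r_{act}(t) - \bar{r}_{act})^2 \rangle}} \tag{9}$$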
For both goodness-of-fit estimates, rpre was calculated by convolving the STRF with an auditory-stimulus exemplar, adding r0(t), and rectifying the result. ract was generated by smoothing the PSTH with a window of variable sizes. To compare data across the population of neurons, we report the cross-correlation coefficient from a constant window size (31 ms).
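The cross-correlation goodness-of-fit estimate can be sketched as below: rectify the predicted rate, smooth the PSTH, and compute the correlation coefficient between the two. The text does not name the smoothing window used for the actual rate, so the Hamming window here is an assumption, as are the function names and the zero padding at the edges. At the 1-kHz sampling rate, a 31-ms window corresponds to `width=31` samples.

```python
import math

def rectify(rate):
    """Half-wave rectification: negative predicted rates are set to zero."""
    return [max(0.0, r) for r in rate]

def hamming_smooth(psth, width):
    """Smooth a PSTH with a unit-area Hamming window of odd width (samples)."""
    win = [0.54 - 0.46 * math.cos(2 * math.pi * k / (width - 1))
           for k in range(width)]
    total = sum(win)
    win = [w / total for w in win]
    half = width // 2
    out = []
    for t in range(len(psth)):
        acc = 0.0
        for k, w in enumerate(win):
            j = t + k - half
            if 0 <= j < len(psth):  # samples outside the PSTH count as zero
                acc += w * psth[j]
        out.append(acc)
    return out

def cross_correlation_coefficient(pred, act):
    """Correlation coefficient between predicted and actual firing rates."""
    n = len(pred)
    mp, ma = sum(pred) / n, sum(act) / n
    num = sum((p - mp) * (a - ma) for p, a in zip(pred, act))
    den = math.sqrt(sum((p - mp) ** 2 for p in pred) *
                    sum((a - ma) ** 2 for a in act))
    return num / den
```

A typical call would be `cross_correlation_coefficient(rectify(pred), hamming_smooth(psth, 31))`, evaluated on a stimulus that was not used to fit the STRF.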
VERIFICATION OF RECORDING LOCATIONS. Magnetic resonance images were used to visualize the recording microelectrode in each monkey's brain. These images were obtained at the Dartmouth Brain Imaging Center using a GE 1.5T scanner (3-D T1-weighted gradient echo pulse sequence with a 5-in. receive-only surface coil) (Cohen et al. 2004a,b). These images confirmed that our recordings were from the vPFC.
| RESULTS |
|---|
The modulation spectra of rhesus vocalizations were calculated from a corpus of recordings that identified 10 classes of vocalizations (Hauser 1998). We calculated the modulation spectrum for each exemplar within a vocalization class. The average modulation spectrum for each of the 10 classes of rhesus vocalizations is shown in Fig. 3. As can be seen, these "class-based" spectra had most of their energy at low spectral and temporal frequencies. A similar pattern was observed when we calculated the composite modulation spectrum (Fig. 3K). The composite modulation spectrum was calculated by averaging together the 10 class-based spectra (Fig. 3, A-J). In general, the composite modulation spectrum had high power at low modulation frequencies that rapidly decreased at higher frequencies. This pattern is characteristic of all natural sounds (Singh and Theunissen 2003). In addition, most of the power for the medium to higher spectral modulation frequencies was found only at the very lowest temporal modulations, a characteristic feature of animal vocalizations (Singh and Theunissen 2003). This property reflects the fact that sounds in vocalizations that have the most spectral structure (i.e., harmonic sounds with clear pitch percept such as vowels in human speech) tend to be slower.
To examine more systematically which spectrotemporal modulations varied across different rhesus-vocalization classes, we calculated the coefficient of variance between the class-based spectra (Fig. 3L). At low spectral- and temporal-modulation frequencies, the coefficient of variance was low. However, at spectral-modulation frequencies between 2 and 5 cycles/kHz, the amount of variance was high. Similarly, at high temporal-modulation frequencies (between 5 and 20 Hz), the coefficient of variance was also high. The average between-class coefficient of variance of the modulation spectra was 1.98.
Within each vocalization class, we also observed variability between the different exemplars. The within-class variability, though, was smaller than the between-class variance and was concentrated in the modulations that were the most characteristic of that vocalization type (data not shown). In other words, the modulation frequencies with the largest within-class variability did not necessarily overlap with the modulation frequencies with high between-class variability. The average within-class coefficient of variance was 1.01.
The intermediate spectrotemporal modulations, shown in Fig. 3L, may therefore be more informative for determining the class of the vocalization. We thus hypothesized that vPFC neurons may be particularly sensitive to these spectrotemporal modulations. To test this hypothesis, we recorded the responses of vPFC neurons to band-limited noise with spectrotemporal modulations similar to those found in vocalizations (see RESPONSES OF VPFC NEURONS TO BAND-LIMITED NOISE).
Auditory responses of vPFC neurons
We recorded from the left vPFC of two rhesus monkeys; 33 neurons were collected from one monkey and 24 from the second. Neural data were pooled for presentation because the results of our analyses were not reliably different (P > 0.05) between the two monkeys.
RESPONSES OF VPFC NEURONS TO BAND-LIMITED NOISE. We report data from 57 auditory neurons in which we were able to record data in response to 10 presentations of each of the 10 exemplars in a tested noise class. In preliminary studies, we tested only two noise classes (n = 11), but for most of the reported data, we tested three noise classes.
vPFC activity was modulated strongly by band-limited noise. An example neuron is shown in Fig. 5A. This vPFC neuron responded robustly to each of the three classes of band-limited noise. A different neuron that also responded robustly to band-limited noise is shown in Fig. 5B. On visual inspection, the response profiles for these two example neurons to the three different noise classes were very similar. This result was quantified for the entire data set in the following analyses.
Results of the z-score analysis for each individual vPFC neuron are shown in Figs. 5 and 6A. On a neuron-by-neuron basis and as a function of each noise class tested, we calculated a z-score value from the stimulus-period firing rate. Next, using a color map, we transformed these z-score values into a color representation, with red representing the highest z-score and blue representing the lowest z-score. Colored squares were then plotted as a function of the spectral and temporal modulations of each z-score's corresponding noise class. The insets in Fig. 5 show results of this z-score analysis for each of the two single-neuron examples. Each of the small panels in Fig. 6A represents the results of this analysis for a subset of the recorded neurons.
Results of the information analysis and the response-selectivity index analysis are shown in Fig. 7, top and bottom, respectively. The mean amount of band-limited noise information ranged from 0.1 to 0.2 bits; the theoretical maximum value is 3.3 bits. The SD of the information ranged from 0.02 to 0.09 bits. Using a variant of this information analysis (Chechik et al. 2006), we tested the amount of band-limited noise information at different temporal resolutions from 50 to 500 ms. Because the duration of each noise burst was 500 ms, this resolution was equivalent to calculating the information value using a rate code. We could not identify any structure at any of the resolutions nor could we identify a trend as the resolution increased from 50 to 500 ms (data not shown). The mean response-selectivity index values ranged from 2.7 to 9.0 and the SD ranged from 1.6 to 7.9. Similar to the z-score analysis, tuning in the response of vPFC activity for both analyses does not seem to match the variance of the acoustic features of the vocalizations.
10 presentations of each of the 10 vocalization exemplars. An example of a vPFC neuron's response to these exemplars is shown in Fig. 8. As seen, this neuron responded to each of the 10 exemplars. For comparison, the maximum firing rate, independent of vocalization exemplar, in our population varied between 0.89 and 40.6 spikes/s. Based on its stimulus-period firing rate, the selectivity of this neuron to the different vocalizations is comparable to that seen in our previous studies (Cohen et al. 2004b
An example of a fixed-mean rate STRF is shown in Fig. 9. For this neuron, as well as for all of the tested neurons, there was very little identifiable structure (i.e., clearly bounded regions of excitation or inhibition) in its STRF. The STRF shown in Fig. 9A appears to have some small regions of excitation and inhibition. However, as shown in the bottom and right panels, the gain factors at these time delays and frequency bands are not significantly different from zero, indicating that structure in the STRF is not statistically reliable. Consistent with this observation, when we used this fixed-mean rate STRF to predict the neuron's peri-stimulus time histogram, we found that it was not a good predictor of this neuron's firing pattern (Fig. 9B). That is, a linear function that relates the spectrographic representation of a sound to firing rate is not a good model of vPFC activity. Another fixed-mean rate STRF is shown in Fig. 10. The pattern seen here is similar to that seen in Fig. 9. The STRF does not have any discernable structure and does not have reliable frequency or temporal tuning. None of our tested neurons had statistically reliable spectral or temporal tuning.
Another possibility that we considered was that the neural responses to sound were dominated by a characteristic invariant phasic response to all sounds (e.g., a strong onset response followed by a slower decay). In this scenario, the neurons could still respond to specific spectrotemporal features in individual sounds, although this selective response would ride above the stronger invariant response and could disappear in the classical STRF analysis. For this reason we also estimated the time-varying mean rate STRF as explained above and in METHODS.
The time-varying mean STRF for the neuron in Fig. 8 is shown in Fig. 11; the fixed-mean rate STRF is shown in Fig. 9. The time-varying mean STRF still did not have any reliable structure as seen by its frequency and temporal tuning. Similarly, we did not find any reliable structure in any of the time-varying mean STRFs for the other neurons in our data set. However, in all cases, the time-varying mean STRF was a somewhat better predictor of the neuron's response than the fixed-mean STRF model (compare Fig. 11B with Fig. 9B).
For the fixed-mean STRF model, the mean correlation-coefficient ratio was 0.13 (SD = 0.13). The mean information value was 0.34 (SD = 0.21). As one would predict from the data shown in Fig. 11, the cross-correlation and the information values were higher when the time-varying mean STRF model was used. In this instantiation, the mean correlation-coefficient ratio was 0.64 (SD = 0.32), which was reliably higher (t-test, P < 0.05) than the mean value generated using the fixed-mean STRF model. Similarly, the time-varying mean information value (0.56, SD = 0.2) was reliably larger (t-test, P < 0.05) than the mean value generated using the fixed-rate technique. A neuron-by-neuron analysis was also consistent with these analyses. Significantly more neurons had higher correlation-coefficient (information) values when calculated using the time-varying mean STRF model than when calculated using the fixed-mean STRF model (Wilcoxon, P < 0.05).
Because the STRFs were qualitatively similar for both the fixed-mean and time-varying mean firing rate models and because the structure of the STRFs was not reliable in either model, the difference in prediction cannot be attributed to differences in the predictive nature of the STRF. Instead, it suggests that the improved prediction arises because our time-varying model was able to capture the transient response properties of vPFC to the presence of a sound irrespective of its nature. Nevertheless, the fact that the mean correlation-coefficient ratio was 0.64 shows that there remains a significant fraction of the neural response that is indeed sensitive to the stimulus properties but cannot be captured by the linear STRF. This is clearly seen, for example, in the response to the scream exemplar shown in Figs. 9 and 11 as well as the response seen in Fig. 8.
| DISCUSSION |
|---|
|
|
|---|
Acoustic features of rhesus vocalizations
The modulation spectra for the 10 acoustic classes of rhesus vocalizations had relatively similar profiles. On average, most of the energy of these spectra was found at low spectral and temporal frequencies with higher spectral modulations concentrated at the lowest temporal modulation frequencies. This pattern is similar to that found in other types of vocalizations such as those produced by zebra finches (Singh and Theunissen 2003) and in human speech as shown here (Fig. 4). However, whereas the general patterns are similar and are a signature of animal vocalizations, the specific distribution of spectral modulations and temporal modulations can vary appreciably for different types of vocalizations. Therefore the modulation spectrum could be used to characterize, cluster, and classify different types of vocalizations.
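As an illustration of what a modulation spectrum measures, the sketch below computes the power of the two-dimensional discrete Fourier transform of a small spectrogram-like array. It assumes the common definition of the modulation spectrum as the two-dimensional Fourier power of a (log-amplitude) spectrogram; windowing, scaling, and axis units are omitted, and the array sizes and the `ripple` stimulus are hypothetical.

```python
import cmath
import math


def dft2_power(spec):
    """Power of the 2D discrete Fourier transform of spec[f][t]."""
    n_f, n_t = len(spec), len(spec[0])
    power = [[0.0] * n_t for _ in range(n_f)]
    for k in range(n_f):          # spectral-modulation index
        for m in range(n_t):      # temporal-modulation index
            acc = 0j
            for f in range(n_f):
                for t in range(n_t):
                    phase = -2 * math.pi * (k * f / n_f + m * t / n_t)
                    acc += spec[f][t] * cmath.exp(1j * phase)
            power[k][m] = abs(acc) ** 2
    return power


# Hypothetical "spectrogram": a single ripple with 1 cycle across the
# 8 frequency channels and 2 cycles across the 16 time bins.
N_F, N_T = 8, 16
ripple = [
    [math.cos(2 * math.pi * (1 * f / N_F + 2 * t / N_T)) for t in range(N_T)]
    for f in range(N_F)
]

power = dft2_power(ripple)
# All energy sits at (spectral index 1, temporal index 2) and at its
# conjugate-symmetric mirror; the DC term power[0][0] is essentially zero.
```

A natural vocalization would spread energy over many such modulation pairs, and the distribution of that energy is what differs between vocalization classes.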
Relationship between vPFC activity and the acoustic features of vocalizations
This between-class variability, and other types of variability, may be an important feature underlying an animal's capacity to discriminate between different classes or exemplars of sounds (Singh and Theunissen 2003; Woolley et al. 2005). Specifically, animals (or neurons) could learn to ignore regions of low variance because these areas do not convey any information about differences between different types of sounds. In contrast, animals (or neurons) could attend to regions of high variance because these regions do convey information about differences between sound types.
This hypothesis predicts that the response profiles of auditory neurons may be matched to low- and high-variance regions of acoustic space. One form of this matching may be that the variance in neural activity correlates with the variance in acoustic space. Our examination of vPFC sensitivity to band-limited noise was a direct test of this matching hypothesis (see RESPONSES OF VPFC NEURONS TO BAND-LIMITED NOISE). In this study, we created different exemplars of band-limited noise that tiled the regions of acoustic space that are encompassed by rhesus vocalizations and measured the response of vPFC neurons to these stimuli. As shown in Figs. 6 and 7, the match between the tuning in vPFC activity and the variance in acoustic space appears to be quite low; this issue is discussed more below.
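One way to express this form of matching is as a correlation, across regions of modulation space, between the variance of the acoustic energy and the variance of the neural response. The sketch below is only a schematic of that idea: how each variance is defined is an assumption made for illustration, and every number is invented rather than taken from Figs. 6 and 7.

```python
from math import sqrt
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical between-class acoustic variance in four regions of
# modulation space (one region per band-limited noise class).
acoustic_variance = [0.1, 0.4, 0.9, 1.6]

# Hypothetical firing rates (spikes/s) of four neurons to each noise class.
rates_by_region = [
    [5.0, 5.0, 6.0, 5.0],
    [4.0, 7.0, 5.0, 6.0],
    [5.0, 6.0, 4.0, 6.0],
    [6.0, 5.0, 5.0, 4.0],
]
neural_variance = [pvariance(rates) for rates in rates_by_region]

match = pearson_r(acoustic_variance, neural_variance)
```

Under the matching hypothesis this correlation would be strongly positive; a value near zero would correspond to the poor match described here.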
A second prediction of this matching hypothesis is that the composite modulation-transfer function (i.e., the two-dimensional transformation of a STRF) would match the modulation spectra of a stimulus class; the modulation-transfer function shows the spectrotemporal modulation frequencies that activate a neuron (Chi et al. 1999; Miller et al. 2002; Singh and Theunissen 2003; Woolley et al. 2005). Because our STRFs did not show any significant structure (see Figs. 9–11) we did not test this form of matching.
However, a recent study in zebra finches provided evidence for this type of matching (Woolley et al. 2005). In that study, the investigators found that modulation-transfer functions of midbrain and forebrain auditory neurons in the zebra finch match the modulation spectra of finch songs and other natural classes of sound but are not matched to the modulation spectra of artificial sounds. This neural-stimulus matching further supports an evolutionary-based hypothesis for brain function: neural circuitry is not "all purpose" but is designed to detect and to discriminate those features that exist in the real world (Felsen and Dan 2005; Rieke et al. 1995; Woolley et al. 2005).
It is important to comment further on the relationship between the band-limited noise classes and the species-specific vocalizations. The band-limited noise was based on a statistical analysis of the spectrotemporal properties found in the species-specific vocalizations. The noise stimuli were not designed to mimic vocalizations and, indeed, they do not sound anything like a vocalization. The purpose of these noise classes was to create a set of stimuli that tiled the informative (i.e., high-variance) and noninformative (i.e., low-variance) regions of the composite modulation spectrum and to determine whether vPFC neurons are differentially sensitive to stimuli with these spectrotemporal properties. As discussed above, we did not find evidence for this sensitivity. Future studies should examine the capacity of monkeys to discriminate between vocalizations when informative or noninformative regions of their modulation spectra have been filtered.
Potential confounds
Several alternative explanations could account for the results of our two neurophysiological studies (see STRFS OF VPFC NEURONS and RESPONSES OF VPFC NEURONS TO BAND-LIMITED NOISE). Below, we highlight some of these alternatives.
One possibility may be related to the brain area itself. For instance, Woolley et al. (2005) found that neurons in the finch midbrain and in early auditory forebrain areas were tuned to the modulation spectrum of finch vocalizations. Similarly, STRFs with significant structure were reported in the auditory cortex of nonhuman primates (deCharms et al. 1998) and other mammals (Kowalski et al. 1996; Linden et al. 2003; Shamma et al. 1993). That is, these auditory areas code acoustic features. In contrast, we were testing activity in the prefrontal cortex. Consequently, independent of the important differences between mammalian and avian neural organization (Jarvis et al. 2005), the differences between the current study and previous studies may simply reflect the fact that midbrain and early forebrain auditory areas are involved in acoustic-feature extraction (Mendelson and Grasse 1992; Middlebrooks et al. 1980; Shamma et al. 1993), whereas more central areas are not involved in such feature extraction. However, this hypothesis may not be fully explanatory: other studies suggest that the auditory cortex is not involved in feature extraction but may be involved in more complex types of auditory-object processing (Barbour and Wang 2003; Machens et al. 2004; Nelken et al. 2003).
A second possibility might relate to the nature of the behavioral task. In our task, the monkeys' only behavioral requirement was to fixate the central LED. We did not require them to overtly or even covertly process the auditory stimuli. Is it possible that these behavioral requirements "interfered" with vPFC processing? For example, if the monkeys had been attending to some aspect of the sound, we might have found that the STRFs of vPFC neurons were predictive. Similarly, vPFC neurons might show a different pattern of selectivity to the band-limited noise if the monkeys were engaged in a task that required them to discriminate between the different classes of noise.
Several lines of evidence suggest that the vPFC might have been affected by the demands of our task. For instance, because attending a fixation light is known to suppress (or even eliminate) auditory responses in the parietal cortex of rhesus monkeys (Gifford 3rd and Cohen 2004), it is possible that vPFC responses may have been suppressed, relative to some other paradigm. On the other hand, because species-specific vocalizations are behaviorally relevant stimuli, the rhesus might have been (covertly) attending (Snyder et al. 2000) to these interesting stimuli, causing an increase in responsivity (Maunsell and Treue 2006). Similarly, when human subjects are engaged in tasks that require them to attend to the spatial or nonspatial attributes of an auditory stimulus, the frontal and parietal areas that are part of the anterior (nonspatial) and posterior (spatial) pathways are differentially activated (Ahveninen et al. 2006; Alain et al. 2001; Hart et al. 2004; Maeder et al. 2001; Rämä et al. 2004), but when subjects listen passively to a stimulus, these frontal and parietal areas are not engaged (Hart et al. 2004