The squirrel monkey twitter call is an exemplar of a broad class of species-specific vocalizations that contain naturally voiced frequency-modulated (FM) sweeps. To investigate how this prominent communication call element is represented in primary auditory cortex (AI), neuronal receptive field properties to pure-tone and synthetic, logarithmically spaced FM-sweep stimuli in 3 barbiturate-anesthetized squirrel monkeys are studied. Responses to pure tones are assessed by using standard measures of frequency response areas, whereas responses to FM sweeps are classified according to direction selectivity, best speed, and speed tuning preferences. Most neuronal clusters respond to FM sweeps in both directions and over a range of FM speeds. Center frequencies calculated from the average of high and low trigger frequency edges of FM response profiles are highly correlated with pure-tone characteristic frequencies (CFs). However, bandwidth estimates are only weakly correlated with their pure-tone counterparts. CF and direction selectivity are negatively correlated. Best speed maps reveal idiosyncratically positioned spatial aggregation of similar values. In contrast, direction selectivity maps show unambiguous spatial organization. Neuronal clusters selective for upward-directed FM sweeps are located in ventral–caudal AI, where CFs range from 0.5 to 1 kHz. Combinations of pure-tone and FM response parameters form 2 significant factors to account for response variations. These results are interpreted in the context of earlier FM investigations and neuronal encoding of dynamic sounds.
Vocalizations among nonhuman primates may take various forms and, in many cases, the characteristics of these communication calls are quite stereotyped (Newman 1978; Schreiner et al. 1997; Wang et al. 1995; Winter et al. 1966; Wollberg and Newman 1972). Among squirrel and marmoset New World monkeys, distinct calls, such as cackles, growls, trills, twitters (Fig. 1B), chirps, shrieks, caws, and groans, have been discerned, as well as dialects that mark different tribes or bands of monkeys (Juergens 1986, 1998; Wang et al. 1995; Winter et al. 1966). Thus the squirrel monkey has many stereotyped communication calls in its vocal repertoire, each communicating some unique information to conspecific members of its social environment.
Integrative auditory neuroscience has sought to determine how communication calls and their major components are represented in auditory cortex. This task is challenging because species-specific vocalizations are complex sounds that harbor many spectral and temporal variations. For example, calls may have noiselike patterns that contain energy in distinct frequency regions, at precise times and with variable amplitudes, thereby rendering cues about the configurations of the excitation source and vocal tract (Ohde and Stevens 1983). Likewise, periodic excitation sources produce spectral energy at a fundamental frequency and at harmonics of this fundamental. The calls may also contain distinct, nonharmonically related bands of spectral energy at particular frequencies, representing resonant properties of the vocal tract. Furthermore, energy at different frequencies may be modulated. Finally, energy at particular frequencies may shift smoothly to other frequencies over time, where this smooth transition is termed frequency-modulated (FM) sinusoid or sweep (for discussion, see Gold and Morgan 2000; Stevens 1998). FM sweeps are used by monkeys for vocalization perception (May et al. 1988) and they are used by humans to identify phonemes and demarcate phoneme boundaries (Lindblom and Studdert-Kennedy 1967; Stevens and Klatt 1974).
Faced with this acoustic complexity, 2 approaches have been taken to study auditory cortical neuron receptive field properties in New World monkeys. One approach starts from ‘top down’ and attempts to identify neurons in auditory cortex that are selective for specific calls. Conceptually, the number of stereotyped calls is limited, so auditory cortex has the capacity to map specific neurons to specific calls. This approach has some merit because call selective neurons have been identified in higher auditory areas, such as prefrontal cortex. Nevertheless, findings in lower cortical areas, such as primary auditory cortex (AI), remain inconclusive (Newman and Wollberg 1973a,b; Romanski and Goldman-Rakic 2002; Winter and Funkenstein 1973; Wollberg and Newman 1972; Yeshurun et al. 1985). Although this approach addresses selectivity of cortical neurons for particular calls, it does not identify which aspects of a particular call are especially important in exciting a certain neuron. AI neurons in monkeys respond to specific calls as well as to significant spectral and temporal degradations of the fine structure and envelopes of these calls (Nagarajan et al. 2002; Wang et al. 1995; Wollberg and Newman 1972).
Another approach is to study the responses of neurons to stimuli that vary in only a few parameters, such as amplitude-modulated sinusoids and FM sweeps. The use of these simpler, well-characterized stimuli has been the predominant method used by investigators studying AI neuronal receptive field properties. The latter approach is adopted in this study to characterize quantitatively the responses and functional organization of squirrel monkey AI to synthetic FM sweeps. As in a previous pure-tone stimulus study, the experimental goal is to map densely the responses of neurons across all accessible AI (Cheung et al. 2001). FM sweeps serve as simplifications of frequency transitions that are present in many natural monkey calls (Bieser 1998). The goal of this study is to discern how these stimuli are represented across AI by addressing the following questions. 1) Is there a systematic and orderly representation of FM-sweep response profiles? 2) How do FM response profiles relate to basic properties of tonal receptive fields?
Although there is no a priori rationale to believe that squirrel monkey AI neurons are particularly selective for FM stimuli, they are nonetheless the focus of this study for the following reasons. First, AI receives direct input from the thalamus and is a central structure for the parallel and hierarchical processing of acoustic stimuli (Hackett et al. 1998; Kaas and Hackett 2000; Rauschecker 1998). Second, studies of FM responses in AI have been performed in the cat, ferret, and other species (Heil and Scheich 1992; Heil et al. 1992a,b,c; Mendelson and Ricketts 2001; Mendelson et al. 1993; Nelken and Versnel 2000; Shamma et al. 1993; Zhang et al. 2003), so squirrel monkey results may be interpreted in the context of findings from other species. Third, AI integrity is essential for monkeys to discriminate and process natural vocalizations (Harrington et al. 2001; Heffner and Heffner 1984, 1986). Fourth, squirrel monkey AI may be developed as a model for early human cortical signal processing (Snowdon 1979; Wang 2000; Zoloth and Green 1979). Understanding how squirrel monkey AI processes relatively simple time-varying sounds will improve our understanding of acoustic signal processing in the human auditory system. Indeed, FM sweeps are prevalent in human speech, where they often take the form of formant transitions and specify the transition from one vowel to another. For these reasons and others, an incrementally more complete description of squirrel monkey AI signal processing function will advance knowledge of the general schemes of signal encoding in the central auditory system.
This study uses high-density mapping of neurons throughout AI to sample fully multiple isofrequency bands within single animals. This technique is powerful because it enables the evaluation for spatial patterns of response property variations within isofrequency bands. Pooled data derived from incomplete sampling across animals are less sensitive to detect patterns because the additional interindividual variations tend to obscure findings. Several independent parametric maps, including characteristic frequency (CF), latency, CF-corrected threshold, and CF-corrected frequency range selectivity (Q10), are represented in squirrel monkey AI (Cheung et al. 2001). Exploration of FM-sweep processing with respect to these other functional maps extends our understanding of squirrel monkey AI functional organization.
Experiments were conducted on 3 young adult male squirrel monkeys. All procedures were approved by the institutional animal welfare committee at the University of California, San Francisco and were consistent with national and state animal welfare guidelines. Surgical procedures and anesthetic protocol were described previously (Cheung et al. 2001). In brief, a surgical level of anesthesia was reached by using a mixture of isoflurane:nitrous oxide:oxygen (2:48:50%). Lidocaine was applied to incision areas. A tracheotomy was performed to optimize airway patency and pulmonary mechanics. Intravenous (iv) access was established to maintain anesthesia with sodium pentobarbital. Normal saline was delivered with 1.5% dextrose and 20 mEq KCl at 6–8 ml · kg⁻¹ · h⁻¹ to support cardiovascular function. Ceftizoxime (10–20 mg/kg iv every 12 h) was administered to minimize infection risk. Core body temperature was monitored using a thermistor probe and maintained at about 38°C using a feedback-controlled heated water blanket. The head was stabilized with a custom fixation device that provided access to both external auditory meati. Burr holes over the auditory forebrain were positioned extradurally, and a bone plate was removed. Dura was reflected to expose the parietal and temporal lobes, which were kept moist under silicone oil. Before microelectrode penetrations, a magnified video image of the recording zone was captured with a camera mounted on an operating microscope and stored in a computer. For every penetration site in AI, the corresponding point on the video image was marked using Canvas software (ACD, British Columbia, Canada). All sites were localized relative to the surface vasculature. At the conclusion of each mapping experiment, the animal was euthanized with an overdose of sodium pentobarbital, followed by bilateral thoracotomies and pneumothoraces.
All experiments were carried out in a double-walled sound-attenuating chamber (IAC, Bronx, NY). Auditory stimuli were delivered through a STAX-54 headphone enclosed in a small chamber that was connected by a sealed tube into the external acoustic meatus of the contralateral ear (G. Sokolich, U.S. Patent 4,251,686, 1981). The sound-delivery system was calibrated with a sound meter (Model 2209, Brüel & Kjær, Norcross, GA) and waveform analyzer (Model 1521-B, General Radio Company, West Concord, MA). The frequency response of the system was flat within ±6 dB ≤14 kHz, which extended past the bandwidth of the majority of neurons studied. After 14 kHz the output rolloff was 10 dB/octave. Tone bursts (3 ms linear rise/fall; total duration, 50 ms; and interstimulus interval, 400–1,000 ms) were generated by a microprocessor (TMS32010, 16-bit D-A converter at 120 kHz; Texas Instruments, Dallas, TX). At each penetration site a preliminary CF estimate was made by varying the intensity level and frequency of gated tone bursts. Frequency response areas (FRAs) were derived by presenting 675 pseudo-randomized tone bursts at different frequency and sound pressure level (SPL) combinations, where the preceding/succeeding frequency, level, and interstimulus interval between tone bursts were designed to minimize adaptation (Mendelson et al. 1997; Schreiner and Sutter 1992). The frequency-level pairs spanned 2.5–77.5 dB SPL in 5 dB steps, and 45 frequencies in logarithmic steps across a 2 to 4 octave range, centered on the estimated CF. One tone burst was presented at each frequency–level combination. FM sweeps were created using Matlab (The MathWorks, Natick, MA) and written to a compact disk for delivery. The sweeps traversed a 50 to 21,000 Hz frequency range in upward (from 50 to 21,000 Hz) and downward (from 21,000 to 50 Hz) directions. The upward and downward sweep speeds were 10, 17, 30, 52, and 90 octaves/s for a total of 10 stimulus conditions. 
The sweep speeds were chosen so that they covered the normal range of frequency modulations in squirrel monkey vocalizations. All sweeps had 100 ms rise/fall cosine squared ramps, during which time the frequency in the sweep was held constant at either 50 or 21,000 Hz. The onset interval between sweeps was 2 s and the offset interval between sweeps was >800 ms. Each sweep was presented 16 times for a total of 160 presentations, and the order of sweep presentations was randomized. The FM sweeps were presented 20–40 dB above minimum thresholds measured from FRAs.
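As a rough illustration of the stimulus geometry described above, the following Python sketch computes the duration of the frequency-varying portion of each sweep and the instantaneous frequency of a logarithmic sweep. It assumes a constant speed in octaves/s between 50 and 21,000 Hz and ignores the 100 ms constant-frequency ramps. The function names are hypothetical, and this is not the Matlab code used to generate the stimuli.

```python
import math

F_LOW_HZ = 50.0       # lower sweep limit given in the text
F_HIGH_HZ = 21000.0   # upper sweep limit given in the text
SPEEDS_OCT_PER_S = [10, 17, 30, 52, 90]

def sweep_span_octaves(f_low=F_LOW_HZ, f_high=F_HIGH_HZ):
    """Frequency span of the sweep in octaves."""
    return math.log2(f_high / f_low)

def sweep_duration_s(speed_oct_per_s):
    """Time needed to traverse the full span at a constant speed."""
    return sweep_span_octaves() / speed_oct_per_s

def instantaneous_frequency_hz(t_s, speed_oct_per_s, upward=True):
    """Frequency t_s seconds after the frequency starts to change."""
    if upward:
        return F_LOW_HZ * 2.0 ** (speed_oct_per_s * t_s)
    return F_HIGH_HZ * 2.0 ** (-speed_oct_per_s * t_s)

for speed in SPEEDS_OCT_PER_S:
    print(f"{speed:>2} octaves/s: {1000 * sweep_duration_s(speed):.0f} ms")
```

Under these assumptions the span is about 8.7 octaves, so the frequency-varying portion lasts roughly 870 ms at 10 octaves/s and just under 100 ms at 90 octaves/s.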
Parylene-coated tungsten microelectrodes (FHC, Bowdoinham, ME) with 1 to 2 MΩ impedance at 1 kHz were used for multiunit recordings (small neuronal clusters) at depths of 650–950 microns, corresponding to cortical layers IIIb and IV. Microelectrodes were aligned perpendicular to the cortical surface and lowered using a hydraulic microdrive (David Kopf Instruments, Tujunga, CA) guided by a depth counter. Action potentials were isolated from background noise using an online window discriminator (DIS-1, BAK, Mount Airy, MD). The hardware recorded the number of discriminated spikes and times of arrival that occurred within a 200 ms epoch after tone burst onsets. Responses to FM sweeps, which were delivered from a compact disk, were recorded with spike times throughout the duration of the stimulus.
Data analysis: responses to pure tones
At each site, responses from 675 frequency–level stimulus conditions determine the FRA (Sutter and Schreiner 1991, 1995). Only responses in the 8 to 40 ms window poststimulus onset are included in the analysis. Five measures are extracted from each excitatory tuning curve: CF, latency, minimum threshold, bandwidth (BW) 20 dB above threshold, and spectral asymmetry. CF is the frequency of the tone that evokes a response at the lowest SPL. Latency is the asymptotic minimum of first spike time arrivals across the full range of intensity levels at CF. Minimum threshold (hereafter, “threshold”) is the SPL of the quietest tone burst that evokes a response. The BW of the excitatory receptive field is calculated from measurements of the upper and lower frequencies bounded by the tuning curve at 20 dB above threshold. Q20 is calculated by dividing CF by the linear bandwidth at 20 dB above threshold. Tonotopic organization in the lemniscal auditory system is strongly expressed in AI. To address this confound, pure-tone and FM-sweep response (see following text) parameter covariation with CF is accounted for and corrected by removing CF dependency using nonparametric, locally linear least-squares regression models (1st-order polynomial; vicinity span = 0.05), termed LOESS (Cleveland 1993). CF-corrected response parameters are evaluated for pairwise correlations in Table 1. The asymmetry index (ASI) of response areas is calculated from CF and the FRA borders at 20 dB above threshold, where F_HI is the high-frequency border and F_LOW is the low-frequency border. A value of 0 reflects an FRA that is symmetric about the CF. Values near 1 indicate an FRA that is skewed toward frequencies higher than CF and values near −1 indicate an FRA that is skewed toward frequencies lower than CF.
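The ASI equation itself does not appear in the text above. One form consistent with the stated properties (0 for an FRA symmetric about CF, values toward 1 for skew toward higher frequencies, values toward −1 for skew toward lower frequencies) is sketched below, alongside Q20 as defined in the text. The use of a logarithmic (octave) frequency axis in the ASI is an assumption, not something the text confirms.

```latex
\[
Q_{20} = \frac{\mathrm{CF}}{F_{\mathrm{HI}} - F_{\mathrm{LOW}}},
\qquad
\mathrm{ASI} = \frac{\log_2\!\left(F_{\mathrm{HI}}/\mathrm{CF}\right) - \log_2\!\left(\mathrm{CF}/F_{\mathrm{LOW}}\right)}
                    {\log_2\!\left(F_{\mathrm{HI}}/F_{\mathrm{LOW}}\right)}
\]
```

With this form, the numerator compares the octave distances from CF to each border and the denominator is the total FRA width in octaves, so the index is bounded by −1 and 1 whenever CF lies between the two borders.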
Data analysis: responses to FM sweeps
Responses to FM sweeps are quantified by first computing the peristimulus time histogram (PSTH) for all stimulus conditions using 10 ms bins. Each phasic FM response in the PSTHs is fitted with a Gaussian in the form f(t) = A × exp[−(t − B)²/(2C²)] that is centered about the peak response time. Three parameters define the Gaussian fits: A, amplitude of the Gaussian; B, position along the time axis at which the peak occurs; and C, width of the Gaussian. The Gaussian fit is constructed by minimizing the sum of the squared error between the PSTH and model using a Nelder–Mead simplex nonlinear minimization routine, which is available in Matlab (Press et al. 1992). Only PSTHs with ≥6 discernible peaks (≥3 in each direction) among the 10 upward and downward FM-sweep conditions are analyzed. Gaussian fits to PSTHs are significant for P < 0.05 in 486 of 515 (94.4%) recording sites using the chi-square goodness-of-fit test (Bain and Engelhardt 1992).
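The pieces of this fitting step can be sketched in Python as follows. This is a minimal illustration, not the original Matlab analysis: the function names are hypothetical, the PSTH is normalized as mean spike count per trial (an assumption), and the minimizer is left out. A simplex routine such as `scipy.optimize.minimize` with `method="Nelder-Mead"` could be given `sum_squared_error` as its objective.

```python
import math

def psth(spike_times_ms, n_trials, t_max_ms, bin_ms=10.0):
    """Mean spike count per trial in consecutive bins of width bin_ms."""
    n_bins = int(math.ceil(t_max_ms / bin_ms))
    counts = [0.0] * n_bins
    for t in spike_times_ms:
        if 0.0 <= t < t_max_ms:          # ignore spikes outside the window
            counts[int(t // bin_ms)] += 1.0
    return [c / n_trials for c in counts]

def gaussian(t, a, b, c):
    """f(t) = A * exp(-(t - B)^2 / (2 C^2)) with amplitude a, peak time b, width c."""
    return a * math.exp(-((t - b) ** 2) / (2.0 * c ** 2))

def sum_squared_error(params, bin_centers_ms, rates):
    """Squared-error objective between the PSTH and the Gaussian model."""
    a, b, c = params
    return sum((r - gaussian(t, a, b, c)) ** 2
               for t, r in zip(bin_centers_ms, rates))
```

Starting values for the search could be taken from the PSTH itself, for example the height and time of the tallest bin for A and B; that choice is also an assumption rather than something stated in the text.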
Next, the parameters from a Gaussian fit are related to the corresponding FRA. The upward FM sweeps encounter the low frequency boundary first, whereas the downward FM sweeps encounter the high frequency boundary first, but both upward and downward sweeps (commonly) encounter both the upper and lower boundary. The frequencies at which FM sweeps encounter the FRA are termed trigger frequencies, which are related to the PSTH peaks. From these trigger frequencies, BW and CF of the FRA may be estimated. Trigger frequencies are computed by plotting the mean response time (latency in milliseconds) from the Gaussian fit B versus the inverse of the sweep speed (speed⁻¹ in s/octave) for that particular FM sweep (Heil and Irvine 1998; Heil et al. 1992b). As shown in Figs. 2C and 3C, regression lines (linear least squares) are then computed for these data points. Only regression lines with significant nonzero slope (P < 0.05, 2-sided t-test) are considered. The slope of the line S is in units of octaves. This quantifies how many octaves the trigger frequency is above the starting frequency when the maximum response occurs. The upward trigger frequency is found through the relationship F_UP = 50 × 2^S Hz. In a similar manner, the downward trigger frequency is found through the relationship F_DN = 21,000 × 2^(−S) Hz. Because FM-sweep responses are recorded at stimulus intensity levels 20–40 dB above pure-tone thresholds, the lower boundary of BW derived from the FRA is estimated as the difference between the 2 trigger frequencies. CF derived from the FRA is estimated by taking the average of the 2 trigger frequencies or center frequency.
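The trigger-frequency calculation can be sketched as below. This is an illustrative Python version under stated assumptions: latencies are expressed in seconds so that the regression slope comes out in octaves, the function names are hypothetical, the significance test on the slope is omitted, and the synthetic site (triggered at 8 kHz with a 15 ms fixed delay) is made up for the example.

```python
import math

F_START_UP_HZ = 50.0      # starting frequency of upward sweeps
F_START_DN_HZ = 21000.0   # starting frequency of downward sweeps

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trigger_frequency_hz(speeds_oct_per_s, peak_latencies_s, upward):
    """Trigger frequency from the slope S of peak latency versus 1/speed."""
    inverse_speed = [1.0 / s for s in speeds_oct_per_s]
    slope_octaves, _ = fit_line(inverse_speed, peak_latencies_s)
    if upward:
        return F_START_UP_HZ * 2.0 ** slope_octaves
    return F_START_DN_HZ * 2.0 ** (-slope_octaves)

def center_and_bandwidth_hz(f_up_hz, f_dn_hz):
    """FM-sweep center frequency (mean) and BW (difference) of the triggers."""
    return (f_up_hz + f_dn_hz) / 2.0, f_dn_hz - f_up_hz

# Synthetic example: upward sweeps reach 8 kHz after s_true octaves.
speeds = [10, 17, 30, 52, 90]
s_true = math.log2(8000.0 / 50.0)
latencies = [s_true / v + 0.015 for v in speeds]
f_up = trigger_frequency_hz(speeds, latencies, upward=True)
```

In this formulation any delay that does not depend on sweep speed, such as a fixed neural latency, is absorbed by the intercept and does not affect the slope.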
Response strength to the direction and speed of a particular FM sweep is quantified by calculating the area under the Gaussian fit to the PSTH by applying the Newton–Cotes procedure. From area values, a direction selectivity index (DSI) is calculated for each recording site. The DSI is the sum of all areas for upward FM-sweep responses minus the sum of all areas for downward FM-sweep responses, divided by the sum of all areas in both directions: DSI = (∑ area_UP − ∑ area_DN)/(∑ area_UP + ∑ area_DN). DSIs may take values between 1 and −1, where 1 corresponds to a recording site that responds exclusively to upward sweeps and −1 corresponds to a site that responds solely to downward sweeps. Best speed is computed using a centroid measure (Nelken and Versnel 2000; Shamma et al. 1993) and has the form BS = (∑ speed_i × area_i)/∑ area_i, where each sweep speed is multiplied by the corresponding response area, divided by the sum of all response areas. Best speed is in units of octaves/s. Speed tuning is defined as: ST = 5/4 × [1 − mean(response)/max(response)], where response is the area–speed value in the FM-sweep direction that gives the largest normalized peak response. The factor 5/4 scales ST to take values between 0 and 1, where 0 implies no tuning for speed (equal response at every speed) and 1 implies selectivity for only one speed.
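The three indices defined above can be written compactly in Python. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be lists of Gaussian-fit areas ordered by sweep speed, and the text does not specify which responses enter the best-speed centroid, so that choice is left to the caller. The scale factor is written as n/(n − 1), which equals 5/4 for the 5 speeds used here.

```python
def direction_selectivity_index(areas_up, areas_dn):
    """DSI = (sum up - sum down) / (sum up + sum down); ranges from -1 to 1."""
    total_up, total_dn = sum(areas_up), sum(areas_dn)
    return (total_up - total_dn) / (total_up + total_dn)

def best_speed(speeds, areas):
    """Centroid of sweep speed weighted by response area (octaves/s)."""
    return sum(s * a for s, a in zip(speeds, areas)) / sum(areas)

def speed_tuning(areas_up, areas_dn):
    """ST computed in the direction that holds the largest single response."""
    preferred = areas_up if max(areas_up) >= max(areas_dn) else areas_dn
    n = len(preferred)
    mean_area = sum(preferred) / n
    return n / (n - 1) * (1.0 - mean_area / max(preferred))
```

For example, a site that responds equally at every speed gives ST = 0, and a site that responds at a single speed gives ST = 1, matching the limits stated in the text.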
Data analysis: response maps
Spatial maps for the FM response profile parameters of direction selectivity, best speed, and speed tuning (Fig. 6) are reconstructed by applying Voronoi–Dirichlet tessellation (Cheung et al. 2001). In tessellation map reconstruction, the cortical surface is divided into a number of polygons, one for each recording site. The shape and area of each polygon are determined by minimizing the cumulative perimeter of all polygons using an optimization algorithm. Each polygon reflects, with its size, the cortical area represented by the recording site and, with its color, the value of the response parameter. Small polygons indicate areas of dense sampling; large polygons reflect areas with sparse sampling. The significance of aggregated spatial distributions is evaluated by randomization tests (data not shown), which test the results against the null hypothesis that response values are randomly distributed. For example, consider DSI spatial maps in Fig. 6. For each polygon in the raw tessellation map, the absolute value of the difference between the DSI value for the polygon and the DSI values for neighboring polygons is tabulated and the average difference is computed. If similar DSI values aggregate spatially, the average difference should be near zero. For the null hypothesis, DSI values are randomized with respect to the polygons, and the average difference procedure described above is performed. Histograms for statistical testing are constructed by running randomized simulation maps 10,000 times. Comparisons are made with the actual average difference values.
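The randomization test described above can be sketched in Python as follows. Several details are assumptions made for illustration: neighbors are supplied by the caller as an adjacency dictionary (for example, polygons that share an edge), the P value is one-sided with a +1 correction, and the function names are hypothetical.

```python
import random

def mean_neighbor_difference(values, neighbors):
    """Average over sites of the mean |value difference| to adjacent sites."""
    per_site = []
    for site, adjacent in neighbors.items():
        if adjacent:
            diffs = [abs(values[site] - values[other]) for other in adjacent]
            per_site.append(sum(diffs) / len(diffs))
    return sum(per_site) / len(per_site)

def randomization_p_value(values, neighbors, n_shuffles=10000, seed=0):
    """Share of shuffled maps at least as spatially smooth as the observed map."""
    rng = random.Random(seed)
    observed = mean_neighbor_difference(values, neighbors)
    sites = list(values)
    shuffled_values = [values[s] for s in sites]
    at_least_as_smooth = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled_values)               # reassign values to polygons
        shuffled = dict(zip(sites, shuffled_values))
        if mean_neighbor_difference(shuffled, neighbors) <= observed:
            at_least_as_smooth += 1
    return (at_least_as_smooth + 1) / (n_shuffles + 1)
```

A small P value here indicates that the observed map has smaller neighbor-to-neighbor differences than nearly all random reassignments of the same values, which is the sense in which similar values are said to aggregate.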
To reduce response map noise, new models of FM response maps are computed by applying nonparametric 2-dimensional locally linear least squares regression models (1st-order polynomial; vicinity span = 0.05). This method uses data from a small neighborhood of points in a 2-dimensional spatial map and fits 2 linear functions to these data in orthogonal dimensions, thereby reducing data variance. This procedure is repeated for every site of the spatial map. The procedure is analogous to computing a running mean over a time series, and makes no assumption about the form of the data to be fitted. The results of this variance reducing procedure are shown in Fig. 7.
Description of twitter vocalization and FM stimuli
The waveform of a typical squirrel monkey twitter call is shown in Fig. 1A. The twitter call is composed of several discrete segments produced at rates of 6 to 10 Hz (for comparison with marmoset twitter, see Nagarajan et al. 2002). The main point of interest in the waveform is the rate at which discrete segments of the call are produced. This rate is important because time compression of twitter calls in marmoset monkeys decreases neural responsiveness (Wang et al. 1995).
Although the twitter call time waveform (Fig. 1A) shows interesting features, it is perhaps more conveniently viewed as a spectrogram. The twitter time–frequency plot (Fig. 1B) reveals complex aspects that are not easily discernible in the waveform plot. For example, each discrete segment shown in Fig. 1A is composed of ≥3 trajectories. Of particular note is that each trajectory is composed of frequency varying or FM sweeps (red) that traverse time–frequency space at speeds between 20 and 50 octaves/s. Parametric FM sweeps (white) are overlaid onto Fig. 1B to show how synthetic FM stimuli used in this study relate to the natural vocalization.
Description of data
Neural responses at 486 penetration sites in AI of 3 squirrel monkeys (SM01: 137 sites, left hemisphere, 0.55 < CF < 7.3 kHz; SM64: 182 sites, right hemisphere, 0.29 < CF < 22.8 kHz; SM82: 167 sites, left hemisphere, 0.43 < CF < 7.5 kHz) were studied in detail. The recordings were obtained from the temporal gyrus (low- to midfrequency CFs) and supratemporal plane within the lateral sulcus (high CFs). Neurons in the data set had short latency responses to pure tones and orderly progression of CF across the cortical surface, allowing for consistent and certain identification of AI (Cheung et al. 2001). Because of corrupt and lost data files, the final FRA data set suitable for center frequency and BW correlation analyses was reduced for SM01 to 134 sites and SM64 to 176 sites.
FM-sweep peristimulus time histograms
The responses of a typical squirrel monkey AI neuronal cluster to FM-sweep stimuli (16 repetitions) are shown in Fig. 2, where column A shows responses to upward and column B responses to downward sweeps. The bar under each histogram marks sweep duration. Fitted Gaussians are superimposed over gray-colored PSTHs. This cortical site responds most vigorously to upward and downward sweeps for speeds <30 octaves/s (Fig. 2, A and B). Also, note decreasing response onset times with increasing sweep speeds.
Two main parameters are extracted from the Gaussian fits for all PSTHs (see methods). The first is latency of the Gaussian fit, which estimates the time from sweep onset to peak response. The second is area of each Gaussian fit, which estimates response strength. The results of these computations for site sm64-117-1 are illustrated in Fig. 2, C and D. In Fig. 2C, plots of latency versus inverse of FM-sweep speed are shown for upward and downward directions. The linear regression lines have high correlation coefficients (Upward: r = 0.999, P < 0.0001; Downward: r = 0.994, P = 0.0002, t-test). The slopes of the regression lines fitted to upward and downward direction data are used to calculate trigger frequencies, which estimate frequencies at which the neuronal cluster first responds to FM stimuli (Bain and Engelhardt 1992; Draper and Smith 1981). For site sm64-117-1 in Fig. 2, the frequency calculated from upward direction sweeps is 8,327 Hz and from downward direction sweeps is 9,238 Hz. Because these 2 frequencies represent the trigger frequencies of the cluster to FM stimuli, each may be taken as an estimate of the effective low and high side boundary of the cluster's FRA. From this, the average of the low and high frequencies (8,783 Hz) is the FM-sweep center frequency (Fig. 5). This value is in moderate agreement with the pure-tone CF of 9,780 Hz (10% difference). The difference between the 2 trigger frequencies (911 Hz) is the FM-sweep BW. The BW calculated from pure tones is 1,689 Hz. The 2 BW estimates are quite different (46% difference).
Plots of normalized area or response strength versus sweep speed are shown in Fig. 2D. Normalized areas are calculated by dividing individual response areas by the single largest response area, regardless of sweep direction. Thus the curves for upward and downward sweep stimuli are not normalized independently. These curves show a neuronal cluster's selectivity for a particular sweep speed in either the upward or downward direction. For a neuronal cluster with high selectivity for upward FM sweep at 17 octaves/s, the expected response profile would have a strong peak at 17 octaves/s and have values near zero at other upward sweep speeds, as well as a flat curve near zero for all downward sweep speeds. In Fig. 2D, the responses show that this neuronal cluster prefers low FM-sweep speeds in both directions. From these curves, it is clear that FM-sweep speeds slower than 30 octaves/s elicit the strongest responses and a low pass area–speed characteristic is present. Note that when comparing the normalized area–speed curves for both directions, responses in the downward direction are stronger, but the shapes of the 2 curves are similar. The observation that the shape of upward and downward area–speed curves appears correlated is borne out by population data (last row, Fig. 4). The direction selectivity index (DSI) for this neuronal cluster is −0.094, which is in accordance with stronger responses for downward sweeps. The measure of best speed, a centroid measure, is 16.7 octaves/s. Speed tuning is 0.79, which reflects higher selectivity for relatively slow downward sweeps.
Another example of typical responses to FM-sweep stimuli is shown in Fig. 3. The strongest responses are for upward sweeps at 17 octaves/s (Fig. 3A). Responses for downward sweeps, although weaker, also peak at 17 octaves/s (Fig. 3B). This example shows that a neuronal cluster may respond most vigorously to a particular sweep speed and have direction selectivity for upward sweeps, but nevertheless respond, albeit more weakly, in a nonselective fashion to downward sweeps.
Figure 3, C and D, is constructed by applying the same analysis procedure as outlined for Fig. 2, C and D. In Fig. 3C linear regression fits are computed for latency versus inverse speed data (Upward: r = 0.999, P < 0.0001; Downward: r = 0.995, P < 0.001; t-test). The estimated trigger frequency in the upward direction is 7,942 Hz and in the downward direction is 8,867 Hz. The FM-sweep center frequency is 8,405 Hz, which is in good agreement with the pure-tone CF of 9,080 Hz (7.5% difference). The FM-sweep BW is 925 Hz, which is considerably different from the pure-tone BW of 1,423 Hz (35% difference). In Fig. 3D, area–speed curves show maximal responsiveness to slow sweep speeds in the upward direction. The neuronal cluster is generally unselective for any particular downward-sweep speed, as evidenced by the relatively flat curve. DSI is 0.137, reflecting greater responsiveness to upward sweeps. Best speed is 25.9 octaves/s. Speed tuning is 0.34, indicating moderate to low selectivity for specific sweep speeds.
Population distributions of FM-sweep preferences
Response distributions for the 3 FM-sweep response parameters and correlation coefficient values for upward and downward area–speed curves for all cases (SM01, SM64, and SM82) are shown in Fig. 4. Each column represents results for a particular monkey. The 1st row shows DSI distributions. The median of distributions for all 3 monkeys is centered about zero, so in general AI neurons, sampled over 4–6 octaves, are equally responsive to upward and downward sweeps. The 2nd row shows best speed distributions. Although most sites prefer sweeps that are between 30 and 50 octaves/s, they are responsive to a broad range of sweep speeds. The median of distributions is centered at 40 octaves/s, which is in the range of vocalization FM-sweep speeds in Fig. 1B. The 3rd row shows speed tuning. The 3 monkeys show similar distributions, with the median at 0.34, a value that reflects moderate selectivity for specific sweep speeds. The 4th row shows correlation coefficient distributions for upward and downward area–speed curves. The overall median correlation value is 0.5, with a skewed distribution, which implies that many sites in AI respond to upward and downward sweep directions of specific FM speeds in a similar fashion.
Center frequency and bandwidth estimates derived from FM sweeps
Figure 5 shows the relationships of response parameter estimates calculated from FM-sweep data versus those from pure-tone FRA tuning curves for the 3 cases. The 1st row is a comparison of FM-sweep center frequency versus pure-tone CF. The 2nd row is a similar comparison for BW estimates. The columns represent individual monkeys. The 45° line indicates perfect correlation. Center frequency derived from FM sweeps is highly correlated with CF derived from pure tones in all cases (SM01: r = 0.82, P < 0.0001, n = 134; SM64: r = 0.92, P < 0.0001, n = 176; SM82: r = 0.70, P < 0.0001, n = 167; t-test). Thus a center frequency based on the trigger frequencies is sufficient to predict CF from pure tones (see also Heil and Irvine 1998; Heil et al. 1992b).
For BW, the correlation between the 2 estimates is weak (SM01: r = 0.16, P = 0.03, n = 134; SM64: r = 0.21, P = 0.003, n = 176; SM82: r = 0.37, P < 0.0001, n = 167; t-test). In these calculations there is an implicit assumption that when the FM-sweep stimulus first excites a neuronal cluster the energy in the sweep is located at either the low (upward sweep) or high (downward sweep) frequency edge of the cluster's FRA. For a simple response pattern, the expectation is that BW estimates obtained from these low and high trigger frequencies will yield reasonable estimates of the excitatory BW obtained from pure-tone stimuli. This is not the case. FM-sweep BWs are generally narrower than pure-tone BWs (Fig. 5, 2nd row). Here, the derived BW response parameter to dynamic complex sounds (FM sweeps) does not give a reliable estimate of BW response parameters to simple static sounds (pure tones). This finding has been attributed to nonlinearities in responses and asymmetries in strengths of inhibitory influences on the low and high frequency sides of the receptive field (Heil et al. 1992a; Kowalski et al. 1995; Shamma et al. 1993; Zhang et al. 2003).
Topographic distributions of FM-sweep response profiles
Topographic maps for the FM-sweep response parameters are shown in Fig. 6. Each column represents the topography in one animal and each row represents a specific response parameter. The 1st row in Fig. 6 displays a cartoon of the squirrel monkey brain and the corresponding position of AI for each monkey. For SM64, data from the right hemisphere have been reflected to facilitate comparisons with the left hemispheres of the other 2 monkeys. The portion of AI that is projected dorsal to the lateral sulcus is anatomically within the supratemporal plane and medial to the temporal gyrus. There are 2 gaps in the SM64 AI map. The first break is along the rostral–caudal axis and represents the lateral sulcus. In this unusual case of frequency map topography relative to the lateral sulcus, AI on the temporal gyrus is limited to CFs <2 kHz. By removing the frontoparietal operculum, additional data are obtained by mapping AI within the supratemporal plane. Although SM64 has a tonotopic map that is largely rolled into the supratemporal plane, this monkey's response profiles for FM sweeps (direction selectivity, best speed, and speed tuning; Fig. 4) are virtually identical to those of the other 2 monkeys. The second break is along the rostral–ventral axis and corresponds to a large cortical vein. Bars mark 1 mm for Figs. 6 and 7.
The 2nd row in Fig. 6 shows the spatial distribution of raw DSI values. Despite substantial local spatial variability across individual monkeys, a distinct ordering of DSI values is evident. Ventral–caudal AI shows spatial aggregation of higher DSI values for all 3 monkeys. Each site in this locale has DSI >0.2, which indicates upward FM sweep selectivity (P < 0.001). Also, preference for downward-directed FM sweeps tends to aggregate in dorsal–rostral AI, although this spatial clustering is not statistically significant.
The 3rd row in Fig. 6 shows the spatial distribution of raw best-speed values. In all 3 monkeys, the distribution of values is patchy. Certain best speed values show spatial aggregation, but there is no consistent distribution pattern across animals. For example, sites that respond preferentially to sweep speeds of 52 octaves/s are located in specific patches in monkeys SM01 and SM64, but not in monkey SM82. Likewise, sites that respond preferentially to sweep speeds <17 octaves/s are located in distinct patches in SM64 and SM82, but not so clearly in monkey SM01.
The 4th row in Fig. 6 shows the spatial distribution of raw speed tuning or selectivity for FM-sweep speeds. Again, a patchy distribution is evident. Areas of broad tuning for sweep speeds are grouped together, as shown by the distribution of sites with tuning values <0.25. Similarly, there appears to be grouping of sites with high tuning (>0.75). These patches of grouped selectivities are distributed in various positions across monkeys.
The results of statistical testing to reject the null hypothesis that the spatial distribution of response values of a particular FM response parameter is randomly distributed confirm visual observations. Preference for upward sweeps located in ventral–caudal AI is significant (P < 0.001). The hypothesis that best speed and speed tuning spatial maps are randomly organized cannot be rejected.
Figure 7 shows smoothed maps. In the 1st row, tonotopic organization in all cases is well demonstrated. Globally, isofrequency bands with CF >4 kHz have curvilinear contours that enter and exit the supratemporal plane. Without full sampling, this intriguing topography may be interpreted as CF reversals, falsely marking an auditory field outside AI. The full extent of isofrequency contours is accessible for study. Lower CFs are located ventrally. Higher CFs are located rostrally. The CF gradient is along the ventral–rostral axis. The 2nd row in Fig. 7 more clearly shows aggregation of high DSI values or preference for upward sweeps about ventral–caudal AI. There is also a subtle gradient of positive to negative DSI values, most striking in SM01. The vector is along the ventral–rostral axis. The 3rd row in Fig. 7 shows smoothed spatial maps for best speed. The spatial organization is again patchy because values >45 octaves/s appear as aggregated clumps of varying sizes. The 4th row in Fig. 7 shows smoothed spatial maps for speed tuning. As in Fig. 6, the vast majority of neuronal clusters are not selectively tuned for a particular FM-sweep speed. There are coherent patches in the maps for SM64 (ventral–caudal AI) and SM82 (caudal AI) that have speed-tuning values >0.6, which signifies relatively narrow tuning for specific FM speeds.
Correlation between CF and direction selectivity
As noted earlier, there is a CF gradient along the ventral–rostral axis in AI. A comparison of CF and DSI distributions (Fig. 7, 1st and 2nd rows) suggests covariation of these 2 parameters: neuronal clusters with upward-direction selectivity are located predominantly in the low CF portion of AI. To quantify this relationship, CF versus DSI plots are presented in Fig. 8. In each case, an inverse linear relationship is identified. This relationship is strongest for SM64 (r = −0.44, P < 0.0001, n = 176; t-test), intermediate for SM01 (r = −0.30, P < 0.001, n = 134; t-test), and weakest for SM82 (r = −0.23, P = 0.002, n = 167; t-test). These negative correlations are in accordance with results from other species (Heil et al. 1992a; Mendelson and Ricketts 2001; Poon and Yu 2000; Zhang et al. 2003).
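As a rough check on how a correlation coefficient and sample size translate into a test statistic, the sketch below converts each reported r and n into the standard t statistic, t = r·sqrt((n − 2)/(1 − r²)). The text states only that a t-test was used, so the exact procedure (including whether the test was one- or two-sided) is an assumption here, and the P value is a normal approximation that need not reproduce the reported values exactly. The function names are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n samples against zero."""
    if n < 3 or not (-1.0 < r < 1.0):
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def approx_two_sided_p(t):
    """Two-sided P value from a normal approximation (reasonable for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# r and n values as reported for the CF versus DSI relationship
cases = [("SM64", -0.44, 176), ("SM01", -0.30, 134), ("SM82", -0.23, 167)]
for label, r, n in cases:
    t = correlation_t_statistic(r, n)
    print(f"{label}: r = {r:+.2f}, n = {n}, t = {t:.2f}, approx P = {approx_two_sided_p(t):.2g}")
```

With more than 130 sites per animal, even modest correlations of this size yield large t statistics, which is why the relationships are statistically significant although CF explains only a small fraction of the variance in DSI.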
Excitatory pure-tone FRA asymmetries
Pure-tone FRAs are further analyzed by examining asymmetry about CF. An intracellular investigation shows that under voltage clamp conditions, FRA asymmetry is correlated with FM-sweep direction preference (Zhang et al. 2003). The ASI quantifies the relative high versus low frequency tilt about CF. An analysis between ASI and DSI values fails to reveal any significant correlations (SM01: r = −0.051, P = 0.282; SM64: r = −0.044, P = 0.286; SM82: r = −0.077, P = 0.166; t-test). Thus spectral asymmetry of the FRA arising from spiking response does not appear to reflect spectral asymmetries found in membrane potential magnitudes (Zhang et al. 2003).
Correlations between pure-tone and FM response parameters
Pure-tone and FM-response parameters are evaluated for covariation relationships. A correlation matrix for CF-corrected or residual threshold, Q20, latency, DSI, best speed, and speed tuning is shown in Table 1. The data set is taken from 448 sites across animals where all test parameters are available. Statistical evaluation with the Bonferroni correction for multiple comparisons is applied (Hochberg 1988; Perneger 1998). With 6 parameters there are 15 pairwise comparisons, so for alpha = 0.05 the corrected significance criterion is P = 0.05/15 ≈ 0.003. The results indicate statistically significant (P < 0.001) but weak correlations for the following pairs: threshold–DSI (r = 0.287), threshold–best speed (r = 0.172), Q20–latency (r = 0.164), Q20–best speed (r = −0.263), and best speed–speed tuning (r = 0.233). The negative correlation between Q20 and best speed implies that neurons with broader FRA are better able to respond to faster FM-sweep speeds. Also note that best speed and speed tuning are positively correlated, so faster FM sweeps are more selectively processed than slower FM sweeps.
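The Bonferroni arithmetic can be sketched as follows: 6 response parameters give 15 unique pairs, and the familywise alpha is divided by that count. The variable names are illustrative only.

```python
from itertools import combinations

# The 6 response parameters entered into the correlation matrix
parameters = ["threshold", "Q20", "latency", "DSI", "best speed", "speed tuning"]

# Every unordered pair of parameters is one comparison
pairs = list(combinations(parameters, 2))

alpha = 0.05                           # familywise significance level
per_test_alpha = alpha / len(pairs)    # Bonferroni-corrected criterion

print(f"{len(pairs)} comparisons, per-test criterion = {per_test_alpha:.4f}")
```

Because all 5 listed correlations are reported at P < 0.001, each falls below this corrected criterion.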
Another approach to evaluate relationships among raw threshold, Q20, latency, direction selectivity, and best speed is the application of factor analysis (variance maximizing, or varimax). The inclusion of speed tuning does not add appreciably to the analysis, so this parameter is excluded. In this technique, a smaller set of factors is constructed from the aforementioned response parameters to capture as much variance in the data set as possible. Table 2 shows the relative contributions of response parameters to the 2 significant factors (P < 0.0001, Bartlett's chi-square). Threshold, latency, and direction selectivity are the primary contributors to factor 1, whereas Q20 and best speed are the primary contributors to factor 2. The 2 FM-response parameters—direction selectivity and best speed—segregate to factors 1 and 2, respectively. This implies that these 2 parameters measure fundamentally different aspects of FM processing by AI neurons.
The main goals of this study are to characterize neuronal receptive field properties and their spatial distributions in squirrel monkey AI to synthetic FM-sweep stimuli. Parametric FM-sweep stimuli are used because they represent important acoustic elements of species-specific vocalizations that can be varied systematically. Dense mapping experiments in single animals reveal the following: 1) The vast majority of AI neurons respond in a temporally precise manner to FM sweeps in both directions and over a range of FM speeds. 2) FM-sweep center frequencies are highly correlated with pure-tone CFs, but BW estimates are only weakly correlated with their pure-tone counterparts. 3) CF and direction selectivity are negatively correlated. 4) Spatial maps for direction selectivity show preference for upward-directed FM sweeps to be located in ventral–caudal AI, where CFs range from 0.5 to 1 kHz. 5) Combinations of pure-tone and FM response parameters form 2 significant factors to account for response variations. In the discussion below, results from this study are placed in the context of earlier work and neuronal encoding of dynamic sounds.
Comparison to other studies: methodologies and species
When comparisons are made among various studies it is important to recognize that published studies differ in FM-sweep stimulus structure, species, and anesthesia. FM-sweep stimuli may be logarithmic, as in this study, or linear. The choice of FM-sweep frequency range and speeds can influence observed responses. A diversity of species have been chosen for FM encoding studies, such as ferret, rat, cat, and monkey. Moreover, a variety of anesthetics have been used. Despite these constraints, a general picture of FM processing in AI emerges.
In an earlier barbiturate-anesthetized squirrel monkey AI study, FM-sweep response profiles from the supratemporal plane, insula, and field R (Bieser 1998) demonstrate phase-locked activation to FM sweeps. In this report, there is no specific attention to quantitative assessment of direction selectivity or best speed for individual neurons, so direct comparison with the present study is limited.
In barbiturate-anesthetized ferrets (Kowalski et al. 1995; Shamma et al. 1993), the results are similar to those of the current squirrel monkey study. Using logarithmic FM sweeps, the mean DSI value is −0.05, which is comparable to this study. One difference in FM stimuli is the range of sweep speeds (30–300 octaves/s), which is much wider in the ferret study. Best speed preference is different. Ferrets prefer much faster FM sweeps (mean at 157 octaves/s), compared with 20–60 octaves/s in this study. One reason for the difference in best-speed response profile may be attributable to species variations. Another reason may be that very fast FM sweeps resemble broadband clicks, which are more similar to transients rather than to dynamic FM sweeps present in vocalizations. Also, a patchy but consistent distribution of DSI is found along isofrequency contours in ferret AI, which is similar to cat AI (see following text) but in contrast to squirrel monkey AI, where upward direction selectivity is clustered about ventral–caudal AI.
In rat auditory cortex there is a significant correlation between DSI and CF (Mendelson and Ricketts 2001; Zhang et al. 2003). Rat AI neurons that prefer upward FM sweeps have CFs <8 kHz, whereas downward FM sweeps have CFs >8 kHz. The value of 8 kHz appears to correspond to the frequency range over which rats have their lowest pure-tone behavioral thresholds (Kelly and Masterton 1977). In the present study, neurons with little direction selectivity have CFs in the range of 4–6 kHz, which is near the frequency range where behavioral pure-tone detection thresholds are lowest in squirrel monkey (Beecher 1974; Fujita and Elliot 1965).
In a barbiturate-anesthetized cat study using logarithmic FM sweeps (Mendelson et al. 1993), the DSI population distribution center is about zero. There is a weak correlation between DSI and CF, and DSI is uncorrelated with best sweep speed, suggesting that direction and speed are processed independently in AI, at least in the midfrequency range sampled in that study.
In another cat AI study using linear FM sweeps (Heil et al. 1992b), the distribution of DSI values centers about −0.124. This result is somewhat lower than values found in the Mendelson et al. (1993) study and much lower than values in this report. The variance may be related to the sampling of only moderately high CFs. A similar study in the cat anterior auditory field (AAF) (Tian and Rauschecker 1994) shows that CF values are adequately predicted from FM-sweep response PSTHs, in which many neurons are selective for sweep speed but not sweep direction.
In chick Field L, the avian auditory analogue of primary auditory cortex, a linear FM-sweep study (Heil et al. 1992a) shows systematic spatial organization for direction selectivity and best speed. Direction selectivity and best speed covary with CF, and there appears to be topographic organization for direction selectivity and best speed among units with similar CFs.
One confounding issue for comparisons among studies is the use of logarithmic versus linear sweeps. These methodologies are compared in AI (Nelken and Versnel 2000) of the barbiturate-anesthetized ferret. In experimental paradigms that most resemble studies by Mendelson et al. (1993b), Heil et al. (1992b), and the current study, there is a significant unity correlation between DSI values derived from logarithmic and linear sweeps (Fig. 8C; Nelken and Versnel 2000). It is also noted that results of best-speed preferences, in the dichotomy of fast or slow, are similar when either logarithmic or linear FM sweeps are used. Thus despite different experimental paradigms, species, and anesthetic conditions, meaningful conclusions can still be drawn. The general distributions across animals are quite similar, and suggest that results on direction selectivity and best speed preferences are robust across experimental paradigms.
Functional organization of FM-sweep response parameters
Highly dense spatial mapping of single and multiunit responses in squirrel monkey AI using pure tones has shown systematic spatial distributions for CF, latency, and CF-corrected threshold (Cheung et al. 2001). Using this mapping approach, the hypothesis that FM-sweep response parameters are also spatially organized in a nonrandom manner is tested.
The results show that of the traditional FM-sweep parameters (direction selectivity, best speed, and speed tuning), only upward FM-sweep direction selectivity is represented reproducibly in ventral–caudal AI. When smoothed maps are considered, there appears to be spatial aggregation of downward-selective neuronal clusters in dorsal–rostral AI. The spatial organization of best speed and speed tuning is best characterized as idiosyncratic patches throughout AI. Yet, patchy spatial distributions in combination with other relatively smooth gradients, such as latency, may provide efficient representations of complex sounds containing FM sweeps. This is consistent with previous results showing that neurons in squirrel monkey auditory cortex respond to many different vocalizations, as well as to significant spectral and temporal degradations of these vocalizations (Newman and Wollberg 1973a; Wollberg and Newman 1972). As such, the concept that AI neurons respond to or detect only one call is not well supported. The results of the present study are in line with the hypothesis that AI performs decomposition of sounds into their constituent parts (Nelken 2002; Ohlemiller et al. 1994) in a spatially organized manner (Schreiner 1995, 1998).
Finally, these remarks must be qualified by noting that the recordings are from layers IIIb and IV. These laminae are the main targets of thalamic input into primary cortex and display strong responses with latencies shorter than those found in other layers. Because layers IIIb and IV have been the dominant focus in auditory cortex mapping studies, very little is understood about how responses in different layers might lead to different functional maps. Only a few studies have attempted to relate FM-sweep responses across different cortical depths (Mendelson and Cynader 1985; Mendelson et al. 1993; Shamma et al. 1993). These studies conclude that direction selectivity and best speed are constant in orthogonal cortical penetrations. Nevertheless, FM response properties may vary as a function of cortical depth because of cortico-cortical connections (Code and Winer 1985, 1986; Winguth and Winer 1986), although further clarification is needed.
Similarities and differences of FM-sweep response distributions across animals
The results in Fig. 4 make it clear that similar population response patterns are present across individual animals. Selectivity for upward or downward FM sweeps, best speed, speed tuning, and correlation between upward and downward area–speed curves are quite similar in all 3 monkeys. These consistent response distributions imply that there is a general range of FM sweep parameters that are represented in AI.
Although population distributions of FM sweep response profiles are similar across animals, there are intriguing interanimal differences. The most salient variations are found in topographical maps of FM-sweep responses. Aside from representation of upward FM sweeps, spatially coherent locations for other FM-response parameters are positioned idiosyncratically across AI. The sources of variations are likely related to differences in developmental aspects of cortical organization and behaviorally relevant acoustic experiences among individual monkeys (Buonomano and Merzenich 1998).
Similarities and differences between simple and complex sound responses
Many studies implicitly assume that the responses to FM sweeps are related to the FRA. The rationale is that as the FM sweep traverses different frequencies it will reach either the upper or lower frequency boundary of the FRA. The inference that follows is that FM-sweep trigger frequencies are somehow linked to FRA boundary frequencies. Using trigger frequencies, center frequency estimates are well correlated with pure-tone CFs. BW estimates are much more weakly correlated. As discussed in methods, FM sweeps are presented 20–40 dB SPL above threshold, whereas pure-tone BWs are measured 20 dB SPL above threshold. Ceteris paribus, BWs derived from FM sweeps are expected to be broader than BWs derived from pure tones. In fact, BW deviations are biased in the opposite direction (Fig. 5). Some important considerations for this finding are consequences of side-band inhibition (Shamma et al. 1993) and differences in the timing of excitatory and inhibitory contributions (Zhang et al. 2003), which are especially salient when using dynamic stimuli, such as FM sweeps. The contributions of these 2 factors cannot be disentangled in this study because no specific assay for measuring side-band inhibition was deployed.
The most significant difference between simple pure tones and FM sweeps is the stationary nature of tones. This suggests that variations in excitatory BW estimates may be attributable to the nonstationary nature of FM sweeps. Synthetic FM sweeps are parameterized sounds that contain correlations. Midbrain auditory neurons process correlated sounds in ways that are not predictable from the processing of simple, stationary sounds (Escabí and Schreiner 2002). Multiunit recordings may reduce the influence of these correlations through response averaging, although it is likely that most auditory neurons process spectrally dynamic sounds differently from spectrally simple sounds (Schreiner 1998).
A methodological consideration to reconcile BW estimate variations is the choice of procedure for calculating trigger frequencies, which are based on the means of Gaussian fits to PSTHs. Using other measures to mark the timing of FM-sweep responses, such as the arrival time of the first spike, would not change the trigger frequencies because the width of responses is relatively constant at most sweep speeds. Trigger frequency estimates are dependent on relative, not absolute, peak response times. Thus the estimation of trigger frequencies is robust to the choice of method for marking response times.
Finally, FM responses in AI are not completely accounted for by simple parameters. This is borne out by factor analysis, which shows at least 2 factors that are constructed from combinations of pure-tone and FM response parameters. Cortical neuron representation of dynamic sound stimuli includes pure-tone and FM response parameters that serve as subsets of a more complete set of auditory processing factors in AI (Barbour and Wang 2003; deCharms et al. 1998; Escabí and Schreiner 2002).
This work was supported by Veterans Affairs Merit Review Grant to S. W. Cheung, National Institutes of Health Grants DC-02260 and NS-34835 to C. E. Schreiner, and by Montgomery Street Foundation, Coleman Fund, and Hearing Research, Incorporated.
The authors thank D. A. Copenhaver for assisting with the experiments.
* B. Godey and C. A. Atencio contributed equally to this work.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2005 by the American Physiological Society