Lesion studies have indicated that the auditory cortex is crucial for the perception of acoustic space, yet it remains unclear how auditory cortical neurons participate in this perception. To investigate this, we studied the responses of single neurons in the primary auditory cortex (AI) and the caudomedial field (CM) of two monkeys while they performed a sound-localization task. Regression analysis indicated that the responses of ∼80% of neurons in both cortical areas were significantly correlated with the azimuth or elevation of the stimulus, or both, which we term “spatially sensitive.” The proportion of spatially sensitive neurons was greater for stimulus azimuth than for stimulus elevation, and elevation sensitivity was primarily restricted to neurons that were tested using stimuli that the monkeys also could localize in elevation. Most neurons responded best to contralateral speaker locations, but we also encountered neurons that responded best to ipsilateral locations and neurons that had their greatest responses restricted to a circumscribed region within the central 60° of frontal space. Comparing the spatially sensitive neurons with those that were not spatially sensitive indicated that these two populations could not be distinguished based on their firing rates, their rate/level functions, or their topographic location within AI. Direct comparisons between the responses of individual neurons and the behaviorally measured sound-localization ability indicated that proportionally more neurons in CM than in AI had spatial sensitivity that was consistent with the behavioral performance. Pooling the responses across neurons strengthened the relationship between the neuronal and psychophysical data and indicated that the responses pooled across relatively few CM neurons contain enough information to account for sound-localization ability.
These data support the hypothesis that auditory space is processed in a serial manner from AI to CM in the primate cerebral cortex.
Psychophysical studies of sound-localization ability have defined several stimulus parameters that affect localization performance and the binaural and spectral cues that could be used to calculate acoustic stimulus location (e.g., Butler 1986; Makous and Middlebrooks 1990; Perrott and Saberi 1990; Recanzone et al. 1998; Stevens and Newman 1936; Wightman and Kistler 1989; see Middlebrooks and Green 1991). The processing of these cues has been investigated rigorously at different levels of subcortical auditory structures (e.g., Fitzpatrick et al. 1997; Imig et al. 1997; Joris and Yin 1995; Kuwada et al. 1997; Litovsky and Yin 1998; Yin and Chan 1988, 1990), and in the auditory cortex (e.g., Clarey et al. 1995; Phillips and Irvine 1981; Rajan et al. 1990a,b; Semple and Kitzes 1993a,b). Although lesion studies in humans (Haeske-Dewick et al. 1996; Poirier et al. 1994; Sanchez-Longo and Forster 1958) and monkeys (Heffner and Heffner 1990; Thompson and Cortez 1983) have indicated that the auditory cortex is crucial for this perception in primates, it is still poorly understood how auditory cortical neurons participate in the perception of acoustic space.
Electrophysiological studies in anesthetized mammals indicate that the representation of acoustic space is not topographically organized in AI (e.g., Brugge et al. 1996; Imig et al. 1990; Middlebrooks and Pettigrew 1981; Middlebrooks et al. 1998; Rajan et al. 1990b) in contrast to the barn owl optic tectum (Knudsen and Konishi 1978; see Knudsen and Brainard 1995) and the mammalian superior colliculus (Jay and Sparks 1984; King and Hutchings 1987; Middlebrooks and Knudsen 1984; Wallace et al. 1996). It has been suggested that acoustic space is represented by population firing rate codes (e.g., Eisenmann 1974) or by the temporal firing pattern of single neurons or populations of auditory cortical neurons (e.g., Ahissar et al. 1992; Furukawa et al. 2000; Gottlieb et al. 1989; Middlebrooks et al. 1994, 1998; Vaadia and Abeles 1987; Xu et al. 1998, 1999). To date, however, only two studies have investigated how the activity of cortical neurons in the awake primate relates to sound location perception (Ahissar et al. 1992; Benson et al. 1981), and neither of these studies investigated localization in both azimuth and elevation or directly related single-unit activity to sound-localization behavior.
Recent anatomic evidence suggests that auditory information is processed serially in the primate cerebral cortex (e.g., Jones et al. 1995; Morel et al. 1993; Rauschecker et al. 1997; Romanski et al. 1999; see Kaas et al. 1999; Rauschecker 1998 for reviews) similar to the dorsal and ventral processing streams of visual information (Ungerleider and Mishkin 1982). The available electrophysiological and imaging studies also support this idea (e.g., Bushara et al. 1999; Rauschecker et al. 1995; Recanzone et al. 2000; Weeks et al. 1999) although there has been no direct evidence linking the responses of auditory cortical neurons to the identification or localization of sound stimuli.
This study reports the activity of cortical neurons in monkeys localizing tone and noise stimuli and relates this activity to the monkeys' psychophysically measured auditory spatial acuity. Single-neuron responses were recorded in the primary auditory cortex (AI) and the caudomedial field (CM), which have been hypothesized to form part of the “where” processing stream in the primate auditory cortex (Rauschecker 1998).
A partial report of these data has appeared in abstract form (Guard et al. 1998).
All procedures conformed to the Public Health Service policy on the care and use of experimental animals and were approved by the University of California at Davis animal care and use committee. Two adult male rhesus monkeys (Macaca mulatta) weighing 7–11 kg over the course of the study were used in all experiments (monkeys L and M).
Stimuli and apparatus
All experiments were conducted in a double-walled acoustically shielded sound booth (IAC) with inner dimensions of 2.4 × 3.0 × 2.0 m (l × w × h), with echo-attenuating foam (Sonex) on all four walls, the ceiling, and most of the floor. The monkey sat in an acoustically transparent primate chair 1 m from the rear wall and 1.5 m from the adjacent wall and was monitored continuously in the left profile by the experimenter via an infrared camera and closed-circuit monitor.
Acoustic stimuli were generated using a TDT (Tucker-Davis Technologies) system controlled by a PC. Stimuli consisted of either tones, band-passed noise, or broadband noise. Stimulus duration, intensity, and spatial location were varied depending on the task requirements (see following text). Stimuli were presented from one of six different speakers at 18 different locations. Five speakers were located on an array (Fig. 1 A; speakers a–e), and the sixth speaker was located at 90° contralateral to the recording location (speaker o). The array was constructed from two curved aluminum arms that positioned each of the five speakers 146 cm from the center of the interaural axis of the monkey. One speaker (a) was located directly in front of the monkey, two speakers were located at eccentricities of 30° (b and c) and two at 15° (d and e). The array was positioned at 0° as shown in Fig. 1 A, or rotated 45, 90, or 135° counterclockwise with the axis of rotation at speaker a (Fig. 1 B). Speakers b–e therefore were positioned at four different locations in space, and each crossed through the zero axis for either elevation (b and c) or azimuth (d and e).
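The count of 17 frontal locations follows from the array geometry described above, and can be checked with a short sketch. This is an illustration only: it treats eccentricity and rotation angle as polar coordinates in a flat frontal plane, and the side assigned to speaker b versus c (and d versus e) is an assumption, as are the names `BASE`, `ROTATIONS`, and `frontal_locations`.

```python
import math

# (eccentricity in deg, angle in the frontal plane in deg) with the array at 0 deg.
# Angle 0 is the horizontal axis; counterclockwise is positive.
# Which side holds b versus c (and d versus e) is an assumption.
BASE = {"b": (30.0, 0.0), "c": (30.0, 180.0), "d": (15.0, 90.0), "e": (15.0, 270.0)}
ROTATIONS = (0.0, 45.0, 90.0, 135.0)  # array rotations about speaker a


def frontal_locations():
    """Return the set of distinct (horizontal, vertical) offsets in degrees."""
    locations = {(0.0, 0.0)}  # speaker a sits on the axis of rotation
    for ecc, angle in BASE.values():
        for rot in ROTATIONS:
            theta = math.radians(angle + rot)
            x = round(ecc * math.cos(theta), 3) + 0.0  # + 0.0 folds -0.0 into 0.0
            y = round(ecc * math.sin(theta), 3) + 0.0
            locations.add((x, y))
    return locations
```

Under these assumptions each of speakers b–e occupies four distinct positions, giving 16 locations plus the center speaker; the sixth speaker at 90° supplies the 18th location.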
Stimuli used for determining the spatial response properties of these neurons consisted of 200 ms (5-ms rise/fall) tones, Gaussian noise, or one of five different band-passed noise stimuli. Band-passed stimuli had cutoff frequencies of 50–750 Hz, 750–1,500 Hz, 1.5–3.0 kHz, 3.0–6.0 kHz, and 5.0–10.0 kHz. Measurements of the frequency response for each speaker at each location showed that there were ≤3 dB variations in the amplitudes of different frequency components, so the stimulus intensity was randomly varied across presentations by ±3 dB centered at 65 dB SPL. In each session, two stimuli were presented on randomly interleaved trials: a tone near the characteristic frequency (CF) of the neuron and the noise stimulus that encompassed this frequency. Each tone and noise stimulus was presented for ≥10 correct trials from each of the 17 frontal locations in each complete session. Two sessions occasionally were performed in a single day. The position of the right pinna was measured in some sessions using the search coil technique (DNI Instruments) by placing the monkey in a magnetic field and attaching a search coil to the most distal and posterior portion of the pinna with cyanoacrylate. The pinna position was digitized at 1 kHz and stored for off-line analysis.
Behavioral procedures during single-neuron recording
The activity of single neurons was measured while the monkey performed the sound-localization task illustrated in Fig. 1, C–E. A light-emitting diode (LED) suspended in front of the array that did not reflect onto the array blinked on/off until the monkey depressed a lever to initiate a trial. A series of three to eight tone stimuli (50-ms duration, 3-ms rise/fall) were presented from the 90° speaker every 750 ms (S1 stimuli). The monkey's task was to maintain the lever depressed during the S1 stimulus presentations and to release the lever within 500 ms when a stimulus was presented from one of the 17 frontal locations (S2 stimuli) to receive a fluid reward. On most trials the S2 stimulus was either a 200-ms duration tone or noise as described in the preceding text. Catch trials were introduced randomly to ensure that the monkey was attending to the location of the stimulus and not the stimulus duration or type. The first type (Fig. 1 D) presented the 200-ms duration tone or noise from the speaker at 90°. In this case, the monkey was not rewarded for releasing the lever until the stimulus was presented from a frontal location. The second type (Fig. 1 E) had a short-duration stimulus (50-ms tone) presented from one of the frontal locations. In this instance, the monkey was rewarded if it released the lever. Thus the monkey could only reliably obtain a reward if it maintained the lever depressed until a stimulus was presented from a frontal location.
The short-duration (50 ms) tone stimuli presented from the contralateral speaker were used to characterize each neuron's frequency and intensity response properties reported previously (Recanzone et al. 2000). Sixteen stimulus intensities varied from 0 to 80 dB SPL in 15 steps (5.333 dB/step), and 31 stimulus frequencies varied over a 2–4 octave range in 30 steps (16 × 31 = 496 total stimuli). Each stimulus was presented at least two times. These data were used in this report to physiologically characterize these neurons, to physiologically define the primary auditory cortex and caudomedial fields, and to define the rate/level functions of each of the cortical neurons (see following text). The frequency and intensity tuning functions of these neurons have been reported previously (Recanzone et al. 2000).
Behavioral procedures during psychophysical tasks
A separate set of experiments defined the sound-localization ability of these monkeys to these acoustic stimuli. All stimuli in this task were 200 ms in duration. The S1 stimulus was presented at 0° in azimuth and elevation and changed to a new location (S2) using the same apparatus and an additional six speakers to present stimuli along either the horizontal meridian or midsagittal plane at eccentricities of ±5, 10, 15, 22.5, and 30°. The monkey's task was identical: depress the lever to initiate a trial and release the lever when it detected a change in the stimulus location. For each speaker location, the performance was calculated as the hit rate (No. hits/[No. hits + No. misses]) multiplied by (1 − false-positive rate) where the false-positive rate was calculated as (No. FP/total number of trials). This measure of performance provides a reliable measure of perceptual ability (Phan et al. 2000; Recanzone et al. 1992, 1993, 1998; see Recanzone et al. 1991). Thresholds were calculated as the location with a 0.5 performance by linear interpolation. The S2 locations were interleaved randomly between trials with 25–45 trials presented at each S2 location. Sessions were repeated using different speakers for each location by presenting stimuli with the array rotated 90 and 180° counterclockwise.
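The performance measure and threshold rule described above can be sketched as follows. The function names and the handling of edge cases (returning `None` when the criterion is never reached, and interpolating only between adjacent tested eccentricities) are assumptions made for illustration, not details taken from the original analysis.

```python
def performance(hits, misses, false_positives, total_trials):
    """Hit rate multiplied by (1 - false-positive rate), as defined in the text."""
    hit_rate = hits / (hits + misses)
    fp_rate = false_positives / total_trials
    return hit_rate * (1.0 - fp_rate)


def threshold(eccentricities, performances, criterion=0.5):
    """Eccentricity (deg) at which performance first reaches `criterion`.

    Linear interpolation between adjacent tested eccentricities.
    Returns None if the criterion is never reached (an assumed convention).
    """
    points = sorted(zip(eccentricities, performances))
    if points and points[0][1] >= criterion:
        return points[0][0]
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    return None
```

For example, 18 hits, 2 misses, and 5 false positives in 50 trials give a performance of 0.9 × 0.9 = 0.81. A monkey that never reaches 0.50 for a given stimulus, as occurred for some tone stimuli in elevation, would have no measurable threshold under this convention.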
A restraining head post and recording cylinder were implanted surgically using conventional methods to allow a vertical approach to AI (see Pfingst and O'Connor 1980; Recanzone et al. 1997). Magnetic resonance imaging (MRI) images in the frontal plane were obtained in monkey L before surgery (1.5 T magnet, 3-mm slices). Electrodes were introduced into the brain through guide tubes inserted through a plastic grid fastened to the recording cylinder (Crist et al. 1988). Guide tubes were inserted through the dura into the overlying parietal cortex ∼2–5 mm above the superior temporal gyrus. Electrodes (FHC, ∼1–3 MΩ at 1 kHz) were advanced through the guide tube via a hydraulic microdrive (FHC or Narishige) until neuronal activity driven by auditory stimuli was encountered. Unit activity was amplified and displayed on an oscilloscope and audio monitor using conventional methods. Search stimuli consisted of tones, noise, band-passed noise, and clicks presented from the 90° or frontal speakers. Single-neuron activity was isolated on-line using a time-amplitude window discriminator (Bak), off-line by using spike-sorting software, or both. Single-neuron waveforms were digitized at 50 kHz and stored for later analysis.
Poststimulus time histograms were constructed using 3-ms time bins by summing the activity over each of the trial repetitions. For all other analyses, the firing rate was measured for both 100 and 250 ms starting at the onset of the stimulus presentation. We compared the results using these two response windows for all subsequent analyses, and although there could be minor differences in the results, they never reached statistical significance (paired t-test; P > 0.05 in all cases) and therefore we will present only the results using the 250-ms recording window.
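As a minimal sketch of these two measures, the functions below build a histogram with 3-ms bins summed over trials and compute the firing rate in a window starting at stimulus onset. Spike times are assumed to be in milliseconds relative to stimulus onset; the function names and the half-open window convention are illustrative assumptions.

```python
import math


def psth(trials, bin_ms=3.0, window_ms=250.0):
    """Sum spike counts across trials in consecutive bins of `bin_ms`.

    `trials` is a list of spike-time lists (ms relative to stimulus onset).
    """
    n_bins = int(math.ceil(window_ms / bin_ms))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if 0.0 <= t < window_ms:
                counts[int(t // bin_ms)] += 1
    return counts


def firing_rate(spikes, window_ms=250.0):
    """Firing rate (spikes/s) in the window [0, window_ms) after stimulus onset."""
    n_spikes = sum(1 for t in spikes if 0.0 <= t < window_ms)
    return n_spikes / (window_ms / 1000.0)
```

Calling `firing_rate` with `window_ms=100.0` and `window_ms=250.0` on the same trial gives the two response windows that were compared.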
Multiple regression analysis compared the activity as a function of the stimulus azimuth and elevation using either all trials for each stimulus presentation (170 trials) or by taking the mean response at each location. Behavioral thresholds were predicted by the neuronal response by interpolating the azimuth or elevation location that corresponded to the mean ± 1 SD of the firing rate for stimuli presented directly in front of the monkey. Comparisons between the neuronal and behavioral data were defined as the ratio of the threshold predicted by the neuronal response divided by the behaviorally measured threshold.
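The regression step can be illustrated with an ordinary least-squares fit of firing rate against stimulus azimuth and elevation. This is a sketch under stated assumptions: it fits the plane rate = b0 + b_az·azimuth + b_el·elevation by solving the normal equations directly, and it omits the significance test on the slopes that the analysis in the text relies on. The function name `fit_plane` is hypothetical.

```python
def fit_plane(azimuths, elevations, rates):
    """Least-squares fit of rate = b0 + b_az * azimuth + b_el * elevation.

    Inputs are equal-length sequences with one entry per trial (or per
    location when mean responses are used). Returns (b0, b_az, b_el).
    """
    n = len(rates)
    a_bar = sum(azimuths) / n
    e_bar = sum(elevations) / n
    y_bar = sum(rates) / n
    # Centered sums of squares and cross products.
    saa = sum((a - a_bar) ** 2 for a in azimuths)
    see = sum((e - e_bar) ** 2 for e in elevations)
    sae = sum((a - a_bar) * (e - e_bar) for a, e in zip(azimuths, elevations))
    say = sum((a - a_bar) * (y - y_bar) for a, y in zip(azimuths, rates))
    sey = sum((e - e_bar) * (y - y_bar) for e, y in zip(elevations, rates))
    det = saa * see - sae ** 2
    if det == 0:
        raise ValueError("azimuth and elevation predictors are collinear or constant")
    b_az = (say * see - sey * sae) / det
    b_el = (sey * saa - say * sae) / det
    b0 = y_bar - b_az * a_bar - b_el * e_bar
    return b0, b_az, b_el
```

The same function applies to either variant described above: pass one entry per trial (170 trials) or one mean response per location.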
Intensity tuning was defined using the 50-ms duration stimuli presented from the contralateral speaker (see Recanzone et al. 2000). The responses to each stimulus intensity were averaged across the five tested frequencies nearest the CF of the neuron (2 higher, 2 lower, and 1 at CF, 10 trials/intensity). The mean response at the highest intensity tested then was divided by the greatest response at any intensity. This ratio varies from 0 (no response at the highest intensity tested) to 1.0 (the highest intensity tested elicited the greatest response).
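A short sketch of this ratio is given below. It assumes the responses have already been restricted to the five frequencies nearest CF, so that each intensity maps to its list of trial responses; the function name and the convention of returning 0 for a neuron with no response at any intensity are assumptions.

```python
def intensity_ratio(responses_by_intensity):
    """Mean response at the highest intensity divided by the greatest mean response.

    `responses_by_intensity` maps intensity (dB SPL) to a list of trial
    responses pooled over the frequencies nearest CF.
    """
    means = {
        level: sum(trials) / len(trials)
        for level, trials in responses_by_intensity.items()
    }
    peak = max(means.values())
    if peak <= 0:
        return 0.0  # assumed convention for an unresponsive neuron
    return means[max(means)] / peak
```

A monotonic rate/level function yields a ratio of 1.0, whereas a nonmonotonic function whose response at the highest intensity falls to half of its peak yields 0.5.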
Monte Carlo analysis was used to determine if neurons with the same spatial response properties were nonrandomly organized within AI. A computer program randomly assigned one of the grid locations and electrode depths to each of the studied neurons. The number of instances where adjacent recording locations contained neurons that were spatially sensitive, insensitive, or mixed (see results) was measured in 10,000 iterations. These simulations then were compared with the distribution observed experimentally. This analysis did not require the same pattern of different response classes to match that found experimentally (i.e., location by location); it only compared the likelihood that the same distributions would be observed by chance.
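The logic of such a shuffle test can be sketched in simplified form. The version below is not the original program: it assumes one class label per grid location on a two-dimensional grid, ignores electrode depth, defines adjacency as horizontal or vertical neighbors, and summarizes the comparison as the fraction of shuffles with at least as many same-class neighbor pairs as observed. All names are hypothetical.

```python
import random


def same_class_pairs(labels):
    """Count adjacent (horizontal or vertical) grid locations sharing a class.

    `labels` maps (row, col) grid coordinates to a class label such as
    "sensitive", "insensitive", or "mixed".
    """
    count = 0
    for (row, col), label in labels.items():
        for neighbor in ((row + 1, col), (row, col + 1)):
            if labels.get(neighbor) == label:
                count += 1
    return count


def shuffle_test(labels, iterations=10000, seed=0):
    """Return (observed count, fraction of shuffles with count >= observed)."""
    rng = random.Random(seed)
    sites = list(labels)
    classes = [labels[site] for site in sites]
    observed = same_class_pairs(labels)
    at_least = 0
    for _ in range(iterations):
        rng.shuffle(classes)  # reassign the classes to random grid locations
        if same_class_pairs(dict(zip(sites, classes))) >= observed:
            at_least += 1
    return observed, at_least / iterations
```

A small fraction would indicate that neurons of the same class cluster more than expected by chance; a fraction near the middle of the range would be consistent with no topographic organization.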
Psychophysical measures of sound-localization performance
Representative examples of psychometric functions from monkey L tested with different stimuli are shown in Fig. 2. In each panel, the performance is shown as a function of the stimulus eccentricity in azimuth (□) or elevation (●) relative to directly in front of the monkey (0° in azimuth and elevation); the dashed line in each plot shows the performance level of 0.50. Thresholds were calculated as the 0.50 performance level (see Recanzone et al. 1991) for changes in location to the left and right in azimuth and up and down in elevation. There was no statistically significant difference in threshold between left versus right in azimuth or up versus down in elevation (paired, 2-tailed t-test; P > 0.05), and thresholds were taken as the mean of the two measured values across sessions. Thresholds for the tone and noise stimuli are shown in Table 1 for azimuth (left columns) and elevation (right columns). Consistent with previous studies in the macaque monkey (Brown et al. 1978, 1980, 1982; Heffner and Heffner 1990; May et al. 1986) and human subjects (see Middlebrooks and Green 1991), localization in azimuth was best for the stimulus with the largest bandwidth, followed by the 1-octave band-passed noise, and finally the tone stimuli. For localization in elevation, monkey L was unable to reach the 0.50 performance level for any of the tone stimuli presented. Monkey M did reach a 0.50 performance level for the 1-, 2-, and 4-kHz tones, but only for the stimuli at the greatest eccentricities. For the band-passed noise stimuli, elevation thresholds could be measured in both monkeys, and they were greatest for the noise stimuli restricted to low-frequency components, and progressively improved as the stimulus included more high-frequency components. For broadband noise stimuli, thresholds for localization in elevation were only slightly higher than for localization in azimuth in monkey M and equivalent in monkey L.
In summary, localization ability was best for broadband noise and poorest for tone stimuli, and localization in azimuth was better than localization in elevation. These results were consistent with previous studies (Brown et al. 1978, 1980, 1982; Heffner and Heffner 1990; May et al. 1986).
Identification of primary auditory cortex
Histological verification of the recording locations is not yet possible as these monkeys are still participating in experiments. The data of this report are based on recording locations that have been physiologically identified as being in AI and CM based on the CF progression and frequency tuning bandwidth (see Kosaki et al. 1997; Merzenich and Brugge 1973; Morel et al. 1993; Rauschecker et al. 1997). Details of the frequency and intensity tuning properties of these neurons and the physiological criteria used to classify neurons as being within AI and CM have been reported previously (Recanzone et al. 2000) and are described only briefly here. Figure 3 shows the recording locations and the region physiologically defined as AI (heavy line) in both monkeys. A and B show the dorsal view of the recording grid, with shaded circles denoting cortical locations where the multiple-unit CF and threshold were qualitatively defined, and the filled circles show recording locations where the quantitative data of this report were collected. C and D show the CFs of neurons at these locations. Data also were collected after rotating the grid 30° counterclockwise to investigate intermediate grid locations (data not shown). For both monkeys, there was a caudal-to-rostral progression of locations where neurons had high to low CFs. At the rostral border in monkey M, there was a reversal of CF, consistent with the rostral field. Medial to AI, neurons were more broadly tuned (B) and responded better to band-passed stimuli and clicks, consistent with CM. These neurons commonly were tested with stimuli containing higher frequency components (>5 kHz), as most neurons responded briskly to these stimuli even though the CF could be defined as much lower. Laterally, neurons showed an increase in bandwidth and also had better responses to more complex stimuli (voices, fingers snapping, etc.), consistent with the lateral fields.
Recording locations also were compared with MRI images of the cerebral cortex in monkey L and were consistent with the investigated area being AI and CM based on the gross morphology of the superior temporal gyrus (e.g., Jones et al. 1995; Merzenich and Brugge 1973; Morel et al. 1993). The numbers of neurons recorded in AI and CM to each of the different stimulus types are shown in Table 2.
Figure 4 shows the responses of a representative AI neuron as a function of the spatial location of tones (Fig. 4 A) and broadband noise (Fig. 4 B). Each raster shows the response to each stimulus presentation and the poststimulus time histogram (PSTH) shows the sum of these responses for each location, oriented on the page at the relative location from the monkey's perspective (see Fig. 1 B). This neuron is representative of the sample from AI in that there was a consistent and robust response across spatial locations. Most AI neurons also showed a greater response for contralateral locations compared with ipsilateral locations for noise stimuli, and less apparent spatial sensitivity for tone stimuli (compare A and B). Figure 5 shows the responses to the same acoustic stimuli for a representative CM neuron. Contralateral locations elicited a greater response than ipsilateral locations for both tone (Fig. 5 A) and noise (Fig. 5 B) stimuli. A second feature common to CM neurons was a clear difference in the response as a function of the stimulus azimuth but less clear modulation in elevation. This is illustrated by comparing the responses across the middle row of PSTHs (azimuth tuning) to the middle column of PSTHs (elevation tuning).
These two representative examples show the general response features of most neurons, but there was considerable variability between neurons in both areas. For example, some neurons could have only onset responses with much lower responses (or inhibition) during the latter part of the stimulus presentation, whereas other neurons could show offset responses. A detailed description of the differences in response characteristics is inappropriate here; however, we have noted that ∼2/3 of AI and CM neurons had phasic responses to these stimuli, defined as responses between 100 and 200 ms after stimulus onset that were at least half the magnitude of the response during the first 100 ms after stimulus onset (Recanzone, unpublished observations). As noted in the preceding text, we saw no significant differences in the following results depending on whether responses were measured over the initial (100 ms) or throughout (250 ms) the response period relative to the onset of the stimulus, and therefore only data based on the responses over 250 ms are reported in the following text.
Multiple regression analysis was used to determine if the response was significantly correlated with the stimulus location. The results of this analysis are shown for the same two neurons in Fig. 6. Azimuth and elevation components are illustrated separately. Consistent with the visual inspection of the PSTHs, this quantitative analysis showed that for the AI neuron there was no significant correlation between the response and the location of the tone stimuli (Fig. 6, A and B) or the elevation of the noise stimuli (Fig. 6 D), but there was a significant correlation with the azimuth location of the noise stimuli (Fig. 6 C). Similarly, for the CM neuron there was a significant correlation between the response and the stimulus azimuth (Fig. 6, E and G) but not elevation (Fig. 6, F and H) for both tone and noise stimuli. This analysis was conducted on all neurons, and those that were significantly correlated with either the azimuth or elevation of either the tone or noise stimulus are termed “spatially sensitive.”
The percentages of spatially sensitive neurons as a function of the stimulus type are shown in Fig. 7. There were greater percentages of AI (left) and CM (right) neurons with higher firing rates for contralateral (•) than ipsilateral (□) locations (Fig. 7 A). For CM neurons, only those tested with stimuli containing high-frequency components are shown, as the numbers of neurons tested with other stimuli were too small to make valid comparisons (12 neurons tested with stimuli <1.5 kHz and 15 neurons tested with stimuli between 1.5 and 6 kHz; see Table 2), but all statistical tests incorporated all studied neurons. Across both AI and CM neurons, the proportion of neurons sensitive to the azimuth of noise stimuli was greater than the proportion of the same neurons that were sensitive to the azimuth location of tone stimuli (paired t-test; P < 0.05). There was no difference in the proportion of neurons sensitive to the azimuth location of either stimulus type between AI and CM neurons (P > 0.05). The proportion of neurons sensitive to the stimulus elevation is shown in Fig. 7 B. The only tone stimuli that showed a substantial percentage of neurons that were sensitive to the elevation location were those with frequencies >15 kHz, although the slopes generated from the regression analysis of these neurons were relatively low (see following text). Again, there was a difference in the proportion of neurons sensitive to the elevation of noise stimuli compared with tone stimuli (P < 0.05) but not between AI and CM neurons (P > 0.05).
These data are summarized in Fig. 7 C, where the percentages of neurons that were sensitive to the azimuth and elevation locations were pooled regardless of the direction of the slope of the regression line. Approximately half of the AI neurons and a greater percentage of CM neurons were sensitive to the azimuth location of the stimulus. This analysis also demonstrated that relatively few neurons were sensitive to the elevation of the tonal stimuli (□). It should be noted that we repeated the regression analysis using only the mean response for each stimulus location and found equivalent results. The main difference between the two analysis procedures was in neurons with relatively low firing rates that had P values in the range of 0.040–0.065, but the overall pattern of the results was unchanged.
One question raised by this analysis is whether neurons that were sensitive to the location of noise stimuli were also sensitive to the location of tone stimuli. For both tone and noise stimuli, there was a slightly greater percentage of CM neurons that were sensitive to the stimulus location compared with AI neurons (Fig. 8). This trend was reversed when the proportions of neurons that were sensitive to either tone, or noise, or both stimuli were compared, with ∼80% of neurons in both cortical areas being defined as spatially sensitive to at least one stimulus. The main difference between AI and CM neurons was that proportionally more CM neurons were sensitive to the spatial location of both tone and noise stimuli (Fig. 8, far right). In CM, 92% of the neurons sensitive to the azimuth of the tone stimulus were also sensitive to the azimuth of the noise stimulus, whereas this only occurred in 4% of AI neurons. Not shown in Fig. 8 is that the spatial sensitivity was largely consistent between the tone and noise stimuli. The exceptions were one CM and two AI neurons that had greater responses to contralateral tone stimuli and ipsilateral noise stimuli, and two CM neurons and two AI neurons that had greater responses to downward tone stimuli and upward noise stimuli. In all other cases, the response preference was consistent between the tone and noise stimuli.
The degree of the response modulation as a function of the stimulus location was defined by the normalized slope of the regression line. The response on each trial was normalized to the averaged response to the center stimulus, using the greater of the responses between tone and noise stimuli. Slopes are expressed in units of percentage change in activity/degree. The frequency distributions of the slopes for AI and CM neurons tested with broadband noise and >15-kHz tone stimuli are shown for monkey L in Fig. 9. This case was chosen for illustration as there was the greatest number of both AI and CM neurons tested with these stimuli (Table 2). The modulation of the response as a function of stimulus location was greater for neurons in CM compared with AI for these stimuli as well as across all recorded neurons for each stimulus tested (ANOVA; P < 0.05).
In summary, ∼80% of the AI and CM neurons tested were modulated by the spatial location of the stimulus. Neurons that were spatially sensitive to the stimulus elevation most commonly were found using noise stimuli and rarely encountered using tone stimuli. AI neurons were rarely spatially sensitive to both tone and noise stimuli, whereas approximately half of the CM neurons were spatially sensitive to both stimulus types, and most neurons classified as spatially sensitive to tone stimuli were also spatially sensitive to noise stimuli. These data indicate that there is a transformation of spatial sensitivity of neurons and suggest that the input integrated from spatially sensitive AI neurons could account for the responses of at least some CM neurons.
Correlation of responses with psychophysical performance
A major goal of these experiments was to compare the spatial responses of single neurons to the ability of these monkeys to localize the same acoustic stimuli. More neurons were spatially sensitive to noise than tone stimuli, consistent with the behavioral results. There were also relatively few, or no, neurons that were sensitive to the elevation of the stimulus for tone or noise stimuli containing only low frequencies, again consistent with the poor localization in elevation of these stimuli. These relationships indicate that the general spatial sensitivity of auditory cortical neurons is consistent with the sound-localization performance.
To directly correlate the neuronal and behavioral data, the mean and SD of the activity of each single neuron were measured at each of the locations also tested in the psychophysical paradigm (0 and ±10 and 30° in azimuth and elevation). The spatial location that corresponded to a difference of 1 SD from the mean response to the center location then was computed by linear interpolation. Examples of this analysis from the same AI and CM neurons shown in Figs. 4 and 5 are shown in Fig. 10 A. This analysis is similar to signal detection theory analysis (Green and Swets 1966), in that the variance in the neuronal measure is used to predict the behavioral data. For each neuron, the predicted thresholds were taken as the average of the contralateral and ipsilateral estimates in azimuth and upward and downward estimates in elevation. These neuronal predictions were compared with the behavioral data as the ratio of the predicted threshold divided by the measured threshold. Perfect correspondence of these two measures gives a ratio of 1.0. The frequency distribution of this ratio across all AI and CM neurons is shown in Fig. 10, B–D. To prevent the spatially insensitive neurons from driving the population mean to infinity, the maximum ratio was arbitrarily set at 60, which was the highest ratio encountered for all spatially sensitive neurons. The population of CM neurons had significantly smaller ratios than the population of AI neurons for azimuth localization of noise stimuli (mean ± SD: 8.7 ± 10.7 vs. 2.3 ± 1.2 for AI and CM, respectively; P < 0.05), elevation localization of noise stimuli (7.4 ± 13.3 vs. 6.0 ± 6.0; P < 0.05), and azimuth localization of tone stimuli (8.4 ± 9.3 vs. 6.2 ± 5.3; P < 0.01). We were unable to perform the same analysis for tone stimuli tested in elevation due to the inability of these monkeys to accurately localize these stimuli (see preceding text and Table 1).
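The threshold prediction and the predicted/measured ratio can be sketched for a single direction (for example, contralateral azimuth). Several details are assumptions rather than statements of the original procedure: the sample SD is used, locations are given as eccentricities in degrees with 0 as the center, a neuron that never changes its mean response by 1 SD returns `None`, and such neurons are assigned the ceiling ratio of 60.

```python
import statistics


def neuronal_threshold(rates_by_eccentricity):
    """Eccentricity (deg) where the mean response differs from the center mean by 1 SD.

    `rates_by_eccentricity` maps eccentricity (0 = center) to a list of
    single-trial firing rates. Uses linear interpolation between tested
    locations; returns None if the 1-SD criterion is never reached.
    """
    center = rates_by_eccentricity[0]
    center_mean = statistics.mean(center)
    center_sd = statistics.stdev(center)  # sample SD (an assumption)
    prev_loc, prev_diff = 0.0, 0.0
    for loc in sorted(l for l in rates_by_eccentricity if l > 0):
        diff = abs(statistics.mean(rates_by_eccentricity[loc]) - center_mean)
        if diff >= center_sd:
            if diff == prev_diff:
                return float(loc)
            return prev_loc + (center_sd - prev_diff) * (loc - prev_loc) / (diff - prev_diff)
        prev_loc, prev_diff = float(loc), diff
    return None


def threshold_ratio(predicted, behavioral, ceiling=60.0):
    """Predicted/measured threshold ratio, capped at `ceiling`."""
    if predicted is None:
        return ceiling
    return min(predicted / behavioral, ceiling)
```

In the analysis described above, the estimates for the two directions along each axis would then be averaged before the ratio is taken.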
These large means are the result of many spatially insensitive neurons. Restricting the analysis to neurons with ratios of ≤2.0 indicated that for azimuth localization of noise stimuli 31.3% of AI neurons and 39.1% of CM neurons could predict the behavioral threshold to within a factor of 2 or better. Similarly, 24.9% of AI neurons and 27.9% of CM neurons had ratios <2 for localization of noise stimuli in elevation, and 15.7% of AI and 32.0% of CM neurons had ratios <2 for azimuth localization of tone stimuli. Finally, there was a substantial number of neurons in both AI and CM that predicted behavioral thresholds that were better than those of the monkeys. For azimuth localization of noise stimuli, 14.2 and 23.2% of neurons had ratios <1.0 in AI and CM, respectively. There was no difference for the elevation localization of noise stimuli (8.5% of both AI and CM neurons), but a lower percentage of AI neurons predicted thresholds better than the monkey for azimuth localization of tone stimuli (4.1 vs. 8.7% for AI and CM, respectively). These analyses indicate that neurons in CM were better able to predict the behavioral thresholds compared with neurons in AI, although neurons in both cortical areas could accurately predict the behavioral performance.
This analysis shows that individual neurons could predict thresholds better than, at, and worse than those measured behaviorally. Similar distributions of the ability of single neurons to predict behavioral thresholds have been described for visual motion processing areas of the cerebral cortex (e.g., see Britten et al. 1992; Celebrini and Newsome 1994). We therefore considered the possibility that populations of neurons could more effectively predict the behavior than single neurons. For this analysis, the responses were pooled between either all neurons studied or only the spatially sensitive neurons. Neurons were pooled based on the stimulus used in defining their response (e.g., low-frequency tones and band-passed stimuli) and correlated with the behavioral thresholds for those same stimuli. The response on each trial was normalized to the peak activity for each individual neuron before pooling to normalize across firing rates, and then the same analysis as described in the preceding text (Fig. 10 A) was conducted. Figure 11 A shows the results for both monkeys comparing the prediction from pooling all neurons (x axis) or only the spatially tuned neurons (y axis) for the azimuth thresholds for both tone and noise stimuli, and the elevation thresholds for noise stimuli (21 comparisons total). Pooling only the spatially sensitive neurons resulted in a closer correspondence between the predicted and measured threshold (most points fall below the diagonal line of Fig. 11 A) compared with pooling all neurons. This was particularly true for CM neurons, where pooling the responses of spatially sensitive neurons consistently resulted in a predicted/measured ratio near 1.0 (dashed horizontal line of Fig. 11 A).
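The pooling step can be illustrated with a short Python sketch. It assumes that "peak activity" means the largest mean response of a neuron across locations and that each neuron's data are stored as a mapping from location to per-trial spike counts; both the data layout and the function name are hypothetical.

```python
from statistics import mean, stdev

def pool_normalized(neurons):
    """Pool trial-by-trial responses across neurons after normalizing each
    neuron to its own peak mean response (an assumed definition of peak).

    neurons: list of dicts mapping location (deg) -> list of spike counts.
    Returns a dict mapping location -> (mean, SD) of the pooled,
    normalized trials, ready for the same 1-SD threshold analysis.
    """
    pooled = {}
    for responses in neurons:
        peak = max(mean(trials) for trials in responses.values())
        if peak == 0:
            continue  # a silent neuron carries no rate information
        for location, trials in responses.items():
            pooled.setdefault(location, []).extend(t / peak for t in trials)
    return {loc: (mean(vals), stdev(vals)) for loc, vals in pooled.items()}

# Two hypothetical neurons with very different firing rates but the same
# relative spatial profile contribute equally after normalization.
low_rate = {0: [2, 2], 30: [4, 4]}
high_rate = {0: [10, 10], 30: [20, 20]}
summary = pool_normalized([low_rate, high_rate])
```

Normalizing before pooling keeps high-rate neurons from dominating the population average, which is the design concern the text raises when it later repeats the analysis without normalization.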
These results are summarized in Fig. 11 B, where the mean and SD of the data from Fig. 11 A are shown for AI (open bars) and CM (closed bars) when all neurons (All) or only the spatially sensitive neurons (Sens) were pooled. For both AI and CM, pooling neurons strengthened the correlation between the neuronal prediction and the behaviorally measured threshold. In AI, the ratio of the population of spatially sensitive neurons was 2.00, which was significantly different from a population with a mean of 1.0 (t-test; P < 0.05). In CM, the ratio from the population of spatially sensitive neurons was 1.38, which was different from AI (P < 0.05) but not different from a population with a mean of 1.0 (P > 0.05). These data indicate that the pooled responses of spatially sensitive neurons in CM retained sufficient information to account for the behaviorally measured performance across stimulus types in these two monkeys.
One issue of concern is that pooling the neurons gave better predictions than averaging the individual neurons, which is due in large part to the normalization. It may be that this normalization procedure is inappropriate, as neurons with higher firing rates could be providing a larger signal than neurons with low firing rates. This seems unlikely as the overall activity had little effect on the spatial sensitivity of the neurons (see following text). To further address this question, the neuronal responses were pooled without normalization and subjected to the same regression analysis as was done for the single-neuron responses (i.e., Fig. 6). This analysis revealed that the slopes of the regression lines and the correlation coefficients were significantly greater for the population of spatially sensitive neurons compared with the population of all neurons in both AI and CM (t-test; P < 0.05). The slopes of the regression lines and the correlation coefficients were also greater in CM than in AI across all comparisons of azimuth and elevation localization (P < 0.05). This analysis shows that the normalization procedure did not qualitatively affect the main findings.
Relationship between spatially sensitive and spatially insensitive neurons
The next level of analysis was designed to determine if there were any response characteristics that would reveal the spatially sensitive neurons as a distinct subpopulation, or if all neurons form a continuum from no spatial sensitivity to high spatial sensitivity. The distributions of regression line slopes shown in Fig. 9 indicate that the spatial sensitivity forms a unimodal distribution. The same was true if the data were pooled across stimulus types and monkeys (data not shown).
The first possibility considered was that spatially sensitive neurons comprised those with relatively high or low firing rates. Comparison of the average activity across all locations tested showed no statistically significant difference between monkeys or the stimuli used (ANOVA; P > 0.05 for all comparisons), and the data therefore were pooled. The firing rates between the spatially sensitive and insensitive neurons were not statistically significantly different from each other (P > 0.05). A similar result was noted when the activity was normalized to either the frontal location that gave the greatest activity, the center location, or the speaker at 90° contralateral to the recording location (all P values >0.05). The spontaneous activity was also not different between spatially sensitive and insensitive neurons (P > 0.05). The intensity tuning of the neurons also was defined (see methods). Previous studies have indicated that there were no significant differences in the intensity tuning of neurons in AI and CM (Recanzone et al. 2000). Comparison of the intensity tuning between the neurons that were defined as spatially sensitive and those that were not showed no significant difference (Kolmogorov-Smirnov; P > 0.05). We also compared the populations of neurons that were classified as having sustained responses (response during 100–200 ms at least half the response during 0–100 ms from stimulus onset) or phasic responses (nonsustained) and again saw no differences between spatially sensitive and spatially insensitive neurons (P > 0.05).
It had been previously noted in the cat that the best location in azimuth of cortical neurons tended to cluster within a given region (e.g., Clarey et al. 1994; Imig et al. 1990; Middlebrooks et al. 1998; Rajan et al. 1990b). Although the recording locations were generally separated by 1 mm, in many instances two or three neurons were recorded simultaneously from the same electrode, so it was possible to determine if spatially sensitive neurons were clustered together. The percentages of pairs of neurons (39 pairs) and triplets of neurons (28 triplets) that had the same spatial sensitivity (either all sensitive or all insensitive) were significantly greater than those expected by chance (χ2; P < 0.05). The analysis of CM neuron pairs (13 pairs) and triplets (8 triplets) showed a similar result, again with more neighboring neurons sharing the same spatial selectivity than expected by chance (χ2; P < 0.05). It should be noted that although these differences were statistically significant, there was still a significant percentage of neuron pairs and triplets in which both spatially sensitive and insensitive neurons were encountered (30–40% for both AI and CM neurons). Thus spatially sensitive neurons were locally clustered together, but this clustering was not complete.
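One way such a comparison against chance could be set up for neuron pairs is sketched below in Python. The text does not state how the chance expectation was computed, so the independence model used here (each neuron sensitive with probability p, independently of its neighbor) is an assumption, as are the function name and the example counts.

```python
def pair_concordance_chi2(n_same, n_pairs, p_sensitive):
    """Chi-square goodness-of-fit statistic (1 df) comparing the observed
    number of concordant pairs with the number expected if the two neurons
    at an electrode were classified independently (assumed chance model).

    n_same: pairs in which both neurons share the same classification.
    n_pairs: total number of pairs.
    p_sensitive: overall proportion of spatially sensitive neurons (0 < p < 1).
    Returns (chi2, significant_at_0_05).
    """
    if not 0.0 < p_sensitive < 1.0:
        raise ValueError("p_sensitive must lie strictly between 0 and 1")
    # Probability that two independent neurons are both sensitive or both not.
    p_same = p_sensitive ** 2 + (1.0 - p_sensitive) ** 2
    expected_same = n_pairs * p_same
    expected_mixed = n_pairs - expected_same
    observed_mixed = n_pairs - n_same
    chi2 = ((n_same - expected_same) ** 2 / expected_same
            + (observed_mixed - expected_mixed) ** 2 / expected_mixed)
    return chi2, chi2 > 3.841  # 3.841 is the 0.05 critical value for 1 df

# Hypothetical counts: 27 of 39 pairs concordant, half of all neurons sensitive.
chi2, significant = pair_concordance_chi2(27, 39, 0.5)
```

The same logic extends to triplets by replacing the concordance probability with p cubed plus (1 - p) cubed under the same independence assumption.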
Although the sampling density of recording locations was relatively sparse, it was still possible to determine if there was a nonrandom distribution of spatially sensitive neurons in AI using a Monte Carlo analysis (see methods). The number of one, two, and three adjacent recording locations that had the same spatial response type (sensitive, insensitive, or mixed), regardless of their spatial position, was compared with that expected by chance. The results showed that the number of adjacent recording locations that shared the same response type was not significantly different from chance for either tone or noise stimuli. Thus although spatially sensitive and insensitive neurons have a tendency to be clustered together locally, there was no systematic organization of these neurons across AI.
Spatially tuned neurons
In addition to the spatially sensitive neurons, we also encountered neurons that we have termed spatially “tuned,” as they had a response of ≥75% of the maximum response for one to three adjacent spatial locations and firing rates below this level for all other locations. These neurons were not classified as spatially sensitive because the regression analysis did not reach statistical significance due to the few locations with high activity surrounded by locations with lower activity. This classification was restricted to neurons that were completely bounded within the central 60° of acoustic space, as neurons responsive to only the edges of the tested region may have been classified as spatially sensitive if greater eccentricities had been tested. There were 21/353 AI neurons tested with tone stimuli (5.6%) and 31/353 neurons tested with noise stimuli (5.9%) that were classified as spatially tuned, but no neurons in CM for either tone or noise stimuli. It should be noted that these AI neurons still responded to most stimulus locations, but not above the 75% criterion. The same analysis based on 50% of the maximum response resulted in only five neurons that could be defined as spatially tuned.
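The "spatially tuned" criterion can be illustrated with a simplified Python sketch. It treats the speaker locations as a one-dimensional row ordered by azimuth, which is a simplification of the two-dimensional array used in the experiments, and the function name and example responses are hypothetical.

```python
def is_spatially_tuned(responses, criterion=0.75, max_run=3):
    """Return True if the locations with responses >= criterion * peak form a
    single run of one to three adjacent locations that does not touch either
    edge of the tested range (so the receptive field is fully bounded).

    responses: mean responses ordered by location along one dimension.
    """
    peak = max(responses)
    if peak <= 0:
        return False
    above = [i for i, r in enumerate(responses) if r >= criterion * peak]
    contiguous = above[-1] - above[0] + 1 == len(above)
    bounded = above[0] > 0 and above[-1] < len(responses) - 1
    return contiguous and len(above) <= max_run and bounded

# A circumscribed peak near the midline qualifies; a peak at the edge of the
# tested range does not, because it might extend beyond the tested region.
tuned = is_spatially_tuned([2, 3, 10, 9, 3, 2, 1])
edge = is_spatially_tuned([10, 9, 3, 2, 1, 1, 1])
```

Lowering `criterion` to 0.5 makes the test stricter in practice, because more locations exceed the cutoff and fewer neurons keep a run of three or fewer, consistent with the drop to five neurons reported in the text.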
The spatial receptive fields of these AI neurons are shown in Fig. 12. These receptive fields were defined as the region with ≥75% of the peak response by interpolating between measured points using an inverse-square algorithm that weighted responses from ≤12° from the measured location but did not alter the measured values. Each panel represents the receptive fields defined for both monkeys L (thin lines) and M (thick lines) for each of the eight stimulus types. These neurons tended to represent contralateral space and there were several examples of spatial receptive fields that crossed the midline, but few neurons had receptive fields restricted to ipsilateral space (1/353 and 5/353 for tone and noise stimuli, respectively).
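An inverse-square interpolation of this kind can be sketched in Python as standard inverse-distance weighting with an exponent of 2. This is a generic version written under assumptions, not the algorithm used for Fig. 12: the 12° limit is treated as a hard search radius, and the function name and sample values are hypothetical.

```python
import math

def inverse_square_response(az, el, measured, radius=12.0):
    """Interpolated response at (az, el) in degrees.

    measured: dict mapping (azimuth, elevation) -> measured response.
    Measured points within `radius` degrees are weighted by 1 / distance^2.
    A query at a measured location returns the measured value unchanged.
    Returns None if no measured point lies within the radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (m_az, m_el), response in measured.items():
        distance = math.hypot(az - m_az, el - m_el)
        if distance == 0.0:
            return response  # measured values are never altered
        if distance <= radius:
            weight = 1.0 / distance ** 2
            weighted_sum += weight * response
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None

# Hypothetical responses at two neighboring speaker locations.
samples = {(0.0, 0.0): 10.0, (10.0, 0.0): 20.0}
midpoint = inverse_square_response(5.0, 0.0, samples)
near_center = inverse_square_response(2.0, 0.0, samples)
```

The receptive field would then be the set of interpolated points whose value is at least 75% of the peak response.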
Because these neurons had receptive fields localized near the midline, if this class of neurons were topographically organized in AI, they should be observed in discrete regions. Figure 13 shows the spatial receptive fields of neurons from monkey M tested with 5–10 kHz band-passed noise (Fig. 13 A) and their corresponding recording locations (Fig. 13 B). Although spatially tuned neurons were recorded at adjacent locations, they commonly had large differences in their receptive field locations (e.g., a and e), and overlapping receptive fields were recorded several millimeters apart in cortex (e.g., b and c). Similar results were noted for the other stimulus types investigated, and the recording locations of all other spatially tuned neurons are shown in Fig. 13 B as ●. These results indicate that there is no topographic organization of the spatially tuned neurons, similar to the lack of topography noted for the spatially sensitive neurons.
The experiments described in the preceding text were conducted using five different speakers (Fig. 1), and therefore there is a remote possibility that either the single-neuron responses or the behavioral thresholds were influenced by speaker-specific characteristics. Measurements of the acoustic stimuli showed that the intensity at each frequency was within 3 dB across all five speakers, and the intensity of the stimuli was randomly varied between trials by this amount, although the phase of each frequency component showed greater variability. To verify that these phase differences were not an issue, an ANOVA was conducted to determine if the neuronal response was affected by the speaker identity and not the location. The results showed that only seven neurons in AI and one neuron in CM had responses that could be attributed to the speaker identity out of the 942 comparisons (353 + 118 neurons, 2 stimulus conditions; P < 0.05). Each of these neurons had low activity levels (range: 0.5–3 spikes/stimulus at the location with the peak response), and it is likely that this contributed more to the statistical difference than any biological response to the specific speaker characteristics. In addition, only one of these neurons also was classified as spatially sensitive, and none was classified as spatially tuned. Thus we conclude that the spatial selectivity of these neurons that we observed was based on the spatial location of the stimuli and not a sensitivity to the characteristics of a particular speaker.
It is also possible that pinna movements made by the monkey could affect the neuronal activity. Because the monkey had no knowledge of the location of the next stimulus, it could not adopt a unique pinna orientation for any specific location. However, some of the neuronal variability we measured could occur if the monkeys adopted a wide range of pinna orientations throughout a session, for example, for neurons that were strongly modulated over the same intensity range that the different pinna orientations produced. For example, if the monkey made more pinna movements in sessions using a certain stimulus type, that could have increased the variability in the neuronal response and thereby altered the predictive power of thresholds described in the preceding text. Representative pinna positions measured in different sessions are shown for both monkeys in Fig. 14. Regardless of the stimuli used, the distributions of pinna positions were very similar: the pinnae were oriented within a 10–15° range on 80% of the trials either within a session or between sessions. We conclude that it is unlikely that the pinna orientation could have substantially affected our present results.
These experiments are the first in a series of studies from this laboratory to determine the neural correlates of sound-localization in the awake primate auditory cortex. The responses of single neurons in primary auditory cortex and the caudomedial field were recorded while monkeys localized sounds of different frequencies and bandwidths. Approximately 80% of the neurons in both AI and CM were modulated by the spatial location of either a tone or noise stimulus (or both) within this region of frontal space with the majority having greater responses to contralateral stimuli. Spatially sensitive neurons could not be distinguished by their activity levels or rate/level functions and were not topographically organized. Neurons in both cortical areas had responses that were consistent with the behavioral performance, and pooling the responses of the spatially sensitive neurons improved the correlation between the neuronal response and the behavioral performance in both cortical areas. The pooling also showed that relatively small populations of CM neurons were better able to predict behavioral performance than AI neurons. We first will discuss the similarities of these results to those described in previous studies and then relate how these results support the hypothesis that acoustic space is serially processed from AI to CM.
Relationship to previous studies
Previous studies in auditory cortical fields in both the cat and monkey have reported neurons that were modulated by the spatial location of the stimulus (e.g., monkey: Ahissar et al. 1992; Benson et al. 1981; cat: Barone et al. 1996; Brugge et al. 1996; Imig et al. 1990; Middlebrooks and Pettigrew 1981; Middlebrooks et al. 1998; Rajan et al. 1990a). Our results are consistent with those findings even though we tested a much more restricted region of acoustic space (60° compared with 140–360°). This was not unexpected as the majority of neurons reported to date had responses that were strongly modulated near the midline and therefore would be expected to be revealed even when restricting the area of investigation to the frontal region. It may be the case that significantly more spatially sensitive neurons would have been encountered, particularly in CM, if a wider range of spatial locations had been tested.
Analysis of the cortical location of spatially sensitive neurons was similar to that described previously in the cat, in that there was some local clustering (e.g., Clarey et al. 1994; Imig et al. 1990; Middlebrooks et al. 1988; Rajan et al. 1990b). However, this clustering was not complete as there were many instances in which spatially sensitive and spatially insensitive neurons were recorded at the same electrode.
Although there were many similarities between the neuronal responses in the behaving monkey compared with those in the anesthetized cat, there were also several differences. In the monkey we found many neurons in both AI and CM with firing rates higher than those normally seen in the cat. It was typical in our sample to have neurons with firing rates of 200 spikes/s after the onset of the stimulus, and most of our neurons had firing rates of 10 spikes/stimulus, which is consistent with previous reports from the awake monkey (Ahissar et al. 1992; Benson et al. 1981; Pfingst and O'Connor 1981). The difference in firing rates may reflect differences between species, the stimuli used, or between the anesthetized and unanesthetized states. Most previous studies in cat have employed broadband noise stimuli and studied neurons with relatively high characteristic frequencies, whereas we also used band-passed and tone stimuli. In our experience, few neurons with characteristic frequencies <10 kHz were well driven by noise stimuli. Previous studies also have indicated that the activity of auditory cortical neurons can be strongly modulated by the attentive and/or behavioral state of the animal (e.g., Benson et al. 1981; Hubel et al. 1959; Miller et al. 1972, 1980). These factors combined would likely account for most of the differences we have observed in this experiment compared with previous studies in the anesthetized cat.
A second difference was that the rate/level functions could not distinguish between the spatially sensitive and insensitive neurons in contrast to previous studies in the cat where the most directional neurons also had the most nonmonotonic rate/level functions (Clarey et al. 1994; Imig et al. 1990; Samson et al. 1993a,b). In the cat studies, a greater range of acoustic space was tested in azimuth, and the classification of the spatial responses was very different from ours. It is therefore possible that the high- and low-directionality units classified by Imig and colleagues would be classified as spatially sensitive by restricting the range of acoustic space to the frontal region.
Spatial processing by the primate auditory cortex
Across the population, the spatial sensitivity of AI and CM neurons was consistent with the sound-localization ability in the macaque (Brown et al. 1978, 1980, 1982; Heffner and Heffner 1990; May et al. 1986) as well as humans (see Middlebrooks and Green 1991). The neuronal responses showed better spatial selectivity for noise than tone stimuli in azimuth, intermediate spatial selectivity for noise stimuli in elevation, and very poor spatial selectivity to tone stimuli in elevation. Recently it has been suggested that the primate auditory cortex processes information in two serial and parallel processing streams (e.g., Kaas et al. 1999; Rauschecker 1998), similar to the ventral “what” and dorsal “where” processing streams in the visual cortex (Ungerleider and Mishkin 1982). There is good anatomic evidence in favor of this hypothesis, particularly in the projection patterns between the thalamus and the “core” (AI and R), “belt” (CM and lateral fields), and “parabelt” areas of auditory cortex as well as by the intracortical connections between these and different cortical areas (e.g., Hackett et al. 1998a,b; Jones et al. 1995; Kosaki et al. 1997; Molinari et al. 1995; Morel et al. 1993; Rauschecker et al. 1997; Romanski et al. 1999). The limited available electrophysiological evidence is also in favor of this hypothesis as neurons in the lateral fields respond better to complex stimuli such as band-passed noise and vocalizations (Rauschecker et al. 1995), and CM neurons depend on inputs from AI for responses to tone but not noise stimuli (Rauschecker et al. 1997). The broad frequency tuning of neurons in CM compared with AI (Merzenich and Brugge 1973; Recanzone et al. 2000) appears to make these neurons ideally suited for spatial processing, and it was suggested that CM form part of the “where” processing stream in auditory cortex (Rauschecker 1998).
There were several key observations from this study that support the hypothesis that auditory spatial information is serially processed between AI and CM. First, there were relatively few AI neurons that were spatially sensitive to both tone and noise stimuli, whereas nearly all of the CM neurons that were sensitive to the azimuth location of tones were also spatially sensitive to noise. AI neurons may process the different spatial cues independently, for example, the interaural difference cues would give rise to spatial sensitivity to tone stimuli, whereas spectral cues would give rise to spatial sensitivity to noise stimuli. CM neurons then could combine the inputs from these different AI neurons to give rise to spatial sensitivity to both tones and noise.
A second observation was the lack of topography of spatially sensitive neurons in AI. This is consistent with anatomic evidence indicating that CM neurons receive input from many sparsely located neurons across AI (e.g., Jones et al. 1995; Morel et al. 1993; Rauschecker et al. 1997) and therefore could integrate information from spatially sensitive neurons distributed across AI. This is supported further by the greater percentage of neuron pairs and triplets that were both or all spatially sensitive in CM compared with AI.
The third observation was that the spatial response properties of CM neurons, either individually or pooled, predicted the behavioral performance more accurately than AI neurons. If CM neurons integrate the outputs of the spatially sensitive AI neurons, then the predictions of threshold from the population of CM neurons should be close to those of the spatially sensitive AI neurons, which we observed. When the spatially sensitive neurons in CM were pooled, the ability of these neurons to predict the behavioral threshold was enhanced and was not significantly different from a 1:1 correspondence. Neurons integrating the output of the spatially sensitive neurons in CM therefore would have sufficient information to account for the behavioral performance. Thus the anatomical and electrophysiological data support the hypothesis that spatial information is processed in AI (as well as nonspatial information necessary for stimulus identification, or “what”), which then is integrated by CM neurons to form a better representation of acoustic space.
Direct comparison between the neuronal and behavioral data indicated that ∼5–20% of the neurons predicted thresholds lower than those observed experimentally. It is not uncommon to observe neuronal responses that are superior to the behavioral performance based on ideal observer models (e.g., Britten et al. 1992; Celebrini and Newsome 1994). Better agreement between neuronal and behavioral data can be achieved by pooling small populations of weakly correlated neurons that are not optimally tuned to the stimulus being discriminated (e.g., Shadlen and Newsome 1998; Shadlen et al. 1996). Because all neurons were tested using the same stimulus locations, we presumably recorded from neurons that may have been better tuned to other regions of acoustic space. Pooling the responses of all such neurons resulted in good predictions of the behavioral data, consistent with recent results showing better predictions of sound-localization behavior based on spike pattern information from A2 neurons in the cat (Furukawa et al. 2000).
Although these data are consistent with serial processing between AI and CM, they are not conclusive and key issues remain to be resolved. For example, AI lesions strongly affect the responses of CM neurons to tone but not noise stimuli (Rauschecker et al. 1997). This indicates that CM receives auditory input independent from AI, consistent with the nonoverlapping thalamic projections (Molinari et al. 1995; Morel et al. 1993; Rauschecker et al. 1997). Thus one question is whether the spatial processing of tone and noise stimuli is independent of each other or whether the responses to noise stimuli independent of AI reflect a different aspect of acoustic processing. An alternative interpretation is that AI and CM neurons process spatial information independently.
Two limitations of this study are that we have used spike rates of individual neurons and grand averages of populations of neurons to correlate with the behavioral performance, and these neuronal responses were only tested at a single intensity. These techniques provide a good first estimate of encoding schemes that could be used in the representation of acoustic space. However, other types of encoding schemes are also likely, including the temporal pattern of the responses or the interactions between adjacent neurons or small populations of neurons (e.g., Ahissar et al. 1992; Gottlieb et al. 1989; Middlebrooks et al. 1994, 1998; Vaadia and Abeles 1987; Xu et al. 1998, 1999). A recent study has shown that the spatial information contained in the temporal pattern of the responses is enhanced by pooling populations of neurons in anesthetized cats (Furukawa et al. 2000). Therefore incorporating additional response features beyond simple spike rates will likely improve the correlations in the awake monkey that we observed.
The results of this study indicate that the responses of both AI and CM neurons are modulated by the spatial location of stimuli presented at moderate intensity levels (65 dB SPL). Sound-localization performance is relatively stable over a wide range of stimulus intensities (Alshuler and Comalli 1975; Comalli and Alshuler 1976; Recanzone et al. 1998), yet the activity of many neurons in these cortical areas is strongly modulated by stimulus intensity (Recanzone et al. 2000). However, the correlations in this study were based on the relative response rates across stimulus locations and therefore should hold across a broad range of intensities as is seen in event-related potential recordings in behaving monkeys (Phan and Recanzone 2000). Intensity invariance also has been seen with respect to the temporal firing pattern of neurons in nonprimary auditory cortex of the cat (Furukawa et al. 2000; Middlebrooks et al. 1998) and likely also would hold true in the awake monkey.
Regardless of how the auditory cortex processes spatial information, this report describes the first evidence that the responses of CM neurons are consistent with the behaviorally measured ability to localize tone and noise stimuli in both azimuth and elevation. The results of this study lead us to believe that CM plays an integral role in auditory spatial processing in the primate, and they support the hypothesis that auditory spatial information is processed serially in the primate cerebral cortex.
The authors thank K. H. Britten, L. A. Krubitzer, K. O. O'Connor, and M. L. Sutter for helpful comments on previous versions of this report, and the California Regional Primate Research Center for expert veterinary care.
This work was funded by National Institute on Deafness and Other Communication Disorders Grant DC-02371, The Klingenstein Fund, and the Sloan Foundation (all to G. H. Recanzone).
Address for reprint requests: G. H. Recanzone, Center for Neuroscience, 1544 Newton Ct., Davis, CA 95616.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2000 The American Physiological Society