The integration of visual and auditory events is thought to require a joint representation of visual and auditory space in a common reference frame. We investigated the coding of visual and auditory space in the lateral and medial intraparietal areas (LIP, MIP) as a candidate for such a representation. We recorded the activity of 275 neurons in LIP and MIP of two monkeys while they performed saccades to a row of visual and auditory targets from three different eye positions. We found 45% of these neurons to be modulated by the locations of visual targets, 19% by auditory targets, and 9% by both visual and auditory targets. The reference frame for both visual and auditory receptive fields ranged along a continuum between eye- and head-centered reference frames with ∼10% of auditory and 33% of visual neurons having receptive fields that were more consistent with an eye- than a head-centered frame of reference and 23 and 18% having receptive fields that were more consistent with a head- than an eye-centered frame of reference, leaving a large fraction of both visual and auditory response patterns inconsistent with both head- and eye-centered reference frames. The results were similar to the reference frame we have previously found for auditory stimuli in the inferior colliculus and core auditory cortex. The correspondence between the visual and auditory receptive fields of individual neurons was weak. Nevertheless, the visual and auditory responses were sufficiently well correlated that a simple one-layer network constructed to calculate target location from the activity of the neurons in our sample performed successfully for auditory targets even though the weights were fit based only on the visual responses. 
We interpret these results as suggesting that although the representations of space in areas LIP and MIP are not easily described within the conventional conceptual framework of reference frames, they nevertheless process visual and auditory spatial information in a similar fashion.
What we see affects how we interpret what we hear and vice versa. The capacity to combine information from our different senses helps us understand and respond appropriately to our environment. Yet merging visual and auditory information would be at best pointless and at worst counterproductive if the wrong elements of the visual and auditory environments are combined. For instance, speech comprehension is enhanced by successfully associating the speech sounds to the sight of moving lips—that is, by linking visual and auditory components that are causally related to one another. But linking the speaker's voice to an unrelated component of the visual scene would be useless or even detrimental to speech comprehension. Thus it is important for the nervous system to correctly determine which auditory and visual stimuli arise from the same event before integrating them into a common percept.
A simple way to determine which visual images should be associated with which sounds is by comparing their spatial locations. In general, visual and auditory stimuli that arise from the same location in space are likely to be generated by the same source. Determining whether a sight and sound are at the same location seems effortless, but actually poses a complex computational challenge for the brain because the spatial components of visual and auditory processing are so different from one another. The retina provides the brain with a map of the locations of visual stimuli with respect to the direction of gaze. In contrast, the auditory system computes the locations of auditory stimuli based on interaural timing and level differences as well as spectral cues (for reviews, see Cohen and Knudsen 1999; Kelly et al. 2002). These cues provide information concerning the position of the sound source with respect to the head and ears. The challenge for the brain lies in the fact that primates and other animals can move their eyes with respect to their heads. As a result, the relationship between a visual location on the retina and a particular difference in sound level or arrival time across the two ears is dynamic and depends on the current position of the eyes in their orbits.
In theory, the brain might solve this problem by creating a unified representation of visual and auditory space in a common frame of reference. The visual and auditory receptive fields of individual neurons ought to be in alignment with one another and to maintain this alignment despite movements of the eyes. In this study, we sought evidence for this type of representation in the lateral and medial banks of the intraparietal sulcus in the areas commonly referred to as LIP and MIP.
Areas LIP and MIP have been implicated as a potential locus for integrating visual and auditory signals. Neurons in this region are known to be sensitive to the locations of both visual and auditory stimuli (LIP: Ben Hamed et al. 2001, 2002; Cohen et al. 2004; Gifford and Cohen 2004; Mazzoni et al. 1996b; Platt and Glimcher 1998; Stricanne et al. 1996; MIP: Cohen and Andersen 2000). Multimodal activation has also been found in the human parietal cortex (Bremmer et al. 2001; Bushara et al. 1999, 2003; Cusack et al. 2000; Deouell and Soroker 2000; Deouell et al. 2000; Griffiths et al. 1998; Warren et al. 2002).
Currently, our knowledge of the frame of reference of visual activity in LIP and MIP is incomplete. Several studies in area 7a or the banks of the intraparietal sulcus have investigated how neural responses to a stimulus at a given retinal position are affected by eye position (Andersen and Mountcastle 1983; Andersen and Zipser 1988; Andersen et al. 1985, 1990; Batista et al. 1999; see also DeSouza et al. 2000). These studies found that the responses to stimuli at fixed positions on the retina vary as a function of eye position (see also: Brotchie et al. 1995; Snyder et al. 1998). The fact that eye position influences the responses of many neurons suggests that the representation is not a pure, canonical eye-centered reference frame in which the position of the visual stimulus on the retina is the only determinant of neural activity. Moreover, as pointed out by Andersen and Mountcastle (1983), the observed influence of eye position on responses to fixed-retinal stimuli might mean that some neurons have head-centered receptive fields, a possibility that is best explored by resampling an overlapping range of stimulus positions in and around the receptive field at each fixation position (e.g., Batista et al. 1999).
Indeed, studies of auditory activity in LIP in which the receptive field was assessed at each eye position have suggested a continuum between head- and eye-centered reference frames (Stricanne et al. 1996). In MIP, auditory activity in a reach task has been found to be predominantly eye-centered (Cohen and Andersen 2000). Although all these studies suggest the tantalizing possibility that either visual or auditory signals or both undergo a coordinate transformation to facilitate multisensory integration within a common reference frame, the frame of reference of visual and auditory signals and possible alignment of receptive fields have never been investigated in the same population of neurons.
In this study, we examined these questions by testing the responses of LIP and MIP neurons to visual and auditory targets as a function of stimulus location and eye position. We report a mixture of eye-centered, head-centered, and more complex receptive fields for both visual and auditory targets. Although there were more predominantly eye-centered than head-centered response patterns for visual stimuli and more predominantly head-centered than eye-centered response patterns for auditory stimuli, the modal reference frame pattern was neither head- nor eye-centered for both modalities, and the visual and auditory distributions showed substantial overlap. Individual neurons responsive to both visual and auditory targets were comparatively rare. Nevertheless, the responses of parietal neurons to visual and auditory targets at the same location tended, on average, to be correlated. A simulation indicated that this degree of correlation was sufficient to allow the simplest possible neural network to successfully “read out” the location of both visual and auditory targets based on the visual and auditory responses of our recorded neurons. We interpret these results as suggesting that although the representation of space in areas LIP and MIP is not easily described within the conventional conceptual framework of reference frames, these areas nevertheless process visual- and auditory-spatial information in a similar fashion and could contribute to visual-auditory binding.
Animals and animal care
All procedures were conducted in accordance with the principles of laboratory animal care of the National Institutes of Health (publication No. 86–23, revised 1985) and were approved by the Institutional Animal Care and Use Committee at Dartmouth. Two rhesus macaques (Macaca mulatta) were used in this experiment. Neither monkey had participated in previous experiments.
The monkeys underwent an initial surgery under isoflurane anesthesia to implant a head post for restraining the head and a scleral eye coil for monitoring eye position (Judge et al. 1980; Robinson 1963). After behavioral training, a recording cylinder was implanted over the right lateral bank of the posterior parietal cortex using stereotaxic techniques (Grunewald et al. 1999; Platt and Glimcher 1998). The location of the cylinder over the intraparietal sulcus was verified with MRI scans (see additional details under Verification of recording locations).
All experimental and behavioral training sessions were conducted in complete darkness in a single-walled sound-attenuation chamber (IAC) lined with sound-absorbing foam (3-in painted SonexOne, Sonex). Stimulus presentation and data collection were controlled through Gramakln 2.0 software (from the laboratory of Dr. Paul Glimcher). Eye position was sampled at 500 Hz. EyeMove (written by Kathy Pearson, from the laboratory of Dr. David Sparks) and Matlab 5.3 (Mathworks) software were used to analyze the data.
Sensory targets were presented from a stimulus array that was 1.44 m in front of the monkey. The array contained nine speakers (Audax, Model TWO25V2) with a light-emitting diode (LED) attached to each speaker's face (Fig. 1A). The speakers were placed from 24° left to 24° right of the monkey in 6° increments at an elevation of 0°. Additional LEDs serving as fixation positions were located above and below the row of speaker-LED assemblies. These fixation LEDs were located 12° right, 0°, and 12° left at an elevation of ±18°. (The specific target and fixation positions used during recording or behavioral sessions are described under Recording strategy.) Auditory targets were band-pass white noise bursts (500 Hz to 18 kHz; rise time of 10 ms) at 50 dB ±2 dB SPL (A weighting, Bruel and Kjaer, Model 2237 integrating sound level meter with Model 4137 condenser microphone). The luminance of each LED was ∼26.4 cd/m².
In all experiments, the monkeys performed an overlap saccade task (Fig. 1) to auditory and visual targets (all conditions randomly interleaved). This task was used for assessing the reference frame of visual and auditory receptive fields. In some experiments, the monkeys also performed a separate block involving a memory-guided saccade task to visual targets (Fig. 1C). The memory-guided saccade task helped identify areas LIP and MIP (see additional details under Verification of recording locations) (Gnadt and Andersen 1988; Snyder et al. 2000) but was not used further in the data analysis.
During the overlap task (Fig. 1B), 900–1,300 ms after fixating a visual stimulus, a sensory target (either auditory or visual) was presented. After a delay of 600–900 ms, the fixation light was extinguished and the monkey had 500 ms to shift its gaze to the location of the target.
During the memory-guided saccade task (Fig. 1C), 900–1,300 ms after fixating a visual stimulus, a second visual target was presented for 600–900 ms. Following offset of the target and an additional delay of 900–1,100 ms, the fixation stimulus extinguished, signaling the monkey to shift its gaze to the remembered target location. Monkeys received a juice or water reward for successfully completing either task.
Recording procedures and strategy
At the start of each recording session, a stainless-steel guide tube was advanced through the dura. Next, a varnish-coated tungsten electrode (FHC, ∼2 MΩ impedance) was extended into the brain with a hydraulic micropositioner (Narishige, model No. N-46017). Extracellular neural signals were amplified (Bak Electronics, model No. MDA-4I), and action potentials from single neurons were isolated using a dual-window discriminator (Bak Electronics, model No. DDIS-I). The time of occurrence of each action potential was stored for off-line analysis.
While the recording electrode was advanced through the intraparietal sulcus, the monkey performed the overlap task. We recorded any neuron that we could isolate, regardless of whether the neuron appeared to be modulated by the overlap task. When more than one neuron was recorded in the same session, the recording sites were separated by ≥250 μm unless the neural waveforms clearly distinguished between individual neurons.
Once a neuron was isolated, the monkey participated in a prescreening series of overlap trials to determine qualitatively whether the neuron responded better when saccades originated above or below the target locations. The two fixation locations for this were either 18° above or below a limited set of three auditory and visual target locations [(azimuth = −12°, elevation = 0°), (0°, 0°), and (+12°, 0°)]; negative azimuthal values refer to locations contralateral to the recording site. In a subset of 12 neurons recorded early during the course of these experiments, the azimuthal locations of the fixation positions were −18, 0, and 18°.
Based on the results of this prescreening, we chose either the upper or the lower row of fixation locations for the performance of the rest of the experiment. We used three fixation locations and the full set of target locations as shown in Fig. 1A. This allowed us to test each parietal neuron's sensitivity to the location of auditory and visual targets and the reference frame in which it codes auditory- and visual-target location. The location of the fixation stimulus, the location of the sensory target, and the target modality varied randomly on a trial-by-trial basis. Data were collected as long as the neuron was well isolated and the monkey performed the task. On average, we collected 6.7 ± 1.4 (mean ± SD) successful trials per task condition (fixation location × target location × target modality).
Neural data were aligned relative to sensory-target onset and divided into different time periods. The baseline period was a 300-ms period before target onset, and the target period was a 450-ms period that began 50 ms after target onset. For these time periods, data were analyzed in terms of a neuron's firing rate: the number of action potentials divided by time-period duration.
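The two analysis windows described above can be illustrated with a minimal sketch. It assumes spike times are expressed in milliseconds relative to target onset, that windows are half-open, and that rates are reported in spikes per second; the function and constant names are hypothetical and are not taken from the analysis software used in this study.

```python
# Minimal sketch of the firing-rate windows (names and conventions are assumptions).
BASELINE_MS = 300.0      # baseline window: the 300 ms before target onset
TARGET_DELAY_MS = 50.0   # target window starts 50 ms after target onset
TARGET_MS = 450.0        # target window lasts 450 ms


def firing_rate(spike_times_ms, start_ms, duration_ms):
    """Spikes per second in the half-open window [start, start + duration)."""
    end_ms = start_ms + duration_ms
    n_spikes = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    return n_spikes / (duration_ms / 1000.0)


def baseline_and_target_rates(spike_times_ms):
    """Return (baseline rate, target-period rate) for one trial."""
    baseline = firing_rate(spike_times_ms, -BASELINE_MS, BASELINE_MS)
    target = firing_rate(spike_times_ms, TARGET_DELAY_MS, TARGET_MS)
    return baseline, target


# Hypothetical trial: spike times in ms relative to target onset.
example_spikes = [-250, -100, 60, 100, 200, 499, 500, 600]
example_baseline, example_target = baseline_and_target_rates(example_spikes)
```

In this hypothetical trial, two spikes fall in the baseline window and four in the target window; the spike at 500 ms is excluded because the window is treated as half-open.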
Quantitative analyses of reference frame
To quantify the reference frame in which neurons code spatial information, we compared each neuron's responses when target locations were defined with respect to the head versus when target locations were defined with respect to the eyes. This comparison was quantified by comparing the dot product of the response functions aligned in a head-centered reference frame at different eye positions versus the dot product of the response functions in an eye-centered reference frame at different eye positions. The equation for this calculation was
$$R = \frac{1}{N}\sum_{(p,q)} \frac{\sum_i (r_{p,i}-\bar{R})(r_{q,i}-\bar{R})}{\sqrt{\sum_i (r_{p,i}-\bar{R})^2 \, \sum_i (r_{q,i}-\bar{R})^2}} \qquad (1)$$
where the outer sum runs over the N pairs (p, q) of fixation positions being compared, and $r_{l,i}$, $r_{r,i}$, and $r_{c,i}$ are the vectors of average responses of the neuron to a target at location i when the monkey's eyes were fixated at the left (l), right (r), or center (c). This calculation is equivalent to calculating an average correlation coefficient between the response functions and will hereafter be referred to as such. The correlation coefficient was calculated once with target locations defined with respect to the eyes (the eye-centered correlation coefficient) and once with target locations defined with respect to the head (the head-centered correlation coefficient). We only included those target locations that were present for all three fixation positions in both head- and eye-centered frames of reference for this analysis (n = 5 locations: −12, −6, 0, 6, and 12°). R̄ is the average response across all target conditions; subtracting this value serves to make the distribution of responses symmetric around a value of 0. The value of the correlation coefficient can range from −1 to 1, with a value of 1 indicating that the receptive fields as measured at the different eye positions showed perfect alignment in the reference frame used for the calculation. A value of 0 would indicate that the receptive fields at the different eye positions were unrelated to each other. A value of −1 would indicate that the response functions were perfectly anti-correlated with one another.
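The metric described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code used in this study: it assumes fixation azimuths of −12, 0, and 12° with negative values to the left, that eye-centered location equals head-centered location minus eye position, that R̄ is the mean over all target and fixation conditions of one modality, and that the average is taken over all three pairs of fixation positions. All function names and the synthetic neuron are hypothetical.

```python
import numpy as np

TARGETS_HEAD = np.arange(-24, 25, 6)       # nine target azimuths (deg, head-centered)
FIXATIONS = {"l": -12, "c": 0, "r": 12}    # fixation azimuths (deg); sign convention assumed
COMMON = np.array([-12, -6, 0, 6, 12])     # locations sampled at all three fixations in both frames


def aligned_responses(mean_rates, frame):
    """Pick, for each fixation, the responses at the five common locations.

    mean_rates maps "l"/"c"/"r" to nine mean rates ordered like TARGETS_HEAD.
    frame is "head" or "eye".
    """
    out = {}
    for fix, eye in FIXATIONS.items():
        # An eye-centered location e corresponds to head-centered location e + eye.
        head_locs = COMMON if frame == "head" else COMMON + eye
        idx = (head_locs - TARGETS_HEAD[0]) // 6
        out[fix] = np.asarray(mean_rates[fix], dtype=float)[idx]
    return out


def reference_frame_coefficient(mean_rates, frame,
                                pairs=(("l", "c"), ("c", "r"), ("l", "r"))):
    """Average normalized dot product of mean-subtracted response functions."""
    r_bar = np.mean([np.mean(mean_rates[f]) for f in FIXATIONS])
    aligned = aligned_responses(mean_rates, frame)
    coeffs = []
    for p, q in pairs:
        a = aligned[p] - r_bar
        b = aligned[q] - r_bar
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        coeffs.append(np.sum(a * b) / denom if denom > 0 else 0.0)
    return float(np.mean(coeffs))


def synthetic_eye_centered_neuron(width=8.0):
    """Hypothetical Gaussian receptive field fixed on the retina (peak at gaze)."""
    return {f: np.exp(-((TARGETS_HEAD - eye) ** 2) / (2 * width ** 2))
            for f, eye in FIXATIONS.items()}


rates = synthetic_eye_centered_neuron()
eye_cc = reference_frame_coefficient(rates, "eye")
head_cc = reference_frame_coefficient(rates, "head")
```

For this synthetic, purely eye-centered neuron, the three response functions coincide when aligned to the eyes, so the eye-centered coefficient is 1 up to rounding, whereas the head-centered coefficient is lower. The index arithmetic also shows why only five locations qualify: shifting the nine head-centered targets by ±12° leaves −12 to 12° as the only eye-centered locations sampled at every fixation position.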
The variance of this metric was calculated with a bootstrap analysis (100 iterations of 80% of data for each target location/eye-position combination). This bootstrap analysis allowed us to define a 95% confidence area centered on the mean of the bootstrap distribution.
Verification of recording locations by magnetic resonance imaging and memory activity
Recording locations were confirmed by visualizing a microelectrode in the intraparietal sulcus of each monkey through magnetic resonance images (MRI). These images were obtained at the Dartmouth Brain Imaging Center using a GE 1.5 T scanner (3-dimensional (3-D) T1-weighted gradient echo pulse sequence with a 5-in receive-only surface coil). Figure 2 shows a reconstruction of the recording sites together with the MRI images from both monkeys. In monkey C, the electrode penetrations traversed the intraparietal sulcus from the medial to lateral banks. In monkey B, the electrode penetrations ran parallel to the sulcus primarily on the lateral bank. We did not record from the ventral intraparietal area (VIP), which includes the fundus of the intraparietal sulcus and extends a few millimeters up the banks (Bremmer et al. 2002; Colby et al. 1993). In short, our recording locations are in agreement with previously published reports of areas MIP and LIP (Andersen et al. 1990; Ben Hamed et al. 2001; Eskandar and Assad 1999, 2002; Grunewald et al. 1999; Platt and Glimcher 1998; Powell and Goldberg 2000; Shadlen and Newsome 2001; Snyder et al. 2000).
In addition to the MRI, the presence of neurons exhibiting high levels of activity during the delay period of the memory-guided saccade task was used to confirm the locations of LIP and MIP (Barash et al. 1991; Eskandar and Assad 1999; Gnadt and Andersen 1988; Grunewald et al. 1999; Linden et al. 1999; Mazzoni et al. 1996b; Platt and Glimcher 1998; Powell and Goldberg 2000; Shadlen and Newsome 2001; Snyder et al. 2000). Only neurons recorded within the perimeter of grid locations in which memory period activity had been identified were included in the sample; i.e., if delay period activity was identified in penetrations made from two grid holes and not from a grid hole between those two, data from all three grid holes were included. In the depth dimension, only neurons that were within 2 mm of at least one neuron with delay period activity (typically recorded in a different recording session) were included.
A total of 387 neurons were recorded, and 186 of these were screened for memory activity (2-way ANOVA with target location and time period, i.e., response vs. baseline, as the 2 factors, P < 0.05 for the main effect for time period or the interaction term). Eighty of these 186 neurons (43%) exhibited sustained activity during the delay interval. The perimeter defined by these locations yielded a final sample of 275 neurons (80 neurons with memory-related activity and 195 adjacent neurons either not tested for memory activity or tested and found not to have significant memory-related activity). The range of locations from which our final data set was obtained is shown in Fig. 2.
We conducted a number of tests on the data set to determine whether there was a trend in the results as a function of recording location. The results of these analyses, which are presented in results under Reference frame and recording location, indicated that there were no notable differences as a function of the location of the recorded neuron or monkey. We therefore pooled the data from all locations and monkeys for the analyses reported in the following text.
We recorded 275 neurons (122 neurons from monkey B and 153 from monkey C) from the banks of the intraparietal sulcus in areas MIP and LIP. The effect of eye position on the spatial receptive fields of the neurons was complex and ranged along a continuum from predominantly eye-centered to predominantly head-centered, including patterns that did not fit cleanly into an eye-head continuum. (Note: since the monkeys' heads were immobilized, head-, body- and world-centered reference frames were held fixed with respect to one another in this study. For simplicity, we will refer to these three possibilities as a head-centered reference frame. Future work would be needed to differentiate between these possibilities.)
Neural sensitivity to visual and auditory stimuli
We first tested the statistical significance of the responses to visual and auditory targets and eye position using ANOVA and t-test. In testing sensitivity to target location (ANOVA), we conducted the test two ways: with target location defined with respect to the head (Table 1, line B) and with respect to the eyes (Table 1, line C). Our goal was to identify responsive neurons for inclusion in subsequent analyses. These tests were not intended to determine reference frame on their own, as a neuron with a receptive field in one reference frame might well have a statistically significant effect of target location defined in the other reference frame as well, depending on the size of the receptive field in relation to the separation between the fixation positions. The t-test served as an alternative means of identifying responsive neurons; this test did not require that the neurons show any kind of spatial sensitivity.
The results are listed in Table 1. The proportion of neurons sensitive to visual targets totaled ∼72% when the results of the two ANOVAs and the t-test for simple responsiveness were combined (Table 1, line E); ∼36% of neurons were sensitive to the spatial locations of visual stimuli according to ANOVA (Table 1, lines B and C). The proportion of neurons sensitive to the locations of auditory targets was 12–13% (Table 1, lines B and C) or 51% when the ANOVAs and t-test were combined (Table 1, line E).
The proportion of neurons responsive to both visual and auditory targets also depended on the definition of visual and auditory responsiveness, and ranged from 5 to 43% (Table 1, lines B–E). Importantly, the proportion of neurons responsive to both visual and auditory targets was slightly but significantly greater than predicted by chance, i.e., the product of the individual proportions (χ² test, P < 0.05, Table 1, lines A and E).
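The chance prediction referred to above can be made concrete with a short sketch. It assumes a Pearson χ² test of independence on a 2 × 2 table without continuity correction, which the text does not specify, and it uses hypothetical counts chosen only for illustration; they are not the values in Table 1.

```python
# Sketch of the "product of the individual proportions" chance level.
# All counts below are hypothetical and are NOT taken from Table 1.

CHI2_CRITICAL_DF1_P05 = 3.841  # critical value for 1 degree of freedom at P = 0.05


def expected_bimodal_count(n_total, n_visual, n_auditory):
    """Expected number of bimodal neurons if the two sensitivities were independent."""
    return n_total * (n_visual / n_total) * (n_auditory / n_total)


def chi_square_independence(n_total, n_visual, n_auditory, n_both):
    """Pearson chi-square statistic for the 2 x 2 visual-by-auditory table."""
    observed = [
        n_both,                                    # visual and auditory
        n_visual - n_both,                         # visual only
        n_auditory - n_both,                       # auditory only
        n_total - n_visual - n_auditory + n_both,  # neither
    ]
    p_v = n_visual / n_total
    p_a = n_auditory / n_total
    expected = [
        n_total * p_v * p_a,
        n_total * p_v * (1 - p_a),
        n_total * (1 - p_v) * p_a,
        n_total * (1 - p_v) * (1 - p_a),
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical population: 200 neurons, 100 visual, 40 auditory, 30 both.
chance_count = expected_bimodal_count(200, 100, 40)
chi2_stat = chi_square_independence(200, 100, 40, 30)
exceeds_chance = chi2_stat > CHI2_CRITICAL_DF1_P05
```

With these hypothetical counts, independence predicts 20 bimodal neurons, so observing 30 gives a statistic well above the critical value.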
We also conducted several tests to verify the presence of statistically significant effects of eye position in our data set. Three tests were conducted. For eye-position sensitivity in the visual responses, we tested for main effects or interaction terms in a two-way ANOVA when visual target location was defined with respect to the eyes, i.e., in the frame of reference “native” to the early visual pathway (Table 1, line H). For eye-position sensitivity in the auditory responses, we did the same thing, only for this ANOVA, we defined target location with respect to the head (Table 1, line G). Finally, we assessed eye-position sensitivity during the baseline using a one-way ANOVA (Table 1, line I). According to these tests, the proportion of neurons sensitive to eye position ranged from 24 to 48% (Table 1, lines G–I).
Quantitative analysis of the alignment of receptive fields in eye- versus head-centered coordinates
To assess the alignment of the receptive fields in head- versus eye-centered coordinates, we included only neurons that showed sensitivity to target location (Table 1, line D). The reference frame of these neurons was quantified by comparing the alignment of the response functions in head- versus eye-centered coordinates (see Quantitative analyses of reference frame, Eq. 1). The results are illustrated in Fig. 3. In this figure, each data point indicates the average correlation coefficient between the neuron's response functions when the target locations were defined with respect to the eyes versus when the target locations were defined with respect to the head, with positive values indicating that the response functions were correlated, a value of zero indicating lack of correlation, and negative values indicating that the response functions were anticorrelated.
Data points below the diagonal line and to the right of 0 (dashed line) indicate neurons whose response functions aligned better in an eye-centered reference frame than in a head-centered reference frame; i.e., the eye-centered correlation coefficient of these neurons was positive and greater than the head-centered correlation coefficient. Data points above the diagonal line and above 0 (dashed line) indicate neurons whose response functions aligned better in a head-centered reference frame than in an eye-centered reference frame. Data points that lie along the diagonal line indicate neurons in which the alignment of response functions was equivalent in both head- and eye-centered reference frames.
The results in this figure suggest that the visual reference frame of parietal neurons formed a continuum between eye- and head-centered reference frames. Specifically, 33% (n = 41/125) of spatially modulated visual neurons (Fig. 3A) had receptive fields that were more consistent with an eye-centered reference frame (i.e., the eye-centered correlation coefficient was larger than the head-centered correlation coefficient, and the 95% confidence interval was below the line of slope 1 and in positive territory along the x axis; red points), whereas 18% (n = 23/125) were more consistent with a head-centered reference frame (similar calculation; green points). Forty-nine percent of the parietal neurons (n = 61/125) had receptive fields that did not permit classification into either head- or eye-centered reference frame (gray points).
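The classification rule described above can be sketched as follows. The exact construction of the 95% confidence region is not given in the text, so this sketch substitutes a simple assumption: a neuron is assigned to a reference frame when at least 95% of its bootstrap estimates fall in the corresponding region of the plot. The function name and the example values are hypothetical.

```python
import numpy as np


def classify_reference_frame(boot_eye, boot_head, coverage=0.95):
    """Classify one neuron from paired bootstrap estimates of the two coefficients.

    boot_eye, boot_head: eye- and head-centered correlation coefficients,
    one pair per bootstrap iteration.
    Assumed rule: the region test must hold for at least `coverage` of the iterations.
    """
    boot_eye = np.asarray(boot_eye, dtype=float)
    boot_head = np.asarray(boot_head, dtype=float)
    # Below the line of slope 1 and in positive territory along the x axis.
    in_eye_region = (boot_eye > boot_head) & (boot_eye > 0)
    # Above the line of slope 1 and in positive territory along the y axis.
    in_head_region = (boot_head > boot_eye) & (boot_head > 0)
    if in_eye_region.mean() >= coverage:
        return "eye-centered"
    if in_head_region.mean() >= coverage:
        return "head-centered"
    return "unclassified"


# Hypothetical bootstrap outputs for three neurons (100 iterations each).
label_eye = classify_reference_frame([0.9] * 100, [0.1] * 100)
label_head = classify_reference_frame([0.1] * 100, [0.8] * 100)
label_mixed = classify_reference_frame([0.5] * 50 + [0.2] * 50,
                                       [0.2] * 50 + [0.5] * 50)
```

Under this assumed rule, a neuron whose bootstrap estimates straddle the diagonal, or whose coefficients are both negative, remains unclassified, which corresponds to the gray points.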
The reference frame of the auditory neurons (Fig. 3B) was qualitatively similar to that seen in the visual neurons in that it also spanned a range between predominantly head- and predominantly eye-centered coordinates with the majority of neurons not classified into either category. However, the relative proportions of head- and eye-centered neurons were opposite to those seen for the visual responses. Ten percent of these neurons (n = 5/52) had receptive fields that were more consistent with an eye-centered reference frame and 23% (n = 12/52) had receptive fields that were more consistent with a head-centered reference frame. Most neurons (n = 35/52; 67%) had receptive fields that could not be classified into either eye- or head-centered reference frames.
Figure 3C illustrates the reference frames of the subset of 24 neurons (8.7% of the total population of 275 neurons, Table 1, line D) whose firing rate was spatially modulated by both visual and auditory targets. The visual (red points) and auditory values (blue points) for each neuron are connected with lines. To quantify whether these neurons coded visual and auditory targets in the same reference frame, we converted the correlation coefficient values of each neuron for visual and auditory targets into an angle [tan−1 (head-centered correlation coefficient/eye-centered correlation coefficient)] and compared them (data not shown). This analysis indicated that, on average, the visual responses of parietal neurons were more eye-centered than their auditory responses (t-test, P < 0.05).
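The conversion of the two coefficients into an angle can be sketched as below. The text specifies the arctangent of the ratio of the head-centered to the eye-centered coefficient; this sketch uses the two-argument arctangent instead, an assumption made here so that an eye-centered coefficient of 0 does not cause a division by zero. The function name is hypothetical.

```python
import math


def reference_frame_angle_deg(eye_cc, head_cc):
    """Angle of the (eye-centered, head-centered) point, in degrees.

    0 deg lies along the eye-centered axis and 90 deg along the head-centered axis.
    Uses atan2 rather than atan(head/eye); this substitution is an assumption.
    """
    return math.degrees(math.atan2(head_cc, eye_cc))


angle_eye = reference_frame_angle_deg(1.0, 0.0)    # purely eye-centered point
angle_head = reference_frame_angle_deg(0.0, 1.0)   # purely head-centered point
angle_equal = reference_frame_angle_deg(0.5, 0.5)  # point on the diagonal
```

On this scale, a smaller angle for a neuron's visual response than for its auditory response means that the visual response is the more eye-centered of the two.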
Additional results showing that the pattern of reference frame is robust to the particular method of analysis are presented in the supplementary material. In particular, use of a cross-correlation analysis of reference frame, subtraction of baseline firing rate, and use of a shorter spike counting window did not alter the basic pattern of results.
Figures 4–12 provide a sense for the response patterns of the individual neurons that comprise the population. As suggested by the population analysis, some neurons had receptive fields that were clearly anchored to the retina, and eye position had little if any effect on their activity. Such neurons therefore appeared to encode visual-target locations in a predominantly eye-centered reference frame. An example neuron is shown in Fig. 4. This neuron responded robustly to visual targets located within the neuron's receptive field as can be seen from the rasters and peristimulus time histograms (Fig. 4). This neuron appears to code visual-target location in an eye-centered reference frame because the receptive fields mapped at the three different fixation points are well aligned when plotted as a function of eye-centered target location (Fig. 4B, right) and are misaligned when graphed as a function of head-centered target location (Fig. 4B, left). The eye-centered correlation coefficient for this neuron is close to a value of 1, whereas the head-centered correlation coefficient for this neuron is ∼0 (inset).
Three additional neurons that coded visual targets in a predominantly eye-centered reference frame are shown in Fig. 5. For the neurons shown in Fig. 5, A and B, the row of target locations spanned all or most of the receptive field resulting in peaked (“tuned”) receptive fields, whereas for the neuron in C, only the medial edge of the receptive field was sampled. Like the neuron in Fig. 4, the receptive fields of these neurons are best aligned when plotted in an eye-centered reference frame (right panels). In contrast, when the receptive fields are plotted in a head-centered reference frame, they are not aligned. The eye-centered correlation coefficients of these neurons are all strongly positive, whereas the head-centered correlation coefficients range from negative (Fig. 5B) to positive (C; insets). The neurons in Fig. 5, B and C, also had small but statistically significant effects of eye position on their baseline firing rates.
Other neurons appeared to code visual targets in an eye-centered reference frame, but the magnitude of the responses was modulated by eye position. These individual neurons had receptive fields that maintained a constant eye-centered location, but the magnitude of the responses varied considerably with changes in eye position (Fig. 6). This pattern tended to produce a smaller eye-centered correlation coefficient value (insets) than was the case for neurons that did not show eye-position gain effects (Fig. 5), but the comparison between head- and eye-centered correlation coefficients continued to be informative regarding which reference frame produced better correspondence between the receptive fields: the eye-centered correlation coefficient was more positive than the head-centered correlation coefficient for both of the neurons shown in Fig. 6.
Three neurons that appeared to encode visual target locations in a head-centered reference frame are shown in Fig. 7. The neuron shown in Fig. 7A had a receptive-field peak located straight ahead with respect to the head (left) regardless of the monkey's fixation position. When target location was defined with respect to the eyes (right), the receptive field occupied different retinal positions depending on fixation position. The peak was located to the right with respect to the eyes when the eyes were fixating to the left (red trace) and to the left with respect to the eyes when the eyes were fixating to the right (blue trace). A similar pattern occurred for the neuron in Fig. 7B. The neuron illustrated in Fig. 7C, while having an unconventional receptive field shape, still encoded visual-target locations in a head-centered reference frame. (There were 4 neurons with these “bucket”-shaped visual receptive fields in our sample; 3 were predominantly head-centered and 1 was predominantly eye-centered.) The head-centered correlation coefficients of the neurons in Fig. 7 were all greater than the eye-centered correlation coefficients (insets). The locations of these three individual neurons are illustrated on the MRI coronal sections shown in Fig. 2.
Even in neurons that coded visual target location in a predominantly head-centered reference frame, eye position could influence the gain of the neural responses. This eye-position influence was particularly evident in the baseline firing rate of some neurons but also occurred during the target period. For example, when the monkey shifted his gaze to the rightward fixation position, the baseline- and target-period firing rates of the neuron in Fig. 7B were elevated relative to the other fixation positions. Eye position also modulated the baseline- and target-period firing rates of the neuron in Fig. 7C.
Finally, Figs. 8 and 9 show examples of the population of neurons that encoded visual-target locations in ways that did not seem to fit the conceptual framework of reference frames: their response patterns were not particularly consistent with either head- or eye-centered coordinates. For the neuron in Fig. 8A, parts of the receptive fields (i.e., some of the edges) lined up better in a head-centered reference frame and other parts (i.e., the peaks) lined up better in an eye-centered reference frame, and the result was that the response functions were correlated with each other in both reference frames (inset). The neuron in Fig. 8B shows a variation on this theme: two of the response functions (left and center) aligned well in a head-centered reference frame, and a different combination of two response functions (right and left) aligned better in an eye-centered reference frame. In other cases, the receptive fields changed in shape and structure when the eyes moved. For example, the neuron illustrated in Fig. 8C had a well-defined receptive field when the eyes were directed straight ahead. However, when the eyes moved to the left, the neuron became largely insensitive to visual targets. The receptive field changed in shape when the eyes moved to the right. These receptive fields did not align in either head- or eye-centered coordinates (bottom left vs. bottom middle), and both the eye- and head-centered correlation coefficients for this neuron had a value close to 0 (bottom right), reflecting the lack of correlation between the receptive fields as measured at each fixation position. Similarly, the neuron in Fig. 8D had a receptive field that changed in shape and location when the eyes moved (Fig. 8D, bottom left) but not in such a way as to create a clearly head-centered receptive field (Fig. 8D, bottom middle).
The neurons in Fig. 9 illustrate examples of neurons that were excluded from the reference frame analysis due to failure to show statistically significant spatial sensitivity. The neuron in Fig. 9A was quite responsive to visual stimuli in both the contra- and ipsilateral hemifields, but the receptive fields appeared to be poorly aligned in both reference frames. Figure 9B shows another example neuron with very different response patterns at the three eye positions. Both the baseline and target-related firing rate of the neuron in Fig. 9B were dramatically affected by changes in eye position but in different ways. For the rightward fixation position, the baseline firing rate was quite high, but the activity was suppressed during visual-target presentation. In contrast for the leftward fixation position, the baseline firing rate was relatively low and the neuron's activity increased during visual-target presentation. That eye position can exert different effects on different aspects of neural responses has also been observed in previous studies (Andersen and Mountcastle 1983; Andersen et al. 1985).
Comparison between visual and auditory receptive fields and reference frame
For those neurons that responded to both visual and auditory targets, there was a rough correspondence between the visual and auditory response patterns, despite some discrepancy between the reference frames of their receptive fields. Example neurons are shown in Figs. 10 and 11. The neurons in Fig. 10, A and B, appear to have predominantly eye-centered visual receptive fields and prefer leftward (contralateral) visual-target locations. The neurons also responded well to leftward auditory targets, but the auditory receptive fields appeared to be more intermediate in reference frame. The opposite pattern of responses is seen in the data shown in Fig. 11A. This neuron responded well to rightward (ipsilateral) visual targets and rightward auditory targets. However, this neuron appeared to code visual targets in a slightly head-centered reference frame but auditory targets in a reference frame that was slightly, but not significantly, eye-centered.
Other neurons had more dissimilar auditory and visual receptive fields. For example, the neuron in Fig. 11B had an eye-centered visual receptive field located to the left. However, this neuron responded to auditory targets only when the monkey was fixating to the right, and then it responded only to ipsilateral auditory targets. The neuron failed the statistical test for spatial sensitivity to auditory targets.
As seen in Figs. 10 and 11A, the visual and auditory receptive fields of individual neurons can weakly correspond. Is this correspondence statistically significant at the level of the population? To examine this issue, we assessed the alignment of visual and auditory responses for each eye position by calculating the correlation coefficient between the mean visual and auditory responses of bimodal neurons, much as we did for assessing the alignment of the response functions within a modality for each reference frame. Only neurons that were spatially sensitive in both modalities were included (Table 1, line D, n = 24). We calculated the correlation coefficient separately for the data from each eye position. The results are illustrated in Fig. 12, with A showing the mean of the three correlation coefficients (1 for each fixation position) and B showing the correlation coefficient values separately for each fixation position. The distribution of correlation values is skewed toward positive correlations, and this pattern is generally maintained but in a slightly weaker form when the results are viewed separately for each fixation position (B). The degree of correlation between visual and auditory receptive fields was significantly greater than zero (t-test, P < 0.05).
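The per-position comparison described above can be sketched in a few lines. This is a minimal illustration, assuming a Pearson correlation computed with NumPy; the function name, the array layout, and the example firing rates are hypothetical and are not taken from the recorded data.

```python
import numpy as np

def visual_auditory_correlation(visual_mean, auditory_mean):
    """Correlate one neuron's mean visual and auditory response functions.

    Both inputs have shape (n_eye_positions, n_targets): mean firing rate at
    each target location, one row per fixation position (assumed layout).
    Returns the correlation coefficient for each fixation position and the
    mean of those coefficients.
    """
    visual_mean = np.asarray(visual_mean, dtype=float)
    auditory_mean = np.asarray(auditory_mean, dtype=float)
    per_position = np.array([
        np.corrcoef(v, a)[0, 1] for v, a in zip(visual_mean, auditory_mean)
    ])
    return per_position, per_position.mean()

# Hypothetical mean rates (spikes/s) for three fixation positions.
visual = np.array([[2.0, 5.0, 9.0, 4.0],
                   [3.0, 8.0, 12.0, 6.0],
                   [1.0, 4.0, 7.0, 3.0]])
# A scaled and shifted copy, so the two response functions align perfectly.
auditory = 0.5 * visual + 2.0

r_by_position, r_mean = visual_auditory_correlation(visual, auditory)
print(r_by_position, r_mean)
```

Because the correlation coefficient ignores gain and offset, a neuron whose auditory responses are a weaker copy of its visual responses still scores near 1 in this sketch. At the population level, the mean coefficients of all bimodal neurons would then be tested against zero.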
Can this population be read out?
The weakness of the correlation between the visual and auditory responses of individual neurons raises the issue of whether activity from this neural population is capable of providing a reliable signal of target location, independent of modality. To examine this issue, we constructed a simple one-layer neural network, “trained” it based on the visual responses, and tested it with the auditory responses.
In this network, neurons from our sample provided input to a single “output” neuron. The goal of the simulation was to determine whether the output neuron could produce a linear signal of target location with respect to the eyes. This network is essentially a system of linear equations and the weights can in principle be determined using linear regression techniques (although for convenience we used the Matlab neural network toolbox).
To create a training set, we first calculated the mean and SD of each neuron's firing rate for each combination of target location and eye position. Only the visual trials were used. We then sampled normal distributions with the same means and SDs 100 times for each neuron times visual target location times eye position to produce a training set. The “weights” between each input neuron and the output neuron were calculated to optimize the output with this training set. Finally, we tested the network's output both with the real visual mean responses, which had contributed to the creation of the training set, and with the real auditory mean responses, which had not.
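The procedure above can be sketched as an ordinary least-squares problem. The following is a minimal illustration only: it uses NumPy rather than the Matlab toolbox used in the study, and the population is synthetic. The target row, fixation positions, neuron count, tuning slopes, noise levels, and the auditory gain of 0.6 are all assumptions made for the example.

```python
import numpy as np

def build_training_set(mean_rates, sd_rates, eye_centered_targets, rng, n_samples=100):
    """Sample noisy responses around each neuron's mean rate.

    mean_rates, sd_rates: shape (n_neurons, n_conditions), one condition per
    combination of target location and eye position.
    Returns inputs X (n_samples * n_conditions, n_neurons) and targets y.
    """
    n_neurons, n_conditions = mean_rates.shape
    samples = rng.normal(mean_rates, sd_rates,
                         size=(n_samples, n_neurons, n_conditions))
    X = samples.transpose(0, 2, 1).reshape(-1, n_neurons)
    y = np.tile(eye_centered_targets, n_samples)
    return X, y

def fit_linear_readout(X, y):
    """Least-squares weights (plus a bias term) for a single linear output unit."""
    design = np.column_stack([X, np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights

def read_out(weights, mean_rates):
    """Output of the linear unit for each condition's mean population response."""
    return mean_rates.T @ weights[:-1] + weights[-1]

# --- Hypothetical synthetic population (not the recorded data) ---
rng = np.random.default_rng(0)
head_targets = np.arange(-24, 25, 6)          # assumed target row (deg)
eye_positions = np.array([-12.0, 0.0, 12.0])  # assumed fixation positions (deg)
head, eye = np.meshgrid(head_targets, eye_positions)
head, eye = head.ravel(), eye.ravel()
eye_centered = head - eye                     # target location re: the eyes

n_neurons = 40
baseline = rng.uniform(25, 45, size=(n_neurons, 1))
target_slope = rng.uniform(-0.5, 0.5, size=(n_neurons, 1))
eye_slope = rng.uniform(-0.2, 0.2, size=(n_neurons, 1))
tuning = target_slope * eye_centered + eye_slope * eye   # mixed coding
visual_mean = baseline + tuning
visual_sd = np.full_like(visual_mean, 2.0)
# Auditory responses: weaker, noisier version of the same tuning (assumption).
auditory_mean = baseline + 0.6 * tuning + rng.normal(0, 1.0, size=tuning.shape)

# Fit on sampled visual responses only, then test on both sets of means.
X, y = build_training_set(visual_mean, visual_sd, eye_centered, rng)
weights = fit_linear_readout(X, y)
visual_out = read_out(weights, visual_mean)
auditory_out = read_out(weights, auditory_mean)

visual_r2 = np.corrcoef(eye_centered, visual_out)[0, 1] ** 2
auditory_r2 = np.corrcoef(eye_centered, auditory_out)[0, 1] ** 2
visual_gain = np.polyfit(eye_centered, visual_out, 1)[0]
auditory_gain = np.polyfit(eye_centered, auditory_out, 1)[0]
print(visual_r2, auditory_r2, visual_gain, auditory_gain)
```

In this toy population the auditory output follows target location with a smaller slope than the visual output, because the auditory responses were built with a lower gain. The example shows only the mechanics of fitting on one modality and testing on the other; it says nothing about how well the real population performs.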
The results of this simulation are shown in Fig. 13. Figure 13A shows the network's output signal after the weights were fit based on the training set constructed from the visual responses. As expected, the network performed well, producing a signal of target location that varied linearly with the true target location. The amount of variance accounted for in the least squares regression lines shown here ranged from 45 to 90% (the precise values vary with each run of the simulation and depend on the fixation position). More importantly, the network's output also varied linearly with target location when the input signal was auditory (Fig. 13B), even though the weights were fit based solely on the visual responses (amount of variance accounted for by the illustrated regression lines ranged from 27 to 73%). The output is somewhat more variable but clearly scales with sound-source location. The auditory output does differ from the visual output by a gain factor, which is to be expected as we made no attempt to equate the intensity of the stimuli used in our experiment.
Reference frame and recording location
The lateral and medial banks of the parietal sulcus are thought to be functionally distinct areas (e.g., Cohen and Andersen 2000; Snyder et al. 1997). We therefore wondered if there was a relationship between a neuron's reference frame and its location within the intraparietal sulcus. In monkey B, a subset of the penetrations were limited to the lateral bank (the 1–2 most lateral locations in the 3 most anterior panels in the bottom row of Fig. 2), whereas the remaining penetrations likely included a mixture of neurons from both banks. Figure 14A shows the correlation coefficient values for the subset of neurons that were recorded on the lateral-only penetrations compared with the rest of the data from this monkey. The pattern of results for the neurons limited to the lateral bank is quite similar to that of the remainder of the data set. In monkey C, the penetration trajectories crossed the intraparietal sulcus from the medial bank to the lateral bank. Figure 14B shows the head- and eye-centered correlation coefficient values as a function of the recording depth, normalized to the midpoint of the depth of neurons recorded in that recording grid location. There is no apparent pattern to the reference frame as a function of depth. Finally, as memory activity has been used as a marker for both LIP and MIP, we compared the reference frame of the neurons that had memory activity versus those that either did not or were not tested for memory activity (Fig. 14C). Again there was no apparent difference between the data subdivided in this fashion. Finally, a comparison of Fig. 14, A, which presents the data from monkey B, and B, which illustrates the results from monkey C, reveals that there is no substantial difference between the results obtained in the two monkeys, even though the A-P extent of the sampled regions differed slightly (Fig. 2).
Comparison with the auditory pathway
We have previously tested the reference frame of auditory neurons in the inferior colliculus (IC) (Groh et al. 2001) and the core region of the auditory cortex (Werner-Reiss et al. 2003) and found that the auditory representations in these structures are neither head- nor eye-centered. In Fig. 15, we compare the visual and auditory representations in the intraparietal sulcus with the auditory representations in the IC and auditory cortex. The intraparietal data in this figure come from Fig. 3, and the same methods were used to calculate correlation coefficient values for neurons in the IC and the auditory cortex. Next, the head- and eye-centered correlation coefficient values were converted to angles [tan−1(head-centered correlation coefficient/eye-centered correlation coefficient)] and then rotated by 45° so that an angle of 0 corresponds to the line of slope one in Fig. 3. The distributions in this figure indicate that the representations of visual and auditory space in the IPS are broadly similar in reference frame to the representations of auditory space in these areas of the auditory pathway. The auditory values in these three brain areas did not differ from one another (ANOVA, P > 0.05) and showed considerable overlap with the visual values from the IPS. However, the visual responses were significantly more eye-centered than the auditory values in all three brain areas (ANOVA, P < 0.05 and post hoc t-test, P < 0.05).
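The conversion from a pair of correlation coefficients to a single angle can be sketched as follows. This is an illustration under stated assumptions: it uses the two-argument arctangent so that negative coefficients land in a well-defined quadrant, and it rotates by subtracting 45°, so that positive angles indicate a more head-centered pattern. The text does not specify the handling of negative coefficients or the sign convention, and the function name is hypothetical.

```python
import numpy as np

def reference_frame_angle(head_r, eye_r):
    """Convert head- and eye-centered correlation coefficients to an angle (deg).

    An angle of 0 corresponds to the line of slope one (head_r == eye_r).
    Assumed convention: positive angles are more head-centered, negative
    angles more eye-centered.
    """
    angle = np.degrees(np.arctan2(head_r, eye_r))  # equals tan^-1(head/eye) when eye_r > 0
    return angle - 45.0                            # rotate so the unity line maps to 0

# Hypothetical coefficient pairs for three neurons.
head_r = np.array([0.5, 0.9, 0.1])
eye_r = np.array([0.5, 0.1, 0.9])
angles = reference_frame_angle(head_r, eye_r)
print(angles)
```

Collapsing the two coefficients into one angle gives a single number per neuron, which is what makes it possible to compare distributions across brain areas with an ANOVA.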
Our results suggest that parietal neurons have receptive fields with a complex structure and that only a minority of individual neurons encode target locations in a reference frame that is predominantly eye- or head-centered; the majority of neurons have response patterns that reflect a combination of head- and eye-centered information and/or eye position. The visual and auditory reference frames were largely similar with the visual representation being slightly biased in favor of eye-centered coordinates compared with the auditory representation. The existence of a subset of neurons that appeared to use a predominantly head-centered frame of reference to code visual stimuli has not been previously suspected for this brain region, although previous experiments have not expressly ruled out this possibility. The visual and auditory responses of individual neurons were weakly but significantly correlated. A neural-network simulation suggested that the information coded by these neurons may be sufficient to successfully read out target location. In the following text, we compare our results with those of previous parietal studies and discuss how auditory and visual signals are represented and transformed between different brain structures in the visual, auditory, and oculomotor pathways of the brain.
Comparison with previous studies in parietal cortex
In general, the proportion of visual neurons that we report is comparable to that seen in previous studies (36–68% in our study vs. 46–62% for Barash et al. 1991; Linden et al. 1999). However, the reference frame of these responses was surprising. It has been assumed that parietal neurons code visual targets in an eye-centered reference frame and that changes in eye-position modulate only the gain of the response to a particular visual-target location without changing the structure or location of a receptive field (Andersen and Mountcastle 1983; Andersen et al. 1985, 1990; Bremmer et al. 1997a, 1998; Cohen and Andersen 2002). In contrast, we identified a substantial population of neurons in areas LIP and MIP that code visual-target locations in a reference frame other than eye-centered. These neurons had receptive fields whose structure and location changed with eye position (Figs. 3, 7–9, and 11). The continuum of reference frames from head centered to eye centered is reminiscent of that seen in VIP (Duhamel et al. 1997) and area PO/V6 (Fattori et al. 1992; Galletti et al. 1993, 1995).
To the extent that our methods overlapped, our results are consistent with the results of previous studies in areas MIP, LIP, and 7a (Andersen and Mountcastle 1983; Andersen et al. 1985, 1990; Batista et al. 1999; Bremmer et al. 1997a). Although there have been several studies in which the retinal location of a stimulus was held constant and the effect of eye position was assessed (Andersen and Mountcastle 1983; Andersen et al. 1990; Bremmer et al. 1997a), these studies did not expressly test head- versus eye-centered reference frames by resampling a neuron's receptive field at each eye position. Only a few studies have reassessed the receptive field for more than one eye position (Andersen et al. 1985, 1990), and in these studies, the target locations sampled at different eye positions did not necessarily exist in both eye- and head-centered reference frames or the analyses conducted did not assess reference frame. For example, in some studies, the receptive field was sometimes sampled in slices orthogonal to the dimension along which the fixation position changed, so that the range of head-centered locations of the stimuli did not overlap at the different eye positions (e.g., Andersen et al. 1985). In another study, the receptive field was sampled with radial targets and the best direction (best radial target angle) of the neuron was assessed at different eye positions (Andersen et al. 1990). The best direction of a neuron with a head-centered receptive field might or might not change on the retina as the eyes move—it would depend on where the eyes moved to with respect to the receptive field. Changes in eye position along an axis that passed through the center of the receptive field would produce no change in best direction, whereas changes in eye position along an axis perpendicular to that would potentially produce some change in the best direction, depending on the distance from the receptive field center.
In short, our conclusions likely differ because we conducted a different experiment rather than because the underlying phenomenon is necessarily different. The neurons illustrated in Figs. 7, 8, or 9 would show an influence of eye position similar to that of previous studies if the responses at a single retinal location were plotted as a function of fixation position. (We note, though, that in many instances the shape of said eye-position influence would depend critically on the particular retinal location that was chosen.)
Nor are our results necessarily in conflict with previous studies employing a double-step task (Colby et al. 1995; Duhamel et al. 1992; Mazzoni et al. 1996a). In such experiments, a receptive field is first identified while the eyes are fixating in one location, and then a sequence of two targets is presented, and the animal makes saccades to each target location in the sequence. The sequence is designed so that the first saccade brings the remembered location of the second visual stimulus onto the retinal location occupied by the receptive field at the original eye position. The task is used to assess whether the receptive field moves in space with the eye movements. Many but not all neurons show this property, showing activity when the saccade to the first target brings the second target's location into the receptive field. Neurons that respond to the second target (either slightly in advance of the eye movement to the 1st target or synchronized with that eye movement) likely do not employ a head-centered frame of reference but could use either an eye-centered or complex coding format, a broad category that includes the majority of our neurons. Our results provide a potential explanation for the failure of some neurons to demonstrate this remapping: the receptive field may stay anchored to the head and thus will not move when the eyes move.
Our findings also bear on experiments comparing the responses of parietal cortical neurons under conditions of free gaze versus attentive fixation (Ben Hamed et al. 2002). The differences in the response pattern have been attributed to the cognitive state of the animal. Our results show that in many instances, the receptive field's structure changes depending on where the monkey is fixating (attentively). Thus the key variable may be eye position in addition to cognitive state. An experiment in which neural responses are compared when the monkey is required to actively fixate the same set of eye positions as it did under a free-gaze condition would be needed to tease apart these alternatives.
Previous studies have identified both similarities and differences between areas LIP and MIP. Both areas have visual activity, auditory activity, memory activity, saccade-related activity, and reach-related activity and show a similar sensitivity to eye position (Andersen and Mountcastle 1983; Andersen et al. 1998; Cohen and Andersen 2000; Cohen et al. 2002; Colby and Goldberg 1999; Snyder et al. 2000). The key difference between these two structures appears to be the relative proportions of reach- and saccade-related activity with MIP having relatively more reach-related activity and LIP having relatively more saccade-related activity. Given that we used only a saccade task and included only neurons that were responsive in this task, it is not surprising that we did not observe any differences between these two areas; this result should not be construed as evidence that there are no differences between these brain areas when other tasks are employed.
Auditory neurons
The proportion of neurons that were spatially modulated by auditory targets is comparable to that reported in two recent studies (11–20% vs. 10–13% for Gifford and Cohen 2004; Linden et al. 1999). Stricanne et al. (1996) reported a much larger proportion of auditory neurons (36%). This larger proportion may relate to differences in screening methods during recording sessions or differences in analysis methods.
The range in the reference frame of our auditory neurons is comparable to that reported in previous studies. Stricanne et al. (1996) reported that 44% of their neurons coded the remembered location of an auditory target in an eye-centered reference frame and 33% coded in a head-centered reference frame. The remaining proportion of neurons coded the remembered target location in a reference frame that was intermediate between eye- and head-centered. Cohen and Andersen (2000) reported that 42% of neurons in the parietal reach region (which overlaps with area MIP) code the remembered location of an auditory target in an eye-centered reference frame. Overall, our data are consistent with these parietal studies in that we find that the reference frame of auditory activity varies between head- and eye-centered with some neurons coding auditory-target locations in an eye-centered reference frame. Differences between studies in the precise proportions of neurons labeled as being predominantly head- or eye-centered are likely to be uninteresting, reflecting differences in the way in which reference frame is quantified and/or the placement of arbitrary category boundaries in what appears to be a continuous distribution of effects.
Auditory and visual neurons
The prevalence and properties of our population of auditory and visual neurons were similar to those of previous studies. Like Linden et al. (1999), we found relatively few bimodal neurons (5–33% in our study vs. 14% for Linden et al. 1999), and the auditory and visual receptive fields were weakly but positively correlated. However, our simulation (Fig. 13) indicated that this correlation may be adequate to produce an output that can convey target location regardless of the modality of that target.
Comparison of reference-frame information with other brain regions
Where do the original head- and eye-centered representations of auditory and visual target locations get transformed into the continuum of reference frames that we see in areas LIP and MIP? Also, is this transformation gradual or does it occur in one step? The answers to these questions can be obtained by comparing the reference frame of neurons in different brain structures using similar, if not identical, stimuli, behavioral tasks, and analyses.
We previously tested the reference frame of auditory neurons in the IC (Groh et al. 2001), and the core region of the auditory cortex (Werner-Reiss et al. 2003). These brain regions are important test sites because together with the auditory cortical belt, parabelt, and frontal cortex they form a network thought to process auditory-spatial information (for reviews, see Cohen and Knudsen 1999; Rauschecker and Tian 2000; see also: Azuma and Suzuki 1984; Leinonen et al. 1980; Recanzone 2000; Russo and Bruce 1994; Vaadia et al. 1986). Overall, our data indicate that the representation of auditory-spatial information does not change between the IC and intraparietal sulcus (Fig. 15). This implies that the reference frame transformation begins at an early stage of the auditory pathway and is not substantially altered by later processing stages.
Visual signals are known to be influenced by eye position as early as the LGN and V1 (LGN: Lal and Friedlander 1990a,b; Weyand and Malpeli 1993; V1: Guo and Li 1997; Rosenbluth and Allman 2002; Trotter and Celebrini 1999). There is disagreement as to whether all receptive fields in V1 are perfectly stable on the retina as the eyes move, or if some might potentially shift slightly (Gur and Snodderly 1997; Motter and Poggio 1990). Eye position has been shown to affect neural responses at later stages of the visual pathway (Bremmer 2000; Bremmer et al. 1997b; Rosenbluth and Allman 2002), but the impact of changes in eye position on the location of the receptive field has not been examined. At later stages of processing, such as parietal areas (this study and Duhamel et al. 1997; Fattori et al. 1992; Galletti et al. 1993, 1995) and premotor areas (Gentilucci et al. 1983), mixtures of head- and eye-centered coding have been found. It may be possible, then, that the visual pathway employs mixtures of head- and eye-centered information at earlier stages in keeping with the pattern that has emerged from the auditory pathway.
It has been proposed that parietal cortex is part of a neural pathway that creates saccade command signals in response to visual and auditory events (Andersen et al. 1998; Cohen and Andersen 2002). Additional transformations of these signals may yet occur before they arrive at the extraocular muscles. For example, the representations in the intraparietal sulcus and the superior colliculus (SC) appear to be different. The SC is a brain stem oculomotor structure that receives input from parietal cortex (Fries 1984; Pare and Wurtz 1997) and is known to play an important role in guiding saccadic eye movements to the locations of both auditory and visual stimuli (for review, see Sparks and Groh 1995). Microstimulation and reversible inactivation studies suggest that the SC encodes saccades in an eye-centered frame of reference (Lee et al. 1988; Mays and Sparks 1980b; Robinson 1972; Schiller and Stryker 1972). Single-unit recordings have shown that the visual activity is predominantly eye-centered (Jay and Sparks 1987; Mays and Sparks 1980a; but see Van Opstal et al. 1995), whereas the auditory activity is intermediate between head- and eye-centered (Jay and Sparks 1984, 1987). This auditory representation, then, is qualitatively similar to the representation found in the parietal sulcus in the current study and past studies (Cohen and Andersen 2000; Schlack et al. 2003; Stricanne et al. 1996). Thus in a structure that is closely associated with brain stem motor generators, auditory-target location is still coded in a reference frame that is not purely eye-centered. Oddly, though, the SC's visual representation is different, both from the auditory representation in the SC itself and from the visual and auditory representations in parietal cortex.
This discrepancy is puzzling, and additional studies of the visual and auditory motor-related activity in the SC and in the areas between the SC and the extraocular muscles will be needed to resolve these issues (for further discussion, see Metzger et al. 2004).
Our findings have important implications for views of visual-auditory integration. In broad strokes, our results support the hypothesis that across the population visual and auditory signals are coded in a generally similar reference frame and that individual bimodal neurons have similar receptive fields for both visual and auditory targets. However, the nature of this representation is surprising: the common reference frame is neither solely head- nor purely eye-centered but instead consists of a continuum of sensitivity to eye-position signals as well as head- and eye-centered target information. The degree of correspondence between visual and auditory receptive fields is weak in individual bimodal neurons, which are rather rare, but adequate at the level of the population to support a reasonably accurate read-out of target location independent of modality.
There are now many studies of coordinate transformations in the brain, nearly all of which find evidence for mixtures of different reference frames. It may be the case that reference frames that reflect only a single sensory variable exist only at very early sensory processing stages. The earliest points in the visual and auditory pathways where the effects of eye position have been studied are the lateral geniculate nucleus (Lal and Friedlander 1990a,b) and the IC (Groh et al. 2001). In both of these areas, eye position has been found to affect neural activity. This effect of eye position is likely to be mediated by descending feedback connections. As descending connections affect the very earliest possible levels of auditory processing (for reviews, see Giard et al. 2000; Simmons 2002) and also reach the retina (Brooke et al. 1965; Labandeira-Garcia et al. 1990; Noback and Mettler 1973), we may be only scratching the surface of eye-position effects in the brain.
Why does the brain mix head- and eye-centered coordinates rather than creating a pure code in one format or the other? The answer to this question may lie in the eventual output: motor commands. The motor output does not necessarily employ a pure reference frame. The pattern of force needed to generate a saccade, for example, depends on both the position of the target with respect to the head and the position of the target with respect to the eyes. Thus the ultimate output of the system is a mixture of head- and eye-centered information, which may be similar to the representation contained in the intraparietal sulcus. This is not to say that the function of the intraparietal sulcus and other brain areas that employ mixed reference frames is exclusively related to saccade programming; even if this were the case for parietal cortex (a possibility that is much debated), it surely would not hold true for the many other brain areas in which similar effects have been observed. Nevertheless, many perceptual processes ultimately do lead to the generation of eye movements, and it is possible that the constraints of the oculomotor system have led to the use of a common oculomotor language to permit communication between the visual and auditory systems in a similar fashion in both sensory- and motor-related brain regions.
J. M. Groh received support from Alfred P. Sloan Foundation, McKnight Endowment Fund for Neuroscience, John Merck Scholars Program, Office of Naval Research Young Investigator Program, EJLB Foundation, National Institute of Neurological Disorders and Stroke Grant NS-50942, National Science Foundation Grant 0415634, and The Nelson A. Rockefeller Center at Dartmouth. Y. E. Cohen and J. M. Groh received support from the Whitehall Foundation and National Institute of Neurological Disorders and Stroke Grant NS-17778. Y. E. Cohen also received support from National Institutes of Health B/START and Shannon Awards.
We thank A. Underhill, H. Farid, G. Gifford, P. Glimcher, J. Moran, K. K. Porter, P. Tse, and U. Werner-Reiss for insight and many helpful comments and ideas throughout this project. We thank P. Glimcher, S. Grafton, and H. Hughes for comments on the manuscript.
Present address of O. A. Mullette-Gillman: Center for Neural Science, New York University, New York, NY 10003.
1 The Supplementary Material for this article (3 figures) is available online at http://jn.physiology.org/cgi/content/full/00021.2005/DC1.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Copyright © 2005 by the American Physiological Society