Auditory Saccades From Different Eye Positions in the Monkey: Implications for Coordinate Transformations

Ryan R. Metzger, O'Dhaniel A. Mullette-Gillman, Abigail M. Underhill, Yale E. Cohen, Jennifer M. Groh


Auditory spatial information arises in a head-centered coordinate frame, whereas the saccade command signals generated by the superior colliculus (SC) are thought to specify target locations in an eye-centered frame. However, auditory activity in the SC appears to be neither head- nor eye-centered but rather in a reference frame intermediate between the two. This neurophysiological finding suggests that auditory saccades might not fully compensate for changes in initial eye position. Here, we investigated whether the accuracy of saccades to sounds is affected by initial eye position in rhesus monkeys. We found that, on average, a 12° horizontal shift in initial eye position produced only a 0.6 to 1.6° horizontal shift in the endpoints of auditory saccades made to targets at a range of locations along the horizontal meridian. This shift was similar in size to the modest influence of eye position on visual saccades. This virtually complete compensation for initial eye position implies that auditory activity in the SC is read out in a manner that is appropriate for generating accurate saccades to sounds.


The intermediate and deep layers of the superior colliculus (SC) play an important role in guiding saccadic eye movements to the locations of both auditory and visual stimuli. Experiments involving visually guided saccades, microstimulation, and reversible inactivation have suggested that the SC provides a motor command signal specifying the location of the target with respect to the eyes (Lee et al. 1988; Mays and Sparks 1980; Robinson 1972; Schiller and Stryker 1972). However, it is not clear how auditory information is transformed into a motor command appropriate to move the eyes to look toward the sound-source location. Initially, auditory stimuli are coded in a head-centered frame of reference based on differences in the time of arrival and the level of a sound at the two ears (i.e., binaural cues) and on the frequency- and location-dependent filtering (i.e., monaural spectral cues) of the sound by the head and ears (for review, see Blauert 1997). This information is not sufficient to guide a saccadic eye movement to a sound-source location because the pattern of muscle contraction needed to bring the eyes to the location of a target depends on both the target's location and the initial position of the eyes in the orbits. Moreover, auditory sensory activity in the SC does not appear to be appropriate either: the SC's auditory receptive fields shift when the eyes move, but, on average, they shift only about half as far as the eyes move (Jay and Sparks 1984, 1987b). If these signals feed into the brain stem saccadic pulse-step generator unaltered, then one would expect that saccades to sounds would fail to fully compensate for initial eye position.

Currently little is known about the effects of initial eye position on auditory saccades in monkeys. Although some studies have suggested that monkey auditory saccades are accurate (Jay and Sparks 1990), other studies have disagreed (Grunewald et al. 1999), and the contribution of initial eye position to auditory saccade accuracy has not been studied in quantitative detail (but see Whittington et al. 1981). The question has come to be of particular interest because eye position has been shown to influence localization behavior in humans (Lewald 1997, 1998; Lewald and Ehrenstein 1996a, b) as well as neural activity in many brain regions in monkeys (for review, see Salinas and Thier 2000; see also Groh et al. 2001; Werner-Reiss et al. 2003). The effects of eye position on neural activity in monkeys have been invoked as the underpinnings of the observed behavioral effects in humans. However, the behavior of humans and monkeys can differ: for example, humans show approximately complete compensation for initial eye position in making saccades to tactile stimuli, but monkeys show only partial compensation (Groh and Sparks 1996). Accordingly, we sought to determine whether monkeys compensate for initial eye position when making saccades to sounds.


Two adult female rhesus monkeys (monkeys C and X) were subjects for these experiments. All animal procedures were conducted in accordance with the principles of laboratory animal care of the National Institutes of Health (Publication No. 86–23, revised 1985) and were approved by the Institutional Animal Care and Use Committee at Dartmouth. The monkeys underwent a surgery using general anesthesia and aseptic techniques to implant a head post for restraining the head and a scleral eye coil for monitoring eye position (Judge et al. 1980; Robinson 1963).

The experiments were conducted in complete darkness in a single-walled sound-attenuation chamber (IAC) lined with sound-absorbent foam (3-in painted Sonex One) to reduce echoes. The monkeys were seated comfortably in a conventional plastic primate chair (Crist Instruments) with a standard neckplate. The supports for immobilizing the head were located behind the interaural axis. The monkeys faced an array of light-emitting diodes (LEDs) and loudspeakers (Audax TWO25V2) located 57 in away. The auditory stimulus was a band-limited white noise (500 Hz to 18 kHz, 10-ms onset ramp, variable duration) at 50 dB SPL (“A” weighting, Bruel and Kjaer, Model No. 2237 integrating sound level meter with Model No. 4137 condenser microphone) measured at the location normally occupied by the monkey's head. The input signal to each loudspeaker was adjusted to compensate for subtle differences between sound levels from each speaker (± 2 dB SPL). Eye position was sampled at 500 Hz.

Our training method capitalized on the innate tendency of monkeys to orient to unexpected sounds (Gifford et al. 2003). Monkeys were trained first to make visually guided saccades. Before and during this training period, they were never exposed to any sounds from the loudspeakers in the experimental apparatus. Once they were well accustomed to this visual task and to being in the silent experimental apparatus, we began training on a simple auditory saccade task. As reported by Gifford et al. (2003), monkeys oriented readily to the very first presentation of a sound, presumably because it was unexpected and they were startled by it. We were careful to reinforce these first few trials, before the monkeys habituated to the novel sound. In the first few days or weeks of training, and occasionally thereafter, we paired the auditory stimulus with a visual stimulus at the same location. This visual reinforcer came on after sound onset if the animal failed to look to the sound alone. Initially, we allowed the animal to look to this light-sound pairing to receive a reward but gradually we reduced the time allowable to look to the paired light-sound to force the animal to look to the sound alone. With this training regimen, we had little trouble eliciting auditory saccades from monkeys. Our methods contrast with those of Grunewald et al. (1999), who had difficulty training monkeys to perform auditory saccades when they were pretrained not to make auditory saccades, i.e., when they had been previously required to maintain fixation while sounds were presented. Before the data for this study were collected, the two monkeys included here had been performing auditory saccade tasks for the purpose of other unrelated neurophysiological experiments for 2 (monkey X) or 4 yr (monkey C).

During experimental sessions, monkeys performed an “overlap saccade task.” In this task, monkeys fixated an LED (the fixation LED) at one of three randomly presented locations (−12, 0, or 12° horizontally and 17° below the horizontal meridian). After a variable length of time (900–1,300 ms), a target (either visual or auditory) was randomly presented from one of nine locations across the horizontal meridian (−23 to 23° in 5–6° increments). The monkey maintained its gaze at the fixation LED for a variable length of time (600–900 ms) until the LED was turned off, cueing the monkey to saccade to the target to receive a reward. The target stayed on until after this saccade occurred. The sizeable vertical separation between the fixation and saccade targets ensured that a saccade was required for all fixation/target combinations and prevented overlap with the target acceptance windows. We used large target acceptance windows (16–17° in radius) for reasons explained below. With these acceptance windows, 96% of properly initiated trials (i.e., trials in which the monkeys maintained their gaze at the fixation LED until its offset) were rewarded in the experimental sessions. Data included here are from five experimental sessions per monkey with an average of 12 trials per condition (3 initial eye positions × 9 target locations × 2 target modalities) presented in each session. Visual and auditory trials were randomly interleaved during each session.
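For concreteness, the condition structure just described (3 fixation positions × 9 target locations × 2 modalities, roughly 12 repeats per condition, randomly interleaved) can be sketched as a trial-list generator. This is an illustration only, not the authors' experimental-control code; the exact target azimuths and all names are approximations.

```python
import itertools
import random

# Illustrative generator for one session's randomly interleaved trial list,
# following the condition structure described in the text. All names and the
# exact target azimuths are approximations, not the authors' code.

FIX_X = [-12, 0, 12]                                # fixation LED azimuths, deg (17 deg down)
TARGETS = [-23, -17, -11, -6, 0, 6, 11, 17, 23]     # approximate target azimuths, deg
MODALITIES = ["visual", "auditory"]
REPEATS = 12                                        # ~12 trials per condition per session

def make_session(seed=None):
    """Return a shuffled list of trial dictionaries for one session."""
    rng = random.Random(seed)
    conditions = list(itertools.product(FIX_X, TARGETS, MODALITIES)) * REPEATS
    rng.shuffle(conditions)  # randomly interleave all conditions and modalities
    trials = []
    for fix_x, target_x, modality in conditions:
        trials.append({
            "fix_x": fix_x,
            "target_x": target_x,
            "modality": modality,
            "fix_to_target_ms": rng.uniform(900, 1300),  # delay to target onset
            "overlap_ms": rng.uniform(600, 900),         # delay to fixation offset
        })
    return trials
```

Shuffling the full condition list, rather than blocking by modality, gives the random interleaving of visual and auditory trials described in the text.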

All properly initiated trials, rewarded and unrewarded, were included for data analysis provided a saccade (with any direction and amplitude) began >100 and <400 ms after the offset of the fixation LED. Saccade onset was defined as the time when eye-movement velocity exceeded 20°/s (velocity had to remain above this threshold for 20 ms to be considered a saccade), and offset was the time when velocity fell <20°/s (for a minimum of 10 ms; Fig. 1). With these inclusion criteria, 88% of all properly initiated trials were included in the final analysis. These generous inclusion criteria minimized experimental bias: limiting our analysis to saccades that landed within target-centered reward windows could potentially introduce a bias in favor of successful compensation for initial eye position. Undoubtedly, some trials in which the monkeys made no attempt to look to the sounds were included. These trials would tend to inflate the observed variability in saccade endpoints but would not introduce a bias either for or against an eye-position effect.
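The onset/offset criteria above amount to a velocity-threshold detector: eye speed must exceed 20°/s for at least 20 ms (10 samples at 500 Hz) to mark saccade onset and remain below that threshold for 10 ms to mark offset. A minimal sketch, assuming position traces in degrees sampled at 500 Hz (the function names and differentiation scheme are illustrative, not the authors' analysis code):

```python
# Sketch of a velocity-threshold saccade detector matching the criteria in
# the text. Assumes eye position sampled at 500 Hz (2 ms/sample), in degrees.

FS = 500                    # sampling rate, Hz
VEL_THRESH = 20.0           # onset/offset speed criterion, deg/s
MIN_ON = int(0.020 * FS)    # speed must stay above threshold for 20 ms
MIN_OFF = int(0.010 * FS)   # ...and below it for 10 ms to mark offset

def speed(h, v):
    """Instantaneous eye speed (deg/s) from horizontal/vertical traces."""
    return [((h[i + 1] - h[i]) ** 2 + (v[i + 1] - v[i]) ** 2) ** 0.5 * FS
            for i in range(len(h) - 1)]

def detect_saccade(h, v):
    """Return (onset, offset) sample indices of the first saccade, or None."""
    s = speed(h, v)
    i = 0
    while i + MIN_ON <= len(s):
        if all(x > VEL_THRESH for x in s[i:i + MIN_ON]):
            onset = i
            j = onset + MIN_ON
            while j + MIN_OFF <= len(s):
                if all(x < VEL_THRESH for x in s[j:j + MIN_OFF]):
                    return onset, j
                j += 1
            return onset, len(s)  # speed never settled: end of trace
        i += 1
    return None
```

Requiring the speed to stay above threshold for 10 consecutive samples prevents brief noise transients in the velocity trace from being classified as saccades.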

FIG. 1.

Sample auditory saccade traces and “main sequence” of visual and auditory saccades (Bahill et al. 1975). A and B: horizontal and vertical eye position (A) and speed (B) vs. time. The 1st 10 (randomly interleaved) trials from the central fixation position to the left-most auditory target from one session in monkey C are shown. The blue horizontal line in B indicates 20°/s, the criterion for identifying saccade onset and offset (see methods). Portions of the trace considered to be part of the saccade are shown in green. Traces are aligned on the saccade. The mean saccade latency (from the offset of the fixation point) for visual and auditory trials was 223 and 257 ms, respectively (data from both monkeys and all sessions combined). C: peak eye speed vs. saccade amplitude for visual (red) and auditory (black) saccades for both monkeys.


The effects of initial eye position on the horizontal accuracy of saccades to visual and auditory targets are shown in Figs. 2 and 3. In Fig. 2, each line represents the mean saccadic endpoints from one initial eye position (left, center, or right). Two important points can be taken from this figure. First, the fact that these three lines are largely superimposed suggests that the monkeys compensated appropriately for horizontal changes in initial eye position for both auditory and visual saccades. Second, even though virtually all trials were included in this analysis (see methods) and despite the large acceptance windows, the average location of the auditory saccade endpoints was quite close to the true locations of the targets. Figure 3 confirms this basic pattern by showing all of the raw saccade error data for both visual and auditory saccades from each of the three initial eye positions.

FIG. 2.

The mean horizontal endpoints of auditory (A and C) and visual (B and D) saccades as a function of sound location for 3 different initial eye positions (12° left, center, 12° right) for 2 monkeys. The bars indicate SE.

FIG. 3.

Horizontal and vertical saccade error for visual (left) and auditory (right) targets. Each row shows the data for saccades from 1 fixation position to all targets. ○, the means of the distributions for the 2 monkeys.

How complete was this compensation for initial eye position? The effects of initial eye position did reach statistical significance for both auditory and visual saccades (2-way ANOVA conducted separately for each monkey and target modality, with target location and initial eye position as the factors, P < 0.05). The effect was also significant in a multiple linear regression analysis (Table 1). However, it is known that saccades tend to fall short of an intended target by ∼10% of the distance to that target (for review, see Becker 1989). These effects of eye position could therefore reflect the normal hypometria of saccades.
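The logic of the regression analysis can be illustrated as follows: the horizontal endpoint is modeled as a linear function of target location and initial eye position, and a near-zero coefficient on eye position indicates complete compensation. The data below are synthetic (built with a 5% residual eye-position dependence), and the implementation is a sketch, not the study's analysis code:

```python
import numpy as np

# Sketch of the multiple-regression logic: horizontal saccade endpoint modeled
# as a linear function of target location and initial eye position. A slope
# near 0 on eye position means endpoints do not depend on where the eyes
# started, i.e., compensation is complete. All data here are synthetic.

rng = np.random.default_rng(0)
targets = np.tile([-23.0, -17.0, -11.0, -6.0, 0.0, 6.0, 11.0, 17.0, 23.0], 30)
eye_pos = rng.choice([-12.0, 0.0, 12.0], size=targets.size)

# Simulate endpoints with a 5% residual eye-position dependence plus noise:
endpoints = 0.95 * targets + 0.05 * eye_pos + rng.normal(0.0, 2.0, targets.size)

# Ordinary least squares: endpoint ~ intercept + target + eye position
X = np.column_stack([np.ones_like(targets), targets, eye_pos])
beta, *_ = np.linalg.lstsq(X, endpoints, rcond=None)
intercept, slope_target, slope_eye = beta
```

On these synthetic data, the fit recovers a target slope near 0.95 (the normal hypometria) and an eye-position slope near 0.05, i.e., roughly 95% compensation.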

TABLE 1.

Results of multiple regression and average endpoint difference analysis

To determine whether the modest but significant effect of eye position that we observed was comparable in size to the predicted 10% undershoot, we calculated the average horizontal difference observed in the endpoints for a 12° horizontal shift in initial eye position. This average endpoint difference was calculated for each target location for the center versus left eye positions and the center versus right eye positions and then averaged together. The average endpoint differences for auditory saccades were 0.6° (monkey C) and 1.6° (monkey X), or 5–13% of the 12° difference in fixation position (Table 1). Shifts for visual endpoints were only slightly smaller: 0.4 and 0.9°, or 3–8%, respectively. These effects were comparable to the normal 10% undershoot. We therefore consider the monkeys' compensation for changes in initial eye position to be virtually complete.
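As a worked illustration of this average-endpoint-difference calculation (the per-target mean endpoints below are hypothetical numbers chosen for illustration, not the study's data):

```python
# Sketch of the average-endpoint-difference analysis described in the text.
# The per-target mean endpoints below are hypothetical, for illustration only.

SHIFT = 12.0  # horizontal separation between adjacent fixation positions, deg

def endpoint_shift(center, left, right):
    """Mean horizontal endpoint difference per 12-deg change in eye position.

    Each argument is a list of per-target mean endpoints (deg) from one
    initial eye position. Center-vs-left and center-vs-right differences are
    averaged over targets, then averaged together.
    """
    n = len(center)
    d_left = sum(center[t] - left[t] for t in range(n)) / n
    d_right = sum(right[t] - center[t] for t in range(n)) / n
    return (d_left + d_right) / 2

# Hypothetical per-target mean endpoints from the three fixation positions:
center = [-23.0, -11.5, 0.2, 11.8, 23.1]
left   = [-23.5, -12.2, -0.4, 11.2, 22.4]
right  = [-22.4, -11.0, 0.8, 12.3, 23.6]

shift = endpoint_shift(center, left, right)
percent = 100 * shift / SHIFT  # % of the eye-position change left uncompensated
```

With these made-up numbers, the mean endpoint shift is about 0.6°, i.e., about 5% of the 12° change in fixation position, in the range reported in Table 1.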

Although initial eye position did not vary vertically, we examined whether the horizontal shifts in initial eye position had an effect on the vertical accuracy of saccades to both the auditory and visual targets located along the horizontal meridian (0° vertically). While vertical localization of the visual targets was quite accurate (mean ± SE vertical endpoint across all initial eye positions for monkey C = 0.84 ± 0.05°, monkey X = 0.04 ± 0.06°), the monkeys differed in their vertical accuracy for auditory targets (mean vertical endpoint across all initial eye positions for monkey C = 7.60 ± 0.09°, monkey X = −2.90 ± 0.12°; Fig. 3). We note that these errors are smaller than the strong upward bias observed for saccades to the remembered locations of visual targets by Barton and Sparks (2001). The average vertical differences (shifts) observed in the endpoints for a 12° rightward shift in initial eye position were −0.08° for both visual and auditory targets for monkey C and −0.17 (visual) and −0.25° (auditory) for monkey X. We therefore consider the monkeys' vertical saccadic endpoints to be essentially unaffected by horizontal changes in initial eye position.


We have shown that monkeys can produce eye movements to sounds whose endpoints are only minimally influenced by initial eye position. Our results are consistent with related studies of auditory saccades in humans (Frens and van Opstal 1994; Yao and Peck 1997; Zahn et al. 1979; see also Makous and Middlebrooks 1990) and extend these findings to monkeys (see also Whittington et al. 1981 for a brief consideration of this question). The modest influence of eye position that we observed is in some respects similar to that reported by Lewald and Ehrenstein for sound-location discrimination tasks in humans (Lewald 1997, 1998; Lewald and Ehrenstein 1996a, b). However, the eye position effect in our study was similar for visual and auditory saccades, whereas Lewald et al. reported differences between visual and auditory localization judgments. Our findings differ from those of Groh and Sparks, who showed a more sizeable effect of initial eye position on somatosensory saccades in monkeys, although not in humans (Groh and Sparks 1996).

Our findings are surprising in light of the failure of auditory receptive fields in the SC to fully compensate for changes in eye position, unlike visual receptive fields which do show full compensation (Jay and Sparks 1987b). How can the visual and auditory representations in the SC be different and yet produce similar saccades? There are four classes of answers to this question. First, the coordinate transformation of auditory signals from head- to eye-centered coordinates is completed before the SC is “read out,” so that at read-out time the visual and auditory representations are no longer in different coordinates. Second, the representations remain different, but the read-out algorithm is robust to these differences. Third, the representations are read out using different algorithms, and these different algorithms rectify the discrepancies between the visual and auditory representations. Fourth, the SC is not involved in generating auditory saccades after all. We elaborate on each of these possibilities below.

The first possibility is that the discrepancy between the visual and auditory reference frames is resolved before the SC is read out. In comparing the visual and auditory representations in the SC, Jay and Sparks (1987b) considered only the activity in a fixed window of time locked to the sensory stimulus well in advance of the movement. It is possible that for auditory saccades, the frame of reference becomes fully eye-centered by the time of the movement. The motor-related portion of the activity would thus be expected to show full compensation for changes in eye position. Jay and Sparks did not analyze the shifting of the motor fields. They did report that many saccade-related burst neurons were active for both visual and auditory saccades (see also Cohen and Andersen 2000; Jay and Sparks 1987a; Linden et al. 1999; Russo and Bruce 1994 for related findings in the frontal eye fields and intraparietal cortex), but these data do not fully answer the question because the SC has sufficiently large movement fields that there would be considerable overlap between the visual and auditory movement fields even if they remain in different frames of reference.

The second possibility is that the visual and auditory representations remain different, but the mechanism by which the SC is read out is robust to these differences because it factors in both the site and level of activity. Previous work has shown that both the site of activation in the SC (Lee et al. 1988; Robinson 1972) as well as the level of activation at that site are important for determining the resulting saccade vector (Stanford et al. 1996). Variations in the level of auditory activity with eye position (an eye-position gain field) could therefore serve to complete the compensation for changes in eye position. Indeed, such eye-position gain fields have been demonstrated for visual saccade-related activity (Van Opstal and Hepp 1995; Van Opstal et al. 1995), but would have been overlooked in the Jay and Sparks study, which focused on the borders of the receptive fields. Groh (2001) proposed a model for reading out representations like the SC that would factor in both the site and level of activity. In principle, an algorithm such as this could appropriately read out the discrepant visual and auditory saccade-related activity using the same mechanism. The same answer (same saccade vector) could be obtained either through a shift in the site of activity or a change in the level of activity or a combination of both.

Third, it is possible that the SC is simply read out differently for visual and auditory saccades. Either a head-centered signal or an eye-centered signal specifying target location can in principle be successfully combined with information about current eye position to generate the appropriate pattern of force to move and hold the eyes at the appropriate position. Indeed, Robinson's original model for the saccadic pulse-step generator called for a head-centered signal of target location as the input (Robinson 1975). More recent versions call for the input signal to be an eye-centered signal of target location (e.g., Jurgens et al. 1981; Scudder 1988; for review, see Van Opstal et al. 1995), but these models are motivated by the SC's representation of visual saccades, and, as we have already mentioned, the visual and auditory representations appear to be different.

Precedence for the idea that the SC might provide a saccade command signal that is not limited to strictly eye-centered information comes from work by van Opstal and colleagues (Van Opstal and Hepp 1995; Van Opstal et al. 1995). As noted in the preceding text, Van Opstal et al. have demonstrated that changes in initial eye position do affect the magnitude of the visual saccade-related burst in many SC neurons (Van Opstal et al. 1995). This finding illustrates that the visual saccade representation contains information about both the eye-centered location of the target and the current position of the eyes. Van Opstal et al. noted that retaining information about eye position would be valuable for ensuring that saccades comply with Listing's law and proposed that the SC provides a dual head- and eye-centered signal of target location. The pulse-step generator circuit receiving these inputs was proposed to reflect a blend of the basic components of the competing head- and eye-centered models. We speculate that for visual saccades, the eye-centered output of the SC might predominate, whereas for auditory saccades, the balance might be more equal. Because either circuit can work, the result would be successful compensation for initial eye position for both visual and auditory saccades. This view also provides an explanation for the fact that auditory saccades exhibit a different main sequence from visual saccades: if the pulse-step generator works differently for these two types of saccades, the dynamics of the saccades can differ. On the other hand, it is difficult to reconcile this dual-read-out view with the results of microstimulation experiments. Microstimulation presumably activates all the output pathways of the SC and produces results that are more consistent with the eye-centered read-out models (Nichols and Sparks 1995; Robinson 1972).
Perhaps the two circuits mutually inhibit one another, and microstimulation preferentially activates the dominant visual read-out pathway.

The final possibility is that the SC is simply not involved in guiding auditory saccades. Under this scenario, the auditory activity observed via single-unit recording in the SC might never contribute to the saccade command at all. Instead, the frontal eye fields (FEF) would serve to generate auditory saccade command signals. The FEF is known to be active for auditory-guided saccades, but the effects of initial eye position have not been explored quantitatively (Russo and Bruce 1994). Combined ablation of the FEF and SC eliminates visually guided saccades, whereas lesions of either the FEF or the SC alone do not (Schiller et al. 1980). Auditory saccades have not been studied after lesions of either structure. Arguing against the possibility that the FEF might bear exclusive responsibility for auditory saccades is a recent study by Hanes and Wurtz (2001). These authors simultaneously stimulated the FEF while inactivating the corresponding location in the SC using lidocaine. Under these circumstances, the FEF stimulation failed to evoke a saccade. This result suggests that the direct FEF input to the oculomotor brain stem circuits (downstream from the SC) is not normally sufficient to evoke a saccade. It is, however, possible that this direct FEF-brain stem pathway is recruited more successfully when the saccade target is auditory.

These are but a few potential ways of reconciling our behavioral findings with previous neurophysiological results, and undoubtedly other possibilities exist as well. Ultimately, additional experimental evidence will be needed to provide a full answer to the question. Not only will it be of interest to study the SC's representation of auditory space in greater detail, but also the assumption that the premotor circuitry is accessed in the same way for auditory and visual saccades needs to be tested. Recording the activity of brain stem excitatory burst neurons (EBNs) or other saccade-related neurons in the brain stem reticular formation and/or cerebellum during both visual and auditory saccades would shed important light on this question.

Since the seminal work of Jay and Sparks in the SC, many other regions of the primate brain have been found to combine information about eye position with their sensory responses to either visual or auditory stimuli (Andersen and Zipser 1988; Andersen and Mountcastle 1983; Andersen et al. 1985; Boussaoud and Bremmer 1999; Bremmer et al. 1997a, b, 1998, 1999; Cohen and Andersen 2000; Duhamel et al. 1997; Galletti et al. 1993; Groh et al. 2001; Guo and Li 1997; Jay and Sparks 1984, 1987b; Mullette-Gillman et al. 2003; Russo and Bruce 1994; Squatrito and Maioli 1996; Stricanne et al. 1996; Trotter and Celebrini 1999; Werner-Reiss et al. 2003; Weyand and Malpeli 1993). In many of these neurons, the resulting activity conveys a blend of information about the external sensory stimulus and the eye-position signal. Although some of these representations appear to reflect a full transformation of sensory information into a novel frame of reference (e.g., Cohen and Andersen 2000), in other instances, the coordinate transformation seems to remain incomplete. Although there has been ample modeling work to suggest that these representations can in principle be read out successfully (Groh et al. 2001; Pouget and Sejnowski 1997; Van Opstal and Hepp 1995; Van Opstal et al. 1995), behavioral confirmation of this in non-human primates has been lacking. Our results provide support for this view.


This research was supported by the Alfred P. Sloan Foundation (J. M. Groh), McKnight Endowment Fund for Neuroscience (J. M. Groh), Whitehall Foundation (Y. E. Cohen and J. M. Groh), John Merck Scholars Program (J. M. Groh), Office of Naval Research Young Investigator Program (J. M. Groh), EJLB Foundation (J. M. Groh), National Institutes of Health Grant NS-17778-19 (Y. E. Cohen and J. M. Groh), NIH B/START and Shannon Awards (Y. E. Cohen), NIH Grant DC-05292 (R. R. Metzger), and The Nelson A. Rockefeller Center at Dartmouth (J. M. Groh).


We thank A. S. Clark, K. A. Kelly, S. G. Lisberger, D. L. Sparks, and U. Werner-Reiss for helpful comments.


  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

