¹Nijmegen Institute for Cognition and Information and ²FC Donders Centre for Cognitive Neuroimaging, Radboud University Nijmegen, Nijmegen, The Netherlands
Submitted 18 August 2006; accepted in final form 22 November 2006
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
A critical aspect of this issue is the reference frame with respect to which object locations for actions are encoded. A reference frame is characterized by a coordinate system, which represents locations using a set of coordinate axes fixed relative to some origin, such as the eyes, head, body, or earth. In theoretical terms, spatial updating could work in any coordinate frame as long as the correct updating signals and computational operations are used (Medendorp et al. 2003b). Adding to this notion, various studies have argued that the reference frame used to encode a spatial memory is not fixed but depends on several factors, including the sensory inputs, task constraints, the visual background, memory interval, and the cognitive context (Battaglia-Mayer et al. 2003; Bridgeman et al. 1997; Carrozzo et al. 2002; Hayhoe et al. 2003; Snyder et al. 1998; Van Pelt et al. 2005). Within this view, psychophysical evidence obtained in neutral open-loop testing situations has suggested that the early feedforward mechanisms for internal spatial updating operate in gaze-centered coordinates (Baker et al. 2003; Henriques et al. 1998; Medendorp and Crawford 2002). In further support of this evidence, many brain regions in parietal and frontal cortex have been shown to update their activity patterns relative to the new gaze direction after an eye movement has occurred (Batista et al. 1999; Duhamel et al. 1992; Medendorp et al. 2003a; Merriam et al. 2003; Sommer and Wurtz 2002).
It is important to point out, though, that most of the actual evidence for gaze-centered updating was obtained using simple eye rotations only, with the head and body restrained, ignoring the fact that in natural situations our eyes also translate through space, for example, when we walk. When the body translates, correct updating in a gaze-centered frame seems computationally much more demanding because the required updating varies from object to object, depending nonlinearly on their depth and direction, as in motion parallax (Li et al. 2005; Medendorp et al. 2003b). In this respect, updating for translational motion would be much simpler if object locations were stored in, say, Cartesian body-centered coordinates because then the required updating would be the same for each object: the opposite of the amount of body displacement (Medendorp et al. 1999).
At present, it is unknown which reference frame is involved in the computations for the translational updating of remembered visual space. Here we address this question by characterizing the pattern of errors in manual reaching movements toward briefly flashed targets presented prior to a whole-body translation. Our goal is not merely to characterize a subject's ability to update spatial information for intervening translations. In fact, recent studies have already shown that humans and monkeys can look to remembered locations in near space, compensating for intervening eye translation induced by head or body motion (Israel et al. 1999; Li et al. 2005; Medendorp et al. 2003b). However, the computational principles underlying the spatial constancy in this behavior, whether gaze-related or not, remain to be revealed.
We designed a novel experiment to discriminate between a gaze-dependent and a gaze-independent model of visuospatial memory updating during translations. In our test, subjects fixate centrally at a fixation point (FP) while a far or near target (Tf, Tn) is flashed onto the retinal periphery (Fig. 1, middle). Subjects then translate sideways (by making an active whole-body step displacement) while keeping their gaze at FP and subsequently reach to the remembered target location. The logic behind the test is the following. Suppose that the targets were visible at all times, including when the body translates sideways. Then parallax geometry dictates that targets in front of and behind the eyes' fixation point (FP) shift in opposite directions on the retinas. Thus if the brain is to simulate motion parallax also in the active updating of memorized targets (left, black arrows), it can be predicted that if the body translation is not correctly taken into account (Glasauer et al. 1994; Medendorp et al. 1999), the updated locations (gray arrows) will deviate from the actual locations, leading to reach errors (Ef, En) in opposite directions for targets in front of and behind the FP (hypothesis A: gaze-dependent updating). Alternatively, parallax geometry plays no role if the brain codes locations in a gaze-independent reference frame, e.g., in a body-fixed frame (right). If translations are then misjudged, the updated locations will also deviate from the actual locations, but with updating errors (as probed by the reach) in the same direction for all targets (hypothesis B: gaze-independent updating).
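The two predictions can be illustrated with a small geometric sketch. It is not the authors' model: it simply assumes that the brain updates a stored location using a misjudged translation (a hypothetical gain `gain` applied to the true step), either in a frame aligned with the gaze line or in a body-fixed Cartesian frame. The function names, the 2-D layout (x forward, y leftward, FP at the origin), and the example values are all illustrative assumptions.

```python
import numpy as np

def gaze_frame(eye, fp):
    """Unit axes of a gaze-centered frame: u along the gaze line, v orthogonal (leftward)."""
    u = (fp - eye) / np.linalg.norm(fp - eye)
    v = np.array([-u[1], u[0]])
    return u, v

def gaze_dependent_error(target, eye_start, step, gain, fp=np.zeros(2)):
    """Reach error if updating is done in gaze coordinates with a misjudged step."""
    eye_true = eye_start + step
    eye_assumed = eye_start + gain * step       # where the brain "thinks" the eye is
    # Gaze-centered coordinates the target would have after the assumed step.
    u_a, v_a = gaze_frame(eye_assumed, fp)
    rel = target - eye_assumed
    along, across = rel @ u_a, rel @ v_a
    # Those coordinates are read out in the actual post-step gaze frame.
    u_t, v_t = gaze_frame(eye_true, fp)
    decoded = eye_true + along * u_t + across * v_t
    return decoded - target

def gaze_independent_error(target, eye_start, step, gain):
    """Reach error if updating is done in body-fixed Cartesian coordinates."""
    stored = target - eye_start - gain * step   # same correction for every target
    decoded = eye_start + step + stored
    return decoded - target
```

With, for example, an eye starting 15 cm to one side of an FP at 35 cm, a 30-cm step, and a gain of 0.8, the lateral (y) errors for a target behind FP and a target in front of FP have opposite signs in the gaze-dependent sketch, whereas the body-fixed sketch yields the same error vector for both targets.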
| METHODS |
|---|
Fifteen human subjects (4 female, 11 male, mean age of 26 ± 4 yr) were tested in four different task conditions as described in the following text. The main experiment involved 10 naïve subjects and the 2 authors. Each of the three additional control experiments tested five subjects (3 naïve). All subjects signed informed consent to participate in the experiment. All subjects were right-handed, and all were free of any sensory, perceptual, or motor disorders. All pointing movements were made using the right arm.
Experimental setup
Subjects stood in a completely darkened room, within a designated area of 60 cm width, which we will refer to as the "translation zone." A U-shaped ridge of 6 cm height was attached to the floor, indicating the outer borders of the translation zone to the left, right, and back of the subject. During the experiments, this ridge served as a reference for subjects to position their feet so that they could accurately control their own positions and self-induced translations. This configuration led to lateral body translations with an amplitude of 30 ± 7 (SD) cm averaged over all subjects. Within subjects, positions and translations were reproduced with an accuracy better than 3 cm.
We used an OPTOTRAK 3020 digitizing and motion analysis system (Northern Digital) to record the position and orientation of various body parts in three dimensions (3-D). This system tracks the 3-D position of infrared-emitting diodes (ireds) with an accuracy better than 0.2 mm. We determined head position and orientation by means of four ireds attached to the eye tracking helmet worn by the subject (see following text). Prior to the experiment, we calibrated the locations of the eyes and ears with respect to the ireds on the helmet. During this calibration procedure, the subject faced the OPTOTRAK camera while wearing the helmet with three additional temporary ireds, one near the right auditory meatus and one on each closed eyelid. The 3-D locations of these ireds, which uniquely defined the location of the right ear and both eyes relative to the helmet, were recorded together with the ireds on the helmet. With this information, we were able, during the subsequent experiment, to compute the positions of the eyes and ear in space on the basis of the helmet ireds alone. The actual location of each eye, defined as its rotation center, was assumed to be 1.3 cm behind its cornea. In a similar fashion, we calibrated the position of the tip of the right index finger relative to four ireds attached to the middle phalanx of this finger. We further used the OPTOTRAK system to record the position of the shoulder (acromion) as well as the positions of the stimulus targets. OPTOTRAK data were sampled at 125 Hz. The ired coordinates were transformed to a right-handed space-fixed coordinate system. The x-y plane was aligned with the subject's horizontal plane. The positive x axis pointed forward, perpendicular to the subject's shoulder line; the positive y axis pointed leftward along the shoulder line, as seen from the subject; and the z axis pointed upward. The position of the central light-emitting diode (LED) on the stimulus array (see following text) served as the origin of the coordinate system. The orientation of the head was determined with respect to a reference position adopted when the subject faced straight ahead. Orientation and location measurements were accurate to within 0.2° and 0.2 mm.
We used an Eyelink II eyetracker (SR Research) to record binocular eye movements. We ensured that its camera system, which was mounted to the helmet, remained stable on the head during the entire experiment. Stable recording of eye position was further ensured by measuring corneal reflections in combination with pupil tracking, which reduces the errors caused by any helmet slip and vibration. As a further precaution, subjects were also instructed to minimize speaking during the experiments. Eye movements were calibrated before the experiment by having subjects face straight ahead and fixate the stimulus LEDs two times each, in complete darkness, both when standing left and right within the translation zone. Eye recordings were calibrated in the head-fixed coordinate system of the eye tracker. By combining the locations of the stimuli and the reconstructed locations of both eyes (using the helmet calibration data) as well as current head orientation, we computed the direction of the stimulus LEDs with respect to the subject's eyes in head-fixed coordinates. In this way, the eye-tracker data of both eyes could be matched to the corresponding vertical and horizontal stimulus directions and expressed as eye-in-head orientation signals. During the actual experiments, eye-in-space orientation was calculated by combining head orientation and calibrated eye-in-head orientation signals. The eye-calibration procedure resulted in a directional accuracy of the eye-in-head orientation better than 1.5°. Version and vergence positions were calculated from the left (L) and right (R) eye positions as (R + L)/2 and L − R, respectively.
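The version and vergence definitions can be written out as a minimal sketch. The function name and the sign convention (horizontal angles positive rightward, so that convergence gives a positive vergence) are assumptions made for illustration; the text itself only gives the two formulas.

```python
import numpy as np

def version_vergence(left_deg, right_deg):
    """Return (version, vergence) in degrees from horizontal eye positions.

    version  = (R + L) / 2  -> conjugate gaze direction
    vergence = L - R        -> binocular fixation depth signal
    Works on scalars or on arrays of samples.
    """
    left = np.asarray(left_deg, dtype=float)
    right = np.asarray(right_deg, dtype=float)
    version = (right + left) / 2.0
    vergence = left - right
    return version, vergence
```

For instance, with the left eye rotated 5° rightward and the right eye 5° leftward (L = +5, R = −5 under the assumed convention), the version is 0° and the vergence is 10°.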
Two PCs controlled the experiment. A master PC was equipped with hardware for data acquisition of the OPTOTRAK and Eyelink measurements, as well as visual stimulus control, while a slave PC contained the hardware from the Eyelink system.
Stimuli
Nine red LEDs (luminance <20 mcd/m²) served as stimuli. They were attached to a frame in the shape of a cross that was mounted on a two-link robot arm. This robot arm, equipped with stepping motors (type Animatics SmartMotors; Servo Systems), could rapidly position the center of the frame to virtually any desired position within a hemisphere (radius: 1 m) centered at its base. The frame was positioned with an accuracy better than 0.2 mm, as confirmed by OPTOTRAK recordings. During the experiment, the stimuli were presented at space-fixed locations, at eye level in the subject's transverse plane (Fig. 2A). The location of the central LED, which served as fixation point (FP), corresponded to the origin of our space-fixed coordinate system, which was straight in front of the center of the translation zone, at a distance of ~35 cm. Four other LEDs were lined up with the x axis of the coordinate system and served as visual targets for the task conditions described in the following text. Two of these targets were behind the central LED (from the subject's perspective) at distances of 17 and 7 cm (T1, T2), and two were in front of the central LED at distances of 6 and 10 cm (T3, T4). Using this configuration, we ensured that the target flashes stimulated both retinas during the experiments, at equal intervals of ~4°. We further positioned four other LEDs along the y axis of the coordinate system, on either side of the central light at 6 and 12 cm (not shown in Fig. 2A). These targets were used in catch trials to ensure that subjects did not simply make repeated stereotypic responses. Data for these catch targets were excluded from further analysis. We also made sure that subjects never saw the target configuration when the room lights were on by positioning it at an elevated level using the robot.
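As a rough check of the stated angular spacing, the sketch below computes the direction of each target relative to the fixation point as seen from an eye displaced sideways from the midline. The 15-cm lateral offset (half of the average 30-cm step) is an assumption, as are the names; T1 is taken as the 17-cm target, following the Results.

```python
import math

FP_DISTANCE_CM = 35.0
# Target offsets along the x axis relative to FP (positive = behind FP).
TARGET_OFFSETS_CM = {"T1": 17.0, "T2": 7.0, "T3": -6.0, "T4": -10.0}

def eccentricity_deg(target_offset_cm, eye_lateral_cm,
                     fp_distance_cm=FP_DISTANCE_CM):
    """Angle between the lines of sight to the target and to FP, in degrees.

    The eye is assumed to sit eye_lateral_cm to the side of the midline,
    fp_distance_cm in front of FP; negative values are targets behind FP.
    """
    fp_angle = math.atan2(eye_lateral_cm, fp_distance_cm)
    target_angle = math.atan2(eye_lateral_cm, fp_distance_cm + target_offset_cm)
    return math.degrees(target_angle - fp_angle)

eccentricities = {name: eccentricity_deg(dx, eye_lateral_cm=15.0)
                  for name, dx in TARGET_OFFSETS_CM.items()}
```

Under these assumptions the four targets fall at roughly −7°, −4°, +4°, and +8° relative to FP, so that T1 and T4, and T2 and T3, form approximately equiangular pairs on opposite sides of the gaze line.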
The experiments were designed to test between a gaze-dependent and a gaze-independent model of visuospatial updating for translational motion. In our test, subjects were instructed to perform memory-guided reaching movements under two conditions, which will be referred to as "stationary" and "translation" tasks. The experimental paradigm of the translation task is illustrated in detail in Fig. 2A. Before the start of each trial, subjects positioned their feet on either the left or right end of the translation zone to ensure a fixed starting position. A trial started with the onset of FP, which was illuminated for 4.3 s and had to be fixated by the subject for its entire duration. At 1,500 ms after the onset of FP, a target for memory (here T1), closer or farther than FP, appeared in the visual periphery for 500 ms. Then a 2.3-s time interval followed in which subjects were instructed either to remain stationary (stationary task) or to make a sideward step to the opposite side of the translation zone (translation task) while still fixating FP. Then FP was extinguished, the stimulus frame was retracted, and 100 ms later an auditory signal cued the subject to conjointly look and reach to the remembered location of T, keeping the body and head still. Subjects had to hold that reaching position until another auditory signal was presented 3.6 s later. Then the next trial started. Targets were randomly chosen from the four locations. Each target location was tested 20 times for both starting positions, resulting in a total of 160 trials for each of the two tasks. Test trials were randomly interspersed with 32 catch trials. Subjects never received any tactile feedback during their reach. In all trials, subjects had to keep their head and body aligned in the straight-ahead direction.
In the translation trials, the starting position of a trial was the end position of the previous trial, whereas in the stationary task, the subject first moved to the other end within the translation zone before testing the next trial. Thus in the stationary task, response data were gathered at positions that also served either as initial or as final position in the translation task (F-test, P > 0.05). This allowed direct comparison of response behavior when updating was necessary (translation task) with behavior where no updating was needed (stationary task). For both test conditions, the total duration of each trial was 8.0 s.
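The trial timing and trial counts described above can be tallied in a few lines. This is only a bookkeeping sketch; the event labels are illustrative names, and the durations are the ones stated in the text, expressed in milliseconds.

```python
# (label, duration in ms), in the order the events occur within a trial.
TRIAL_EVENTS_MS = [
    ("fixation before target", 1500),
    ("target flash", 500),
    ("memory interval (stationary or step)", 2300),
    ("delay before auditory go cue", 100),
    ("reach and hold", 3600),
]

def event_onsets(events):
    """Return {label: onset time in ms} and the total trial duration in ms."""
    onsets, t = {}, 0
    for label, duration in events:
        onsets[label] = t
        t += duration
    return onsets, t

onsets_ms, trial_duration_ms = event_onsets(TRIAL_EVENTS_MS)
fp_on_ms = sum(d for _, d in TRIAL_EVENTS_MS[:3])   # FP stays lit through the memory interval

n_targets, n_repeats, n_start_positions = 4, 20, 2
test_trials_per_task = n_targets * n_repeats * n_start_positions
```

The tally reproduces the stated 4.3-s FP illumination, the 8.0-s trial duration, and the 160 test trials per task.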
During the reaching movement, visual feedback about hand position was provided by means of an LED attached to the fingertip. In this way, we tried to minimize the error attributable to an erroneous estimate of fingertip position during pointing (Beurze et al. 2006). We also allowed subjects to look where they were reaching to eliminate contributions of errors occurring otherwise, i.e., when gaze would be off the reach location (Henriques et al. 2003; see control experiment 2 in the following text).
The total experiment was divided into two sessions tested on different days. In each experimental session, half of the translation trials were tested first, followed by half of the stationary trials. Subjects performed blocks of 12 consecutive trials between which a brief rest was provided with the room lights on to prevent dark adaptation. During these periods, the stimulus frame was out of view. Each session lasted for ~60 min. One subject was tested over three sessions. During the experiments, subjects never received feedback about their performance. Before the actual experiment began, subjects practiced a few blocks to become familiar with the two task conditions.
Control tasks
We also performed three control experiments in which we varied a number of task parameters to test their implications for updating behavior. All controls were performed with the same timing and stimulus durations as in the main experiment, unless indicated otherwise. First, we tested updating performance in the absence of visual feedback about fingertip position during the reaching movement (control 1: reaching without feedback). This tested whether the results in the main experiment depended critically on a visually monitored hand position during the reach. The next control experiment was inspired by the fact that reaching while looking where you reach is generally more accurate than reaching to a retinally peripheral location (Henriques et al. 2003). Therefore, in contrast with the main experiment, subjects performed the reaching movement in this task while keeping gaze fixed at the remembered location of FP (control 2: reaching without looking). This tested whether the results of the main experiment were driven mainly by one of the two motor systems (eye vs. arm). The final control was designed to test the effect of a visual fixation point (FP) during the updating task (control 3: updating without FP). In this task, FP was turned off immediately after the target flash, and subjects were instructed to make their body translation while keeping their gaze fixed on the remembered FP. Reaching was performed under visual feedback of the fingertip, which had to be fixated. As the eyes may diverge from the remembered FP during the translation in darkness (Medendorp et al. 2003b), updating was tested for the two outermost targets only because these were most discriminative in terms of the models outlined in Fig. 1.
Data analysis
Data were analyzed off-line using Matlab (The Mathworks). We excluded trials in which subjects did not keep their eyes directed at FP within a 3° interval or made a saccade during target presentation. We also discarded trials in which the subject had not correctly followed other instructions of the paradigm, e.g., when stepping or reaching too early, or not making a step when this was required. Typically, 23 ± 11 trials (~7%) were discarded based on the arm and eye movement criteria. For each of the remaining trials, final reaching positions were selected manually at the time when the arm had the greatest degree of stability within the last 2 s of the response interval. For each trial, an average position was computed over a six-sample interval (48 ms) centered at this point in time. After categorizing the stationary and translation trials by starting position and translation direction, respectively, we computed the mean reach endpoint separately for each of the targets within these categories. Starting and final body positions were defined by the location of the center of the two eyes at the time of target presentation and reach response, respectively. The difference between these two positions determined the amplitude of the translation (step size). We tested between gaze-dependent and gaze-independent updating models by comparing the horizontal components of the updating errors of reaches toward the targets flashed in front of and behind FP in the translation trials. Because both variables are subject to natural variation and measurement error, a model 2 regression (also referred to as a major-axis regression) was used to determine their relationship, with slope and confidence limits estimated by the bootstrap method (Press et al. 1992). We used the results of the stationary paradigm as a measure for errors attributable to perception or motor effects, assuming that both contributed equally. A further 2-D vectorial analysis was performed to examine how the interaction between initial target position, translational motion, and reach response can be described in both gaze-dependent and gaze-independent coordinate frames (see later). Statistical tests were performed at the 0.05 level (P < 0.05).
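A major-axis (model 2) regression with bootstrapped confidence limits can be sketched as follows. This is a generic NumPy illustration, not the authors' Matlab code: the slope is taken from the principal eigenvector of the 2 × 2 covariance matrix, and the percentile bootstrap with 1,000 resamples is an assumed choice. Function names are hypothetical.

```python
import numpy as np

def major_axis_slope(x, y):
    """Slope of the major axis: direction of largest variance in the (x, y) cloud."""
    cov = np.cov(x, y)                        # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    major = eigvecs[:, np.argmax(eigvals)]
    if major[0] == 0.0:                       # vertical axis: infinite slope
        return np.inf
    return major[1] / major[0]

def bootstrap_slopes(x, y, n_boot=1000, seed=0):
    """Major-axis slopes of n_boot resamples of the (x, y) pairs, drawn with replacement."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample pairs, keeping x and y matched
        slopes[i] = major_axis_slope(x[idx], y[idx])
    return slopes

def slope_confidence_limits(x, y, level=95.0, **kwargs):
    """Percentile bootstrap confidence limits of the major-axis slope."""
    slopes = bootstrap_slopes(x, y, **kwargs)
    tail = (100.0 - level) / 2.0
    return np.percentile(slopes, [tail, 100.0 - tail])
```

Unlike ordinary least squares, the major-axis slope treats both variables symmetrically, which is why it suits the comparison of two error measures that are both noisy.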
Neural network model
To understand our findings in neurophysiological terms, we trained a simple recurrent three-layer Elman-type neural network using backpropagation to perform gaze-centered updating for both intervening rotations and translations of the eye. We used a network architecture similar to that of White and Snyder (2004), who modeled the updating process for (conjugate) eye rotations only. The predictions of this model will be discussed in the DISCUSSION. In the present model, the input layer of the network includes a map of neurons with spatial tuning properties similar to those observed in parietal region LIP: Gaussian-like receptive fields for the eye-centered direction of a stimulus and its relative depth from the plane of fixation (retinal disparity) (Gnadt and Mays 1995). For simplicity, we used a 2-D horizontal-disparity map of 121 units (11 × 11 units; horizontal range −50 to +50°; disparity range −25 to +25°). Each unit within the map had a 2-D Gaussian tuning curve, with a 10 × 5° horizontal-disparity receptive field (1/e² width), so that receptive fields of units at neighboring locations overlap considerably. Stimulus direction and disparity input to the network were limited to <20 and <9°, respectively. The network also received four eye-position units: one pair of units represented binocular gaze (version); another pair encoded binocular depth (vergence). For each unit, the activity was linearly scaled within the range −1 to +1, corresponding to −40 to +40° version angle and 0 to +10° vergence angle, respectively. In each pair, the second unit had the opposite activity of the first (push-pull arrangement). Another two pairs of push-pull input units coded for version velocity between −250 and 250°/s and vergence velocity between −10 and 10°/s, respectively. Finally, two push-pull units encoded translation velocity of the eye between −250 and 250 cm/s; another unit pair represented the integrated velocity between −50 and 50 cm (translational path) of the eyes. The output layer was modeled corresponding to the input map. All units in the network were fully connected, with each input unit connected to all hidden units and each hidden unit connected to all output units. The hidden layer had recurrent connections to enable the network to remember past events. Both the hidden layer units and the output neurons were characterized by a logarithmic sigmoid activation function of the form A(x) = 1/[1 + exp(−x)]. We simulated a trial as a series of 11 consecutive time steps, with each step defined as a 200-ms interval. We tested the network with different numbers of units (25, 50, and 100) in the hidden layer. Each type of network was trained four times with random initial weights to validate reproducibility of behavior. The analysis presented in this paper was performed with 50 hidden units.
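The input coding described above can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' Matlab implementation: the symmetric ranges are read as −50 to +50° and −25 to +25° (and likewise for the push-pull units), the 1/e² width is interpreted as the distance from the receptive-field center at which the response falls to 1/e², out-of-range values are clipped, and all function names are hypothetical.

```python
import numpy as np

DIRECTION_CENTERS = np.linspace(-50.0, 50.0, 11)   # deg, 10-deg spacing
DISPARITY_CENTERS = np.linspace(-25.0, 25.0, 11)   # deg, 5-deg spacing
W_DIRECTION, W_DISPARITY = 10.0, 5.0               # assumed 1/e^2 half-widths (deg)

def retinal_map(direction_deg, disparity_deg):
    """11 x 11 map of Gaussian-tuned units (rows: direction, columns: disparity)."""
    d_dir = (DIRECTION_CENTERS[:, None] - direction_deg) / W_DIRECTION
    d_disp = (DISPARITY_CENTERS[None, :] - disparity_deg) / W_DISPARITY
    return np.exp(-2.0 * (d_dir ** 2 + d_disp ** 2))

def push_pull(value, lo, hi):
    """Pair of units: value scaled linearly to [-1, +1], plus its opposite."""
    a = np.clip(2.0 * (value - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    return np.array([a, -a])

def logsig(x):
    """Logistic sigmoid A(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def input_vector(direction_deg, disparity_deg, version_deg, vergence_deg,
                 version_vel, vergence_vel, translation_vel, translation_path):
    """Concatenate the 121 map units with the six push-pull pairs (133 inputs)."""
    return np.concatenate([
        retinal_map(direction_deg, disparity_deg).ravel(),
        push_pull(version_deg, -40.0, 40.0),
        push_pull(vergence_deg, 0.0, 10.0),
        push_pull(version_vel, -250.0, 250.0),
        push_pull(vergence_vel, -10.0, 10.0),
        push_pull(translation_vel, -250.0, 250.0),
        push_pull(translation_path, -50.0, 50.0),
    ])
```

With the assumed widths, a unit whose center lies one grid step from the stimulus still responds at about 14% of its peak (1/e²), consistent with the statement that neighboring receptive fields overlap considerably.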
During training, targets were presented at one of five locations in space, at 25, 29, 35, 42, and 52 cm in front of the subject when viewing them from straight ahead (translation position 0). The other translational positions of the eyes at the start of the trial were 5, 10, 15, and 18 cm to the left or right from position 0. The binocular point of fixation was at the location of the 25-, 35-, or 52-cm target. The simulated translational motion was 0 (no translation), ±10, ±20, ±30, and ±36 cm. To simulate trial conditions with only rotational motion of the eyes (without translational motion), the fixation spot was moved by either 0, 5, 10, 15, or 18 cm to the left or right. Targets were presented for one time step, i.e., 200 ms, at the onset of a trial. Translation of the subject, or translation of the binocular fixation point, which followed a bell-shaped velocity profile, was initiated 400 ms after the target disappeared and lasted for 1 s. The network's output, the direction and disparity of the target in eye-centered coordinates, was read at the final time step of the trial. Trial types that moved the horizontal target direction >20° in the output map were excluded to minimize edge effects at the boundaries of the workspace. Together, this led to 1,129 different types of trials in the training set.
Network testing included all combinations that comprise the binocular fixation position at 33 cm, targets presented at 27, 35, or 48 cm, the translational offset of the eyes 16, 6, 0, 3, or 9 cm, translation motion of 25, 12, 8, 0, and 14 cm, and movements of the fixation point of 13, 7, 0, 4, and 15 cm. The network was built, trained, and tested using the Matlab Neural Network Toolbox with a training function that updates weight and bias values according to gradient descent with momentum and an adaptive learning rate. For training, individual weights were initially set to random values between −0.1 and +0.1.
| RESULTS |
|---|
Task performance
Twelve subjects participated in the main experiment, outlined in Fig. 2A. Using the stationary trials, we first tested the ability of stationary subjects to look and reach to memorized locations of space-fixed targets flashed at different distances from the fixation point. Figure 2B, left, shows the performance of a typical subject over the time course of 16 trials, either when standing at the leftward position (black traces) or at the rightward position (gray traces) within the translation zone, with a target that was flashed 17 cm behind the eyes' fixation point (T1, see Fig. 2A). The top panel depicts the horizontal component of the subject's body position during the entire trial. Both within and across trials, this position remained constant, as instructed, also during reaching, at about 17 cm left or right of the center of the translation zone. The second panel displays binocular gaze direction superimposed on the average signals for ideal performance (dotted lines) that were computed on the basis of the OPTOTRAK data. Binocular gaze showed steady fixation when the target was presented and during the memory interval (as required to meet the 3° accuracy range of the trial inclusion criteria, see METHODS), and small saccades at the time of pointing. These saccades direct the eyes toward the fingertip, which is to point at the remembered location of the stimulus flash. The third panel shows a similar pattern for binocular fixation depth (in degrees, as indicated by the vergence component of the eye positions). The decline in vergence during the reach seems to match the requirements (dotted lines) to look at the remembered location of the flash, which is farther away than the fixation point. Finally, the bottom panel shows the horizontal position of the fingertip (in cm), indicating that the subject reached fairly accurately to the remembered location of the stimulus flash, with errors <3 cm. These few trials are representative of the performance of all subjects in the stationary trials, showing that they can localize a nonfoveated flashed target fairly well.
The question is how well these subjects can localize the flashed targets when they have translated after viewing the flash. This was tested using the translation task. Recall that a whole-body translation effectively disturbs the spatial registry of the location of the flash relative to any reference frame attached to the body. Hence, in any egocentric reference frame, whether gaze-dependent or gaze-independent, the location of the reach goal after the translation is different from the location of the flash before the translation.
Figure 2B, right, shows the typical performance of the same subject over the time course of 16 translation trials in which the translation was either rightward (gray traces) or leftward (black traces). As in the stationary examples (left), the target for updating was T1, flashed 17 cm behind the fixation point. As instructed, the subject only began moving after the target had flashed and reached his final position before FP offset (top). Kinematics of the self-induced translation were highly reproducible across trials, with a mean displacement of 32 ± 2 (SD) cm. During the translation, changes in binocular fixation direction and depth matched the geometrically required modulations (dotted lines) to keep gaze fixed at FP quite well (2nd and 3rd panels). In other words, the body translation had negligible influence on the ability to keep fixation at a lit fixation target. In accordance with the instructions, the changes of these signals during the reach period indicate a change in the binocular fixation point toward the remembered location of the target. The accuracy of the respective reaching movement reflects the accuracy of the spatial memory update as well as the perceptual and motor deficits involved. The reaching movements here show clearly larger errors than in the stationary condition, ranging up to ~7 cm.
To demonstrate the differences in performance between the two tasks more clearly, Fig. 3 compares the reach endpoints in the stationary (left) and the translation task (right), in separate top-view panels for the four targets, ordered by their location from FP, for one subject. In both conditions, a general underestimation of target distance seems to be present. In the stationary task, errors are small, with a slight dependence on the subject's body position. Errors in the translation trials clearly exceed those in the stationary trials, irrespective of step direction. Both the size and the horizontal direction of this error seem to depend on the direction of the intervening translation and on the location of the target. For rightward translations, the subject reached too far to the right for the farthest target, whereas there was a leftward bias for the nearest target. The opposite pattern is observed for a leftward translation. There is also a tendency for errors to increase for the targets flashed at farther distances from the fixation point despite the same amount of intervening translation. Thus for this one subject, the pattern of errors in the translation trials seems to follow the prediction of the gaze-dependent updating model: pointing positions deviate in opposite directions for targets in front of and behind the FP, with a nearly mirror-symmetric pattern of errors for leftward and rightward translations.
To analyze these findings quantitatively, we assumed that the reach errors in the static trials reflect a sensorimotor deficit, whereas the reach errors in the translation trials reflect sensorimotor deficits as well as deficits in the spatial memory update (see METHODS). Therefore to compute the latter, i.e., the updating errors, we subtracted the mean horizontal reach error observed in the static trials from the horizontal reach errors that occur in the translation trials, for each target separately. Figure 4A plots these horizontal updating errors for targets behind FP, versus the errors for their corresponding equiangular counterparts in front of FP, for each translation direction. Thus updating errors of target T1 were plotted versus the updating errors of target T4, and errors of target T2 versus those of target T3. This pair-wise comparison was performed by picking the errors randomly, without replacement, from the respective trials, yielding a maximum of 80 data points. The gaze-dependent updating hypothesis predicts that these errors have equal size but opposite signs (Fig. 1). Accordingly, data points should fall in the even quadrants, ideally along the dashed line with slope −1. In contrast, the gaze-independent updating hypothesis predicts that these errors have equal size and signs, which would be indicated by data points along the positive diagonal (slope +1). Any other slope value, whether 0 (the data scatter around the x axis), infinity (the data scatter about the y axis), or anything else, reflects a measure intermediate between these two models. To deal with this in further analysis, we converted all slope values to a reference frame index (RFI) between −1 (perfect gaze-dependent coding) and +1 (perfect gaze-independent coding). For example, slopes of +2 and −2 correspond to a reference frame index of +0.5 and −0.5, respectively. Figure 4A presents the results of this analysis for the same subject as in Fig. 3, showing that the majority of the data points fall in the even quadrants. According to a model 2 regression, the best-fit line that characterized the direction of the data point clustering was closely directed along the line with slope −1. The reference frame index of this subject had a value of −0.93 ± 0.06 (mean ± SD), which is illustrative of a data distribution that best supports the gaze-dependent updating model. The best-fit lines of all 12 subjects are superimposed in Fig. 4B, generally indicating an orientation in the direction predicted by the gaze-dependent model. Figure 4C summarizes the corresponding reference frame indices (± SD) for all subjects (black bars), showing a clear bias toward the gaze-dependent model. Averaged across subjects, the reference frame index was −0.68 ± 0.23, which was significantly different from zero (t-test, P < 0.05), indicating that our data are most supportive of a gaze-centered coding and updating of spatial memory.
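The analysis steps described above can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: the text does not specify which model 2 regression was used, so a major-axis (orthogonal) fit is assumed here, and the slope-to-index mapping (the slope itself when its magnitude is at most 1, its reciprocal otherwise) is one mapping that reproduces the stated examples (slopes of 0 and infinity give 0, a slope of magnitude 2 gives magnitude 0.5). All function names are hypothetical.

```python
import math
import random

def updating_errors(translation_errors, static_errors):
    """Subtract the mean static-trial error from each translation-trial error."""
    baseline = sum(static_errors) / len(static_errors)
    return [e - baseline for e in translation_errors]

def pair_without_replacement(front, behind, rng):
    """Randomly pair errors of a front target (x) with its behind counterpart (y)."""
    n = min(len(front), len(behind))
    return list(zip(rng.sample(front, n), rng.sample(behind, n)))

def major_axis_slope(xs, ys):
    """Slope of the major-axis regression line (assumed form of model 2 regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        # No covariance: the main axis lies along x (slope 0) or y (infinite slope).
        return 0.0 if sxx >= syy else math.inf
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

def reference_frame_index(slope):
    """Map a slope onto [-1, +1]; assumed mapping, see the lead-in."""
    return slope if abs(slope) <= 1 else 1.0 / slope

# Usage with synthetic numbers (not data from the study).
rng = random.Random(0)
front = updating_errors([1.0, 2.0, 3.0], [0.0, 0.0])
behind = updating_errors([-1.0, -2.0, -3.0], [0.0, 0.0])
pairs = pair_without_replacement(front, behind, rng)
```

The assumed mapping is symmetric under an exchange of the two axes, so the index does not depend on which target of a pair is plotted on the abscissa.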
Although the data of most of our subjects lend support for the gaze-dependent updating hypothesis, it should be pointed out that this conclusion is based on a one-dimensional (1-D) analysis of the horizontal reach errors. Because subjects also make updating errors in depth (see Figs. 2), it is desirable to validate this conclusion in a 2-D analysis. Therefore we investigated how the position of the target before the translation ($\vec{T}_i$, estimated by the average response in the stationary task), the position of the same target after the translation ($\vec{T}_f$), the actual translational motion ($\vec{T}_f - \vec{T}_i$), and the reach response ($\vec{R}$), expressed as Cartesian 2-D vectors, are related in the coordinate frames of the two updating models (see Fig. 5A). The two coordinate axes of the gaze-dependent model were chosen to be aligned with and orthogonal to the gaze line, respectively, with the origin at the center of the two eyes (cyclopean eye). At the same origin, the coordinate axes of the gaze-independent model were arranged to be aligned with and orthogonal to the shoulder line, respectively. Note that the same (space-fixed) target $\vec{T}_i$ in this example is described by quite different vectors in each coordinate system. In both coordinate frames, the following updating relationship can be specified

$$\vec{R} - \vec{T}_i = a\,(\vec{T}_f - \vec{T}_i) + \vec{b} \qquad (1)$$

in which $\vec{T}_f - \vec{T}_i$ represents the ideally required updating vector, $\vec{R} - \vec{T}_i$ the actual updating vector, fit parameter $a$ the updating gain, and vector $\vec{b}$ the bias in the updating process. If a subject had a correct percept of $\vec{T}_i$, but did not account for the intervening translation, reach vector $\vec{R}$ would be equivalent to target vector $\vec{T}_i$, and hence the internal updating vector $\vec{R} - \vec{T}_i$ would equal zero, thus $a = 0$, $\vec{b} = \vec{0}$. In contrast, if translational updating were flawless, reach vector $\vec{R}$ would be identical to the new target vector $\vec{T}_f$, and thus $a = 1$, and $\vec{b} = \vec{0}$.
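As an illustration of how an updating relationship of this form (actual updating vector = gain × ideal updating vector + bias) can be fitted, the sketch below estimates a scalar gain and a 2-D bias vector by ordinary least squares. The fitting procedure is an assumption made for this example, since the text does not state which procedure was used; the function name and the numbers are hypothetical.

```python
def fit_updating_model(ideal, actual):
    """Least-squares fit of actual = a * ideal + b for lists of 2-D vectors.

    ideal  -- ideally required updating vectors, one (x, y) pair per trial
    actual -- actual updating vectors, one (x, y) pair per trial
    Returns (a, (bx, by)): the scalar updating gain and the bias vector.
    """
    n = len(ideal)
    if n != len(actual) or n < 2:
        raise ValueError("need at least two matched vector pairs")
    mean_u = [sum(u[k] for u in ideal) / n for k in (0, 1)]
    mean_v = [sum(v[k] for v in actual) / n for k in (0, 1)]
    # Gain: projection of the centered actual vectors on the centered ideal vectors.
    num = sum((u[k] - mean_u[k]) * (v[k] - mean_v[k])
              for u, v in zip(ideal, actual) for k in (0, 1))
    den = sum((u[k] - mean_u[k]) ** 2 for u in ideal for k in (0, 1))
    if den == 0:
        raise ValueError("identical ideal vectors: gain and bias cannot be separated")
    a = num / den
    # Bias: what remains of the mean actual vector after scaling the mean ideal vector.
    b = (mean_v[0] - a * mean_u[0], mean_v[1] - a * mean_u[1])
    return a, b

# Synthetic example (not data from the study): gain 1.2, bias (0.01, -0.02).
ideal = [(0.10, 0.02), (-0.05, 0.08), (0.03, -0.04), (-0.08, -0.06)]
actual = [(1.2 * x + 0.01, 1.2 * y - 0.02) for x, y in ideal]
gain, bias = fit_updating_model(ideal, actual)
```

A fitted gain above 1 in such a fit would correspond to an overestimate of the translation, and a gain of 0 to no updating at all. Note that the gain can only be separated from the bias when the ideal updating vectors differ across trials within the chosen coordinate frame.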
|
(see dashed gray vector in Fig. 5A). The results of this analysis are shown in Fig. 5B for one subject for the rightward translation trials. The actual average endpoints (left) are compared with those predicted by each of the two models on the basis of the fit parameters of Eq. 1. Close scrutiny indicates that the predictions of the gaze-dependent model (middle) match the observed reach endpoints better than those of the gaze-independent model (right). The gaze-dependent model seems to capture the observed pattern of opposite errors for targets behind and in front of the fixation point, whereas the gaze-independent model shows only a small rightward shift of each of the reach endpoints. On a population level (Fig. 5C), Eq. 1 gave a better description (higher correlation coefficients) of the updating errors when expressed in gaze-dependent coordinates than in gaze-independent coordinates (t-test, P < 0.01), which is consistent with the 1-D analysis described in the preceding text. Within individual subjects, the gaze-dependent model produced the best description for 9 of 12 subjects. The gaze-independent model performed slightly better in three subjects, although its performance remained at a rather low level in two of them (see also Table 2).
|
, for both models for each subject separately. Across the population, the bias vector was not significantly different from a zero vector (t-test, P > 0.05 for all components) for either model. In the gaze-dependent model, the updating gain, a, specifies how well the translational-depth geometry is taken into account in the updating of remembered visual space. Averaged across subjects, its value was 1.16 ± 0.15 (SD), which was significantly different from 1 (t-test, P < 0.05). This suggests that this model takes the systematic reach errors into account in fitting the data or, in other words, that subjects generally overestimated the amount of self-motion when updating targets in 3-D space during active whole-body translations. In contrast, the gaze-independent model yielded an average updating gain that was not statistically distinguishable from 1 (t-test, P = 0.62), which essentially indicates that this model has no provision to account for the systematic errors observed in the data.
Control experiments
To determine the robustness of these findings, we performed three control experiments (see METHODS). The task designs of these controls were kept as similar as possible to that of the main experiment. In the analysis of these experiments, each performed on five subjects, we focused on the horizontal reaching errors, investigating the relationship between the errors for targets in front of FP and the errors for targets behind FP. As in the preceding text (see Fig. 4), a negative relationship would confirm gaze-dependent coding (ideal slope −1); a positive relationship would be suggestive of a gaze-independent coding scheme (ideal slope +1). We first asked whether the same results would be obtained if the reaching movement toward the updated target locations were not accompanied by any visual feedback about hand position (control 1: reaching without feedback). The results show that the absence of hand feedback does not alter our main conclusion. All subjects performing the task without hand feedback produced data consistent with the gaze-centered updating hypothesis (see Fig. 6A). This is reflected by the average reference frame index, which was −0.70 ± 0.18 and significantly different from a value of 0 (t-test, P < 0.05).
|
Finally, we asked whether the visual FP, available during the main experiments, was a biasing factor for the gaze-centered updating hypothesis. To test this, we conducted an experiment in which subjects had to keep their eyes fixated on the remembered FP during the self-motion and then looked and reached to the remembered location of the flashed target (control 3: updating without FP, Fig. 6C). It is important to realize that in this situation, our test has less discriminative power. Because of possible vergence drift caused by the absence of a visual FP during translation in this paradigm, updating vectors in gaze coordinates will not be of equal size for targets in front of and behind FP (compare Fig. 1). In spite of that, of the five subjects who participated here, three followed the gaze-dependent model. The RFIs of the other two subjects had values around zero. Averaged across subjects, we found an RFI of −0.48 ± 0.47, a clear bias in favor of the gaze-dependent updating model.
Taken together, the results of all our experiments lead to the conclusion that the brain uses a gaze-dependent reference frame to store and update visuospatial memories during self-generated whole-body translations.
| DISCUSSION |
|---|
|
|
|---|
We will now list a number of observations that further support this conclusion. First, reaching errors were larger in translation trials (with intervening body translation) than in the stationary trials (without body translation), suggesting that the differences indeed arose during the updating of spatial information (Fig. 3). Second, a quantitative analysis of these errors revealed that they were opposite for targets in front of and behind FP (Fig. 4). Third, a two-dimensional vectorial analysis of the translational-depth geometry in the transversal plane showed that the interaction among target location, translational motion, and reaching response is much better described in a gaze-centered than in a gaze-independent coordinate system (Fig. 5). Fourth, the gaze-centered updating errors were quite robust and invariant across various task constraints (Fig. 6). More specifically, the same error pattern was found irrespective of whether the eyes and hand moved to the memorized target location or the hand alone. Nor did the pattern of errors change when subjects performed the reaching movement with or without visual feedback of hand position. Even the presence or absence of a visual fixation point during the translations was not essential for a gaze-centered description of updating errors.
Although our data provide support for the gaze-dependent model across subjects, it is important not to overstate this. The results are not perfect, and our conclusions follow from relatively small systematic errors. As a matter of fact, three of our subjects did not show support for the gaze-dependent hypothesis in all conditions and analyses (see Figs. 5C and 6C). It is also important to note that our test was based on relatively simple geometry, whereas the brain may actually represent visual space in a more complex manner (Cuijpers et al. 2002). Furthermore, we should emphasize that we have focused on only one important signal, the central representation of body translation, as an underlying basis for the updating errors, which is but one of a myriad of variables that might lead to errors. In this respect, further experiments are needed to isolate the various signals related to overall performance of the present task. Nevertheless, despite these reservations, we think that our behavioral tests provide evidence that the brain possesses a geometrically complete, dynamic map of remembered space, the spatial accuracy of which is maintained by internally simulating motion parallax during volitional translatory body movements.
It is true that even when you walk around normally in the environment, it is difficult to experience motion parallax even if you try (Palmer 1999). And without doubt it is even harder to imagine motion parallax with locations of remembered objects or objects that are out of view. Nevertheless, this cannot be taken to imply that the neural mechanism for spatial coding cannot act by simulating the parallax geometry to maintain spatial constancy as we have shown here.
Recently, various studies have shown that both human and non-human primates can adjust the amplitude of memory-guided eye movements after intervening translation, taking into account the amount of translation and distance of the memorized target (Israel and Berthoz 1989; Li and Angelaki 2005; Li et al. 2005; Medendorp et al. 2003b). None of these studies, however, explicitly assessed the exact nature of the representation of remembered visual space during these tasks. Here for the first time, we were able to establish that targets in such tasks are stored in a gaze-centered reference frame, an inference based on the assessment of the operational errors in the system.
Our evidence for gaze-centered updating during translational motion agrees well with recent studies showing gaze-centered updating for rotational motion (Baker et al. 2003; Henriques et al. 1998; Medendorp and Crawford 2002; Pouget et al. 2002). The first three showed that subjects overshoot the direction of a previously seen but foveally viewed target when reaching toward it after an intervening eye rotation. Interestingly, here we show a similar type of overshoot for translation-induced changes of gaze, corroborating these gaze-centered results. Baker et al. (2003) investigated updating behavior during horizontal whole-body rotations using a memory-guided saccade task. Based on the assumption of noise propagation at various processing stages in the brain, they found their results most consistent with a gaze-centered representational system for storing the spatial locations of memorized objects.
Which signals are needed in the updating process? In the present study, the updating mechanism may have received information about the self-motion through efference copy and proprioceptive signals (available in the context of active motion), and by vestibular inputs (Klier et al. 2005; Li and Angelaki 2005; Li et al. 2005; Medendorp et al. 2003b; Van Pelt et al. 2005). Li et al. (2005) found updating during passive translation to be compromised after bilateral labyrinthectomy, attributing an important role to the vestibular system. Also, Israel and Berthoz (1989) have provided evidence for spatial updating with the vestibular system as the main extraretinal source of motion-related information. Furthermore, in the present study, the changes in eye position to keep the eyes fixed at FP during the translation (the version and vergence eye movements) are essential for a well-functioning updating system. All of this information must be integrated at a central level within the brain and unified with retinal information about target direction and depth to mediate the computations for gaze-centered spatial updating, as outlined in detail in Medendorp et al. (2003b).
In line with our findings, many brain regions have been demonstrated to store and update target locations within an eye-fixed, gaze-centered reference frame (Batista et al. 1999; Duhamel et al. 1992; Gnadt and Andersen 1988; Medendorp et al. 2003a; Merriam et al. 2003; Sommer and Wurtz 2002). However, the majority of these studies have focused on directional updating of target location in the frontal parallel plane. For example, the lateral intraparietal area (LIP) and superior colliculus have been shown to update their retinotopic maps of target directions for each eye movement (Duhamel et al. 1992; Walker et al. 1995). On the other hand, it is also known that the activity of LIP neurons is modulated by retinal disparity information, providing them with three-dimensional receptive fields (Genovesio and Ferraina 2004; Gnadt and Mays 1995). Moreover, Cumming and DeAngelis (2001) indicated that the updating of target distance may be expressed by changes in retinal disparity representations.
To obtain further insight into the interactions between self-motion information and retinal signals at the level of the parietal cortex, we designed a simple recurrent neural network performing gaze-centered target updating during translations and rotations (see Fig. 7A and METHODS). The input to the network was a transient distributed representation of target direction and disparity in a 2-D retinotopic map (as a hill of activity) as well as a variety of extraretinal signals, including angular gaze position and velocity signals (version/vergence), and translational velocity and path signals of the eyes. The network was trained to store the memory of the target for successive time intervals and update its representation for any intervening rotational or translational eye motion.
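To make the notion of a hill of activity concrete, the sketch below encodes a target's direction and disparity as a Gaussian bump on a small 2-D grid of units. It illustrates the input format only, not the network itself; the grid spacing, tuning widths, and function name are hypothetical and are not taken from METHODS.

```python
import math

def hill_of_activity(direction, disparity, directions, disparities,
                     sigma_dir=5.0, sigma_disp=0.5):
    """Gaussian population code for a target on a direction-by-disparity map.

    directions, disparities -- preferred values of the map units (degrees)
    Returns a list of rows; rows index disparity, columns index direction.
    Tuning widths are assumed values chosen for illustration.
    """
    return [[math.exp(-((d - direction) ** 2) / (2 * sigma_dir ** 2)
                      - ((s - disparity) ** 2) / (2 * sigma_disp ** 2))
             for d in directions]
            for s in disparities]

# Hypothetical map: directions from -40 to 40 deg, disparities from -2 to 2 deg.
directions = [-40 + 5 * i for i in range(17)]
disparities = [-2 + 0.5 * j for j in range(9)]

activity = hill_of_activity(10.0, -1.0, directions, disparities)

# Locate the most active unit; it should sit at the encoded target.
peak_value, peak_row, peak_col = max(
    (activity[j][i], j, i)
    for j in range(len(disparities))
    for i in range(len(directions))
)
```

In a representation of this kind, updating amounts to shifting the hill across the map: along the direction axis alone for a pure eye rotation, and along both axes by a depth-dependent amount for a translation.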
|
), which closely follow the geometrically required changes for ideal updating over time (thin lines). Likewise, the network also incorporated the geometrically required property of updating targets in the same direction on the map, irrespective of their depth, when the eyes rotate only (not shown). Using 25 neurons in the hidden layer was already sufficient to learn the task acceptably, but performance improved for the networks with 50 and 100 hidden units.
|
| GRANTS |
|---|
|
|
|---|
| ACKNOWLEDGMENTS |
|---|