Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, Missouri 63110
Submitted 21 March 2003; accepted in final form 30 November 2003
ABSTRACT
INTRODUCTION
A number of cortical areas store remembered spatial locations (Funahashi et al. 1989, 1990; Gnadt and Andersen 1988). A simple means of storing world-fixed locations would be an explicit world-centered representation. However, many visuospatial representations are gaze-centered (e.g., Colby et al. 1995; but see Duhamel et al. 1997; Galletti et al. 1993; Olson 2003), and behavioral evidence supports the use of gaze-centered representations to store world-fixed locations (Baker et al. 2003). For a gaze-centered representation to encode information about a world-fixed location, the representation must be updated each time gaze shifts. There is evidence for such updating in many cortical and subcortical areas (Batista and Andersen 2001; Bruce and Goldberg 1990; Duhamel et al. 1992; Mazzoni et al. 1996; Nakamura and Colby 2000, 2002; Walker et al. 1995). For example, neurons in the lateral intraparietal (LIP) area in the posterior parietal cortex encode the goal location of an impending saccade (Gnadt and Andersen 1988). If an intervening saccade to a new fixation point is performed between target presentation and the final saccade to the target, neurons in LIP update their activity to represent the oculocentric coordinates of the final goal relative to the new eye position (Duhamel et al. 1992; Mazzoni et al. 1996). As a result of this updating, neurons in LIP encode information about targets that are world-fixed without explicit world-centered encoding. Updating occurs not only in response to saccades, but to any gaze perturbation, including whole body rotation [with vestibuloocular reflex (VOR) suppression] and smooth pursuit eye movements (Baker et al. 2002; Powell and Goldberg 1997).
What mechanisms underlie the capacity of gaze-centered cortical areas to compensate for shifts in gaze? The receptive fields of LIP neurons are fixed with respect to the retina (Colby et al. 1995), but their visual responsiveness (gain) can be modulated by changes in gaze position. This modulation is called a gaze position gain field (Andersen and Mountcastle 1983). By virtue of combining eye position information with retinotopic information, gain fields may provide an implicit head-centered representation of visuospatial information (Zipser and Andersen 1988). These gain fields could be used in connection with updating in a double-step saccade task (Xing and Andersen 2000a).
Although updating is appropriate for world-fixed targets, it is inappropriate for gaze-fixed targets. Separate neural networks could be responsible for encoding spatial memories in world-fixed and gaze-fixed frames of reference. Alternatively, spatial memories could be computed by a single network that appropriately updates memories based on a particular reference frame.
To explore the computational basis of updating, we created a neural network model using a 3-layer recurrent architecture and trained it to flexibly update contingent on whether a target was world-fixed or gaze-fixed. Neural network models have been used to approximate the input-output relationships of neural circuits in the brain (reviewed in Zipser 1992). The internal behavior of such models often mimics the properties of single neurons in the brain (Mitchell and Zipser 2001; Xing and Andersen 2000a,b; Zipser and Andersen 1988). We asked whether a simple neural network model was capable of performing spatial updating in a reference frame-dependent manner. We then compared the output of the model with the behavior of macaques performing a similar task (Baker et al. 2003). Finally, we probed the internal representation of the model to make predictions about how neurons in the brain might implement the computations under consideration, focusing on the roles of gaze position signals, gaze velocity signals, and gaze position gain fields.
We found that a simple recurrent network could successfully compensate for or ignore gaze perturbation signals based on a contextual cue. When we compared the model to monkeys performing a similar task, we found that both animal and model were less precise in generating world-fixed output. Units in the hidden layer of the network performed a distributed computation: individual units were not specialized for performing only the gaze-fixed or world-fixed task, but in fact could discriminate well on both types of trials. The model could perform the updating task with either a gaze position or gaze velocity signal, but performed better with and would self-organize to select the gaze velocity signal. Finally, we found that the presence of gain fields in the model's hidden layer was dependent on the type of gaze signal used by the network. Gain fields were present only in networks that relied on position signals to perform the updating task.
METHODS
Task and training
The task of the network was to store and if necessary transform a pattern of activity representing the spatial location of a target. To correctly perform the task, the network had to either compensate for or ignore any changes in gaze that occurred during the storage interval, depending on the instruction provided by the 2 reference frame input units. The network output provides an eye-centered representation of the stored spatial location, which can be interpreted as a location in a salience map (Colby and Goldberg 1999) or as a goal location for an upcoming saccade (Snyder et al. 2000). For this particular network, these 2 descriptions are interchangeable.
A target was presented for one time step, or 100 ms, at the onset of each trial. A target could appear at 1 of 9 locations, evenly spaced within the central one third (40°) of the workspace. The reference frame cue, which determined whether gaze perturbations should be incorporated into the output or ignored, was present for the entire duration of the trial. Gaze position at the start of the trial was at one of 4 locations (-15, -5, 5, or 15°). Memory-period gaze perturbations were 10 or 20°/s to the left or right, were initiated 300 ms after the target disappeared, and lasted for 500 ms. The network output (saccade amplitude or target location) was read out from the network 0 to 400 ms after the end of the gaze perturbation.
In an analogous task, 2 rhesus monkeys performed memory-guided saccades to world-fixed and gaze-fixed target locations after whole body rotations (VOR cancellation), smooth pursuit eye movements, or saccades (see Baker et al. 2003). Slow gaze perturbations (rotations, pursuit) were 10°/s to the left or right from fixation points at -20, -10, 0, 10, or 20° and lasted for either 300 or 600 ms. Monkeys were cued 400 to 1,200 ms after the end of the gaze perturbation to make a saccade to the remembered target location.
In the modeling experiments, the source of the gaze perturbation signal (e.g., rotation, pursuit, saccade) is relevant only inasmuch as different sources might have different properties. Three properties are of particular interest: 1) whether the source provides gaze position or gaze velocity information, 2) whether the gaze perturbation extends over a long or short period of time, and 3) whether the endpoint of the change is known at the start of the movement (e.g., saccades). Here we focus on gaze position and gaze velocity signals and briefly address duration of movement. We discuss the third property in the discussion.
The network was often grossly inaccurate when the saccadic goal lay on the boundaries of the workspace (defined by the output). To minimize these edge effects, trial types requiring saccades >20° in either reference frame were excluded. This left 96 remaining combinations of target location, eye position, and gaze perturbation vectors that constituted the training set.
The network was trained using the "backpropagation through time" algorithm (Rumelhart et al. 1986; Williams and Zipser 1995). The algorithm optimizes the connection weights between units to minimize the error produced at the output. An additional constraint was enforced during training: the hidden-to-output weights were constrained to values greater than 0.1. Forcing positive weights to the output makes the hidden units more likely to develop response fields similar to those of the output units (Mitchell and Zipser 2001). Removing this constraint did not change our overall conclusions, however.
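The text does not say how the weight constraint was enforced. One common way to impose such a floor, shown here purely as an assumed illustration, is to clamp the hidden-to-output weights after each weight update. The function name and the example weight values are hypothetical.

```python
def apply_output_weight_floor(weights, floor=0.1):
    """Clamp every hidden-to-output weight so it is never below `floor`.

    Assumption: the constraint is enforced by projection after each update;
    the source only states that the weights were constrained.
    """
    return [[max(w, floor) for w in row] for row in weights]


# Hypothetical weights after a gradient step
# (rows: hidden units, columns: output units).
updated = [[0.35, -0.20], [0.05, 0.80]]
constrained = apply_output_weight_floor(updated)
```

Weights already above the floor are left unchanged, so the clamp only acts on connections that the gradient step pushed toward zero or negative values.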
From our experience training monkeys on an identical task (Baker et al. 2003), we suspected that training the network in stages might facilitate learning, rather than presenting the complete task and training set from the very start. This suspicion was confirmed by preliminary studies. Networks presented with the entire training set from the start of training converged slowly or converged to local minima (unpublished observations). Therefore we chose a graduated training regime analogous to that used for the monkeys.
The network was initially trained using a simple saccade task: no gaze perturbation and no memory period. Once this was mastered, a short (100-ms) memory period was introduced, and then gradually lengthened to 1,200 ms. Weights were adjusted after every cycle of the complete training set until an error threshold was reached. The threshold was set at a high value to avoid overtraining the network on the memory saccade task. Again, this is analogous to our experience in training monkeys; we do not dwell too long on any one stage, else achieving the next stage becomes more difficult (unpublished observations). With the network, each time the duration of the memory period was increased, smaller and fewer weight updates needed to be performed to reach the threshold. Hence, the learning rate started at 0.05 and was decreased inversely with the number of steps.
Once it could perform memory saccades, the network was introduced to the full task. Training had an equal probability of occurring on any one of the 4 postperturbation time steps or on the time step directly preceding the onset of gaze perturbation. The latter case promotes stability in the network output and enforces the condition that world-fixed and gaze-fixed trials should produce the same output if no perturbation has occurred. Training was not performed during the gaze perturbation to avoid enforcing a particular time course of output during the perturbation period (a post hoc comparison revealed slightly improved performance when training also occurred during the perturbation, but no other differences). Weights were adjusted after each complete cycle through the training set. Training proceeded for 5,000 cycles at a learning rate of 0.001. Subsequent training for 7,500 cycles occurred with the learning rate halved every 2,500 cycles. This process of simulated annealing helps avoid both local minima (by using an initially high learning rate) and limit cycles (by moving to a low learning rate) and generally helps optimize both the speed and the accuracy of training (Mitchell 1996).
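The learning-rate schedule for the full task can be written as a small function. This is a minimal sketch under one stated assumption: the first halving is applied at the start of the 7,500-cycle phase (cycle 5,000), which the text does not specify. The function and parameter names are hypothetical.

```python
def learning_rate(cycle, base_rate=0.001, constant_cycles=5000,
                  halving_interval=2500, total_cycles=12500):
    """Return the learning rate used on a given training cycle (0-indexed).

    Assumption: the rate is first halved at cycle `constant_cycles` and
    again every `halving_interval` cycles after that.
    """
    if not 0 <= cycle < total_cycles:
        raise ValueError("cycle is outside the training run")
    if cycle < constant_cycles:
        return base_rate
    halvings = (cycle - constant_cycles) // halving_interval + 1
    return base_rate / (2 ** halvings)


# Rates at the start of each phase of the 12,500-cycle run.
phase_rates = [learning_rate(c) for c in (0, 5000, 7500, 10000)]
```

Under this assumption the run passes through four rates (0.001, 0.0005, 0.00025, 0.000125); if the first halving instead occurred after 2,500 further cycles, the last three values would each be twice as large.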
ROC analysis
A receiver operating characteristic (ROC) was calculated for each unit in the hidden layer in the manner of Britten et al. (1992). The receiver discriminated between 2 conditions [whether a target was inside or outside of the response field (RF)] based on the activity of the unit. For a given criterion level of activity, the proportion of trials on which the inside-RF response exceeded the criterion was plotted against the proportion of trials on which the outside-RF response exceeded the criterion. Points were calculated for 3,334 criterion values uniformly distributed over the range of possible activity values ([0, 1]). The collection of points forms a stair-step function on the unit square from (0, 0) to (1, 1). The area under this curve (AUC) was calculated. The AUC describes how well an ideal observer could correctly discriminate between the inside-RF and outside-RF conditions based on the responses of the unit.
RESULTS
We constructed and trained a recurrent 3-layer neural network to flexibly perform a spatial updating task based on a contextual cue. The neural network (Fig. 1) received information from simulated retinal and extraretinal inputs. The retinal inputs encoded the position of a visual stimulus in one dimension of space. The extraretinal inputs provided signals proportional to gaze position and gaze velocity, as well as a binary reference frame cue (either world-fixed or gaze-fixed). The task required the network to store and later report the location of the target. Like the representations of many visual and oculomotor cortical areas (Colby et al. 1995), both the network input and output are eye-centered. As a result, the representation of a world-fixed target must be modified to compensate for an intervening gaze perturbation, whereas the representation of a gaze-fixed target remains unchanged (Fig. 2).
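The input-output relationship the network has to learn can be stated compactly. The sketch below gives the ideal eye-centered output for each reference frame, assuming positive values denote rightward positions and gaze shifts (a sign convention chosen here for illustration). It describes the target function only, not the network that approximates it; the function names are hypothetical.

```python
def gaze_shift_from_velocity(velocity_deg_per_s, duration_s, dt_s=0.1):
    """Integrate a constant gaze velocity over the perturbation (100-ms steps)."""
    steps = round(duration_s / dt_s)
    return sum(velocity_deg_per_s * dt_s for _ in range(steps))


def expected_output(target_deg, gaze_shift_deg, world_fixed):
    """Ideal eye-centered target location after a gaze perturbation.

    Assumption: rightward is positive, so a rightward gaze shift moves a
    world-fixed target leftward in eye-centered coordinates.
    """
    if world_fixed:
        return target_deg - gaze_shift_deg
    # A gaze-fixed target moves with gaze, so its eye-centered location is unchanged.
    return target_deg


# A 20 deg/s rightward perturbation lasting 500 ms shifts gaze by 10 deg.
shift = gaze_shift_from_velocity(20.0, 0.5)
world_output = expected_output(10.0, shift, world_fixed=True)
gaze_output = expected_output(10.0, shift, world_fixed=False)
```

With these example values, a world-fixed target first seen 10° to the right ends up at straight ahead after the gaze shift, whereas a gaze-fixed target stays at 10°.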
We computed the root-mean-squared (RMS) error for all combinations of target locations and gaze perturbations for world-fixed and gaze-fixed trials. RMS error varied with the number of hidden units, but error on world-fixed trials was always greater than that on gaze-fixed trials (Fig. 4). Error as a function of the number of hidden units described a U-shaped function, with the least error occurring between 20 and 40 units. Networks with fewer than 5 hidden units did not converge on a solution within 12,500 cycles, and networks with >40 hidden units also performed increasingly poorly. The performance of networks with >40 hidden units was improved by training over more cycles or with larger learning rates, whereas the performance of networks with fewer than 40 hidden units was not significantly altered with these manipulations. For a network with 25 hidden units, the mean RMS error across 3 simulations was 1.93° for world-fixed targets and 1.19° for gaze-fixed targets. The data shown in Figs. 3, 5, 6, and 7 are taken from a network with 25 hidden units, but varying this number within the range of 20 to 40 did not affect our conclusions.
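For reference, the RMS error over a set of trials can be computed as follows. This is a generic sketch; the function name and the example values are hypothetical and are not taken from the simulations.

```python
import math


def rms_error(outputs_deg, targets_deg):
    """Root-mean-squared difference between decoded and correct locations (deg)."""
    if len(outputs_deg) != len(targets_deg) or not outputs_deg:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(o - t) ** 2 for o, t in zip(outputs_deg, targets_deg)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical decoded outputs and correct locations for three trials.
example_rms = rms_error([9.0, -4.0, 1.0], [10.0, -5.0, 0.0])
```

Computing this separately for world-fixed and gaze-fixed trials gives the two error values compared in the text.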
Network inputs and outputs were chosen to be biologically plausible representations in an effort to model what may occur in monkey posterior parietal cortex (Duhamel et al. 1992; Mazzoni et al. 1996; Xing and Andersen 2000a). To justify comparisons between model units and neurons, we first compared network output with animal behavior on a similar task. Monkeys accurately directed saccades to the remembered locations of world- and gaze-fixed targets after the direction of gaze was shifted during a memory period (Baker et al. 2003). Gaze was perturbed by either a whole body rotation, smooth pursuit eye movement, or visually guided saccade. The precision of memory-guided saccades after the gaze perturbation differed depending on whether slow (rotation, pursuit) or visually guided saccadic perturbations were performed. Memory-guided saccades were less precise for world-fixed compared with gaze-fixed targets after slow gaze perturbations, but equally precise after saccadic gaze perturbations.
To compare network and monkey performance, we measured the accuracy of the network by comparing the coded location during the postperturbation interval to the correct target location for all gaze perturbations and starting positions (Fig. 5A, left panel). The network was accurate in directing saccades to the appropriate target but, like the animals (Fig. 5A, right panel), was hypometric for the most peripheral (±20°) target locations. Note that the most peripheral target locations (±20°), where both accuracy and variability decreased, are not at the edges of the input/output range (±60°).
To extract a measure of variability from the network, it is necessary to introduce noise. We introduced Gaussian white noise (SD σ_n = 0.05) into both the position and velocity inputs of a trained network and measured the variability of the resulting output (Fig. 5B, left panel) over 15 repetitions. The output variability ranged from 2 to 6° and, as with the monkey performing the task with slow gaze perturbations, was greater for world-fixed compared with gaze-fixed saccades. The variability of world-fixed saccades was smaller for the most peripheral targets in monkeys and the network. Thus the network reproduces the patterns of both accuracy and variability seen in the monkey when a whole body rotation or a smooth pursuit eye movement perturbs gaze, but not when gaze is perturbed by a saccade.
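Why input noise should affect the two trial types differently can be illustrated with a toy calculation. In the sketch below an idealized velocity integrator stands in for the trained network (an assumption made only for illustration), and the noise SD is a hypothetical value in deg/s rather than the normalized input units used in the simulations.

```python
import random
import statistics


def simulate_output(target_deg, velocity_deg_per_s, world_fixed, noise_sd, rng,
                    steps=5, dt_s=0.1):
    """Idealized stand-in for the network: integrate a noisy gaze velocity
    signal on world-fixed trials and ignore it on gaze-fixed trials."""
    estimate = target_deg
    for _ in range(steps):
        noisy_velocity = velocity_deg_per_s + rng.gauss(0.0, noise_sd)
        if world_fixed:
            estimate -= noisy_velocity * dt_s
    return estimate


rng = random.Random(0)  # fixed seed so the example is repeatable
repetitions = 15        # same number of repetitions as in the text
world_outputs = [simulate_output(10.0, 20.0, True, 5.0, rng) for _ in range(repetitions)]
gaze_outputs = [simulate_output(10.0, 20.0, False, 5.0, rng) for _ in range(repetitions)]
world_sd = statistics.stdev(world_outputs)
gaze_sd = statistics.stdev(gaze_outputs)
```

In this toy case the gaze-fixed output has no variability at all because the noisy signal never enters the computation, whereas the world-fixed output inherits the accumulated noise. The trained network shows a graded version of the same asymmetry.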
When the monkey's gaze was perturbed by a saccade, memory precision was equal for world-fixed and gaze-fixed trials. The difference in the patterns of behavior seen after saccades compared with smooth pursuit or rotations might be attributable to a difference in the duration of the gaze change. However, when gaze was shifted in a single time step (100 ms), the pattern of model output remained the same (data not shown).
The difference in the patterns of behavior with saccades compared with pursuit or rotation might also be attributable to a difference in the form of the gaze signal. We therefore tested whether a gaze displacement signal, rather than a gaze position or velocity signal, could approximate the behavior seen in the animal with saccades. A displacement signal differs from a velocity signal in that it encodes a position error (the difference between the initial and goal positions) and not a rate of change; it differs from a position signal in that it encodes only relative position (referenced to the initial position) that resets at the end of each gaze perturbation, such that the goal position then becomes the new initial position when the perturbation is complete. When the network was trained with the displacement signal, variability of memory-guided saccades to world-fixed targets was greater than the variability of saccades to gaze-fixed targets (data not shown), exactly as in the simulated pursuit and rotation tasks. The result was the same regardless of whether the entire displacement occurred in a single time step or across 5 time steps. Thus training with the displacement signal or the position/velocity signal both failed to reproduce the pattern of animal behavior seen with saccades. We conclude that the network reproduces the pattern of behavior after a slow gaze perturbation, and shortening the time course of the gaze shift or providing a displacement signal does not affect the network output.
Hidden units contribute to both gaze-fixed and world-fixed transformations
Although the network is capable of tracking both world-fixed and gaze-fixed targets, the possibility exists that training creates 2 distinct populations of hidden layer units, each of which contributes exclusively to either the gaze-fixed or the world-fixed transformation. Alternatively, a single uniform population of hidden units could contribute equally to both transformations. To assess whether the network's processing was segregated or uniform, we examined the response properties of single units in the hidden layer.
Many hidden layer units updated their activity in response to the gaze perturbation on world-fixed trials, but not on gaze-fixed trials. An example of such a unit is shown in Fig. 6. When a target is flashed in the unit's response field (RF; the region of the retinal input layer that maximally activates the unit) in the context of the gaze-fixed cue, the unit maintains its activity throughout the trial (Fig. 6A). If the target is flashed outside the RF, the unit does not respond (Fig. 6B). On world-fixed trials in which the target is initially flashed inside the RF, the unit's activity decreases as the gaze perturbation moves the remembered target location outside the RF (Fig. 6C). Inversely, when the target is flashed outside the RF the unit increases its activity as the gaze perturbation brings the remembered target location into the RF (Fig. 6D).
To quantify the relative amounts of gaze-fixed and world-fixed information each unit conveys, we measured the ROC of an ideal observer of a hidden unit's activity. We introduced a fixed amount of noise to all of the inputs (σ_n = 0.25) and measured the activity of each unit for 15 repetitions of selected trial types. Trial types were selected to be those in which the target was flashed either inside or outside the RF, followed by a 20° gaze perturbation that brought the target outside or inside the RF, respectively. We constructed our receiver to discriminate whether a target was either inside or outside of the RF (see METHODS). We examined the receiver's performance by calculating the area under the ROC curve (AUC). Perfect discrimination corresponds to an AUC of 1. Chance performance corresponds to an AUC of 0.5. A comparison of the AUC for gaze-fixed versus world-fixed trials reveals that individual hidden units carry different amounts of information in the 2 tasks (Fig. 7).
If the hidden layer had segregated during training into 2 populations, we would expect to see 2 clusters of units in Fig. 7: one cluster in the lower right quadrant that discriminates well only on gaze-fixed trials (gaze-fixed AUC near 1, but world-fixed AUC below 0.5) and another cluster in the upper left quadrant that discriminates well only on world-fixed trials (world-fixed AUC near 1, but gaze-fixed AUC below 0.5). Instead, the population is unimodally distributed and performs better than chance for both the gaze-fixed and world-fixed tasks.
Network updates using gaze position, gaze velocity, or gaze displacement signals, but strongly prefers velocity
An important question is whether gaze position or gaze velocity signals are used to update spatial memories. This is a difficult question to address in the animal because one cannot easily decorrelate the 2 signals, nor selectively eliminate one input while leaving the other intact. However, with the model, we can train the network with position, velocity, or displacement inputs and compare their performance. We can also train a network with 2 inputs (e.g., both position and velocity) and then selectively lesion either set of inputs after training is complete. From the performance of the lesioned model we can determine whether the network self-organizes during training to rely in part or in whole on just one input or the other. We will refer to this as the network "preferring" or "choosing" one input over the other. We tested the preference of networks for position, velocity, or displacement inputs. To assess the relative preference for one input over another, we presented a maximum of 2 signals during training.
With only position information available during training (velocity inputs set to zero) the network converged on an acceptable solution. The network also converged when only velocity information or displacement information was available (position inputs set to zero). RMS error under world-fixed conditions was statistically identical for displacement and velocity networks (Student's t-test; P > 0.05, n = 3). In contrast, RMS error was 2-fold greater when the network was forced to rely on gaze position signals (Fig. 8A). Thus although the network can be trained to use any input, performance is superior with velocity or displacement information when compared with position information.
When trained with both displacement and velocity, the performance of the network decreased markedly when either the displacement or the velocity input was removed (Fig. 8C). Thus the network relied specifically on the combined input and did not prefer either displacement or velocity in this case. Similarly, networks trained with both displacement and position failed without the combined input (Fig. 8D), showing in this case no preference between position and displacement signals.
What causes the clear preference for velocity over position inputs? If the network prefers velocity because of an absolute constraint of the network architecture, then we would expect that it would consistently make the same choice regardless of the quality of the velocity information. Alternatively, if the preference reflects a relative advantage in using velocity compared with position, then changing the relative reliability of the 2 signals should shift the preference of the network.
We manipulated the reliability of gaze information by introducing variable amounts of noise to the position and velocity inputs. To select a suitable amount of noise, we added equal amounts of noise to both of these inputs and measured network performance (Fig. 8E, inset). Noise degrades performance on the world-fixed task in a log-linear fashion when σ_n is >0.01. Increasing noise on gaze-fixed trials has no effect on performance. A value of total noise σ_n = 0.1 (arrow) was selected for manipulation of the relative noise to ensure that such manipulation would have observable effects on network performance.
To assess how noise during training might affect the network's choice of gaze signal, we modulated the ratio of noise between velocity and position inputs [noise ratio (NR) = σ_vel/σ_pos]. The sum of the noise was fixed at 0.1 and the noise ratio was varied between 0.01 (σ_pos = 0.01; σ_vel = 0.09) and 9 (σ_pos = 0.09; σ_vel = 0.01). Figure 8E shows the network's performance as a function of the noise ratio. The intact network (triangles) performed better when the noise ratio favored a cleaner velocity signal. When position information was removed after training (hollow squares), the network was only modestly affected when trained with position noise greater than velocity noise (NR <1), but performance worsened as velocity noise increased. When velocity was removed after training (solid squares), performance was severely affected, but more so at low NR values. These data imply that the network's preference to use velocity rather than position information reflects a relative advantage of using velocity information and not an absolute constraint.
Gaze position gain fields are not present in networks that rely on velocity or displacement
A number of visual and oculomotor-related areas in monkey posterior parietal cortex have been implicated in the updating of spatial information in response to gaze perturbations (Duhamel et al. 1992; Mazzoni et al. 1996; Nakamura and Colby 2000, 2002). One computational theory regarding how updating is accomplished relies on the occurrence of gaze position gain fields, that is, modulation of responses by eye position. A previous study showed that hidden unit activities were modulated by gaze position in a network trained to update target position in a double-step saccade task (Xing and Andersen 2000a). We used our network model to explore whether gaze position gain fields arise whenever retinotopic signals are updated to account for gaze perturbations, or if gaze position gain fields are specific to networks that rely on position information. We tested for gaze position gain fields in the hidden layer of networks trained using gaze position, gaze velocity, or both as input. After training, individual hidden units were tested for modulation of visual responses by gaze position. Gain fields were observed in our network when gaze position information was available. An example of a unit with a gain field is shown in Fig. 9A. The unit has a spatially tuned visual receptive field at 0° that remains roughly constant at gaze positions from -15 to +15° (left panel). However, the magnitude of the peak visual response varies linearly with gaze position (right panel).
We measured the number of units with gain fields in networks trained with displacement, velocity, position, or combined position and velocity gaze signals. A criterion value of 0.2%/degree (Fig. 9B, solid line) was chosen based on published data such as Brotchie et al. (1995) and Snyder et al. (1998); using different criterion values yielded similar results. When only gaze position information was available, 35% of units showed gain fields stronger than 0.2%/degree. When both position and velocity were available, 12% of units had gain fields stronger than 0.2%/degree. However, when networks were trained with velocity alone, only 4% of units displayed gaze position gain fields, despite the fact that these networks performed updating as well as or better than networks trained with only position information. In networks trained with the displacement signal, only 4% of hidden units showed a gain field modulation.
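A gain field measurement of this kind can be sketched as a linear fit of peak visual response against gaze position. The sketch below expresses the fitted slope as a percentage of the mean peak response per degree and compares it with the 0.2%/degree criterion. The normalization by the mean response, the function names, and the example responses are assumptions; the text does not give the exact normalization.

```python
def gain_field_slope_percent(gaze_positions_deg, peak_responses):
    """Least-squares slope of peak response vs. gaze position, in %/degree.

    Assumption: the slope is normalized by the mean peak response.
    """
    n = len(gaze_positions_deg)
    mean_x = sum(gaze_positions_deg) / n
    mean_y = sum(peak_responses) / n
    if mean_y == 0:
        raise ValueError("mean response is zero; percent slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(gaze_positions_deg, peak_responses))
    sxx = sum((x - mean_x) ** 2 for x in gaze_positions_deg)
    return 100.0 * (sxy / sxx) / mean_y


def has_gain_field(gaze_positions_deg, peak_responses, criterion=0.2):
    """True if the gain field is stronger than the criterion (in %/degree)."""
    return abs(gain_field_slope_percent(gaze_positions_deg, peak_responses)) > criterion


# Hypothetical unit whose peak response grows linearly with gaze position.
gaze = [-15.0, -5.0, 5.0, 15.0]
responses = [0.47, 0.49, 0.51, 0.53]
slope = gain_field_slope_percent(gaze, responses)
```

For this hypothetical unit the response changes by 0.002 per degree around a mean of 0.5, that is, 0.4%/degree, so it would be counted as having a gain field.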
These results suggest that gain fields are associated specifically with the use of gaze position signals to update spatial information, rather than being a general feature of all networks that perform updating. We hypothesize that a gain field representation is present in networks that rely on position, but not in those that rely on velocity or displacement. The small number of gain fields in networks trained using both velocity and position (12%) suggests a small contribution from the position inputs.
We compared gain fields in the network to those of single neurons in LIP (Fig. 9C; neuronal data from Snyder et al. 1998). Gain field modulations were measured for both head-on-body position (orientation of the head on the body) and head-in-world position (orientation of the head relative to an external reference). Many LIP neurons (34%, n = 59) showed significant modulations for gaze position referenced to the body (Student's t-test, P < 0.05), whereas only 10 of 109 cells (9%) were significantly modulated by gaze position referenced to the world. Despite the scarcity of gaze-in-the-world gain fields, spatial representations of world-fixed targets are nonetheless updated in LIP after passive whole body rotations when gaze rotates with the body (VOR suppression paradigm; Baker et al. 2002; Powell and Goldberg 1997). These data imply that gaze position information is not used for this task. Instead it seems probable that, like our network model, LIP may use gaze velocity signals to update world-fixed target locations in response to whole-body rotation.
DISCUSSION
Performance
Posterior parietal neurons, specifically those in LIP, have been shown to encode remembered spatial locations that are likely targets of an impending eye movement (Gnadt and Andersen 1995; Platt and Glimcher 1997; Snyder et al. 1997). These neurons use extraretinal signals to update spatial information to compensate for self-motion (Duhamel et al. 1992; Mazzoni et al. 1996; Powell and Goldberg 1997). However, the appropriate combination of retinal and extraretinal signals depends on the reference frame of the remembered target. Most studies have used targets that remained fixed in the world. Features of stationary (i.e., world-fixed) objects constitute the vast majority of saccade targets; thus one could imagine that the responses observed in LIP neurons are specialized for encoding saccadic targets in a world-fixed frame of reference. However, humans and animals are also capable of directing saccades to gaze-fixed targets (Baker et al. 2003; Israel et al. 1999). Are the same neurons that perform spatial updating capable of suppressing their responses to self-motion signals in the gaze-fixed context? Or does the capacity to encode targets in a reference frame require a network of neurons specialized for encoding targets in that frame?
The model we describe reveals that a simple distributed network of neuron-like units in the hidden layer is capable of flexibly representing targets of saccades in either a world-fixed or gaze-fixed reference frame. Based on previous successful comparisons between hidden layer units and posterior parietal neurons (Xing and Andersen 2000a,b; Zipser and Andersen 1988), we predict that neurons in LIP should also be capable of such flexible responses.
Droulez and Berthoz (1991) demonstrated that world-fixed targets could be tracked entirely within a retinotopic coordinate system, without an explicit world-centered representation. We have further demonstrated that a simple neural network can track targets in both world-fixed and gaze-fixed frames of reference in an oculocentric coordinate system. The network effectively ignores gaze shift information when presented with a gaze-fixed target (Fig. 8), extending the finding of Droulez and Berthoz to show that not only is an explicit world-centered representation not required, but also that updating within the retinotopic coordinate system need not be obligatory.
Model versus behavior
In comparing our artificial neural network to real networks in the brain, it is important for the model to be validated by comparisons of network output to either neuronal or behavioral output. Output is less precise in the world-fixed task than in the gaze-fixed task for both animal (behavior: Baker et al. 2003; neurons: Baker et al. 2002) and model when updating occurs in response to a slow gaze perturbation. In addition, the neural network and the monkeys showed similar edge effects (Fig. 5). Both were hypometric in directing saccades to the most peripheral targets. At the same time, world-fixed saccades to the periphery were less variable than world-fixed saccades to central targets. The fact that the model reproduced aspects of the animals' behavior that were not explicitly reinforced by our training is evidence that the choice of model architecture and parameters was appropriate.
The results from these simulations help to illuminate behavioral data from the monkey. In vivo saccade variability is greater in world-fixed than in gaze-fixed trials. This same pattern is observed from the network when the only source of noise is the gaze signal. These data support the hypothesis that the pattern of behavioral variability reflects, in part, whether a noisy input signal is incorporated into or excluded from the output (Baker et al. 2003). It is also possible that noise is exacerbated on world-fixed but not gaze-fixed trials as a result of imprecise computation by neurons (Shadlen and Newsome 1998).
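The noise argument can be illustrated with a toy simulation. The sketch below is not the network described in this paper; it is a minimal stand-in that assumes a one-dimensional remembered target, a constant-velocity gaze shift, and independent Gaussian noise on each gaze velocity sample. The function names (`simulate_endpoint`, `endpoint_sd`) and all parameter values (10° target, 20°/s rotation, 5°/s noise SD) are hypothetical choices made for illustration only.

```python
import random
import statistics

def simulate_endpoint(target_deg, gaze_velocity_dps, duration_s, world_fixed,
                      noise_sd_dps, rng, dt=0.01):
    """Return the remembered gaze-centered target location after a gaze shift."""
    remembered = target_deg
    steps = int(round(duration_s / dt))
    for _ in range(steps):
        # Assumption: the only noise source is the gaze velocity signal.
        noisy_velocity = gaze_velocity_dps + rng.gauss(0.0, noise_sd_dps)
        if world_fixed:
            # World-fixed target: the noisy gaze shift is integrated into memory.
            remembered -= noisy_velocity * dt
        # Gaze-fixed target: the gaze signal is ignored, so its noise is excluded.
    return remembered

def endpoint_sd(world_fixed, n_trials=500, seed=0):
    """Standard deviation of the remembered location across simulated trials."""
    rng = random.Random(seed)
    endpoints = [simulate_endpoint(10.0, 20.0, 1.0, world_fixed, 5.0, rng)
                 for _ in range(n_trials)]
    return statistics.pstdev(endpoints)

sd_world = endpoint_sd(world_fixed=True)   # noise accumulates during updating
sd_gaze = endpoint_sd(world_fixed=False)   # no gaze signal used, no added noise
```

Under these assumptions the gaze-fixed endpoints do not vary at all, whereas the world-fixed endpoints inherit the accumulated noise of the integrated gaze signal. This is the qualitative pattern described above, not a quantitative fit to the behavioral data.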
Although the network serves as a good model of animal behavior in the rotation and pursuit trials, the model did not mimic the behavior observed in saccade trials. On saccade trials, both gaze-fixed and world-fixed memory saccades were equally precise and more accurate than in rotation and pursuit (Baker et al. 2003). Simply shortening the duration of the gaze shift or using a displacement signal as an input had no effect on the pattern of model output. The differences between saccadic and slow gaze shifts may be attributable to the different nature of the movements. At the initiation of a movement, the endpoint is known for a saccade, but not for a pursuit movement or whole-body rotation. In addition, the brain may be optimized for scanning the visual environment using saccades (Niemeier et al. 2003). These differences may result in different mechanisms for integrating self-motion information into the stored representation of salient targets for slow versus saccadic gaze perturbations.
Hidden units share both transformations
When the network is trained, the output layer correctly codes the location of a remembered target in one of 2 reference frames. The activity of the output layer is created by projections from a single population of hidden units that contributes to both the gaze-fixed and world-fixed transformations (Fig. 7). The alternative, that 2 subpopulations of hidden units would emerge (each responsible for one of the transformations), would imply that flexible output requires gating one of 2 dedicated subpopulations of neurons. Instead, we found that a single population of units can encode the locations of remembered targets on both gaze-fixed and world-fixed trials (Fig. 7). We hypothesize that the neurons involved in updating can also flexibly track targets in both tasks.
The model prefers velocity
The neural network model prefers velocity information to perform the updating task. This is supported by improved network performance with velocity compared with position inputs (Fig. 8A) and by the preference for velocity exhibited by a network trained with both position and velocity inputs (Fig. 8B). The noise analysis shows that the choice of gaze velocity over gaze position is rather insensitive to the relative signal-to-noise ratios of the 2 inputs (Fig. 8E). These results indicate that velocity information may be simpler to incorporate than position information into a dynamically updating network. This may occur because the recurrent layer already functions as an integrator to maintain the memory of the transiently presented target. Transient velocity signals express the shift in gaze, which can be integrated by the network to compensate for gaze shifts. The position input, in contrast, does not directly encode shift in gaze. Instead, the gaze signal must first be differentiated to obtain the change in gaze required for updating. Alternatively, a representation of the original eye position could be stored in a memory buffer, and this stored signal subtracted from the final gaze position at the end of the perturbation (Xing and Andersen 2000a).
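The two routes to an updated target location can be sketched side by side. The code below is an illustrative simplification, not the recurrent network itself: it assumes a one-dimensional, constant-velocity gaze shift and the sign convention that a rightward gaze shift moves a world-fixed target leftward in gaze-centered coordinates. The function names and numerical values are hypothetical.

```python
def gaze_trajectory(amplitude_deg, duration_s, dt=0.01):
    """Constant-velocity gaze shift; returns position and velocity samples."""
    steps = int(round(duration_s / dt))
    velocity = [amplitude_deg / duration_s] * steps
    position = [0.0]
    for v in velocity:
        position.append(position[-1] + v * dt)
    return position, velocity

def update_with_velocity(target_deg, velocity, dt=0.01):
    """Integrate the transient velocity signal directly into the stored target."""
    remembered = target_deg
    for v in velocity:
        remembered -= v * dt
    return remembered

def update_with_position(target_deg, position):
    """Buffer the initial gaze position and subtract it from the final position."""
    buffered_start = position[0]
    return target_deg - (position[-1] - buffered_start)

position, velocity = gaze_trajectory(amplitude_deg=20.0, duration_s=1.0)
by_velocity = update_with_velocity(15.0, velocity)   # running integration
by_position = update_with_position(15.0, position)   # needs a stored start value
```

Both routes arrive at the same updated location (a target at 15° ends at about -5° after a 20° gaze shift). The velocity route reuses the same running memory that holds the target, whereas the position route requires an additional stored quantity, the initial gaze position.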
Gaze displacement signals are intermediate: although they directly encode the magnitude of the gaze shift, they must also be integrated, given that the network must store the gaze shift information after the displacement signal is reset. Networks did not show a preference between velocity and displacement or between displacement and position (Fig. 8, C and D).
Neural circuits involved in controlling saccades at the level of the superior colliculus and brain stem, like the cortical circuits involved in updating, also use information about ongoing gaze shifts. Initially, these low-level control circuits were modeled using only gaze position signals (Jürgens et al. 1981; Robinson 1975) or only gaze velocity (Scudder 1998). However, more recent work suggests that both position and velocity signals are required (Arai et al. 1999) to explain the behavior observed when omnipause neurons are stimulated during a saccade (Keller et al. 1996). By analogy, it would seem likely that the cortex would also use both position and velocity signals.
Explicit gaze position or velocity signals have not been observed in LIP (Bremmer et al. 1997; Fukushima 1997). However, there are many potential sources of these signals in the brain. Horizontal and vertical eye position information can be found in prepositus hypoglossi, the interstitial nucleus of Cajal, and the vestibular nuclei, for example (reviewed in Sparks 2002). Gaze velocity signals resulting from smooth pursuit and from head rotation occur in gaze velocity Purkinje cells in the cerebellar flocculus, the dorsolateral pons, and vestibular nuclei. Smooth pursuit eye movement signals can also be found in a region next to the frontal eye fields (Tian and Lynch 1996) and in area MST (Churchland and Lisberger 2002; Newsome et al. 1988). Gain fields in LIP encode both eye position in the head and head direction on the body (Snyder et al. 1998). Gain fields in parietal area 7a encode gaze position in the world (Snyder et al. 1998). Head direction in the world is commonly found in parietal and perihippocampal head-direction neurons in rats, but in primates the direct encoding of head direction in the world is less clear (Froehler and Duffy 2002; Ono and Nishijo 1999; Rolls 1999). To summarize, neurons can be found in the primate that encode gaze position with respect to the head and body and that encode gaze velocity with respect to the head, body, and world. Whether position in the world is directly encoded in the brain remains an open question. All of the signals described can provide the necessary gaze perturbation information (gaze position or velocity) for updating.
Gaze position gain fields
A previous model has demonstrated that a distributed neural network can effectively update a memorized spatial location based on changes in a position signal (Xing and Andersen 2000a). In this network, gain fields for position were observed in the hidden layer. It has been proposed that gain fields may serve as a mechanism for updating, given that gain fields have been observed in many of the areas where spatial signals are updated.
Spatial memories in LIP can also be updated when the whole body is passively rotated and gaze rotates with the body (Baker et al. 2002; Powell and Goldberg 1997). However, a gain modulation by position in the world has not been observed (Snyder et al. 1998). Neurons in LIP display gain fields for head orientation (angular position) on the body, but not for absolute head orientation in the world (Fig. 9B). The absence of LIP gain fields for position in the world suggests that an allocentric (world-referenced) position signal is not available to neurons in this area.
How can neurons in LIP update target locations in response to passive whole body rotation without an appropriate gaze position signal? The model we describe indicates that a distributed neural network can use a gaze velocity or displacement signal for this purpose. Such networks do not display gain fields for gaze position, indicating that units in the hidden layer are capable of integrating the velocity signal or storing the displacement signal to generate the appropriately updated target position.
Zipser and Andersen (1988) demonstrated gaze position gain fields in hidden layer units that combine retinal and extraretinal information. Here, we show that gain fields are present only when the extraretinal information is in the form of a gaze position signal. When gaze velocity or gaze displacement is provided instead of gaze position, the network is still able to update the retinal input, but the hidden layer units do not show gaze position gain fields. Thus gain fields are not an inevitable consequence of combining retinal and extraretinal information, but instead may be a signature of the use of position information.
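For readers less familiar with the term, a gaze position gain field can be sketched as a retinotopic tuning curve whose amplitude, but not its preferred retinal location, is scaled by gaze position. The code below is a schematic illustration only. The Gaussian tuning, the linear gain, and every parameter value are assumptions chosen for clarity, and the functions are hypothetical rather than taken from the model described here.

```python
import math

def retinal_tuning(target_deg, preferred_deg=0.0, width_deg=10.0):
    """Gaussian tuning for target location in gaze-centered coordinates."""
    return math.exp(-0.5 * ((target_deg - preferred_deg) / width_deg) ** 2)

def gain_field_response(target_deg, gaze_position_deg, gain_slope=0.02):
    """Retinal tuning scaled multiplicatively by a linear function of gaze position."""
    # Assumption: a planar (linear) gain, clipped at zero so the rate stays non-negative.
    gain = max(0.0, 1.0 + gain_slope * gaze_position_deg)
    return gain * retinal_tuning(target_deg)

def preferred_location(gaze_position_deg):
    """Retinal location (integer degrees) that evokes the largest response."""
    candidates = range(-30, 31)
    return max(candidates, key=lambda t: gain_field_response(t, gaze_position_deg))

peak_left = gain_field_response(0.0, -20.0)    # gaze 20 deg left: reduced gain
peak_right = gain_field_response(0.0, 20.0)    # gaze 20 deg right: increased gain
```

In this sketch the response amplitude changes with gaze position while the preferred retinal location stays at 0°. A hidden unit lacking a gaze position gain field would show the same amplitude at both gaze positions.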
| ACKNOWLEDGMENTS |
|---|
GRANTS
This work was supported by the EJLB Foundation, the McDonnell Center for Higher Brain Function, and the National Eye Institute. R. White was supported by National Institutes of Health Grants GM-07200 and EY-13360.
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: L. H. Snyder, Department of Anatomy and Neurobiology, Box 8108, Washington University School of Medicine, 660 S. Euclid Ave., St. Louis, MO 63110 (E-mail: larry{at}eye-hand.wustl.edu).
| REFERENCES |
|---|
Andersen RA and Mountcastle VB. The influence of the angle of gaze upon the excitability of the light-sensitive neurons of the posterior parietal cortex. J Neurosci 3: 532–548, 1983.
Arai K, Das S, Keller EL, and Aiyoshi E. A distributed model of the saccadic system: simulations of temporally perturbed saccades using position and velocity feedback. Neural Networks 12: 1359–1375, 1999.
Baker JT, Harper T, and Snyder LH. Spatial memory following shifts of gaze. I. Saccades to memorized world-fixed and gaze-fixed targets. J Neurophysiol 89: 2564–2576, 2003.
Baker JT, White RL, and Snyder LH. Reference frames and spatial memory operations: area LIP and saccade behavior. Soc Neurosci Abstr 57: 16, 2002.
Batista AP and Andersen RA. The parietal reach region codes the next planned movement in a sequential reach task. J Neurophysiol 85: 539–544, 2001.
Bremmer F, Distler C, and Hoffmann KP. Eye position effects in monkey cortex. II. Pursuit- and fixation-related activity in posterior parietal areas LIP and 7A. J Neurophysiol 77: 962–977, 1997.
Britten KH, Shadlen MN, Newsome WT, and Movshon JA. The analysis of visual motion: a comparison of neuronal and psychophysical performance. J Neurosci 12: 4745–4765, 1992.
Brotchie PR, Andersen RA, Snyder LH, and Goodman SJ. Head position signals used by parietal neurons to encode locations of visual stimuli. Nature 375: 232–235, 1995.
Churchland AK and Lisberger SG. Eye velocity tuning of extraretinal responses in MST. Soc Neurosci Abstr 56: 2, 2002.
Colby CL, Duhamel JR, and Goldberg ME. Oculocentric spatial representation in parietal cortex. Cereb Cortex 5: 470–481, 1995.
Colby CL, Duhamel JR, and Goldberg ME. Visual, presaccadic, and cognitive activation of single neurons in monkey lateral intraparietal area. J Neurophysiol 76: 2841–2852, 1996.
Colby CL and Goldberg ME. Space and attention in parietal cortex. Annu Rev Neurosci 22: 319–349, 1999.
Droulez J and Berthoz A. A neural network model of sensoritopic maps with predictive short-term memory properties. Proc Natl Acad Sci USA 88: 9653–9657, 1991.
Duhamel JR, Bremmer F, BenHamed S, and Graf W. Spatial invariance of visual receptive fields in parietal cortex neurons. Nature 389: 845–848, 1997.
Duhamel JR, Colby CL, and Goldberg ME. The updating of the representation of visual space in parietal cortex by intended eye movements. Science 255: 90–92, 1992.
Froehler MT and Duffy CJ. Cortical neurons encoding path and place: where you go is where you are. Science 295: 2462–2465, 2002.
Fukushima K. Corticovestibular interactions: anatomy, electrophysiology, and functional considerations. Exp Brain Res 117: 1–16, 1997.
Funahashi S, Bruce CJ, and Goldman-Rakic PS. Mnemonic coding of visual space in the monkey's dorsolateral prefrontal cortex. J Neurophysiol 61: 331–349, 1989.
Funahashi S, Bruce CJ, and Goldman-Rakic PS. Visuospatial coding in primate prefrontal neurons revealed by oculomotor paradigms. J Neurophysiol 63: 814–831, 1990.
Galletti C, Battaglini PP, and Fattori P. Parietal neurons encoding spatial locations in craniotopic coordinates. Exp Brain Res 96: 221–229, 1993.
Gnadt JW and Andersen RA. Memory related motor planning activity in posterior parietal cortex of macaque. Exp Brain Res 70: 216–220, 1988.
Goldberg ME and Bruce CJ. Primate frontal eye fields. III. Maintenance of a spatially accurate saccade signal. J Neurophysiol 64: 489–508, 1990.
Israel I, Ventre-Dominey J, and Denise P. Vestibular information contributes to update retinotopic maps. Neuroreport 10: 3479–3483, 1999.