York Centre for Vision Research, Canadian Institute of Health Research Group for Action and Perception, Departments of Psychology, Biology, and Kinesiology and Health Sciences, York University, Toronto, Ontario, Canada
Submitted 25 March 2004; accepted in final form 1 November 2004
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
The fundamental reason for this is that visual signals originate in the retina and are therefore defined in an eye-fixed frame, whereas Listing's law constrains the eye to rotate around certain axes defined in a head-fixed coordinate system (Crawford and Guitton 1997
). As a result, simulations of a fixed mapping between the 2-D retinal error and the 3-D motor error command produce movements that diverge erroneously, as a nonlinear function of initial eye orientation. Moreover, because this arises from the geometry of projecting an eye-fixed spatial vector onto effectors that do not rotate about eye-fixed axes, essentially the same problem arises for visually guided gaze shifts involving both the eyes and head (Klier et al. 2001
) and visually guided arm movements (Crawford et al. 2000
), only with potentially much larger errors.
Behavioral studies have shown that the human saccade generator does not make such eye errors (Klier and Crawford 1998
), even for express or memory-guided saccades (Henriques and Crawford 2001
), but rather maps retinal error (RE) onto different saccade vectors as a function of initial eye orientation. In particular, it has been shown that one visual RE (measured in eye-fixed coordinates) must be mapped onto saccade trajectories with orthogonal components that diverge by ±15° or more in the oculomotor range, depending on the size of the saccade and initial eye position (Crawford and Guitton 1997
; Klier and Crawford 1998
). Thus there must be some physiological mechanism that accounts for eye orientation in the visuomotor transformation.
In theory, this problem could be solved by fixing the eye muscle pulling directions with respect to the eyes, but this would violate Listing's law, producing large pseudo-random torsional positions at the end of each saccade that do not occur in real life (Crawford and Vilis 1991
; Tweed and Vilis 1990a)
. (Arranging neck and shoulder muscles in eye coordinates to solve the problem is even less plausible.) Instead, eye muscle pulling directions appear to be optimized for Listing's law, which requires saccade velocity axes to tilt halfway, not all the way, with eye position (Demer et al. 1995
; Quaia and Optican 1998
). This leaves the question, how does the brain implement the reference frame transformation required to map visual signals onto the kind of motor commands appropriate to provide accurate eye movements that satisfy the constraints of Listing's law?
One clue is that we may be able to sketch out the region within the brain where this transformation takes place. The superior colliculus (SC) appears to encode visual and motor signals much like the retina, in a 2-D topographic map (Hepp et al. 1993
; Van Opstal et al. 1991
). Moreover, stimulation of the SC in the head-unrestrained animal evokes gaze shifts that have fixed directions in eye coordinates but, conversely, vary as a function of initial gaze position when plotted in space coordinates (Klier et al. 2001
). This suggests that the SC encodes a gaze command in retinal coordinates, and the position-dependent reference frame transformation is implemented downstream. Further, this transformation is probably complete in the oculomotor system at the level of the premotor short-lead burst neurons because these appear to encode displacements in eye orientation within a 3-D head-fixed coordinate system aligned with Listing's plane (Crawford 1994
; Crawford and Vilis 1992
; Henn et al. 1989
).
Thus the reference frame transformation for saccades appears to occur between the SC and the brain stem burst neurons (perhaps with help from side-loop structures like the cerebellum) (Lefevre et al. 1998
; Quaia et al. 1999
), although we have little clue as to how this might occur. In models of the saccade generator that touch on this problem, the reference frame transformation involves a stage in which the retinal goal of the saccade is transformed into a head-fixed desired gaze direction, from which the system could then compute the correct head-fixed kinematics to drive a 3-D saccade (Crawford and Guitton 1997
; Glasauer et al. 2001
; Tweed and Vilis 1990; Zee et al. 1976
). This includes network models that used eye position "gain fields" in a distributed neural net to transform RE into a head-centric representation of target direction (Zipser and Andersen 1988
). Such head-centric representations have never been observed physiologically in the cortical saccade control centers, the SC, or brain stem (Batista et al. 1999
; Colby and Goldberg 1999
; Colby et al. 1995
; Russo and Bruce 1996
; Sabes et al. 2002
; Sparks 1988
, 1989
).
This leaves open the question of how the required reference frame transformations take place. Moreover, any inquiry into this question has to consider that the same transformations must also deal with other computational problems, such as the degrees-of-freedom problem. Because the gaze command signal from the SC is 2-D (Hepp et al. 1993
; Klier et al. 2003
; Van Opstal et al. 1991
), whereas the brain stem burst generator possesses a 3-D coordinate system that can generate eye rotations about any axis (Crawford and Vilis 1992
; Henn et al. 1989
), something between must select the right axis for Listing's law (which only allows saccades to reach eye orientations that are rotated from primary position about an axis in Listing's plane). As discussed above, the implementation of Listing's law places constraints on the visuomotor reference frame transformation. Most models have included a Listing's law box that calculates a desired eye orientation in Listing's plane (Crawford and Guitton 1997
; Glasauer et al. 2001
; Tweed and Vilis 1990a)
.
Finally, the circuits that derive saccadic motor commands must also deal with the spatiotemporal transformation. The spatiotemporal transformation arises in the oculomotor system because the SC encodes saccade metrics with the use of a place code on a topographic map, whereas the reticular burst neurons must supply a rate code in the appropriate coordinate system to the extraocular muscles (Becker and Klein 1973
; Robinson 1974
; Westheimer and Blair 1973
). A number of theoretical and physiological investigations have considered this problem (e.g., Hepp and Henn 1983
; Optican and Quaia 2002
; Tweed and Vilis 1990a
), including the question of how this fits within dynamic feedback loops for saccade control that are beyond the scope of the present investigation. However, the physiological transformation from a topographic code to a coordinate system vector representation is probably inseparable from the oculomotor reference frame transformation.
The problem facing neurophysiologists trying to solve these questions experimentally, as we see it, is that we really do not know what type of signals to look for on a neuron-by-neuron basis. In fact, it is quite possible that the solution is right at the tip of our electrodes and we do not know it. Most current investigators would agree that signal processing in the brain, even for the feedforward transformations in the brain stem, is a distributed process (Sparks 2002
). Therefore one potentially fruitful approach is to train simple neural nets to perform these transformations, and then analyze the workings of such networks to gain insights that we can apply to the real system (Krommenhoek and Wiegerinck 1998
; Zipser and Andersen 1988
). If we can understand how a simple neural net works, where we have complete knowledge of the inputs, outputs, and architecture, as well as the ability to manipulate these values at will, then we might hope to understand the limited amount of data that one can collect from the real brain.
The current investigation applies this approach to understand how visual information encoded at the level of the SC (or any similar topographically organized, eye-centered structure) might be transformed into a geometrically correct signal to drive brain stem burst neuron populations. Note that the focus of our study was to see how such a network would implement the reference frame transformation, but the other transformations (spatiotemporal and 2-D to 3-D) were implicit in our network. For example, our networks produced a motor displacement code (initial motor error for a saccade) that could easily be converted into a rate code for instantaneous motor error, by placing it within a local feedback loop (Crawford and Guitton 1997
). The networks described below do not perform this complete transformation, but do perform the transformation from a topographic "place code" to a coordinate system that could encode saccade dynamics. Similarly, our networks were required to convert a 2-D visual input into a motor command that actively maintained ocular torsion at zero (to comply with Listing's law). Thus these networks implicitly learned to solve the 2-D to 3-D transformation and one aspect of the spatiotemporal transformation, while also learning to perform the eye-to-head reference frame transformation that was the focus of our investigation.
Unlike most previous neural net studies of reference frame transformations (e.g., Krommenhoek et al. 1993
; Zipser and Andersen 1988
), we used the noncommutative mathematics of 3-D rotations in our input-output relations (Crawford and Guitton 1997
; Tweed and Vilis 1987
); and, unlike our previous study of this topic (Smith and Crawford 2001a
), these inputs/outputs were biologically inspired, based on the known topography of visual signals in the SC and coordinate systems observed in the brain stem saccade generator. The results suggest that an understanding of visuomotor reference frame transformations may be obtainable within the framework of currently known physiology.
| METHODS |
|---|
|
|
|---|
Network architecture
Figure 1A shows a diagrammatic view of the network structure and the general nature of the input-output signals that were used in this study. Networks were composed of 3 layers: an input layer consisting of 513 units; a hidden layer with 4, 9, 16, 25, 36, or 49 hidden units; and an output layer consisting of 6 units. When referring to a specific size, or class, of network we use the number of hidden units as a nomenclature. Thus a network with 9 hidden units will be referred to as a 9-unit network. All networks had feedforward connections only (i.e., no connections existed between any units within a layer).
Input-output signals
Networks were provided with 2 input signals: 1) an eye position (EP) signal specifying initial eye position in orthogonal 3-D head-centric coordinates, where each EP was constrained to be within a 100° diameter ocular motor range (OMR); and 2) a retinal error (RE) signal, in eye coordinates, that was specified on a 2-dimensional (2-D) map and was constrained to be within 80° of map center. In addition, the target location specified by the RE was constrained to be within 50° of primary position so that the model would not attempt to saccade beyond the oculomotor range. The teaching signal specified the vectorial motor error (ME) required to drive the eye from its current position to the one specified by the visual target location, such that the target was acquired by gaze and final EP landed in the normal "Listing's plane" range of 3-D eye positions observed during head-fixed saccades (Crawford and Guitton 1997
). That is, we chose a teaching signal such that the head-fixed torsional component of EP was set to zero. However, this does not invalidate the 3-D aspect of the model because this 3rd degree of freedom must be as strictly controlled by the networks as the other two.
Visual inputs
The most important input to the saccade generator for visually guided saccades is vision itself. For physiological inspiration, we based the properties of the visual inputs to our network on neurophysiological recordings from the superficial layers of the SC. We chose this structure because it probably encodes visual information to be used by the saccadic system in a form that is relatively unprocessed.
Figure 1B shows a magnified representation of the visual inputs to the network as illustrated in Fig. 1A. The "floor" of the figure represents the full extent of the virtual visual map in the vertical and horizontal dimensions (−90° to 90°). As shown, the input units to the network were arranged topographically, with the density of input units decreasing with distance from the center of the map. That is, the input units monitored a central region covering 2° of extent with a spacing of 1°, whereas adjacent annular areas covering 4°, 10°, and
80° of extent encoded resolutions of 2°, 4°, and 10°, respectively (solid dots). Note that because of the viewpoint in this figure, the units covering the central 1° and 2° areas coalesce into a homogeneous representation. Each of the input units had an eye-fixed visual receptive field determined by a Gaussian function with a maximum activity level of 1 at its center.
Also shown is a vertically displaced pictorial representation of 2 Gaussian receptive fields at 2 different distances from the center of the map, one at horizontal and vertical coordinates of (20°, 20°) and the other at (40°, 40°). Projected onto the map is the corresponding contour outline of each of the 2 Gaussian curves' receptive fields. As illustrated, the width of each input unit's receptive field depended on its (virtual) topographical location. So, for example, the unit located at (20°, 20°) has a receptive field width (sigma) of 7.8°, whereas the unit located at (40°, 40°) has a receptive field width of 14.8°. In general, the sigma value for the Gaussian receptive field of the units ranged from 1° to 20° with increasing eccentricity from the center of the map. This approach was intended to mimic the larger receptive field sizes of eccentric locations found in sensory structures such as the SC and the retina. The receptive field widths used here were estimated from data presented in Cynader and Berman (1972)
.
Visual target inputs to the networks were computed from 2-D vectors that specified the horizontal and vertical integer components of peripheral targets on the virtual 2-D map. The vector components of the visual targets were specified to the nearest degree and covered an area within an 80° visual range (large circle). Each of these "stimuli" then activated a number of the visual input units, depending on the location of the stimulus on the virtual topographic map and the width of the Gaussian receptive field of the units in that part of the map.
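The activation of a single visual input unit by a target can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, the Gaussian is assumed to take the standard form exp(−d²/2σ²), and the dependence of sigma on eccentricity is assumed to be a straight line through the two values quoted above (7.8° at (20°, 20°) and 14.8° at (40°, 40°)), clipped to the stated 1° to 20° range.

```python
import numpy as np

# Assumed linear fit through the two receptive field widths quoted in the text.
ECC_1, ECC_2 = np.hypot(20.0, 20.0), np.hypot(40.0, 40.0)
SLOPE = (14.8 - 7.8) / (ECC_2 - ECC_1)

def rf_sigma(x_deg, y_deg):
    """Receptive field width (deg) for a unit at (x, y) on the map (assumed fit)."""
    ecc = np.hypot(x_deg, y_deg)
    return float(np.clip(7.8 + SLOPE * (ecc - ECC_1), 1.0, 20.0))

def unit_activation(unit_xy, target_xy):
    """Gaussian response of one input unit to a point target; peak activity is 1."""
    sigma = rf_sigma(*unit_xy)
    d2 = (unit_xy[0] - target_xy[0]) ** 2 + (unit_xy[1] - target_xy[1]) ** 2
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

# A target at (22, 20) drives the unit at (20, 20) strongly and the unit
# at (40, 40) only weakly.
near = unit_activation((20.0, 20.0), (22.0, 20.0))
far = unit_activation((40.0, 40.0), (22.0, 20.0))
```

Because eccentric units have wider fields, a single target typically activates several units at once, which is what gives the map its distributed place code.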
Eye position inputs
Based on our previous studies (Crawford and Guitton 1997
; Klier and Crawford 1998
; Smith and Crawford 2001a), we knew that our network would require a 3-D EP input to solve the reference frame transformation for saccades. This could come from proprioceptive sensors in the eye muscles, but most oculomotor models assume that it arises as an efference copy from the oculomotor "neural integrator" that provides the motor signal to hold eye position between saccades (Crawford 1994
; Quaia et al. 1999
; Robinson 1975
; Tweed 1997
). However, no previous model of the saccadic visuomotor transformation that we know of has used a realistic representation of these 3-D neural integrator signals.
Most recent studies of 3-D eye movements have described eye orientations using vectors that are parallel to the axis of rotation that brings the eye to that position from some central reference position. Listing's law states that during saccades with the head fixed, these eye orientation vectors all align with a head-fixed plane (Tweed and Vilis 1990). The gaze direction at one particular reference position, the primary position, is orthogonal to the special plane of vectors, called Listing's plane, and parallel to the head-fixed torsional axis (Westheimer 1957
).
The oculomotor neural integrator is a distributed structure with a critical region for horizontal integration occurring in the nucleus prepositus hypoglossus (Cannon and Robinson 1987
; Cheron and Godaux 1987
) and a critical region for vertical/torsion integration in the midbrain interstitial nucleus of Cajal (Crawford et al. 1991
; Fukushima et al. 1990
). Unlike the sensory codes found in visual structures, the brain stem neural integrator does not use a topographic map but rather represents similar directions in clustered populations, each controlling individual coordinate axes. Experimental studies suggest that the 3-D oculomotor neural integrator is organized in a head-fixed coordinate system similar to that of the eye muscles, but aligned with Listing's plane. That is, the vertical axis (for the horizontal component of eye orientation) is aligned within Listing's plane, and the torsional-vertical coordinate axes are aligned in the horizontal plane orthogonal to Listing's plane (Crawford 1994
; Crawford et al. 1991). Thus to encode a purely horizontal position, only the horizontal integrator needs to be activated, but the torsional-vertical coordinates are arranged in such a way that, e.g., an upward EP would require the coactivation of an up-counterclockwise population of neurons on the left side of the midbrain and an up-clockwise population on the right side (Crawford et al. 1991).
To mimic the geometric output of such a structure, we started with a 3-D coordinate system for angular orientation of the eye (torsional, vertical, and horizontal) in orthogonal, right-handed head-centric coordinates, using the right-hand rule to define the direction of rotation (Tweed and Vilis 1990b
). These coordinates were then rotated 45° about the vertical axis to form a new coordinate system (Fig. 1A: "Eye Position Signal") symmetric about Listing's plane as described above (Crawford 1994
; Crawford et al. 1991
).
A second physiological constraint is that real neurons do not have negative firing rates. To represent "negative directions" the vestibuloocular system, including the neural integrator, has developed a push-pull mechanism where positive and negative values correspond to modulations up or down about a background firing rate (Fukushima et al. 1990
, 1992
; King et al. 1981
). To mimic this, we transformed the 3-D EP vector into a 6-D vector that represented the angular rotation around each of the 3 axes by a yoked pair of components. Component pairs 1 and 2, 3 and 4, and 5 and 6 encoded the torsional, vertical, and horizontal components of rotation, respectively (in right-hand coordinates). Each component ranged between 0 and 1, where a balance of activity, indicated by the values (0.5, 0.5), for a single pair, represents 0° rotation around a particular axis.
In calculating the values for each of the 6-D components we used the following formulation
![]() |
Motor outputs
We wanted the motor output signal of the network to represent the 3-D motor error command that drives all reticular formation short-lead burst neurons (one synapse up from both motoneurons and the neural integrator) at the beginning of saccades. Little is known about the complex synaptic physiology of the inputs to short-lead burst neurons. These may include direct input from the SC, from "long-lead" burst neurons in the reticular formation, burst-driver neurons, and vestibular-related inputs (e.g., Hepp and Henn 1983
; Kaneko and Fukushima 1998
; Moschovakis and Highstein 1994
; Schnyder et al. 1985
; Sparks 1988
, 1989
; Stanton et al. 1988
). Therefore it is easier to define the physiological correlate of this command according to its target (i.e., as the command coded by the total ensemble of synaptic input to the short-lead burst neurons), which are much better understood. Equivalently, one can view this command as representing the initial motor error encoded by the total ensemble of short-lead burst neurons at the commencement of a saccade.
We encoded motor error as the vectorial change in eye orientation (ΔE) required to move the eye from its initial 3-D orientation (Ei) to the final desired orientation (Ed), which takes the form ΔE = Ed − Ei. This command has been shown (Crawford and Guitton 1997) to be the appropriate code to drive the oculomotor short-lead burst neurons if the latter code "rate of change in 3-D eye orientation" (Ė) (Hepp et al. 1994
; Quaia and Optican 1998
; Suzuki et al. 1995
). This burst signal has been shown in simulations (Crawford and Guitton 1997
) to work equally well at driving a 3-D plant model with head-fixed muscle pulling directions (Tweed and Vilis 1987
) or in a "pulley" plant model where the eye muscle pulling directions tilt according to the half-angle rule for Listing's law (Demer et al. 1995
; Quaia and Optican 1998
). For the head-fixed plant model, the burst signal has to be modified downstream by position signals (Crawford 1994
). That is, if the burst neurons encode angular velocity in head coordinates (explicitly coding the axis tilts for Listing's law) then the reference frame transformation problem is magnified, producing twice the deviation between the retinal code and the motor code as a function of eye position.
We then needed to choose a coordinate system to encode this command. Again, because little is known about the way that short-lead burst neuron inputs code the spatial aspects of motor error, we took our cue from the short-lead burst neurons themselves. These neurons are arranged into populations within anatomic nuclei that appear to use a spatial coding scheme similar to that of the neural integrator; that is, they control saccade components in a similar way to the semicircular canals and eye muscles, forming a set of coordinate axes that are aligned with and symmetric to Listing's plane (Crawford and Vilis 1992
; Suzuki et al. 1999
).
Like the neural integrator, saccade burst neurons are arranged into pairings that control opposite saccade directions. There are also indications that these neurons have a "background firing rate," in the sense that their firing rates are well above zero for saccades in the direction orthogonal to their preferred direction (Cullen and Guitton 1997
; Van Gisbergen et al. 1981
). In this sense, burst neurons also appear to use a push-pull arrangement to linearize their output within a certain range. Therefore, taking these as clues as to what might be the appropriate signal to drive such neurons, we coded motor output using the same 6-dimensional coordinate system that was used for the neural integrator.
For example, to encode a 15° purely upward eye movement, which would be represented by the 3-D rotation vector (0, 15, 0) in standard right-hand coordinates, in our 6-D vector components 1 and 2 evaluate to (0.5, 0.5), whereas components 3 and 4 code (0.425, 0.575) and components 5 and 6 encode (0.5, 0.5). Similarly, an 80° rightward movement from an initial eye position of, say, 50° to the left of center, would be encoded by the activations (0.5, 0.5), (0.5, 0.5), and (0.1, 0.9). Note that saccades of this amplitude were the maximum allowable because supplied RE never exceeded 80° and initial EP was always within 50° of center.
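The two worked examples above are consistent with a simple linear push-pull code, sketched below. The scale factor of 200° is an assumption inferred only from those two examples (15° maps to 0.425/0.575 and 80° maps to 0.1/0.9); it is not stated in the text, the function names are hypothetical, and the sketch leaves out the 45° rotation of the coordinate axes described for the eye position signal.

```python
import numpy as np

SCALE_DEG = 200.0  # assumed: inferred from the two worked examples, not stated in the text

def encode_push_pull(rot_deg):
    """Map a 3-D rotation vector (torsional, vertical, horizontal; deg) to 6 values in [0, 1].

    Each axis is represented by a yoked pair balanced at (0.5, 0.5) for 0 deg.
    """
    offset = np.asarray(rot_deg, dtype=float) / SCALE_DEG
    return np.column_stack((0.5 - offset, 0.5 + offset)).ravel()

def decode_push_pull(six_d):
    """Recover the 3-D rotation vector (deg) from the difference within each pair."""
    pairs = np.asarray(six_d, dtype=float).reshape(3, 2)
    return (pairs[:, 1] - pairs[:, 0]) * SCALE_DEG / 2.0

upward_15 = encode_push_pull([0.0, 15.0, 0.0])     # pairs (0.5, 0.5), (0.425, 0.575), (0.5, 0.5)
rightward_80 = encode_push_pull([0.0, 0.0, 80.0])  # pairs (0.5, 0.5), (0.5, 0.5), (0.1, 0.9)
```

Under this assumed scaling, the largest allowed displacement (80°) still leaves both members of a pair inside the 0 to 1 range of a sigmoid unit, which is one plausible reason for a code of this kind.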
Network training
Learning in the networks was accomplished by using the standard back-propagation algorithm (Rumelhart et al. 1986
) with the addition of a momentum term (10%). Weights were updated incrementally, that is, weights were updated after the presentation of each exemplar (for a more complete description of this learning algorithm, see Smith and Crawford 2001a
).
Briefly, learning in the network occurs when each output unit computes its error term (E) from the teaching signal (the mean squared error between network output and the desired output) and then computes its weight correction term ΔWij by multiplying the input to the unit, f(i), by this error and by a learning constant that, in these networks, was set to 0.5. The weight correction term, together with the momentum contribution from the previous update, is then used to update the weight matrices. Of course, the error is passed through the same weights that generated the current output pattern (before the update of the weights). Thus units that contributed more to the error in the pattern will be changed more by the weight update procedure, whereas those units contributing little to the current pattern error will be changed little.
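The update rule described above can be sketched as incremental back-propagation with momentum. This is a generic textbook version and not the authors' code: the class and function names, the bias units, the weight initialization, the plain logistic transfer function, and the toy training set are all assumptions made for illustration. Only the learning constant (0.5), the momentum term (10%), and the per-exemplar weight updates come from the text.

```python
import numpy as np

def sigmoid(x):
    # Assumed plain logistic function with outputs between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

class TinyNet:
    """Three-layer feedforward net trained one exemplar at a time."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, momentum=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # The extra column in each matrix is a bias weight (an assumption).
        self.w1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
        self.w2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))
        self.lr, self.momentum = lr, momentum
        self.dw1 = np.zeros_like(self.w1)
        self.dw2 = np.zeros_like(self.w2)

    def forward(self, x):
        xb = np.append(x, 1.0)
        hb = np.append(sigmoid(self.w1 @ xb), 1.0)
        return xb, hb, sigmoid(self.w2 @ hb)

    def train_one(self, x, target):
        xb, hb, y = self.forward(x)
        delta_out = (target - y) * y * (1.0 - y)
        # The error is passed back through the weights that produced the
        # current output, i.e. before those weights are updated.
        delta_hid = (self.w2[:, :-1].T @ delta_out) * hb[:-1] * (1.0 - hb[:-1])
        self.dw2 = self.lr * np.outer(delta_out, hb) + self.momentum * self.dw2
        self.dw1 = self.lr * np.outer(delta_hid, xb) + self.momentum * self.dw1
        self.w2 += self.dw2
        self.w1 += self.dw1

def sum_squared_error(net, inputs, targets):
    return float(sum(np.sum((t - net.forward(x)[2]) ** 2)
                     for x, t in zip(inputs, targets)))

# Toy exemplars (hypothetical): learn a logical OR of two inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [1.0]])
net = TinyNet(n_in=2, n_hidden=3, n_out=1)
error_before = sum_squared_error(net, X, T)
order = np.random.default_rng(1).integers(0, len(X), size=2000)
for k in order:  # exemplars are drawn at random, one weight update per exemplar
    net.train_one(X[k], T[k])
error_after = sum_squared_error(net, X, T)
```

Units whose activity contributed most to the output error receive the largest weight changes, which is the behavior the paragraph above describes.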
In our networks, the output signals of all units were positive real values ranging from 0 to 1. This output constraint was accomplished by using a standard sigmoid transfer function with 3 parameters: a range parameter that sets the range of the sigmoid (0 to 1; to achieve this range it was set to 2 because the maximum is later subtracted), the slope of the linear portion of the sigmoid (set to 1), and the maximum of the sigmoid (set to 1). (For a complete mathematical description of back-propagation see the APPENDIX of Smith and Crawford 2001.)
To train the networks we used a series of combinations of initial eye positions and retinal errors selected randomly from the following training set: Initial eye positions were laid out in a horizontal-vertical grid with 20° spacing, centered on (0, 0) and limited to fall within the round 50° OMR. Initial eye positions always had zero torsion in Listing's coordinates (i.e., they were in Listing's plane). Each initial eye position had associated with it
56 REs where such eye position and retinal error pairings would not exceed the 50° OMR. These REs were chosen randomly from an array of directions resembling a "star pattern" (imagine a cross superimposed on an x), such that all of the cardinal and oblique directions were represented in the training set. For amplitude, each direction had
7 REs located at 2°, 5°, and then in 10° increments from 10° through 50°. (In preliminary trials we found that training on the small retinal errors <10° was necessary to avoid bizarre behavior in very small saccades.)
To compute the ideal motor error (ME) for these visual and eye position combinations we followed the algorithm outlined in Crawford and Guitton (1997)
, also used in our previous study (Smith and Crawford 2001). This algorithm provides the motor error (ΔE) required to take the eye from its current 3-D orientation to the orientation in Listing's plane that satisfies the desired 2-D gaze direction. In brief, using quaternion representations, we first computed the desired gaze relative to the eye by converting the 2-D visual signal into a desired gaze signal in eye coordinates (DGeye). Using this convention, the first and second components of 3-D gaze were the vertical and horizontal measurements of RE and the third component was a forward-pointing unit vector. We then rotated DGeye by initial eye position (Ei), which results in the desired 2-D gaze relative to the head (DGhead). Next, DGhead was put through a Listing's law operator (Tweed and Vilis 1990), resulting in a desired 3-D eye position command (Ed). Finally, the ME was computed by subtracting Ei from Ed, after first converting the quaternion representations to vectors while maintaining the same coordinate system. These vectors were then scaled as a function of the angle of rotation and converted into the 6-D format required by the neural networks. Thus the error signal for the back-propagation algorithm was the difference between these computed ideal values and the actual values output by the network.
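The sequence of steps in this algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it swaps quaternions for rotation vectors applied with Rodrigues' formula, it treats RE as a zero-torsion rotation vector in eye coordinates rather than using the projection described above, it takes primary position to be the zero vector with gaze along the first (torsional) axis, and all function names are hypothetical. Which physical direction counts as positive on each axis is not fixed here.

```python
import numpy as np

FORWARD = np.array([1.0, 0.0, 0.0])  # assumed primary gaze direction (torsional axis)

def rotate(vec, rotvec_deg):
    """Rotate vec by a rotation vector (axis times angle in deg), Rodrigues' formula."""
    rotvec_deg = np.asarray(rotvec_deg, dtype=float)
    norm = np.linalg.norm(rotvec_deg)
    if norm < 1e-12:
        return np.array(vec, dtype=float)
    axis, angle = rotvec_deg / norm, np.radians(norm)
    return (vec * np.cos(angle)
            + np.cross(axis, vec) * np.sin(angle)
            + axis * np.dot(axis, vec) * (1.0 - np.cos(angle)))

def listing_orientation(gaze_dir):
    """Zero-torsion rotation vector (deg) that carries FORWARD onto gaze_dir.

    The axis is perpendicular to FORWARD, so it lies in the assumed Listing's plane.
    Only meaningful for gaze directions within the oculomotor range.
    """
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    axis = np.cross(FORWARD, gaze_dir)
    s = np.linalg.norm(axis)
    if s < 1e-12:
        return np.zeros(3)
    angle = np.degrees(np.arctan2(s, np.dot(FORWARD, gaze_dir)))
    return axis / s * angle

def ideal_motor_error(re_deg, ei_deg):
    """Return (motor error, desired eye orientation Ed, desired gaze in head coordinates)."""
    dg_eye = rotate(FORWARD, re_deg)       # desired gaze relative to the eye
    dg_head = rotate(dg_eye, ei_deg)       # desired gaze relative to the head
    ed = listing_orientation(dg_head)      # Listing's law operator
    return ed - np.asarray(ei_deg, dtype=float), ed, dg_head

# The same RE (second component 30 deg) from two initial eye positions.
re = np.array([0.0, 30.0, 0.0])
me_center, _, _ = ideal_motor_error(re, np.zeros(3))
me_eccentric, ed_eccentric, dg_eccentric = ideal_motor_error(re, np.array([0.0, 0.0, 30.0]))
```

With the eye at center the motor error equals the retinal error, whereas from the eccentric position the same retinal error requires a different motor error. That position dependence is the mapping the networks had to learn.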
We initially trained 12 networks with the following increments of hidden unit number: 4, 9, 16, 25, 36, and 49, training 2 of each size to determine the minimum network size required to learn the task. We defined this as the minimum number of units required to successfully reduce the sum of squared error of the network output to <0.01. This training goal was chosen because it resulted in a network output within about 1° of ideal performance. Figure 2 shows the error curves generated by a representative network of each size as indicated by the associated number. The y-axis represents the ongoing error (in degrees) during training, whereas the dashed line indicates the training goal. The x-axis indicates the number of epochs (where an epoch indicates one pass through the entire training set). Note that, although networks typically started with errors of >0.25 (about 20°), values above this are cut off the scale of the figure. All networks quickly reduced their training error to below 0.25 in an initial near-vertical drop. After our initial observations, we tried smaller increments of network size to establish the exact threshold.
| RESULTS |
|---|
|
|
|---|
All of the networks that we accepted for analysis generated saccades accurate to within 1° of ideal performance on average. Before considering their "neural coding mechanisms," we checked certain crucial aspects of their performance, in particular the position-dependent aspects related to the reference frame transformation.
The saccade reference frame transformation requires that the network map any one RE onto different MEs as a function of EP (Crawford and Guitton 1997
). To determine how well the networks performed compared with ideal behavior, we examined the networks with a test set consisting of initial eye positions at 5° intervals from 40° left to 40° right with 0° centered at the straight-ahead eye position. Each position was tested with a 30° upward and a 30° downward RE (Fig. 3A). Another similar test set was also used except that initial eye positions ranged from 40° down to 40° up where each position was tested with a 30° leftward and a 30° rightward RE (Fig. 3B). Thus there were 16 representative eye positions and 32 retinal errors associated with each of these tests. These combinations of eye positions and retinal errors were chosen so that the requisite movements would remain inside the 50° oculomotor range within which the networks were trained.
FIG. 3. Ideal ME and the actual ME output of the trained network. The RE is plotted in retinal coordinates, whereas EP and ME are plotted in head-centric coordinates.
Note that RE (dashed trace with star) and ideal ME diverged from one another in a position-dependent fashion, as demonstrated previously both theoretically and experimentally (Crawford and Guitton 1997; Klier et al. 1998; Smith and Crawford 2001b). This is the position-dependent reference frame transformation that the network had to learn: if it did so, actual performance would follow the ideal ME vectors, whereas the absence of any position-dependent transformation would cause the actual network output to follow the RE vectors (dashed lines ending in a star). The actual output of the network closely followed ideal performance, and in these graphs often occludes it, with a mean error for this network in the vertical saccade task (Fig. 3A) of 0.55° (SD: 0.43°). For the horizontal saccade task (Fig. 3B) the mean network error was 0.67° (SD: 0.37°). Across both of these performance tests the network's mean error was 0.62° (SD: 0.40°). Thus this network learned the position-dependent visuomotor transformation within the 1° of error required by our training goal, similar to the performance observed in human saccades (Klier et al. 1998).
To quantify performance across networks, we compared network performance with that of a model that does not take eye position into account, but that simply maps RE onto ME without accounting for the different reference frames (Crawford and Guitton 1997
). Figure 3, C–F, shows the results of this comparison for all 9-unit networks. The open circles represent the response of the previously tested 9-unit network (A/B), whereas the thin 2nd-order fits illustrate the responses of the other 9-unit networks. Plots C and E show the vertical saccade task, whereas D and F show the horizontal saccade task, both corresponding to the graphical depictions in the left column. In each panel, the values along the x-axis correspond to these initial eye positions, whereas values along the y-axis represent the angular difference (in degrees) between the directions of ideal motor performance and either actual network performance or the values predicted for the no-position-compensation model. Perfect position compensation would result in all performance values aligning with the abscissa. The thick dashed line represents errors in the predicted motor performance of the no-position-compensation model.
Residual position-dependent directional errors in the actual performances were nonsystematic and small in all of the networks. An estimate of the residual errors across these networks revealed that the directional errors of the network outputs were on average 16% of those predicted by no eye position compensation in the vertical saccade task (Fig. 3, C and E), and 17% of the "no position compensation" prediction in the horizontal saccade task (Fig. 3, D and F). Similar patterns of behavior were observed for 16-, 25-, 36-, and 49-unit networks (not shown), but with even lower residual errors. Moreover, errors in torsional eye position from Listing's plane (not shown) were negligible: they were essentially zero for networks with <36 units and <0.03° in the larger networks.
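The comparison described above can be sketched as follows. This is only an illustration of one plausible way to express residual directional error as a fraction of the no-position-compensation prediction; the function names and the example vectors are hypothetical and are not taken from the simulations reported here.

```python
import math

def rotate(v, deg):
    """Rotate a 2-D vector counterclockwise by deg degrees."""
    a = math.radians(deg)
    return (v[0] * math.cos(a) - v[1] * math.sin(a),
            v[0] * math.sin(a) + v[1] * math.cos(a))

def angle_between_deg(u, v):
    """Unsigned angle (degrees) between the directions of two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.degrees(math.atan2(cross, dot)))

def residual_fraction(ideal, actual, uncompensated):
    """Mean directional error of the network output, expressed as a fraction
    of the mean directional error of a model with no position compensation."""
    net = [angle_between_deg(i, a) for i, a in zip(ideal, actual)]
    ref = [angle_between_deg(i, r) for i, r in zip(ideal, uncompensated)]
    return (sum(net) / len(net)) / (sum(ref) / len(ref))

# Made-up example: ideal vertical saccades, network output off by 2 degrees,
# uncompensated model off by 10 degrees (assumed values for illustration only).
ideal = [(0.0, 20.0), (0.0, -20.0)]
actual = [rotate(ideal[0], 2.0), rotate(ideal[1], -2.0)]
uncompensated = [rotate(ideal[0], 10.0), rotate(ideal[1], -10.0)]
fraction = residual_fraction(ideal, actual, uncompensated)
print(round(100 * fraction), "% of the uncompensated error")
```

A value near zero would correspond to performance aligned with the abscissa in Fig. 3, C–F, whereas a value near 1 would correspond to no position compensation.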
Visual receptive fields of hidden units
Having established that our biologically inspired networks had learned the correct transformations at the behavioral level, we set out to determine the "neural mechanisms" for these transformations. To characterize the visual receptive field of a saccade-related neuron, investigators systematically move pinpoints of light to cover the visual space and record when the target neuron is active (Hamed et al. 2001; Hubel and Wiesel 1959; Russo and Bruce 2000). In this way, a visual response profile of the target cell can be constructed. We followed a similar procedure to characterize the visual receptive field of each hidden unit; that is, we sequentially "stimulated" every possible target location on the visual input map with a unit stimulus and recorded the output of each hidden unit (while eye position was fixed at center).
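The mapping procedure can be sketched in a few lines. This is a minimal illustration that assumes a single hidden unit with a logistic activation and a flattened visual input map; the weights, the bias (which here also stands in for the constant contribution of the centered eye position input), and the map size are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic activation (assumed here; the actual unit function may differ)."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_output(weights, inputs, bias):
    """Output of one hidden unit for a given input vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def map_receptive_field(weights, n_rows, n_cols, bias=0.0):
    """Present a unit stimulus at each map location in turn and record the
    response of the hidden unit, giving its visual response profile."""
    field = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            stimulus = [0.0] * (n_rows * n_cols)
            stimulus[r * n_cols + c] = 1.0  # unit stimulus at one location
            field[r][c] = hidden_output(weights, stimulus, bias)
    return field

# Toy 2 x 2 visual map with made-up weights.
field = map_receptive_field([0.0, 1.0, -1.0, 2.0], 2, 2)
for row in field:
    print([round(v, 3) for v in row])
```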
The resultant visual response profiles for a typical 9-unit network are shown in Fig. 4A, showing all 9 hidden units (1–9). (The same network is used in subsequent figures to facilitate comparisons between figures.) For completeness, we show the entire visual map that ranges from −90° to 90° in both the horizontal and vertical dimensions, although the actual visual range of a unit was restricted to ±80° (white circle overlaid on the receptive field of unit 1). Clearly, the visual receptive fields of the hidden units are complex, with regions of highest sensitivity (red) corresponding to the unit's preferred direction, regions of lowest sensitivity (black), and graded transitional regions between these. The complexity of these visual receptive fields is reminiscent of those found in the lateral intraparietal area (e.g., Hamed et al. 2001).
We performed such a principal-components analysis with the eye looking at the straight-ahead primary reference position. The original receptive fields were then remapped into the new data space formed by the principal components. Figure 4, B–I, shows the results of this analysis for exemplary 9-unit (left column, B–E) and 36-unit (right column, F–I) networks.
In both of these networks (and across all 9 of the networks tested: 5 of the 9-unit class and 2 each of the 16-, 25-, 36-, and 49-unit classes) 4 principal components accounted for about 95% of the variance in the data (80% by the first 2 and 15% by the remaining 2). The dominant, first 2 components (Fig. 4, B and C, F and G) always showed a fairly simple organization. First, each component had an antipodal structure with oppositely tuned maximal and minimal response zones. Second, the orientations of the grading for the first (B and F) and second (C and G) components were orthogonal. This appears to provide the basis coordinates for specifying target direction in 2-dimensional space. The other 2 remapped receptive fields (D and E; H and I) showed a more complex structure, capturing some of the complexity in the original receptive fields. Again, these 4 components captured about 95% of the variance in the visual target data required to produce accurate saccades, but this does not imply that only a few units in the trained networks do the majority of the work, with the remaining units contributing little to the solution. In the complete networks this information is distributed across all of the units and is not present in this orthogonalized and optimized form. Also, as we shall see, it is the interaction between different units that is critical for the overall performance of the model.
Reference frames for the sensory signal
The previous section describes the visual receptive fields of hidden units at one fixed, central eye position. To determine the reference frame for these visual receptive fields (i.e., are they fixed relative to the eye, relative to the head, or in some other frame) we had to retest these visual receptive fields at different eye positions. Figure 5 shows the visual receptive field surface of hidden unit 6 from the same 9-unit network illustrated in Fig. 4; however, now the network was tested with the eye fixated at 4 different positions: 30° up, down, left, and right. It is important to note that this receptive field map is plotted in retinal coordinates; that is, responses are mapped according to the locations of the stimuli relative to the fovea, in eye-fixed coordinates. If the visual receptive fields were fixed in head (or space) coordinates, they would shift in these plots by an amount equal and opposite to the eye position shift. However, no such shift was observed. If one observes the key topological features in the receptive field, such as the location of maximal and minimal response areas, one can see that the hidden unit visual receptive field appears to stay absolutely fixed relative to the simulated retina.
Note first that these maximum points were almost always found at the edge of the visual range, resembling the "open-ended" response fields observed in some SC units (Freedman and Sparks 1997). More important, we have coded the maximal response points for each of the 4 eye positions as a different sized circle. Note that each set of circles forms a perfectly overlapping set of concentric rings; in other words, these points overlap perfectly in eye coordinates with absolutely no change in direction or distance relative to the central fovea. A similar result for a 16-unit network is shown in Fig. 5F, and indeed we found the same result held for all networks. Thus the topological organization of the visual receptive fields in our hidden units was always absolutely eye-fixed. The central remaining question here, as in real neurophysiology, is: how do these eye-fixed visual receptive fields get mapped onto the correct saccade vector in motor coordinates?
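The logic of this test can be sketched as follows: a receptive field is treated as eye-fixed if its peak stays at the same retinal location for every eye position tested. The helper names and the toy fields are hypothetical; the real analysis was done on the full receptive field surfaces.

```python
def peak_location(field):
    """Return (row, col) of the maximal response in a 2-D response map."""
    best_value, best_rc = None, None
    for r, row in enumerate(field):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_rc = value, (r, c)
    return best_rc

def is_eye_fixed(fields_by_eye_position):
    """True if the peak falls at the same retinal location for every eye
    position, i.e., the circles marking the peaks would be concentric."""
    peaks = {peak_location(f) for f in fields_by_eye_position.values()}
    return len(peaks) == 1

# Toy maps in retinal coordinates (made-up values). Amplitude differs with eye
# position, but the peak location does not.
eye_fixed_unit = {
    "30 up":   [[0.1, 0.2], [0.3, 0.9]],
    "30 down": [[0.05, 0.1], [0.15, 0.45]],
}
# A head-fixed field would appear shifted in retinal coordinates.
shifted_unit = {
    "30 up":   [[0.1, 0.2], [0.3, 0.9]],
    "30 down": [[0.9, 0.3], [0.2, 0.1]],
}
print(is_eye_fixed(eye_fixed_unit), is_eye_fixed(shifted_unit))
```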
Relationships between visual, eye position, and motor sensitivity vectors
In our previous network (Smith and Crawford 2001a), which solved the identical geometric problems using a simpler vector-only input–output organization, the hidden units formed certain invariant functional classes that subdivided the transformation into specific parallel-task modules. For example, a dominant class of units, called the vector propagation class, was organized into an orthogonal coordinate system where the retinal error tuning vector and the motor error tuning vector were more or less aligned, which was shown to provide the main drive that moved saccades in the correct direction. We also showed that the eye-position-dependent modification of motor error was supplied by a smaller module of units, called position-opposite, because eye position and motor tuning were more or less opposite in direction. This modularity was revealed when we computed a series of "sensitivity vectors."
Because this was a previously fruitful approach, we began with the same type of analysis here. That is, we looked for a similar organization within and between the visual, eye position, and motor error output of the hidden units by constructing a similar series of sensitivity vectors: 1) a sensitivity vector for maximal and minimal visual activity, 2) a sensitivity vector for eye position tuning, and 3) a sensitivity vector for motor tuning. The visual sensitivity vector was simply the vector from the origin to the location of the maximum and minimum values in the visual receptive field (see Fig. 4A for an example of the visual receptive fields of a 9-unit network). Other methods, like the calculation of the "center of mass" of these visual activation fields, did not appear to provide better information.
The eye position sensitivity vector was the computed 3-D vector coded by the 6 activation weights of the appropriate inputs to the hidden units (the last 6 weight values of the input to hidden unit weight matrix). To accomplish this, we first performed the inverse of the transformation that we originally used to convert the 3-D eye position vector into a 6-D representation (see above). Operating on the resulting 3-D vector, we then counterrotated it to restore it into standard Cartesian coordinates.
For the motor sensitivity vector we used the weights between the hidden and the output layers. That is, we computed the direction and magnitude of the motor sensitivity vector for each hidden unit based on its projection weights to the output layer using the same inverse procedure as described above for the eye position activations (also, see METHODS and Smith and Crawford 2001a
). The horizontal and vertical components of these sensitivity vectors are plotted in Fig. 6 for each of the 9 hidden units in the network.
10°). For example, an inspection of units 2, 3, 4, 6, 8, and 9 shows little alignment between visual and motor tuning [mean of 93° (SD: 62°)], whereas the remaining units (1, 5, and 7) show a somewhat better alignment with a mean angular difference of 47° (SD: 26°).
In addition, Smith and Crawford (2001a)
found a clear subclass of units with opposite motor and eye position tuning (i.e., with the difference between motor and eye position tuning
150°). However, in the illustrated network we found a mean relative difference of 76° (SD: 59°) between motor and eye position tuning vectors. As well, peak visual tuning and eye position tuning vectors showed a mean relative rotation of 82° (SD: 69°), unlike Smith and Crawford (2001). Despite intensive scrutiny, we were unable to discern any clear-cut functional relationships or groupings in the angular relationships between visual, motor, and eye position tuning vectors of this or any other network. Rather, we found that units had widely distributed visual, motor, and position tuning, with no clear functional relations between them. Thus although the receptive fields are oriented across units, they do not align to form a "coordinate system" (such as ordinary or rotated Cartesian coordinates, "+" or "x" in form), as was found in Smith and Crawford (2001).
The same conclusions remained when we approached this question more formally. A visual and quantitative cluster analysis (using standard Matlab routines) did not reveal the grouping characteristics seen in Smith and Crawford (2001)
. The relationships between the sensitivity vectors found here were rather loose and broadly tuned in this and in all 9-unit networks. It would appear that these networks, in simulating the distributed nature of the input and output signals, performed the visuomotor transformation in a more distributed manner than that evidenced previously in Smith and Crawford (2001)
. (Also, see on-line supplementary Fig. 1.) In light of this, our subsequent analysis focused on properties of the network that might give rise to this more distributed solution.
Motor coding in the hidden unit layer
As stated above, our hidden units showed widely dispersed motor tuning. Our next step was to characterize the position dependency of this motor tuning. In particular, we determined the reference frame and coding of the hidden unit motor output: the contribution that activation of each unit makes to the behavior of the network. This is determined by the final connection weights of the hidden unit to the output layer (at least within the linear working range of the output layer, corresponding to saccade components of ±60°). In fact, because we know that the output layer codes a fixed-vector 3-D eye displacement vector in head coordinates, and we know that the connections from each hidden unit to the output layer are fixed at the end of training, the motor output coding of hidden units was predetermined: the hidden units code fixed-vector eye orientation displacements in head coordinates, just like the output layer.1
40° down in B). The open circles represent the baseline control condition, that is, how well the network fixated these positions with a zero RE input; these "fixations" stayed within a mean window of 3°. This shows the background noise for our "stimulations." From each of these eye positions we then simulated stimulation by setting the output of the hidden units to 0.5 (chosen for illustration purposes). These "stimulations" drove the eye to a new location, as indicated by the filled circles. Note that the evoked saccades of the network are primarily vertical when unit 6 is stimulated, and primarily horizontal when unit 4 is stimulated, which agrees with their preferred motor tuning directions (see Fig. 6). More important, these saccade vectors were fixed (when plotted in head coordinates), independent of initial eye position. This resulted in a series of perfectly parallel vectors. Repeating this test with the largest stimulation value (1.0) produced "saccades" well outside the OMR (saccades of about 100°) but in exactly the same directions for each of the units. Similar results were obtained with different networks and units (not shown).
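Why the evoked vectors are parallel can be illustrated with a minimal sketch. It assumes a linear output layer, hypothetical hidden-to-output weights, and small-angle vector addition of position and displacement (a simplification of true 3-D rotations); none of the numbers come from the trained networks.

```python
def stimulate_unit(output_weights, activation):
    """Displacement command produced by forcing one hidden unit to a given
    activation, assuming a linear output layer (within its working range)."""
    return tuple(activation * w for w in output_weights)

def endpoints(initial_positions, output_weights, activation=0.5):
    """Final eye positions in head coordinates after 'stimulating' one unit
    from several initial positions (small-angle approximation)."""
    d = stimulate_unit(output_weights, activation)
    return [tuple(p + di for p, di in zip(pos, d)) for pos in initial_positions]

# Hypothetical weights (degrees per unit activation) for a unit with mainly
# vertical motor tuning; (horizontal, vertical) components.
weights_vertical_unit = (4.0, 60.0)
starts = [(-40.0, 0.0), (0.0, 0.0), (40.0, 0.0)]
ends = endpoints(starts, weights_vertical_unit)
vectors = [tuple(e - s for e, s in zip(end, start))
           for end, start in zip(ends, starts)]
print(vectors)  # the same displacement from every starting position
```

Because the hidden-to-output weights are fixed after training, the displacement in this sketch depends only on the activation of the unit and not on the starting position.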
Thus each of the hidden units in our networks simultaneously existed in 2 reference frames: an eye-centered reference frame in their response to visual inputs and a head-centered reference frame in their fixed-vector output command. From this we can conclude two things: First, that the reference frame transformation was somehow occurring within the hidden unit layer. Second, that it was not occurring at the level of individual units because each such unit provided a fixed directional mapping of vision to movement independent of eye position.
Interactions between visual and eye position inputs
Somehow eye position signals were modulating the visual responses in our hidden unit layer to provide the position-dependent transformation illustrated in Fig. 3, but without shifting the topology of their eye-centered receptive fields. Could the mechanism take the form of a "firing rate" modulation like the classical gain fields of Zipser and Andersen (1988)
? To determine this, we examined the overall profile of the visual receptive field as a function of eye position (Salinas and Abbott 2001
). We constructed cross sections or "slices" of the visual receptive fields using the eye position sensitivity vector (Fig. 6) as a guide.
Figure 8, A and B, shows the results of this investigation for unit 2 of the same 9-unit network used in previous illustrations. Figure 8A shows the resultant cross sections with eye positions arrayed along their preferred axis at 20° spacing centered on the origin (see inset), whereas Fig. 8B shows the slices taken at eye positions lying along the axis orthogonal to the preferred one. The x-axis in both plots shows the full range of the visual map (±90°), although only the area between the dashed lines represents visual input to the network during training. Examination of Fig. 8B shows that in the direction orthogonal to the preferred axis, eye position does not modify the visual response because all 5 curves superimpose onto a single trace.
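For reference, the behavior expected of a classical planar gain field can be sketched with a toy model: a fixed retinal tuning curve scaled by a linear function of eye position along a preferred axis. The Gaussian tuning, the gain slope, and the preferred axis below are all assumptions made for illustration and are not fitted to the hidden units. In such a model, eye positions along the orthogonal axis leave the slice unchanged, which is the pattern described for Fig. 8B.

```python
import math

def gain_field_response(retinal_deg, eye_xy, preferred_axis=(1.0, 0.0),
                        peak_deg=30.0, width_deg=25.0, slope=0.01):
    """Toy planar gain field: Gaussian retinal tuning multiplied by a gain
    that varies linearly with eye position along the preferred axis."""
    tuning = math.exp(-((retinal_deg - peak_deg) ** 2) / (2 * width_deg ** 2))
    projection = eye_xy[0] * preferred_axis[0] + eye_xy[1] * preferred_axis[1]
    return (1.0 + slope * projection) * tuning

def slice_curve(eye_xy, samples=range(-90, 91, 10)):
    """Cross section of the receptive field at one eye position."""
    return [gain_field_response(x, eye_xy) for x in samples]

# Eye positions at 20 degree spacing along and orthogonal to the preferred axis.
along = [slice_curve((e, 0.0)) for e in (-40, -20, 0, 20, 40)]
orthogonal = [slice_curve((0.0, e)) for e in (-40, -20, 0, 20, 40)]
print(all(curve == orthogonal[0] for curve in orthogonal))
```

In this toy model the peak of every slice stays at the same retinal location, so the modulation changes response amplitude without shifting the eye-centered topology.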