J Neurophysiol 93: 1742-1761, 2005. First published November 10, 2004; doi:10.1152/jn.00306.2004
0022-3077/05 $8.00

Distributed Population Mechanism for the 3-D Oculomotor Reference Frame Transformation

Michael A. Smith and J. Douglas Crawford

York Centre for Vision Research, Canadian Institute of Health Research Group for Action and Perception, Departments of Psychology, Biology, and Kinesiology and Health Sciences, York University, Toronto, Ontario, Canada

Submitted 25 March 2004; accepted in final form 1 November 2004


    ABSTRACT
 
Human saccades require a nonlinear, eye orientation–dependent reference frame transformation to transform visual codes to the motor commands for eye muscles. Primate neurophysiology suggests that this transformation is performed between the superior colliculus and brain stem burst neurons, but provides few clues as to how this is done. To understand how the brain might accomplish this, we trained a 3-layer neural net to generate accurate commands for kinematically correct 3-D saccades. The inputs to the network were a 2-D, eye-centered, topographic map of Gaussian visual receptive fields and an efference copy of eye position in 6-dimensional, push–pull "neural integrator" coordinates. The output was an eye orientation displacement command in similar coordinates appropriate to drive brain stem burst neurons. The network learned to generate accurate, kinematically correct saccades, including the eye orientation–dependent tilts in saccade motor error commands required to match saccade trajectories to their visual input. Our analysis showed that the hidden units developed complex, eye-centered visual receptive fields, widely distributed fixed-vector motor commands, and "gain field"–like eye position sensitivities. The latter evoked subtle adjustments in the relative motor contributions of each hidden unit, thereby rotating the population motor vector into the correct correspondence with the visual target input for each eye orientation: a distributed population mechanism for the visuomotor reference frame transformation. These findings were robust; there was little variation across networks with between 9 and 49 hidden units.
Because essentially the same observations have been reported in the visuomotor transformations of the real oculomotor system, as well as other visuomotor systems (although interpreted elsewhere in terms of other models), we suggest that the mechanism for visuomotor reference frame transformations identified here is the same solution used in the real brain.


    INTRODUCTION
 
The goal of the saccadic system is to orient the fovea toward points of interest within the visual field. The visuomotor transformation for this process is generally understood to begin with the specification of the vectorial difference between the point of current regard and the point of the peripheral retinal stimulation (RE: retinal error) and end with a vectorial motor command required to move the eyes from the current to the desired orientation (ME: motor error) (Moschovakis and Highstein 1994; Robinson 1968; Sparks 2002; Sparks and Mays 1990). For one-dimensional (1-D) saccades the geometry of this process is trivial (e.g., Jürgens et al. 1981; Scudder 1988), because for 1-D rotations retinal error and motor error are equivalent. However, for real 3-D eye rotations, a visuomotor reference frame transformation is required to map RE onto the ME command (Crawford and Guitton 1997; Hepp et al. 1993).

The fundamental reason for this is that visual signals originate in the retina and are therefore defined in an eye-fixed frame, whereas Listing's law constrains the eye to rotate around certain axes defined in a head-fixed coordinate system (Crawford and Guitton 1997). As a result, simulations of a fixed mapping between the 2-D retinal error and the 3-D motor error command produce movements that diverge erroneously, as a nonlinear function of initial eye orientation. Moreover, because this arises from the geometry of projecting an eye-fixed spatial vector onto effectors that do not rotate about eye-fixed axes, essentially the same problem arises for visually guided gaze shifts involving both the eyes and head (Klier et al. 2001) and visually guided arm movements (Crawford et al. 2000), only with potentially much larger errors.

Behavioral studies have shown that the human saccade generator does not make such eye errors (Klier and Crawford 1998), even for express or memory-guided saccades (Henriques and Crawford 2001), but rather maps RE onto different saccade vectors as a function of initial eye orientation. In particular, it has been shown that one visual RE (measured in eye-fixed coordinates) must be mapped onto saccade trajectories with orthogonal components that diverge by as much as ±15° or more in the oculomotor range, depending on the size of the saccade and initial eye position (Crawford and Guitton 1997; Klier and Crawford 1998). Thus there must be some physiological mechanism that accounts for eye orientation in the visuomotor transformation.

In theory, this problem could be solved by fixing the eye muscle pulling directions with respect to the eyes, but this would violate Listing's law, producing large pseudo-random torsional positions at the end of each saccade that do not occur in real life (Crawford and Vilis 1991; Tweed and Vilis 1990a). (Arranging neck and shoulder muscles in eye coordinates to solve the problem is even less plausible.) Instead, eye muscle pulling directions appear to be optimized for Listing's law, which requires saccade velocity axes to tilt halfway, not all the way, with eye position (Demer et al. 1995; Quaia and Optican 1998). This leaves the question, how does the brain implement the reference frame transformation required to map visual signals onto the kind of motor commands appropriate to provide accurate eye movements that satisfy the constraints of Listing's law?

One clue is that we may be able to sketch out the region within the brain where this transformation takes place. The superior colliculus (SC) appears to encode visual and motor signals much like the retina, in a 2-D topographic map (Hepp et al. 1993; Van Opstal et al. 1991). Moreover, stimulation of the SC in the head-unrestrained animal evokes gaze shifts that have fixed directions in eye coordinates and, conversely, vary as a function of initial gaze position when plotted in space coordinates (Klier et al. 2001). This suggests that the SC encodes a gaze command in retinal coordinates, and the position-dependent reference frame transformation is implemented downstream. Further, this transformation is probably complete in the oculomotor system at the level of the premotor short-lead burst neurons because these appear to encode displacements in eye orientation within a 3-D head-fixed coordinate system aligned with Listing's plane (Crawford 1994; Crawford and Vilis 1992; Henn et al. 1989).

Thus the reference frame transformation for saccades appears to occur between the SC and the brain stem burst neurons (perhaps with help from side-loop structures like the cerebellum) (Lefevre et al. 1998; Quaia et al. 1999), although we have little clue as to how this might occur. In models of the saccade generator that touch on this problem, the reference frame transformation involves a stage in which the retinal goal of the saccade is transformed into a head-fixed desired gaze direction, from which the system could then compute the correct head-fixed kinematics to drive a 3-D saccade (Crawford and Guitton 1997; Glasauer et al. 2001; Tweed and Vilis 1990; Zee et al. 1976). This includes network models that used eye position "gain fields" in a distributed neural net to transform RE into a head-centric representation of target direction (Zipser and Andersen 1988). Such head-centric representations have never been observed physiologically in the cortical saccade control centers, the SC, or brain stem (Batista et al. 1999; Colby and Goldberg 1999; Colby et al. 1995; Russo and Bruce 1996; Sabes et al. 2002; Sparks 1988, 1989).

This leaves open the question of how the required reference frame transformations take place. Moreover, any inquiry into this question also has to consider that the same transformations must also deal with other computational problems, such as the degrees of freedom problem. Because the gaze command signal from the SC is 2-D (Hepp et al. 1993; Klier et al. 2003; Van Opstal et al. 1991), whereas the brain stem burst generator possesses a 3-D coordinate system that can generate eye rotations about any axis (Crawford and Vilis 1992; Henn et al. 1989), something between must select the right axis for Listing's law (which only allows saccades to reach eye orientations that are rotated from primary position about an axis in Listing's plane). As discussed above, the implementation of Listing's law places constraints on the visuomotor reference frame transformation. Most models have included a Listing's law box that calculates a desired eye orientation in Listing's plane (Crawford and Guitton 1997; Glasauer et al. 2001; Tweed and Vilis 1990a).

Finally, the circuits that derive saccadic motor commands must also deal with the spatiotemporal transformation. The spatiotemporal transformation arises in the oculomotor system because the SC encodes saccade metrics with the use of a place code on a topographic map, whereas the reticular burst neurons must supply a rate code in the appropriate coordinate system to the extraocular muscles (Becker and Klein 1973; Robinson 1974; Westheimer and Blair 1973). A number of theoretical and physiological investigations have considered this problem (e.g., Hepp and Henn 1983; Optican and Quaia 2002; Tweed and Vilis 1990a), including the question of how this fits within dynamic feedback loops for saccade control that are beyond the scope of the present investigation. However, the physiological transformation from a topographic code to a coordinate system vector representation is probably inseparable from the oculomotor reference frame transformation.

The problem facing neurophysiologists trying to solve these questions experimentally, as we see it, is that we really do not know what type of signals to look for on a neuron-by-neuron basis. In fact, it is quite possible and likely that the solution is right at the tip of our electrodes and we do not know it. Most current investigators would agree that signal processing in the brain, even for the feedforward transformations in the brain stem, is a distributed process (Sparks 2002). Therefore one potentially fruitful approach is to train simple neural nets to perform these transformations, and then analyze the workings of such networks to gain insights that we can apply to the real system (Krommenhoek and Wiegerinck 1998; Zipser and Andersen 1988). If we can understand how a simple neural net works, where we have complete knowledge of the inputs, outputs, and architecture, as well as the ability to manipulate these values at will, then we might hope to understand the limited amount of data that one can collect from the real brain.

The current investigation applies this approach to understand how visual information encoded at the level of the SC (or any similar topographically organized, eye-centered structure) might be transformed into a geometrically correct signal to drive brain stem burst neuron populations. Note that the focus of our study was to see how such a network would implement the reference frame transformation, but the other transformations (spatiotemporal and 2-D to 3-D) were implicit in our network. For example, our networks produced a motor displacement code (initial motor error for a saccade) that could easily be converted into a rate code for instantaneous motor error, by placing it within a local feedback loop (Crawford and Guitton 1997). The networks described below do not perform this complete transformation, but do perform the transformation from a topographic "place code" to a coordinate system that could encode saccade dynamics. Similarly, our networks were required to convert a 2-D visual input into a motor command that actively maintained ocular torsion at zero (to comply with Listing's law). Thus these networks implicitly learned to solve the 2-D to 3-D transformation and one aspect of the spatiotemporal transformation, while also learning to perform the eye-to-head reference frame transformation that was the focus of our investigation.

Unlike most previous neural net studies of reference frame transformations (e.g., Krommenhoek et al. 1993; Zipser and Andersen 1988), we used the noncommutative mathematics of 3-D rotations in our input–output relations (Crawford and Guitton 1997; Tweed and Vilis 1987); and, unlike our previous study of this topic (Smith and Crawford 2001a), these inputs/outputs were biologically inspired, based on the known topography of visual signals in the SC and coordinate systems observed in the brain stem saccade generator. The results suggest that an understanding of visuomotor reference frame transformations may be obtainable within the framework of currently known physiology.


    METHODS
 
We trained a 3-layer, feedforward network model to execute the visuomotor transformation required for accurate saccades that obey Listing's law. Because of practical limitations, no mathematical model of brain function can be physiologically realistic in more than a few respects. Therefore one must choose the mathematical constraints that are most pertinent to the physiological questions being investigated. In this case, we emphasized realism in the spatial properties of the inputs and outputs of the network, in the hope that this would maximize the realism within the intermediate, less-understood spatial coding schemes.

Network architecture

Figure 1A shows a diagrammatic view of the network structure and the general nature of its input–output signals that were used in this study. Networks were composed of 3 layers: an input layer consisting of 513 units; a hidden layer with 4, 9, 16, 25, 36, or 49 hidden units; and an output layer consisting of 6 units. When referring to a specific size, or class, of network we use the number of hidden units as a nomenclature. Thus a network with 9 hidden units will be referred to as a 9-unit network. All networks had feedforward connections only (i.e., no connections existed between any units within a layer).



FIG. 1. A: topography of neural network model. "Visual Signal": visual input signal to the network is specified on a 2-D topographic map as sensed by input units with varying widths of Gaussian receptive fields (RFs). "Eye Position Signal" is represented by a 6-D vector, the components of which are arranged in a push–pull setup (by yoked pairs of values designed to mimic the push–pull arrangement of the extraocular muscles). These values specify eye position (EP) as the rotation around an orthonormal 3-D Cartesian coordinate system. "Motor Error": a 6-D output vector of the same format as the EP input. Values specify the 3-D vectorial motor error (ME) vector required to perform a saccade that obeys Listing's law. B: details of visual input representation. 2-D topographical map extended from ±90° in both the vertical and horizontal dimensions ("floor" of the figure). Intersection of the axes indicates the straight-ahead primary reference position. Visual input to the network was limited to ±80° (large circle). Density of input unit placement on this map (solid dots) became increasingly sparse with increasing distance from center. Two Gaussian surfaces are shown that represent the RF structure centered on 2 input units at different topographical locations. Note that coverage of the RF (contour map projected onto "floor" of figure) increases with distance from the origin; however, maximum activity level was always 1. Visual targets could be specified to within 1° of accuracy anywhere on the map.

 
In addition, all units between layers were fully interconnected, with each input unit connected to every hidden unit, and every hidden unit connected to every output unit. Between the input-to-hidden layer a multiplicative weight matrix allowed networks to change the connective strengths of the input-to-hidden unit projections. A similar weight matrix served the same function for the hidden-to-output unit projections. These weight matrices (combined with a learning rule) are what allow the networks to learn (see Network training below).

Input–output signals

Networks were provided with 2 input signals: 1) an eye position (EP) signal specifying initial eye position in orthogonal 3-D head-centric coordinates, where each EP was constrained to be within a 100° diameter ocular motor range (OMR); and 2) a retinal error (RE) signal, in eye coordinates, that was specified on a 2-dimensional (2-D) map and was constrained to be within 80° of map center. In addition, the target location specified by the RE was constrained to be within 50° of primary position so that the model would not attempt to saccade beyond the oculomotor range. The teaching signal specified the vectorial motor error (ME) required to drive the eye from its current position to the one specified by the visual target location, such that the target was acquired by gaze and final EP landed in the normal "Listing's plane" range of 3-D eye positions observed during head-fixed saccades (Crawford and Guitton 1997). That is, we chose a teaching signal such that the head-fixed torsional component of EP was set to zero. However, this does not invalidate the 3-D aspect of the model because this 3rd degree of freedom must be as strictly controlled by the networks as the other two.

Visual inputs

The most important input to the saccade generator for visually guided saccades is vision itself. For physiological inspiration, we based the properties of the visual inputs to our network on neurophysiological recordings from the superficial layers of the SC. We chose this structure because it probably encodes visual information to be used by the saccadic system in a form that is relatively unprocessed.

Figure 1B shows a magnified representation of the visual inputs to the network as illustrated in Fig. 1A. The "floor" of the figure represents the full extent of the virtual visual map in the vertical and horizontal dimensions (–90° to 90°). As shown, the input units to the network were topographically arranged where the density of input units decreased with distance from the center of the map. That is, the input units monitored a central region covering 2° of extent with a spacing of 1°, whereas adjacent annular areas covering 4°, 10°, and ≤80° of extent encoded resolutions of 2°, 4°, and 10°, respectively (solid dots). Note that because of the viewpoint in this figure, the units covering the central 1° and 2° areas coalesce into a homogeneous representation. Each of the input units had an eye-fixed visual receptive field determined by a Gaussian function with a maximum activity level of 1 at its center.

Also shown is a vertically displaced pictorial representation of 2 Gaussian receptive fields at 2 different distances from the center of the map, one at horizontal and vertical coordinates of (–20°, 20°) and the other at (40°, –40°). Projected onto the map is the corresponding contour outline of each of the 2 Gaussian curves' receptive fields. As illustrated, the width of each input unit's receptive field depended on its (virtual) topographical location. So, for example, the unit located at (–20°, 20°) has a receptive field width (sigma) of 7.8°, whereas the unit located at (40°, –40°) has a receptive field width of 14.8°. In general, the sigma value for the Gaussian receptive field of the units ranged from 1° to 20° with increasing eccentricity from the center of the map. This approach was intended to mimic the larger receptive field sizes of eccentric locations found in sensory structures such as the SC and the retina. The receptive field widths used here were estimated from data presented in Cynader and Berman (1972).

Visual target inputs to the networks were computed from 2-D vectors that specified the horizontal and vertical integer components of peripheral targets on the virtual 2-D map. The vector components of the visual targets were specified to the nearest degree and covered an area within an 80° visual range (large circle). Each of these "stimuli" then activated a number of the visual input units, depending on the location of the stimulus on the virtual topographic map and the width of the Gaussian receptive field of the units in that part of the map.
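As an illustration of how a single visual input unit might respond to a target, the following minimal sketch assumes the common Gaussian form exp(–d²/2σ²) and a linear growth of sigma with eccentricity. The linear coefficients are not taken from the text; they are simply fitted to the two widths quoted above (7.8° and 14.8°) and clipped to the stated 1° to 20° range. The function names `rf_sigma` and `unit_activation` are hypothetical.

```python
import math

def rf_sigma(cx, cy):
    """Receptive-field width (deg) for a unit centered at (cx, cy).

    Assumption: sigma grows linearly with eccentricity. The coefficients
    are fitted to the two example widths quoted in the text and are not
    the authors' published values.
    """
    eccentricity = math.hypot(cx, cy)
    sigma = 0.8 + 0.2475 * eccentricity
    return min(20.0, max(1.0, sigma))  # clip to the stated 1-20 deg range

def unit_activation(center, target):
    """Gaussian response (peak 1.0) of one input unit to a target.

    Both arguments are (horizontal, vertical) positions in degrees on the
    eye-centered map. The exp(-d^2 / (2 sigma^2)) form is an assumption.
    """
    sigma = rf_sigma(*center)
    d2 = (target[0] - center[0]) ** 2 + (target[1] - center[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

# A target on the unit's center drives it maximally; a target 10 deg away
# drives a peripheral (wide-field) unit more strongly than a central one.
peak = unit_activation((40, -40), (40, -40))
peripheral = unit_activation((40, -40), (50, -40))
central = unit_activation((0, 0), (10, 0))
```

Under these assumptions a single target activates a cluster of neighboring units, with the cluster growing broader toward the periphery of the map.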

Eye position inputs

Based on our previous studies (Crawford and Guitton 1997; Klier and Crawford 1998; Smith and Crawford 2001a) we knew that our network would require a 3-D EP input to solve the reference frame transformation for saccades. This could come from proprioceptive sensors in the eye muscles, but most oculomotor models assume that it arises as an efference copy from the oculomotor "neural integrator" that provides the motor signal to hold eye position between saccades (Crawford 1994; Quaia et al. 1999; Robinson 1975; Tweed 1997). However, no previous model of the saccadic visuomotor transformation that we know of has used a realistic representation of these 3-D neural integrator signals.

Most recent studies of 3-D eye movements have described eye orientations using vectors that are parallel to the axis of rotation that brings the eye to that position from some central reference position. Listing's law states that during saccades with the head fixed, these eye orientation vectors all align with a head-fixed plane (Tweed and Vilis 1990). The gaze direction at one particular reference position, the primary position, is orthogonal to the special plane of vectors, called Listing's plane, and parallel to the head-fixed torsional axis (Westheimer 1957).

The oculomotor neural integrator is a distributed structure with a critical region for horizontal integration occurring in the nucleus prepositus hypoglossus (Cannon and Robinson 1987; Cheron and Godaux 1987) and a critical region for vertical/torsion integration in the midbrain interstitial nucleus of Cajal (Crawford et al. 1991; Fukushima et al. 1990). Unlike the sensory codes found in visual structures, the brain stem neural integrator does not use a topographic map but rather represents similar directions in clustered populations, each controlling individual coordinate axes. Experimental studies suggest that the 3-D oculomotor neural integrator is organized in a head-fixed coordinate system similar to that of the eye muscles, but aligned with Listing's plane. That is, with the vertical axis (for the horizontal component of eye orientation) aligned within Listing's plane, and the torsional–vertical coordinate axes aligned in the horizontal plane orthogonal to Listing's plane (Crawford 1994; Crawford et al. 1991). Thus to encode a purely horizontal position, only the horizontal integrator needs to be activated, but the torsional–vertical coordinates are arranged in such a way that, e.g., an upward EP would require the coactivation of an up-counterclockwise population of neurons on the left side of the midbrain and an up-clockwise population on the right side (Crawford et al. 1991).

To mimic the geometric output of such a structure, we started with a 3-D coordinate system for angular orientation of the eye (torsional, vertical, and horizontal) in orthogonal, right-handed head-centric coordinates, using the right-hand rule to define the direction of rotation (Tweed and Vilis 1990b). These coordinates were then rotated 45° about the vertical axis to form a new coordinate system (Fig. 1A: "Eye Position Signal") symmetric about Listing's plane as described above (Crawford 1994; Crawford et al. 1991).
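The 45° change of coordinates can be sketched as a plane rotation that mixes the torsional and vertical components while leaving the horizontal component (rotation about the vertical axis) untouched. This is only an illustrative sketch: the function name `rotate_about_vertical_axis` is hypothetical, and the sign of the rotation (which of the two new axes is "up-clockwise" versus "up-counterclockwise") is an assumption not specified in the text.

```python
import math

def rotate_about_vertical_axis(vec, angle_deg=45.0):
    """Re-express a rotation vector in axes turned about the vertical axis.

    vec is (torsional, vertical, horizontal) in degrees. The horizontal
    component is the rotation about the vertical axis itself, so it is
    unchanged. The rotation direction chosen here is an assumption.
    """
    torsional, vertical, horizontal = vec
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    new_first = c * torsional + s * vertical
    new_second = -s * torsional + c * vertical
    return (new_first, new_second, horizontal)

# A purely vertical eye position loads both torsional-vertical axes
# equally, whereas a purely horizontal position uses only the third axis.
pure_vertical = rotate_about_vertical_axis((0.0, -15.0, 0.0))
pure_horizontal = rotate_about_vertical_axis((0.0, 0.0, 30.0))
```

In this sketch the equal loading of the two rotated axes for a vertical position mirrors the coactivation of the two midbrain populations described above.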

A second physiological constraint is that real neurons do not have negative firing rates. To represent "negative directions" the vestibuloocular system, including the neural integrator, has developed a push–pull mechanism where positive and negative values correspond to modulations up or down about a background firing rate (Fukushima et al. 1990, 1992; King et al. 1981). To mimic this, we transformed the 3-D EP vector into a 6-D vector that represented the angular rotation around each of the 3 axes by a yoked pair of components. Component pairs 1 and 2, 3 and 4, and 5 and 6 encoded the torsional, vertical, and horizontal components of rotation, respectively (in right-hand coordinates). Each component ranged between 0 and 1, where a balance of activity, indicated by the values (0.5, 0.5), for a single pair, represents 0° rotation around a particular axis.

In calculating the values for each of the 6-D components we used the following formulation

where S1 is the 1st component of a pair, S2 is the 2nd component of a pair, and TL represents target location in degrees along the appropriate axis. Note again that the 100° OMR referred to here is the entire width of the oculomotor range (±50° of center). This formula allowed the model to perform saccades where initial EP and paired RE could result in saccades as large as 80°.

Motor outputs

We wanted the motor output signal of the network to represent the 3-D motor error command that drives all reticular formation short-lead burst neurons (one synapse up from both motoneurons and the neural integrator) at the beginning of saccades. Little is known about the complex synaptic physiology of the inputs to short-lead burst neurons. These may include direct input from the SC, from "long-lead" burst neurons in the reticular formation, burst-driver neurons, and vestibular-related inputs (e.g., Hepp and Henn 1983; Kaneko and Fukushima 1998; Moschovakis and Highstein 1994; Schnyder et al. 1985; Sparks 1988, 1989; Stanton et al. 1988). Therefore it is easier to define the physiological correlate of this command according to its target (i.e., as the command coded by the total ensemble of synaptic input to the short-lead burst neurons), which are much better understood. Equivalently, one can view this command as representing the initial motor error encoded by the total ensemble of short-lead burst neurons at the commencement of a saccade.

We encoded motor error as the vectorial change in eye orientation (ΔE) required to move the eye from its initial 3-D orientation (Ei) to the final desired orientation (Ed), which takes the form ΔE = Ed – Ei. This command has been shown (Crawford and Guitton 1997) to be the appropriate code to drive the oculomotor short lead burst neurons if the latter code is "rate of change in 3-D eye orientation" Ė (Hepp et al. 1994; Quaia and Optican 1998; Suzuki et al. 1995). This burst signal has been shown in simulations (Crawford and Guitton 1997) to work equally well at driving a 3-D plant model with head-fixed muscle pulling directions (Tweed and Vilis 1987) or in a "pulley" plant model where the eye muscle pulling directions tilt according to the half-angle rule for Listing's law (Demer et al. 1995; Quaia and Optican 1998). For the head-fixed plant model, the burst signal has to be modified downstream by position signals (Crawford 1994). That is, if the burst neurons encode angular velocity in head coordinates (explicitly coding the axis tilts for Listing's law) then the reference frame transformation problem is magnified, producing twice the deviation between the retinal code and the motor code as a function of eye position.

We then needed to choose a coordinate system to encode this command. Again, because little is known about the way that short-lead burst neuron inputs code the spatial aspects of motor error, we took our cue from the short-lead burst neurons themselves. These neurons are arranged into populations within anatomic nuclei that appear to use a spatial coding scheme similar to that of the neural integrator; that is, they control saccade components in a similar way to the semicircular canals and eye muscles, forming a set of coordinate axes that are aligned with and symmetric to Listing's plane (Crawford and Vilis 1992; Suzuki et al. 1999).

Like the neural integrator, saccade burst neurons are arranged into pairings that control opposite saccade directions. There are also indications that these neurons have a "background firing rate," in the sense that their firing rates are well above zero for saccades in the direction orthogonal to their preferred direction (Cullen and Guitton 1997; Van Gisbergen et al. 1981). In this sense, burst neurons also appear to use a push–pull arrangement to linearize their output within a certain range. Therefore taking these as clues as to what might be the appropriate signal to drive such neurons, we coded motor output using the same 6-dimensional coordinate system that was used for the neural integrator.

For example, to encode a 15° purely upward eye movement, which would be represented by the 3-D rotation vector (0, –15, 0) in standard right-hand coordinates, in our 6-D vector components 1 and 2 evaluate to (0.5, 0.5), whereas components 3 and 4 code (0.425, 0.575) and components 5 and 6 encode (0.5, 0.5). Similarly, an 80° rightward movement from an initial eye position of, say, 50° to the left of center, would be encoded by the activations (0.5, 0.5), (0.5, 0.5), and (0.1, 0.9). Note that saccades of this amplitude were the maximum allowable because supplied RE never exceeded 80° and initial EP was always within 50° of center.
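The two worked examples above are consistent with a simple linear push–pull mapping in which each yoked pair moves symmetrically about 0.5 by the rotation component divided by twice the 100° OMR. The sketch below uses that inferred form; it is an assumption that reproduces the quoted numbers, not a transcription of the authors' formula. The names `push_pull`, `encode_6d`, and `motor_error` are hypothetical, and leftward rotation is taken as positive (right-hand rule about an upward axis).

```python
OMR_DEG = 100.0  # full width of the oculomotor range, as stated in the text

def push_pull(component_deg, omr_deg=OMR_DEG):
    """Map one signed rotation component (deg) onto a yoked pair in [0, 1].

    Assumed linear form inferred from the worked examples: the pair sits
    at (0.5, 0.5) for zero rotation and moves symmetrically away from it.
    """
    offset = component_deg / (2.0 * omr_deg)
    return (0.5 + offset, 0.5 - offset)

def encode_6d(rotation_vec_deg):
    """Encode (torsional, vertical, horizontal) degrees as 6 components."""
    encoded = []
    for component in rotation_vec_deg:
        encoded.extend(push_pull(component))
    return tuple(encoded)

def motor_error(e_initial, e_desired):
    """Vectorial change in eye orientation: desired minus initial."""
    return tuple(d - i for d, i in zip(e_desired, e_initial))

# 15 deg upward movement, written as the rotation vector (0, -15, 0).
upward_15 = encode_6d((0.0, -15.0, 0.0))

# 80 deg rightward movement starting 50 deg left of center
# (leftward positive, so the desired position is -30).
rightward_80 = encode_6d(motor_error((0.0, 0.0, 50.0), (0.0, 0.0, -30.0)))
```

With this mapping the pairs for the upward movement come out near (0.5, 0.5), (0.425, 0.575), (0.5, 0.5), and those for the rightward movement near (0.5, 0.5), (0.5, 0.5), (0.1, 0.9), matching the values quoted above.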

Network training

Learning in the networks was accomplished by using the standard back-propagation algorithm (Rumelhart et al. 1986) with the addition of a momentum term (10%). Weights were updated incrementally, that is, weights were updated after the presentation of each exemplar (for a more complete description of this learning algorithm, see Smith and Crawford 2001a).

Briefly, learning in the network occurs when each output unit computes its error term (E) from the teaching signal (the mean squared error between network output and the desired output), computes its weight correction term ΔWij by multiplying the input to the unit [f(i)] by the error computed above, and then multiplies the weight correction term by a learning constant (β) that, in these networks, was set to 0.5; the result is used to update the weight matrices. That is

The error term is then passed to the previous (hidden) layer. Each unit in the hidden layer sums its error inputs [from the subsequent (output) layer] and calculates its error and weight correction terms. The output layer and the hidden layer then update their weight matrices based on their weight correction terms ΔW. Of course, the error is passed through the same weights that generated the current output pattern (before the update of the weights). Thus units that contributed more to the error in the pattern will be changed more by the weight update procedure, whereas those units contributing little to the current pattern error will be changed little.
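The update cycle described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain logistic sigmoid, the standard delta rule (error multiplied by the sigmoid derivative), no bias units, and small random initial weights. The class name and layer sizes are hypothetical; only the learning constant (0.5), the 10% momentum term, and the per-exemplar update come from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ThreeLayerNet:
    """Input -> hidden -> output network trained one exemplar at a time."""

    def __init__(self, n_in, n_hidden, n_out, beta=0.5, momentum=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w_ih = rng.uniform(-0.1, 0.1, (n_hidden, n_in))
        self.w_ho = rng.uniform(-0.1, 0.1, (n_out, n_hidden))
        self.beta, self.momentum = beta, momentum
        self.dw_ih = np.zeros_like(self.w_ih)  # previous corrections (momentum)
        self.dw_ho = np.zeros_like(self.w_ho)

    def forward(self, x):
        h = sigmoid(self.w_ih @ x)
        return h, sigmoid(self.w_ho @ h)

    def train_exemplar(self, x, target):
        h, y = self.forward(x)
        # Output-layer error term from the teaching signal.
        err_out = (target - y) * y * (1.0 - y)
        # Error is passed back through the weights that produced this output,
        # i.e. before either weight matrix is updated.
        err_hid = (self.w_ho.T @ err_out) * h * (1.0 - h)
        # Weight correction = learning constant * error * input, plus momentum.
        self.dw_ho = self.beta * np.outer(err_out, h) + self.momentum * self.dw_ho
        self.dw_ih = self.beta * np.outer(err_hid, x) + self.momentum * self.dw_ih
        self.w_ho += self.dw_ho
        self.w_ih += self.dw_ih
        return float(np.sum((target - y) ** 2))  # squared error before the update

net = ThreeLayerNet(n_in=3, n_hidden=4, n_out=2)
x, t = np.array([1.0, 0.0, 1.0]), np.array([0.8, 0.2])
first_error = net.train_exemplar(x, t)
for _ in range(1000):
    last_error = net.train_exemplar(x, t)
```

Computing both error terms before touching either weight matrix is what keeps the hidden-layer correction consistent with the output pattern that generated the error.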

In our networks, the output signals of all units were positive real values ranging from 0 to 1. This output constraint was accomplished by using a standard sigmoid transfer function, specified by

where γ is the range of the sigmoid (0 to 1); to achieve this range, γ = 2 (because the maximum η is later subtracted). σ is the slope of the linear portion of the sigmoid (σ = 1); η is the maximum of the sigmoid (η = 1). (For a complete mathematical description of back-propagation see the APPENDIX of Smith and Crawford 2001.)

To train the networks we used a series of combinations of initial eye positions and retinal errors selected randomly from the following training set: Initial eye positions were laid out in a horizontal–vertical grid with 20° spacing, centered on (0, 0) and limited to fall within the round 50° OMR. Initial eye positions always had zero torsion in Listing's coordinates (i.e., they were in Listing's plane). Each initial eye position had associated with it ≤56 REs where such eye position and retinal error pairings would not exceed the 50° OMR. These REs were chosen randomly from an array of directions resembling a "star pattern" (imagine a cross superimposed on an x), such that all of the cardinal and oblique directions were represented in the training set. For amplitude, each direction had ≤7 REs located at 2°, 5°, and then in 10° increments from 10° through 50°. (In preliminary trials we found that training on the small retinal errors <10° was necessary to avoid bizarre behavior in very small saccades.)
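The layout of the training set can be made concrete with a short Python sketch. Several details here are assumptions: the grid is taken to span –40° to +40°, the eight star directions are taken at 45° steps, and the test of whether a pairing stays inside the 50° OMR uses simple 2-D vector addition of eye position and RE, which only approximates the true 3-D geometry. Function names are illustrative.

```python
import numpy as np

OMR_DEG = 50.0
TOL = 1e-9  # guards against rounding when a point lies exactly on the boundary

def initial_eye_positions():
    """Horizontal-vertical grid, 20 deg spacing, clipped to the round OMR."""
    ticks = np.arange(-40, 41, 20)
    return [(float(h), float(v)) for h in ticks for v in ticks
            if np.hypot(h, v) <= OMR_DEG + TOL]

def star_retinal_errors():
    """8 directions (cardinal and oblique) x 7 amplitudes = 56 candidate REs."""
    amplitudes = [2, 5, 10, 20, 30, 40, 50]
    directions = np.radians(np.arange(0, 360, 45))
    return [(a * np.cos(d), a * np.sin(d)) for d in directions for a in amplitudes]

def training_pairs():
    """Keep only pairings whose (approximate) end point stays inside the OMR."""
    pairs = []
    for h, v in initial_eye_positions():
        for re_h, re_v in star_retinal_errors():
            if np.hypot(h + re_h, v + re_v) <= OMR_DEG + TOL:
                pairs.append(((h, v), (re_h, re_v)))
    return pairs

positions = initial_eye_positions()
pairs = training_pairs()
from_center = [p for p in pairs if p[0] == (0.0, 0.0)]
```

Under these assumptions only the central eye position keeps all 56 REs; eccentric positions keep fewer, which is why the text gives 56 as an upper bound.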

To compute the ideal motor error (ME) for these visual and eye position combinations we followed the algorithm outlined in Crawford and Guitton (1997), also used in our previous study (Smith and Crawford 2001). This algorithm provides the motor error (ΔE) required to take the eye from its current 3-D orientation to the orientation in Listing's plane that satisfies the desired 2-D gaze direction. In brief, using quaternion representations, we first computed the desired gaze relative to the eye by converting the 2-D visual signal into a desired gaze signal in eye coordinates (DGeye). Using this convention, the first and second components of 3-D gaze were the vertical and horizontal measurements of RE and the third component was a forward-pointing unit vector. We then rotated DGeye by initial eye position (Ei), which results in the desired 2-D gaze relative to the head (DGhead). Next DGhead was put through a Listing's law operator (Tweed and Vilis 1990), resulting in a desired 3-D eye position command (Ed). Finally, the ME was computed by subtracting Ei from Ed, after first converting the quaternion representations to vectors while maintaining the same coordinate system. These vectors were then scaled as a function of the angle of rotation and converted into the 6-D format required by the neural networks. Thus the error signal for the back-propagation algorithm was the difference between these computed ideal values and the actual values output by the network.
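The sequence of steps above (RE to DGeye, rotation by Ei to DGhead, Listing's law operator to Ed, then Ed minus Ei) can be sketched in Python as below. This is an illustration under stated assumptions rather than the published algorithm: axes are taken as x forward, y leftward, z upward; RE is treated as an angular displacement with rightward and upward positive; the Listing's law operator is implemented as the shortest rotation from the primary gaze direction; and "vectors scaled as a function of the angle" is taken to mean axis times angle in degrees. All function names are hypothetical.

```python
import numpy as np

FORWARD = np.array([1.0, 0.0, 0.0])  # primary gaze direction (x forward, y left, z up)

def quat_from_rotvec(rotvec_deg):
    rv = np.radians(np.asarray(rotvec_deg, dtype=float))
    angle = np.linalg.norm(rv)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * rv / angle))

def rotvec_from_quat(q):
    s = np.linalg.norm(q[1:])
    if s < 1e-12:
        return np.zeros(3)
    return np.degrees(2.0 * np.arctan2(s, q[0])) * q[1:] / s

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, u)."""
    w, u = q[0], q[1:]
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def listing_operator(gaze):
    """Orientation with zero torsion (axis orthogonal to FORWARD) that points at gaze."""
    axis = np.cross(FORWARD, gaze)
    s = np.linalg.norm(axis)
    if s < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    angle = np.arctan2(s, np.dot(FORWARD, gaze))
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis / s))

def ideal_motor_error(ei_rotvec_deg, re_right_deg, re_up_deg):
    ei = np.asarray(ei_rotvec_deg, dtype=float)
    # Desired gaze in eye coordinates (assumed RE-to-direction convention).
    dg_eye = rotate(quat_from_rotvec([0.0, -re_up_deg, -re_right_deg]), FORWARD)
    dg_head = rotate(quat_from_rotvec(ei), dg_eye)      # desired gaze in head coordinates
    ed = rotvec_from_quat(listing_operator(dg_head))    # desired eye orientation
    return ed - ei                                      # motor error as a rotation vector

me_primary = ideal_motor_error([0.0, 0.0, 0.0], 0.0, 15.0)
me_left40 = ideal_motor_error([0.0, 0.0, 40.0], 0.0, 30.0)
```

In this sketch a purely vertical RE presented with the eye 40° to the left yields a motor error with a nonzero horizontal component, which is the position dependence that the networks had to learn.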

We initially trained 12 networks with the following increments of hidden unit number: 4, 9, 16, 25, 36, and 49, training 2 of each size to determine the minimum network size required to learn the task. We defined this as the minimum number of units required to successfully reduce the sum of squared error of the network output to <0.01. This training goal was chosen because it resulted in a network output within about 1° of ideal performance. Figure 2 shows the error curves generated by a representative network of each size as indicated by the associated number. The y-axis represents the ongoing error (in degrees) during training, whereas the dashed line indicates the training goal. The x-axis indicates the number of epochs (where an epoch indicates one pass through the entire training set). Note that, although networks typically started with errors of >0.25 (about 20°), values above this are cut off the scale of the figure. All networks quickly reduced their training error to below 0.25 in an initial near-vertical drop. After our initial observations, we tried smaller increments of network size to establish the exact threshold.



FIG. 2. Error curves during training: x-axis indicates number of epochs trained, where an epoch is a single pass through the entire training set; y-axis indicates the training error (sum of squared error) between current and ideal performance. Dashed line at 0.01 indicates the training goal (equivalent to behavioral performance within 1° of ideal). Numbered arrows point to the error trace of an exemplar network with the specified number of hidden units. Error traces that reached the training goal have solved the visuomotor transformation. Note that networks with <8 hidden units could not solve the problem.

 
We found that the lower limit for the number of hidden units required to reach the training goal was 8. The 8-unit network (Fig. 2, #8) reached this goal in about 20,000 training epochs. This is indicated by the error curve intersecting the training goal line at this point. Networks with a greater number of hidden units also solved the visuomotor transformation; however, networks with <8 hidden units did not. For example, the 4-unit network did not reach this training goal even after extended training (about 45,000 epochs). This network was deemed unable to solve the visuomotor transformation because the estimated number of remaining epochs (which continued to rise) was in excess of 1 million at the time training was stopped. Similarly, 6- and 7-unit networks were also unable to reach the training goal. After this, we successfully trained a total of 30 networks (10 of the 9-unit networks and 5 each of the other classes: 16-, 25-, 36-, and 49-unit networks). The internal structure of all these networks was analyzed, but for the sake of simplicity in presentation, most of the following results focus on 9-unit networks.


    RESULTS
Behavioral performance

All of the networks that we accepted for analysis generated saccades accurate to within 1° of ideal performance on average. Here we checked certain crucial aspects of their performance before considering their "neural coding mechanisms." In particular, we carefully checked the position-dependent aspects of their performance related to the reference frame transformation.

The saccade reference frame transformation requires that the network map any one RE onto different MEs as a function of EP (Crawford and Guitton 1997). To determine how well the networks performed compared with ideal behavior, we examined the networks with a test set consisting of initial eye positions at 5° intervals from 40° left to 40° right with 0° centered at the straight-ahead eye position. Each position was tested with a 30° upward and a 30° downward RE (Fig. 3A). Another similar test set was also used except that initial eye positions ranged from 40° down to 40° up where each position was tested with a 30° leftward and a 30° rightward RE (Fig. 3B). Thus there were 16 representative eye positions and 32 retinal errors associated with each of these tests. These combinations of eye positions and retinal errors were chosen so that the requisite movements would remain inside the 50° oculomotor range within which the networks were trained.
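The counts given above can be checked with a few lines of Python. The sketch assumes that the straight-ahead position itself was left out of the test set, since 5° steps from –40° to +40° give 17 positions rather than 16 (the legend of Fig. 7 notes that primary position was added there). The function name and tuple layout are illustrative.

```python
import math

def position_test_set(task="vertical", re_amplitude=30.0):
    """Eye positions along one axis, each paired with two opposite REs on the other axis.

    task="vertical": horizontal eye positions with up/down REs (as in Fig. 3A).
    task="horizontal": vertical eye positions with left/right REs (as in Fig. 3B).
    """
    offsets = [p for p in range(-40, 45, 5) if p != 0]  # assumed: 0 deg excluded
    trials = []
    for p in offsets:
        eye = (p, 0) if task == "vertical" else (0, p)
        for sign in (+1, -1):
            re = (0, sign * re_amplitude) if task == "vertical" else (sign * re_amplitude, 0)
            trials.append((eye, re))
    return trials

trials = position_test_set("vertical")
n_positions = len({eye for eye, _ in trials})
# Largest planar eccentricity reached by any eye position / RE pairing.
max_eccentricity = max(math.hypot(e[0] + r[0], e[1] + r[1]) for e, r in trials)
```

With a 40° eccentric eye position and a 30° orthogonal RE, the planar end point lies exactly on the 50° boundary, which matches the statement that the requisite movements remain inside the trained range.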



FIG. 3. Behavioral performance test of a 9-unit network. A: vertical saccade task. Network was tested with 30° vertical retinal errors (REs) from 30° down to 30° up (vertical axis) from 16 initial EPs that ranged from 40° left to 40° right in 5° increments. However, only every other EP is shown (horizontal axis: indicated by open squares). Origin was centered on the straight-ahead primary reference position. Large-diameter circle indicates the limit of the oculomotor range (OMR) (±50°). Dashed lines (ending in a star) indicate the general direction of the supplied RE. Solid lines with open circles represent the ideal ME, whereas solid lines with closed circles represent actual network output. Note that actual network output closely follows ideal ME in an eye-position–dependent deviation from supplied RE and often obscures it. B: horizontal saccade task. Same conventions as above, except that REs are arrayed along the horizontal axis, whereas EPs are aligned with the vertical axis. Again, actual ME from the network closely aligns with ideal ME and often obscures it. C–F: comparison of Network, Ideal, and No-Position-Compensation for a 9-unit network. For all graphs, the x-axis indicates the range of eye positions tested, whereas the y-axis indicates the angular difference between either the actual and ideal performance (solid lines), or ideal performance and no-position-compensation. A counterclockwise angle moving from actual to ideal was measured as a negative angular error, whereas a clockwise angle was classified as a positive angular difference. C: upward saccades as in A. D: downward saccades as in A. E: leftward saccades as in B. F: rightward saccades as in B. Open circles: actual performance for the same network tested in A and B. Dashed line indicates the predicted curve if 3-D eye orientation is not taken into account (Crawford and Guitton 1997). Horizontal axis represents ideal performance. Second-order fits to actual performance from all other 9-unit networks (light lines) show that residual error varies nonlinearly with eye position in a nonsystematic manner, but follows the ideal model more closely than the no-position-compensation model.

 
Figure 3 shows the results of this test for a representative 9-unit network, providing vector representations of the resulting saccade displacement commands from each initial eye position (it would be a trivial matter to hook these up to the appropriate brain stem and plant model to generate realistic trajectories and dynamics; Crawford and Guitton 1997). For illustration purposes we show only half of the tested eye positions in increments of 10° rather than 5°. In Fig. 3A, we show the vertical saccade task (horizontal eye positions paired with vertical REs), whereas in Fig. 3B we show the horizontal saccade task (vertical eye positions paired with horizontal REs). In both plots, the origin represents the primary position as defined by Listing's law, whereas the large circle represents the 50° OMR within which the networks were trained. The traces within each graph depict the RE of the supplied eccentric target (dashed trace with star), the ideal ME performance based on the correct geometry (solid trace with open circles), and the actual ME output of the trained network (solid trace with filled circles). The RE is plotted in retinal coordinates, whereas EP and ME are plotted in head-centric coordinates.

Note that RE (dashed trace with star) and ideal ME (solid trace with open circles) diverged from one another in a position-dependent fashion, as demonstrated previously both theoretically and experimentally (Crawford and Guitton 1997; Klier et al. 1998; Smith and Crawford 2001b). This is the position-dependent reference frame transformation that the network had to learn: if it did so, actual performance (solid trace with filled circles) would follow the ideal ME vectors (open circles), whereas the absence of any position-dependent transformation would cause the actual network output to follow the RE vectors (dashed lines ending in a star). The actual output of the network closely followed that of ideal performance, and in these graphs often occludes it, with a mean error for this network in the vertical saccade task of 0.55° (SD: 0.43°). For the horizontal saccade task (Fig. 3B) the mean network error was 0.67° (SD: 0.37°). Across both of these performance tests the network's mean error was 0.62° (SD: 0.40°). Thus this network learned the position-dependent visuomotor transformation within the 1° of error required by our training goal, similar to the performance observed in human saccades (Klier et al. 1998).

To quantify performance across networks, we compared network performance with that of a model that does not take eye position into account, but that simply maps RE onto ME without accounting for the different reference frames (Crawford and Guitton 1997). Figure 3, C–F, shows the results of this comparison for all 9-unit networks. The open circles represent the response of the previously tested 9-unit network (A and B), whereas the thin 2nd-order fits illustrate the responses of the other 9-unit networks. Plots C and E show the vertical saccade task, whereas D and F show the horizontal saccade task, both corresponding to the graphical depictions in the left column. In each panel, the values along the x-axis correspond to these initial eye positions, whereas values along the y-axis represent the angular difference (in degrees) between the directions of ideal motor performance and either actual network performance or the values predicted for the no-position-compensation model. Perfect position compensation would result in all performance values aligning with the abscissa. The thick dashed line represents errors in the predicted motor performance of the no-position-compensation model.

Residual position-dependent directional errors in the actual performances were nonsystematic and small in all of the networks. A performance estimate of the residual errors across these networks revealed that network errors on average were 16% of those predicted by no eye position compensation in the vertical saccade task (Fig. 3, C and E), whereas network errors in the horizontal saccade task (Fig. 3, D and F) were 17% of the "no position compensation" prediction. Similar patterns of behavior for 16-, 25-, 36-, and 49-unit networks were observed (not shown), but with even lower residual errors. Moreover, errors in torsional eye position from Listing's plane (not shown) were so low as to be uninteresting: they were essentially zero for networks with <36 units and <0.03° in the larger networks.
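The directional comparison can be sketched in Python as follows. The sign convention follows the legend of Fig. 3 (a counterclockwise angle from actual to ideal is negative). How the 16% and 17% summary figures were computed is not specified in the text, so the ratio of mean absolute angular errors used here is only an assumption, as are the function names and the (horizontal, vertical) vector layout with rightward and upward positive.

```python
import numpy as np

def signed_angle_deg(actual, ideal):
    """Angular difference between two 2-D vectors, in degrees.

    Counterclockwise from actual to ideal is reported as negative,
    clockwise as positive.
    """
    ax, ay = actual
    ix, iy = ideal
    ccw = np.degrees(np.arctan2(ax * iy - ay * ix, ax * ix + ay * iy))
    return -ccw

def residual_percent(actual_vecs, ideal_vecs, uncompensated_vecs):
    """Network directional error as a percentage of the no-compensation error."""
    net = [abs(signed_angle_deg(a, i)) for a, i in zip(actual_vecs, ideal_vecs)]
    ref = [abs(signed_angle_deg(u, i)) for u, i in zip(uncompensated_vecs, ideal_vecs)]
    return 100.0 * np.mean(net) / np.mean(ref)

def unit_vector(deg):
    return (np.cos(np.radians(deg)), np.sin(np.radians(deg)))

# Toy data: the network is 1 deg off, the uncompensated model 10 deg off.
ideal = [unit_vector(90.0), unit_vector(80.0)]
actual = [unit_vector(91.0), unit_vector(79.0)]
uncompensated = [unit_vector(100.0), unit_vector(70.0)]
percent = residual_percent(actual, ideal, uncompensated)
```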

Visual receptive fields of hidden units

Having established that our biologically inspired networks had learned the correct transformations at the behavioral level, we set out to determine the "neural mechanisms" for these transformations. To characterize the visual receptive field of a saccade-related neuron, investigators systematically move pinpoints of light to cover the visual space and record when the target neuron is active (Hamed et al. 2001; Hubel and Wiesel 1959; Russo and Bruce 2000). In this way, a visual response profile of the target cell can be constructed. We followed a similar procedure to characterize the visual receptive field of each hidden unit; that is, we sequentially "stimulated" every possible target location on the visual input map with a unit stimulus and recorded the output of each hidden unit (while eye position was fixed at center).
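The mapping procedure can be illustrated with the Python sketch below. It makes several assumptions that are not taken from the text: the hidden units use a plain logistic sigmoid, a "unit stimulus" is a single input set to 1 with all other visual inputs at 0, the visual map is square, and the weight matrix lists the visual inputs first and the 6 eye position inputs last. A central eye position is coded as 0.5 on all six eye position channels, in keeping with the push–pull code. Names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def map_receptive_fields(w_input_to_hidden, n_side, eye_code=None):
    """Return an array (n_hidden, n_side, n_side) of hidden-unit responses.

    Each map location is stimulated in turn while eye position is held fixed.
    """
    n_visual = n_side * n_side
    if eye_code is None:
        eye_code = np.full(6, 0.5)  # eye at centre in the 6-D push-pull code
    n_hidden = w_input_to_hidden.shape[0]
    fields = np.zeros((n_hidden, n_side, n_side))
    for idx in range(n_visual):
        x = np.zeros(n_visual + 6)
        x[idx] = 1.0                # unit stimulus at one map location
        x[n_visual:] = eye_code
        fields[:, idx // n_side, idx % n_side] = sigmoid(w_input_to_hidden @ x)
    return fields

# Toy example: random weights stand in for a trained network.
rng = np.random.default_rng(1)
weights = rng.normal(size=(3, 4 * 4 + 6))
fields = map_receptive_fields(weights, n_side=4)
```

Passing a different `eye_code` repeats the same mapping at another eye position, which is the manipulation needed to test the reference frame of the receptive fields.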

The resultant visual response profiles for a typical 9-unit network are shown in Fig. 4A, showing all 9 hidden units (1–9). (The same network is used in subsequent figures to facilitate comparisons between figures.) For completeness, we show the entire visual map that ranges from –90° to 90° in both the horizontal and vertical dimensions, although the actual visual range of a unit was restricted to ±80° (white circle overlaid on the receptive field of unit 1). Clearly, the visual receptive fields of the hidden units are complex with regions of highest sensitivity (red) corresponding to the unit's preferred direction, regions of lowest sensitivity (black), and graded transitional regions between these. The complexity of these visual receptive fields is reminiscent of those found in the lateral intraparietal area (e.g., Hamed et al. 2001).



FIG. 4. Visual receptive fields of a 9-unit network. A: 9 graphs (numbered 1–9 on the upper-left corner of each plot) correspond to the visual receptive field of the numbered hidden unit. Entire visual map is displayed and ranges from ±90° as indicated by the axes of graph 1. Also indicated on graph 1 is the limit of the visual range of the network (large-diameter white circle). Colored regions indicate the visual response of the particular unit to a target presented at that location. Note that these receptive fields have complex shapes and regions of higher activity (red to red-black shading) and regions of lower activity (blue to blue-black shading), representing preferred and nonpreferred directions, respectively. B–I: receptive fields derived by remapping the original visual receptive fields into the space defined by an orthogonal basis set produced by a principal-components analysis. The 4 receptive fields displayed for each network, B–E (9-unit network) and F–I (36-unit network), account for about 95% of the variance in the data. First 2 fields (B and C; F and G) for each network account for about 80% of the variance in the data. First 2 remapped receptive fields show a simplified preferred–nonpreferred organization, whereas the remaining 2 (D and E; H and I) show a more complex structure similar to the original receptive fields.

 
As a means to summarize these data and uncover any underlying structure across receptive fields, we performed a principal-components analysis, which reduces redundancy in a data set by computing a new set of variables called principal components. Each principal component is a linear combination of the original variables and is orthogonal to the other principal components. In constructing these new variables, the new data set contains no redundant information. Thus this procedure seeks to explain the variance in the original data set with fewer variables and ensures that the principal components form an orthogonal basis set spanning the space of the original data. We used the Matlab principal-components procedure "princomp" that returns, among other things, a vector of the variance accounted for by each of the principal components. We then simply computed the percentage of variance accounted for by each of the principal components using the formula: Variance explained = 100 × variance/sum(variance).
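For readers without Matlab, the same percentage can be obtained with NumPy. The sketch below is not the code used in the study: it computes component variances from a singular value decomposition of the mean-centered data and then applies the formula given above. The orientation of the data matrix (rows as observations, columns as variables) is an assumption, as is the function name.

```python
import numpy as np

def variance_explained(data):
    """Percentage of variance accounted for by each principal component.

    data: 2-D array with observations in rows and variables in columns.
    """
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2 / (data.shape[0] - 1)
    return 100.0 * variance / variance.sum()

# Toy data with one dominant direction of variation.
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 4)) * np.array([10.0, 3.0, 1.0, 0.1])
percent = variance_explained(toy)
cumulative = np.cumsum(percent)
```

The cumulative sum is the quantity behind statements such as "4 components account for about 95% of the variance."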

We performed such a principal-components analysis with the eye looking at the straight-ahead primary reference position. The original receptive fields were then remapped into the new data space formed by the principal components. Figure 4, B–I, shows the results of this analysis for exemplary 9-unit (left column, B–E) and 36-unit (right column, F–I) networks.

In both of these networks (and across all 9 of the networks tested: 5 of the 9-unit class and 2 each of the 16-, 25-, 36-, and 49-unit classes) 4 principal components accounted for about 95% of the variance in the data (80% by the first 2 and 15% by the remaining 2). The dominant, first 2 components (Fig. 4, B, C and F, G) always showed a fairly simple organization. First, each component had an antipodal structure with oppositely tuned maximal and minimal response zones. Second, the orientations of the grading for the first (B and F) and second (C and G) components were orthogonal. This appears to provide the basis coordinates for specifying target direction in 2-dimensional space. The other 2 remapped receptive fields (D and E; H and I) showed a more complex structure capturing some of the complexity in the original receptive fields. Again, these 4 components captured 95% of the visual target data required to produce accurate saccades, but this is not to imply that only a few units in the trained networks do the majority of the work with the remaining units contributing little to the solution. In the complete networks this information is distributed across all of the units and is not present in this orthogonalized and optimized form. Also, as we shall see, it is the interaction between different units that is critical for the overall performance of the model.

Reference frames for the sensory signal

The previous section describes the visual receptive fields of hidden units at one, fixed central eye position. To determine the reference frame for these visual receptive fields (i.e., are they fixed relative to the eye, relative to the head, or some other alternative) we had to retest these visual receptive fields at different eye positions. Figure 5 shows the visual receptive field surface of hidden unit 6 from the same 9-unit network illustrated in Fig. 4; however, now the network was tested with eye position fixated at 4 different eye positions: 30° up, down, left, and right. It is important to note that this receptive field map is plotted in retinal coordinates; that is, responses are mapped according to the locations of the stimuli relative to the fovea, in eye-fixed coordinates. If the visual receptive fields were fixed in head (or space) coordinates, they would shift in these plots in the direction and amount opposite to the eye position shift. However, no such shift was observed. If one observes the key topological features in the receptive field, such as the location of maximal and minimal response areas, one can see that the hidden unit visual receptive field appears to stay absolutely fixed relative to the simulated retina.



FIG. 5. Reference frame (RF) of hidden unit visual receptive fields. A–D: visual receptive field of unit 6 at 4 different eye positions. A: 30° up; B: 30° down; C: 30° left; D: 30° right. Large white circle (A) shows range of visual information available to the network during training. E and F: location of peak activities at the 4 locations indicated by small (overlapping) circles, where the smallest indicates the upward EP progressing to the largest, which indicates the rightward eye position. E: results for the 9-unit network. F: (representative) results of a 16-unit network. Note that the peaks of activity under the different eye positions align exactly, indicating that these visual receptive fields are in eye-centered coordinates.

 
To quantify this observation across units and across networks, we calculated the location of the maximum response to a visual stimulus for each hidden unit, at each of the same 4 eye positions tested in Fig. 5, A–D (30° up, down, left, right). Recall that during training, visual input was only allowed within a circular ±80° range of the "fovea," so we tested only for maximal activity within this behaviorally relevant range. This range is shown by the large circle in Fig. 5E, along with the visual maximum points (small circles) for the 9 hidden units shown in Fig. 4, again plotted in eye-fixed coordinates.
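A minimal Python sketch of this peak-location test is given below. It assumes each receptive field is stored as a square array sampled on a regular grid of retinal coordinates, with rows indexing vertical and columns indexing horizontal position; the function names and grid spacing are illustrative.

```python
import numpy as np

def peak_location(field, axis_deg, max_eccentricity=80.0):
    """Return (horizontal, vertical) retinal location of the maximum response,
    searching only inside the circular visual range."""
    hh, vv = np.meshgrid(axis_deg, axis_deg)  # hh varies along columns, vv along rows
    inside = np.hypot(hh, vv) <= max_eccentricity
    masked = np.where(inside, field, -np.inf)
    row, col = np.unravel_index(np.argmax(masked), field.shape)
    return axis_deg[col], axis_deg[row]

def is_eye_fixed(fields_by_eye_position, axis_deg):
    """True if the peak falls at the same retinal location at every eye position."""
    peaks = {peak_location(f, axis_deg) for f in fields_by_eye_position}
    return len(peaks) == 1

# Toy example: a large value outside the visual range must be ignored.
axis_deg = np.arange(-90, 91, 10)
toy_field = np.zeros((19, 19))
toy_field[0, 0] = 5.0    # at (-90, -90), outside the 80 deg circle
toy_field[12, 13] = 2.0  # at horizontal 40, vertical 30, inside the circle
peak = peak_location(toy_field, axis_deg)
```

A head-fixed receptive field would yield a different retinal peak location at each eye position, so `is_eye_fixed` would return False in that case.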

Note first that these maximum points were almost always found at the edge of the visual range, resembling the "open-ended" response fields observed in some SC units (Freedman and Sparks 1997). More important, we have coded the maximal response points for each of the 4 eye positions as a different sized circle. Note that each set of circles forms a perfectly overlapping set of concentric rings; in other words these points overlap perfectly in eye coordinates with absolutely no change in direction or distance relative to the central fovea. A similar result for a 16-unit network is shown in Fig. 5F, and indeed we found the same result held for all networks. Thus the topological organization of the visual receptive fields in our hidden units was always absolutely eye-fixed. The central remaining question here, as in real neurophysiology, is: how do these eye-fixed visual receptive fields get mapped onto the correct saccade vector in motor coordinates?

Relationships between visual, eye position, and motor sensitivity vectors

In our previous network (Smith and Crawford 2001a), which solved the identical geometric problems using a simpler vector-only input–output organization, the hidden units formed certain invariant functional classes that subdivided the transformation into specific parallel-task modules. For example, a dominant class of units, called the vector propagation class, were organized into an orthogonal coordinate system where the retinal error tuning vector and the motor error tuning vector were more or less aligned, which was shown to provide the main drive that moved saccades in the correct direction. We also showed that the eye-position–dependent modification of motor error was supplied by a smaller module of units, called position-opposite, because eye position and motor tuning were more or less opposite in direction. This modularity was revealed when we computed a series of "sensitivity vectors."

Because this was a previously fruitful approach we began with the same type of analysis here. That is, we looked for a similar organization within and between the visual, eye position, and motor error output of the hidden units by constructing a similar series of sensitivity vectors. That is, we constructed 1) a sensitivity vector for maximal and minimal visual activity, 2) a sensitivity vector for eye position tuning, and 3) a sensitivity vector for motor tuning. The visual sensitivity vector was simply the vector from the origin to the location of the maximum and minimum values in the visual receptive field (see Fig. 4A for an example of the visual receptive fields of a 9-unit network). Other methods, like the calculation of the "center of mass" of these visual activation fields, did not appear to provide better information.

The eye position sensitivity vector was the computed 3-D vector coded by the 6 activation weights of the appropriate inputs to the hidden units (the last 6 weight values of the input to hidden unit weight matrix). To accomplish this, we first performed the inverse of the transformation that we originally used to convert the 3-D eye position vector into a 6-D representation (see above). Operating on the resulting 3-D vector, we then counterrotated it to restore it into standard Cartesian coordinates.

For the motor sensitivity vector we used the weights between the hidden and the output layers. That is, we computed the direction and magnitude of the motor sensitivity vector for each hidden unit based on its projection weights to the output layer using the same inverse procedure as described above for the eye position activations (also, see METHODS and Smith and Crawford 2001a). The horizontal and vertical components of these sensitivity vectors are plotted in Fig. 6 for each of the 9 hidden units in the network.
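One way to picture the inverse procedure is sketched below in Python. In a push–pull code each 3-D component is carried by the difference between the two members of a pair, so the difference between the two corresponding weights gives a unit's sensitivity to that component (up to a scale factor). The sketch assumes this differencing step only; the scaling and the counterrotation into standard Cartesian coordinates mentioned in the text are omitted, and the matrix layouts and function names are hypothetical.

```python
import numpy as np

def pushpull_to_3d(weights6):
    """Collapse 6 push-pull weights into a 3-D vector by differencing each pair."""
    pairs = np.asarray(weights6, dtype=float).reshape(3, 2)
    return pairs[:, 0] - pairs[:, 1]

def motor_sensitivity_vectors(w_hidden_to_output):
    """w_hidden_to_output has shape (6, n_hidden); one 3-D vector per hidden unit."""
    return np.array([pushpull_to_3d(col) for col in w_hidden_to_output.T])

def eye_position_sensitivity_vectors(w_input_to_hidden):
    """w_input_to_hidden has shape (n_hidden, n_inputs); last 6 columns are eye position."""
    return np.array([pushpull_to_3d(row[-6:]) for row in w_input_to_hidden])

def angle_between_deg(u, v):
    """Unsigned angle between two vectors, in degrees."""
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Toy example with made-up weights for a single hidden unit.
example = pushpull_to_3d([1.0, 0.0, 0.0, 1.0, 0.5, 0.5])
opposite = angle_between_deg(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
orthogonal = angle_between_deg(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

The unsigned angle is the kind of measure needed to compare visual, motor, and eye position tuning directions within a unit.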



FIG. 6. Sensitivity vectors of visual receptive fields for 9-unit network. Nine graphs (numbered 1–9 on the upper-left corner of each plot) correspond to the sensitivity vectors of the numbered hidden unit. Solid trace with square: 2-D vector indicating the maximum visual response. Dashed trace with square: 2-D vector indicating the minimum visual response. Solid trace with circle: 2-D vector indicating the direction of motor tuning. Solid trace with hexagon: 2-D vector indicating the direction of eye position tuning. Eye position and motor tuning vectors were constructed using the appropriate hidden layer weights (see text). In addition, these tuning vectors were normalized to the ±80° visual range (indicated by large-diameter circle overlaid on graph 1). Thickness of the red trace is a visual guide to the amount of visual activity at the peak (in a range from 0 to 1). Thickness of the blue and black traces is a visual guide to the relative magnitude of the hidden layer weights (synaptic strength) for these vectors. Note that there is a variety of angular relationships between the vectors with no simple modularity apparent.

 
In this study we found no groupings of units into functional modules or coordinate systems for any of the networks. For the illustrated 9-unit network, the maximal (solid trace with square) and minimal (dashed trace with square) activity centers were essentially opposite in direction with a mean angular difference of 165° (SD: 14°). This confirms the antipodal organization observed in our principal-component analysis (Fig. 4). We also failed to find a dominant class of units organized into an orthogonal coordinate system with nearly aligned visual and motor tuning (i.e., where the difference between maximum visual activity and motor tuning was ≤10°). For example, an inspection of units 2, 3, 4, 6, 8, and 9 shows little alignment between visual and motor tuning [mean of 93° (SD: 62°)], whereas the remaining units (1, 5, and 7) show a somewhat better alignment with a mean angular difference of 47° (SD: 26°).

In addition, Smith and Crawford (2001a) found a clear subclass of units with opposite motor and eye position tuning (i.e., with the difference between motor and eye position tuning ≥150°). However, between motor and eye position tuning vectors in the illustrated network we found a mean relative difference of 76° (SD: 59°). As well, peak visual tuning and eye position tuning vectors showed a mean relative rotation of 82° (SD: 69°), unlike Smith and Crawford (2001). Despite intensive scrutiny, we were unable to discern any clear-cut functional relationships or groupings in the angular relationships between visual, motor, and eye position tuning vectors of this or any other network. Rather, we found that units had widely distributed visual, motor, and position tuning, with no clear functional relations between them. Thus although the receptive fields are oriented across units, they do not align to form a "coordinate system" (such as ordinary or rotated Cartesian coordinates, "+" or "x" in form) as was found in Smith and Crawford (2001).

The same conclusions remained when we approached this question more formally. A visual and quantitative cluster analysis (using standard Matlab routines) did not reveal the grouping characteristics seen in Smith and Crawford (2001). The relationships between the sensitivity vectors found here were rather loose and broadly tuned in this and in all 9-unit networks. It would appear that these networks, in simulating the distributed nature of the input and output signals, performed the visuomotor transformation in a more distributed manner than that evidenced previously in Smith and Crawford (2001). (Also, see on-line supplementary Fig. 1.) In light of this, our subsequent analysis focused on properties of the network that might give rise to this more distributed solution.

Motor coding in the hidden unit layer

As stated above, our hidden units showed widely dispersed motor tuning. Our next step was to characterize the position dependency of this motor tuning. In particular, we determined the reference frame and coding of the hidden unit motor output: the contribution that activation of each unit makes to the behavior of the network. This is determined by the final connection weights of the hidden unit to the output layer (at least within the linear working range of the output layer, corresponding to saccade components of ±60°). In fact, because we know that the output layer codes a fixed-vector 3-D eye displacement vector in head coordinates, and we know that the connections from each hidden unit to the output layer are fixed at the end of training, the motor output coding of hidden units was predetermined: the hidden units code fixed-vector eye orientation displacements in head coordinates, just like the output layer.1



FIG. 7. Stimulation study of a 9-unit network. Initial eye positions are the same as the task illustrated in Fig. 3, except all eye positions are shown and we include primary position. No REs are supplied [i.e., RE of (0, 0) is supplied]. Instead, the "eye" fixates at each initial position and a hidden unit is "stimulated" by setting its output to 0.5. Filled circles represent stimulated network output, whereas open circles represent the control condition (nonstimulated network output). A: network response when hidden unit 6 is stimulated at horizontal eye positions. Note that motor output of the network is a head-fixed stereotypical response that exceeds the OMR and is in the unit's preferred direction. Control response shows that the network fixates within about 3° with an increasing downward bias as the eye moves from right to left (also evident in the stimulated response). B: network response when hidden unit 4 is stimulated with initial vertical eye positions. Again network output is a head-fixed stereotypical response that exceeds the OMR and is in the unit's preferred direction.

 
To illustrate this, we simulated the results of "stimulating" individual hidden units, which (like stimulating saccade-related areas in the real brain) shows the motor output of the unit. Figure 7 plots typical results, showing simulated saccades in head coordinates. Results for 2 hidden units are shown, corresponding to units 6 (A) and 4 (B) in Fig. 4. These units were chosen because they had strong motor outputs (see Fig. 6). We initialized the network at different eye positions, in 5° increments orthogonal to the motor vector of the unit (40° left to 40° right in A and 40° up to 40° down in B). The open circles represent the baseline control condition: how well the network fixated these positions with a zero RE input. These "fixations" stayed within a mean window of 3°, which shows the background noise for our "stimulations."

From each of these eye positions we then simulated stimulation by setting the output of the hidden unit to 0.5 (chosen for illustration purposes). These "stimulations" drove the eye to a new location, as indicated by the filled circles. Note that the evoked saccades of the network are primarily vertical when unit 6 is stimulated, and primarily horizontal when unit 4 is stimulated, which agrees with their preferred motor tuning directions (see Fig. 6). More important, these saccade vectors were fixed (when plotted in head coordinates), independent of initial eye position. This resulted in a series of perfectly parallel vectors. Repeating this test with the largest stimulation value (1.0) produced "saccades" well outside the OMR (saccades of about 100°) but in exactly the same directions for each of the units. Similar results were obtained with different networks and units (not shown).
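The logic of this "stimulation" test can be sketched as follows. This is a simplified illustration under stated assumptions, not the trained network: the 6-dimensional push-pull output is reduced to 2 dimensions (horizontal, vertical), the weight values are hypothetical, the output layer is treated as purely linear, and the small control-condition fixation error is ignored.

```python
# Hypothetical hidden-to-output weights: one fixed 2-D motor vector per unit.
# (The paper's output layer is 6-D push-pull; 2-D is used here for clarity.)
MOTOR_WEIGHTS = {
    4: (30.0, 2.0),    # mostly horizontal
    6: (-1.5, 40.0),   # mostly vertical
}

def stimulate(unit, activation, eye_position):
    """Return the saccade end point after forcing one hidden unit's output.

    In the assumed linear range of the output layer, the displacement is the
    unit's weight vector scaled by its activation; eye_position only sets
    the starting point.
    """
    wx, wy = MOTOR_WEIGHTS[unit]
    return (eye_position[0] + activation * wx,
            eye_position[1] + activation * wy)

def displacement(start, end):
    """Saccade vector in head coordinates."""
    return (end[0] - start[0], end[1] - start[1])

def is_parallel(v, w, tol=1e-9):
    """True when two 2-D vectors point along the same line and direction."""
    cross = v[0] * w[1] - v[1] * w[0]
    dot = v[0] * w[0] + v[1] * w[1]
    return abs(cross) < tol and dot > 0

# Unit 6 stimulated from horizontal eye positions, 40 left to 40 right, 5 degree steps.
starts = [(float(h), 0.0) for h in range(-40, 45, 5)]
vectors = [displacement(s, stimulate(6, 0.5, s)) for s in starts]

# A stronger stimulation (1.0) lengthens the vector but keeps its direction.
strong = displacement((0.0, 0.0), stimulate(6, 1.0, (0.0, 0.0)))
same_direction = all(is_parallel(v, strong) for v in vectors)
```

Because the hidden-to-output weights are fixed after training, initial eye position enters this sketch only as the starting point, which is why every evoked vector is parallel to the others.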

Thus each of the hidden units in our networks simultaneously existed in 2 reference frames: an eye-centered reference frame in their response to visual inputs and a head-centered reference frame in their fixed-vector output command. From this we can conclude 2 things: first, that the reference frame transformation was somehow occurring within the hidden unit layer; second, that it was not occurring at the level of individual units, because each such unit provided a fixed directional mapping of vision to movement independent of eye position.

Interactions between visual and eye position inputs

Somehow eye position signals were modulating the visual responses in our hidden unit layer to provide the position-dependent transformation illustrated in Fig. 3, but without shifting the topology of their eye-centered receptive fields. Could the mechanism take the form of a "firing rate" modulation like the classical gain fields of Zipser and Andersen (1988)? To determine this, we examined the overall profile of the visual receptive field as a function of eye position (Salinas and Abbott 2001). We constructed cross sections or "slices" of the visual receptive fields using the eye position sensitivity vector (Fig. 6) as a guide.

Figure 8, A and B, shows the results of this investigation for unit number 2 of the same 9-unit network used in previous illustrations. Figure 8A shows the resultant cross sections with eye positions arrayed along their preferred axis at 20° spacing centered on the origin (see inset), whereas Fig. 8B shows the slices taken at eye positions lying along the axis orthogonal to the preferred one. The x-axis in both plots shows the full range of the visual map (±90°), although only the area between the dashed lines represents visual input to the network during training. Examination of Fig. 8B shows that in the direction orthogonal to the preferred axis, eye position does not modify the visual response because all 5 curves superimpose onto a single trace.
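The slicing procedure can be illustrated with a toy unit whose visual response is modulated only by the component of eye position along its preferred axis. Everything in this sketch is an assumption made for illustration: the Gaussian receptive field, its center and width, the linear gain and bias terms (chosen to echo the "gain and a bias" description in the Fig. 8 caption), and all parameter values. It does not reproduce the trained units.

```python
import math

def visual_response(target_deg, eye_pos, preferred_axis,
                    rf_center=10.0, rf_width=20.0,
                    gain_slope=0.01, bias_slope=0.002):
    """Response of a hypothetical unit to a target along one RF slice."""
    # Component of eye position along the unit's (unit-length) preferred axis.
    proj = eye_pos[0] * preferred_axis[0] + eye_pos[1] * preferred_axis[1]
    gaussian = math.exp(-0.5 * ((target_deg - rf_center) / rf_width) ** 2)
    gain = 1.0 + gain_slope * proj   # multiplicative, gain-field-like term
    bias = bias_slope * proj         # additive offset
    return gain * gaussian + bias

preferred = (1.0, 0.0)    # hypothetical preferred eye position axis
orthogonal = (0.0, 1.0)
offsets = [-40, -20, 0, 20, 40]   # 5 eye positions, 20 degree spacing
targets = range(-90, 91, 10)      # full visual map, +/-90 degrees

def slices(axis):
    """One response curve per eye position placed along the given axis."""
    return [[visual_response(t, (o * axis[0], o * axis[1]), preferred)
             for t in targets]
            for o in offsets]

on_axis = slices(preferred)     # curves differ with eye position
off_axis = slices(orthogonal)   # curves superimpose onto a single trace
```

In this toy model the off-axis curves coincide because the projection onto the preferred axis is zero for every orthogonal eye position, which mirrors the superposition described for Fig. 8B.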



FIG. 8. Activity contours of slices through the visual RF of unit 2 of a 9-unit network: x-axis indicates the range of the entire visual map (±90°), whereas the y-axis shows the level of visual response for this unit (note the differing scales for plots A and B). Circular inset (both graphs) indicates the 5 eye positions (20° spacing centered on the origin) used along the preferred axis in determining the slices (see text). Origin of the inset represents the straight-ahead primary reference position. Dashed lines indicate the limits of visual space available to the network (±80°) during training. Contour traces of the visual responses are numbered with the appropriate eye position to which they belong (traces have been slightly smoothed for illustrative purposes). A: response along the preferred axis. B: response along the axis that is orthogonal to the preferred. Note that visual response for this unit is modified by eye position only along the preferred axis (A). C: preferred axis slice contours for all hidden units of a 9-unit network. Note that we do not show the responses for the direction orthogonal to the preferred one because there was no eye position modification of visual response in this direction for any of the units. Conventions are the same as A except that curves are not smoothed. Unlike a classic gain field, these units use a gain and a bias mechanism to modify visual sensitivity based on eye position. Compare units 2 and 8.

 
However, an examination of the on-axis traces reveals a modification to visual responsiveness similar to the linear multiplicative gain-field mechanism (Andersen et al. 1990; Zipser and Andersen 1988). That is, we can compare trace 3 (the visual response when the eye is positioned at the straight-ahead primary reference position) with the traces when the eye is deviated from straight ahead (traces 1, 2, 4, and 5). It is apparent that visual se