The Journal of Neurophysiology Vol. 77 No. 4 April 1997, pp. 2115-2130
Copyright ©1997 by the American Physiological Society
Prediction of Complex Two-Dimensional Trajectories by a Cerebellar Model of Smooth Pursuit Eye Movement
R. E. Kettner1,
S. Mahamud2,
H.-C. Leung1,
N. Sitkoff2,
J. C. Houk1,
B. W. Peterson1, and
A. G. Barto2
1 Department of Physiology M211, Northwestern University Medical School, Chicago, Illinois 60611; and
2 Department of Computer Science, University of Massachusetts, Amherst, Massachusetts 01003
 |
ABSTRACT |
Kettner, R. E., S. Mahamud, H.-C. Leung, N. Sitkoff, J. C. Houk, B. W. Peterson, and A. G. Barto. Prediction of complex two-dimensional trajectories by a cerebellar model of smooth pursuit eye movement. J. Neurophysiol. 77: 2115-2130, 1997. A neural network model based on the anatomy and physiology of the cerebellum is presented that can generate both simple and complex predictive pursuit, while also responding in a feedback mode to visual perturbations from an ongoing trajectory. The model allows the prediction of complex movements by adding two features that are not present in other pursuit models: an array of inputs distributed over a range of physiologically justified delays, and a novel, biologically plausible learning rule that generated changes in synaptic strengths in response to retinal slip errors that arrive after long delays. To directly test the model, its output was compared with the behavior of monkeys tracking the same trajectories. There was a close correspondence between model and monkey performance. Complex target trajectories were created by summing two or three sinusoidal components of different frequencies along horizontal and/or vertical axes. Both the model and the monkeys were able to track these complex sum-of-sines trajectories with small phase delays that averaged 8 and 20 ms in magnitude, respectively. Both the model and the monkeys showed a consistent relationship between the high- and low-frequency components of pursuit: high-frequency components were tracked with small phase lags, whereas low-frequency components were tracked with phase leads. The model was also trained to track targets moving along a circular trajectory with infrequent right-angle perturbations that moved the target along a circle meridian. Before the perturbation, the model tracked the target with very small phase differences that averaged 5 ms. After the perturbation, the model overshot the target while continuing along the expected nonperturbed circular trajectory for 80 ms, before it moved toward the new perturbed trajectory. Monkeys showed similar behaviors with an average phase difference of 3 ms during circular pursuit, followed by a perturbation response after 90 ms. In both cases, the delays required to process visual information were much longer than delays associated with nonperturbed circular and sum-of-sines pursuit. This suggests that both the model and the eye make short-term predictions about future events to compensate for visual feedback delays in receiving information about the direction of a target moving along a changing trajectory. In addition, both the eye and the model can adjust to abrupt changes in target direction on the basis of visual feedback, but do so after significant processing delays.
 |
INTRODUCTION |
Smooth pursuit eye tracking is highly accurate, and several studies have shown that eye movements can actually predict the future course of simple target motions. For example, when a visual target is moved back and forth at a constant frequency, the eye soon locks onto the target so that there is little or no lag, and sometimes even a slight lead, between eye and target. This has been shown during tracking of a variety of periodic stimuli, including sine and square waves in one dimension (Bahill and McDonald 1983
; Dallos and Jones 1963
; Kowler and Steinman 1979
; Lisberger et al. 1981
; Pola and Wyatt 1980
) and circular and rhomboid trajectories in two dimensions (Collewijn and Tamminga 1984
; Deno et al. 1995
; Leung and Kettner 1997
). This tracking behavior is considered predictive because visual signals are processed by the smooth pursuit system with considerable delays that have been estimated at ~100 ms (Becker and Fuchs 1984
; Carl and Gellman 1986
; Fuchs 1967
; Leung and Kettner 1997
; Lisberger and Westbrook 1985
; Robinson 1965
). One would expect tracking to lag by similar delays during smooth pursuit if the eye were controlled exclusively by a simple negative feedback system based on visual input.
Complex trajectories created by summing sinusoids can also be tracked with various degrees of predictive control. Although sum-of-sines stimuli have often been used in an attempt to create "random" and therefore "unpredictable" target trajectories, some investigators (e.g., Dallos and Jones 1963
; Yasui and Young 1984
) have shown that low-frequency components of these sum-of-sines trajectories can be tracked with phase leads that suggest predictive control. Barnes et al. (1987)
have stressed that the highest-frequency component of a sum-of-sines waveform can be tracked with high gains and small phase lags that also indicate predictive control. We have recently extended these results to complex, two-dimensional pursuit for the monkey (Kettner et al. 1996
); low-frequency sinusoidal components were tracked with consistent phase leads, whereas middle- and high-frequency components were tracked with very small phase lags. To verify that these monkeys were tracking predictively, we computed visual processing delays in response to perturbations from circular pursuit and compared these delays with tracking delays during complex two-dimensional sum-of-sines pursuit in the same monkeys (Leung and Kettner 1997
). The lag between eye and target during circular pursuit before the perturbations averaged 3 ms. This value was much smaller than the 90-ms average delay in responding to right-angle perturbations from this circular trajectory. Similarly, the average delay during complex sum-of-sines pursuit was 20 ms, much smaller than the time required to process visual information.
Both the sum-of-sines and circle-with-perturbation results provide interesting challenges for models of pursuit. The excellent short-lag tracking behavior observed for sum-of-sines trajectories requires a model of pursuit that can generate complex predictive control. In addition, the circle-with-perturbation data confirm that visual information is also used to guide the tracking of unexpected target motions when predictive control is not possible. This means that complete models should allow predictive and feedback control to work in concert to generate the full range of smooth pursuit behaviors. In addition, an ultimate model of the neural control of pursuit should be biologically plausible. Most previous models of smooth pursuit have addressed only its feedback control component (Deno et al. 1995
; Krauzlis and Lisberger 1989
; Robinson et al. 1986
; Young et al. 1968
). These models are typically couched in the form of linear system operations, which do not explicitly incorporate features of the neural pathways that generate smooth pursuit. In an attempt to generate predictive tracking of complex target motions, Van den Berg (1988)
and Barnes and Asselman (1991)
have suggested that the pursuit system has a buffer that stores target motions. Performance of such models is limited by the length of the buffer, and it is not immediately apparent how they would be implemented by known neural circuits involved in pursuit.
Here we present a neural network model based on the anatomy and physiology of the flocculus and paraflocculus of the cerebellum, brain areas that have been related to the control of smooth pursuit eye movements (Zee et al. 1981
). The model can generate complex predictive smooth pursuit while still mediating responses to unexpected perturbations, which require feedback control. This is the first model that generates complex predictive pursuit with the use of an explicit simulation of control processing implemented by a biologically reasonable architecture. This model represents an extension of our previous models of limb movement control, which have been constrained by the anatomy and physiology of the intermediate cerebellum (Berthier et al. 1993
; Houk and Barto 1992
; Houk et al. 1990
). As in recent renditions of the limb model (Barto et al. 1996
; Buckingham et al. 1995
), we utilize a learning rule with an "eligibility trace" and incorporate the rich diversity of mossy fiber input to the cerebellum. The eligibility trace allows delayed feedback of errors to modify the strength of connections that were active at the time at which the motor commands that led to the error were generated (Houk and Alford 1996
; Klopf 1972, 1982; Sutton and Barto 1981
). The inclusion of diverse mossy fiber inputs together with a granular layer creates a sparse expansive recoding of these inputs that is well suited for associative mapping (Albus 1971
; Marr 1969
; Tyrrell and Willshaw 1992
). The simulated mossy fiber inputs are based on the behavior of actual neurons revealed by single-neuron recording studies (Lisberger and Fuchs 1978a
,b
; Maekawa and Kimura 1980
; Miles et al. 1980
; Mustari et al. 1988
; Stone and Lisberger 1990a
,b
; Suzuki et al. 1990
). These various features allow the model to learn to generate predictive pursuit despite the fact that pursuit error signals are delayed by 100 ms. Unlike the limb control model, which regulates state transitions in Purkinje cells, the present model generates continuous outputs that control smooth eye movements.
In outline, the paper first describes our cerebellar model in mathematical detail while at the same time justifying with physiological data the choices made in formulating the mathematical equations and in selecting the model parameters. Results from simulations of the model are then presented and compared with analogous data obtained from behaving monkeys studied in our laboratory. All of these findings are integrated with the existing literature in a final discussion. The monkey results used to test the model have been reported in more detail elsewhere. Monkey tracking behavior and estimates of component interactions during complex two-dimensional sum-of-sines pursuit are presented in Kettner et al. (1996)
. Leung and Kettner (1997)
describe pursuit behavior during right-angle perturbations from circular pursuit, as well as quantifying predictive control during complex pursuit with comparisons between visual feedback delays and tracking delays during complex pursuit. Preliminary reports of the model have also appeared (Kettner et al. 1995
; Mahamud 1995
; Mahamud et al. 1996
).
Model
The structure of the network model is shown in terms of the neuronal architecture of the cerebellum in Fig. 1 and functionally in Fig. 2. Information enters the network via 440 pathways representing mossy fibers. Of these, some mossy fibers conveyed two-dimensional visual information about the position (n = 40) and velocity (n = 40) of the target image on the retina, and other mossy fibers provided information related to the position (n = 180) and velocity (n = 180) of the eye relative to the head. In each case, information was conveyed with time delays spanning ranges that are realistic for each population of mossy fibers. We denote the activation of mossy fiber i at time t by mi(t),i = 1, . . . , 440.

View larger version (42K):
[in this window]
[in a new window]
| FIG. 1.
Architecture of the cerebellar model. Mossy fibers are randomly connected to granule units. Dotted lines: hypothesized influence of Golgi cell (Go) receptive fields that allow only those granule cells with the highest activity to generate outputs. Axons from granule cells bifurcate into 2 parallel fibers that excite horizontal (HP) and vertical (VP) Purkinje units. Basket (B) and stellate (S) units receive excitatory parallel fiber input and inhibit Purkinje cells. The action of these unit types is collapsed into "Purkinje units" that allow both positive and negative "parallel fiber" weights. Climbing fiber (cf) inputs are used to train the network.
|
|

View larger version (14K):
[in this window]
[in a new window]
| FIG. 2.
Block diagram of the model. Although the brain stem integrator and the eye plant are modeled by the same set of equations in the model, these 2 functions are distinguished in the diagram to emphasize their different neural substrates and the idea that both proprioceptive and efference copy signals may provide eye position and velocity information. All lines indicate the flow of multivariate information, with the heavier arrow indicating the wider bandwidth associated with the expansive recoding of mossy inputs. Smaller boxes: pure delays in the model. Open arrowhead: indirect action climbing fiber training signals have on information throughput by the alteration of network weights via the learning rule. Visual input to the system is assumed to take the form of retinal error signals that are obtained by a subtraction at the node labeled S of target and eye position signals.
|
|
Retinal position error was coded by 40 visual mossy fibers (8 preferred position directions × 5 delays ranging from 80 to 120 ms in 10-ms steps). Similarly, retinal velocity error was conveyed by 40 fibers (8 preferred velocity directions × 5 delays ranging from 80 to 120 ms). In both cases, preferred directions ranged from 0 to 315° at intervals of 45°. Specifically, let e(t) denote the two-dimensional retinal position error vector at time t, which is the vector difference between the retinal target image and the center of view (the fovea) at time t (bold type indicates that a quantity is a vector); let ni denote the unit vector in the preferred direction of mossy fiber i; and let
i denote the delay by which mossy fiber i transmits information to the cerebellum. Then the activation of retinal position error mossy fiber i, i = 1, . . . , 40, at time t, is
|
(1)
|
where emax is the maximum expected magnitude of e(t) used to normalize mi(t). The centered dot indicates the inner, or scalar, product of the two vectors. Taking the maximum of the normalized inner product and 0 ensures that the mossy fiber activation is never negative. Similarly, the activation of retinal velocity error mossy fiber i, i = 41, . . . , 80, is
|
(2)
|
where
(t), the time derivative of e(t), is the retinal velocity error vector, and
max is the maximum expected magnitude of
(t).
This approximates the broad cosine spatial tuning (Fig. 3A) that has been observed experimentally (Mustari et al. 1988
; Suzuki et al. 1990
) for neurons in the dorsolateral pontine nucleus, which provides visual input to the cerebellum. The distribution of visual delays is derived from single-neuron recordings in the dorsolateral pontine nucleus (Mustari et al. 1988
; Suzuki et al. 1990
) and the cerebellar flocculus/paraflocculus (Miles et al. 1980
).

View larger version (15K):
[in this window]
[in a new window]
| FIG. 3.
Examples of mossy fiber response properties. A: visual mossy fibers had cosine tuning functions with a single preferred direction. In this figure firing rate was equal to the distance from the origin to the tuning function in the direction of target motion. Here the preferred direction of the mossy fiber was 45°. B: eye movement mossy fibers were tuned for horizontal and vertical movements with a variety of different slopes and thresholds. The figure shows the 9 different sets of tuning curves associated with rightward movement.
|
|
The second set of mossy fiber inputs provides signals related to eye movement at biologically realistic delays. Eye position is coded by 180 mossy fibers (4 preferred position directions × 3 thresholds × 3 slopes × 5 delays ranging from 0 to 40 ms in steps of 10 ms), and eye velocity is coded by 180 additional mossy fibers (4 preferred velocity directions × 3 thresholds × 3 slopes × 5 delays from 0 to 40 ms). These signals are based on responses from granular layer input units recorded from the cerebellar flocculus/paraflocculus (Lisberger and Fuchs 1978a
,b
; Miles et al. 1980
) and have preferred directions aligned with either the vertical or horizontal axes in four directions (up, down, right, left). Because of their relatively short delays, these signals are likely to arise from brain stem systems that either provide an efference copy of motor commands or are driven by proprioceptive inputs (Maekawa and Kimura 1980
).
Mathematically, the activation levels of eye position mossy fibers, which are the mossy fibers i, i = 81, . . . , 260, are defined as follows
|
(3)
|
where ni is the unit vector in the fiber's preferred direction, x(t
i)is the eye position vector delayed by
i ms, xmax is the maximum expected magnitude of x(t) used to normalize mi(t), and the parameters ai and bi serve to diversify the response properties of the mossy fibers. Specifically, ai is a vector whose two components determine response thresholds of mossy fiber i for horizontal and vertical eye position. For any mossy fiber i, we defined these components to both equal 0, 1/2, or 1. The scalar bi, which determines the response slope of mossy fiber i, was set to 1/4, 1/2, or 3/4. These values were chosen so that all combinations occurred with approximately equal frequencies across the population of mossy fibers. Examples of these linear response functions for one of the four preferred directions are illustrated in Fig. 3B. The activations of the 180 eye velocity mossy fibers i, i = 261, . . . , 440, were defined as in Eq. 3, but with the delayed eye velocity vector,
(t
i), substituted for the delayed eye position vector, x(t
i), and normalized by the maximum expected eye velocity magnitude.
Each of 6,000 granule units sums inputs from five randomly selected mossy fibers (Ito 1984
; Palkovits et al. 1972
). If for each j, j = 1, . . . , 6,000, Rj is a set of (indexes of) 5 mossy fibers selected randomly (uniformly without replacement) from the total of 440 mossy fibers, then the activation of granule unit j at time t is
|
(4)
|
The constants, hi, were selected randomly from the interval (0.75, 1.00) at the beginning of a simulation and then remain fixed.
In deriving a pattern of parallel fiber activity from granule unit activations, we assumed that a local competition takes place that allows only 5% of the granule units to be active at any given time. Thus the continuous-valued mossy fiber activation pattern is recoded into a sparsely activated higher-dimensional vector of binary-valued parallel fiber activity that improves pattern discrimination, storage capacity, and learning (Tyrrell and Willshaw 1992
). Marr (1969)
and Albus (1971)
assumed that this competition arose from inhibitory interactions by Golgi cells. To reduce computation times, we implemented this competition by dividing the granule cell population into 300 Golgi cell receptive fields each comprising 20 granule units, and allowed only the most active unit in each field to provide an output of 1 during any time step. Let Gj denote the set of (indexes of) the granule cells making up the Golgi cell receptive field in which granule cell j is located. Then the activation of parallel fiber j, j = 1, . . . , 6,000, at time t is
|
(5)
|
This results in 300 granule cells having an activity of 1, and the remaining 5,700 granule cells having an activity of 0. During the next simulated time step, granule cell activity is recomputed on the basis of new mossy fiber inputs. In practice, however, only a subset of the 300 active granule cells change state from one time step to the next.
Two units representing Purkinje cells generate horizontal and vertical movement commands in agreement with experimental reports that Purkinje cells fire primarily in relation to horizontal and vertical eye movement (Miles et al. 1980
; Stone and Lisberger 1990a
). In the brain, granule cell axons generate parallel fibers that excite Purkinje cells, as well as exciting basket and stellate cells that in turn inhibit Purkinje cells. This part of the cerebellar circuitry was modeled by defining the activation of the Purkinje unit k at time t, pk(t), to be a weighted sum of the activations at time t of all the model's parallel fibers
|
(6)
|
where wjk(t) represents the weight, or efficacy, of the synapse by which parallel fiber j influences Purkinje cell k,k = 1,2, and po is the Purkinje unit background firing rate. The parallel-fiber-to-Purkinje-unit weights were allowed to assume both positive and negative values to simulate the combined direct positive activation via parallel fiber input and indirect negative activation via basket and stellate cells. This assumption, made for the sake of computational simplicity, has been used in other cerebellar models (Fujita 1982a
,b
) and is justified mathematically by assuming that each weight, w, is the combination of an excitatory weight w+ and a negative weight w
so that w = w+
w
. Thus a positive synaptic weight corresponds to greater activation along the excitatory than the inhibitory pathway (w+ > w
), and a negative synaptic weight is associated with the opposite: greater activation along the inhibitory than the excitatory path (w
> w+). Changes in total weight, w, can correspond to changes in either the w+ pathway alone, the w
pathway alone, or a combination of both.
Outputs from the two Purkinje units generate horizontal and vertical eye movements via a model of the oculomotor plant that includes a neural integrator. Mathematically, this is accomplished in the simulation by the following two vector difference equations
|
(7)
|
|
(8)
|
where p(t) is the vector of Purkinje unit activities at time t, and p0 is the vector of background firing rates. These equations were derived from the second-order linear equation relating average motoneuron firing rate, s(t), with eye position, x, due to Keller (1973)
|
(9)
|
It is assumed that brain stem circuits account for the background firing rate, s0, and the position term, 4.0x(t), via circuits that include a neural integrator (see review by Fukushima et al. 1992
). Details of how this integration might be implemented by brain networks have been presented elsewhere (e.g., Anastasio and Robinson 1989
; Arnold and Robinson 1991
). The other two terms in the equation were assumed to relate to Purkinje unit output by the equation
|
(10)
|
This equation was solved for
(t) by substituting
(t) = [
(t)
(t
t)]/
t to derive Eq. 7, and x(t) was then obtained by integrating
(t).
On the basis of experimental results (Leung and Kettner 1997
; Robinson 1965
), large tracking errors were corrected by catch-up saccades that stepped the eye to the location of the target. Saccades were initiated 200 ms after retinal position error exceeded 0.25°, followed by a refractory period of 200 ms during which saccades were not allowed to occur. When the retinal error exceeded 0.25° during the refractory period, a saccade was initiated at the end of this period. This meant that corrective saccades were always separated by
200 ms. We assumed that saccades were generated by other brain systems outside of the cerebellum. Several explicit models of saccade generation exist (e.g., Arai et al. 1994
; Lefevre and Galiana 1992
), and we did not attempt to duplicate this work. Saccades were more frequent during the early stages of training and declined in magnitude and frequency as the network learned.
Learning in the model takes the form of changes in the synaptic weights, wjk, by which the parallel fibers influence Purkinje units. These weights change when a retinal velocity error (retinal slip) occurs in an appropriate temporal relationship with parallel fiber activation. Retinal velocity errors are sensed by the climbing fiber system and relayed to Purkinje units with a delay, d, that was set to 100 ms (Stone and Lisberger 1990b
). The model adopts the hypothesis of Klopf (1972, 1982) of a synaptic eligibility trace that allows changes to be made in the weights of the synapses that were active ~100 ms before each climbing fiber signal arrived at a Purkinje unit. The eligibility trace is a temporary memory trace, local to the synapse, that is initiated whenever the presynaptic parallel fiber is active. It makes the synapse "eligible" for being modified if, and when, a climbing fiber signal arrives at the postsynaptic Purkinje unit within a time window of several hundred milliseconds (see Fig. 4). This allows changes in these synaptic weights of Purkinje units whose activity participated in causing the error conveyed by the climbing fiber.

View larger version (13K):
[in this window]
[in a new window]
| FIG. 4.
Eligibility trace activation is broad with a peak at a delay of ~100 ms. In response to a step input from the parallel fiber input, f(t), the concentration of q(t) increases abruptly and then decays. These increases in q(t) then cause increases in the eligibility trace, r(t).
|
|
Although there are a number of ways to produce an eligibility trace, we selected a biologically reasonable mechanism based on slow "second-messenger" chemical processes that are postulated to underlie synaptic change in cerebellar neurons (see DISCUSSION). The model generates the time courses of the eligibility trace, rjk(t), by the following coupled first-order linear difference equations
|
(11)
|
|
(12)
|
where the parameters
,
,
, and
are all positive fractions. These equations provide a discrete time model of a series coupling of two "leaky integrators." The time courses of r and q in response to a brief pulse input f are illustrated in Fig. 4. Equation 11 implies that the cellular factor qjk(t) rises immediately in response to an increase in parallel fiber activity, fj(t), and otherwise decays toward 0. As elaborated in the DISCUSSION, we think of qjk(t) as the level of the protein kinase C (PKC) within the cell, the parameter
as representing the rate at which glutamate released by parallel fiber activity leads to increases in PKC via metabotropic glutamate receptors, and
as the rate of spontaneous inactivation of PKC. Similarly, Eq. 12 causes the eligibility trace level, rjk(t), to rise in response to increases in qjk(t) and to decay toward 0 otherwise. Here, we think of rjk(t) as the level of phosphorylation of
-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors for glutamate,
as the rate at which PKC levels promote this phosphorylation, and
as the rate at which other enzymes dephosphorylate the receptors. With appropriate selection of these parameters (
= 0.1,
= 0.1,
= 0.1, and
= 0.1) the synapse's eligibility level, rjk(t), rises to a peak ~100 ms after activation of the parallel fiber input and then declines (seeFig. 4).
A more abstract eligibility trace modeled as a pure delay,
, was used in some simulations to study the effects of the trace delay on learning. The result was an eligibility trace,
jk(t), defined simply by
|
(13)
|
When
was set equal to the climbing fiber delay, d, this trace produced excellent results for all trajectories in about half the simulated time steps required with the use of the trace defined by Eqs. 11 and 12. We used it in runs designed to study the effects of varying the trace delay (Fig. 8), and also to reduce computation times in the simulations that generated quantitative gain and phase data (Fig. 6). The trace defined by Eqs. 11 and 12 was used to generate the simulation data presented in Figs. 5 and 7; very similar results were obtained with the use of Eq. 13.

View larger version (12K):
[in this window]
[in a new window]
| FIG. 8.
Effects of the trace delay on model performance. Dotted line: saccade initiation error of 0.25°.
|
|

View larger version (16K):
[in this window]
[in a new window]
| FIG. 6.
Average component gain vs. phase values for the model and the monkey. In all instances, predictive pursuit is indicated by phases well in advance of the average visual feedback delays of 80 ms for the model and 90 ms for the monkey. Note that low-frequency components (represented by triangles for the model and asterisks for the monkey) were tracked with phase leads, whereas high-frequency components (represented by circles for the model and plus signs for the monkey) were tracked with phase lags. Open symbols: model performance. Other symbols: monkey behavior. High-, medium- (when present), and low-frequency component values are indicated for the model by circles, diamonds, and triangles, and for the monkey by plus signs, crosses, and asterisks, respectively. Model gain and phase values were based on performance along the H3V2 and H4H6V7 trajectories illustrated in Fig. 5 and the H2H3 trajectories used to study high- vs. low-frequency component tracking. Monkey gain and phase data were based on results reported in Kettner et al. (1996) and Leung and Kettner (1997) for 8 complex trajectories each at 3 waveform frequencies: H2H3, V2V3, H2V3, V2H3 (at 0.3, 0.4, and 0.5 Hz) and H4H6H7, V4V6V7, H4H6V7, V4V6H7 (at 0.05, 0.10, and 0.15 Hz). Here, for example, V4V6H7 indicates a combination of 2 vertical sinusoids and 1 horizontal sinusoid with frequencies 4, 6, and 7 times the waveform frequency, with each sinusoid in a waveform having the same average velocity. Each data point for the monkey data is based on performance during 100 waveform presentations (10 waveforms × 5 runs × 2 animals).
|
|

View larger version (46K):
[in this window]
[in a new window]
| FIG. 5.
Comparisons between model tracking (left) and monkey eye tracking (right) along complex trajectories. Heavy lines: target trajectories. Fine lines: tracking traces. A and C: tracking along H3V2 trajectories generated by combining horizontal (0.9 Hz, 3.33°) and vertical (0.6 Hz, 5°) sinusoids at 3 and 2 times the 0.3-Hz waveform frequency. B and D: tracking along H4H6V7 trajectories created by pairing 2 sinusoids at 4 and 6 times the 0.15-Hz waveform frequency on the horizontal axis (0.6 Hz, 5°; 0.9 Hz, 3.33°) with a 3rd sinusoid at 7 times the waveform frequency (1.05 Hz, 2.86°) on the vertical axis. Amplitudes were adjusted so that components had the same peak velocity.
|
|

View larger version (29K):
[in this window]
[in a new window]
| FIG. 7.
Comparisons between model tracking (A and B) and monkey eye tracking (C and D) along perturbed trajectories. Heavy lines: target trajectories. Fine lines: tracking traces. A and C: average tracking of waveforms consisting of 3.5 cycles of a circular trajectory (1.0 Hz, 5°) followed by 0.5 cycle of vertical sinusoidal pursuit. After the perturbation, both eye and model followed the curvature of the expected circular trajectory before smooth and then saccadic corrections were made. B and D: individual horizontal position traces for the cycle before and during the perturbation. Bars above the nonperturbed cycle correspond to the small pursuit lags that are observed before perturbation, in comparison with the bars above the perturbed cycles, which indicate estimates of the much larger smooth correction latencies.
|
|
The rising and falling eligibility level of a synapse provides one factor that influences synaptic weight changes; a second factor is activation of the climbing fiber. We model this process by the following rule for changing synaptic weights
|
(14)
|
where
is a parameter influencing the rate of learning, rjk(t) is the synapse's eligibility level, co is the background firing rate of the climbing fiber, and ck(t) is the climbing fiber's discharge rate defined by
|
(15)
|
Here, nk is a unit vector in the preferred direction of the horizontal or vertical Purkinje unit and
(t
d) is the retinal velocity error at time t
d, where d is the climbing fiber delay of 100 ms described above. In the simulations reported below,
ranged from 0.0001 to 0.00001, and
jk(t) replaced rjk(t) when the eligibility trace defined by Eq. 13 was used. The very low values of
needed for successful learning by the model, which uses a continuous representation of the climbing fiber signal, may correspond to the low rates of climbing fiber discharge observed physiologically (Ito 1984
).
We can summarize the behavior of the learning rule given by Eq. 14 as follows. For a given Purkinje unit, whenever its climbing fiber is firing above or below its background rate, the weight of each of its parallel fiber synapses is changed by an amount that depends on 1) the amount by which the climbing fiber's current activity differs from its background rate, 2) the current eligibility level of the synapse, and 3) the parameter
. A synapse's eligibility level is high to the extent that the synapse was active ~100 ms in the past. Climbing fiber activity deviates from its background rate to the extent that the target image was moving on the retina 100 ms in the past. Consequently, the learning rule selectively changes weights at those synapses that were active when retinal velocity errors occurred. In this way a set of weights is obtained that allows accurate target tracking by the network.
 |
METHODS |
The complex sum-of-sines stimuli used to test the model were identical to a subset of stimuli previously used to study monkey performance (Kettner et al. 1996
). Complex trajectories were created by combining two or three sinusoids equated for velocity along horizontal and/or vertical axes (see Fig. 5). Model and monkey pursuit performance was quantified by fitting sinusoids at component frequencies to target and eye velocity traces with the use of a least-squares regression procedure. Regions of these traces containing saccadic jumps were deleted with the use of custom computer programs, and the regressions were then based on the remaining data. Gain was defined as the ratio of eye and target velocity amplitudes; phase was the phase difference in milliseconds between eye and target. This analysis produced three gain/phase pairs for each of the three components associated with tracking along a sum-of-three-sines trajectory, and two gain/phase pairs for pursuit along sum-of-two-sines or circular trajectories. This analysis is mathematically equivalent to estimating the discrete Fourier components at component frequencies on the basis of records with missing data (Press et al. 1992
).
In a different set of simulations, the model was trained to track trajectories (see Fig. 7) that had previously been used to study perturbation responses in monkeys (see Leung and Kettner 1997
). These trajectories consisted of three cycles of circular target motion followed by a right-angle perturbation along a circle meridian during the fourth cycle. This four-cycle sequence was repeated
10 times for both the model and the monkey to evaluate response patterns that were consistent across repetitions of the perturbation. Quantitative estimates of visual feedback delays for both the model and the monkeys were based on the first smooth deviation from the circular trajectory after the perturbation had begun. A custom computer program first subtracted the eye position during the unperturbed cycle just before the perturbation from the eye position during the perturbed cycle. This subtraction created a "difference trace" that eliminated systematic deviations from the target trajectory during normal pursuit. A regression line was then fit to the difference trace for a 50-ms period that began 25 ms before the perturbation. Only those records that were free from saccades during the perturbation were used in our analyses, as verified by viewing trace records of eye position and velocity. The latency of smooth corrections for a session was defined as the time after the perturbation of the first deviation from the regression line that was maintained for 100 ms. In a related fashion, the latency of saccadic corrections was defined as the first occurrence of a single deviation of 40°/s in velocity difference records.
The model was trained afresh on each stimulus from the initial state defined in the description of the model, until we observed good tracking with a minimum of corrective saccades and no performance improvement with additional training. All the results described in this paper are based on asymptotic performance of this sort. Although model training could require several days of computer time, the simulated time required to reach performance asymptote ranged from 8 to 33 min depending on the trajectory being trained and the eligibility rule that was used. Model performance typically reached asymptote after 100,000 simulated 10-ms time steps for the sum-of-two-sines (H3H2 and H3V2) waveforms, and after 200,000 time steps for the sum-of-three-sines (H4H6V7) and circles-with-perturbation trajectories, with the use of the eligibility trace defined by Eqs. 11 and 12 with the optimal delay of 100 ms. This corresponded to 300 waveform presentations in each case, because the H3V2 and H2H3 waveforms had a repetition period of 3.33 s, which was half the repetition period of 6.67 s associated with the H4H6V7 and circles-with-perturbation waveforms. Training reached asymptote in about half the number of simulated time steps for the pure delay trace described by Eq. 13. Each model simulation was implemented in the C programming language running on DEC 5000 workstations, and required ~24 h to execute 50,000 simulated 10-ms time steps. No systematic attempt was made to study the effects of varying learning rate parameters or to optimize other model parameters, and it is likely that learning would have occurred more rapidly for a different set of model parameters.
We attempted to use similar training procedures for the model and the monkey, but the large times required to train the model necessitated several compromises. The first simplification was the elimination of waveforms that were 90° rotations of each other. Although the monkey results indicate clear differences in horizontal and vertical pursuit (Kettner et al. 1996
), no such differences were built into this initial version of the model. Second, the model simulations presented here were based on training for a single waveform, whereas each monkey was tested on all of the pursuit stimuli as well as several other waveform trajectories. In addition, the monkey was continually pursuing targets in the home cage between experiments. These differences in the stimulus sets processed by the model and the monkey are likely to explain many of the small performance differences that were observed.
Standard methods were used to collect pursuit data from monkeys tracking complex two-dimensional trajectories. These techniques have been reported in detail elsewhere (Kettner et al. 1996
), as will techniques for collecting data during perturbations from circular pursuit. Only a few key features of these experimental methods are described here. Two male rhesus monkeys were housed and maintained according to Guiding Principles in the Care and Use of Animals of the American Physiological Society. Care was taken to ensure the comfort of the animals during the experiment, and their general well being. An eye coil and stainless steel cylinder for head fixation were surgically implanted under deep anesthesia (20 mg/kg iv thiamylal sodium followed by 1% halothane) and aseptic conditions. The pursuit stimulus was a laser spot controlled by computer-driven mirror galvanometers that was back-projected onto a tangent screen 40 cm from the eyes. Eye position was monitored by a magnetic search coil system. Juice rewards were delivered by computer when the eye was within ±2° of the target for 500 ms.
 |
RESULTS |
Figure 5, A and B, shows results from simulations of the model during tracking along sum-of-two-sines and sum-of-three-sines trajectories. Corresponding data for a monkey tracking along the same trajectories are shown in Fig. 5, C and D, to the right of each model simulation. Both the model and the monkey showed excellent tracking performance along these complex target trajectories. For both the model simulations and the monkey experiments, sum-of-two-sines trajectories (V2H3 in Fig. 5, A and C) at a waveform (repeat) frequency of 0.30 Hz were created by summing an 0.6-Hz vertical sinusoid (V2) at 2 times the waveform frequency with an 0.9-Hz horizontal sinusoid (H3) at 3 times the waveform frequency of the trajectory. Similarly, sum-of-three-sines trajectories (H4H6V7 in Fig. 5, B and D) were created by summing three sinusoidal components, two horizontal (H4 and H6) and one vertical (V7), at frequencies 4, 6, and 7 times the waveform frequency of 0.15 Hz. The monkey tracking data were derived from our previously reported studies of component interactions during sum-of-sines pursuit (Kettner et al. 1996
).
Figure 6 shows quantitative measurements of performance for model simulations (open symbols) in terms of component gains and phases (see METHODS). Six sets of data points correspond to results for six trajectories: the H3V2 and H4H6V7 trajectories displayed in Fig. 5 and the H2H3 trajectory (discussed in more detail below) presented at four waveform frequencies of 0.3, 0.4, 0.5, and 0.6 Hz. Thus three gain/phase points define tracking for the high-, middle-, and low-frequency components of the H4H6V7 waveform, and two gain/phase points define high- and low-frequency tracking for each of the other five sum-of-two-sines trajectories. For the model, the average gain for all trajectory components was 0.97, whereas the average phase magnitude (absolute value) was 8 ms.
For comparison purposes, gain/phase pairs for monkey tracking of sum-of-sines trajectories are also presented in Fig. 6. These data were used previously to study component interactions during sum-of-sines pursuit (Kettner et al. 1996
). They have been reanalyzed here so that phase angles are now expressed as phase delays in milliseconds to allow comparisons with the feedback delays obtained from the perturbation experiments described below. The greater number of points associated with the monkey data is due to the larger number of stimulus trajectories and frequencies used to study the monkey: four sum-of-two-sines and four sum-of-three-sines trajectories were each studied at three waveform frequencies. For the monkey, the average gain for all trajectory components was 0.84, whereas the average phase magnitude (absolute value) was 20 ms. The model's performance is included in the wide range of behaviors observed for monkey, but its output is closer to optimal performance of zero phase and unity gain than is the monkey's performance.
Interestingly, the model showed consistent differences in tracking phase for high- versus low-frequency components presented on the same axis: the high-frequency component was tracked with a phase lag, whereas the low-frequency component was tracked with a phase lead. For example, for the H4H6V7 trajectory, on the horizontal axis, the H4 trajectory was tracked with a phase lead of 14 ms, whereas the H6 trajectory was tracked with a phase lag of 6 ms; on the vertical axis, the V7 component was tracked with a phase lead of 6 ms. To further test this observation, we trained the model to track the sum-of-two-sines waveform H2H3 at four waveform frequencies of 0.3, 0.4, 0.5, and 0.6 Hz. Each of the H2H3 trajectories was created by summing a horizontal sinusoid (H2) at 2 times the waveform frequency with a horizontal sinusoid (H3) at 3 times the waveform frequency. Without exception, the higher-frequency H3 component was tracked with a phase lag (average lag = 3 ms), whereas the low-frequency H2 component was tracked with a phase lead (average lead = 14 ms). The model did not show this lag-lead effect for phase when components were separated on horizontal and vertical axes. For the model, the tracking components for the V2H3 trajectory both showed phase leads (H3 lead = 8 ms; V2 lead = 6 ms). Similarly, the V7 component of the H4H6V7 waveform was tracked with a phase lead of 6 ms, even though it was the highest-frequency component, whereas the H6 and H4 components on the horizontal axis did follow the correct lag-lead relationship.
Similar results were obtained for our monkeys (Kettner et al. 1996
). Notice in Fig. 6 that the high-frequency component phases (plus signs) for monkeys were associated with phase lags, whereas low-frequency components (asterisks) were always tracked with phase leads. Overall, the monkey tracked high-frequency components with an average phase lag of 8 ms, whereas low-frequency components were tracked with an average phase lead of 6 ms. Similar to the model, the monkey showed a weaker lag-lead relationship when component sinusoids were on opposite horizontal and vertical axes, but, unlike the model, this lag-lead relationship was still present. Although it is unclear why this difference between model and monkey performance occurs, it suggests that there is a greater separation between horizontal and vertical control for the model than for the monkey.
The model's response to an infrequent right-angle change in target trajectory (Fig. 7, A and B) was used to study its ability to respond to a perturbation during ongoing two-dimensional pursuit. Monkey tracking data from Leung and Kettner (1997)
for the same trajectories are presented in Fig. 7, C and D. Here both the model and the monkey tracked a circular trajectory for three cycles before encountering a right-angle perturbation along a circle meridian on the fourth cycle. The model responded to unexpected perturbations in trajectory in much the same way the eye did. In Fig. 7A there is a continuation of the expected curved circular trajectory for about the same time period observed for the eye in Fig. 7C. Interestingly, the model had a somewhat stronger predictive component than the oculomotor system; it anticipated smoothly the abrupt change from an upward linear trajectory to the curved trajectory, whereas the eye stayed close to the upward path for a longer period of time. Unlike the eye, the model deviated slightly from the curved trajectory on those cycles when a perturbation did not occur.
We used this perturbation paradigm to quantify delays in visual feedback, to compare these estimates with delays during smooth pursuit along the sum-of-sines trajectories described above, and to demonstrate the clear use of predictive control during by the model with the use of the same techniques used previously for the monkey (Leung and Kettner 1997
). During circular pursuit before the perturbation, the model tracked the circle with average horizontal and vertical component gains of 0.95 and 1.00, and small average phase leads of 5 and 5 ms. These results compared well with average horizontal and vertical component gains of 0.96 and 0.84 and average phase lags of 4 and 2 ms for the monkey. In contrast to this excellent performance before the perturbation, the model showed visual feedback delays of 80 ms before it responded to the right-angle change in target direction after the perturbation. The monkey showed a similar average response latency of 90 ms. This indicates that there are limits to the tracking accuracy of the model that are similar to the limitations of the monkey pursuit system. Although the model was trained on the perturbation, it was not trained to produce the specific corrective response that was observed. Rather, the parameters of the model were constrained by physiological measurements and then trained to accurately match the perturbation trajectory, as indicated by a lack of retinal slip. Thus the model was trained to accurately respond to the right-angle perturbation but generated an overshoot similar to that exhibited by the eye.
The robustness of the model's learning rule was tested. For this purpose, the eligibility trace described in Eq. 13 was used. This eligibility trace did not increase and decrease gradually when the process modeled by Eqs. 11 and 12 was used, but instead provided a record of parallel fiber input at the parallel-fiber-to-Purkinje-cell synapse that was simply delayed by d ms. This "pure delay" trace allowed us to systematically study the effects of trace delay on network learning. For each delay, the model was reset to the starting conditions defined in METHODS and trained for 50,000 simulated time steps. Results from this analysis (Fig. 8) indicate that this eligibility trace resulted in good model performance for delays that ranged from 80 to 200 ms. Here we measured model performance on the basis of the root-mean-square error during the final 4,000 time steps of the simulation and used an error criterion of 0.25° (Fig. 8, dotted line) to distinguish poor performance from good performance. This is the error level that the model uses to initiate a saccade that moves the eye back on target.
 |
DISCUSSION |
Previous models of predictive pursuit
In most previous models of smooth pursuit (e.g., Robinson et al. 1986
; Young et al. 1968
) it has been assumed that the system can be simulated with a small number of interconnected processing units that control eye movement by feedback/feedforward control. These models were designed primarily to explain pursuit along unpredictable target trajectories and to account for responses to trajectory steps. They also generate short-lag predictive tracking along straight-line constant-velocity ramp trajectories, because they are velocity controllers (Rashbass 1961
; but see Pola and Wyatt 1980
). After the eye has acquired the target, short-lag pursuit is maintained with a simple constant-velocity output.
However, the designers of these models stressed from the outset that they were not designed to generate predictive tracking along more complex trajectories including sinusoids that are tracked with little or no lag (Bahill and McDonald 1983
; Dallos and Jones 1963
; Lisberger et al. 1981
). Here position, as well as velocity, acceleration, and the other, higher-order derivatives of sinusoidal motion varies continuously in time, and a constant controller output will not allow predictive tracking. One approach used to generate predictive tracking for sinusoidal target motions is to sum delayed position, velocity, and acceleration inputs to generate a sinusoidal control signal with a frequency equal to the common frequency of the inputs and a phase shift determined by the relative proportion of the inputs (Deno et al. 1995
; Krauzlis and Lisberger 1989
). For example, for sinusoidal motion both velocity and position vary sinusoidally with the same period, but with different phases: velocity is 90° ahead of position. Their combination (position + velocity) produces a sinusoid that has a frequency equal to shared frequency of the position and velocity components and a phase that is 45° ahead of position. In a similar fashion, other phase delays can be generated by varying the relative amplitudes of the position, velocity, and acceleration components. A suitable choice of these amplitudes can produce a control signal that compensates for some delays in visual feedback. Unfortunately, a particular choice of model parameters will only work for sinusoids with a single frequency. This approach will not work for sum-of-sines stimuli composed of different frequency sinusoids: a phase-shifted copy of a sum-of-sines waveform cannot be generated by summing position, velocity, and acceleration versions of that waveform.
To explain more complicated target motions, as well as sinusoidal pursuit, some investigators (e.g., Barnes and Asselman 1991
; Van den Berg 1988
) have suggested that the pursuit system has a buffer that stores the most recent cycle of a periodic target motion. If the next cycle is the same as the previous cycle, it is easy to see how this stored representation can be used to generate eye movements that predict target motion on the current cycle. Although this approach accounts for some experimental data, it is difficult to see how a buffer of this type would be implemented by a biological system. There are also problems in deciding what constitutes a cycle. For example, if a cycle transition is defined as a zero crossing, then it is difficult to determine where a complex waveform begins and ends. A related approach is to imagine that input signals are delayed by the repeat period of a complex waveform, but again, it is difficult to imagine how very long and accurate delays corresponding to long repeat periods would be produced by a biological system. For example, the "fish" trajectory described in the RESULTS section has a repeat period of 6.6 s.
Network model of smooth pursuit based on the cerebellum
The model of two-dimensional visual pursuit presented here is based rather directly on the anatomy and physiology of the flocculus and paraflocculus regions of the cerebellum. These areas are known to receive a rich diversity of visual, efference copy, and proprioceptive information via mossy fibers, and they send their Purkinje axons to brain stem nuclei known to control eye movements. Whereas previous pursuit models have utilized simplified representations of the visual- and movement-related signals, the present model uses 440 mossy fibers to simulate the rich diversity of mossy fiber latencies, directional tuning characteristics, and velocity and positional response properties that are observed experimentally (e.g., Miles et al. 1980
). These mossy fiber input signals are combined randomly to generate an expansive recoding into 6,000 parallel fibers in the simulated granular layer.
It is then hypothesized that the Golgi cell system acts via a mechanism akin to lateral inhibition (see Model) to eliminate the activity for all but a subset of the most active granule cells during each time step. This process leads to a more sparse representation that improves the network's ability to distinguish input activity patterns. The result is a sparse expansive recoding that enhances the richness and discriminability of the vector of potential inputs (see Albus 1971
; Marr 1969
; Tyrrell and Willshaw 1992
) that is furnished to Purkinje cells. It improves the probability that a predictable and unique relationship exists between the input vector and the desired response at any point in time, and this improves the likelihood that an adaptive solution will result from training. The range in latencies of mossy fiber input also enhances the likelihood of a solution, because it allows the dynamic properties of the pursuit task to be represented. The visual pathway samples diverse retinal signals that occurred 80-120 ms earlier, and the proprioceptive/efference copy pathway samples diverse eye motion signals that occurred 0-40 ms earlier. With 6,000 different parallel fibers, the number of unique patterns of potential input is quite enormous, and the trajectories that are generated can be quite complex (see Fig. 5B).
The parallel fibers then synapse onto two Purkinje units: one that controls horizontal and another that controls vertical eye motion on the basis of the reported division of the flocculus into horizontal and vertical microzones (e.g., Sato and Kawasaki 1991
). The Purkinje units generate eye motion via circuitry distinct from the cerebellar network. We follow the tradition that the cerebellar flocculus/parafloculus is one of several premotor structures generating eye velocity commands that converge on common brain stem circuitry that converts these commands into signals driving the extraocular muscles (see reviews by Fuchs et al. 1985
; Robinson 1981a
,b
). We assume that premotor circuits in the brain stem perform "neural integration" (see Arnold and Robinson 1991
; Robinson 1989
) and saccade generation (e.g., Arai et al. 1994
; Dean et al. 1994
; Lefevre and Galiana 1992
) and we model these functions with the use of simple equations. Thus we have not attempted to duplicate past modeling efforts that provide more detailed explanations of how these processes might be implemented. We have also restricted the scope of the model to the generation of eye movements when the head is fixed. In future work, we hope to account for vestibuloocular behaviors either via extensions of our cerebellar network or the incorporation of other network modeling results into the model (e.g., Anastasio and Robinson 1989
).
Comparison between monkey and model performance
In this first study we chose to explore the model's ability to learn to track both the simple and more complex one- and two-dimensional target motions that have been the subject of recent experimental studies in our laboratory (Kettner et al. 1996
; Leung and Kettner 1997
). These experiments provide a good test of the model's capabilities because the monkeys' pursuit along these trajectories requires both predictive and feedback tracking, and this work is compatible with other studies of smooth pursuit (see INTRODUCTION). The model simulations produced results similar to the monkey data: the average phase difference between eye and target during complex sum-of-sines pursuit was 20 ms for the monkey and 8 ms for the model. These values were much shorter than response delays after right-angle changes in target direction during predictive circular pursuit, which averaged 90 ms for the monkey (Leung and Kettner 1997
) and 80 ms for the model. Thus, for practiced target trajectories, both the monkey and the model can alter eye motion to follow a predictable target well before visual information about target motion can be processed by the pursuit system. The model also showed the lag-lead relationship that we (Kettner et al. 1995
) and others (e.g., Barnes et al. 1987
; Dallos and Jones 1963
; Yasui and Young 1984
) have observed for the high- versus low-frequency components of sum-of-sines trajectories when these components were on the same axis.
In addition, the model generated appropriate responses to perturbations from ongoing circular pursuit. The model's response to an abrupt freezing of horizontal target motion (bottom of the circle in Fig. 7A) is delayed by a visual reaction time. This is because the model's training experience consisted of a repeating trajectory that continued in its circular orbit in three of four cases. The occurrence of the perturbation was therefore not reliably predicted by visual signals 80-120 ms earlier, or by movement-related signals 0-40 ms earlier. The network was forced to rely on visual feedback that followed the perturbation by an average of 100 ms. In contrast, the subsequent resumption of horizontal target motion (top of the circle in Fig. 7A) followed the vertical segment of the target trajectory in all cases during training. Thus there was a reliable predictive relationship between parallel fiber activity and target trajectory that both the model and the monkey learned to exploit to reinitiate horizontal eye movement without having to rely on visual feedback.
Although the monkey behavior during perturbed circular pursuit is qualitatively similar to that of the model (Fig. 7), there are also differences. The model better anticipates the resumption of horizontal target motion at the top of the circle, and deviates from the circular trajectory during unperturbed cycles to a greater extent than the monkey. We conjecture that these and other differences can be explained by the much larger number of neurons in the monkey's cerebellar network, and its exposure to a much wider range of trajectories than the model. In terms of training time, the perturbation paradigm was one of the most difficult for the model to master, with a difficulty similar to the fish pattern (Fig. 5B). This is because the model had to learn two patterns, a circle and a half circle, and switch between the two at the point of perturbation. The difficult, but important, issue of how the model stores multiple trajectories and how it switches from one trajectory to another will provide one focus for future study. It should also be emphasized that the model does not include an explicit representation of the nonadaptive, feedback control systems hypothesized to exist by several researchers (e.g., Robinson et al. 1986
; Young et al. 1968
). This was done intentionally to determine how well the model would perform without them, but the addition of such a feedback loop external to our cerebellar network could improve the model's performance. Simultaneous control by several systems may correspond to what actuall