J Neurophysiol 88: 942-953, 2002;
0022-3077/02 $5.00

The Journal of Neurophysiology Vol. 88 No. 2 August 2002, pp. 942-953
Copyright ©2002 by the American Physiological Society

Forward Models in Visuomotor Control

Biren Mehta1,3 and Stefan Schaal2,3

 1Department of Biomedical Engineering, HNB-001 and  2Computer Science and Neuroscience, HNB-103, University of Southern California, Los Angeles, California 90089-2520; and  3Kawato Dynamic Brain Project (Exploratory Research for Advanced Technology/Japan Science and Technology Corporation), Soraku-gun, 619-02 Kyoto, Japan


    ABSTRACT

Mehta, Biren and Stefan Schaal. Forward Models in Visuomotor Control. J. Neurophysiol. 88: 942-953, 2002. In recent years, an increasing number of research projects investigated whether the central nervous system employs internal models in motor control. While inverse models in the control loop can be identified more readily in both motor behavior and the firing of single neurons, providing direct evidence for the existence of forward models is more complicated. In this paper, we will discuss such an identification of forward models in the context of the visuomotor control of an unstable dynamic system, the balancing of a pole on a finger. Pole balancing imposes stringent constraints on the biological controller, as it needs to cope with the large delays of visual information processing while keeping the pole at an unstable equilibrium. We hypothesize various model-based and non-model-based control schemes of how visuomotor control can be accomplished in this task, including Smith Predictors, predictors with Kalman filters, tapped-delay line control, and delay-uncompensated control. Behavioral experiments with human participants allow exclusion of most of the hypothesized control schemes. In the end, our data support the existence of a forward model in the sensory preprocessing loop of control. As an important part of our research, we will provide a discussion of when and how forward models can be identified and also the possible pitfalls in the search for forward models in control.


    INTRODUCTION

Successful motor control requires issuing motor commands for all the muscles of a movement system at the right time and of correct magnitude in response to internal and external sensations. Thus given a particular task goal, the problem of motor control can generally be formalized as finding a task-specific control policy $\pi$
$$\mathbf{u}(t) = \pi(\mathbf{x}(t), t, \boldsymbol{\alpha}) \qquad (1)$$
where $\mathbf{u}$ denotes the vector of motor commands, $\mathbf{x}$ the vector of all relevant internal states of the movement system and external states of the environment, $t$ represents the time parameter, and $\boldsymbol{\alpha}$ stands for the vector of open parameters that need to be adjusted during learning, e.g., the weights of a neural network. The formulation in Eq. 1 is very general and can be applied to any level of analysis, such as a detailed neuronal level or a more abstract joint angular level. If the function $\pi$ were known, the task goal could be achieved from every state $\mathbf{x}$ of the movement system. This theoretical view allows approaching motor control as the more formal question of how control policies are represented and learned in biology.

From a computational point of view, two basic strategies exist to generate the function $\pi$. In "direct control" (Fig. 1A) (e.g., Miall 1995), $\pi$ would be one single "black box" that directly maps sensations to actions, without meaningful intermediate steps and, in particular, without any attempts to explicitly model the movement system or task. For example, Barto, Fagg, Sitkoff, and Houk (1999) suggested such a direct control strategy as a model for motor learning in the cerebellum.



Fig. 1. A: sketch of a direct controller. B: sketch of an indirect controller.

The alternative to direct control is "indirect control." Instead of one step, indirect control explicitly employs multiple information-processing steps to build the control policy, and in particular it employs internal models. Figure 1B shows a typical representative of an indirect controller as commonly suggested for robotic and biological control (e.g., Wolpert 1997), including separate stages for trajectory planning, feedback,1 and feedforward control based on an inverse dynamics model.

Whether and when the brain employs indirect or direct control is an important scientific question for our understanding of the CNS because the learning mechanisms for the two methods are quite different. Learning indirect control is normally accomplished by (self-) supervised learning, whereas learning direct control falls into the domain of reinforcement learning, a learning method that is much more time consuming and harder to apply to large-scale learning tasks. Thus one can expect that the functional organization of motor control in the CNS would differ significantly depending on whether direct or indirect control was employed (Doya 2000). A crucial question in this discussion is whether the CNS employs internal models in motor control. Internal models are neural substrates that model input/output relationships and their inverses of kinematic and dynamic processes of the motor system and the environment---they are a key ingredient in indirect control. Thus identifying internal models in biological motor control constitutes an essential step toward identifying the brain's control and learning strategies.

In this paper, we will address the question of whether the human brain employs internal forward models to overcome the long sensory delays encountered in visuomotor control, i.e., whether the brain employs direct or indirect control. For this purpose, the feedback pathways in Fig. 1 should be conceived of as those of visual feedback, while other faster feedback loops like those on a spinal level are assumed to be part of the dynamics of the "movement system" box. Evidence for internal models in the brain has grown increasingly stronger (e.g., see Desmurget and Grafton 2000; Kawato 1999 for recent reviews). In particular, inverse dynamics models in oculomotor control (Gomi et al. 1998; Shidara et al. 1993) and arm control (Ebner 1998; Gribble and Ostry 1999; Shadmehr and Mussa-Ivaldi 1994; Thach 1998) have become rather well accepted. On the other hand, evidence for forward models in motor control is more complicated to establish because the output of forward models (a prediction of an event in the future) is not directly used as an output to the motor system but rather indirectly to facilitate additional control processes. Thus the forward model's output can rarely be observed explicitly in behavior and may be hard to find electrophysiologically without an a priori hypothesis about where to locate it in the distributed sensorimotor control loops of the brain.

In a generic control diagram, Fig. 2 illustrates "test points" that were chosen in behavioral investigations of forward models in the past. Most commonly, subjects interact with a manipulated object, and the task-level command uTask, which arises as a consequence of the current state x of the movement system, is monitored. For example, Flanagan and Wing (1997) used a hand-held load as a manipulated object and concluded from the zero-lag adjustment of grip force (i.e., the task-level command) as a function of load force that humans used a forward model to anticipate the load forces; in a related study, similar conclusions were drawn for adjusting the normal force exerted on a sliding object (Flanagan and Lolley 2001). Miall and his colleagues (Miall 1986; Miall et al. 1993) investigated manual tracking movement of sinusoidal targets on a computer screen with a joystick. The tracking characteristics of subjects when exposed to additional delays in the control loop suggested a forward model-based control strategy. Bushan and Shadmehr (1999) employed virtual force fields as the manipulated object and concluded from the learning behavior of their subjects that a forward model-based controller gave the most suitable explanation of their data. An alternative method of investigation was suggested by Wolpert et al. (1995). These authors asked subjects to estimate, by introspection, their Cartesian hand positions at the end of reaching movements that were executed without visual feedback under force perturbations. From the systematic structure of the error in position estimates as a function of movement duration, the authors inferred a forward model-based sensory preprocessing similar to that in a Kalman filter. Using a similar introspection strategy in a more perceptual study, Blakemore et al. (Blakemore et al. 1999, 2001; Blakemore, Wolpert, and Frith 2000) hypothesized a forward model in the sensory preprocessing loop that allows canceling of self-generated tactile stimuli as opposed to externally generated stimuli.



Fig. 2. A generic control diagram for motor control with a manipulated object. The "biological system" box denotes the entire movement system, including the CNS. Inside, we distinguish between a possible sensory preprocessing block (Wolpert et al. 1995), the control policy that generates the motor commands for the musculoskeletal system (called "movement system"), and a possible sensory delay of t in the feedback loop. The movement system interacts with a manipulated object, which has its own dynamics and task state.

Despite the elegant experimental setups in all these studies, none of them could rule out that subjects used a direct control strategy instead of a model-based indirect approach. The theoretical justification for this statement results from the fact that every indirect controller can be replaced by a direct controller that has exactly the same input/output function.

That this statement is true can be illustrated from a neural network metaphor. Given an indirect control policy $\pi$ that maps x to u (cf. Eq. 1), a neural network can be trained on data pairs (x, u) drawn from this function. Because well-chosen neural networks can theoretically approximate arbitrary functions with arbitrary accuracy (e.g., Bishop 1995), the network can be assumed to learn the indirect policy perfectly after some time. However, the network retains none of the modularity of the indirect controller and thus becomes a direct controller.2 This fact poses significant limitations on investigations of forward models in behavioral motor control experiments: most often they are simply not identifiable. There is, however, one major difference between a direct and a forward model-based controller: the forward model in the controller can be used for filling in sensory inputs when they are missing, i.e., it can predict. This distinction motivated the experimental procedure of the present study.

The goal of the present study was to probe the existence of forward models in a "blank-out" experiment (e.g., Hah and Jagacinski 1994, Magdaleno et al. 1969), i.e., an experiment that randomly blocked the vision of the manipulated object. The manipulated object was a pole, and the task was to balance the pole stably at a fixed setpoint. Pole-balancing was chosen as an experimental task because, in contrast to, for instance, sinusoidal tracking tasks, memorized motor commands cannot be used to achieve the task goal---pole-balancing crucially depends on closed-loop visual feedback due to the pole being balanced at an unstable equilibrium and the stochastic perturbations arising in normal motor behavior. Additionally, pole balancing can be analyzed in terms of linear control theory because the realizable balancing regime is approximately linear.

In balancing a real pole on a finger and a virtual pole on a computer screen, we investigated whether the CNS employs one of the control hypotheses illustrated in Fig. 3. This set of hypotheses constitutes the core methods that have been suggested in previous work for visuomotor control with delays in both biological and control theoretic motor control. In the same way as Fig. 2, this figure distinguishes between a separate sensory preprocessing and a control stage, just that it fills in several detailed suggestions of the control theoretic implementation of these stages either with or without a forward model.



Fig. 3. Control hypotheses of our experiments that can be inserted for the "control policy" and "sensory preprocessing" blocks of Fig. 2. A-D: different control policies that can be combined with the sensory preprocessing strategies (E and F).

Sensory preprocessing can essentially be accomplished in two different ways: either model based (Fig. 3F) or non-model based (Fig. 3E). The archetypal example of model-based sensory preprocessing is a Kalman filter3 (Kalman 1960), as illustrated in Fig. 3F. Any non-model-based sensory processing, e.g., filtering by temporal averaging, etc., is not relevant for our arguments such that we left the box in Fig. 3E an identity mapping.

Control policies are more complex. Given that the pole-balancing task is an approximately linear control task, all control policies in Fig. 3, A-D, are discussed as linear control systems; the desired behavior of the controllers is to keep the pole in a particular desired state, i.e., an upright angular position at a particular Cartesian position in the workspace.

Figure 3A shows the simplest possible controller, a linear negative feedback controller that does not compensate for the delays in the sensory input. This controller is rather sensitive to the magnitude of the control gains K as it can become unstable due to the delays.
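The sensitivity of this controller to delay can be illustrated numerically. The following Python sketch is not the analysis code of this study. It assumes the linearized 60-Hz pole dynamics given in Eq. 3 (METHODS), a gravitational acceleration of 9.81 m/s² (a value the text does not state), and hypothetical gains K obtained by pole placement (Ackermann's formula) for the delay-free case. The controller is then fed a state that is d ticks old, using a stacked-state construction in which the selection matrix simply picks the oldest stored state (also an assumption), and the spectral radius of the resulting closed loop decides stability.

```python
import numpy as np

TAU = 1.0 / 60.0   # sampling interval in s (Eq. 3)
LAM = 13.2         # squared eigenfrequency in 1/s^2 (Eq. 3)
G = 9.81           # gravitational acceleration in m/s^2 (assumed value)

A = np.array([[1.0, TAU, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, TAU],
              [0.0, 0.0, LAM * TAU, 1.0]])
b = np.array([0.0, TAU, 0.0, LAM * TAU / G])


def ackermann_gains(A, b, poles):
    """Gains K such that A + b K has the requested eigenvalues (u = K x)."""
    n = A.shape[0]
    ctrb = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
    coeffs = np.poly(poles)  # characteristic polynomial, highest power first
    phi = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
    e_last = np.zeros(n)
    e_last[-1] = 1.0
    return -(np.linalg.solve(ctrb.T, e_last) @ phi)


def delayed_closed_loop(A, b, K, d):
    """Closed-loop matrix of the stacked state [x_n, x_{n-1}, ..., x_{n-d}]."""
    n = A.shape[0]
    size = n * (d + 1)
    A_aug = np.zeros((size, size))
    A_aug[:n, :n] = A
    for i in range(1, d + 1):          # shift stored states back by one tick
        A_aug[i * n:(i + 1) * n, (i - 1) * n:i * n] = np.eye(n)
    B_aug = np.zeros((size, 1))
    B_aug[:n, 0] = b
    C_aug = np.zeros((n, size))        # controller sees only the oldest state
    C_aug[:, d * n:(d + 1) * n] = np.eye(n)
    return A_aug + B_aug @ K.reshape(1, n) @ C_aug


def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def max_tolerable_delay(A, b, K, d_max=30):
    """Largest d such that every delay 0..d gives a stable loop (-1 if none)."""
    best = -1
    for d in range(d_max + 1):
        if spectral_radius(delayed_closed_loop(A, b, K, d)) >= 1.0:
            break
        best = d
    return best


K = ackermann_gains(A, b, poles=[0.90, 0.92, 0.94, 0.96])  # hypothetical design
d_star = max_tolerable_delay(A, b, K)
print("delay-free spectral radius:",
      spectral_radius(delayed_closed_loop(A, b, K, 0)))
print("largest tolerated delay: %d ticks (%.0f ms)" % (d_star, 1000 * d_star * TAU))
```

The tolerated delay depends on the chosen gains, so the printed number says nothing about human subjects; the sketch only shows how gain magnitude and delay interact in a controller of the kind shown in Fig. 3A.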

Figure 3B illustrates a direct controller that augments its inputs by efference copies called a "tapped delay line" controller. The augmentation of the input representation by a history of previous motor commands is control theoretically a powerful tool to estimate hidden-state variables, e.g., the actual state of the movement system from delayed sensory input (Priestley 1981). Tapped delay-line control (cf. METHODS) can compensate for delays very well without the need of a forward model.

An alternative delay-compensated controller is shown in Fig. 3C. A forward model is employed to predict out the delays in the sensory input. For this purpose, efference copies of the commands during the delay period need to be given to the forward model. Similar to the direct control in Fig. 3B, very good delay compensation can be accomplished.
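For the linear, noise-free case, the relation between the forward-model controller of Fig. 3C and the tapped delay-line controller of Fig. 3B can be made explicit. The sketch below is illustrative only: it assumes the discrete pole dynamics of Eq. 3 (METHODS) with g = 9.81 m/s², a delay of d = 12 ticks, and an arbitrary gain vector K that is not fitted to any data. The forward model rolls the delayed state forward using the efference copies; substituting that prediction into the feedback law yields fixed weights on the delayed state and on the past commands, which is the input set of a tapped delay line.

```python
import numpy as np

TAU, LAM, G = 1.0 / 60.0, 13.2, 9.81   # G is an assumed value

A = np.array([[1.0, TAU, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, TAU],
              [0.0, 0.0, LAM * TAU, 1.0]])
b = np.array([0.0, TAU, 0.0, LAM * TAU / G])


def forward_model_predict(x_delayed, u_hist):
    """Roll a d-tick-old state forward; u_hist = [u_{n-d}, ..., u_{n-1}]."""
    x = np.array(x_delayed, dtype=float)
    for u in u_hist:
        x = A @ x + b * u
    return x


def tapped_delay_weights(K, d):
    """Weights with K @ x_hat = W_x @ x_{n-d} + sum_j w_u[j] * u_{n-1-j}."""
    W_x = K @ np.linalg.matrix_power(A, d)
    w_u = np.array([K @ np.linalg.matrix_power(A, j) @ b for j in range(d)])
    return W_x, w_u


rng = np.random.default_rng(0)
d = 12                                    # hypothetical delay (200 ms at 60 Hz)
K = np.array([2.0, 3.0, -40.0, -10.0])    # arbitrary illustrative gains
x_delayed = 0.01 * rng.normal(size=4)     # state seen d ticks ago
u_hist = rng.normal(size=d)               # efference copies, oldest first

# Fig. 3C: predict the current state, then apply feedback gains.
u_forward = K @ forward_model_predict(x_delayed, u_hist)

# Fig. 3B: one linear map on delayed state plus command history.
W_x, w_u = tapped_delay_weights(K, d)
u_tapped = W_x @ x_delayed + w_u @ u_hist[::-1]   # w_u[j] pairs with u_{n-1-j}

print(u_forward, u_tapped)
```

Both commands agree up to rounding, which illustrates why input/output behavior alone cannot separate the two schemes.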

A last control hypothesis is the Smith Predictor (Fig. 3D), suggested by Miall et al. (1993). The Smith Predictor compensates for sensory delays by generating control commands out of a "mental simulation" of the pole based on a forward-dynamics model. This strategy allows for a very fast internal feedback loop, indicated by the thick arrows in Fig. 3D. An outer loop makes sure that the mental simulation remains synchronized with the real pole by employing a model of the delay in the sensory feedback. This control scheme is elegant and can deal with long sensory delays. More details can be found in Miall et al. (1993).

The hypotheses in Fig. 3 allow for eight different combinations of sensory preprocessing and control. The goal of the present paper is to provide empirical evidence of which combination is the most likely in human behavior. Our results will provide evidence in favor of a forward model in the sensory preprocessing loop (Fig. 3F) and against delay-uncompensated control (Fig. 3A) and the Smith Predictor (Fig. 3D). However, we will not be able to make any further distinction whether simple predictive control (Fig. 3C) or tapped delay-line control (Fig. 3B) is employed by the CNS, a constraint that we believe is shared with many other studies on identifying model-based control and has not been properly stressed before.


    METHODS

Participants

A total of 11 volunteers from our laboratory participated in the experiments. Their age ranged between 22 and 44 years. The ethics committee had approved the experiments and subjects gave their informed consent prior to their inclusion in the study. In addition, a 7-df anthropomorphic robot arm was programmed to balance the actual pole using various control strategies. The robot served as a control subject whose control strategy and sensorimotor delays were known. It participated in experiments 1 and 2 in the following text, in exactly the same way as the human subjects.

Experiments

Two kinds of experiments were conducted. Actual pole balancing required that participants balanced a pole (a 1-m-long cylindrical wooden rod with 0.01-m diam and a small metal weight attached at the upper tip to adjust the eigenfrequency) in three-dimensional (3D) space while standing upright in an unconstrained laboratory environment (Fig. 4A). Virtual pole balancing involved balancing a simulated two-dimensional pole on a computer screen with a manipulandum (Fig. 4B); the simulated pole had the same eigenfrequency as the actual pole. The following experimental conditions were tested in both virtual and actual pole balancing, unless mentioned otherwise.



Fig. 4. The experimental setup of actual pole balancing (A) and virtual pole balancing (B).

EXPERIMENT 1: NORMAL BALANCING. Subjects balanced the pole without any experimental manipulations and were instructed to keep the pole as accurately as possible at a balancing setpoint xdesired. In actual balancing, subjects used a table tennis racket to support the lower end of the pole in order to exclude the influence of haptic perception as much as possible (Fig. 4). Subjects reported this manipulation as inconvenient but adjusted to it within about 1 min of training. In pilot tests, we also confirmed that it is impossible to balance a pole blindfolded, i.e., solely based on haptic information. The balancing setpoint in actual balancing was to keep the pole's angular position zero (i.e., a vertical pole) and the pole's horizontal position as close as possible to a point marked on the floor. In virtual balancing, the pole was to be kept vertically at a line in the middle of the computer screen. Ten 30-s trials were collected from each subject.

EXPERIMENT 2: PERTURBATION TRIALS. While subjects balanced the pole, a small random perturbation was delivered to the tip of the pole. In actual pole balancing, the experimenter manually delivered the perturbation from behind the subject with a long wooden rod (Fig. 4A). The rod very briefly touched a thin, very lightweight aluminum extension tube that was attached to the pole tip. Because subjects wore a baseball hat with a shield over their eyes, the perturbation was outside of their visual field. The effectiveness of the baseball hat was verified with fake perturbation movements of the experimenter that did not touch the pole. In virtual pole balancing, the computer program added a brief force pulse to the tip of the pole. At least 20 perturbations were collected from each subject in as many 30-s trials as needed.

EXPERIMENT 3: BLANK-OUT TRIALS. Blank-out trials were only conducted in virtual pole balancing. In blank-out balancing, the pole disappeared from the computer screen at random times for a randomly chosen period between 450 and 550 ms while the subject was balancing. During the blank-out, the computer continued simulating the pole dynamics as in normal trials, using the manipulandum's acceleration as control input. Subjects were instructed to ignore the blank-out and to continue balancing. To avoid that balancing was lost during blank-outs, blank-outs were only triggered if the pole was not too close to either end of the computer screen and if the pole was not too close to an upright still position. The latter pole state resulted in a pole behavior that was highly stochastic, i.e., unpredictable, as any little perturbation could make the pole fall to the left or to the right. Ten 30-s trials were collected from each subject, resulting in about 30 blank-outs per subject.

Data recording

ACTUAL POLE BALANCING. As illustrated in Fig. 4A, the 3D Cartesian positions of the pole were recorded from two color markers attached to the pole, tracked by a color vision system (QuickMag, Japan) at 60-Hz sampling frequency. During perturbation trials, the onset of the perturbation was recorded from a touch sensor, mounted to the distal end of a lightweight rod, and converted by a 12-bit A/D board in a VME bus. Two Motorola MVME68040 CPUs in the VME bus, running the real-time operating system VxWorks (Windriver Systems), collected both vision and A/D data and stored it on a SunSparc workstation for further processing.

VIRTUAL POLE BALANCING. The pole dynamics of the actual pole in the preceding text was simulated on a Macintosh G3 computer using Euler integration of the equations of motion at 600 Hz. The equations of motion are given by
$$\ddot{\theta} = \frac{m c_m g}{I}\left(\sin\theta + \frac{u}{g}\cos\theta\right) = \lambda\left(\sin\theta + \frac{u}{g}\cos\theta\right) \qquad (2)$$

$$\ddot{x} = u$$
where $x$ and $\theta$ are the position of the lower end of the pole and its angular orientation (cf. Fig. 4B), respectively, $m$ is the pole's mass, $c_m$ the pole's center of mass, $I$ the pole inertia, and $g$ the gravitational constant. For notational convenience, we combined these constants in the variable $\lambda$, corresponding to the squared eigenfrequency of the pole. The mass of the pole was determined with a scale, while center of mass and inertia were determined from standard textbook formulae for the cylindrical wooden rod and for the mass of the weight at the top of the pole, which was treated as a point mass. This procedure resulted in $\lambda = 13.2\ \mathrm{s}^{-2}$ and was confirmed in a pendulum experiment that determined the eigenfrequency of the pole experimentally to be $f = 3.6\ \mathrm{s}^{-1}$, which compared well to the theoretical value of $f_{theory} = \sqrt{\lambda} = 3.62\ \mathrm{s}^{-1}$. On a 19-in screen, the 1-m-long pole appeared to be 0.25 m long. The graphics update rate was 75 Hz. A 0.75-m-long wooden rod, with one end attached to the vertically oriented axis of a high-resolution rotary optical encoder, centered right under the computer screen, served as a manipulandum to move the base of the pole (cf. Fig. 4B). Because the angular displacement of the rod remained in the ±30° range, the motion of the tip of the rod was approximately linear and was transformed into the pole's motion on the screen in a 1:1 scale. Transverse motion of the rod correlated with movement of the bottom end of the pole. Angular position and velocity of the optical encoder were collected by a Motorola 68332 microcomputer at 600-Hz sampling frequency. This processor possesses a special hardware feature to extract high-resolution velocity signals from optical encoders directly. The velocity was differentiated and filtered with a second-order Butterworth filter (cutoff frequency, 15 Hz) to obtain accelerations. Position, velocity, and acceleration were communicated to the Macintosh G3 computer through a serial connection at 200 Hz. Acceleration of the manipulandum served as the motor command $u$ in Eq. 2 to the pole. Subjects needed to practice for about 1 h until they felt comfortable with the task, but could then achieve 30-s-long balancing trials without problems.
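As an illustration of the simulation step described above, the following Python sketch integrates Eq. 2 with explicit Euler steps at 600 Hz. It is a reconstruction for the reader, not the original Macintosh program: the value g = 9.81 m/s², the function names, and the test conditions (no control input, a 1° initial tilt) are assumptions made here.

```python
import numpy as np

LAM = 13.2        # squared eigenfrequency in 1/s^2, from the text
G = 9.81          # gravitational acceleration in m/s^2 (assumed value)
DT = 1.0 / 600.0  # Euler step for the 600-Hz integration


def euler_step(state, u, dt=DT):
    """One explicit Euler step of Eq. 2.

    state = (x, x_dot, theta, theta_dot); u is the acceleration of the
    lower end of the pole, i.e., the motor command.
    """
    x, x_dot, theta, theta_dot = state
    theta_ddot = LAM * (np.sin(theta) + (u / G) * np.cos(theta))
    return np.array([x + dt * x_dot,
                     x_dot + dt * u,
                     theta + dt * theta_dot,
                     theta_dot + dt * theta_ddot])


def simulate(state0, u_of_t, duration):
    """Integrate from state0 for `duration` seconds with command u_of_t(t)."""
    n_steps = int(round(duration / DT))
    state = np.asarray(state0, dtype=float)
    for k in range(n_steps):
        state = euler_step(state, u_of_t(k * DT))
    return state


def no_control(t):
    return 0.0


upright = simulate([0.0, 0.0, 0.0, 0.0], no_control, 0.5)
tilted = simulate([0.0, 0.0, np.radians(1.0), 0.0], no_control, 0.5)
print("angle after 0.5 s without control (deg):", np.degrees(tilted[2]))
```

Without control input the exactly upright pole stays put, whereas a small tilt grows within half a second, which is why the task depends on continuous feedback.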

Experimental procedure

Participants were initially asked to practice the pole balancing until they felt comfortable that they could sustain 30-s-long trials with perturbations. In actual pole balancing, subjects first performed experiment 1, then experiment 2. In virtual pole balancing, experiment 3 was added to this sequence. Trials in which the pole was dropped were repeated---dropping happened in less than 5% of all trials. The robot participated as a subject in actual pole balancing in Experiments 1 and 2. It was programmed to use the tapped-delay line control strategy (Fig. 3, B and E) in one set of trials, and the forward model-based controller (Fig. 3, C and F) in a second set of trials.

Data analysis

In both actual and virtual pole balancing, the data of each trial contained the pole's position, velocity, acceleration, angular position, and angular velocity, collected at 60 Hz, the frequency of NTSC video signals. For virtual pole balancing, these five quantities were scalar values. For actual pole balancing, position, velocity, and acceleration were 3D data and the pole's angular position and angular velocity were two-dimensional vectors. We neglected the vertical movement component in actual pole balancing and just treated 3D pole balancing as independent control in two orthogonal planes, the sagittal and the frontal-parallel plane. Due to the small deviations of the pole from the vertical in pole balancing (about ±12° maximum across all conditions and subjects), this simplification is mathematically justified. Thus, every actual pole balancing trial generated two independent data sets, one for sagittal and one for frontal-parallel control. From this point onwards, actual and virtual pole balancing underwent the same data analyses.

DATA FILTERING. All data traces were zero-lag second-order Butterworth filtered with a cutoff frequency of 4 Hz. This relatively low cutoff frequency was the optimal cutoff frequency selected by an optimization process over a range of possible cutoff frequencies. The optimization criterion was to find the best reconstruction of the linearized discrete pole dynamics from the data; linearization was justified since subjects remained in a ±12° angular range of the vertical pole position. The linearized discrete pole dynamics obeys the equations
$$\mathbf{x}^{n+1} = \mathbf{A}\mathbf{x}^{n} + \mathbf{b}u^{n}$$
where
$$\mathbf{x} = \begin{pmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{pmatrix},\quad \mathbf{A} = \begin{pmatrix} 1 & \tau & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \frac{1}{60} \\ 0 & 0 & \lambda\tau & 1 \end{pmatrix},\quad \mathbf{b} = \begin{pmatrix} 0 \\ \frac{1}{60} \\ 0 \\ \lambda\tau/g \end{pmatrix},\quad \tau = \frac{1}{60}\ \mathrm{s},\quad \lambda = 13.2\ \frac{1}{\mathrm{s}^{2}} \qquad (3)$$
Using the collected data, the constants $\mathbf{A}$ and $\mathbf{b}$ can be estimated by linear regression analysis of the function $\mathbf{x}^{n+1} = f(\mathbf{x}^{n}, u^{n}) + \mathbf{c}$, where the command $u^{n}$ corresponds to the acceleration of the lower end of the pole (i.e., the finger tip in actual pole balancing) and $\mathbf{c}$ accounts for a constant offset. The offset is due to the fact that the Cartesian coordinate system of our data recording did not have its origin at the instructed setpoint xdesired of pole balancing. The optimal filter cutoff frequency was the one that resulted in an estimated $\hat{\mathbf{A}}$ and $\hat{\mathbf{b}}$ that were the closest to the original values in Eq. 3; we achieved an almost perfect match in this way. It should be noted that the cutoff frequency parameter could be increased significantly without any effect on the results of this paper. We tried cutoff frequencies up to 7 Hz; for these higher values, the estimated $\hat{\mathbf{A}}$ and $\hat{\mathbf{b}}$ were simply not as close to the known pole dynamics.
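The regression step can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the recorded trials: the states and commands are drawn at random, the offset vector is hypothetical, g = 9.81 m/s² is an assumed value, and the function name `fit_linear_dynamics` is introduced here only for the example.

```python
import numpy as np

TAU, LAM, G = 1.0 / 60.0, 13.2, 9.81   # G is an assumed value

A_true = np.array([[1.0, TAU, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, TAU],
                   [0.0, 0.0, LAM * TAU, 1.0]])
b_true = np.array([0.0, TAU, 0.0, LAM * TAU / G])
c_true = np.array([0.02, 0.0, 0.001, 0.0])   # hypothetical constant offset


def fit_linear_dynamics(X, u, X_next):
    """Least-squares fit of x^{n+1} = A x^n + b u^n + c.

    X and X_next have one state per row; u holds one command per row.
    """
    design = np.column_stack([X, u, np.ones(len(u))])
    coef, *_ = np.linalg.lstsq(design, X_next, rcond=None)
    n = X.shape[1]
    A_hat = coef[:n].T      # rows of coef are regressors, columns are outputs
    b_hat = coef[n]
    c_hat = coef[n + 1]
    return A_hat, b_hat, c_hat


# Synthetic, noise-free samples (assumption: stands in for filtered data).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
u = rng.normal(size=500)
X_next = X @ A_true.T + np.outer(u, b_true) + c_true

A_hat, b_hat, c_hat = fit_linear_dynamics(X, u, X_next)

# One possible closeness score for comparing filter cutoffs (an assumption;
# the text does not specify the distance measure that was used).
mismatch = np.linalg.norm(A_hat - A_true) + np.linalg.norm(b_hat - b_true)
print("mismatch to known dynamics:", mismatch)
```

With real recordings, the same fit would be repeated after filtering at each candidate cutoff frequency, and the cutoff with the smallest mismatch would be retained.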

TRANSLATION OF RECORDED DATA TO ZERO EQUILIBRIUM POINT. From the regressed coefficients $\hat{\mathbf{A}}$, $\hat{\mathbf{b}}$, and $\mathbf{c}$ of the previous analysis, obtained subject specifically from all normal pole balancing trials, the balancing setpoint xdesired of each subject can be determined empirically to account for the possibility that subjects did not exactly pursue the instructed setpoint that was given by the experimenter. At the setpoint, which is an equilibrium point of pole balancing, we have $u = 0$, i.e., no acceleration of the pole is required as the pole should be standing still in a vertical position, and also $\mathbf{x}^{n+1} = \mathbf{x}^{n}$ as the pole should not be moving. From these two conditions, inserted into Eq. 3 with the offset $\mathbf{c}$ included, one obtains the equation for the setpoint as $\mathbf{x}_{desired} = (\mathbf{I} - \hat{\mathbf{A}})^{-1}\mathbf{c}$. We subtracted the empirically determined xdesired per subject from all recorded data x such that after this translation, the equilibrium point for every subject was at xdesired = 0. We also checked whether the empirically determined setpoint stayed close to the instructed setpoint and always found very good accordance with our instructions.

POWER SPECTRUM. To test for intermittency in subjects' control strategy, the unfiltered velocity data of the base of the pole was processed by FFT analysis (Matlab's Welch's averaged periodogram method, de-trended, using a Hanning window size of 512 data points). FFT analyses were performed per subject by pooling all normal trials of a subject.
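For readers without access to Matlab, an averaged periodogram of this kind can be approximated with NumPy alone. The sketch below is a stand-in, not the routine used in the study: the 50% segment overlap, the removal of the segment mean as the de-trending step, and the 2-Hz test signal are all assumptions made for illustration.

```python
import numpy as np


def welch_psd(signal, fs, nperseg=512):
    """Averaged periodogram: Hann window, 50% overlap, mean removed per segment."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(nperseg)
    step = nperseg // 2
    scale = fs * np.sum(window ** 2)
    spectra = []
    for start in range(0, len(signal) - nperseg + 1, step):
        seg = signal[start:start + nperseg]
        seg = (seg - seg.mean()) * window
        spectra.append(np.abs(np.fft.rfft(seg)) ** 2 / scale)
    psd = np.mean(spectra, axis=0)
    psd[1:-1] *= 2.0   # fold negative frequencies (nperseg is even)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd


fs = 60.0                                      # sampling rate of the data
t = np.arange(1800) / fs                       # one 30-s trial
velocity = 0.05 * np.sin(2 * np.pi * 2.0 * t)  # synthetic 2-Hz base velocity

freqs, psd = welch_psd(velocity, fs)
peak_freq = freqs[np.argmax(psd)]
print("spectral peak near %.2f Hz" % peak_freq)
```

With 512-point segments at 60 Hz the frequency resolution is about 0.12 Hz, so a peak can only be located to within one such bin.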

VISUOMOTOR DELAYS. The data from perturbation trials served to calculate the visuomotor delay for every subject. The visuomotor delay was taken to be the reaction time from the onset of the perturbation, as recorded with the movement data, to the time until the rectified jerk (the derivative of acceleration) of the lower tip of the pole exceeded normal behavior, expressed as a threshold value. The threshold was determined subject specifically to be two standard deviations of the jerk of normal pole balancing trials that had no major instabilities. All visuomotor delays were visually inspected for possible errors of this procedure. Perturbations whose reaction time could not be determined clearly were excluded from our analyses. Data from our robot subject, whose visuomotor delays were known, served to ensure the correctness of this procedure.
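The threshold rule can be written compactly. The following Python sketch uses synthetic jerk traces rather than recorded data; the function names, the sinusoidal baseline, and the size and timing of the simulated response are assumptions chosen only to make the example self-checking.

```python
import numpy as np

FS = 60.0  # sampling rate of the recorded data in Hz


def jerk_threshold(baseline_jerk, n_std=2.0):
    """Threshold: n_std standard deviations of jerk in normal balancing."""
    return n_std * np.std(baseline_jerk)


def visuomotor_delay(jerk, onset_idx, threshold, fs=FS):
    """Seconds from perturbation onset to the first supra-threshold
    rectified jerk; None if the threshold is never exceeded."""
    above = np.flatnonzero(np.abs(jerk[onset_idx:]) > threshold)
    if above.size == 0:
        return None
    return above[0] / fs


# Synthetic example (all values hypothetical).
t = np.arange(600) / FS
baseline = 0.1 * np.sin(2 * np.pi * 1.5 * t)   # jerk during normal balancing

trial = baseline.copy()
onset_idx, true_lag = 120, 13                  # response starts 13 samples late
trial[onset_idx + true_lag:] += 5.0            # corrective reaction

thr = jerk_threshold(baseline)
delay = visuomotor_delay(trial, onset_idx, thr)
print("estimated visuomotor delay: %.0f ms" % (1000 * delay))
```

Returning `None` when no crossing is found mirrors the exclusion of perturbations whose reaction time could not be determined.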

EXTRACTING A DELAY-UNCOMPENSATED CONTROLLER. A delay-uncompensated linear controller uses delayed state information to generate the commands according to $u_n = \mathbf{K}\mathbf{x}_{internal}$, where $\mathbf{x}_{internal}$ is the delayed state information that the CNS has access to. Because $\mathbf{x}_{internal}$ is experimentally not accessible in behavioral experiments and may be a filtered and transformed version of the actually perceived state, we hypothesized that $\mathbf{x}_{internal}$ could be any recorded state between the current time and the current time minus the visuomotor delay. More formally, for the 60-Hz discretized pole-balancing system, we set $\mathbf{x}_{internal} = \mathbf{x}_{n-r}$, where $r \in \{0, \ldots, d\}$ and $d$ is the discrete time visuomotor delay, d = ceil((visuomotor delay)/(1/60 s)) [where the Matlab function ceil(z) rounds its argument z up to the next integer value]. We used linear regression analysis to find the control gains $\mathbf{K}$ for every $r$ given the linear control strategy $u_n = \mathbf{K}_r\mathbf{x}_{n-r}$.
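A minimal Python sketch of this regression follows. It runs on synthetic data in which the commands are generated from states that are exactly nine ticks old; the random states, the gain vector, the assumed delay of 0.22 s, and the function name `fit_delayed_gains` are illustrative assumptions and do not come from the experiments.

```python
import numpy as np


def fit_delayed_gains(X, u, r):
    """Least-squares gains K_r for u_n = K_r x_{n-r}.

    X holds one state per row, u one command per row.
    Returns (K_r, root-mean-square residual).
    """
    n_samples = len(u)
    X_lag = X[:n_samples - r]      # states x_{n-r}
    u_now = u[r:]                  # commands u_n aligned with those states
    K_r, *_ = np.linalg.lstsq(X_lag, u_now, rcond=None)
    rms = float(np.sqrt(np.mean((X_lag @ K_r - u_now) ** 2)))
    return K_r, rms


delay_s = 0.22                        # hypothetical visuomotor delay in s
d = int(np.ceil(delay_s * 60.0))      # discrete delay in 60-Hz ticks

# Synthetic data: commands depend on the state r_true ticks in the past.
rng = np.random.default_rng(2)
K_true = np.array([1.5, 2.0, -30.0, -8.0])   # arbitrary illustrative gains
r_true = 9
X = rng.normal(size=(2000, 4))
u = np.zeros(2000)
u[r_true:] = X[:-r_true] @ K_true

fits = {r: fit_delayed_gains(X, u, r) for r in range(d + 1)}
best_r = min(fits, key=lambda r: fits[r][1])
print("discrete delay d =", d, "| smallest residual at r =", best_r)
```

Comparing residuals across r is a convenience of this synthetic example only; in the procedure described above, every K_r is retained and passed on to the stability analysis.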

For every $\mathbf{K}_r$, a linear stability analysis determined the maximal permissible delay that the controller with $\mathbf{K}_r$ could tolerate. For this purpose, we can formulate the discrete linearized dynamical system (Eq. 3) with $d$ discrete sensory delay ticks as an augmented dynamical system

$$\tilde{\mathbf{x}}_{n+1} = \tilde{\mathbf{A}}\tilde{\mathbf{x}}_n + \tilde{\mathbf{B}}\mathbf{u}_n = \tilde{\mathbf{A}}\tilde{\mathbf{x}}_n + \tilde{\mathbf{B}}(\mathbf{K}_r\mathbf{z}_n) = \tilde{\mathbf{A}}\tilde{\mathbf{x}}_n + \tilde{\mathbf{B}}\mathbf{K}_r\tilde{\mathbf{C}}\tilde{\mathbf{x}}_n = (\tilde{\mathbf{A}} + \tilde{\mathbf{B}}\mathbf{K}_r\tilde{\mathbf{C}})\tilde{\mathbf{x}}_n = \boldsymbol{\Lambda}\tilde{\mathbf{x}}_n$$

$$\mathbf{z}_n = \tilde{\mathbf{C}}\tilde{\mathbf{x}}_n \tag{4}$$

where

$$\tilde{\mathbf{x}}_n = \begin{bmatrix} \mathbf{x}_n \\ \mathbf{x}_{n-1} \\ \mathbf{x}_{n-2} \\ \vdots \\ \mathbf{x}_{n-d} \end{bmatrix}; \quad \tilde{\mathbf{A}} = \begin{bmatrix} \mathbf{A} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{I} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \cdots & \cdots & \mathbf{I} & \mathbf{0} \end{bmatrix}; \quad \tilde{\mathbf{B}} = \begin{bmatrix} \mathbf{B} \\ \mathbf{0} \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{bmatrix}; \quad \tilde{\mathbf{C}} = \begin{bmatrix} \mathbf{0} & \cdots & \mathbf{0} & \mathbf{I} \end{bmatrix}$$

For every $\mathbf{K}_r$, we examined increasing integer values of $d$, starting with $d = 0$, up to the point where $\boldsymbol{\Lambda}$ became unstable, as indicated by the maximal absolute eigenvalue of the matrix $\boldsymbol{\Lambda}$ exceeding 1. The controller $\mathbf{K}_r$ with the largest tolerable delay was assumed to be the most robust delay uncompensated control strategy. The coefficient of determination of the linear regression was simultaneously computed to assess the quality of fit of the regressed controller.
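The delay search can be illustrated with a short NumPy sketch. It builds the augmented closed-loop matrix of Eq. 4 and increases the delay until the spectral radius exceeds 1. The function names, the `d_max` cap, and the toy scalar system in the usage note are assumptions for illustration only; they are not the pole model of Eq. 3 or the original analysis code.

```python
import numpy as np

def closed_loop_matrix(A, B, K, d):
    """Lambda = A~ + B~ K C~ for a feedback delay of d ticks (Eq. 4)."""
    n, m = A.shape[0], B.shape[1]
    size = n * (d + 1)
    A_aug = np.zeros((size, size))
    A_aug[:n, :n] = A
    for j in range(d):  # identity blocks shift the state history down by one tick
        A_aug[(j + 1) * n:(j + 2) * n, j * n:(j + 1) * n] = np.eye(n)
    B_aug = np.zeros((size, m))
    B_aug[:n, :] = B
    C_aug = np.zeros((n, size))
    C_aug[:, d * n:] = np.eye(n)  # the controller only sees x_{n-d}
    return A_aug + B_aug @ K @ C_aug

def spectral_radius(M):
    """Largest absolute eigenvalue of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def max_tolerable_delay(A, B, K, d_max=60):
    """Largest d, counted up from 0, before Lambda first becomes unstable.

    Returns None if the loop is already unstable without delay. d_max is an
    assumed search cap, not a value from the paper.
    """
    last_stable = None
    for d in range(d_max + 1):
        if spectral_radius(closed_loop_matrix(A, B, K, d)) > 1.0:
            break
        last_stable = d
    return last_stable
```

For a toy unstable scalar plant, `max_tolerable_delay(np.array([[1.1]]), np.array([[1.0]]), np.array([[-0.3]]))` returns the number of delay ticks the gain can tolerate; multiplying by the 1/60-s sampling period would convert ticks to a delay time.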

EXTRACTING A TAPPED-DELAY LINE CONTROLLER. Tapped-delay line control augments the state $\mathbf{x}$ of the pole balancing system with efference copies of the motor commands $\mathbf{u}$ that were sent out during the delay period. Assuming a linear controller, a command at discrete time step $n$, $\mathbf{u}_n$, becomes a function of the delayed pole state $\mathbf{x}_{n-d}$ and the previous motor commands $\mathbf{u}_{n-r}$, $r \in \{1, \ldots, d\}$, in the form of $\mathbf{u}_n = \mathbf{K}_{TD}[\mathbf{x}_{n-d}^T\ \mathbf{u}_{n-d}^T\ \mathbf{u}_{n-d+1}^T\ \cdots\ \mathbf{u}_{n-1}^T]^T = \mathbf{K}_{TD}\mathbf{x}_{\text{aug}}^{n}$. Linear regression analysis was employed to regress the control gains $\mathbf{K}_{TD}$. For each coefficient in $\mathbf{K}_{TD}$, statistical significance was analyzed by t-tests. As the efference copies in the augmented state $\mathbf{x}_{\text{aug}}$ can make the regression ill conditioned, we employed ridge regression (Myers 1990) to achieve numerical robustness. In ridge regression, $\mathbf{K}_{TD}$ is obtained from the regression formula $\mathbf{K}_{TD} = (\mathbf{X}_{\text{aug}}^T\mathbf{X}_{\text{aug}} + \varepsilon\mathbf{I})^{-1}\mathbf{X}_{\text{aug}}^T\mathbf{U}$, where the rows of the matrix $\mathbf{X}_{\text{aug}}$ contain all collected data points $\mathbf{x}_{\text{aug},i}$, and the rows of $\mathbf{U}$ contain the corresponding motor commands $\mathbf{u}_i$. The ridge regression parameter $\varepsilon$ was optimized across all subjects to minimize the mean-squared PRESS residual error (footnote 4), i.e., the leave-one-out cross validation error (Myers 1990), formalized as:

$$J = \sum_{i=1}^{n} \frac{(\mathbf{u}_i - \mathbf{K}_{TD}\mathbf{x}_{\text{aug},i})^2}{(1 - \mathbf{x}_{\text{aug},i}^T\mathbf{P}\mathbf{x}_{\text{aug},i})^2} \quad \text{where} \quad \mathbf{P} = (\mathbf{X}_{\text{aug}}^T\mathbf{X}_{\text{aug}} + \varepsilon\mathbf{I})^{-1}$$

For this optimization, we searched 100 values for $\varepsilon$ in the interval between $10^{-10}$ and 1, equally spaced on a logarithmic scale, and picked the one that minimized the cost $J$.
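As a rough illustration of this procedure, the following NumPy sketch computes ridge gains, evaluates the PRESS cost $J$ from the closed-form leave-one-out residuals, and scans a logarithmic grid for $\varepsilon$. The function names are hypothetical, the cost is the plain sum given in the formula above, and the sketch fits one data set at a time rather than optimizing across all subjects.

```python
import numpy as np

def ridge_gains(X, U, eps):
    """Ridge solution (X^T X + eps I)^{-1} X^T U, returned together with P."""
    P = np.linalg.inv(X.T @ X + eps * np.eye(X.shape[1]))
    return P @ X.T @ U, P

def press_cost(X, U, eps):
    """Sum of squared leave-one-out residuals, computed without refitting."""
    U = np.asarray(U, dtype=float).reshape(len(U), -1)
    W, P = ridge_gains(X, U, eps)
    residuals = U - X @ W
    leverage = np.einsum('ij,jk,ik->i', X, P, X)  # x_i^T P x_i for every row
    return float(np.sum((residuals / (1.0 - leverage)[:, None]) ** 2))

def select_eps(X, U, n_values=100):
    """Pick eps from n_values log-spaced candidates between 1 and 1e-10."""
    grid = np.logspace(0, -10, n_values)
    costs = [press_cost(X, U, eps) for eps in grid]
    return float(grid[int(np.argmin(costs))])
```

The closed-form residual avoids refitting the regression once per data point, which matters when the augmented state has many efference-copy columns.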

BLANK-OUT DATA ANALYSIS. From the blank-out trials, all data falling into the blank-out window were extracted. For this purpose, we assumed that the first data point of a blank-out interval was the one that occurred at time t + d, where t is the discrete start time of the blank-out and d the discrete visuomotor delay time. The last data point of a blank-out interval was t + d + p, where p is the random duration of the blank-out. Blank-out data from all blank-out trials were pooled per subject, and a tapped-delay line controller was extracted from this data as described in the preceding text. Mean and SDs were computed across subjects. To examine the influence of the duration of a blank-out on our data analysis, we also performed the blank-out analysis by using data from only a subset of the individual blank-outs, which was simply accomplished by discarding all data of each blank-out beyond a certain time duration. For instance, to perform a blank-out analysis with 200-ms-long blank-out windows, we only used data in the interval [t + d, t + d + 200 ms].
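The windowing rule can be made concrete with a small Python sketch. The function names and the representation of blank-outs as `(t, p)` tuples of sample indices are assumptions for illustration; at 60 Hz, a 200-ms truncation corresponds to 12 samples.

```python
def blankout_indices(t, d, p, max_window=None):
    """Sample indices of one blank-out interval [t + d, t + d + p].

    t: discrete start time of the blank-out
    d: discrete visuomotor delay
    p: discrete blank-out duration
    max_window: optional cap (in samples) that discards data beyond a duration
    """
    length = p if max_window is None else min(p, max_window)
    return list(range(t + d, t + d + length + 1))

def pool_blankouts(blankouts, d, max_window=None):
    """Pool the indices of all (t, p) blank-outs of one subject."""
    pooled = []
    for t, p in blankouts:
        pooled.extend(blankout_indices(t, d, p, max_window))
    return pooled
```

The pooled indices would then select the rows of the augmented state and command matrices used in the tapped-delay line regression.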


    RESULTS

In the INTRODUCTION, several different control policies and sensory preprocessing strategies were proposed. The results section is structured such that we can eliminate several of these hypotheses to provide evidence for or against predictive forward models in biological motor control.

Delay uncompensated control

Delay uncompensated control uses a linear feedback controller that does not compensate for the delays in sensory inputs. Because previous work found that delayed sensory inputs can elicit intermittency in the control strategy (Miall 1996), i.e., motor commands are only sent out at certain time intervals, we examined the power spectrum of the (unfiltered) velocity of the base of the pole in normal trials with FFT analyses (cf. METHODS). The power spectrum (not illustrated here) showed one broad peak, roughly centered at 0.8 Hz, and none of the peaks at higher frequencies that would be the hallmark of intermittency (Miall 1996). The power spectra of robot data and human data looked alike, and we knew that the robot used a non-intermittent controller. Based on these findings, intermittent control was excluded from the control hypotheses and the continuous control hypotheses from Fig. 3 could be examined.

Assuming a delay uncompensated controller (Fig. 3A), each subject's most robust control gains were found as described in METHODS. For both virtual and actual pole balancing, gains were an average across 10 normal pole-balancing trials; in actual pole balancing, gains were additionally averaged across both planes of motion, i.e., sagittal and frontal-parallel. Figure 5 illustrates the gains for the different experimental conditions, Fig. 5A for actual and Fig. 5B for virtual pole balancing. For each subject, the gains of the most stable controller for the four pole states are displayed. In addition, Fig. 5A depicts the pole balancing gains for the robot programmed with a predictive controller, RP, and a delay-line controller, RD. As can be recognized from this figure, control gains are rather consistent across all subjects with small standard deviations, as indicated by the error bars in Fig. 5. The gains obtained from the robot show error bars similar to those of the human subjects, thus indicating that this magnitude of fluctuations is due to noise in our data recording technique and the hidden state that was not taken into account by regressing the delay uncompensated controller. Given that the robot used delay compensated controllers, the magnitude of the gains cannot be compared to any reference values; due to the visual delays in the robot's video system, it was not possible to implement delay uncompensated control. Because the four state variables of the pole have different units (meter, meter/second, radians, and radians/second), the magnitudes of the gains for different state variables are not comparable.



Fig. 5. Control gains assuming a delay uncompensated controller for actual (A) and virtual (B) pole balancing. Virtual and actual pole-balancing control gains are averages across multiple normal pole-balancing trials. Actual pole balancing includes the gains of the robot subject programmed with a predictive controller, RP, and a delay-line controller, RD. Error bars denote ±1 SD.

We assessed the quality of fit of the linear controllers from the coefficients of determination, $R^2$, as shown in Fig. 6 (cf. METHODS). The coefficient of determination across all subjects was very high, an average of 0.89 for virtual pole balancing and 0.88 in the frontal plane and 0.91 in the sagittal plane for actual pole balancing. Based on the results of this analysis, we concluded that a linear controller is adequate for modeling this motor task. Again, it is interesting to note that the two robot subjects had $R^2$ values similar to those of the human subjects. Unknown nonlinearities in the robot, noise in our recording techniques, and hidden state from the delay compensated controllers are the most likely candidates to account for this observation.



Fig. 6. Coefficients of determination to assess the quality of fit of the linear controller for actual pole balancing (A) in the sagittal and frontal planes and virtual pole balancing (B). In both cases, the coefficients of determination were found by averaging over all normal balancing trials for each subject.

As a last step, we determined the maximally permissible delays of the linear controllers from the linear stability analysis and the visuomotor delay times, both described in METHODS. Each subject's maximal permissible delay was the average of 10 normal pole-balancing trials. The average maximal permissible delay across all subjects was 163 ms for virtual pole balancing and 157 ms for actual pole balancing, as shown in Fig. 7.



Fig. 7. Visuomotor delays vs. maximally permissible delays for actual pole balancing (A) and virtual pole balancing (B). Error bars denote 1 SD derived from averaging over all normal and all perturbation trials of each subject, respectively.

The key element of evaluating the delay uncompensated control strategy was to compare these permissible delays with the visuomotor delay times. The visuomotor delay times were 269 ms on average in virtual pole balancing and 220 ms on average in actual pole balancing. We attribute the longer visuomotor delay in virtual pole balancing to the 75% shorter pole on the computer screen, which makes a deviation of the pole position from the vertical line harder to perceive. The human visuomotor delays for actual pole balancing are slightly lower than the lowest values of 250-270 ms reported in visuomotor tracking experiments (Miall 1996; Miall et al. 1993) and comparable to several results in target switching paradigms (Brenner and Smeets 1997; Georgopoulos et al. 1981; Hoff and Arbib 1992). The robot visuomotor delays were in accordance with our knowledge about the hardware and software processing delays in this system. Given the strong task dependence of visuomotor delays (e.g., Brenner and Smeets 1997), the correctness of the extraction of the delay times from the robot data seems to be the best confirmation of the validity of our determination of visuomotor delays in pole balancing.

As illustrated in Fig. 7, the visuomotor delay times in both actual and virtual pole balancing are significantly larger (P < 0.001) than the maximally permissible delay times of delay uncompensated control. This would remain true even if the visuomotor delays were up to about 40 ms lower in actual pole balancing, and up to about 70 ms lower in virtual pole balancing. The result suggests that subjects did not employ delay uncompensated control for pole balancing and eliminates delay uncompensated control as a feasible control policy.

Smith Predictor

The Smith Predictor compensates for sensory delays by generating control commands out of a "mental simulation" of the pole, based on a forward dynamics model, and by synchronizing this mental simulation with the real world based on a delayed error signal. The suitability of the Smith Predictor for pole balancing can be assessed from a brief theoretical analysis. The linearized discrete pole dynamics obeys the equation
$$\mathbf{x}_{n+1} = \mathbf{A}\mathbf{x}_n + \mathbf{B}\mathbf{u}_n \tag{5}$$

In contrast, the model dynamics, i.e., the inner loop of the Smith Predictor, is based on the estimated state $\hat{\mathbf{x}}_n$ and estimated dynamics $\hat{\mathbf{A}}$, $\hat{\mathbf{B}}$ of the system (cf. Fig. 3D)

$$\hat{\mathbf{x}}_{n+1} = \hat{\mathbf{A}}\hat{\mathbf{x}}_n + \hat{\mathbf{B}}\mathbf{u}_n \tag{6}$$

The Smith predictor uses a linear control law based on the difference between the desired state $\mathbf{x}_{\text{desired}}$, estimated state $\hat{\mathbf{x}}_n$, delayed state $\mathbf{x}_d$, and estimated delayed state $\hat{\mathbf{x}}_d$

$$\mathbf{u}_n = \mathbf{K}(\mathbf{x}_{\text{desired}} - (\hat{\mathbf{x}}_n + \mathbf{x}_d - \hat{\mathbf{x}}_d)) \tag{7}$$

By subtracting Eq. 6 from Eq. 5, we obtain the error dynamics

$$\mathbf{e}_{n+1} = \mathbf{x}_{n+1} - \hat{\mathbf{x}}_{n+1} = \mathbf{A}\mathbf{x}_n + \mathbf{B}\mathbf{u}_n - (\hat{\mathbf{A}}\hat{\mathbf{x}}_n + \hat{\mathbf{B}}\mathbf{u}_n) \tag{8}$$

If we assume perfect internal models, $\hat{\mathbf{A}} = \mathbf{A}$ and $\hat{\mathbf{B}} = \mathbf{B}$, this equation simplifies to

$$\mathbf{e}_{n+1} = \mathbf{A}(\mathbf{x}_n - \hat{\mathbf{x}}_n) = \mathbf{A}\mathbf{e}_n \tag{9}$$

even without the need to insert the control law (Eq. 7). Stability for the Smith Predictor requires that the error converge to zero, implying that the error dynamics needs to be a stable system with an equilibrium point of zero. As can be seen in Eq. 9, it is only the system matrix $\mathbf{A}$ that determines this stability. Because $\mathbf{A}$ is an unstable system for pole balancing, i.e., the pole is balanced at an unstable equilibrium point, the Smith predictor is provably unstable as a control scheme for pole balancing and can be eliminated as a feasible control hypothesis.
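The consequence of Eq. 9 can be checked numerically with a minimal Python sketch. It uses a scalar stand-in for the pole dynamics (an assumption made purely for brevity; the function name and the coefficients are hypothetical) and drives the plant and a perfect model with the same commands. The estimation error grows by the factor $a$ at every step regardless of which commands are applied.

```python
def simulate_model_error(a, b, x0, xhat0, commands):
    """Error x - xhat between a plant and a perfect model sharing the same commands.

    Plant:  x[n+1]    = a * x[n]    + b * u[n]
    Model:  xhat[n+1] = a * xhat[n] + b * u[n]
    """
    x, xhat = x0, xhat0
    errors = [x - xhat]
    for u in commands:
        x = a * x + b * u
        xhat = a * xhat + b * u
        errors.append(x - xhat)
    return errors

# Unstable scalar mode (a > 1) with a small initial mismatch of 0.01.
commands = [0.5 * (-1) ** k for k in range(50)]
errors = simulate_model_error(1.05, 1.0, 0.01, 0.0, commands)
```

Here `errors[-1]` equals the initial mismatch multiplied by `1.05 ** 50`, independent of the command sequence, which is the scalar analogue of the argument that the control law cannot stabilize the error when $\mathbf{A}$ is unstable.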

Blank-out trials

After excluding delay uncompensated control (Fig. 3A) and the Smith Predictor (Fig. 3D), only two possibilities remain for a forward model in the pole balancing control loop, either in the sensory preprocessing stage (Fig. 3, E and F) or the motor command generation stage (Fig. 3, B and C). Analysis of the blank-out trials aimed at determining whether the human control system was able to use predictive control, i.e., whether the system could fill in sensory input when it was not available. Such a behavior would give evidence for a forward model. To assess this predictive ability, we compared the control gains from normal trials with those of blank-out periods. Importantly, given that the last section's analysis established that a delay-uncompensated controller is an inadequate model of pole balancing, we switched to extracting the control gains based on a tapped delay line model, i.e., a direct controller. This analysis strategy is justified because we know from the discussion in the INTRODUCTION that any indirect controller has an equivalent representation as a direct controller and, thus, that the tapped delay line model should be able to capture the human control system, irrespective of its true internal implementation. From the good linear fits in the delay-uncompensated analyses in the preceding text, it seemed justified to hypothesize that the direct controller was linear, too, thus allowing us to use linear regression to determine the control gains of the tapped delay line controller. The regression analyses returned the gains $\mathbf{K}_{TD}$ of the linear control law $\mathbf{u}_n = \mathbf{K}_{TD}[\mathbf{x}_{n-d}^T\ \mathbf{u}_{n-d}^T\ \mathbf{u}_{n-d+1}^T\ \cdots\ \mathbf{u}_{n-1}^T]^T = \mathbf{K}_{TD}\tilde{\mathbf{x}}$ (cf. METHODS). Given an average visuomotor delay time of 269 ms, as determined from the perturbation trials, we chose the discrete number of delay states $d = 16$ for a 60-Hz discretized control law for all subjects to make the gains from different subjects comparable.

The key quantities of the blank-out analysis were the gains for the delayed state information $\mathbf{x}_{n-d}$ and their statistical significance, while the gains of the efference copies in the augmented input to the tapped delay line model have no special meaning for our purposes. During blank-outs, if the human control system had no access to an estimated state of the pole, the statistical significance of the gains of $\mathbf{x}_{n-d}$ should vanish, as we confirmed in Matlab simulations. Additionally, in such a case, these gains should change significantly in magnitude. Figure 8 illustrates our results. To assess the sensitivity of the gains as a function of the duration of the blank-out interval, gains were regressed for a range of blank-out periods by simply discarding all blank-out data after that particular period (cf. METHODS). It should be noted, however, that gains at very short blank-out periods have rather few data for regression, such that their magnitude should not be over-interpreted, as indicated by the large error bars of these gains. As can be seen from the black bars in Fig. 8, the magnitudes of the gains stabilize if more than about 200-250 ms of blank-out data are used in the regression; after this point, there is enough data to constrain the regression consistently. It should be noted that the magnitudes of the gains are not comparable to those of Fig. 5 due to the augmented input state representation that was employed by the tapped delay line model. All blank-out gains were statistically significant (P < 0.01) according to t-tests for regression coefficients.



Fig. 8. Controller gains during blank-out balancing periods (black bars) and normal balancing periods (gray bars) for position (A), velocity (B), angular position (C), and angular velocity (D) states. All gains were averaged across subjects and error bars denote 1 SD. To obtain the gains as a function of blank-out period, regression analyses were performed using blank-out data from the start of the blank-out up to a particular time of the blank-out, called the blank-out period (cf. METHODS). Note that the magnitudes of the gains are not comparable to Fig. 5 because the regression had many more independent variables due to the efference copies in the input state and that only the statistical significance of the gains and their relative magnitude with respect to normal pole balancing are important for our experiment.

For comparison, we also used normal pole-balancing trials from each subject and randomly extracted intervals of 450-550 ms of data from them to obtain "fake" blank-out data (i.e., there was actually no blank-out) until the same number of fake blank-outs was obtained per subject as in real blank-out trials. Submitting these fake data to the same regression analyses as the real blank-out data resulted in the superimposed gray bars in Fig. 8, labeled "normal." The gains from the fake data stabilized more quickly and had less variance for shorter blank-out periods, but they converged to statistically indistinguishable final values in comparison to the real blank-out data (P < 0.05), although there seems to be a trend that the gains slightly decreased in blank-out trials.

The statistical significance of the blank-out gains and their statistically indistinguishable magnitude from normal pole balancing gains indicate that state information about the pole was used in issuing motor commands during blank-out periods. Because a tapped delay line controller by itself is unable to generate such estimated state information, this result suggests a predictive forward model in the biological control loop.


    DISCUSSION
The goal of this paper was to investigate the existence of predictive forward models in human visuomotor control, a typical example of control with long delays in sensory feedback due to visual processing. We hypothesized three generic control strategies of how to deal with delayed sensory feedback by ignoring the delays (delay uncompensated control), using efferent copies to augment the state of the controlled system (tapped delay line control), and employing internal forward models to estimate the actual (un-delayed) state of the controlled system (Figs. 2 and 3). In the latter category, we distinguished between two alternative model-based controllers, the Smith Predictor (Miall et al. 1993) and simple predictive control (Fig. 3) that merely uses a forward model to predict out the delays. In analogy to Wolpert et al. (1995), we also assumed that the control loop could have a separate sensory preprocessing stage that, again, could either be forward-model-based or not. We investigated visuomotor control in the task of pole balancing, both with an actual pole and with a virtual pole simulated on a computer. Pole balancing requires the pole being balanced at an unstable equilibrium point and, therefore, requires continuous closed-loop control without the possibility to just memorize an action pattern as in certain visual tracking tasks. The stringent constraints imposed by pole balancing were ideally suited to narrow down which of the hypothesized control and sensory preprocessing methods were the most appropriate model for human control.

Our experimental data allowed excluding delay uncompensated control based on the observation that under the assumption of a linear delay uncompensated controller, subjects' control gains were too high in light of their more than 220-ms visuomotor delay in the control loop. This result was obtained in a rather conservative fashion: instead of just computing the control gains based on the pole state delayed by each subject's visuomotor delay, we computed a candidate set of control gains assuming a range of visuomotor delays from zero up to the experimentally determined visuomotor delay. The gains tolerating the longest delays were finally used to judge whether delay uncompensated control was possible. Even under this conservative analysis, the longest delay time that could be tolerated by the controllers was about 160 ms, i.e., 60 ms less than needed for human visuomotor delays.

Forward model-based control using the control circuitry of the Smith Predictor could be excluded solely on theoretical grounds. The Smith Predictor is a provably unstable control strategy if the control task is unstable, e.g., as in pole balancing, and would perform even worse than a delay uncompensated controller as it is guaranteed to destabilize the control system, even if the delays are small. This result casts some doubt on the suitability of the Smith Predictor as a general model for cerebellar control (Miall and Wolpert 1996; Miall et al. 1993) as the cerebellum is known to be involved in postural control, the archetypical unstable control system in bipeds.

Employing virtual pole balancing with blank-out periods during balancing, our results suggest that human subjects had access to an estimated state of the pole during the blank-outs, i.e., that there was a forward model in the control loop that could fill in the missing sensory information during blank-outs. This result is the most important one of this paper and deserves to be discussed from the following viewpoints.

What happens during the blank-out period?

A key point for the validity of our analyses is whether one could find alternative explanations for the blank-out data. To address this issue, we assume that during blank-outs, the neural activity representing the pole state drops to zero or some steady-state firing. What is the ensuing behavior of the controller?

CONTROL CONTINUES AS IF NOTHING HAPPENED. Assuming a linear control system, there are only two possibilities when the pole state information is kept constant: either the system exponentially diverges or converges to steady state. In both cases, the motor commands issued would lose any correlation with the actual pole state during the blank-out, and the gains regressed during blank-out should change significantly. None of these effects could be observed in our data. Even when assuming a nonlinear control system, this statement would not change.

CONTROL SWITCHES TO A DEFAULT MODE. Triggered by the missing state information, the biological controller could just switch to a simple default mode. For instance, it could stop as characterized by zero velocity of the pole base, use constant velocity commands as characterized by zero acceleration of the pole base, or use constant acceleration commands as characterized by zero jerk of the pole base. We tested for each of these alternatives with t-tests on the rectified characteristic quantity and could not find any statistical significance. Additionally, as in the previous point, any of these strategies would make the correlation between pole state and motor commands vanish and affect the regressed gains during blank-out significantly.

CONTROL SWITCHES TO A PREDICTION MODE. From our understanding, the most viable hypothesis is that during the blank-outs, somewhere in the control loop a switch occurs that fills in the missing pole state information with predicted states. Obviously, such an internal prediction cannot be very accurate for a long time, as the predicted state will diverge from the real state rather quickly. Our subjects could, after some training, tolerate 500- to 600-ms blank-out times but not more, a number that is similar to that reported in smooth pursuit blank-out experiments (Pola and Wyatt 1997). Furthermore, when scrutinizing Fig. 8, one can see that the magnitudes of the control gains all became smaller at the end of blank-outs, although this decrease did not reach statistical significance due to the variability across subjects. Such a decrease is expected if the correlation between motor command and actual pole states decreases during blank-outs as the predicted pole states diverge from the actual ones due to error integration.

Is the forward model in the sensory preprocessing stage or the control stage?

Under the assumption that there is a forward model in the control loop, the questions arise where it is and how it is switched in during blank-outs. Here we will argue that this forward model is in the sensory preprocessing stage, although we cannot exclude another forward model in the control stage.

Referring to Fig. 3C, i.e., the simple predictor controller, the function of the forward model is to take as inputs the delayed state and efference copies of the motor commands from the delay period to estimate the current state. Importantly, the forward model is set up to accomplish the state estimation in one step such that it bridges the entire delay period, e.g., 220 ms in our actual pole balancing experiments. Alternatively, one could also imagine that the forward model only bridges a fraction of the delay time and that several iterations through the forward model are needed to compute an estimation of the current state. The latter concept is easily possible on a digital computer but less suited for the rather slow information processes in neural tissue. Thus the forward model needs to bridge at least sufficiently large time gaps in one prediction step such that the process of prediction does not cause too much additional delay. These large prediction steps, however, make the output of the forward model unsuitable to fill in missing state information during blank-outs. For instance, assume that the last perceived state before the blank-out onset was $\mathbf{x}_{n-d}$, then the next required input to the controller would be an estimate of $\mathbf{x}_{n-d+1}$, denoted as $\hat{\mathbf{x}}_{n-d+1}$. But the forward model in the simple predictor created $\hat{\mathbf{x}}_{n-d+r}$ where $r$ must be significantly greater than one to quickly predict-out the delay, as explained in the preceding text. Hence, the prediction of the simple predictor is too advanced in time to be