The Journal of Neurophysiology Vol. 88 No. 2 August 2002, pp. 942-953
Copyright ©2002 by the American Physiological Society
1Department of Biomedical Engineering, HNB-001 and 2Computer Science and Neuroscience, HNB-103, University of Southern California, Los Angeles, California 90089-2520; and 3Kawato Dynamic Brain Project (Exploratory Research for Advanced Technology/Japan Science and Technology Corporation), Soraku-gun, 619-02 Kyoto, Japan
ABSTRACT
Mehta, Biren and Stefan Schaal. Forward Models in Visuomotor Control. J. Neurophysiol. 88: 942-953, 2002. In recent years, an increasing number of research projects investigated whether the central nervous system employs internal models in motor control. While inverse models in the control loop can be identified more readily in both motor behavior and the firing of single neurons, providing direct evidence for the existence of forward models is more complicated. In this paper, we will discuss such an identification of forward models in the context of the visuomotor control of an unstable dynamic system, the balancing of a pole on a finger. Pole balancing imposes stringent constraints on the biological controller, as it needs to cope with the large delays of visual information processing while keeping the pole at an unstable equilibrium. We hypothesize various model-based and non-model-based control schemes of how visuomotor control can be accomplished in this task, including Smith Predictors, predictors with Kalman filters, tapped-delay line control, and delay-uncompensated control. Behavioral experiments with human participants allow exclusion of most of the hypothesized control schemes. In the end, our data support the existence of a forward model in the sensory preprocessing loop of control. As an important part of our research, we will provide a discussion of when and how forward models can be identified and also the possible pitfalls in the search for forward models in control.
INTRODUCTION
Successful motor control requires issuing motor commands for all the muscles of a movement system at the right time and of the correct magnitude in response to internal and external sensations. Thus, given a particular task goal, the problem of motor control can generally be formalized as finding a task-specific control policy
$$u = \pi(x, \alpha, t) \qquad (1)$$
where u is the vector of motor commands, x is the state of the movement system, t is time, and $\alpha$ stands for the vector of open parameters that need to be adjusted during learning, e.g., the weights of a neural network. The formulation in Eq. 1 is very general and can be applied to any level of analysis, such as a detailed neuronal level or a more abstract joint angular level. If the function $\pi$ were known, the task goal could be achieved from every state x of the movement system. This theoretical view allows approaching motor control as the more formal question of how control policies are represented and learned in biology.
From a computational point of view, two basic strategies exist to generate the function $\pi$. In "direct control" (Fig. 1A) (e.g., Miall 1995), $\pi$ would be one single "black box" that directly maps sensations to actions, without meaningful intermediate steps and, in particular, without any attempts to explicitly model the movement system or task. For example, Barto, Fagg, Sitkoff, and Houk (1999) suggested such a direct control strategy as a model for motor learning in the cerebellum.
The alternative to direct control is "indirect control." Instead of
one step, indirect control explicitly employs multiple information-processing steps to build the control policy, and in
particular it employs internal models. Figure 1B shows a
typical representative of an indirect controller as commonly suggested for robotic and biological control (e.g., Wolpert 1997), including separate stages for trajectory planning, feedback,1 and feedforward control based on an inverse dynamics model.
Whether and when the brain employs indirect or direct control is an
important scientific question for our understanding of the CNS because
the learning mechanisms for the two methods are quite different.
Learning indirect control is normally accomplished by (self-)
supervised learning, whereas learning direct control falls into the
domain of reinforcement learning, a learning method that is much more
time consuming and harder to apply to large-scale learning tasks. Thus
one can expect that the functional organization of motor control in the
CNS would differ significantly depending on whether direct or indirect
control was employed (Doya 2000). A crucial question in this discussion is whether the CNS employs internal models in motor control. Internal models are neural substrates that model input/output relationships, and their inverses, of kinematic and dynamic processes of the motor system and the environment; they are a key ingredient in indirect control. Thus identifying internal models in biological motor
control constitutes an essential step toward identifying the brain's
control and learning strategies.
In this paper, we will address the question of whether the human brain
employs internal forward models to overcome the long sensory delays
encountered in visuomotor control, i.e., whether the brain employs
direct or indirect control. For this purpose, the feedback pathways in
Fig. 1 should be conceived of as those of visual feedback, while other
faster feedback loops like those on a spinal level are assumed to be
part of the dynamics of the "movement system" box. Evidence for
internal models in the brain has grown increasingly strong (e.g., see Desmurget and Grafton 2000; Kawato 1999 for recent reviews). In particular, inverse dynamics models in oculomotor control (Gomi et al. 1998; Shidara et al. 1993) and arm control (Ebner 1998; Gribble and Ostry 1999; Shadmehr and Mussa-Ivaldi 1994; Thach 1998) have become rather well accepted. On the other hand, evidence for forward models in motor control is more complicated to establish because the output of forward models (a prediction of an event in the future) is not directly used as an output to the motor system but rather indirectly to facilitate additional control processes. Thus the forward model's output can rarely be observed explicitly in behavior and may be hard to find electrophysiologically without an a priori hypothesis about where to locate it in the distributed sensorimotor control loops of the brain.
In a generic control diagram, Fig. 2 illustrates "test points" that were chosen in behavioral investigations of forward models in the past. Most commonly, subjects interact with a manipulated object, and the task-level command $u_{Task}$, which arises as a consequence of the current state x of the movement system, is monitored. For example, Flanagan and Wing (1997) used a hand-held load as a manipulated object and concluded from the zero-lag adjustment of grip force (i.e., the task-level command) as a function of load force that humans used a forward model to anticipate the load forces; in a related study, similar conclusions were drawn for adjusting the normal force exerted on a sliding object (Flanagan and Lolley 2001). Miall and his colleagues (Miall 1986; Miall et al. 1993) investigated manual tracking of sinusoidal targets on a computer screen with a joystick. The tracking characteristics of subjects when exposed to additional delays in the control loop suggested a forward model-based control strategy. Bhushan and Shadmehr (1999) employed virtual force fields as the manipulated object and concluded from the learning behavior of their subjects that a forward model-based controller gave the most suitable explanation of their data. An alternative method of investigation was suggested by Wolpert et al. (1995). These authors asked subjects to estimate, by introspection, their Cartesian hand positions at the end of reaching movements that were executed without visual feedback under force perturbations. From the systematic structure of the error in position estimates as a function of movement duration, the authors inferred a forward model-based sensory preprocessing similar to that in a Kalman filter. Using a similar introspection strategy in a more perceptual study, Blakemore et al. (Blakemore et al. 1999, 2001; Blakemore, Wolpert, and Frith 2000) hypothesized a forward model in the sensory preprocessing loop that allows canceling of self-generated tactile stimuli as opposed to externally generated stimuli.
Despite the elegant experimental setups in all these studies, none of them could rule out that subjects used a direct control strategy instead of a model-based indirect approach. The theoretical justification for this statement results from the fact that every indirect controller can be replaced by a direct controller that has exactly the same input/output function.
That this statement is true can be illustrated with a neural network metaphor. Given an indirect control policy $\pi$ that maps x to u (cf. Eq. 1), a neural network can be trained on data pairs (x, u) drawn from this function. Because well-chosen neural networks can theoretically approximate arbitrary functions with arbitrary accuracy (e.g., Bishop 1995), the network can be assumed to learn the indirect policy perfectly after some time. However, the network retains none of the modularity of the indirect controller and thus becomes a direct controller.2 This fact poses significant limitations on investigations of forward models in behavioral motor control experiments: most often they are simply not identifiable. There is, however, one major difference between a direct and a forward model-based controller: the forward model in the controller can be used to fill in sensory inputs when they are missing, i.e., it can predict. This distinction motivated the experimental procedure of the present study.
The goal of the present study was to probe the existence of forward models in a "blank-out" experiment (e.g., Hah and Jagacinski 1994; Magdaleno et al. 1969), i.e., an experiment that randomly blocked the vision of the manipulated object. The manipulated object was a pole, and the task was to balance the pole stably at a fixed setpoint. Pole balancing was chosen as an experimental task because, in contrast to, for instance, sinusoidal tracking tasks, memorized motor commands cannot be used to achieve the task goal: pole balancing crucially depends on closed-loop visual feedback because the pole is balanced at an unstable equilibrium and because of the stochastic perturbations arising in normal motor behavior. Additionally, pole balancing can be analyzed in terms of linear control theory because the realizable balancing regime is approximately linear.
In balancing a real pole on a finger and a virtual pole on a computer screen, we investigated whether the CNS employs one of the control hypotheses illustrated in Fig. 3. This set of hypotheses constitutes the core methods that have been suggested in previous work for visuomotor control with delays in both biological and control theoretic motor control. In the same way as Fig. 2, this figure distinguishes between a separate sensory preprocessing and a control stage, just that it fills in several detailed suggestions of the control theoretic implementation of these stages either with or without a forward model.
Sensory preprocessing can essentially be accomplished in two different ways: either model based (Fig. 3F) or non-model based (Fig. 3E). The archetypal example of model-based sensory preprocessing is a Kalman filter3 (Kalman 1960), as illustrated in Fig. 3F. Any non-model-based sensory processing, e.g., filtering by temporal averaging, is not relevant for our arguments, so we left the box in Fig. 3E as an identity mapping.
Control policies are more complex. Given that the pole-balancing task is an approximately linear control task, all control policies in Fig. 3, A-D, are discussed as linear control systems; the desired behavior of the controllers is to keep the pole in a particular desired state, i.e., an upright angular position at a particular Cartesian position in the workspace.
Figure 3A shows the simplest possible controller, a linear negative feedback controller that does not compensate for the delays in the sensory input. This controller is rather sensitive to the magnitude of the control gains K as it can become unstable due to the delays.
Figure 3B illustrates a "tapped delay line" controller, a direct controller that augments its inputs with efference copies. Augmenting the input representation with a history of previous motor commands is, in control-theoretic terms, a powerful tool for estimating hidden-state variables, e.g., the actual state of the movement system from delayed sensory input (Priestley 1981). Tapped delay-line control (cf. METHODS) can compensate for delays very well without the need for a forward model.
An alternative delay-compensated controller is shown in Fig. 3C. A forward model is employed to predict out the delays in the sensory input. For this purpose, efference copies of the commands during the delay period need to be given to the forward model. Similar to the direct control in Fig. 3B, very good delay compensation can be accomplished.
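The prediction step can be sketched as follows. This is only a minimal illustration, assuming discrete linear dynamics of the form x_{n+1} = A x_n + b u_n; the matrices, the gains K, and the function name `predict_current_state` are hypothetical and are not taken from the experiments. The second part expands the predictive command K x̂_n into fixed gains on the delayed state and on the stored commands, which has the same input/output form as the tapped-delay line controller of Fig. 3B.

```python
import numpy as np

def predict_current_state(A, b, x_delayed, u_history):
    """Roll a linear forward model over the delay period.

    x_delayed : state observed d steps ago (x_{n-d})
    u_history : efference copies [u_{n-d}, ..., u_{n-1}], oldest first
    Returns the forward-model estimate of the current state x_n.
    """
    x = np.asarray(x_delayed, dtype=float)
    for u in u_history:
        x = A @ x + b * u
    return x

# Hypothetical 2-state system (NOT the pole parameters of the paper).
A = np.array([[1.0, 0.1], [0.2, 1.0]])
b = np.array([0.0, 0.1])
K = np.array([-2.0, -1.0])      # hypothetical feedback gains

rng = np.random.default_rng(0)
d = 5                           # delay in samples
x_old = np.array([0.05, -0.02]) # what the delayed sensor reports
u_hist = rng.normal(size=d)     # commands issued during the delay

# Ground truth: the plant itself evolves under the same commands.
x_true = x_old.copy()
for u in u_hist:
    x_true = A @ x_true + b * u

x_pred = predict_current_state(A, b, x_old, u_hist)
prediction_error = float(np.max(np.abs(x_pred - x_true)))

# Predictive control (Fig. 3C): apply the gains to the predicted state.
u_predictive = K @ x_pred

# The same command written as fixed gains on [x_{n-d}, u_{n-d}, ..., u_{n-1}].
gain_on_state = K @ np.linalg.matrix_power(A, d)
gains_on_commands = np.array(
    [K @ np.linalg.matrix_power(A, d - 1 - j) @ b for j in range(d)]
)
u_tapped = gain_on_state @ x_old + gains_on_commands @ u_hist
```

In this linear setting the two commands agree up to rounding error, which is one way to see why behavioral data alone may not separate the two schemes.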
A last control hypothesis is the Smith Predictor (Fig. 3D), suggested by Miall et al. (1993). The Smith Predictor compensates for sensory delays by generating control commands out of a "mental simulation" of the pole based on a forward-dynamics model. This strategy allows for a very fast internal feedback loop, indicated by the thick arrows in Fig. 3D. An outer loop makes sure that the mental simulation remains synchronized with the real pole by employing a model of the delay in the sensory feedback. This control scheme is elegant and can deal with long sensory delays. More details can be found in Miall et al. (1993).
The hypotheses in Fig. 3 allow for eight different combinations of sensory preprocessing and control. The goal of the present paper is to provide empirical evidence of which combination is the most likely in human behavior. Our results will provide evidence in favor of a forward model in the sensory preprocessing loop (Fig. 3F) and against delay uncompensated control (Fig. 3A) and the Smith Predictor (Fig. 3D). However, we will not be able to distinguish further whether simple predictive control (Fig. 3C) or tapped delay-line control (Fig. 3B) is employed by the CNS, a constraint that we believe is shared with many other studies on identifying model-based control and has not been properly stressed before.
METHODS
Participants
A total of 11 volunteers from our laboratory participated in the experiments. Their age ranged between 22 and 44 years. The ethics committee had approved the experiments and subjects gave their informed consent prior to their inclusion in the study. In addition, a 7-df anthropomorphic robot arm was programmed to balance the actual pole using various control strategies. The robot served as a control subject whose control strategy and sensorimotor delays were known. It participated in experiments 1 and 2 in the following text, in exactly the same way as the human subjects.
Experiments
Two kinds of experiments were conducted. Actual pole balancing required that participants balanced a pole (a 1-m-long cylindrical wooden rod with 0.01-m diam and a small metal weight attached at the upper tip to adjust the eigenfrequency) in three-dimensional (3D) space while standing upright in an unconstrained laboratory environment (Fig. 4A). Virtual pole balancing involved balancing a simulated two-dimensional pole on a computer screen with a manipulandum (Fig. 4B); the simulated pole had the same eigenfrequency as the actual pole. The following experimental conditions were tested in both virtual and actual pole balancing, unless mentioned otherwise.
EXPERIMENT 1: NORMAL BALANCING. Subjects balanced the pole without any experimental manipulations and were instructed to keep the pole as accurately as possible at a balancing setpoint xdesired. In actual balancing, subjects used a table tennis racket to support the lower end of the pole in order to exclude the influence of haptic perception as much as possible (Fig. 4). Subjects reported this manipulation as inconvenient but adjusted to it within about 1 min of training. In pilot tests, we also confirmed that it is impossible to balance a pole blindfolded, i.e., solely based on haptic information. The balancing setpoint in actual balancing was to keep the pole's angular position zero (i.e., a vertical pole) and the pole's horizontal position as close as possible to a point marked on the floor. In virtual balancing, the pole was to be kept vertically at a line in the middle of the computer screen. Ten 30-s trials were collected from each subject.
EXPERIMENT 2: PERTURBATION TRIALS. While subjects balanced the pole, a small random perturbation was delivered to the tip of the pole. In actual pole balancing, the experimenter manually delivered the perturbation from behind the subject with a long wooden rod (Fig. 4A). The rod very briefly touched a thin, very lightweight aluminum extension tube that was attached to the pole tip. Because subjects wore a baseball hat with a shield over their eyes, the perturbation was outside of their visual field. The effectiveness of the baseball hat was verified with fake perturbation movements of the experimenter that did not touch the pole. In virtual pole balancing, the computer program added a brief force pulse to the tip of the pole. At least 20 perturbations were collected from each subject in as many 30-s trials as needed.
EXPERIMENT 3: BLANK-OUT TRIALS. Blank-out trials were only conducted in virtual pole balancing. In blank-out balancing, the pole disappeared from the computer screen at random times for a randomly chosen period between 450 and 550 ms while the subject was balancing. During the blank-out, the computer continued simulating the pole dynamics as in normal trials, using the manipulandum's acceleration as control input. Subjects were instructed to ignore the blank-out and to continue balancing. To avoid that balancing was lost during blank-outs, blank-outs were only triggered if the pole was not too close to either end of the computer screen and if the pole was not too close to an upright still position. The latter pole state resulted in a pole behavior that was highly stochastic, i.e., unpredictable, as any little perturbation could make the pole fall to the left or to the right. Ten 30-s trials were collected from each subject, resulting in about 30 blank-outs per subject.
Data recording
ACTUAL POLE BALANCING. As illustrated in Fig. 4A, the 3D Cartesian positions of the pole were recorded from two color markers attached to the pole, tracked by a color vision system (QuickMag, Japan) at 60-Hz sampling frequency. During perturbation trials, the onset of the perturbation was recorded from a touch sensor, mounted to the distal end of a lightweight rod, and converted by a 12-bit A/D board in a VME bus. Two Motorola MVME68040 CPUs in the VME bus, running the real-time operating system VxWorks (Windriver Systems), collected both vision and A/D data and stored it on a SunSparc workstation for further processing.
VIRTUAL POLE BALANCING. The dynamics of the actual pole described in the preceding text were simulated on a Macintosh G3 computer using Euler integration of the equations of motion at 600 Hz. The equations of motion are given by
$$\ddot{x} = u, \qquad \ddot{\theta} = \frac{m\,l_{cm}}{I}\left(g\sin\theta - u\cos\theta\right) = \omega^2\left(\sin\theta - \frac{u}{g}\cos\theta\right) \qquad (2)$$
where x and $\theta$ are the position of the lower end of the pole and its angular orientation (cf. Fig. 4B), respectively, m is the pole's mass, $l_{cm}$ the pole's center of mass, I the pole inertia, and g the gravitational constant. For notational convenience, we combined these constants in the variable $\omega^2 = m g l_{cm}/I$, corresponding to the squared eigenfrequency of the pole. The mass of the pole was determined with a scale, while center of mass and inertia were determined from standard textbook formulae for the cylindrical wooden rod and for the mass of the weight at the top of the pole, which was treated as a point mass. This procedure resulted in $\omega^2 = 13.2\ \mathrm{s}^{-2}$ and was confirmed in a pendulum experiment that determined the eigenfrequency of the pole experimentally to be $f = 3.6\ \mathrm{s}^{-1}$, which compared well to the theoretical value of $f_{theory} = \sqrt{\omega^2} \approx 3.63\ \mathrm{s}^{-1}$. On a 19-in screen, the 1-m-long pole appeared to be 0.25 m long. The graphics update rate was 75 Hz. A 0.75-m-long wooden rod, with one end attached to the vertically oriented axis of a high-resolution rotary optical encoder, centered right under the computer screen, served as a manipulandum to move the base of the pole (cf. Fig. 4B). Because the angular displacement of the rod remained in the ±30° range, the motion of the tip of the rod was approximately linear and was transformed into the pole's motion on the screen at a 1:1 scale. Transverse motion of the rod correlated with movement of the bottom end of the pole. Angular position and velocity of the optical encoder were collected by a Motorola 68332 microcomputer at 600-Hz sampling frequency. This processor possesses a special hardware feature to extract high-resolution velocity signals from optical encoders directly. The velocity was differentiated and filtered with a second-order Butterworth filter (cutoff frequency, 15 Hz) to obtain accelerations. Position, velocity, and acceleration were communicated to the Macintosh G3 computer through a serial connection at 200 Hz. Acceleration of the manipulandum served as the motor command u in Eq. 2 to the pole. Subjects needed about 1 h of practice until they felt comfortable with the task but could then achieve 30-s-long balancing trials without problems.
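The Euler simulation described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the original simulation code: it assumes the pole obeys ẍ = u and θ̈ = ω²(sin θ − (u/g) cos θ), where the sign convention for θ is our own choice; ω² = 13.2 s⁻² and the 600-Hz step come from the text, while g = 9.81 m/s², the feedback gains, and all function names are hypothetical.

```python
import math

OMEGA_SQ = 13.2    # squared eigenfrequency in s^-2 (value from the text)
G = 9.81           # gravitational acceleration in m/s^2 (assumed value)
RATE_HZ = 600      # integration rate from the text
DT = 1.0 / RATE_HZ

def euler_step(state, u, dt=DT):
    """One explicit Euler step; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    # Assumed sign convention: positive base acceleration reduces theta.
    theta_ddot = OMEGA_SQ * (math.sin(theta) - (u / G) * math.cos(theta))
    return (
        x + dt * x_dot,
        x_dot + dt * u,
        theta + dt * theta_dot,
        theta_dot + dt * theta_ddot,
    )

def simulate(state, controller, seconds):
    """Integrate the pole while 'controller' maps state to acceleration u."""
    for _ in range(int(round(seconds * RATE_HZ))):
        state = euler_step(state, controller(state))
    return state

start = (0.0, 0.0, 0.01, 0.0)   # pole tilted by 0.01 rad, at rest

# Without control the pole falls: the angle grows away from vertical.
free_fall = simulate(start, lambda s: 0.0, 0.5)

# Hypothetical angle-only feedback (it ignores base position, so the
# base drifts); the gains were picked by hand for this sketch.
balanced = simulate(start, lambda s: 30.0 * s[2] + 8.0 * s[3], 2.0)
```

The uncontrolled run shows the instability that makes closed-loop feedback necessary, while the hand-tuned feedback brings the angle back toward vertical.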
Experimental procedure
Participants were initially asked to practice the pole balancing until they felt comfortable that they could sustain 30-s-long trials with perturbations. In actual pole balancing, subjects first performed experiment 1, then experiment 2. In virtual pole balancing, experiment 3 was added to this sequence. Trials in which the pole was dropped were repeated; dropping happened in less than 5% of all trials. The robot participated as a subject in actual pole balancing in experiments 1 and 2. It was programmed to use the tapped-delay line control strategy (Fig. 3, B and E) in one set of trials, and the forward model-based controller (Fig. 3, C and F) in a second set of trials.
Data analysis
In both actual and virtual pole balancing, the data of each trial contained the pole's position, velocity, acceleration, angular position, and angular velocity, collected at 60 Hz, the frequency of NTSC video signals. For virtual pole balancing, these five quantities were scalar values. For actual pole balancing, position, velocity, and acceleration were 3D data and the pole's angular position and angular velocity were two-dimensional vectors. We neglected the vertical movement component in actual pole balancing and treated 3D pole balancing as independent control in two orthogonal planes, the sagittal and the frontal-parallel plane. Due to the small deviations of the pole from the vertical in pole balancing (about ±12° maximum across all conditions and subjects), this simplification is mathematically justified. Thus, every actual pole balancing trial generated two independent data sets, one for sagittal and one for frontal-parallel control. From this point onward, actual and virtual pole balancing underwent the same data analyses.
DATA FILTERING. All data traces were zero-lag second-order Butterworth filtered with a cutoff frequency of 4 Hz. This relatively low cutoff frequency was selected by an optimization process over a range of possible cutoff frequencies. The optimization criterion was to find the best reconstruction of the linearized discrete pole dynamics from the data; linearization was justified because subjects remained within a ±12° angular range of the vertical pole position. The linearized discrete pole dynamics obey the equation
$$x_{n+1} = A x_n + b u_n \qquad (3)$$
[...]; we achieved an almost perfect match in this way. It should be noted that the cutoff frequency parameter could be increased significantly without any effect on the results of this paper; we tried cutoff frequencies of up to 7 Hz; only the estimated [...]
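Reconstructing linear discrete dynamics from recorded data can be sketched as an ordinary least-squares problem. The example below is an illustration only: it assumes a model of the form x_{n+1} = A x_n + b u_n, uses a hypothetical two-state system with made-up coefficients rather than the pole, and the function name `fit_linear_dynamics` is our own.

```python
import numpy as np

def fit_linear_dynamics(states, commands):
    """Least-squares estimate of A and b in x_{n+1} = A x_n + b u_n.

    states   : array of shape (N, k), one state vector per sample
    commands : array of shape (N,), one scalar command per sample
    """
    states = np.asarray(states, dtype=float)
    commands = np.asarray(commands, dtype=float)
    # Each regressor row is [x_n, u_n]; each target row is x_{n+1}.
    regressors = np.column_stack([states[:-1], commands[:-1]])
    coef, *_ = np.linalg.lstsq(regressors, states[1:], rcond=None)
    A_hat = coef[:-1].T   # first k rows hold A transposed
    b_hat = coef[-1]      # last row holds b
    return A_hat, b_hat

# Hypothetical system used only to generate noise-free test data.
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
b_true = np.array([0.0, 0.5])

rng = np.random.default_rng(1)
u = rng.normal(size=200)
x = np.zeros((200, 2))
for n in range(199):
    x[n + 1] = A_true @ x[n] + b_true * u[n]

A_hat, b_hat = fit_linear_dynamics(x, u)
```

With noise-free data the regression recovers the generating coefficients; with filtered experimental data, the residual of this fit is one possible measure of how well a given cutoff frequency preserves the dynamics.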
TRANSLATION OF RECORDED DATA TO ZERO EQUILIBRIUM POINT. From the regressed coefficients Â, [...] Â)^-1 c. We subtracted the empirically determined $x_{desired}$ per subject from all recorded data x such that, after this translation, the equilibrium point for every subject was at $x_{desired} = 0$. We also checked whether the empirically determined setpoint stayed close to the instructed setpoint and always found very good accordance with our instructions.
POWER SPECTRUM. To test for intermittency in subjects' control strategy, the unfiltered velocity data of the base of the pole was processed by FFT analysis (Matlab's Welch's averaged periodogram method, de-trended, using a Hanning window size of 512 data points). FFT analyses were performed per subject by pooling all normal trials of a subject.
VISUOMOTOR DELAYS. The data from perturbation trials served to calculate the visuomotor delay for every subject. The visuomotor delay was taken to be the reaction time from the onset of the perturbation, as recorded with the movement data, to the time until the rectified jerk (the derivative of acceleration) of the lower tip of the pole exceeded normal behavior, expressed as a threshold value. The threshold was determined subject specifically to be two standard deviations of the jerk of normal pole balancing trials that had no major instabilities. All visuomotor delays were visually inspected for possible errors of this procedure. Perturbations whose reaction time could not be determined clearly were excluded from our analyses. Data from our robot subject, whose visuomotor delays were known, served to ensure the correctness of this procedure.
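The threshold-crossing procedure for the visuomotor delay can be sketched as follows. This is a simplified illustration with synthetic, noise-free signals; the function name `reaction_time`, the finite-difference jerk estimate, and the choice to take the standard deviation of the signed baseline jerk are assumptions of this sketch, not details given in the text.

```python
import numpy as np

def reaction_time(accel, onset_index, baseline_accel, fs=60.0, n_sd=2.0):
    """Time from perturbation onset until the rectified jerk exceeds
    n_sd standard deviations of the jerk seen in normal balancing.

    accel          : acceleration trace of the perturbation trial
    onset_index    : sample at which the perturbation was delivered
    baseline_accel : acceleration trace from normal balancing
    Returns the delay in seconds, or None if no crossing is found.
    """
    jerk = np.abs(np.diff(accel)) * fs                  # rectified jerk
    threshold = n_sd * np.std(np.diff(baseline_accel) * fs)
    above = np.nonzero(jerk[onset_index:] > threshold)[0]
    return None if above.size == 0 else above[0] / fs

fs = 60.0
t = np.arange(600) / fs
baseline = 0.1 * np.sin(2 * np.pi * 1.0 * t)   # synthetic normal balancing

# Synthetic trial: perturbation at sample 30, response ramp from sample 45.
trial = np.zeros(90)
trial[45:] = 0.05 * np.arange(1, 46)

delay = reaction_time(trial, onset_index=30, baseline_accel=baseline)
```

Real recordings are noisy, which is presumably why the text adds visual inspection of every detected delay and excludes ambiguous perturbations.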
EXTRACTING A DELAY-UNCOMPENSATED CONTROLLER. A delay-uncompensated linear controller uses delayed state information to generate the commands according to $u_n = K x_{internal}$, where $x_{internal}$ is the delayed state information that the CNS has access to. Because $x_{internal}$ is not accessible in behavioral experiments and may be a filtered and transformed version of the actually perceived state, we hypothesized that $x_{internal}$ could be any recorded state between the current time and the current time minus the visuomotor delay. More formally, for the 60-Hz discretized pole-balancing system, we set $x_{internal} = x_{n-r}$, where $r \in \{0, \ldots, d\}$ and d is the discrete-time visuomotor delay, d = ceil((visuomotor delay)/(1/60 s)) [where the Matlab function ceil(z) rounds its argument z up to the next integer]. We used linear regression analysis to find the control gains K for every r given the linear control strategy $u_n = K_r x_{n-r}$. [...]
(4)
[...] became unstable, as indicated by the maximal absolute eigenvalue of the matrix [...] exceeding 1. The controller $K_r$ with the largest tolerable delay was assumed to be the most robust delay-uncompensated control strategy. The coefficient of determination of the linear regression was simultaneously computed to assess the quality of fit of the regressed controller.
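A stability test of this kind can be sketched by stacking delayed states into one augmented state vector and checking the eigenvalues of the resulting matrix. The construction below is one standard way to do this and is an assumption of this sketch; it is not claimed to be the matrix used in the paper. It assumes closed-loop dynamics x_{n+1} = A x_n + b K x_{n-τ}, and the scalar example system is hypothetical.

```python
import numpy as np

def delayed_closed_loop_matrix(A, b, K, tau):
    """Transition matrix of z_n = [x_n, x_{n-1}, ..., x_{n-tau}] for
    x_{n+1} = A x_n + b K x_{n-tau}."""
    k = A.shape[0]
    feedback = np.outer(b, K)
    if tau == 0:
        return A + feedback
    M = np.zeros((k * (tau + 1), k * (tau + 1)))
    M[:k, :k] = A                    # current state propagates through A
    M[:k, k * tau:] = feedback       # command acts on the oldest state
    M[k:, :-k] = np.eye(k * tau)     # shift register for the history
    return M

def max_tolerable_delay(A, b, K, max_tau=60):
    """Largest tau (in samples) such that every delay from 0 to tau is
    stable; returns -1 if the loop is unstable even without delay."""
    largest = -1
    for tau in range(max_tau + 1):
        M = delayed_closed_loop_matrix(A, b, K, tau)
        if np.max(np.abs(np.linalg.eigvals(M))) >= 1.0:
            break
        largest = tau
    return largest

# Hypothetical scalar integrator with proportional feedback.
A = np.array([[1.0]])
b = np.array([1.0])
K = np.array([-0.5])
largest_stable_delay = max_tolerable_delay(A, b, K, max_tau=10)
```

Multiplying the returned number of samples by the sampling interval (1/60 s in the recordings) would give a permissible delay in seconds that can be compared with the measured visuomotor delay.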
EXTRACTING A TAPPED-DELAY LINE CONTROLLER. Tapped-delay line control augments the state x of the pole balancing system with efference copies of the motor commands u that were sent out during the delay period. Assuming a linear controller, a command at discrete time step n, $u_n$, becomes a function of the delayed pole state $x_{n-d}$ and the previous motor commands $u_{n-r}$, $r \in \{1, \ldots, d\}$, in the form of
$$u_n = K_{TD}\,[x_{n-d}^T\;\; u_{n-d}^T\;\; u_{n-d+1}^T\; \cdots\; u_{n-1}^T]^T = K_{TD}\,\tilde{x}$$
We used ridge regression [...] to achieve numerical robustness. In ridge regression, $K_{TD}$ is obtained from the regression formula $K_{TD} = (X^T X + \lambda I)^{-1} X^T U$. The ridge parameter $\lambda$ was optimized across all subjects to minimize the mean-squared PRESS residual error,4 i.e., the leave-one-out cross validation error (Myers 1990), formalized as:
$$J = \frac{1}{N}\sum_{i=1}^{N}\left(u_i - \hat{u}_{i,-i}\right)^2$$
where $\hat{u}_{i,-i}$ is the prediction for the i-th data point from a regression that excludes that point. We tested values of $\lambda$ in the interval $[1, 10^{-10}]$, equally spaced on a logarithmic scale, and picked the one that minimized the cost J.
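The ridge regression with leave-one-out selection of the ridge parameter can be sketched as follows. This is an illustration on random synthetic data, not the original analysis: the names `build_tapped_rows`, `ridge_fit`, and `press_error` are hypothetical, the symbol λ for the ridge parameter is an assumption, and the closed-form leave-one-out residual e_i/(1 − h_ii) is a standard identity for ridge regression with a fixed λ, which the brute-force loop is included to cross-check.

```python
import numpy as np

def build_tapped_rows(states, commands, d):
    """Regressor rows [x_{n-d}, u_{n-d}, ..., u_{n-1}] with targets u_n."""
    rows = [
        np.concatenate([states[n - d], commands[n - d:n]])
        for n in range(d, len(commands))
    ]
    return np.array(rows), np.asarray(commands[d:])

def ridge_fit(X, u, lam):
    """Ridge solution (X^T X + lam I)^-1 X^T u."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ u)

def press_error(X, u, lam):
    """Mean-squared leave-one-out residual via the hat-matrix shortcut."""
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    loo_residuals = (u - H @ u) / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals ** 2))

def press_error_brute_force(X, u, lam):
    """Same quantity by refitting N times, each time leaving one row out."""
    errors = []
    for i in range(len(u)):
        keep = np.arange(len(u)) != i
        gains = ridge_fit(X[keep], u[keep], lam)
        errors.append(u[i] - X[i] @ gains)
    return float(np.mean(np.square(errors)))

# Synthetic stand-in data: 4 pole states, scalar command, delay of 3 samples.
rng = np.random.default_rng(2)
states = rng.normal(size=(80, 4))
commands = rng.normal(size=80)
X, u = build_tapped_rows(states, commands, d=3)

# Candidate ridge parameters from 1 down to 1e-10, log-spaced.
lambdas = np.logspace(0, -10, 11)
best_lambda = min(lambdas, key=lambda lam: press_error(X, u, lam))
K_TD = ridge_fit(X, u, best_lambda)
```

The shortcut avoids refitting the regression once per data point, which matters when many candidate values of the ridge parameter are scanned over pooled data.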
BLANK-OUT DATA ANALYSIS. From the blank-out trials, all data falling into the blank-out window were extracted. For this purpose, we assumed that the first data point of a blank-out interval was the one that occurred at time t + d, where t is the discrete start time of the blank-out and d the discrete visuomotor delay time. The last data point of a blank-out interval was t + d + p, where p is the random duration of the blank-out. Blank-out data from all blank-out trials were pooled per subject, and a tapped-delay line controller was extracted from this data as described in the preceding text. Mean and SDs were computed across subjects. To examine the influence of the duration of a blank-out on our data analysis, we also performed the blank-out analysis by using data from only a subset of the individual blank-outs, which was simply accomplished by discarding all data of each blank-out beyond a certain time duration. For instance, to perform a blank-out analysis with 200-ms-long blank-out windows, we only used data in the interval [t + d, t + d + 200 ms].
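The window bookkeeping described above can be sketched as follows. The function name `blankout_window` and its arguments are hypothetical; the sketch assumes 60-Hz sample indices, with the visuomotor delay rounded up to whole samples as in the definition of the discrete delay d.

```python
import math

def blankout_window(t_start, delay_s, duration_s, fs=60.0, max_window_s=None):
    """Inclusive sample range [t + d, t + d + p] analyzed for one blank-out.

    t_start      : sample index at which the pole disappeared
    delay_s      : visuomotor delay of the subject in seconds
    duration_s   : duration of this blank-out in seconds
    max_window_s : optional cap, used to analyze only the first part
    """
    d = math.ceil(delay_s * fs)          # discrete visuomotor delay
    p = round(duration_s * fs)           # blank-out duration in samples
    if max_window_s is not None:
        p = min(p, round(max_window_s * fs))
    first = t_start + d
    return first, first + p

# Example with the average virtual-balancing delay of 269 ms from the text.
full_window = blankout_window(100, 0.269, 0.5)
short_window = blankout_window(100, 0.269, 0.5, max_window_s=0.2)
```

Shifting the window by the visuomotor delay reflects the idea that commands issued right after the pole disappears are still based on visual information from before the blank-out.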
RESULTS
In the INTRODUCTION, several different control policies and sensory preprocessing strategies were proposed. The results section is structured such that we can eliminate several of these hypotheses to provide evidence for or against predictive forward models in biological motor control.
Delay uncompensated control
Delay uncompensated control uses a linear feedback controller that does not compensate for the delays in sensory inputs. Because previous work found that delayed sensory inputs can elicit intermittency in the control strategy (Miall 1996), i.e., motor commands are only sent out at certain time intervals, we examined the power spectrum of the (unfiltered) velocity of the base of the pole in normal trials with FFT analyses (cf. METHODS). The power spectrum (not illustrated here) showed one broad peak, roughly centered at 0.8 Hz, and no peaks at higher frequencies, which would be the hallmark of intermittency (Miall 1996). The power spectra of robot and human data looked alike, and we knew that the robot used a non-intermittent controller. Based on these findings, intermittent control was excluded from the control hypotheses and the continuous control hypotheses from Fig. 3 could be examined.
Assuming a delay uncompensated controller (Fig. 3A), each
subject's most robust control gains were found as described in
METHODS. For both virtual and actual pole balancing, gains
were an average across 10 normal pole-balancing trials
in actual pole
balancing, gains were additionally averaged across both planes of
motion, i.e., saggital and frontal-parallel. Figure
5 illustrates the gains for the different
experimental conditions, Fig. 5A for actual and Fig.
5B for virtual pole balancing. For each subject, the gains
of the most stable controller for the four pole states are displayed.
In addition, Fig. 5A depicts the pole balancing gains for
the robot programmed with a predictive controller, RP, and a delay-line
controller, RD. As can be recognized from this figure, control gains
are rather consistent across all subjects with small standard
deviations, as indicated by the error bars in Fig. 5. The gains
obtained from the robot show similar error bars as in the human
subjects, thus indicating that this magnitude of fluctuations is due to
noise in our data recording technique and the hidden state that was not
taken into account by regressing the delay uncompensated controller.
Given that the robot used delay compensated controllers, the magnitude
of the gains cannot be compared to any reference values;
due to the
visual delays in the robot's video system, it was not possible to
implement delay uncompensated control. Because the four state variables of
the pole have different units (meters, meters/second, radians, and
radians/second), the magnitudes of the gains for different state
variables are not comparable.
We assessed the quality of fit of the linear controllers from the coefficients of determination, $R^2$, as shown in Fig. 6 (cf. METHODS). The coefficient of determination across all subjects was very high: an average of 0.89 for virtual pole balancing, and 0.88 in the frontal plane and 0.91 in the sagittal plane for actual pole balancing. Based on the results of this analysis, we concluded that a linear controller is adequate for modeling this motor task. Again, it is interesting to note that the two robot subjects had $R^2$ values similar to those of the human subjects. Unknown nonlinearities in the robot, noise in our recording techniques, and hidden state from the delay compensated controllers are the most likely candidates to account for this observation.
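As an illustration of this kind of fit, the following sketch regresses a linear control law $u \approx k \cdot x$ on four state variables and reports $R^2$. The data are synthetic, and the gain values, noise level, and helper names are hypothetical; the actual regression procedure is the one described in METHODS.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gains(states, commands):
    """Least-squares gains for u ~ k . x via the normal equations."""
    dim = len(states[0])
    xtx = [[sum(s[i] * s[j] for s in states) for j in range(dim)] for i in range(dim)]
    xtu = [sum(s[i] * u for s, u in zip(states, commands)) for i in range(dim)]
    return solve_linear(xtx, xtu)


def r_squared(states, commands, gains):
    """Coefficient of determination of the fitted linear controller."""
    pred = [sum(k * xi for k, xi in zip(gains, s)) for s in states]
    mean_u = sum(commands) / len(commands)
    ss_res = sum((u - p) ** 2 for u, p in zip(commands, pred))
    ss_tot = sum((u - mean_u) ** 2 for u in commands)
    return 1.0 - ss_res / ss_tot


# Synthetic data with hypothetical gains (assumption, not the measured values).
random.seed(0)
true_gains = [2.0, 1.0, 20.0, 5.0]
states = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
commands = [sum(k * x for k, x in zip(true_gains, s)) + random.gauss(0.0, 1.0)
            for s in states]
gains = fit_gains(states, commands)
r2 = r_squared(states, commands, gains)
```

The normal equations are used here only to keep the sketch dependency-free; a numerically more robust solver would be preferable for poorly conditioned data.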
As a last step, we determined the maximally permissible delays of the linear controllers from the linear stability analysis and the visuomotor delay times, both described in METHODS. Each subject's maximal permissible delay was the average of 10 normal pole-balancing trials. The average maximal permissible delay across all subjects was 163 ms for virtual pole balancing and 157 ms for actual pole balancing, as shown in Fig. 7.
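The notion of a maximally permissible delay can be sketched numerically. The example below does not reproduce the linear stability analysis of METHODS; it swaps in a brute-force simulation of a hypothetical linearized inverted pendulum under delayed PD feedback at 60 Hz, and searches for the largest feedback delay that still keeps the loop bounded. The plant constant, gains, horizon, and thresholds are all assumptions chosen for illustration.

```python
def is_stable(delay_steps, kp=30.0, kd=8.0, g_over_l=9.81, dt=1.0 / 60.0,
              steps=3000, theta0=0.01):
    """Simulate theta'' = (g/l) * theta + u with feedback on a delayed state.

    Returns True if the pole angle stays bounded and ends below its
    initial amplitude (a simulation-based proxy for stability).
    """
    theta, omega = theta0, 0.0
    # history[0] is the state delay_steps samples ago; history[-1] is current.
    history = [(theta0, 0.0)] * (delay_steps + 1)
    tail_peak = 0.0
    for step in range(steps):
        th_d, om_d = history[0]
        u = -(kp * th_d + kd * om_d)  # controller only sees the delayed state
        theta, omega = (theta + dt * omega,
                        omega + dt * (g_over_l * theta + u))
        if abs(theta) > 1e3:  # clearly diverged
            return False
        history = history[1:] + [(theta, omega)]
        if step >= steps - 120:  # peak amplitude over the last 2 s
            tail_peak = max(tail_peak, abs(theta))
    return tail_peak < theta0


def max_permissible_delay_steps(limit=60):
    """Largest d such that every delay from 0 to d samples is stable."""
    if not is_stable(0):
        return None
    d = 0
    while d < limit and is_stable(d + 1):
        d += 1
    return d


d_max = max_permissible_delay_steps()
delay_ms = None if d_max is None else 1000.0 * d_max / 60.0
```

Close to the stability boundary, growth or decay is slow, so a finite simulation can misclassify borderline delays; an eigenvalue analysis of the delay-augmented system avoids this.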
The key element of evaluating the delay uncompensated control strategy
was to compare these permissible delays with the visuomotor delay
times. The visuomotor delay times were 269 ms on average in virtual
pole balancing and 220 ms on average in actual pole balancing;
we attribute the longer visuomotor delay in virtual pole balancing
to the 75% shorter pole on the computer screen, which
makes a deviation of the pole position from the vertical line harder to
perceive. The human visuomotor delays for actual pole balancing are
slightly lower than the lowest values of 250-270 ms reported in
visuomotor tracking experiments (Miall 1996; Miall et al. 1993) and comparable to several results
in target switching paradigms (Brenner and Smeets 1997; Georgopoulos et al. 1981; Hoff and Arbib 1992). The robot visuomotor delays were in accordance with our
knowledge about the hardware and software processing delays in this
system. Given the strong task dependence of visuomotor delays (e.g.,
Brenner and Smeets 1997), the correctness of the
extraction of the delay times from the robot data seems to be the best
confirmation of the validity of our determination of visuomotor delays
in pole balancing.
As illustrated in Fig. 7, the visuomotor delay times in both actual and
virtual pole balancing are significantly larger (P < 0.001) than the maximally permissible delay times of delay
uncompensated control;
this result would remain true even if the
visuomotor delays were up to about 40 ms lower in actual pole
balancing, and up to about 70 ms lower in virtual pole balancing. This
result suggests that subjects did not employ delay uncompensated
control for pole balancing and eliminates delay uncompensated control
as a feasible control policy.
Smith Predictor
The Smith Predictor compensates for sensory delays by generating
control commands out of a "mental simulation" of the pole, based on
a forward dynamics model, and by synchronizing this mental simulation
with the real world based on a delayed error signal. The suitability of
the Smith Predictor for pole balancing can be assessed from a brief
theoretical analysis. The linearized discrete pole dynamics obeys the
equation
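[Eqs. 5-9 and parts of the connecting text are missing from this copy. The surviving fragments refer to the estimated state $\hat{x}_n$, the estimated dynamics $\hat{A}$, the delayed state $x_d$, and the estimated delayed state $\hat{x}_d$.]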
Blank-out trials
After excluding delay uncompensated control (Fig. 3A)
and the Smith Predictor (Fig. 3D), only two possibilities
remain for a forward model in the pole balancing control loop, either
in the sensory preprocessing stage (Fig. 3, E and
F) or the motor command generation stage (Fig. 3,
B and C). Analysis of the blank-out trials aimed
at determining whether the human control system was able to use
predictive control, i.e., whether the system could fill in sensory
input when it was not available. Such a behavior would give evidence
for a forward model. To assess this predictive ability, we compared the
control gains from normal trials with those of blank-out periods.
Importantly, given that the last section's analysis established that a
delay-uncompensated controller is an inadequate model of pole
balancing, we switched to extracting the control gains based on a
tapped delay line model, i.e., a direct controller. This analysis
strategy is justified because we know from the discussion in the
INTRODUCTION that any indirect controller has an equivalent
representation as a direct controller and, thus, that the tapped
delay line model should be able to capture the human control system,
irrespective of its true internal implementation. From the good linear
fits in the delay-uncompensated analyses in the preceding text, it
seemed justified to hypothesize that the direct controller was linear,
too, thus allowing us to use linear regression to determine the control
gains of the tapped delay line controller. The regression analyses
returned the gains $K_{TD}$ of the linear
control law $u_n = K_{TD}\,[x_{n-d}^T \; u_{n-d}^T \; u_{n-d+1}^T \cdots u_{n-1}^T]^T$, i.e., $K_{TD}$ applied to the augmented input vector
(cf. METHODS). Given an average visuomotor delay time of 269 ms,
as determined from the perturbation trials, we chose the discrete
number of delay states d = 16 for a 60-Hz discretized
control law for all subjects to make the gains from different subjects comparable.
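The choice of d = 16 and the layout of the augmented input can be made concrete with a short sketch. It assumes a scalar motor command per time step (one plane of motion) and a four-dimensional pole state; the function names and the toy data are hypothetical.

```python
def delay_steps(delay_s, rate_hz):
    """Number of whole samples that best matches a delay at a given rate."""
    return round(delay_s * rate_hz)


def tapped_input(states, commands, n, d):
    """Augmented input at step n: [x_{n-d}, u_{n-d}, u_{n-d+1}, ..., u_{n-1}]."""
    if n < d:
        raise ValueError("need at least d past samples")
    return list(states[n - d]) + list(commands[n - d:n])


# 269 ms at 60 Hz is 16.14 samples, which rounds to d = 16.
d = delay_steps(0.269, 60)

# Toy data (assumption): state i is [i, 0, 0, 0], command i is float(i).
states = [[float(i), 0.0, 0.0, 0.0] for i in range(40)]
commands = [float(i) for i in range(40)]
z = tapped_input(states, commands, n=20, d=d)
```

Each row `z` of this kind, paired with the command issued at step n, forms one sample of the linear regression that yields the tapped delay line gains; only the first four entries of the gain vector multiply the delayed pole state.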
The key quantities of the blank-out analysis were the gains for the
delayed state information $x_{n-d}$ and their statistical
significance, while the gains of the efference copies in the augmented
input to the tapped delay line model have no special meaning for our
purposes. During blank-outs, if the human control system had no access
to an estimated state of the pole, the statistical significance of the
gains of $x_{n-d}$ should
vanish, as we confirmed in Matlab simulations. Additionally, in such a
case, these gains should change significantly in magnitude. Figure 8 illustrates our results. To assess the
sensitivity of the gains as a function of the duration of the blank-out
interval, gains were regressed for a range of blank-out periods by
simply discarding all blank-out data after that particular period (cf. METHODS);
it should be noted, however, that gains at very
short blank-out periods have rather few data for regression, such that their magnitude should not be over-interpreted, as indicated by the large
error bars of these gains. As can be seen from the black bars in Fig. 8,
the magnitudes of the gains stabilize if more than about
200-250 ms of blank-out data are used in the regression;
after this
point, there are enough data to constrain the regression consistently. It should be noted that the magnitudes of the gains are not comparable to those of Fig. 5 due to the augmented input state representation that
was employed by the tapped delay line model. All blank-out gains were
statistically significant (P < 0.01) according to
t-tests for regression coefficients.
For comparison, we also took normal pole-balancing trials from each
subject and randomly extracted intervals of 450-550 ms of data from
them to obtain "fake" blank-out data (i.e., there was actually no
blank-out) until the same number of fake blank-outs was obtained per
subject as in real blank-out trials. Submitting these fake data to the
same regression analyses as the real blank-out data resulted in the
superimposed gray bars in Fig. 8, labeled "normal." The gains
from the fake data stabilized more quickly and had less variance for
shorter blank-out periods, but they converged to statistically
indistinguishable final values in comparison to the real blank-out data
(P < 0.05), although there seems to be a trend that
the gains slightly decreased in blank-out trials.
The statistical significance of the blank-out gains, together with the fact that their magnitudes were statistically indistinguishable from the gains of normal pole balancing, indicates that state information about the pole was used in issuing motor commands during blank-out periods. Because a tapped delay line controller by itself is unable to generate such estimated state information, this result suggests a predictive forward model in the biological control loop.
DISCUSSION
The goal of this paper was to investigate the existence of
predictive forward models in human visuomotor control, a typical example of control with long delays in sensory feedback due to visual
processing. We hypothesized three generic control strategies of how to
deal with delayed sensory feedback: ignoring the delays (delay
uncompensated control), using efferent copies to augment the state
of the controlled system (tapped delay line control), and
employing internal forward models to estimate the actual (un-delayed) state of the controlled system (Figs. 2 and 3). In the latter category,
we distinguished between two alternative model-based controllers, the
Smith Predictor (Miall et al. 1993) and
simple predictive control (Fig. 3) that merely uses a
forward model to predict out the delays. In analogy to Wolpert
et al. (1995), we also assumed that the control loop could have
a separate sensory preprocessing stage that, again, could either be
forward-model-based or not. We investigated visuomotor control in the
task of pole balancing, both with an actual pole and with a virtual
pole simulated on a computer. Pole balancing requires the pole to be
balanced at an unstable equilibrium point and, therefore, requires
continuous closed-loop control without the possibility to just memorize
an action pattern as in certain visual tracking tasks. The stringent constraints imposed by pole balancing were ideally suited to narrow down which of the hypothesized control and sensory preprocessing methods were the most appropriate model for human control.
Our experimental data allowed us to exclude delay uncompensated control based on the observation that, under the assumption of a linear delay uncompensated controller, subjects' control gains were too high in light of their more than 220-ms visuomotor delay in the control loop. This result was obtained in a rather conservative fashion: instead of just computing the control gains based on the pole state delayed by each subject's visuomotor delay, we computed a candidate set of control gains assuming a range of visuomotor delays from zero up to the experimentally determined visuomotor delay. The gains tolerating the longest delays were finally used to judge whether delay uncompensated control was possible. Even under this conservative analysis, the longest delay time that could be tolerated by the controllers was about 160 ms, i.e., about 60 ms less than the human visuomotor delay.
Forward model-based control using the control circuitry of the Smith
Predictor could be excluded on theoretical grounds alone. The
Smith Predictor is a provably unstable control strategy if
the control task is unstable, e.g., as in pole balancing, and would
perform even worse than a delay uncompensated controller as it is
guaranteed to destabilize the control system, even if the
delays are small. This result casts some doubt on the suitability of
the Smith Predictor as a general model for cerebellar control (Miall and Wolpert 1996; Miall et al. 1993) as the cerebellum is known to be involved in postural
control, the archetypical unstable control system in bipeds.
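The instability argument can be illustrated with a scalar sketch. It is not the derivation from the RESULTS; it assumes a hypothetical unstable plant $x_{n+1} = a x_n + b u_n$ with $a = 1.1$, a perfect internal model, a gain $k$ that would stabilize the undelayed loop, and a sensory delay of $d = 5$ steps. In this textbook form of the Smith Predictor, the mismatch between plant and internal model evolves as $e_{n+1} = a e_n$, so any initial mismatch grows without bound when $|a| > 1$.

```python
def simulate_smith(x0, xhat0, a=1.1, b=1.0, k=0.6, d=5, steps=200):
    """Scalar Smith Predictor loop with a perfect internal model.

    x    : true plant state (only observable after d steps of delay)
    xhat : state of the internal forward simulation
    All numerical values are hypothetical and chosen for illustration.
    """
    x, xhat = x0, xhat0
    # err_hist[0] is the plant/model mismatch d steps ago.
    err_hist = [x0 - xhat0] * (d + 1)
    for _ in range(steps):
        delayed_err = err_hist[0]
        # Command from the undelayed simulation, corrected by the delayed error.
        u = -k * (xhat + delayed_err)
        x = a * x + b * u
        xhat = a * xhat + b * u
        err_hist = err_hist[1:] + [x - xhat]
    return x


# Simulation starts in perfect agreement with the plant: the loop converges.
x_matched = simulate_smith(x0=0.01, xhat0=0.01)

# A small initial mismatch is never corrected and grows with the plant's
# unstable mode, so the true state diverges.
x_mismatched = simulate_smith(x0=0.01, xhat0=0.0)
```

The matched case converges only because the mismatch is exactly zero; in a physical system, noise and modeling error make the mismatched case the relevant one.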
Employing virtual pole balancing with blank-out periods during balancing, our results suggest that human subjects had access to an estimated state of the pole during the blank-outs, i.e., that there was a forward model in the control loop that could fill in the missing sensory information during blank-outs. This result is the most important one of this paper and deserves to be discussed from the following viewpoints.
What happens during the blank-out period?
A key point for the validity of our analyses is whether one could find alternative explanations for the blank-out data. To address this issue, we assume that during blank-outs, the neural activity representing the pole state drops to zero or some steady-state firing. What is the ensuing behavior of the controller?
CONTROL CONTINUES AS IF NOTHING HAPPENED. Assuming a linear control system, there are only two possibilities when the pole state information is kept constant: either the system exponentially diverges or converges to steady state. In both cases, the motor commands issued would lose any correlation with the actual pole state during the blank-out, and the gains regressed during blank-out should change significantly. None of these effects could be observed in our data. Even when assuming a nonlinear control system, this statement would not change.
CONTROL SWITCHES TO A DEFAULT MODE. Triggered by the missing state information, the biological controller could just switch to a simple default mode. For instance, it could stop, as characterized by zero velocity of the pole base; use constant velocity commands, as characterized by zero acceleration of the pole base; or use constant acceleration commands, as characterized by zero jerk of the pole base. We tested for each of these alternatives with t-tests on the rectified characteristic quantity and could not find any statistical significance. Additionally, as in the previous point, any of these strategies would make the correlation between pole state and motor commands vanish and affect the regressed gains during blank-out significantly.
CONTROL SWITCHES TO A PREDICTION MODE.
From our understanding, the most viable hypothesis is that during the
blank-outs, somewhere in the control loop a switch occurs that fills in
the missing pole state information with predicted states. Obviously,
such an internal prediction cannot be very accurate for a long time, as
the predicted state will diverge from the real state rather quickly.
Our subjects could, after some training, tolerate 500- to 600-ms
blank-out times but not more, a number similar to that reported in
smooth pursuit blank-out experiments (Pola and Wyatt 1997). Furthermore, when scrutinizing Fig. 8, one can see that
the magnitude of the control gains all became smaller at the end of
blank-outs, although this decrease did not reach statistical
significance due to the variability across subjects. Such a decrease is
expected if the correlation between motor command and actual
pole states decreases during blank-outs as the predicted
pole states diverge from the actual ones due to error integration.
Is the forward model in the sensory preprocessing stage or the control stage?
Under the assumption that there is a forward model in the control loop, the questions arise where it is and how it is switched in during blank-outs. Here we will argue that this forward model is in the sensory preprocessing stage, although we cannot exclude another forward model in the control stage.
Referring to Fig. 3C, i.e., the simple predictor controller,
the function of the forward model is to take as inputs the delayed state and efference copies of the motor commands from the delay period
to estimate the current state. Importantly, the forward model is set up
to accomplish the state estimation in one step such that it bridges the
entire delay period, e.g., 220 ms in our actual pole balancing
experiments. Alternatively, one could also imagine that the forward
model only bridges a fraction of the delay time and that several
iterations through the forward model are needed to compute an
estimation of the current state. The latter concept is easily possible
on a digital computer but less suited for the rather slow information
processes in neural tissue. Thus the forward model needs to bridge at
least sufficiently large time gaps in one prediction step such that the
process of prediction does not cause too much additional delay. These
large prediction steps, however, make the output of the
forward model unsuitable to fill in missing state information during
blank-outs. For instance, if the last perceived state before
the blank-out onset was $x_{n-d}$, then the next
required input to the controller would be an estimate of
$x_{n-d+1}$, denoted as $\hat{x}_{n-d+1}$. But the
forward model in the simple predictor created $\hat{x}_{n-d+r}$
where r must be significantly greater than one to quickly
predict-out the delay, as explained in the preceding text. Hence, the
prediction of the simple predictor is too advanced in time to be