1Laboratory for Computational Motor Control, Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland; and 2School of Psychology, Bangor University, United Kingdom
Submitted 3 October 2007; accepted in final form 25 May 2008
|
|
ABSTRACT |
|---|
|
|
|
INTRODUCTION |
|---|
|
In the field of machine learning, this problem is termed "active learning." The learner has access to a list of training examples; each example focuses on a different component of the task (e.g., a golf swing using the driver). There is no preset curriculum nor is there an instructor; therefore the learner picks out its own curriculum, one practice example at a time. Examples can be repeated if necessary, but the ultimate goal is to improve in the overall task performance with a minimum number of examples. How should one pick the training examples?
There are a number of potential criteria for such a decision. For example, one could use random exploration to minimize statistical bias, or select examples to minimize the element of surprise (Einhauser et al. 2007). In general, we are often faced with the decision whether to exploit known alternatives or explore unknown ones. How the nervous system makes decisions in these situations remains an open question (Cohen et al. 2007). However, Cohn et al. (1996) suggested that in the case of active learning (i.e., choosing training examples), an optimal solution can be obtained if the goal is to minimize the uncertainty of the learner. In the current study, we apply this approach to a human motor learning task and test whether behavior is well described by such a selection criterion.
Active learning
Mathematically, we can summarize active learning of a motor task as follows. The student has to learn a task consisting of P different components or behaviors, indicated by the multinomial variable x between 1 and P. A task component can be defined as a subtask in which mastery is required for the attainment of the overall task goal. The correct or optimal motor output is an observable but currently unknown function of the task component, y(x). The learned behavior for each task component can be described by a set of parameters. For simplicity and without loss of generality, we assume here that each task component is described by a single parameter. Thus in our simple case, the learner has a set of parameters [w(1)...w(P)] representing the knowledge of the movement required for each task component. For example, in our task, the subject was asked to compensate for a force perturbation when making reaching movements toward a number of different targets. Here w(1),..., w(P) represent the estimated force that the subject needs to produce for each movement direction to counteract the force perturbation. In other words, the parameters constitute the learner's model of the task.
On trial n with component xn, the learner produces the movement $\hat{y}_n = w_n(x_n) + \varepsilon_n$, the estimated force plus random motor error $\varepsilon_n$, and observes the correct answer yn. A general class of learning rules is gradient descent: the learner updates his own knowledge by the difference between actual and desired output, using the learning gain Kn (Donchin et al. 2003; Huang and Shadmehr 2007; Thoroughman and Shadmehr 2000)
$w_{n+1}(x_n) = w_n(x_n) + K_n\,(y_n - \hat{y}_n)$ | (1) |
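The error-driven update described above can be sketched in a few lines. This is a minimal illustration for a single task component, assuming a constant learning gain and Gaussian motor noise; the function name, the default gain of 0.3, and the noise level are hypothetical choices and are not taken from the study.

```python
import random


def simulate_learning(true_force, n_trials, gain=0.3, noise_sd=0.5, seed=0):
    """Trial-by-trial error-driven update for one task component.

    gain, noise_sd and seed are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    w = 0.0  # initial estimate of the force to produce
    history = []
    for _ in range(n_trials):
        y_hat = w + rng.gauss(0.0, noise_sd)  # produced output = estimate + motor noise
        error = true_force - y_hat            # desired output minus actual output
        w += gain * error                     # shift the estimate by gain * error
        history.append(w)
    return history
```

Without motor noise the estimate approaches the true force geometrically, at a rate set by the gain; with noise it fluctuates around the true value.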
The problem of active learning in this context is to pick the next training example xn such that overall performance in the task space can improve. In coached (i.e., passive) learning and most motor learning experiments, this problem does not arise because xn is determined by the teacher or experimenter. Now what would constitute a good criterion to choose xn? Cohn and colleagues (1996) suggest that a learning system should attempt to reduce the expected squared error on the trial after the learning trial n. That is, the learner should try to pick a training example on trial n so that after he or she has learned from this training example following Eq. 1, the performance averaged across all skill components will be maximally improved.
For an unbiased learner (a learner who does not show systematic constant errors), the expected squared error on trial n + 1 is equal to the uncertainty the learner has about his learned output. Therefore good training examples will maximally reduce the uncertainty on the parameters w(1),..., w(P) after the learning. As we show in APPENDIX 1, this translates into a selection rule in which the learner should pick the task component in which the uncertainty prior to the learning trial is highest. This task component then will maximally benefit from learning on trial n, therefore also maximally reducing overall uncertainty. This decision rule is optimal for a number of different learning rules, including gradient descent with a constant learning rate, as well as learning using a Kalman filter (APPENDIX 1). Furthermore, the decision rule also holds for a learner who forgets from trial to trial (APPENDIX 2), as long as the rate of forgetting is tuned to the rate of change in the environment (Koerding et al. 2007).
In all of these situations, learning in general leads to a reduction of uncertainty for the practiced task component. Therefore a learner who follows such a rule would not pick the same training example twice in a row; after practicing task component x on trial n, a different skill component will have a higher uncertainty. We ask here whether active learning in people follows this prediction and whether it can be described using an uncertainty-based decision rule.
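The prediction that an uncertainty-based learner avoids immediate repetition can be illustrated with a small simulation. This sketch assumes the standard scalar Kalman update for the parameter uncertainty, equal initial uncertainties, a fixed motor-noise variance, and no forgetting; the function names and numerical values are hypothetical.

```python
def kalman_uncertainty_update(var_w, var_noise):
    """Posterior parameter variance after one observation (scalar Kalman update)."""
    gain = var_w / (var_w + var_noise)
    return (1.0 - gain) * var_w


def choose_sequence(initial_vars, var_noise, n_trials):
    """Always train the component whose current uncertainty is highest."""
    variances = list(initial_vars)
    choices = []
    for _ in range(n_trials):
        # Ties are broken in favor of the lowest index.
        x = max(range(len(variances)), key=lambda i: variances[i])
        choices.append(x)
        variances[x] = kalman_uncertainty_update(variances[x], var_noise)
    return choices


if __name__ == "__main__":
    print(choose_sequence([1.0, 1.0, 1.0, 1.0], var_noise=1.0, n_trials=8))
```

Under these assumptions the simulated learner cycles through the four components and never selects the same one on consecutive trials, because practicing a component always lowers its uncertainty below that of the others.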
Furthermore, we ask whether the choices during active learning are influenced by the error experienced during the last movement. When both the parameter and the motor noise $\varepsilon_n$ are assumed to have a Gaussian distribution, as is the case in the standard Kalman filter, the uncertainty of the model parameters will always be reduced for the component of the task that was practiced, independent of the error observed. This would lead to the counterintuitive prediction that errors during active learning would not affect action selection.
However, there are versions of the Kalman filter that would increase the parameter uncertainty when a big error is observed. One example is a system where the output noise does not have a Gaussian distribution but is drawn from a mixture of Gaussians with identical zero means and different variances. In such a case, the observation of a large error would lead to a reduced rate of learning (Kording and Wolpert 2004) and an increase in model uncertainty. Under the uncertainty-based selection rule, this would lead to repetitions of training examples when a large error is observed. However, any version of such a model would still predict that a learner should decrease its uncertainty when a small error is observed. Therefore any uncertainty-based selection model predicts that the learner is biased to not repeat an action when an error close to zero is observed.
To summarize, the uncertainty-based selection rule (Cohn et al. 1996) predicts that an optimal learner should choose the task component with the highest uncertainty. After that choice is made, the learner should be biased to not immediately repeat the same action on the next trial. Here we have taken the first steps in quantifying the factors that affect human choices in active learning of a motor skill. Can their choices be understood in the framework of uncertainty estimation? Do motor errors affect our choice of training? In the present study, we used a force-adaptation task in which subjects learned to compensate for a force perturbation when making reaching movements. In experiment 1, we observed how human participants select their own sequence of actions to practice and how motor errors influence these choices. In experiment 2, we further explored the idea that the variance of observation might influence the subjective estimation of its reliability. We tested the hypothesis that the consistency of observation was a contributing factor to human participants' strategies by varying the variances of the perturbation applied to each of the four task components.
|
|
METHODS |
|---|
|
We used a hitting game to examine how the statistical properties of the training experience influenced the subjects' strategy in active learning (Fig. 1, A–D). The goal of the game was to become as proficient as possible at hitting a small target in one of four directions with a rapid center-out strike using a robotic manipulandum (Fig. 1A) (Huang and Shadmehr 2007; Hwang et al. 2003). Subjects' proficiency was rewarded with points in randomly interspersed test trials. During test trials, the computer chose the target for the subject. The closer the hand cursor came to the target, the greater the number of points. The score and the financial incentive depended solely on the performance on these test trials.
|
Because there were multiple targets present during active training trials, we needed to ascertain toward which target the subject was aiming. Therefore after each strike when the hand hit the pillow, subjects brought their hand back to the center of their intended target. At this point the center mark reappeared and the robot brought the hand back to the center.
Active training, passive training, and test trials
Participants were tested on randomly interspersed test trials in which one direction was chosen at random for them. In between test trials, participants trained for directions of their choice (active learning) or a direction chosen randomly for them (passive learning) in either an active learning or passive learning block (Fig. 1, A and B). Comparison between the test trials in each type of blocks allowed us to assess the efficacy of the learning strategies. Participants were instructed to pick their movement directions in training trials so that they would maximize their performance in the test trials. We awarded participants a monetary reward dependent on the amounts of points earned in the test trials. It was made very clear that the score only reflected their performance in the test trials and not their performance on the training trials. This was important as we did not want to contaminate subjects' strategies during training with a greedy element of selecting an easy target for monetary return.
Test trials were randomly interspersed between the training trials (1 of 5) and clearly announced with the word "test" on the screen immediately before the trial started. The computer pseudorandomly picked the target among the four available to test the participants' performance. Performance was measured as the angular distance to the target at the point where the cursor crossed an invisible circle of 10 cm. Four accuracy levels were established: 5.16, 4.49, 3.61, and 2.48°. For each additional accuracy level achieved, the movement was awarded one additional point in that trial, up to a maximum of 4 points.
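The scoring scheme for test trials can be expressed as a short function. This is a sketch only: the function name is hypothetical, and treating each accuracy level as an inclusive upper bound on the absolute angular error is an assumption, since the text does not state how boundary cases were handled.

```python
# Accuracy levels in degrees, as listed in the text.
ACCURACY_LEVELS_DEG = (5.16, 4.49, 3.61, 2.48)


def points_for_error(angular_error_deg):
    """Return 0 to 4 points: one point per accuracy level achieved.

    Assumption: a level counts as achieved when |error| <= level.
    """
    err = abs(angular_error_deg)
    return sum(1 for level in ACCURACY_LEVELS_DEG if err <= level)
```

For example, an angular error of 4° meets the 5.16° and 4.49° levels and would earn 2 points under this reading.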
Four of five trials were training trials. The experiments were divided into blocks of 60 trials (experiment 1) or 160 trials (experiment 2); each block was either active training or passive training. The schedule of active and passive training blocks is shown in Fig. 1, B (experiment 1) and C (experiment 2). In training trials of active training blocks, all four targets were available and the subject chose their target direction to aim. In training trials of passive training blocks, the computer pseudorandomly picked the target among the four available.
Experiment 1
Our objective was to determine whether errors that subjects experienced during active learning affected the subsequent choices that they made. To that aim, we considered two kinds of errors: errors that were due to a consistent perturbation and errors that were due to an inconsistent perturbation. To induce errors, we applied a velocity-dependent force (viscosity of 10 Ns/m) that pushed the hand perpendicular to the hand movements toward some target directions.
The perturbations of each block of trials followed one of two patterns (Fig. 1D). In the constant-in-null pattern, movements toward three of the four targets were unperturbed (null or "N"), whereas movements toward one target were perturbed with a consistent curl force field (constant or "C"). The C target was assigned pseudorandomly for each block. The C target had a clockwise field for the first 30 trials of the block, and then the field switched to a counterclockwise field. A good active learning policy would have been to find the C target and continue to train mostly on that target.
It is possible that subjects would choose the C target because it was the only target that had any perturbation. In the constant-in-random pattern, again one target was picked to have a constant perturbation (C) that switched after 30 trials. In contrast to a constant-in-null block, however, a curl field was also presented during movements to the three remaining targets. These curl fields switched randomly between clockwise or counterclockwise fields (random or "R"). In these movements, the field had a random viscosity with a uniform distribution from –10 to +10 Ns/m to further mask the presence of the C target.
Experiment 2
To further explore the idea that the variance of the perturbations (i.e., the reliability of errors) influenced choices during active learning, we conducted a second experiment in which the means of the perturbations associated with the various targets were identical, but their variances differed (Fig. 1, C and D). Once again, four targets were available. At the start of each block (now 160 trials long), each target was assigned a curl field with a viscosity that had a mean of 10 Ns/m but a variance that was low (R1 target, viscosity uniformly drawn from 6 to 14 Ns/m), medium (R2 target, viscosity uniformly drawn from 2 to 18 Ns/m), high (R3 target, viscosity uniformly drawn from –2 to 22 Ns/m), or very high (R4 target, viscosity uniformly drawn from –6 to 26 Ns/m). Therefore observations at the R1 target should be the most consistent throughout the block. Similar to experiment 1, participants earned points only during the sparsely distributed test trials in which the computer randomly tested participants' performance in one of the four targets. During these test trials, the viscosity was always 10 Ns/m (C targets).
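The four viscosity ranges share a mean but differ in spread, which can be checked with the closed-form mean and variance of a uniform distribution, (a + b)/2 and (b − a)²/12. The dictionary and function names below are illustrative only; the ranges are those given in the text.

```python
# Viscosity ranges in Ns/m for the four targets of experiment 2.
VISCOSITY_RANGES = {"R1": (6, 14), "R2": (2, 18), "R3": (-2, 22), "R4": (-6, 26)}


def uniform_mean_var(low, high):
    """Mean and variance of a uniform distribution on [low, high]."""
    return (low + high) / 2.0, (high - low) ** 2 / 12.0


if __name__ == "__main__":
    for target, (low, high) in VISCOSITY_RANGES.items():
        mean, var = uniform_mean_var(low, high)
        print(f"{target}: mean = {mean:.1f} Ns/m, variance = {var:.2f} (Ns/m)^2")
```

All four targets have a mean of 10 Ns/m, while the variance grows from about 5.3 (R1) to 21.3 (R2), 48 (R3), and 85.3 (R4) in units of (Ns/m)².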
Softmax regression procedures
We modeled the probability of choosing a target using a generalized linear model. We used a multinomial extension of logistic regression, softmax regression. The probability vector, p, of selecting target xn (xn = 1...P) depends on the vector vn, which in turn was a linear function of three factors: the 4 x 1 vector $\boldsymbol{\beta}_{bias}$, with the mean constrained to zero (i.e., 3 free parameters), modeled the preference of participants for one of the four particular targets; $\beta_{repeat}$ modeled the preference of each subject to repeat the last movement direction xn; and finally $\beta_{error}$ modeled the increase in probability to repeat the last movement direction as the absolute size of the last error |yn| increased. By writing the last choice xn as the vector of indicator variables $\mathbf{x}_n$, the full model can be written as
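$\mathbf{v}_n = \boldsymbol{\beta}_{bias} + (\beta_{repeat} + \beta_{error}\,\lvert y_n \rvert)\,\mathbf{x}_n$ | (2) |
$p(x_{n+1} = i) = \exp(v_{n,i}) \big/ \sum_{j=1}^{P} \exp(v_{n,j})$ | (3) |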
We fitted the parameters $\boldsymbol{\beta}_{bias}$, $\beta_{repeat}$, and $\beta_{error}$ by maximizing the log-likelihood of the data given our model using numerical methods (Matlab fminsearch).
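The choice model described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' Matlab code: it assumes the linear predictor equals the bias vector plus (repeat weight + error weight × absolute last error) added to the entry of the last-chosen target, and all function and argument names are hypothetical.

```python
import math


def choice_probabilities(bias, beta_repeat, beta_error, last_choice, last_abs_error):
    """Softmax probabilities of choosing each target on the next trial."""
    v = list(bias)
    # Only the previously chosen target receives the repeat and error terms.
    v[last_choice] += beta_repeat + beta_error * last_abs_error
    v_max = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(vi - v_max) for vi in v]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(params, choices, abs_errors):
    """params = (b1, b2, b3, beta_repeat, beta_error) for four targets.

    The fourth bias is set so that the bias vector has zero mean.
    """
    b1, b2, b3, beta_repeat, beta_error = params
    bias = [b1, b2, b3, -(b1 + b2 + b3)]
    nll = 0.0
    for n in range(len(choices) - 1):
        p = choice_probabilities(bias, beta_repeat, beta_error,
                                 choices[n], abs_errors[n])
        nll -= math.log(p[choices[n + 1]])
    return nll
```

A derivative-free simplex minimizer, the same family of method as `fminsearch`, could be applied to `negative_log_likelihood`; in Python, `scipy.optimize.minimize` with `method="Nelder-Mead"` is one such option.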
Participants
Sixteen subjects participated in experiment 1 and another 16 in experiment 2. For experiment 1, the experiment was counter-balanced across subjects for the order of the perturbations (constant-in-null and constant-in-random) and training conditions (active and passive). Subjects were healthy, right-handed, and naïve to the purpose of the experiment. Procedures and protocols were approved by the Johns Hopkins Medicine Institutional Review Board and participants gave their written consent prior to the experiments.
|
|
RESULTS |
|---|
|
If active learners estimate the uncertainty about the desired output and then choose to train on the component about which they are most unsure (APPENDIX 1, Eq. A9), they should have a tendency not to practice in the same direction as the last trial. Contrary to this prediction of uncertainty-based models, our participants repeated the last movement direction with a probability of 35 ± 12% (Fig. 2A), significantly higher than expected from choosing a direction at random [2-tailed t-test, t(15) = 3.46, P < 0.01]. When the participants decided to switch, they picked each of the other three directions with equal probability.
|
Participants repeated even well-learned task components
When there was little or no error in a trial, the probability of repeating the target approached 25%, the rate of random selection. Under any uncertainty-based model, the observation of a zero error should have decreased the uncertainty about the corresponding task component. This would have then lowered the learner's probability of selecting this direction again below the probabilities of the other directions. While our data showed that error was a robust factor in encouraging repetition of a previously selected direction, a trial with a small error did not reduce the probability of re-selecting the same movement direction below chance, as the uncertainty model had suggested. Participants therefore clearly violated a fundamental prediction of uncertainty-based active learning models. Subjects completed a postexperimental questionnaire. While two subjects reported that they were repeating large-error movements, cognitive responses were inconsistent across subjects.
To quantify these observations we estimated the contribution of error and the tendency to repeat a direction even in the absence of an error using softmax regression (a multinomial extension of logistic regression, see METHODS). The regression included a term to capture biases toward specific targets, a term that determined the probability of repeating a direction in the absence of error ($\beta_{repeat}$), and a term that captured how much the probability of repeating increased with the absolute size of the last error ($\beta_{error}$). From the estimated parameters, we were able to reproduce the sequence and trends of participants' choices (dashed line, Fig. 2B). While we did not find any significant bias toward any of the four targets [1-factor ANOVA, F(3,60) = 1.04, P = 0.38], nearly all participants showed a positive $\beta_{error}$, indicating that they were more likely to repeat an action when a large error was encountered [2-tailed t-test, t(1,15) = 3.6, P < 0.01]. Once we accounted for the size of the error, $\beta_{repeat}$ was not significantly different from zero [t(1,15) = 1.0, P = 0.32]. That is, participants were just as likely to choose the well-learned movement direction as the other directions even when no error was encountered. This clearly violates the prediction of uncertainty-based selection models, as any variant of this model would have predicted a bias away from a just-practiced skill component when no error was encountered.
Relationship between selection strategy and performance
How did the participants' active learning strategies affect their performance? In general the average absolute errors in test trials were slightly, but not significantly, bigger after active learning compared with passive learning blocks [paired t-test, t(15) = –0.238, P = 0.82]. We postulated that subjects might have used a combination of good (e.g., error-dependent repetition) and bad strategies (e.g., blind repetition). We looked at the correlation between strategy and performance on test trials after active learning. Because performance was determined largely by the overall proficiency of the participants, for each participant, we subtracted the average error during test trials after active learning from the average error after passive learning. The difference was then correlated with individual parameter estimates ($\beta_{repeat}$ and $\beta_{error}$) from the softmax regression. A positive correlation of the parameter with the difference in errors indicated that this strategy facilitated learning, while a negative correlation indicated that this strategy hurt learning.
There was a positive correlation between error sensitivity and later active test performance (1-tailed Spearman's correlation, r = 0.48, P < 0.05). Participants who sought to train in directions where their errors were big performed better in subsequent testing (Fig. 3, $\beta_{error}$). Importantly, two participants who displayed error avoidance (negative values for $\beta_{error}$) showed poorer performance relative to their own performance after passive learning. Furthermore, we found a strong negative correlation between $\beta_{repeat}$ and subsequent test performance (Fig. 3). The more likely participants were to repeat a direction (after the influence of the error size had been accounted for), the worse was their test trial performance (1-tailed Spearman's correlation, r = –0.60, P < 0.01). Thus the violation of the optimal active learning strategy indeed hurt the performance of the participants in the active compared with the passive learning condition. These analyses show that the individual's active learning strategy influenced later performance on test trials. Furthermore, repetition of targets in the absence of errors indeed led to poorer learning outcomes, as predicted by uncertainty-based models (Cohn et al. 1996).
|
One reason to re-select the last direction despite no errors may be that one is trying to estimate the consistency of the observation. For example, if a participant found that for one direction the perturbation changed randomly from trial to trial, a good active learning strategy would be to ignore this direction because training here could not lead to further improvement. Did the variance of the perturbations affect the choices?
To test this idea, we introduced a condition—constant-in-random—in which one direction was perturbed with a constant force field ("C" target), whereas the other three were perturbed with a random force field ("R" targets). To assess the influence of consistency, we attempted to match the absolute sizes of the errors of the movements toward all the directions. Because participants would adapt to the constant force field, we introduced a stronger field in the constant target than in the random ones and flipped the perturbation direction after 30 trials (Fig. 4A). As a result the errors for the constant target were large immediately after the onset of the block and after the switch. However, the errors in the "C" target became smaller than the "R" targets by the end of each phase.
|
While we attempted to match the absolute error size in experiment 1, the average size of the force field was different. Furthermore, the number of trials in a block (30) might have been too small to allow participants to estimate variance for the different skill components. To address these concerns and to test for the influence of error variance explicitly, we designed a second experiment in which the averages of the force perturbations were matched, and participants made substantially longer sequences of movements (160 per block). The perturbations associated with each target were drawn from a distribution that had an identical mean (10 Ns/m). Each target in a block had a perturbation variance that was low, medium, high, or very high (Figs. 1D and 5B, abscissa). We labeled the corresponding targets R1, R2, R3, and R4 (Fig. 1C). Participants adapted to the mean force field for all four variance levels, as seen in the decrease of mean error over many trials. However, trial-to-trial variance of errors remained high for movements with highly variable perturbation [1-way ANOVA, F(3,63) = 30.45, P << 0.001; Fig. 5, A and B, abscissa].
|
Finally, it is possible that participants repeated even high-variance targets because they attempted to reduce their motor variance strategically through an increase in stiffness for these movement directions (Burdet et al. 2001). To test for this possibility, we estimated the stiffness of the arm for each movement direction and variance level. While the stiffness varied systematically with the movement direction, reflecting the natural anisotropy of the arm (Mussa-Ivaldi et al. 1985), our estimates were not influenced by the variance of the target [F(3,58) = 1.80, P = 0.157].
In summary, the results of experiment 2 demonstrated that participants' choices were influenced by the absolute size of the error of the last movement but not by the variance of these errors, at least as measured over 160 trials.
|
|
DISCUSSION |
|---|
|
First, when subjects encountered a task component that resulted in large performance errors, they repeated that movement. It is intuitively clear that this strategy should lead to better learning as compared with random selection of task components: big errors can indicate a mismatch between the current estimate of the force perturbation and the correct value and therefore indicate the need to learn. Participants who sought out movement directions with large errors were more successful in subsequent test trials; participants who avoided errors were comparatively less successful.
Our second finding was that after performing a perfect movement (i.e., no errors), participants did not avoid that task component. Current algorithms in machine learning show the opposite tendency: making an observation close to an action component reduces the model uncertainty in the neighborhood of this observation and therefore reduces the probability of re-selecting this component in the next training trial. People did not follow this strategy during active learning. When no error was observed in a movement direction, a situation in which the estimated uncertainty of the output should have been reduced, they did not avoid this movement direction. This behavior was suboptimal as it was correlated with poorer test performance after active learning. Thus reducing the tendency to repeat well-executed task components may help people improve overall task performance.
Why do people repeat task components even when the last error was very small? We tested the hypothesis that this might reflect a strategy to test the consistency or variance of the task component. Such knowledge could then be used to avoid task components in which large errors arise from high variance of the environment rather than from a large mismatch between average required and average produced motor behavior. We found that while participants' choices were dependent on the absolute size of the last error, they were insensitive to the cross-trial variance of these errors. The results can imply one of two things. First, participants may not estimate the variance of the error signal over multiple trials. This is congruent with recent results that showed that variance of reward values does not influence decisions (Daw et al. 2006). Alternatively, it may imply that participants were trying to reduce the variance of the errors for movement directions with high trial-to-trial variance through a strategic increase in stiffness (Burdet et al. 2001). While this remains a possibility, our analysis suggests that they were not successful in doing so. As a result, we concluded that the strategy of re-selecting the target was suboptimal. Indeed, for most participants who showed this behavior, performance after active learning was slightly worse than after passive learning in which trials were picked at random.
Thus participants repeated already learned skills rather than explore new, untrained task components. Indeed, performance during training trials was better during active learning than during passive learning, likely due to the larger number of target repetitions during active learning. While this strategy led to poorer performance in the short term, it may have increased the motivation during the task. Recent studies indicate that positive, motivating feedback may increase retention of learned motor skills in the long term (Chiviacowsky and Wulf 2007). The optimality of the active machine-learning algorithm only reflects the minimization of cost terms associated with the explicit task goal. Therefore it is possible that repeating a task stems from its benefits over a long-term period, a component that we did not assay in our protocol. In addition, it is possible that as the targeted skill involves more variables, other principles may determine the optimal active learning strategy (Wulf and Shea 2002). Indeed there is evidence that the sequence of learning examples affects retention properties of the acquired skill. In a task where participants were asked to learn three different punch styles, people who trained with a random schedule (practice trials on all three styles were conducted intermittently) retained their performance better after 10 min and after 10 days, when compared with people who trained one style at a time (Shea and Morgan 1979). Similar results emphasizing the benefits of concurrent and intermixed training of several subskills as a whole were found in basketball shooting (Memmert 2006), pistol shooting (Keller et al. 2006), surgery training (Brydges et al. 2007), and three-dimensional spatial orienting (Shebilske et al. 2006). For example, in the basketball study, it was found that people had better acquisition when shooting positions were blocked but better retention of the skills when shooting positions were randomized.
These results raise the possibility that choices made during training can have different effects on short-term versus long-term measures of performance. Based on our study we can only make inferences about the short-term effects of these choices. However, because most evidence suggests an improvement of long-term retention with intermixing of training examples, we think it is likely that these results would generalize to a longer time scale. Our results highlight that humans do not always choose the optimal learning strategy when given the chance to select their own training sequence, possibly preferring immediate positive feedback to the chance of exploring new, unlearned task components. We showed that we can separate aspects of learning strategy that improved overall performance from aspects that impaired performance. Our findings imply that it should be possible to design adaptive algorithms, i.e., an artificial coach, that would lead to better short-term gains than random training and, in particular, better than what the students are likely to do on their own. Specifically, the present results predict that an artificial coach could be designed that produces better performance simply by instructing the student to repeat task components only when the last error in that component was large. Such adaptive training algorithms may play a useful role in sports training, as well as in robot-based rehabilitation training after stroke or developmental disabilities.
|
|
APPENDIX 1 |
|---|
|
The output $\hat{y}_n$ for a particular component xn depends on the corresponding parameter plus some motor noise $\varepsilon_n$, a random variable with zero mean and variance $\sigma_\varepsilon^2$
$\hat{y}_n = w_n(x_n) + \varepsilon_n$ | (A1) |
After an action on trial n, the system learns from the performance error, the difference between the actual output $\hat{y}_n$ and the optimal output yn. Thus on each trial
$w_{n+1}(x_n) = w_n(x_n) + K_n\,(y_n - \hat{y}_n)$ | (A2) |
Following the criterion proposed by Cohn et al. (1996), the best skill component to train on trial n is xn*, the component that, after learning, will most reduce the expected squared error on trial n + 1 (the expected value is taken over all possible components xn+1 and produced movements)
$x_n^* = \arg\min_{x_n} E\left[(y_{n+1} - \hat{y}_{n+1})^2\right]$ | (A3) |
Under the assumption of an unbiased learner (i.e., a learner that on average does not show a systematic error), the expected squared error of the output is the uncertainty about the relevant parameter $\sigma_{n+1}^2(x)$ plus the variance of producing the output ($\sigma_\varepsilon^2$), again averaged across all possible skill components on trial n + 1
$E\left[(y_{n+1} - \hat{y}_{n+1})^2\right] = \frac{1}{P}\sum_{x=1}^{P} \sigma_{n+1}^2(x) + \sigma_\varepsilon^2$ | (A4) |
The uncertainty is defined as the expected squared distance from the unknown ideal model parameters w(x)*
$\sigma_n^2(x) = E\left[(w(x)^* - w_n(x))^2\right]$ | (A5) |
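Now we have to calculate how observing the error of trial n for skill component $x_n$ influences the uncertainty of the model parameter on the next trial after using the learning rule in Eq. A2. To do so, we can use Eq. A2, together with $y_n = w^*(x_n)$ and Eq. A1, to expand the term $(w^*(x_n) - w_{n+1}(x_n))$
$$w^*(x_n) - w_{n+1}(x_n) = (1 - K_n)\,(w^*(x_n) - w_n(x_n)) + K_n\,\varepsilon_n \tag{A6}$$
We assume that the motor errors $\varepsilon_n$ have variance $\sigma^2$ and are independent of the parameter uncertainty. Thus we can express the uncertainty around $w_{n+1}(x_n)$ after perceiving behavior $x_n$ as
$$\sigma^2_{w,n+1}(x_n) = (1 - K_n)^2\,\sigma^2_{w,n}(x_n) + K_n^2\,\sigma^2 \tag{A7}$$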
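The step from the expanded error term to the updated uncertainty can be written out explicitly. The notation below is an assumption made for illustration: $e_n = w^*(x_n) - w_n(x_n)$ denotes the parameter error before learning, $\sigma^2_{w,n}(x_n) = E[e_n^2]$ its uncertainty, and $K_n$ the learning rate on trial n.

```latex
\begin{aligned}
w^*(x_n) - w_{n+1}(x_n) &= (1 - K_n)\,e_n + K_n\,\varepsilon_n \\
E\!\left[(w^*(x_n) - w_{n+1}(x_n))^2\right]
  &= (1 - K_n)^2 E[e_n^2] + 2K_n(1 - K_n)\,E[e_n \varepsilon_n] + K_n^2 E[\varepsilon_n^2] \\
  &= (1 - K_n)^2\,\sigma^2_{w,n}(x_n) + K_n^2\,\sigma^2
\end{aligned}
```

The cross term vanishes because the motor noise has zero mean and is assumed independent of the parameter error, so $E[e_n \varepsilon_n] = 0$.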
From this we can see that the change in uncertainty after learning for a particular skill component x is
$$\sigma^2_{w,n+1}(x) - \sigma^2_{w,n}(x) = (K_n^2 - 2K_n)\,\sigma^2_{w,n}(x) + K_n^2\,\sigma^2 \tag{A8}$$
Thus for a constant learning rate 0 < K < 1, it follows directly from Eq. A8 that the average uncertainty will be reduced most if we pick a behavior $x_n^*$ for which the corresponding parameter uncertainty $\sigma^2_{w,n}(x)$ is highest.
Thus the decision rule in Eq. A3 can be simplified as
$$x_n^* = \arg\max_{x} \sigma^2_{w,n}(x) \tag{A9}$$
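The simplified decision rule can be checked numerically against a brute-force search. The sketch below assumes that training a component with constant learning rate K changes its uncertainty u to (1 − K)²·u + K²·σ², which is what the learning rule implies when the motor noise is zero-mean and independent of the parameter error; the function names and the example numbers are hypothetical.

```python
def uncertainty_after_training(s2_w, K, s2_noise):
    """Parameter uncertainty after one training trial (assumed update form)."""
    return (1.0 - K) ** 2 * s2_w + K ** 2 * s2_noise

def pick_most_uncertain(uncertainties):
    """Simplified rule: index of the component with the highest uncertainty."""
    return max(range(len(uncertainties)), key=lambda i: uncertainties[i])

def pick_by_brute_force(uncertainties, K, s2_noise):
    """Index of the component whose training gives the lowest total uncertainty."""
    def total_after(i):
        updated = list(uncertainties)
        updated[i] = uncertainty_after_training(updated[i], K, s2_noise)
        return sum(updated)
    return min(range(len(uncertainties)), key=total_after)

# Hypothetical uncertainties for three task components.
uncertainties = [0.5, 2.0, 1.0]
simple_choice = pick_most_uncertain(uncertainties)
exhaustive_choice = pick_by_brute_force(uncertainties, K=0.3, s2_noise=0.2)
```

With these numbers both procedures select the second component, the one with the largest uncertainty, as expected for any constant learning rate between 0 and 1.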
We also considered the result for adaptive learning rates. The optimal $K_n$, known as the Kalman gain, is the learning rate that results in the lowest possible uncertainty in the parameter after learning. To obtain this learning rate, we differentiate Eq. A7 with respect to $K_n$. The resulting optimal adaptive learning rate depends on the parameter uncertainty $\sigma^2_{w,n}(x)$ and the motor noise $\sigma^2$
$$K_n = \frac{\sigma^2_{w,n}(x)}{\sigma^2_{w,n}(x) + \sigma^2} \tag{A10}$$
With such a flexible learning rate, the updated optimal parameter uncertainty becomes
$$\sigma^2_{w,n+1}(x) = \frac{\sigma^2_{w,n}(x)\,\sigma^2}{\sigma^2_{w,n}(x) + \sigma^2} \tag{A11}$$
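The optimal learning rate and the resulting uncertainty can be derived in a few lines. As an assumption for compactness, $s = \sigma^2_{w,n}(x)$ abbreviates the parameter uncertainty before learning and $s' = \sigma^2_{w,n+1}(x)$ the uncertainty after learning, with the update $s' = (1-K_n)^2 s + K_n^2\sigma^2$.

```latex
\begin{aligned}
\frac{\partial s'}{\partial K_n} &= -2(1 - K_n)\,s + 2K_n\,\sigma^2 = 0
  \;\;\Rightarrow\;\; K_n = \frac{s}{s + \sigma^2} \\
s' &= \left(\frac{\sigma^2}{s + \sigma^2}\right)^{2} s
      + \left(\frac{s}{s + \sigma^2}\right)^{2} \sigma^2
    = \frac{s\,\sigma^2}{s + \sigma^2} \\
s - s' &= \frac{s^2}{s + \sigma^2}, \qquad
\frac{\partial (s - s')}{\partial s} = \frac{s^2 + 2s\,\sigma^2}{(s + \sigma^2)^2} > 0
\end{aligned}
```

The second derivative with respect to $K_n$ is $2(s + \sigma^2) > 0$, so the stationary point is a minimum, and the last line shows that the reduction in uncertainty grows with the initial uncertainty $s$.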
The decrease in uncertainty between trial n and n + 1 is therefore maximal when $\sigma^2_{w,n}(x)$ is maximal. Therefore the optimal selection rule (Eq. A9) remains valid. Indeed this can be shown for a number of different choices of $K_n$. It should be noted that we assume that there is negligible generalization between behaviors, that the motor noise $\sigma^2$ is constant across all components, and that the amount of motor noise cannot be changed by learning.
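A short simulation illustrates how the selection rule behaves under the adaptive (Kalman) learning rate. It is a sketch under the assumptions stated in the appendix (no generalization between components, constant motor noise); the function names, the initial uncertainties, and the uniform random policy used for comparison are hypothetical choices for illustration.

```python
import random

def kalman_update(s2_w, s2_noise):
    """Uncertainty after one trial when the learning rate is the Kalman gain."""
    K = s2_w / (s2_w + s2_noise)
    return (1.0 - K) ** 2 * s2_w + K ** 2 * s2_noise

def simulate(initial, s2_noise, n_trials, policy, seed=0):
    """Return the mean parameter uncertainty after n_trials of training."""
    rng = random.Random(seed)
    u = list(initial)
    for _ in range(n_trials):
        if policy == "max_uncertainty":
            # Active learning rule: train the most uncertain component.
            i = max(range(len(u)), key=lambda j: u[j])
        else:
            # Comparison policy: train a uniformly random component.
            i = rng.randrange(len(u))
        u[i] = kalman_update(u[i], s2_noise)
    return sum(u) / len(u)

initial = [4.0, 1.0, 2.0, 3.0]  # hypothetical starting uncertainties
active_mean = simulate(initial, 1.0, 12, "max_uncertainty")
random_mean = simulate(initial, 1.0, 12, "random")
```

Because each training trial yields a larger reduction the more uncertain the component is, and successive reductions on the same component shrink, always training the most uncertain component should leave a mean uncertainty no higher than that of the random schedule.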
|
|
APPENDIX 2 |
|---|
|
To set out, let us assume that the environment (v) changes following a simple autoregressive process of order 1 with $0 < A_T < 1$
$$v_{n+1} = A_T\,v_n + \eta_n \tag{A12}$$
where $\eta_n$ is zero-mean noise with variance $Q_T$.
An optimal Bayesian learner should then mirror the rate of change in the environment with a forgetting factor of the same size. Thus the learning rule (Eq. A2) becomes
$$w_{n+1}(x_n) = A_L\,w_n(x_n) + K_n\,(y_n - \hat{y}_n) \tag{A13}$$
Thus in the absence of observations and with $-1 < A_L < 1$, all weights drift back toward zero.
The uncertainty would also be updated to match the uncertainty in the environment
$$\sigma^2_{w,n+1}(x) = A_L^2\,\sigma^2_{w,n}(x) + Q_L \tag{A14}$$
As long as the learner is unbiased, i.e., $A_T = A_L$ and $Q_T = Q_L$, the expected squared error will still be as in Eq. A4. Thus the only thing that has changed from APPENDIX 1 is that, on every trial, we scale the uncertainty by $A^2$ and add a constant Q to it. Neither of these manipulations changes where the optimum for the choice in Eq. A9 lies, because both preserve the ordering of the uncertainties across components. Thus as long as we have an unbiased learner, the selection rule in Eq. A9 remains optimal.
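The invariance of the selection rule under forgetting can be illustrated with a small numerical check. The sketch assumes that forgetting maps each uncertainty u to A²·u + Q with the same nonzero A and the same Q for all components; the function names and numbers are hypothetical.

```python
def apply_forgetting(uncertainties, A, Q):
    """Scale every uncertainty by A**2 and add the constant Q (assumed form)."""
    return [A ** 2 * u + Q for u in uncertainties]

def most_uncertain(values):
    """Index of the component with the highest uncertainty."""
    return max(range(len(values)), key=lambda i: values[i])

before = [0.5, 2.0, 1.0]  # hypothetical uncertainties before forgetting
after = apply_forgetting(before, A=0.9, Q=0.05)

choice_before = most_uncertain(before)
choice_after = most_uncertain(after)
```

Because the map from u to A²·u + Q is strictly increasing whenever A is nonzero, the component with the highest uncertainty is the same before and after forgetting.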
What if the forgetting rate of the learner ($A_L$) and the "true" forgetting rate of the environment ($A_T$) are not the same? Then we will see a discrepancy between $A_L w_n$ and $A_T w_n$, and this difference will be larger the further $w_n$ is from zero (the prior). So if the learner has a weight with a large absolute value and forgets faster than the environment changes, the forgetting will make the internal estimate systematically closer to zero than the value in the environment. The optimal strategy would then be to repeat these movements or observations more often, to offset the faster forgetting rate with repeated training.
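The size of this discrepancy can be made concrete with a minimal sketch. It ignores noise in the environment and any learning during the interval, and simply compares how far a weight decays under the learner's and the environment's forgetting rates; the function name and the numerical values are hypothetical.

```python
def forgetting_gap(w, A_learner, A_true, n_trials=1):
    """Absolute gap between environment and learner after n_trials without practice.

    Assumes pure decay: the environment value becomes A_true**n * w and the
    learner's estimate becomes A_learner**n * w (no noise, no learning).
    """
    return abs(A_true ** n_trials * w - A_learner ** n_trials * w)

# A learner that forgets faster (0.80) than the environment changes (0.95).
gap_small_weight = forgetting_gap(0.5, A_learner=0.80, A_true=0.95)
gap_large_weight = forgetting_gap(2.0, A_learner=0.80, A_true=0.95)
gap_matched = forgetting_gap(2.0, A_learner=0.95, A_true=0.95)
```

The gap scales with the absolute size of the weight and disappears when the two forgetting rates match, which is why components with large weights would benefit most from repetition when the learner forgets too quickly.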
The argument here does not rest on the assumption that the forgetting rate of the learner and the forgetting rate of a specific experimental environment are matched. Rather we propose that the forgetting rate is matched to the average forgetting rate in the environment and that under these conditions the active learning rule (Eq. A9) is optimal.
|
|
GRANTS |
|---|
|
|
|
FOOTNOTES |
|---|
Address for reprint requests and other correspondence: V. Huang, 710 W. 168th St., Rm. 13-12, Motor Performance Laboratory, Columbia University Neurological Institute, New York, NY 10033 (E-mail: vh2181@columbia.edu).
|
|
REFERENCES |
|---|
|
Burdet E, Osu R, Franklin DW, Milner TE, Kawato M. The central nervous system stabilizes unstable dynamics by learning optimal impedance. Nature 414: 446–449, 2001.
Chiviacowsky S, Wulf G. Feedback after good trials enhances learning. Res Q Exerc Sport 78: 40–47, 2007.
Cohen JD, McClure SM, Yu AJ. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philos Trans R Soc Lond B Biol Sci 362: 933–942, 2007.
Cohn DA, Ghahramani Z, Jordan MI. Active learning with statistical models. J Art Intell Res 4: 129–145, 1996.
Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature 441: 876–879, 2006.
Donchin O, Francis JT, Shadmehr R. Quantifying generalization from trial-by-trial behavior of adaptive systems that learn with basis functions: theory and experiments in human motor control. J Neurosci 23: 9032–9045, 2003.
Einhauser W, Mundhenk TN, Baldi P, Koch C, Itti L. A bottom-up model of spatial attention predicts human error patterns in rapid scene recognition. J Vis 7: 6 1–13, 2007.
Huang VS, Shadmehr R. Evolution of motor memory during the seconds after observation of motor error. J Neurophysiol 97: 3976–3985, 2007.
Hwang EJ, Donchin O, Smith MA, Shadmehr R. A gain-field encoding of limb position and velocity in the internal model of arm dynamics. PLoS Biol 1: E25, 2003.
Keller GJ, Li Y, Weiss LW, Relyea GE. Contextual interference effect on acquisition and retention of pistol-shooting skills. Percept Mot Skills 103: 241–252, 2006.
Kording KP, Tenenbaum JB, Shadmehr R. The dynamics of memory as a consequence of optimal adaptation to a changing body. Nat Neurosci 10: 779–786, 2007.
Kording KP, Wolpert DM. Bayesian integration in sensorimotor learning. Nature 427: 244–247, 2004.
Memmert D. Long-term effects of type of practice on the learning and transfer of a complex motor skill. Percept Mot Skills 103: 912–916, 2006.
Mussa-Ivaldi FA, Hogan N, Bizzi E. Neural, mechanical, and geometric factors subserving arm posture in humans. J Neurosci 5: 2732–2743, 1985.
Shea JB, Morgan RL. Contextual interference effects on the acquisition, retention and transfer of a motor skill. J Exp Psychol (Hum Learn) 3: 179–187, 1979.
Shebilske WL, Tubre T, Tubre AH, Oman CM, Richards JT. Three-dimensional spatial skill training in a simulated space station: random vs. blocked designs. Aviat Space Environ Med 77: 404–409, 2006.
Thoroughman KA, Shadmehr R. Learning of action through adaptive combination of motor primitives. Nature 407: 742–747, 2000.
Wulf G, Shea CH. Principles derived from the study of simple skills do not generalize to complex skill learning. Psychon Bull Rev 9: 185–211, 2002.