## Abstract

How the central nervous system coordinates the many intrinsic degrees of freedom of the musculoskeletal system is a recurrent question in motor control. Numerous studies addressed it by considering redundant reaching tasks such as point-to-point arm movements, for which many joint trajectories and muscle activations are usually compatible with a single goal. There exists, however, a different, extrinsic kind of redundancy that is target redundancy. Many times, indeed, the final point to reach is neither specified nor unique. In this study, we aim to understand how the central nervous system tackles such an extrinsic redundancy by considering a reaching-to-a-manifold paradigm, more specifically an arm pointing to a long vertical bar. In this case, the endpoint is not defined a priori and, therefore, subjects are free to choose any point on the bar to successfully achieve the task. We investigated the strategies used by subjects to handle this presented choice. Our results indicate both intersubject and intertrial consistency with respect to the freedom provided by the task. However, the subjects' behavior is found to be more variable than during classical point-to-point reaches. Interestingly, the average arm trajectories to the bar and the structure of intertrial endpoint variations could be explained via stochastic optimal control with an energy/smoothness expected cost and signal-dependent motor noise. We conclude that target redundancy is first overcome during movement planning and then exploited during movement execution, in agreement with stochastic optimal feedback control principles, which illustrates how the complementary problems of goal and movement selection may be resolved at once.

- arm movement
- bar reaching
- stochastic
- optimal control
- decision making

Numerous studies have addressed the question of how the central nervous system (CNS) controls the many degrees of freedom of the musculoskeletal system by focusing on point-to-point arm movements. They showed that human movements exhibit kinematic/dynamic motor invariant features (Atkeson and Hollerbach 1985; Morasso 1981; Soechting and Lacquaniti 1981), suggesting that the CNS overcomes redundancy by using some particular rules. In all these studies and many others (e.g., Flash and Hogan 1985; Soechting et al. 1995; Biess et al. 2007), the requested two-dimensional (2-D) or three-dimensional (3-D) movements were oriented toward target points specified by the experimenter. However, daily life actions often involve movements toward objects that do not explicitly provide the point to reach. Consider for example the simple task of lifting a bowl: there is no unique way of positioning the fingers on the object surface to fulfill the task. More precisely, there is an infinite number of target positions for placing the fingers that equivalently serve the lifting task. Grasping a pen would be a different example because prevalent points (e.g., extremities, center of mass) may attract the reaching or be more relevant for the upcoming action. These examples nevertheless illustrate that, besides the intrinsic (i.e., body level) redundancy, there is a different and ecological type of degree of freedom that the brain has to deal with: the extrinsic (i.e., target-level) redundancy. So far, computational studies of goal-oriented movements have mainly focused on the problem of choosing between alternative means to achieve a prescribed goal (the “how,” e.g., Guigon et al. 2007) and little has been done on the complementary problem of “where” to reach (Haggard 2008).

In this study, we thus consider a manifold reaching paradigm to investigate the mechanisms underlying the simultaneous control of both intrinsic and extrinsic degrees of freedom. The choice of a manifold, i.e., a mathematical entity that “locally resembles the Euclidean space,” is motivated by the fact that most of the objects we encounter in life can be well modeled as such. The surface of a coffee mug with a handle, for example, is topologically equivalent to a 2-D manifold. We define the “manifold reaching paradigm” as the study of reaching movements in which the target is a manifold in Cartesian space. The main purpose of this article is to investigate the subjects' behavior when facing the absence of an unequivocal endpoint to reach.

When the endpoint is known, the brain is required to transform the visual information about target position into a suitable movement plan at hand (Desmurget et al. 1998). In a “free choice” paradigm, additional decision-making processes come into play to select a particular endpoint (Andersen and Cui 2009). For example, when multiple targets are presented to a monkey, the primate brain concurrently computes possible motor plans until a reach decision is made (Cisek and Kalaska 2005). In the case of a target manifold, that is a continuum of possible endpoints, comparing candidate motor plans would be computationally much more intensive and, therefore, how subjects may handle such a situation is a priori unknown.

In this article, we focus on pointing to a long vertical rod, a simple instance of manifold for which the end effector has to be controlled along the anteroposterior/transversal axes while the vertical dimension is completely irrelevant for the task achievement. We ask whether the same intertrial/subject consistency as during point-to-point reaches is observed when such a target manifold with task-equivalent endpoints is presented. If so, it would indicate that the reduction of spatial constraints is automatically compensated for by internally driven processes that would remain to be understood. For example, it might be that subjects select a virtual endpoint, possibly depending on the starting posture, and always reach to it as if it was prescribed by the experimenter. However, the task may be conducive to larger variability compared with classical point-to-point studies. Indeed, participants might interpret this free choice as a requirement for randomness over trials or as a possibility to reduce the precision of their movements along the vertical, leading them to vary the endpoints significantly. The endpoint location could thus change depending on trials and subjects but to an extent that is not yet characterized.

The experimental results show that, despite the lack of a precise point to be reached, most participants reduced the set of all possible behaviors to well-characterized hand trajectories. For all subjects, the endpoint typically depended on the starting posture and was subject to larger variability across trials than if a target point was prescribed. The endpoint distributions were principally oriented along the vertical axis (i.e., the unspecified dimension of the task). These observations were found to be consistent with an optimal manifold reaching strategy in which goal and movement selection are simultaneously resolved through the optimization of an expected cost involving a mix between mechanical energy expenditure and joint smoothness. Our results emphasize the fundamental role of stochastic optimal feedback control in replicating the experimental endpoint distributions. More precisely, by modeling signal-dependent motor noise and by exploiting the minimal intervention principle during movement execution (Todorov and Jordan 2002), it was possible to reproduce the structure and the magnitude of intertrial variations.

## MATERIALS AND METHODS

### Experimental Task

#### Participants.

Twenty naive subjects [16 males; means ± SD: age 26.9 ± 2.5 yr, range (18–31); mass 69.9 ± 8.4 kg; height 1.76 ± 0.06 m] volunteered to participate in the experiment. All of them were healthy, right-handed, and with normal or corrected-to-normal vision. The experimental protocol used was in accordance with the principles expressed in the Declaration of Helsinki and approved by a local ethics committee (Azienda Sanitaria Locale, Genoa).

#### Pointing-to-a-bar paradigm.

The motor task used in the experiments is illustrated in Fig. 1. From a sitting position, participants were asked to perform a series of pointing movements toward a vertical target bar. The bar was a rigid and thin gray tube (1 cm in diameter). For the task, shoulder and elbow rotations were allowed, while the wrist joint was frozen by means of two light and small sticks attached to the distal part of the forearm and the proximal part of the hand. The vertical bar was placed in front of the participants in the para-sagittal plane intersecting the shoulder joint. No target point was emphasized on the bar, and its height was 2.50 m so that subjects could not see its extremities without moving the head or the trunk. The horizontal distance of the shoulder from the bar was set to 85% of the subject's full arm length (*L* = *l*_{1} + *l*_{2}, with *l*_{1} and *l*_{2} the lengths of the upper arm and forearm, respectively; see Fig. 1). Five initial arm postures, denoted by P1 to P5, were defined by means of reference points located in a vertical plane, placed laterally at ∼10 cm from the subject's movement plane. To this aim, we used a wooden hollow frame containing 1.5 cm-spaced thin vertical fishing wires to which lead weights (small spherical fishing weights) indicating the requested fingertip initial position were attached. Finally, differently colored pieces of adhesive tape were stuck on the leads to easily identify the references. This color code was then used to specify the initial posture that the subject had to select at the beginning of each movement. By imposing the initial finger position, a unique starting posture of the arm was consequently defined in the para-sagittal plane. The positions of the leads were adjusted before the experiment, based on the subject's upper arm and forearm lengths and the vertical shoulder-to-ground distance. The initial postures P1 to P5 are reported in Table 1.

The experimenter then gave the following instruction to the participants: “look at the bar in front of you, close the eyes and quickly show the location of the bar by touching it with the fingertip, performing a one-shot movement.” No other instruction was given to the subjects with respect to where and how to reach the bar. Because of the features of the task itself, participants had to implicitly control the finger position along the anteroposterior and lateral directions whereas complete freedom was left along the vertical one. Note that the challenge (explicit reward function of the task) for the subjects was to be precise enough to actually touch the bar, since no on-line vision was allowed. Since subjects moved in 3-D, touching the bar was not so easy because of lateral and anteroposterior errors (and, indeed, this did not happen often). Nevertheless, reaching any point on the vertical bar allowed the subject to perform the task successfully. The experimenter could verbally congratulate/motivate the subjects in case of accurate reaching, but the subjects were not paid to perform the experiment nor were they given any performance-based monetary reward. The experimenter also waited until subjects closed the eyes before giving the “go” signal. Thus the delay between the closure of the eyes and the movement start was minimized. During the protocol, the five initial postures were tested in a random order to prevent subjects from remembering precisely the location of the endpoint during the preceding trial. For each initial posture, 20 trials were recorded, so that a total of 100 movements per subject were monitored. A few trials were repeated during the experiment (<5%), when the subjects missed the bar with a very large error or did not perform a one-shot movement. Every 25 movements, subjects were allowed to rest. The full experiment lasted ∼1 h for a single subject. Data from a total of 2000 pointing movements were collected and analyzed as explained below.

### Data Collection and Processing

#### Materials.

Arm and head motion were recorded by means of a motion capture system (Vicon, Oxford, UK). Ten cameras were used to capture the movement of six retroreflective markers (15 mm in diameter), placed at well-defined anatomical locations on the right arm and head (acromial process, humeral lateral condyle, ulnar styloid process, apex of the index finger, external canthus of the eye, and auditory meatus).

#### Motion analysis.

All the analyses were performed with custom software written in Matlab (Mathworks, Natick, MA) from the recorded 3-D position of the six markers (sampling frequency of 100 Hz). Recorded signals were low-pass filtered using a digital fifth-order Butterworth filter at a cutoff frequency of 10 Hz (Matlab *filtfilt* function).

The finger movement onset was defined as the instant at which the linear tangential velocity of the fingertip exceeded 5% of its peak and the end of movement as the instant at which the same velocity dropped below the 5% threshold. All time series were normalized to 200 points by interpolation (Matlab *spline* function). Standard kinematic parameters described in previous experimental arm pointing studies were calculated (Atkeson and Hollerbach 1985; Papaxanthis et al. 2005): movement duration (MD), peak velocity (PV), average velocity (AV), relative time to peak velocity (TPV) defined as the ratio between the acceleration duration and MD, index of finger path curvature (IPC = Dev/LD) defined as the ratio of the maximum path deviation (Dev) from the straight line connecting the initial and final finger positions to the length of that line [linear distance (LD)], index of velocity shape (PV/AV) defined as the ratio between the peak of velocity and its mean value, and curvilinear distance of the finger defined by the integral over time from 0 to MD of the norm of the fingertip velocity vector. The constant error was computed as the orthogonal distance between the terminal finger position and the bar. The variable error was computed as the SD of distances between final finger positions, across trials.
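As an illustration of the definitions above, the following Python sketch computes the main kinematic parameters from a sampled fingertip path. It is not the custom Matlab software used in the study: the function name `kinematic_parameters`, the use of first-order finite differences for velocity, and the absence of filtering and time normalization are assumptions made for brevity.

```python
import math

def kinematic_parameters(positions, dt, threshold=0.05):
    """Compute MD, PV, AV, TPV, IPC and PV/AV from fingertip samples.

    positions: list of (x, y, z) tuples in meters, sampled every dt seconds.
    Hypothetical helper: velocity is estimated by finite differences.
    """
    # Tangential speed over each interval between successive samples.
    speeds = [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]
    peak = max(speeds)
    i_peak = speeds.index(peak)

    # Onset: first interval whose speed exceeds 5% of the peak.
    onset = next(i for i, v in enumerate(speeds) if v > threshold * peak)
    # End: first interval after the peak whose speed drops below 5% of the peak.
    end = next((i for i in range(i_peak, len(speeds))
                if speeds[i] < threshold * peak), len(speeds))

    md = (end - onset) * dt                      # movement duration
    path_length = sum(speeds[onset:end]) * dt    # curvilinear distance
    av = path_length / md                        # average velocity
    tpv = (i_peak - onset) * dt / md             # relative time to peak velocity

    # IPC: maximum deviation from the start-end straight line, divided by its length.
    a, b = positions[onset], positions[end]
    chord = [bj - aj for aj, bj in zip(a, b)]
    ld = math.dist(a, b)

    def deviation(p):
        r = [pj - aj for aj, pj in zip(a, p)]
        cross = (r[1] * chord[2] - r[2] * chord[1],
                 r[2] * chord[0] - r[0] * chord[2],
                 r[0] * chord[1] - r[1] * chord[0])
        return math.hypot(*cross) / ld

    dev = max(deviation(p) for p in positions[onset:end + 1])
    return {"MD": md, "PV": peak, "AV": av, "TPV": tpv,
            "IPC": dev / ld, "PV/AV": peak / av}
```

For a straight path with a symmetric bell-shaped velocity profile, this sketch returns an IPC close to zero and a TPV close to 0.5.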

For subsequent analyses and comparisons with models, we projected the 3-D coordinates of the markers onto a vertical plane. It will be shown below that the movements carried out by the participants lay almost entirely in a para-sagittal plane. The motion capture system was calibrated such that the axes *X* and *Y* corresponded to the anteroposterior and vertical axes, respectively. Thus movements were approximated as movements performed mainly in the *XY* plane.

Angular displacements of the arm segments (upper arm and forearm) were then evaluated using the inverse kinematic function, relating the (*x, y*) position of the finger in plane *XY* to the arm configuration θ = (θ_{1}, θ_{2})^{t} (subscript 1 standing for the shoulder joint). This inverse kinematic function can be found in standard textbooks (e.g., Murray et al. 1994). Note that the shoulder joint was defined as the origin of the frame of reference, i.e., Sh: (0, 0) (see Fig. 2).
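The inverse kinematic function of a planar two-joint arm can be sketched as follows in Python. The function names are hypothetical. The angle convention (θ_{1} measured from the anteroposterior axis, θ_{2} the elbow angle relative to the upper arm) is chosen to match the joint-space bar equation used later in the modeling section, and the choice of the flexed-elbow solution (θ_{2} ≥ 0) is an assumption.

```python
import math

def forward_kinematics(theta1, theta2, l1, l2):
    """Fingertip (x, y) in the XY plane, with the shoulder at the origin."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1, l2):
    """Joint angles (theta1, theta2) in radians for a reachable point (x, y).

    Assumption: of the two mirror solutions, the flexed-elbow one
    (theta2 >= 0) is returned.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # guard against rounding just outside [-1, 1]
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```

Applying `forward_kinematics` to the output of `inverse_kinematics` recovers the original fingertip position, which is a convenient consistency check on recorded data.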

Finally, additional task-relevant parameters were computed. The endpoint consistency index (CI) was defined as the ratio between the SD of the fingertip position on the *Y*-axis and the length of the reachable region, which was computed from the intersection points between the bar and a shoulder-centered circle of radius *L*. The CI parameter provides information concerning the percentage of the bar used by the subjects. The smaller this index, the more consistent the subject's behavior for the selection of a terminal point on the bar. The location of the reached point was calculated with respect to the shoulder position and normalized by the subject's arm length *L* (referred to as RP). In other words, the location of the endpoint on the bar is RP × *L* (in meters). Confidence ellipses at a 95% threshold were also computed to analyze the structure of variability in the endpoints. Three parameters were then computed to characterize an ellipse (see van Beers et al. 2004): *1*) the aspect ratio defined as the square root of the ratio of the two eigenvalues of the covariance matrix (the larger divided by the smaller), *2*) the orientation corresponding to the orientation of the main eigenvector, and *3*) the total variance corresponding to the trace of the covariance matrix. To detect whether subjects chose to move upward or downward, we computed the movement vector angle (MV), defined as the counterclockwise oriented angle between a horizontal line and the line connecting the initial and terminal fingertip positions. Moreover, the index of path curvature defined above was slightly modified. To assess whether the finger path had a convex or concave trend, we computed the average of the maximum deviations with respect to the virtual straight line, attributing a positive sign when the finger position was above the straight line (for concavity). Thus this parameter, denoted by signed index of path curvature (sIPC), evaluates the average convexity or concavity of a path. Also, joint coupling was assessed by computing the correlation coefficients between the joint angles time series.
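The ellipse parameters and the consistency index can be illustrated with the short Python sketch below. The function names are hypothetical, and the use of the unbiased (n − 1) estimator for the covariance and the SD is an assumption, since the estimator is not specified in the text. The reachable length follows from intersecting the bar at *x* = 0.85*L* with a circle of radius *L*, which gives 2√(*L*² − (0.85*L*)²) ≈ 1.05*L*.

```python
import math

def ellipse_parameters(endpoints):
    """Aspect ratio, orientation (rad) and total variance of 2-D endpoints.

    endpoints: list of (x, y) tuples. Assumes a non-degenerate covariance
    (smallest eigenvalue > 0) and the unbiased (n - 1) estimator.
    """
    n = len(endpoints)
    mx = sum(p[0] for p in endpoints) / n
    my = sum(p[1] for p in endpoints) / n
    sxx = sum((p[0] - mx) ** 2 for p in endpoints) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in endpoints) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in endpoints) / (n - 1)

    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    lam_max, lam_min = half_trace + radius, half_trace - radius
    return {
        "aspect_ratio": math.sqrt(lam_max / lam_min),
        "orientation": 0.5 * math.atan2(2.0 * sxy, sxx - syy),  # main eigenvector
        "total_variance": sxx + syy,                             # trace
    }

def reachable_length(L, bar_fraction=0.85):
    """Length of the bar segment inside the shoulder-centered circle of radius L."""
    return 2.0 * math.sqrt(L * L - (bar_fraction * L) ** 2)

def consistency_index(y_values, L):
    """CI: SD of the vertical endpoint coordinate over the reachable length."""
    n = len(y_values)
    mean = sum(y_values) / n
    sd = math.sqrt(sum((y - mean) ** 2 for y in y_values) / (n - 1))
    return sd / reachable_length(L)
```

An orientation close to ±π/2 indicates an ellipse elongated along the vertical axis, i.e., along the unspecified dimension of the task.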

### Statistical Analysis

We used quantile-quantile plots to visually check that the data were normally distributed (*qqplot* Matlab function). Shapiro-Wilk's test was used to quantify these observations for some relevant parameters. One-way ANOVAs were also performed to analyze the effects of the initial posture on certain parameters. Post hoc tests were conducted with Scheffé's test when necessary and appropriate (the chosen threshold was 0.05).

### Control Experiments

In addition to the main protocol, three control experiments were conducted to answer the following questions: *1*) To what extent can the endpoint variability be attributed to the lack of visual feedback during movement execution? *2*) What would the endpoint distributions be if a target point rather than a bar were reached to, without on-line vision? *3*) Can we observe an effect of learning, in particular corresponding to a reduction of intertrial variability with practice?

In the first control experiment, we asked three subjects to perform the original protocol and to repeat it again but with eyes open (i.e., with on-line vision). The second control experiment consisted of testing the effect of replacing the target bar by a target point. The average target points were known for each starting posture after the subjects had performed the original experiment with a target bar. Then, three subjects were asked to produce 20 movements directed toward the corresponding target point, with eyes closed (i.e., no on-line vision). These control experiments permitted us to test the hypothesis that the subjects may choose a priori the target point based on the initial arm posture. In the third control experiment, two subjects repeated the main protocol 5 times in a row so that 100 movements per initial posture were recorded (total of 500 movements for the whole session). To reduce the effects of fatigue, resting periods were allowed between sets of 50 trials. This protocol allowed us to clarify the role of trials repetition on the intertrial consistency of subjects.

### Modeling

We use deterministic and stochastic optimal control theories to model the behavior of the subjects. By doing so, we assume that the CNS optimizes some cost functions to plan and execute movements. Optimal control theory has proven successful in predicting many features of point-to-point reaching movements (see Engelbrecht 2001; Todorov 2004 for reviews). Therefore, we hypothesize here that the strategy employed by the subjects comes from a manifold reaching strategy guided by some optimal processes resolving the task indeterminacy.

#### Model of the musculoskeletal system.

The recorded 3-D arm movements lay approximately in the para-sagittal plane (see the results). Thus a reasonable approximation was to model the arm as a two-joint rigid body system moving in the gravitational field. A classical application of Lagrangian mechanics allows us to express the arm dynamics using the general form (Murray et al. 1994):

$$
\tau = M(\theta)\,\ddot{\theta} + C(\theta, \dot{\theta})\,\dot{\theta} + \mathbf{G}(\theta) + F\,\dot{\theta} \tag{1}
$$

where θ = (θ_{1}, θ_{2})^{t}, τ = (τ_{1}, τ_{2})^{t} denote the joint angle and torque vectors, respectively. A dot above a variable stands for its time derivative. The quantities *M*, *C*, **G**, and *F* are the inertia matrix, the Coriolis/centripetal terms, the gravitational vector, and the viscosity matrix, respectively. Numerical values are reported in the appendix.

Furthermore, we modeled the fact that the joint torques τ are smoothly generated by muscle contractions:

$$
\ddot{\tau} = \mathbf{u} \tag{2}
$$

where **u** is the motor command and can be thought of as the neural input given by the motor neurons to the muscles, even though this is clearly an overinterpretation in that case. Note that for simplicity, we did not model agonist muscles, antagonist muscles or the complex dynamics of muscle contraction. Nevertheless, our tests indicated that the results presented in this study do not critically depend on this choice. For instance, we checked that the classical modeling of agonist/antagonist muscles as second-order low-pass filters (Van der Helm and Rozendaal 2000) did not change the results of this study.

To simplify the derivations and notations in the next subsections we define **q**^{⊤} = (θ^{⊤}, θ̇^{⊤}, τ^{⊤}, τ̇^{⊤})∈ℝ^{8} and **u** = μ∈ℝ^{2}. The pair of *Eqs. 1* and *2* can be rewritten in state-space as:

$$
\dot{\mathbf{q}} = \mathbf{f}(\mathbf{q}, \mathbf{u}) \tag{3}
$$

#### Optimal control models.

As mentioned above, we made use of optimal control to resolve the redundancy of the task. The task is redundant because the target is a vertical bar, given by the equation *x* − 0.85*L* = 0 in Cartesian coordinates. This can be rewritten as *l*_{1} cos θ_{1} +*l*_{2} cos(θ_{1} +θ_{2}) − 0.85*L* = 0 in joint space. The final position was characterized by zero velocity and zero acceleration. Therefore, the target manifold can be defined in state-space by a certain vector-valued mapping **m**(**q**) = **0** (see the appendix for details). The fact that this mapping is surjective is exactly the reason why the task is kinematically redundant, even though we modeled the arm as a simple two-joint manipulator moving in a vertical plane. Note that in general reaching to a Cartesian manifold can be an interesting way to overcome the curse of dimensionality issue while preserving redundancy of a motor task.

The goal of optimal control is to find a motor command **u** and the corresponding state trajectory **q** satisfying *Eq. 3*, connecting a (given) starting arm posture to an (a priori unknown) endpoint on the target bar in time *T* and yielding a minimal value of a cost function *C*.

To explain the average behavior of subjects, it is usual to consider deterministic optimal control models (i.e., models neglecting sensorimotor noise). However, the manifold reaching paradigm may involve more variability than classical point-to-point reaches so that modeling motor noise may prove crucial to account for motor variability. Therefore, we will consider both deterministic-based and stochastic-based optimal control models to separately test the effects of noise about localization/planning and motor execution.

In each case, we will analyze the predictions of two cost functions already proposed in the literature. On the one hand, it is often argued that the energy of motor neurons (in the sense of signal theory) is a natural criterion for motor planning (Todorov and Jordan 2002; Guigon et al. 2007). This cost is often referred to as the “effort” cost (here denoted by *C*_{Eff}). This type of cost is convenient in optimal control because it is quadratic in the control variable, which usually simplifies computations in both deterministic and stochastic contexts (e.g., Anderson and Moore 1971 and Athans 1971). On the other hand, it has also been shown that mechanical energy expenditure and motion smoothness are likely to play a fundamental role in motor planning (Flash and Hogan 1985; Ben-Itzhak and Karniel 2008; Soechting et al. 1995; Berret et al. 2008). We will thus consider a compromise between the absolute work of torques and the integrated squared acceleration, a cost that was found to be relevant for vertical pointing movements (see Berret et al. 2008, this cost being denoted by *C*_{En/Sm}, for energy and smoothness). Here we use the term smoothness in the broad sense of “having small high-order derivatives” (Todorov and Jordan 1998). The integrated squared acceleration is just a member of the more generic class of mean squared derivative costs, which favor motion smoothness to different degrees (see Richardson and Flash 2002). For instance, the minimum jerk and snap costs are also smoothness-based cost functions. Table 2 gives the precise expression of the costs under consideration in this study. Throughout this article, we will often refer to those two costs as effort and energy/smoothness, respectively.

#### Models based on deterministic optimal control.

For deterministic-based optimal control models, the goal was to find the optimal arm trajectories to reach the bar with respect to the costs *C*_{Eff} and *C*_{En/Sm}, while taking into account noise about the localization and planning processes of a reaching movement. The optimal control problem (OCP) was subject to additional but classical constraints: the movement duration was specified a priori (denoted by *T*), and linear constraints were imposed on the control **u** and the state **q** to keep the system within biological bounds (see the appendix for numerical values). Anthropometric parameters were also set to realistic values for each participant (see Table 5).

To take into account uncertainty about the initial parameters of the task, we introduced noise about three important parameters of the reaching. First, localization of both the initial arm posture [θ_{1}(*t* = 0), θ_{2}(*t* = 0)] and the target bar location (horizontal shoulder-bar distance) are sources of error. Both processes are actually noisy, mainly based on proprioception and vision, and could therefore affect the bar reaching strategy. Second, movement duration *T* is also variable, which may result from noise in the motor plan itself. In this way, we could therefore test the effects of uncertainty about localization/planning while relying on deterministic optimal control algorithms. This was possible without the need for feedback control because there was no motor noise during execution. The trial-to-trial variability was simply the result of different sets of initial parameters for deterministic OCPs. For each noise instance, a specific OCP had to be solved. The localization/planning noise statistics was evaluated from the experimental measurements (fit to 1-D Gaussians). Note that these models have no free parameters.

There exist efficient numerical techniques to find approximate solutions of deterministic OCPs. The method that we used transforms the continuous OCP into a static nonlinear programming (NLP) problem with constraints. In such a method, time is discretized following a specific scheme. Here we used an orthogonal collocation method, precisely the Gauss pseudospectral method, which is efficiently implemented in the open-source Matlab software *GPOPS* (Benson et al. 2006; Garg et al. 2010; Rao et al. 2010). The NLP problem was solved by means of the well-established numerical software SNOPT (Gill et al. 2005). Briefly, this pseudospectral method relies on time discretization at some points chosen to be the Legendre-Gauss ones, i.e., the roots of a certain order Legendre polynomial. Then, the state and control are approximated using interpolating Lagrange polynomials. Finally, note that we replaced the absolute value function *z* → |*z*|, which is nondifferentiable at the origin, by the smooth function *z* → tanh(10*z*)*z*.
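The smooth replacement of the absolute value can be checked numerically with a few lines of Python. This is only an illustration of the approximation *z* → tanh(10*z*)*z* quoted above; the function name `smooth_abs` and the sample values are hypothetical.

```python
import math

def smooth_abs(z, k=10.0):
    """Differentiable surrogate of |z|: tanh(k*z) * z, with k = 10 as in the text."""
    return math.tanh(k * z) * z

# The surrogate is even, vanishes at the origin, and approaches |z| away from it.
for z in (0.0, 0.01, 0.1, 0.5, 1.0, -1.0):
    print(f"z = {z:+.2f}  |z| = {abs(z):.4f}  smooth = {smooth_abs(z):.4f}")
```

The surrogate always lies slightly below |*z*|, with the largest gap occurring for small nonzero |*z*|, where tanh(10*z*) is still far from ±1.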

#### Models based on stochastic optimal control.

Another source of variability occurs during motor execution itself (Faisal et al. 2008; van Beers et al. 2004). In such cases, the statistics of the motor noise can be taken into account during the planning process and the task can be achieved with certain accuracy by means of an (optimal) feedback controller. In this context, stochastic optimal control is a classical and efficient tool to model motor behavior (Todorov and Jordan 2002).

Thus, in the stochastic context, the minimum effort model aims to minimize the following expected cost:

$$
\mathbb{E}\left[\, g(\mathbf{q}(T)) + C_{\mathrm{Eff}} \,\right]
$$

where

$$
g(\mathbf{q}(T)) = w_p \left[\, l_1 \cos\theta_1 + l_2 \cos(\theta_1 + \theta_2) - 0.85\,L \,\right]^2 + 0.1\, w_p \left[\, \dot{\theta}_1^2 + \dot{\theta}_2^2 \,\right]
$$

is the final error cost, encoding target redundancy, and *C*_{Eff} is the integral cost used in the deterministic setting. The parameter *w*_{p} is a tuning parameter to adjust the relative weights of the two cost components (i.e., movement and accuracy costs).

The deterministic dynamics was also modified accordingly into the following nonlinear stochastic differential equation:

$$
d\mathbf{q} = \mathbf{f}(\mathbf{q}, \mathbf{u})\, dt + F(\mathbf{q}, \mathbf{u})\, d\mathbf{w}
$$

where **f** denotes the deterministic state-space dynamics of *Eq. 3*, **w** is a standard Brownian motion and *F*(**q**, **u**) = *w*_{n}[*B* diag(1, 0)**u**, *B* diag(0, 1)**u**] is a multiplicative noise (also called signal-dependent noise in some studies, scaled by *w*_{n}). This formula means that we consider that the motor noise acts on each control component separately. To solve this nonlinear stochastic optimal control problem (with non-quadratic cost), we used the iterative linear-quadratic-Gaussian (ILQG) algorithm developed by Todorov and Li (2005). At each iteration, an approximation of the problem is obtained by linearizing and quadratizing the dynamics and the cost function, respectively. In all our simulations, Jacobians and Hessians were calculated analytically using Maple (Maplesoft, Waterloo) to avoid finite-differences approximations and speed up computations. The ILQG method eventually yields a locally optimal feedback control law. More precisely, after convergence, a feedforward control sequence **ū** and a corresponding optimal state trajectory **q̄** are returned. An optimal feedback gain matrix is also obtained to correct local deviations from the nominal motor plan due to the presence of motor noise during execution. Note that the nominal trajectory **q̄** provided by the ILQG algorithm is an optimal manifold reaching solution. Indeed, the endpoint was not given to ILQG but was found automatically by the algorithm such as to minimize the expected cost. The trajectory **q̄** should be confused neither with the optimal trajectory of deterministic optimal control (which can be different in theory) nor with a user-specified trajectory to be tracked during execution.
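To make the structure of the signal-dependent noise concrete, the Python sketch below draws one Euler-Maruyama increment of the noisy motor command, in which each control component is perturbed by its own Brownian increment scaled by *w*_{n} times that component. It illustrates only the noise term and is not the ILQG algorithm. The function name, the time step, and the omission of the matrix *B* (which maps this increment into the state equation) are assumptions.

```python
import math
import random

def noisy_control_increment(u, dt, w_n=0.16, rng=random):
    """One Euler-Maruyama increment: u_i*dt + w_n*u_i*dW_i for each component.

    u: tuple of control components; dt: time step in seconds.
    Hypothetical helper; the mapping through B into the state is omitted.
    """
    increments = []
    for u_i in u:
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        increments.append(u_i * dt + w_n * u_i * dW)
    return tuple(increments)

# Example: the noise SD grows linearly with the control magnitude,
# and a zero control component receives no noise at all.
rng = random.Random(0)
print(noisy_control_increment((2.0, 0.0), 0.01, rng=rng))
```

Because the noise vanishes when the command vanishes, large commands are penalized twice under an expected cost: once through the movement cost and once through the endpoint variability they induce.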

The ILQG algorithm was either initialized with the deterministic solution of the problem found above plus some randomness or directly with random constant vectors to test the algorithm sensitivity with respect to the initial guess. The best run among 20 repetitions was kept. For simplicity, we considered that there was no uncertainty about the system state, that is, we assumed that **q** was fully observable. We also assumed that the initial posture, the movement duration and the target localization were known without uncertainty, to isolate the effects of the different sources of uncertainty.

For this model, two parameters were actually tunable, the nature of the solutions depending on *w*_{p} and *w*_{n}. The precise procedure used to adjust these parameters is explained in *Comparisons between data and models*. During the simulations for the effort model, the numerical settings were *w*_{p} = 10^{6} and *w*_{n} = 0.16 [similar to values used in Todorov and Li (2005) in the context of point-to-point movements]. The latter expression means that the magnitude of the control-dependent noise was 16% of the magnitude of control signal **u**.

For the energy/smoothness problem, using straightforwardly the ILQG algorithm with the cost *C*_{En/Sm} is impossible since there is no control-dependent term in it. We circumvented this problem by including a control-dependent term in the cost but with a very small weight. We thus consider a regularization of the original optimal control problem with the aim to obtain approximate solutions. Theoretical considerations on the regularization method for control-affine systems and state-dependent cost functions can be found in Guerra and Sarychev (2009). Doing so, we could still use the ILQG algorithm while giving priority to the optimization of smoothness and mechanical energy costs. More precisely, we considered the following “regularized” expected cost:
*g* was parametrized by a tunable weight *w*_{p}. The parameter ε was fixed and set to the small number ε = 10^{−6} to ensure that the contribution of the control energy to the total movement cost was negligible but that the numerical computations were still stable. This was checked a posteriori: the control energy cost represented <1% of the total movement cost of the optimal solutions. Again, the absolute value function was approximated by the smooth function *z* → *z* tanh(10*z*) to permit the quadratization of the cost function. The second tunable parameter was the noise magnitude *w*_{n}. For this model, we used the following set of parameters: *w*_{p} = 10^{5} and *w*_{n} = 0.18. Note that both stochastic optimal control models have the same number of free parameters (two), adjusted as explained below.
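The smooth surrogate of the absolute value mentioned above can be checked numerically. The sketch below only evaluates the approximation error of *z* tanh(10*z*) on a grid; the function name and the evaluation range are illustrative choices, not part of the original procedure.

```python
import numpy as np

def smooth_abs(z, k=10.0):
    """Differentiable surrogate for |z|: z * tanh(k * z)."""
    z = np.asarray(z, dtype=float)
    return z * np.tanh(k * z)

z = np.linspace(-2.0, 2.0, 4001)

# Largest gap between |z| and its smooth surrogate on the grid.
max_gap = np.max(np.abs(np.abs(z) - smooth_abs(z)))
```

The gap is largest close to zero and vanishes quickly away from it, so the surrogate remains twice differentiable at the origin (as needed for a quadratic expansion of the cost) while staying close to the absolute value elsewhere.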

#### Comparisons between data and models.

The models based on deterministic optimal control had no free parameters. For each cost (energy/smoothness and effort), starting posture (from P1 to P5) and subject (from S1 to S20), we generated 20 movements by varying the initial parameters of the task according to the experimental measurements. Each single movement simulation implied solving a specific instance of a deterministic optimal control problem. The simulated movements differed from one another because the optimal control problems themselves were different. We thus obtained particular endpoint distributions even though we relied on deterministic models. We defined the model prediction by fitting 2-D Gaussians to the simulated endpoint distributions. The model prediction was then used to compute log-likelihood scores of the observed endpoints (see below).
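The scoring step can be sketched as follows, assuming that the model prediction is summarized by the sample mean and covariance of the simulated endpoints. The function names and the synthetic endpoints are hypothetical and only illustrate the computation.

```python
import numpy as np

def fit_gaussian(points):
    """Fit a 2-D Gaussian (sample mean and covariance) to simulated endpoints."""
    pts = np.asarray(points, dtype=float)
    return pts.mean(axis=0), np.cov(pts, rowvar=False)

def negative_log_likelihood(observed, mean, cov):
    """Summed negative log-likelihood of observed endpoints under N(mean, cov)."""
    obs = np.atleast_2d(np.asarray(observed, dtype=float))
    diff = obs - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    log_det = np.linalg.slogdet(cov)[1]
    k = obs.shape[1]
    return 0.5 * np.sum(maha + log_det + k * np.log(2.0 * np.pi))

# Synthetic endpoints (cm), standing in for 20 simulated and 20 recorded reaches.
rng = np.random.default_rng(0)
simulated = rng.standard_normal((20, 2)) * [1.0, 3.0] + [40.0, 20.0]
observed = rng.standard_normal((20, 2)) * [1.0, 3.0] + [40.0, 20.0]

mean, cov = fit_gaussian(simulated)
nll = negative_log_likelihood(observed, mean, cov)
```

A lower value indicates that the observed endpoints are more probable under the model prediction, which accounts for errors in both the mean endpoint and the shape of the distribution.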

The stochastic optimal control models had two adjustable parameters (*w*_{p} and *w*_{n}). To estimate these parameters we fitted each model to a subset of the data by minimizing the negative log-likelihood of the observed endpoints, similarly to van Beers et al. (2004). Here we used the data of the typical subject, including all starting postures, to maximize the likelihood. During the process, we required the total variance of confidence ellipses to remain within physiological bounds. For this reason, sets of parameters yielding a total variance >40 cm^{2} (i.e., the experimental mean plus 2 SDs) were rejected because the variability was then unrealistic and could reveal irrelevant solutions found by the ILQG algorithm, even though they could yield a lower negative log-likelihood. Because the procedure of likelihood maximization was computationally intensive, we used a discrete grid approach to find the best estimate for *w*_{p} and *w*_{n}. We assumed that the signal-dependent noise magnitude (*w*_{n}) could range between 0.06 and 0.20 (step size of 0.02) for both models. For the positional weight (*w*_{p}) we used the grids [10^{3}, 5 × 10^{3}, 10^{4}, 5 × 10^{4}, 10^{5}, 5 × 10^{5}, 10^{6}] and [10^{5}, 5 × 10^{5}, 10^{6}, 5 × 10^{6}, 10^{7}, 5 × 10^{7}, 10^{8}] for the energy/smoothness and effort costs, respectively. Different grids were used because the movement costs have different units and thus different orders of magnitude.

In practice, we thus generated 200 reaching movements for each possible pair of parameters and each starting posture for the typical subject. We then fitted each set of 200 movements to a 2-D Gaussian distribution to define the prediction for the current candidate model. The negative log-likelihood of each observed endpoint given the corresponding model prediction was then computed, and we added these scores together (i.e., across trials). These scores were also summed across starting postures because we wanted to find a single pair of parameters valid for all conditions simultaneously. This cumulative score quantified how well the experimental endpoints matched the model prediction. The pair of parameters yielding the highest log-likelihood (i.e., the lowest negative log-likelihood) was then selected and used for all subjects and conditions. Doing so, we performed one round of cross-validation. Indeed, the two free parameters were adjusted using only a subset of the data (here the typical subject) and validation was done on the remaining 17 subjects, by computing the negative log-likelihood score for every subject (the 2 atypical subjects were excluded from this analysis).
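The grid procedure can be outlined as below for a single condition. This is only a sketch under strong assumptions: `simulate_endpoints` is a hypothetical toy stand-in for the ILQG simulations (its dependence on the parameters is invented), the total variance is taken as the trace of the endpoint covariance, and the summation across starting postures is omitted for brevity.

```python
import numpy as np

def gaussian_nll(obs, mean, cov):
    """Summed negative log-likelihood of 2-D points under N(mean, cov)."""
    diff = obs - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return 0.5 * np.sum(maha + np.linalg.slogdet(cov)[1] + 2.0 * np.log(2.0 * np.pi))

def simulate_endpoints(w_p, w_n, n, rng):
    """Hypothetical stand-in for n stochastic model simulations (endpoints in cm)."""
    sd = np.array([1e3 / np.sqrt(w_p), 25.0 * w_n])  # toy dependence, not ILQG
    return rng.standard_normal((n, 2)) * sd

def grid_search(observed, wp_grid, wn_grid, max_total_var=40.0, n_sim=200, seed=0):
    """Return (nll, w_p, w_n, total_variance) of the best admissible pair."""
    rng = np.random.default_rng(seed)
    best = None
    for w_p in wp_grid:
        for w_n in wn_grid:
            sim = simulate_endpoints(w_p, w_n, n_sim, rng)
            mean, cov = sim.mean(axis=0), np.cov(sim, rowvar=False)
            total_var = np.trace(cov)  # assumed definition of total variance (cm^2)
            if total_var > max_total_var:
                continue  # reject unrealistically variable candidates
            score = gaussian_nll(observed, mean, cov)
            if best is None or score < best[0]:
                best = (score, w_p, w_n, total_var)
    return best

wp_grid = [1e3, 5e3, 1e4, 5e4, 1e5, 5e5, 1e6]
wn_grid = np.linspace(0.06, 0.20, 8)  # 0.06 to 0.20 in steps of 0.02
observed = np.random.default_rng(1).standard_normal((100, 2)) * [1.4, 4.5]
score, w_p, w_n, total_var = grid_search(observed, wp_grid, wn_grid)
```

In the actual analysis each candidate evaluation requires solving stochastic optimal control problems, which is why an exhaustive search over a coarse grid was preferred to a continuous optimizer.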

Finally, note that for the four models under consideration (2 costs; effort and energy/smoothness × 2 frameworks; deterministic and stochastic), we also computed the movement parameters defined in *Motion analysis*.

## RESULTS

### Experimental Analysis

#### Task achievement and general movement features.

All subjects could perform the task easily. The behavior of a representative subject is illustrated in Fig. 3*A*. Participants did not raise any particular question concerning the goal of the task, such as where to reach the bar. Even though the subjects were instructed to produce one-shot movements without terminal adjustments, they performed reasonably well with respect to the precision of the movement. The horizontal constant error (distance to the bar on the *x*-axis) was 2.2 ± 1.4 cm on average across subjects and initial positions, indicating that the subjects controlled their movements more precisely along the anteroposterior axis. The variable error (i.e., the endpoint dispersion) was 1.4 ± 0.4 cm. The lateral error was disregarded here because participants approximately displaced their arm in a vertical plane. Indeed, principal component analyses performed on the 3-D coordinates of the moving markers for each subject showed that the variance accounted for by the first two components was >98% and that the angle between the normal vector of the fitted plane and that of the vertical plane defined by the acquisition system was ∼4°. Therefore, movements were approximately performed in a vertical plane and subsequent analyses could be carried out on the projected data without significant loss of information.
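The planarity check can be sketched with a singular value decomposition of the centered marker coordinates. The axis convention (*x* anteroposterior, *y* vertical, *z* lateral), the function name, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def planarity(markers_xyz, reference_normal=(0.0, 0.0, 1.0)):
    """Return (variance fraction of first two PCs, angle in degrees between
    the best-fit plane normal and a reference plane normal)."""
    pts = np.asarray(markers_xyz, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2
    explained = variances[:2].sum() / variances.sum()
    normal = vt[2]  # direction of least variance = plane normal
    ref = np.asarray(reference_normal, dtype=float)
    cos_angle = abs(normal @ ref) / np.linalg.norm(ref)
    angle = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
    return explained, angle

# Synthetic markers lying in a plane tilted by 4 deg from the vertical XY-plane.
rng = np.random.default_rng(0)
tilt = np.radians(4.0)
e1 = np.array([np.cos(tilt), 0.0, np.sin(tilt)])  # tilted anteroposterior axis
e2 = np.array([0.0, 1.0, 0.0])                    # vertical axis
coords = rng.uniform(-30.0, 30.0, size=(500, 2))
markers = coords[:, :1] * e1 + coords[:, 1:] * e2 + 0.1 * rng.standard_normal((500, 3))

explained, angle = planarity(markers)
```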

Table 3 reports the general motion characteristics. Movement duration slightly varied across participants and starting positions, and lasted ∼700 ms in general [ANOVA, P1 × P2 × … × P5; *F*(4,15) = 3.04; *P* < 0.05]. In fact, only movements starting from posture P3 were different from the others, but this was not significant (post hoc test). The distance covered by the hand significantly depended on the initial posture [*F*(4,15) = 84.6; *P* < 0.001], and therefore, the average velocity varied accordingly (AV parameter). In particular, the smallest arc length was obtained when starting from P2 (∼30 cm) and the largest when starting from P4 (∼70 cm).

#### Consistency and variability.

Subjects could reach wherever they desired on the bar. Therefore, before going further, it appeared important to verify whether the data were consistent across trials and subjects. In particular, even when starting from the same initial position, very large intertrial variability could in principle have been observed. Instead, an analysis of CI (a parameter similar to a normalized variable error along the vertical axis, although this is not an error in this task) showed that this was not the case: on average, each subject used only 5.3 ± 2.2% of the reachable region on the bar (intrasubject variability, means ± SD across conditions). Concretely, this corresponded to a SD of 4.5 ± 1.9 cm on the vertical axis, which was about three times larger than the variability measured on the anteroposterior axis. Thus, rather than varying the endpoints freely on a trial-to-trial basis, participants consistently reached to certain preferred regions. In particular, this was true for all starting postures without significant statistical differences [ANOVA, *F*(4,15) = 1.12; *P* = 0.35]. The endpoint regions of all subjects are reported in Fig. 4*A*. On average, the intersubject SD was 10% of the arm length. The endpoint as a function of the starting posture was nevertheless consistent across subjects as indicated by high correlations between the RP parameter and its average value across subjects (r = 0.90 ± 0.17, computed across starting postures for each subject). This means that intersubject differences were mainly related to constant shifts along the vertical axis.

A complementary analysis concerned the 95% confidence ellipses (in the *XY*-plane). A visual inspection indicated that they were vertically oriented (see ellipses in Fig. 3*A*). More precisely, the clockwise angles formed by the major axis and the horizontal were 104 ± 11, 80 ± 21, 96 ± 19, 89 ± 5, and 90 ± 13°, for the initial postures P1 to P5, respectively. Therefore, confidence ellipses were oriented along the bar. The minor axis length was 4.9 ± 0.3 cm on average, whereas the major axis length was 16.0 ± 2.2 cm. Normalized by the length of the reachable region on the bar, it corresponded to <18% of it. Therefore, the variability of subjects was significantly greater along the task-irrelevant direction.
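For reference, the parameters of a 95% confidence ellipse can be derived from the eigendecomposition of the endpoint covariance, as sketched below. The exact conventions used in the study are not restated here, so several choices are assumptions: full axis lengths scaled by the chi-square quantile for 2 degrees of freedom, an orientation measured counterclockwise from the horizontal (modulo 180°), the total variance as the trace of the covariance, and the aspect ratio as major over minor axis length.

```python
import numpy as np

def confidence_ellipse(endpoints, level=0.95):
    """Ellipse parameters from 2-D endpoints (one common convention)."""
    pts = np.asarray(endpoints, dtype=float)
    cov = np.cov(pts, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    scale = -2.0 * np.log(1.0 - level)        # chi-square quantile, 2 dof
    minor, major = 2.0 * np.sqrt(scale * evals)
    vx, vy = evecs[:, 1]                      # direction of the major axis
    angle = np.degrees(np.arctan2(vy, vx)) % 180.0
    return {
        "major": major,
        "minor": minor,
        "angle": angle,
        "total_variance": evals.sum(),
        "aspect_ratio": major / minor,
    }

# Synthetic endpoints (cm) with SDs similar to those reported in the text.
rng = np.random.default_rng(0)
endpoints = rng.standard_normal((5000, 2)) * [1.4, 4.5] + [40.0, 20.0]
ellipse = confidence_ellipse(endpoints)
```

With a larger SD along the vertical axis, the major axis comes out close to 90°, i.e., aligned with the bar.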

To better identify the cause of this structured variability, we performed three control experiments (see Fig. 5). We first observed that repeating the task multiple times led to a significant reduction of variability. During the first 60 movements, the total variance of confidence ellipses ranged between 20 and 25 cm^{2} on average for all starting postures. With extensive practice, however, the total variance decreased significantly (*t*-tests between *trials 1–20* and *81–100*; *P* < 0.001; Fig. 5, *A* and *D*). The subjects thus exhibited a significant intertrial variability for several dozens of movements (Fig. 5*D*, *first row*). After more repetitions, and likely with fatigue and habituation, they adopted a more consistent behavior (total variance ∼15 cm^{2}). Importantly, this total variance was still much larger than when reaching to a target point with eyes closed in a comparable situation (total variance ∼5 cm^{2}). While the orientation of the ellipses was always vertical in the presence of a target bar, this was no longer the case when reaching to a target point attached to the bar. Accordingly, the aspect ratio drastically decreased in the latter case, indicating more circular confidence ellipses (as illustrated in Fig. 5*B*). When reaching to the bar with on-line vision, confidence ellipses became thinner but with a major axis still aligned with the vertical one. The total variance was ∼15 cm^{2} and the aspect ratio increased compared with the nominal case with eyes closed (∼6 vs. 4 arbitrary units). While the variability along the vertical axis was not significantly changed with eyes open (CI was 5.9 ± 2.2 vs. 5.2 ± 2.2% with eyes closed for the control subjects), a clear improvement of the accuracy in the anteroposterior axis was observed (see Fig. 5*C*). Consequently, the strategy of subjects did not consist in choosing a final point for each starting posture and reaching to it (as Donders' law would stipulate).
Instead, these results suggest that the final point was rather the outcome of a bar reaching movement plan. It is likely that the larger variability we observed (as compared with point-to-point movements) was induced by the greater target redundancy. It is nevertheless clear that not all the available freedom was exploited, indicating the existence of a typical trade-off between consistency and variability in the behavior of most subjects. It has to be noted that out of the 20 tested subjects, only 2 behaved quite atypically. One of them exhibited a highly variable behavior, clearly reaching to any point of the bar trial after trial. The second one started to drastically increase his trial-to-trial variability during the second half of the experiment. This kind of behavior could be considered marginal since it appeared for only 10% of our participants and reflected uncommon movement motivations. The first subject decided consciously to reach to a point differing from the one he was looking at before closing the eyes. This strategy resulted in a quite random endpoint selection, and the reason he provided to the experimenter was: “it was allowed by the task.” The second subject suddenly started to vary the endpoint to make the experiment less repetitive/uneventful. Interestingly, even though the intertrial variability was higher for these two participants, their average behavior seemed roughly similar to that of the other subjects. In the results above and the following, these subjects were thus removed from intrasubject analyses of variability but were kept for intersubject analyses.

#### Endpoint dependence on the starting posture.

The average behavior is illustrated in Fig. 3*A*. A qualitative analysis of the RP parameter showed that the chosen final point depended on the starting arm posture. A statistical analysis proved that this was a significant effect [*F*(4,15) = 36.5; *P* < 0.001]. Post hoc analysis showed that the endpoint when starting from P1 was significantly different from all the others (*P* < 0.05). Similarly, the point reached when subjects started from P5 was significantly different from all the others. Finally, no significant difference was found within the group P2-P3-P4, although a trend was apparent and robust across subjects. Figure 4*A* summarizes these observations and also depicts the location of the terminal point for each posture with respect to the shoulder and head levels. An inspection of the final arm postures also indicated significant changes for the shoulder angle [*F*(4,15) = 14.2; *P* < 0.001] but not for the elbow angle [*F*(4,15) = 0.61; *P* = 0.60]. Precisely, the average final postures were (θ_{1} = −24.7° ± 11.7; θ_{2} = 55.9° ± 15.4), (−45.3 ± 8.8; 56.7 ± 14.2), (−40.7 ± 10.3; 50.7 ± 18.3), (−44.9 ± 8.5; 53.2 ± 13.7), and (−38.8 ± 8.8; 58.0 ± 14.7) for postures P1 to P5, respectively. A post hoc analysis showed that initial position P1 significantly differed from all the other ones.

Finally, we also conducted an analysis on the MV angle (see Fig. 4*B*). An ANOVA revealed a significant effect of the starting posture on the MV parameter [*F*(4,15) = 242.7; *P* < 0.001]. The MV values were negative for P1 and P5 indicating that the hand moved downward. The most vertical movements were obtained when starting from P1 and P4 (average MV equal to −35 and 50°, respectively). Movements starting from P2, P3, and P5 were the most horizontal (MV angles ∼25°, 15°, −14°, respectively).

#### Shape of the finger paths.

A visual inspection of the finger paths showed that their shape was generally curved in the *XY*-plane. Figure 3*A* illustrates that the paths had typical curvatures. In fact, finger paths strongly differed from straightness as reported in Table 3. The IPC parameter was 0.11 ± 0.02 on average. A more precise analysis shows that this result was quite robust across subjects (see Fig. 4*C*). For most initial postures, paths were globally concave, except for P4 for which the fingertip path was clearly convex. An ANOVA confirmed these differences since a significant effect of the starting posture on the sIPC parameter was found [*F*(4,15) = 69.9; *P* < 0.001]. Post hoc tests showed that three distinct groups could be extracted: the convex group (P4), the very concave group (P1, P5), and the slightly concave group (P2, P3). It is noticeable that for the latter group, some subjects indeed produced quasi-straight paths (5/20 for P2 and 10/20 subjects for P3). Nevertheless, we never measured significantly convex paths when starting from P2 and P3.

#### Time-course of joint and finger trajectories.

Figure 3*B*, *left*, depicts the average angular displacements. The graphs were generally monotonic for all subjects and conditions, except for posture P4 at the elbow joint. Based on angular covariations, we determined that the forearm and upper-arm segments were globally well coordinated. The correlation between the elbow and shoulder angles was high on average (*r*^{2} = 0.88 ± 0.09). However, the starting posture had a significant effect on the joint coupling [*F*(4,15) = 21.6; *P* < 0.001]. A post hoc analysis showed that P1 and P4 were significantly different from other initial postures. Movements starting from P4 showed a reduction of joint coupling for 13/20 subjects (*r*^{2} < 0.8) and, more generally, the determination coefficient decreased for all subjects compared with initial postures P2, P3, or P5. The results were similar for P1, for which the *r*^{2} coefficient decreased significantly for the 20 participants. The low joint coupling measured in conditions P1 and P4 was linked to the nonmonotonic nature of the angular displacements and the relatively small amplitude measured at the shoulder and elbow joints, respectively (∼20° on average, see Fig. 4*D*). Cross-correlation analyses confirmed the persistence of a significant decrease for P1/P4 when considering the presence of time lag, possibly due to different onset times (maximum *r*^{2} < 0.86 for P1/P4 vs. *r*^{2} ≈ 0.99 for P2/P3/P5). In fact, an analysis of the magnitude of the angular displacements showed (Fig. 4*E*) that starting from posture P1 mainly involved an elbow rotation with a small rotation at the shoulder joint. Starting from posture P2 or P3 involved similar angular excursions at both joints, while from posture P4, subjects tended to mainly rotate the shoulder joint with a significantly smaller forearm flexion. Finally, movements from posture P5 implied large rotations of both joints (but twice as large for the elbow).
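The joint coupling measure and its lagged variant can be sketched as follows. The function names, the sampling rate, and the synthetic minimum-jerk-like joint trajectories (with a 50-ms onset delay at the elbow) are assumptions used only to illustrate why a time lag can lower the zero-lag determination coefficient.

```python
import numpy as np

def coupling_r2(shoulder, elbow):
    """Squared Pearson correlation between two joint angle time series."""
    return np.corrcoef(shoulder, elbow)[0, 1] ** 2

def max_lagged_r2(shoulder, elbow, max_lag):
    """Largest r^2 over integer sample lags in [-max_lag, max_lag]."""
    n = len(shoulder)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = shoulder[lag:], elbow[:n - lag]
        else:
            a, b = shoulder[:n + lag], elbow[-lag:]
        best = max(best, coupling_r2(a, b))
    return best

def smooth_step(tau):
    """Minimum-jerk-like position profile on [0, 1] (clipped outside)."""
    tau = np.clip(tau, 0.0, 1.0)
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5

t = np.arange(71) * 0.01                      # 0.7 s sampled at 100 Hz (assumed)
shoulder = 30.0 * smooth_step(t / 0.6)        # degrees
elbow = 40.0 * smooth_step((t - 0.05) / 0.6)  # same shape, delayed by 50 ms

r2_zero_lag = coupling_r2(shoulder, elbow)
r2_best_lag = max_lagged_r2(shoulder, elbow, max_lag=10)
```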

The finger velocity profiles were bell-shaped in all cases, that is, with unique acceleration and deceleration phases, as depicted in Fig. 3*B*, *right*. Therefore, movements were one-shot without terminal adjustments. Velocity profiles presented, however, some asymmetry: acceleration always lasted less than deceleration, whatever the starting position. Table 3 shows that, on average, acceleration represented only 42% of the whole movement time. However, the TPV parameter was dependent on the initial posture. Specifically, the TPV for downward movements was larger than the TPV for upward movements [TPV = 0.41 vs. 0.44, ANOVA, *F*(4,15) = 3.6; *P* < 0.01, and post hoc analysis]. The peak of velocity also depended on the initial posture and varied from 0.81 m/s for P2 to 1.79 m/s for P4. The ratio of *V*_{peak} to *V*_{mean} ranged between 1.8 and 2.1 (mean 1.97 ± 0.06), indicating quite narrow velocity profiles in general (for comparison, the value predicted by the minimum hand jerk model is 1.875).
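The two velocity-profile parameters discussed above can be computed as in the following sketch, here applied to the speed profile of the minimum hand jerk model for comparison. The function name and the sampling are illustrative assumptions; the TPV is taken as the time to peak velocity divided by the movement duration.

```python
import numpy as np

def velocity_profile_features(speed, dt):
    """Return (relative time to peak velocity, peak-to-mean velocity ratio)."""
    speed = np.asarray(speed, dtype=float)
    duration = (len(speed) - 1) * dt
    tpv = np.argmax(speed) * dt / duration
    # Mean speed = path length (trapezoidal integration) / duration.
    distance = np.sum((speed[1:] + speed[:-1]) / 2.0) * dt
    return tpv, speed.max() / (distance / duration)

# Minimum-jerk speed profile for a unit-distance movement lasting 0.7 s.
duration = 0.7
tau = np.linspace(0.0, 1.0, 1001)
speed = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration

tpv, peak_to_mean = velocity_profile_features(speed, dt=duration / 1000)
```

For this symmetric profile the TPV is 0.5 and the peak-to-mean ratio is 1.875, so the experimental values (TPV near 0.42 and a ratio near 1.97) correspond to profiles that are both asymmetric and slightly narrower.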

### Comparisons with Models

#### Analysis of the movement consistency.

A preliminary inspection of Fig. 6 shows that the two cost functions predicted highly different hand paths on average. Considering deterministic (Fig. 6*A*) or stochastic (Fig. 6*B*) optimal control frameworks seemed to cause only slight changes in the mean reaching behaviors. A quick overview of these results suggests that the energy/smoothness cost performed better than the effort one. Qualitatively, the effort cost predicted geometric paths that were clearly incompatible with the typical experimental data (see Fig. 3*A*). Similar observations held regarding the angular displacements. Figure 7*A* depicts the average joint displacements across subjects for the four models. It shows that the energy/smoothness model (dark gray solid and dashed) and the experimental data (black solid lines) traces were globally superimposed, except maybe for posture P4 at the elbow joint. The matching was even slightly better for the stochastic model in some cases. For instance, a better fit of the experimental data is apparent for condition P5. Larger differences were visible for the effort model (light gray traces), in agreement with the above description of the hand paths. The finger velocity profiles (Fig. 7*B*) were smooth and bell-shaped for all conditions and models. A slight but constant discrepancy between the model predictions and the recorded data was nevertheless present. Indeed, the deceleration phase was always longer in the real data compared with the simulated ones. This longer deceleration might be attributed to factors not modeled here, such as the risk-averse behavior of subjects regarding the possibility of hitting the (rigid) bar with the fingertip. The minimum hand jerk model, which is often considered as the best model to reproduce the time-course of the end effector in humans, would also suffer from the same discrepancy.

To quantify these observations, a specific analysis of some task-relevant parameters was performed for each starting posture separately (see Fig. 8). The most basic task parameter was the relative reached point on the bar (RP; Fig. 8*A*). The energy/smoothness model performed globally well with 8 cm of error on average for the deterministic case and <6 cm for its stochastic counterpart. This error was reasonable with respect to the SD measured at the end point during the real experiment (∼4.5 cm). The effort model was quite discrepant with the experimental data with ∼23 cm and 19 cm of error for the deterministic and stochastic cases, respectively. In terms of the cumulative error *d*, the two effort-based models yielded errors more than six times larger than the energy/smoothness ones. This result was confirmed by an analysis of the MV, describing the pointing direction (see Fig. 8*B*). The energy/smoothness models replicated very well the sequence of MV for all initial postures (*r* = 0.99 with an error of 4° on average for MV). In contrast, the effort models were quite discrepant with the real data: the error on the MV parameter was >25° on average (*d* > 3,000) and the relationship between pointing directions and starting postures was poorly reproduced (*r* < 0.6).
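The two comparison scores used throughout this section can be sketched as below, assuming that *d* is the sum of squared differences between model and data across the five starting postures (any normalization used in the original analysis is not reproduced). The experimental MV values are the rounded averages given in the text; the model values are hypothetical.

```python
import numpy as np

def compare_to_data(model, data):
    """Return (cumulative squared error d, Pearson correlation r)."""
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    d = np.sum((model - data) ** 2)
    r = np.corrcoef(model, data)[0, 1]
    return d, r

# Average MV angles (deg) for P1 to P5, rounded values quoted in the text.
mv_data = [-35.0, 25.0, 15.0, 50.0, -14.0]
# Hypothetical model predictions with errors of a few degrees per posture.
mv_model = [-31.0, 22.0, 20.0, 46.0, -10.0]

d, r = compare_to_data(mv_model, mv_data)
```

The two scores are complementary: *d* penalizes absolute deviations, whereas *r* only measures whether the model reproduces the pattern of variation across starting postures.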

Concerning the shape of the paths (sIPC parameter; Fig. 8*C*), the change of curvature with respect to the five starting postures was relatively well predicted by the effort models (*r* = 0.75) but much better by the energy/smoothness models (*r* = 0.98). Moreover, the error was much smaller for the energy/smoothness models (*d* ≤ 0.001 vs. *d* ≥ 0.007). In fact the effort models tended to predict quite straight trajectories, which was not in agreement with the high curvatures observed when starting from P1 and P5 for instance. Overall, only small quantitative differences were observed between the stochastic and deterministic cases. In other words, the average simulated behaviors were not strongly affected by the different types of noise we tested (localization/planning or motor).

The joint coupling analysis (Fig. 8*D*) revealed that the deterministic models replicated well the trend of the experimental observations (*r* = 0.94), with a decrease for their stochastic counterparts (*r* ≤ 0.75). The poor joint covariation measured for P1 and P4 was generally accounted for by the models, except for the stochastic effort model that predicted high joint coupling irrespective of the condition and, therefore, was quite discrepant with the data (*d* > 0.13 vs. *d* < 0.05 for the other models). The energy/smoothness models tended to overestimate the decrease of joint coupling for P1 and P4, because the optimal movements resulted in mainly rotating the elbow for P1 and the shoulder for P4, while keeping the other joint almost frozen. This strategy was nevertheless produced by some subjects in practice. For instance, several subjects did use a single-joint rotation of the elbow to reach the bar when starting from P1 (8/20 subjects rotated the shoulder <10° and, for every subject, the elbow rotated four times more than the shoulder). Above all, the analysis shows that the composite energy/smoothness cost accounts well for the average spatio-temporal movement features of 20 subjects and, in particular, performs significantly better than the effort model.

#### Analysis of the endpoint distributions.

We now focus on the structure of the intertrial variability that we described in *Consistency and variability*. The main finding was that the SD of the endpoints was significantly greater on the vertical axis than on the anteroposterior one. Our control experiments demonstrated that the endpoint distributions were strongly linked to the target redundancy, rather than to the absence of on-line visual feedback and the use of a memorized target. The typical confidence ellipses predicted by the different models are given in Fig. 6. Deterministic models, even though taking into account uncertainty about the initial parameters of the task, do not seem to be able to explain the particular structure of the endpoint variance. Qualitatively, both optimal feedback control models accounted well for the vertical orientation of confidence ellipses, irrespective of the cost they optimized (state-dependent or control-dependent). When motor noise was integrated into the model, the simulated trial-to-trial variability seemed to better match the experimental data. This result agrees with the idea of optimal feedback control, which predicts that motor noise is allowed to deviate the hand along the bar direction whereas deviations along the anteroposterior (task-relevant) axis are corrected (cf. the minimal intervention principle, Todorov and Jordan 2002).
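The qualitative effect of the minimal intervention principle can be illustrated with a deliberately simplified simulation. This is a toy sketch, not the ILQG models of the study: the hand is a 2-D point displaced by velocity-like commands, the feedback gain is hand-tuned rather than optimal, and the geometry and parameter values are hypothetical. Only deviations along the anteroposterior (*x*) axis are corrected, whereas deviations along the bar (*y*) are left uncorrected.

```python
import numpy as np

def simulate_reaches(n_trials=2000, n_steps=50, w_n=0.16, gain=0.3, seed=0):
    """Simulate endpoints (cm) with signal-dependent noise and x-only feedback."""
    rng = np.random.default_rng(seed)
    start = np.array([0.0, 0.0])
    goal = np.array([40.0, 20.0])        # bar at x = 40 cm, planned height 20 cm
    step_plan = (goal - start) / n_steps
    p = np.tile(start, (n_trials, 1))
    for t in range(n_steps):
        nominal_x = start[0] + step_plan[0] * t
        u = np.tile(step_plan, (n_trials, 1))
        u[:, 0] += gain * (nominal_x - p[:, 0])  # correct task-relevant errors only
        p = p + u + w_n * np.abs(u) * rng.standard_normal(p.shape)
    return p

ends = simulate_reaches()
var_x, var_y = ends.var(axis=0)
```

Because errors along the bar do not affect task success, leaving them uncorrected lets noise accumulate in that direction, which yields endpoint distributions elongated along the bar.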

A quantitative analysis of variability is reported in Fig. 9. The total variance (Fig. 9*A*) was generally underestimated by the models based on deterministic optimal control (∼8 compared with 20 cm^{2} for the experimental data). Larger values (∼14 cm^{2}) were obtained when modeling motor noise, leading to a slight decrease of the global error for stochastic models (see the *d* parameter). Interestingly, the variation of the total variance with respect to the starting posture was well reproduced by deterministic models (*r* > 0.7) whereas stochastic models failed quite clearly in this respect (*r* < 0.05). Thus the experimental total variance seems caused by the combination of localization/planning and execution noise. Interestingly, we observed that the total variance varied according to the curvilinear distance of the end effector (*r* = 0.82). In other words, noise also accumulates during execution; the farther the distance, the larger the total variance. Inspection of the aspect ratio confirms this observation and gives additional insights. Figure 9*B* shows that the aspect ratio of endpoint ellipses was globally overestimated by all the models and ellipses were generally thinner. Interestingly, aspect ratio values were actually in agreement with the control experiment with eyes open (see Fig. 5*C*, *right*). Precisely, Fig. 5*D* shows that the aspect ratio is ∼6 (arbitrary units) when on-line vision guided the movement, which agrees quite accurately with the prediction of stochastic models (Fig. 9*B*). Moreover, focusing on the variability in the vertical axis (consistency index CI, not reported in Fig. 9), we observed that stochastic models with control-dependent noise predicted the data better (on average, CI was 4.8% compared with 5.3% for the experimental data; the deterministic models predicted ∼3% of the reachable region). Therefore, the endpoint SDs (computed across trials) were well accounted for by both stochastic models. Note nevertheless that the CI parameters appeared to be negatively correlated with starting postures as indicated by negative correlation coefficients for the optimal feedback control models (*r* < −0.4; in contrast, models based on deterministic optimal control yielded positive correlation coefficients, *r* > 0.7).

Importantly, Fig. 9*C* shows that modeling motor noise is sufficient to replicate the vertical orientation of confidence ellipses. Stochastic models reproduced remarkably well this salient behavioral feature. On average, the effort and energy/smoothness models predicted confidence ellipses whose main axis angle was equal to 89 ± 2° and 93 ± 3°, respectively (compared to 92 ± 6° for the real data, which corresponds approximately to a vertical orientation). Predictions were stable across trials and subjects. Modeling motor noise led to a dramatic decrease of the cumulative squared error (the *d* parameter decreased by 94%), whereas the ellipses predicted by deterministic models appeared to be quite randomly oriented, without any clear underlying rule (116 ± 32° and 96 ± 26° for the effort and energy/smoothness models). In several cases, ellipses for deterministic models were clearly not oriented along the vertical axis, reflecting mainly localization/planning errors and not task-specific variability.

Finally, a global statistical analysis taking into account both the mean and the variance of the endpoints was performed through negative log-likelihood estimations (see Table 4). This confirms that deterministic models provided a poor statistical description of the experimental endpoints, in contrast to stochastic models that drastically decreased the negative log-likelihood. Using the state-dependent cost function decreased the negative log-likelihood by 70%, as a consequence of a better prediction of the mean endpoint.

In summary, the specific trade-off between variability and consistency depicted in Fig. 3*A* is compatible with an optimal feedback control scheme essentially based on a certain state-dependent cost function (energy/smoothness) along with signal-dependent motor noise.

## DISCUSSION

In this study, we analyzed the behavior of subjects when emphasis was put on target redundancy. Our results demonstrate that, when pointing to a very long bar, subjects did not use all possible solutions but restricted themselves to a specific choice of preferred hand trajectories and endpoint regions. The average behavior of subjects suggested that goal and movement selection jointly relied on a biomechanically sensed mixture of mechanical energy and joint smoothness costs rather than a simple effort cost related to the amount of motor command. Furthermore, the nonnegligible and well-structured variability of the endpoints was compatible with the use of the minimal intervention principle during movement execution, as revealed by stochastic optimal feedback control simulations. We discuss below the possible origins of this particular trade-off between consistency (relative to the freedom offered by the task) and variability (relative to similar point-to-point reaches) and its link with the manifold reaching paradigm.

### Target Redundancy and Kinematic Invariance

An important result is that most subjects produced stereotypical trajectories to the bar as a function of the starting posture despite the lack of a precise final point to achieve. The random use of any possible solution would have resulted in a very large variability across trials, for instance in the movement direction for a given starting posture. The use of a virtual target to resolve the target selection issue once and for all would have resulted in the independence of the endpoint with respect to the starting posture. Relatively small regions (given the full freedom provided by the target bar) were instead reached selectively depending on the initial posture, suggesting that internal rules guided the choice of finger endpoint and led to those reproducible hand trajectories. Our results raise a fundamental question: why did subjects adopt such particular arm trajectories trial after trial even though the task was compatible with larger motor variability or independence of the starting point?

As already proposed for point-to-point movements, the reliable hand trajectories recorded here may be the result of optimal control processes, simultaneously yielding a specific final point as well as the associated arm trajectory to it (see Todorov 2004 for a review). Indeed, varying significantly the endpoint across trials or, in contrast, constantly reaching the same point whatever the starting posture, would necessarily be nonoptimal and more costly strategies on average. In contrast, our data confirmed that the endpoints (and thus the final arm postures), in addition to the hand paths, were accurately predicted by a mixed cost based on minimum energy expenditure (measured as the absolute work of torques, see Berret et al. 2008) and maximum smoothness (measured as the negative integral of the squared acceleration, see Ben-Itzhak and Karniel 2008). This finding is in line with previous studies which pointed out independently the importance of such variables in motor planning (Soechting et al. 1995; Flash and Hogan 1985). It is worth noting that energy refers here to mechanical energy rather than to control energy in the sense of signal theory, which represents the amount of control signal used to generate a movement (and is referred to here as effort). Our study suggests that such a neural-level cost plays a minor role in planning a reaching movement toward a manifold, although it has been shown to be appropriate to derive realistic motor behaviors in classical point-to-point paradigms with intrinsic redundancy (Guigon et al. 2007). Here, minimizing the control energy frequently resulted in predicted movements with large directional error. In contrast, minimization of costs related to the state of the musculoskeletal system provided an accurate model of the recorded arm trajectories, explaining the choice of an endpoint and the way to reach it in a single framework.

Interestingly, optimal control for manifold reaching may also be seen as a decision-making process, as many targets on the bar without discriminability can be equivalently reached. In fact, there is now a lot of evidence that the CNS performs optimal choices in decision-making contexts (including planning), based on the prediction of how “good” the selected action will be for the system, i.e., how well it maximizes reward or utility functions (see Trommershäuser et al. 2008 and Körding 2007 for reviews). When pointing to a bar, one may argue that there is no reason or value that motivates the participant to choose one action over another. However, we argue that in this context energy and smoothness may represent the utility functions that the CNS would select and that would provide the most beneficial choice in the face of many possible outcomes. In most studies on decision making, the reward was explicit (e.g., monetary or food incentives). Here, in contrast, the reward corresponding to the vertical position on the bar is necessarily internal. If the goodness of an action is actually measured by some internal cost or loss functions, then the choice of the endpoint becomes critical with respect to the internal reward defined by the motor system and, thus, we propose that this choice may be guided by optimality processes.

From a more general point of view, the typical movements recorded here, even though manifold reaching offers wide freedom, may reflect the natural tendency of any biological system to resist disorder and converge to desirable stable states (Friston 2010). Here, by minimizing body fatigue (i.e., by reducing the mechanical energy expenditure) and preserving the musculoskeletal system from self-injury (i.e., by maximizing joint smoothness), the energy and smoothness cost functions could significantly contribute to the homeostatic processes that maintain the motor system close to its nominal state (Bernard 1878; Cannon 1932). By extension, we speculate that the recorded trajectories represent a prior expectation, which is a primary repertoire of valuable states/trajectories thought to come from evolutionary, hereditary, and learning processes. This idea is actually formalized in the frame of the free-energy principle (Friston et al. 2010), suggesting that this prior would prescribe a small number of attractive states through sensory predictions that, in turn, would drive movements.

How such a manifold reaching behavior may be implemented at the neural level remains largely unknown, even though the neural basis of optimal feedback control for point-to-point movements has been reviewed by Scott (2004). The internal reward associated with almost-optimal manifold reaching strategies could be reflected by the activation of dopaminergic systems, which are known to be important during motor planning (e.g., bradykinesia in Parkinsonian patients, see Mazzoni et al. 2007). Furthermore, Cisek and Kalaska (2002) reported evidence that parallel simulations of different possible movements are processed in neurons of the dorsal premotor cortex of monkeys when a couple of targets are presented and that the “best” solution is subsequently selected and executed. This cortical area may represent the neural substrate for such a near-optimal action decision in a task characterized by endpoint indeterminacy. Indeed, a target manifold can be viewed as a continuum of possible targets without saliency. Thus, here, eye and arm movements are not driven by a saliency map, initially proposed to be located in the lateral intraparietal area (LIP; Bisley and Goldberg 2010), that would guide attention primarily based on bottom-up inputs. Rather, in the present case, decision making would rely on a priority map where endogenous state and top-down influences play a major role in the selection of the final finger location. In this view, a top-down enhancement would be due to the increase in behavioral relevance driven by the relationship between the desired arm trajectory and the relative reward linked with it. This possibility agrees with the finding that LIP neurons respond to expected value (Sugrue et al. 2004) or subjective desirability (Dorris and Glimcher 2004), as well as with the modulation of LIP activity by prior information on the type of action to be performed (Calton et al. 2002; Cui and Andersen 2007).
In other words, the expected outcome would evoke attention toward a virtual target depending on each starting position, in contrast to the classical reverse process where one salient target attracts attention. Therefore, we may hypothesize that when reaching without a specific spatial endpoint, the CNS would plan the action by evaluating the expected reward associated with a particular trajectory. In this case, action would also be motivated by intrinsic instead of only extrinsic cues. The dorsal striatum has been identified as a structure able to shift the guidance strategy (from extrinsic to intrinsic) in rats after a training period in a maze (Packard and McGaugh 1996). Further, the prefrontal cortex also contributes to the coordination of memory strategies by integrating the predictive relationships among stimuli, actions, and reward (Rich and Shapiro 2009). In the present reaching-to-a-bar task, one may further speculate that the cortex-basal ganglia network (see Doya 2008 for a review) would shape the decision/planning process to satisfy desirable physiological values (energy and smoothness).

Besides stochastic optimal control and decision theories, a number of alternative theories have been proposed in the field of motor control, but whether these theories generalize to manifold reaching is not obvious. In particular, the equilibrium point hypothesis (Feldman 1966; Bizzi et al. 1992), the dynamical field theory (Erlhagen and Schöner 2002), and the vector-integration-to-endpoint (VITE) model (Bullock and Grossberg 1988) were initially designed to account for the neurophysiological processes underlying movement generation when a target point is presented to a subject. The specification of a particular target point is a crucial requirement in all the above-mentioned models, which usually focus on the excess of intrinsic degrees of freedom necessary to perform a task. The equilibrium point theory fundamentally relies on the specification of a target point prior to movement planning to define a virtual trajectory toward the goal. Adapting the theory to deal with virtual trajectories toward an equilibrium manifold has not been considered yet. It would be possible to apply such models to the manifold reaching paradigm by selecting the endpoint before the movement starts, but this process may be elusive. Models from the dynamical field theory (e.g., Martin et al. 2009) suffer from the same drawback with respect to target redundancy, except if one particular endpoint can be chosen a priori during movement preparation. The VITE and similar vector planning models also require the endpoint specification to build the MV. Therefore, many models of motor control, including point-to-point optimal control models, might easily generalize to manifold reaching if the endpoint can be chosen in advance. The problem is then shifted to the control of gaze direction to determine the endpoint on the manifold.
Importantly, optimal control with a target manifold makes no prior assumption about the final point to be reached, the latter being simply determined to minimize the expected movement cost. To better understand the influence of gaze, we performed an additional control experiment by asking three subjects to fixate a point located 10 cm above the upper limit of the reachable region on the bar. It was found that the endpoint positions were highly correlated with the nominal ones (computed across starting postures, *r*^{2} = 0.99) but with a constant bias toward the fixation point (1.5% of arm length, i.e., <1.5 cm). This indicates that gaze direction most likely depends on an expected movement cost and does not predominate for the endpoint selection itself. To what extent the endpoint was explicitly known by the subjects before movement initiation remains an open question. Psychophysical experiments (e.g., measuring reaction times) or neural recordings in animals (e.g., in posterior parietal cortex, premotor and motor areas) during similar manifold reaching tasks might provide interesting insights.

### Target Redundancy and Kinematic Variability

Despite the consistency of subjects' behavior described above, we nevertheless reported larger variability across trials compared with classical point-to-point movements (see the control experiment results). Importantly here, the main part of the variance was distributed along the bar. This may be in agreement with the uncontrolled manifold hypothesis, which states that larger variance should be observed along the uncontrolled dimensions of the task (UCM; Scholz and Schöner 1999), but the UCM refers to variance analysis in joint space. Because variability analysis is coordinate dependent (see Sternad et al. 2010), it is unclear whether the UCM hypothesis is verified in the present case. Nevertheless, the bar defines a 1-D manifold in the 2-D joint space and, theoretically, adopting different final postures within this subspace does not affect the task achievement. Only the atypical subjects seemed to randomly choose joint postures lying on the uncontrolled manifold. For the remaining subjects (18/20), the variance of the endpoints was much smaller and final postures were restricted to specific regions. It turned out that the shape and size of the endpoint distributions were accurately predicted when modeling motor noise. By using stochastic optimal feedback control, we showed that the extrinsic freedom provided by the manifold was exploited within trials through the minimal intervention principle (Todorov and Jordan 2002). Our results demonstrated empirically that deterministic movement execution with localization/planning noise could not fully replicate the observed task-dependent variability and that exploiting motor noise via stochastic optimal feedback control played a crucial role in the presence of target redundancy.
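The minimal intervention principle can be illustrated with a toy simulation in the spirit of the point-mass examples cited in this discussion. The sketch below is not the arm model of this study. It assumes a unit point mass, isotropic control noise whose standard deviation scales with the magnitude of the command, and feedback gains obtained from a standard finite-horizon LQR recursion with a terminal cost only. Those gains ignore the effect of multiplicative noise that a full stochastic optimal feedback controller would take into account. All parameter values and names are hypothetical.

```python
import numpy as np

DT, N_STEPS = 0.01, 50                      # hypothetical time step (s) and horizon
R_WEIGHT, W_POS, W_VEL = 1e-5, 100.0, 1.0   # hypothetical effort and terminal weights
NOISE_SCALE = 0.2                           # hypothetical signal-dependent noise gain

# Unit point mass: state = [px, py, vx, vy], control = force (fx, fy).
A = np.eye(4)
A[0, 2] = A[1, 3] = DT
B = np.zeros((4, 2))
B[2, 0] = B[3, 1] = DT
R = R_WEIGHT * np.eye(2)


def lqr_gains(q_final):
    """Backward Riccati recursion (terminal state cost, running effort cost)."""
    S, gains = q_final.copy(), []
    for _ in range(N_STEPS):
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = A.T @ S @ (A - B @ K)
        gains.append(K)
    return gains[::-1]                       # gains ordered from first to last step


def simulate(gains, rng, n_trials=300):
    """Final states of noisy closed-loop reaches starting 0.3 m from px = 0."""
    x0 = np.array([-0.3, 0.0, 0.0, 0.0])
    finals = np.empty((n_trials, 4))
    for i in range(n_trials):
        x = x0.copy()
        for K in gains:
            u = -K @ x
            noise = NOISE_SCALE * np.linalg.norm(u) * rng.standard_normal(2)
            x = A @ x + B @ (u + noise)
        finals[i] = x
    return finals


rng = np.random.default_rng(0)
# Vertical bar at px = 0: no terminal penalty on the vertical position py.
bar = simulate(lqr_gains(np.diag([W_POS, 0.0, W_VEL, W_VEL])), rng)
# Point target at the origin: both position coordinates are penalized.
point = simulate(lqr_gains(np.diag([W_POS, W_POS, W_VEL, W_VEL])), rng)

var_x_bar, var_y_bar = bar[:, 0].var(), bar[:, 1].var()
var_y_point = point[:, 1].var()
print("bar:   var x, var y =", var_x_bar, var_y_bar)
print("point: var y        =", var_y_point)
```

Because the bar cost does not penalize the vertical position, the corresponding feedback gain is zero and noise-induced drift along the bar is left uncorrected, whereas deviations orthogonal to the bar are corrected. Endpoint variance therefore accumulates along the task-irrelevant direction, which is the qualitative pattern described above.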

The effectiveness of modeling signal-dependent noise and exploiting principles from stochastic optimal control is in agreement with several other studies of motor control (e.g., Harris and Wolpert 1998). Indeed, the sensorimotor system is inherently noisy (Faisal et al. 2008) and the nervous system is likely to have found some reliable and near-optimal solutions to deal with uncertainty (e.g., Burdet et al. 2001). In particular, our results illustrate how the nervous system may ideally combine the freedom given by the target manifold with motor noise and correct errors only when they affect the goal of the task. Remarkably, this optimal feedback strategy was observed even in the absence of visual feedback, thus testifying to the ability of the CNS to estimate, through proprioceptive feedback, whether or not motor errors affect the goal. Similar observations have been previously reported during locomotor path planning (Pham and Hicheur 2009). Finally, it is worth noting that another source of variability could reflect the fact that the brain has no direct measures of energetic and smoothness costs. Here, we assumed full knowledge of the body states, whereas they are hidden in practice, so that these cost values can only be inferred by the CNS. Therefore, the valuable states whose existence was hypothesized above could only be reached on average and by using efficient predictive and sensorimotor processes.

It is likely that the effects of noise and uncertainty were increased by the target redundancy. Several authors previously suggested or demonstrated that adding degrees of freedom on the target could be insightful in better understanding motor control. However, manifold reaching has not been explicitly recognized as a part of a generic class of motor control paradigms. For example, target lines were already used in simulation to illustrate the minimal intervention principle for a nonbiological point-mass system (Todorov and Jordan 2002; Guigon et al. 2008) and to emphasize how variance analyses can be affected by the choice of coordinates (relative vs. absolute joint angles, Sternad et al. 2010). Scholz et al. (2000) used a target disc (i.e., a manifold with boundary) during pistol shooting to test the uncontrolled manifold hypothesis by analyzing the structure of the variance in joint coordinates. Goble et al. (2007) exploited circular targets during a drawing task by instructing subjects to trace straight lines in as many different directions as possible in the horizontal plane. Subjects showed biases to preferred directions corresponding to a tendency to minimize interaction torques. Using rectangular targets of Fitts' law type, another type of manifold with boundary, Knill et al. (2011) demonstrated that the CNS can quickly adapt its feedback control law for individual movements to fulfill task demands. In the same vein, Diedrichsen et al. (2010) studied reaching movements toward a horizontally elongated target to show that, besides error-based learning, a second adaptive process affects movements to make them more similar to the last movement (use-dependent learning). Alternatively, reaching to a cylindrical target using a wooden stick was studied by Vetter et al. (2002) to address the problem of final posture selection (transport model vs. Donders' law).
In this study, we also used a target manifold but with the objective of simultaneously addressing the “where” and “how” aspects of voluntary movements, which had traditionally been studied separately in computational studies. The present manifold reaching paradigm makes it possible to better understand the neurophysiological processes involved in the volitional control of actions (i.e., deciding what action to perform).

To conclude, it is worth noting that a link between the solutions of a particular yet general class of stochastic optimal control problems and the free-energy principle mentioned above has been established recently (see Kappen 2005, Todorov 2009, and Friston 2010). This may allow us to naturally embed the theory of stochastic optimal control in a more unified brain theory, possibly encompassing existing theories of motor control.

## DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

## ACKNOWLEDGMENTS

We thank Marco Jacono for technical assistance and Ioannis Delis, Christian Darlot, Olivier White, and Elizabeth Thomas for useful suggestions and comments.

## APPENDIX

### Details on the Two-link Arm and the Target Bar

The two-joint arm model is given by the following equations:
*M* = (*M*_{i,j})_{1≤i,j≤2}, similarly for *F* and *C*, and the vector **G** = (*G*_{i})_{1≤i≤2}. Then, we have:

In state-space form, we can rewrite the control system as:

The target bar was located at *x* = 0.85*L*. In state space, it corresponded to the vector-valued mapping **m** given by *Eq. 13*:

- Copyright © 2011 the American Physiological Society