Decoding motor behavior from neuronal signals has important implications for the development of a brain–machine interface (BMI) but also provides insights into the nature of different movement representations within cortical ensembles. Motor control can be hierarchically characterized as the selection and planning of discrete movement classes and/or postures followed by the execution of continuous limb trajectories. Based on simultaneous recordings in primary motor (MI) and dorsal premotor (PMd) cortices in behaving monkeys, we demonstrate that an MI ensemble can reconstruct hand or joint trajectory more accurately than an equally sized PMd ensemble. In contrast, PMd can more precisely predict the future occurrence of one of several discrete targets to be reached. This double dissociation suggests that a general-purpose BMI could take advantage of multiple cortical areas to control a wider variety of motor actions. These results also support the hierarchical view that MI ensembles are involved in lower-level movement execution, whereas PMd populations represent the early intention to move to visually presented targets.
Recent developments in brain–machine interface (BMI) technology have demonstrated how the activity of a cortical ensemble can be used to control the position of an external device in real time. These efforts have mainly focused on the continuous control of a moving object such as a computer cursor or the end effector of a robotic arm (Carmena et al. 2003; Kennedy et al. 2000; Kostov and Polak 2000; Serruya et al. 2002; Taylor et al. 2003, 2002; Wolpaw et al. 1991). This is clearly a useful function for a BMI, given that many forms of motor behavior such as writing, drawing, and reaching require precise trajectory and path control. There are, however, other motor acts that can be viewed as equivalence classes, which may require discrete control to be selected (Buneo et al. 2003; Pesaran et al. 2002; Shenoy et al. 2003). These include movement initiation and termination, typing, and discrete grasp and postural configurations. It would thus seem necessary that a general-purpose BMI would require both forms of control mechanisms. Moreover, different cortical areas may be ideally suited for these different forms of control. Based on dual electrophysiological recordings in primary motor (MI) and dorsal premotor (PMd) cortices, we demonstrate a double dissociation in which ensemble activity in MI more accurately reconstructs continuous movement, whereas PMd ensemble activity can more effectively predict upcoming movements to discrete targets.
Two macaque monkeys (Macaca mulatta) were operantly trained to perform an instructed-delay 8-direction center-out (CO) task and a random-sequence (RS) task by moving a cursor to targets by contralateral arm movements. Both tasks involved projecting the cursor and targets onto a horizontal, reflective surface in front of the monkey above the monkey's hand. The monkey's arm rested on cushioned arm troughs secured to links of a 2-joint robotic arm (KINARM system; Scott 1999) underneath the projection surface. The shoulder joint was abducted 90° such that shoulder and elbow flexion and extension movements were made in the horizontal plane.
The CO task involved movements from a center target to one of 8 peripherally positioned targets (5–7 cm distance). The task consisted of 3 epochs: 1) a 500-ms hold period during which the monkey was required to hold its hand over the center target, 2) a fixed instruction period (600 or 1,000 ms depending on the recording session) during which one of the 8 final targets appeared but the monkey was not allowed to move, and 3) a “go” period during which the target began blinking. This informed the monkey to begin moving to the peripherally positioned target.
In the RS task, a sequence of 7 targets appeared on the projection surface. At any one time, a single target appeared at a random location in the workspace, and the monkey was required to move to it (Fig. 1A). As soon as the cursor reached the target, the target disappeared and a new target appeared in a new, random location. After reaching the 7th target, the monkey was rewarded with a drop of water. During this task the workspace was sampled thoroughly, which was important in developing a hand position decoder that could generalize adequately (Fig. 1B).
Two silicon-based electrode arrays composed of 100 electrodes (1.0 mm electrode length; 400 μm interelectrode separation) were implanted in the arm areas of MI and PMd of each monkey (see Maynard et al. 1999 for more details concerning the electrode array) (Fig. 2A). During a recording session, signals from ≤128 electrodes were amplified (gain, 150) and recorded digitally (14-bit) at 30 kHz per channel using a Cerebus acquisition system (Cyberkinetics, Salt Lake City, UT). Only waveforms (1.6 ms in duration) that crossed a threshold were stored and spike-sorted using Off-line Sorter (Plexon, Dallas, TX). Interspike interval histograms were computed to verify single-unit isolation by ensuring that <0.05% of waveforms possessed an interspike interval <1.6 ms. Figure 2B shows average (±1 SD) waveforms from one representative data set. Figure 2C shows the distribution of signal-to-noise ratios for this data set. Signal-to-noise ratios were defined as the difference in mean peak-to-trough voltage divided by twice the SD. All isolated single units used in this study possessed signal-to-noise ratios (rounded to the nearest whole number) of 3:1 or higher. A total of 10 data sets (7 using the RS and 3 using the CO tasks) for animal B and 4 data sets (2 using the RS and 2 using the CO tasks) for animal R were analyzed, where a data set is defined as all simultaneously recoded neural data collected in one recording session. Each data set contained between 32 and 143 simultaneously recorded units from MI and PMd (Table 1). The data sets used to decode continuous hand position contained equal numbers of units from MI and PMd by selecting the units with the largest signal-to-noise ratios that made both ensembles of equal size. Thus both ensembles consisted of “randomly” selected units from MI and PMd except for a possible bias for neurons with large cell bodies that would generate higher signal-to-noise ratios. All of the surgical and behavioral procedures were approved by the University of Chicago's IACUC and conform to the principles outlined in the Guide for the Care and Use of Laboratory Animals (National Institutes of Health publication no. 86–23, revised 1985).
The shoulder and elbow joint angles were sampled at 500 Hz by the robotic arm's motor encoders and acquired along with the neural data. The x and y positions of the hand were computed by transforming the joint angles into Cartesian endpoint positions using the forward kinematics equations for a 2-joint arm with the measured lengths of the upper arm and forearm of the monkey.
Reconstructions of Cartesian x and y endpoint positions were computed from the neural data using linear and nonlinear regression methods. The endpoint position at time t (pt) was predicted from the neural responses (rt−in) of multiple neurons (n) at multiple time points (t−i) in the past. In the linear case, the optimal set of linear filter coefficients (fin) was found that minimized the sum squared difference between the predicted position (p̂t) and the actual position using a closed-form solution (Warland et al. 1997) (1) where N is the total number of simultaneously recorded neurons and L is the filter length in number of bins. The neural response of an individual neuron at one time point was computed as the number of spikes evoked in a time bin of length B. The 3 parameters, N, LT (filter length in time, LT = L*B), and B, were systematically varied in this study: N ranging from 1 to 44 neurons, LT ranging from 0.2 to 1.4 s, and B ranging from 40 to 200 ms. As an example, a filter length of 1 s with a bin of 50 ms resulted in a filter composed of 20 bins. The performance of the linear model on each data set was estimated using 10-fold cross-validation, wherein the data set was randomly partitioned into 10 “folds,” or equally sized disjoint subsets, and for each fold, a model was built leaving that fold out as testing data and using the remaining 9 folds as training data. Because the division into folds is random, the 10-fold cross validation procedure was repeated 10 times to better estimate the performance of the model.
In the nonlinear case, a 2-layer neural network was implemented using the Matlab Neural Network Toolbox. The input layer consisted of N × L input nodes representing the responses of multiple neurons at multiple time points. Sigmoidal activation functions were used between the input and hidden layer. Linear activation functions were used between the hidden and output layer, which consisted of one node representing the predicted position at time t, p̂t. The number of hidden nodes, the learning algorithm, and the learning rate were selected using cross-validation techniques on one data set and then were used in the networks built for 4 other data sets, 2 from each animal. Ten hidden units, a learning rate of 0.01, and the scaled conjugate gradient (gradient descent) learning algorithm were selected as network parameters. Each data set was randomly partitioned into a training set (80% of the data), a validation set (10% of the data), and a test set (the remaining 10%), and the weights of each network were then trained so that the sum squared difference between the actual and predicted positions for the training set was minimized (although a global minimum was not necessarily reached). To prevent overfitting, the sum squared error for the validation set was computed after each training epoch, and training was stopped when the value began to increase. The test set was then used to estimate the generalization performance of the trained network. Because both the initialization of network weights and the division into training, test, and validation sets are random, the training and testing procedure was performed 10 times for each network.
Parametric and nonparametric classifiers were developed to assign individual trial responses to one of the 8 target/direction classes in the CO task. In the parametric case, a probability distribution model conditioned on each target/direction class was estimated for the response of a single neuron. Both Poisson and Gaussian distributions were used but the Poisson model generally performed better than the Gaussian, and so results from the Gaussian model are not presented. For the Poisson model, a mean (μ) parameter was estimated based on the sample mean response. Assuming conditional independence among the neurons, a joint probability distribution for the combined responses over all neurons was computed by multiplying the single-neuron distributions for each direction. Conditional independence is an approximation. However, spike count correlation coefficients were generally small between pairs of neurons during a 400-ms window in the instruction period. Average pairwise correlation coefficients (averaged over all pairs and directions in a data set) ranged from 0.002 to 0.02 for all but one data set, which had slightly higher but still small values ranging from 0.05 to 0.11. A maximum-likelihood classification rule was used such that a trial response was assigned to the target/direction class that maximized the likelihood of observing the response. Accuracy of the classifiers was estimated using leave-one-out cross-validation, wherein the class of each trial was predicted from a classifier built using all other trials, and the percentage of trials that were classified correctly is then computed to estimate the model performance.
A nonparametric 2-layer neural network similar to that described above (with sigmoidal activation functions at the output layer instead of a linear activation function) was used to classify individual trial responses into one of 8 target/direction categories. The output layer of the network possessed 8 nodes, one for each category. The training data used for the output layer consisted of an 8-dimensional vector with a value of 1 for the correct category and −0 for the other categories. During testing, the trial response was assigned to the category whose node was maximally activated. Results using the neural network on one data set demonstrated a similar improved performance using PMd versus MI and thus are not reported.
We systematically varied the bin size B and the filter length LT, used to build the linear filter decoder to find the optimal set of parameters (Fig. 3). Although there were slight variations with bin size as well as variations depending on the decoded parameter and the cortical area used, we found that performance depended primarily on filter length. A filter length of 1 s was near optimal in terms of accounting for the largest percentage of the hand position variance and was used for all reported analyses.
We then compared the performance of a hand position decoder in reconstructing the x and y trajectories of the endpoint from two equally sized ensembles of MI and PMd neurons. A linear filter decoder reconstructed the current hand position using spike counts measured in 50-ms bins extending back 1 s in the past. The decoder was built using a random sample of 90% of successful trials and tested on the remaining 10%. Figure 4 shows the x and y trajectories for a particular 20-s test sample along with the neural reconstructions from the 2 ensembles. The MI ensemble reconstructs the trajectories more accurately, particularly for large deviations of the hand. To quantify this difference, we computed the mean (average over ten 10-fold cross-validation runs) percentage of total hand position variance accounted for (r2) on the test trials by each of the ensembles over 9 data sets (Fig. 5, A and B). MI ensembles (47 and 50% for x and y positions, respectively) consistently accounted for a larger percentage of the variance than did the PMd ensembles (difference of, 31% and 27% for x and y positions, respectively) with an average improvement of 50%, and 82% for x and y trajectories, respectively. The r2 using the MI ensemble was significantly higher than that of an equally sized PMd ensemble for all 9 data sets (P < 0.01, one-tailed t-test). A wide range of filter shapes and sizes associated with each neuron were observed. However, the absolute filter values averaged over all neurons indicate that the filter coefficients were generally weaker at time points further in the past (Fig. 5, A and B, insets).
Similar decoding results were observed using the nonlinear neural network on 4 of the data sets (Fig. 5C). A thorough analysis of nonlinear decoding algorithms (by searching all network parameters as well as by using parametric models) is beyond the scope of this study. Therefore we cannot conclude that the mapping between neural activity and hand position is essentially linear.
To examine the effect of ensemble size on decoding accuracy, we used a neuron-dropping procedure in which subsets of neurons were randomly sampled and used to build the decoder and cross-validated on 10% of the trials. Figure 6A (left and middle panels) show scatterplots of reconstruction performance values for one data set (measured in terms of r2) using N unique subsets for each ensemble size, where N is the total number of recorded neurons. We termed these subsets “unique subsets” because they were chosen so that no two subsets of size N were identical. That is, if AiK is the ith subset of K neurons, then at least one neuron that is a member of AiK is not a member of AjK, whenever i ≠ j. Hyperbolic functions [f(x) = ax/(1 + ax)] (Wessberg et al. 2000) were fit to the mean reconstruction performance (Fig. 6A, third panels) values to extrapolate to near-perfect performance (Fig. 6A, fourth panels). Based on all 9 data sets, we predict that an ensemble size of 256 ± 38 (mean ± SE) in MI could account for 90% of the x hand position variance as compared with 438 ± 32 neurons in PMd. For the y hand position, the difference was even larger, with 222 ± 53 neurons in MI versus 710 ± 168 neurons in PMd accounting for 90% of the variance.
Despite the monotonic improvement in mean reconstruction performance with ensemble size, the scatterplots in Fig. 6A indicate a highly uneven distribution of performance, suggesting that certain neurons or groups of neurons perform much better than others. For example, the 2 neurons surrounded by the oval in the lower left-hand panel accounted for over 20% of the y position variance individually despite the average performance of approximately 5% for the remaining individual neurons. These high-performing “outlier” neurons were observed in all data sets in both monkeys. The high-performing neuron ensembles that appear above the dashed line in Fig. 6A (lower left-hand panel) always contained one of both of the two “outlier” neurons in this data set.
Hyperbolic functions have been shown previously to fit the monotonically increasing performance with ensemble size (Wessberg et al. 2000). However, there were clear deviations from a hyperbolic fit for the one data set that contained many more units in both cortical areas (r031206). Figure 6B shows the decoding performance as a function of ensemble size for this data set. The best-fit hyperbolic functions (third panels in Fig. 6B) overestimated the actual performance of the both cortical areas when the ensemble size was large. The fourth panels show the best-fit logarithmic function to the mean performance. Using this function for this data set to extrapolate in MI suggests that 402 neurons (averaged over x and y) would be required to account for 90% of the hand position variance as opposed to 204 neurons based on the hyperbolic fit. In PMd, 970 neurons would be required as opposed to 331 neurons using the best-fit hyperbolic function.
As previous studies have shown, neurons in PMd exhibit robust activity related to movement planning. By aligning perievent histograms on the onset of the instruction signal in the instructed-delay CO task, we often observed transient “signal-related” activity lasting about 300–400 ms whose magnitude varied with target direction (Fig. 7A, top) (Weinrich and Wise 1982). This early instruction-related activity was generally either weakly present or entirely absent in MI (Fig. 7A, bottom). To quantify this difference in “signal-related” activity, Fig. 7B compares the firing rate modulation (i.e., percentage change relative to the baseline firing rate in the first 100 ms after the instruction signal onset) averaged over the ensemble of recorded PMd neurons (solid line) and simultaneously recorded MI neurons (dashed line) for one data set.
We compared the ability of equally sized MI and PMd ensembles to predict target/direction during the instruction delay period before movement onset. We developed a probabilistic model of the ensemble activity conditioned on each of the target directions and then used a maximum-likelihood rule to classify single trials of data into one of the 8 target/direction categories. Figure 8 plots the classification performance of both ensembles using spike counts measured in a sliding 200-ms window incremented in 10-ms steps. In contrast to continuous reconstruction, the PMd ensemble attained higher decoding performance compared to that of the MI ensemble. This was systematically examined over all data sets by using 20 random subsets of equally sized (20 neurons) ensembles of MI and PMd neurons and comparing their classification performance using a single 400-ms window of data beginning 100 ms after the onset of the instruction signal (Fig. 9A). The percentage of correctly classified trials using PMd was significantly higher than that of MI for all data sets (P < 0.01, one-tailed t-test). Classification performance was also compared using a single 1,000-ms window of data beginning at the instruction signal onset for the 4 data sets (2 data sets per animal) with a 1,000-ms instruction period (Fig. 9B). PMd ensembles continued to classify target/direction significantly better than equally sized MI ensembles, although the differential performance was not as strong.
This performance difference was also evident by examining the effect of ensemble size on classification (Fig. 10). Despite considerable scatter in performance over different unique subsets of neurons (Fig. 10, left and middle panels), the average performance increased monotonically as in the continuous decoding case. However, instead of a hyperbolic function the effect of ensemble size on classification performance was fit well with a power function (PCC = a × Nb, where PCC is the percentage of correctly classified trials and N is the number of neurons in the ensemble) (Fig. 10, right panel). The mean exponent b, from the best-fit power functions, was 0.23 for MI versus 0.30 for PMd. Extrapolation using this power relationship suggested that an ensemble of 418 ± 61 (mean ± SE) neurons in PMd would be necessary to attain 90% correct classification performance as compared with 32,858 ± 30,247 neurons in MI. The large SE for MI is attributable to an outlier data set (b1030314), which indicated that 153,799 neurons would be required to attain 90% correct classification performance. After removing this outlier data set, however, an ensemble size of 2,623 ± 1,074 in MI would be required to attain 90% performance as compared with 425 ± 69 in PMd.
We have provided evidence for a double dissociation in function between ensembles of MI and PMd neurons such that primary motor cortex activity is more suitable for reconstructing a dynamically varying endpoint, whereas premotor cortex is more effective in predicting discrete visual targets to be reached. Motor control can be characterized hierarchically as the selection and planning of discrete movement types or goals followed by the execution of dynamic, time-varying motor output. The results of this study are consistent with the prevailing view that PMd neurons represent the early selection of movements to visual targets, whereas MI neurons encode lower-level aspects of movement execution. These previous studies, however, have focused on the encoding problem characterizing how motor behavior affects the firing properties of single neurons. The current study extends this view by examining the inverse problem of decoding motor behavior from single trials of simultaneously recorded ensembles in MI and PMd. This is the perspective the nervous system is faced with when selecting motor actions and controlling the peripheral motor apparatus. Schwartz and colleagues have recently observed a double dissociation in function between MI and ventral premotor cortex (PMv) that is similar in spirit to our findings (Schwartz et al. 2004). By changing the gain relating the motion of the arm to the visually observed cursor controlled by the arm, they observed that a population of MI neurons reconstructed the actual movement of the arm more accurately, whereas a population of PMv neurons more accurately reconstructed the visual trajectory of the cursor.
Our findings, however, are inconsistent with previous data from Nicolelis and colleagues (Wessberg et al. 2000) who indicated that PMd ensembles could reconstruct hand position more accurately than MI. The reason for this discrepancy is unclear. There are at least 2 possible differences in experimental protocol that may explain the inconsistency. First, they used a different species of monkey (owl monkey vs. rhesus macaque that we used) in their study. Second, the task they used to quantify the functional differences between MI and PMd was one-dimensional, in which the monkey moved a lever either to the left or right. Recently, however, the Nicolelis group published a study using rhesus macaques that performed a 2-D task very similar to our random-sequence task (Carmena et al. 2003). In that study, they observed that hand position reconstructions were more accurate using neuronal ensembles in MI versus PMd. Their results were similar to ours, indicating approximately 70% improvement using an MI ensemble as compared with an equally sized ensemble in PMd.
At first glance, the shapes of the optimal decoding filters generated in the random-sequence task for PMd neurons appear to be counterintuitive. Although there was considerable variation in linear filter shape across individual neurons in both MI and PMd, the absolute filter coefficient magnitude tended to decrease from the immediate past to the distant past for both areas (see insets in Fig. 5, A and B). It might have been predicted that the filter magnitudes associated with PMd neurons would be larger for time points in the distant past. The observed filter shapes suggest that PMd activity in the intermediate and distant past does not correlate well with the current kinematics of the hand that are being decoded in the random-sequence task but rather may carry information about the target to be reached as in the center-out task. This is consistent with previous data showing that PMd neuronal activity is generally not related to the on-line control of arm movements as MI is (Crammond and Kalaska 1996). We are currently investigating whether PMd activity may be used to predict the position of the current target in the RS task.
The effect of ensemble size on decoding performance demonstrated that less than half the number of neurons in MI would be required to attain ideal performance (i.e., accounting for 90% of the hand position variance) as compared with PMd in the continuous case, whereas a 6th the number of PMd neurons would be required to correctly classify 90% of trials during the preparatory period in the discrete case as compared with MI. These results are based on extrapolation using functions that fit the data. Extrapolation, however, should be viewed with extreme caution because the data may deviate from the best-fit functions (using smaller groups of neurons) as the ensemble sizes grow larger. This was made clear in Fig. 6B in which the hyperbolic function overestimated the continuous decoding performance for the one data set that contained many more neurons. Extrapolation can also amplify small differences, as is evident in the large variability in the predicted number of MI neurons required to attain 90% correct classification in the discrete CO task. In the future, we plan to examine these issues by recording from larger groups of neurons within each cortical area.
The early planning activity to visual targets observed in PMd is similar to findings by Andersen and colleagues who measured early movement “intentional” signals in an area called the parietal reach region (PRR) partially overlapping with the medial intraparietal area (MIP) of the posterior parietal cortex (Batista et al. 1999). This is, perhaps, not surprising given the direct anatomical connections between MIP and dorsal premotor cortex (Matelli and Luppino 2001). A similar probabilistic classifier was shown to accurately predict upcoming movement direction using spiking activity from serially recorded neurons within PRR (Shenoy et al. 2003). There may be differences, however, in the coordinate systems used in each area. Arm movements appear to be encoded in eye-centered coordinates in PRR (Batista et al. 1999). Although eye position appears to influence activity in PMd when monkeys are required to fixate (Boussaoud et al. 1998), the effects are quite small when monkeys are free to change their gaze (Cisek and Kalaska 2002).
Our results have implications for the development of a brain–machine interface. There are discrete control tasks such as selecting from a number of distinct buttons or attaining a particular postural arm or hand configuration that can take advantage of the early movement-selection properties of PMd ensembles. On the other hand, a BMI may also be faced with controlling continuously varying motor variables such as guiding a cursor on a screen, moving a robotic arm in space, or navigating an autonomous robot on the ground, which could be handled using single units from MI. A general-purpose BMI could benefit from the complementary nature of motor and premotor ensemble activity.
This work was funded by the Whitehall Foundation, the Brain Research Foundation, and National Institute of Neurological Disorders and Stroke Grant N01-NS-2-2345.
N. Hatsopoulos has stock ownership in a company, Cyberkinetics, Inc., that is commercializing neural prostheses for severely motor disabled people.
We thank Z. Haga and D. Paulsen for help with the surgical implantation of the arrays, training of monkeys, and data collection. We also thank M. Fellows and E. Gunderson for help with the surgical procedures.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Copyright © 2004 by the American Physiological Society