Journal of Neurophysiology

Learning of Sequences of Finger Movements and Timing: Frontal Lobe and Action-Oriented Representation

Katsuyuki Sakai, Narender Ramnani, Richard E. Passingham

Abstract

Motor sequence learning involves learning of a sequence of effectors with which to execute a series of movements and learning of a sequence of timings at which to execute the movements. In this study, we have segregated the neural correlates of the two learning mechanisms. Moreover, we have found an interaction between the two learning mechanisms in the frontal areas, which we claim as suggesting action-oriented coding in the frontal lobe. We used positron emission tomography and compared three learning conditions with a visuo-motor control condition. In two learning conditions, the subjects learned either a sequence of finger movements with random timing or a sequence of timing with random use of fingers. In the third condition the subjects learned to execute a sequence of specific finger movements at specific timing; we argue that it was only in this condition that the motor sequence was coded as an action-oriented representation. By looking for condition by session interactions (learning vs. control conditions over sessions), we have removed nonspecific time effects and identified areas that showed a learning-related increment of activation during learning. Learning of a finger sequence was associated with an increment of activation in the right intraparietal sulcus region and medial parietal cortex, whereas learning of a timing sequence was associated with an increment of activation in the lateral cerebellum, suggesting separate mechanisms for learning effector and temporal sequences. The left intraparietal sulcus region showed an increment of activation in learning of both finger and timing sequences, suggesting an overlap between the two learning mechanisms. We also found that the mid-dorsolateral prefrontal cortex, together with the medial and lateral premotor areas, became increasingly active when subjects learned a sequence that specified both fingers and timing, that is, when subjects were able to prepare specific motor action. 
These areas were not active when subjects learned a sequence that specified fingers or timing alone, that is, when subjects were still dependent on external stimuli as to the timing or fingers with which to execute the movements. Frontal areas may integrate the effector and temporal information of a motor sequence and implement an action-oriented representation so as to perform a motor sequence accurately and quickly. We also found that the mid-dorsolateral prefrontal cortex was distinguished from the ventrolateral prefrontal cortex and anterior fronto-polar cortex, which showed sustained activity throughout learning sessions and did not show either an increment or decrement of activation.

INTRODUCTION

Accurate and quick performance of motor action requires both effector information (which effector to use to perform the action) and temporal information (at which timing to perform the action). Thus in motor sequence learning, one needs to learn both a sequence of effectors and a sequence of timings. To date, studies on neural mechanisms of motor sequence learning have mostly focused on the learning of an effector sequence; subjects learned a sequence of finger movements while the timing of the movements was paced at a constant rate (Doyon et al. 2002; Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998; Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998). Only a few studies have examined the mechanisms for learning a sequence of timing or a rhythm (Ramnani and Passingham 2001; Sakai et al. 2000; Schubotz and von Cramon 2001). Because of the variable behavioral settings used to study learning of effector sequences and of timing sequences, it is hard to determine whether the two learning mechanisms are subserved by separate neural systems.

In the first experiment reported here, we investigated the learning mechanisms of a sequence of finger movements and of a sequence of timing in the same experimental setting. An important difference from previous studies (Doyon et al. 2002; Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998; Jenkins et al. 1994; Jueptner et al. 1997; Ramnani and Passingham 2001; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998) is that we used random timing when testing learning of a finger sequence and used random finger movements when testing learning of a timing sequence. Thus in this study, even when subjects learned a finger sequence, they did not know at which timing to execute the finger movements. When subjects learned a timing sequence, they did not know with which finger to execute the movements. In other words, the subjects were unable to prepare a specific motor action that was defined by specific effector and specific timing. The coding of a motor sequence remained at an abstract level. In contrast, previous studies used constant timing for learning of a finger sequence (Doyon et al. 2002; Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998; Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998) and a fixed finger movement for learning of a timing sequence (Ramnani and Passingham 2001). Therefore subjects became able to prepare a specific finger sequence at specific timing in both types of learning. The results could have been confounded by preparation of a motor sequence; it is hard to determine whether the observed brain activation is associated with acquisition of sequence information or an increased level of motor preparation. In fact, a similar set of brain areas including prefrontal and premotor areas has been shown to be active in learning of finger sequences (Doyon et al. 2002; Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998; Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998) and in learning of timing sequences (Ramnani and Passingham 2001; Sakai et al. 2000; Schubotz and von Cramon 2001). Our first experiment is free from the confound of motor preparation.

In the second experiment reported here, we tested a third learning condition where subjects learned a sequence of specific finger movements at specific timing. In this condition, the subjects were able to prepare a sequence of specific finger movements at specific timing. Here the motor sequence could be coded at an action-specific level as opposed to an abstract level in the other two learning conditions mentioned above.

The three learning conditions were compared with a control condition where subjects made random finger movements at random timing according to visual cues. Thus we tested four conditions where subjects learned or did not learn a sequence of effectors or a sequence of timing. This design not only allowed us to investigate differences in the learning mechanisms between sequences of effectors and sequences of timing, but also allowed us to assess the interaction term, that is, differences between the action and abstract levels of sequence coding.

We used positron emission tomography to measure learning-related changes in brain activation. By looking for the change of brain activation relative to the control condition, the nonspecific time effects could be removed. We discriminated areas that showed a learning-related increment of activation, a learning-related decrement of activation, and sustained activation during learning.

METHODS

Subjects

Twenty normal right-handed male subjects (ages 20–33 yr) participated in the imaging experiment (12 in experiment 1 and 8 in experiment 2). Another eight subjects (ages 19–25 yr) participated in behavioral pilot experiments. For all subjects, written informed consent was obtained prior to the study. The experimental procedure was approved by the ethics committee of the National Hospital for Neurology and Neurosurgery and the Institute of Neurology, London, and the Administration of Radiation Safety Advisory Committee, United Kingdom.

Behavioral task

The subjects performed sequential finger movements in response to visual stimuli. A stimulus was an array of two horizontal bars and an asterisk and was presented on a screen for 80 ms (Fig. 1 A). Eight arrays of stimuli, with an asterisk appearing at different positions, were presented in a rhythmic sequence. The intervals of the stimulus onsets consisted of four 625-, two 1,250-, and one 2,500-ms intervals. Subjects responded to each stimulus by pressing buttons using the index, middle, or ring finger of the right hand when an asterisk appeared on the left, center, or right, respectively (Fig. 1 A). We asked the subjects to respond to the visual stimuli as accurately and quickly as possible. An eight-stimulus sequence comprised a trial, and the sequences were given for 18 (experiment 1) or 12 trials (experiment 2) in one session. A visual cue, “start,” indicated the start of each trial of a sequence, and the trials were separated from each other by 2.5 s.
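The timing figures above can be checked with a little arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the 2.5-s separation between trials is simply added to the summed inter-onset intervals, an assumption that is consistent with the 180-s (18-trial) and 120-s (12-trial) session lengths reported for the two experiments.

```python
import random

# Inter-onset intervals (ms) within one eight-stimulus trial, as stated in the text:
# four of 625 ms, two of 1,250 ms, and one of 2,500 ms (seven intervals in total).
INTERVALS_MS = [625] * 4 + [1250] * 2 + [2500]
INTER_TRIAL_GAP_MS = 2500  # trials were separated by 2.5 s


def make_timing_sequence(rng):
    """Return one random ordering of the seven intervals (hypothetical generator)."""
    seq = INTERVALS_MS[:]
    rng.shuffle(seq)
    return seq


def trial_duration_ms(intervals):
    """Summed inter-onset intervals plus the gap before the next trial (assumption)."""
    return sum(intervals) + INTER_TRIAL_GAP_MS


def session_duration_s(n_trials):
    """Session length in seconds for a given number of trials."""
    return n_trials * trial_duration_ms(INTERVALS_MS) / 1000.0


if __name__ == "__main__":
    print(make_timing_sequence(random.Random(0)))
    print(session_duration_s(18), session_duration_s(12))
```

Under these assumptions each trial occupies 10 s, whichever order the seven intervals are presented in.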

Fig. 1.

A: behavioral task. Subjects saw an array of 2 horizontal bars and 1 asterisk on the screen and responded by pressing a button using the right index, middle, or ring finger to the asterisk shown on the left, middle, or right, respectively. Eight arrays of stimuli were presented in a rhythmic sequence, with the intervals of the stimulus onsets of 625, 1,250, or 2,500 ms. Sequence of stimuli was repeated for 12 or 18 trials in 1 session. By setting the positions of the asterisks in the 8 stimuli fixed or variable across trials, we created a situation where subjects learned or did not learn a finger sequence. Also by setting the timing of presentation of the 8 stimuli fixed or variable across trials, we created a situation where subjects learned or did not learn a timing sequence. B: task conditions. Four conditions (RANDOM, FINGER, TIMING, and COMBINED) were created in a 2 × 2 factorial design, where learning occurred or did not occur in effector domain (finger sequence) or temporal domain (timing sequence). In experiment 1, FINGER, TIMING, and RANDOM were tested for 4 sessions each. In experiment 2, COMBINED and RANDOM were tested for 6 sessions each.

We created four conditions (RANDOM, FINGER, TIMING, and COMBINED), where learning occurred or did not occur in terms of the order of finger movements or in terms of the timing of finger movements (Fig. 1 B). In the RANDOM condition, the order of the position of an asterisk and the timing of the presentation of the eight stimuli in a sequence were varied across trials. Thus the finger to be used and the timing to press buttons remained unpredictable, and no learning occurred. In the FINGER condition, the order of the position of the asterisks for the eight-stimulus sequence was fixed across trials, whereas the timing of stimulus presentation was varied. Thus subjects repeated the same sequence of finger movements with different timing for each trial. Learning occurred only in terms of the order of finger movements. In the TIMING condition, the timing of the presentation of the eight stimuli was fixed across trials, whereas the order of the positions of asterisks was varied between trials. Thus subjects repeated the same sequence of timing using different fingers for each trial. Learning occurred only in terms of the timing of movements. In the COMBINED condition, the position of the asterisks and the timing of the presentation of the eight stimuli were fixed across trials. Subjects repeated the same sequence of finger movements at the same timing. Learning occurred both in terms of the order of finger movements and timing of movements. Thus in the COMBINED condition, the subjects were able to execute specific finger movements at specific timing, i.e., to code a sequence at an action level, whereas in the FINGER and TIMING conditions, the subjects were not able to specify an action because either the timing or the finger remained unpredictable, i.e., the sequence could be coded only at an abstract level.
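The 2 × 2 design can be summarized as two flags per condition: whether the finger order is fixed across trials and whether the timing is fixed across trials. The Python sketch below is a hypothetical illustration of how such stimulus lists could be generated; the function names are ours, and the uniform random sampling of fingers and the shuffling of intervals are assumptions, because the randomization procedure is not specified in the text.

```python
import random

FINGERS = ["index", "middle", "ring"]
INTERVALS_MS = [625] * 4 + [1250] * 2 + [2500]

# Which domain is held fixed across trials in each condition (2 x 2 design).
CONDITIONS = {
    "RANDOM":   {"finger_fixed": False, "timing_fixed": False},
    "FINGER":   {"finger_fixed": True,  "timing_fixed": False},
    "TIMING":   {"finger_fixed": False, "timing_fixed": True},
    "COMBINED": {"finger_fixed": True,  "timing_fixed": True},
}


def random_fingers(rng):
    """Eight finger cues drawn uniformly at random (assumed sampling scheme)."""
    return [rng.choice(FINGERS) for _ in range(8)]


def random_timing(rng):
    """A random ordering of the seven inter-onset intervals (assumed scheme)."""
    seq = INTERVALS_MS[:]
    rng.shuffle(seq)
    return seq


def make_session(condition, n_trials, rng):
    """Return a list of (fingers, intervals) pairs, one per trial."""
    spec = CONDITIONS[condition]
    fixed_fingers = random_fingers(rng)
    fixed_timing = random_timing(rng)
    trials = []
    for _ in range(n_trials):
        fingers = fixed_fingers if spec["finger_fixed"] else random_fingers(rng)
        timing = fixed_timing if spec["timing_fixed"] else random_timing(rng)
        trials.append((list(fingers), list(timing)))
    return trials


if __name__ == "__main__":
    for name in CONDITIONS:
        print(name, make_session(name, 2, random.Random(1)))
```

Only in COMBINED is every trial identical in both domains, which is the property that allows a specific action to be prepared in advance.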

In the three learning conditions (FINGER, TIMING, and COMBINED), we gave explicit instructions to the subjects to learn a finger sequence and/or timing sequence and to make the best use of the knowledge to press buttons as quickly as possible. Thus the learning was explicit in all the three conditions, although implicit learning may have occurred in parallel.

Our aim was to investigate the difference in brain activation between the three learning conditions (FINGER, TIMING, and COMBINED) and the control condition (RANDOM). However, in a behavioral pilot experiment, we found that learning a FINGER sequence hindered learning a COMBINED sequence when the two learning sessions were tested 3 min apart, but this interference was not found if the two conditions were tested on different days. Comparing the within-day and between-day presentations, the three subjects tested showed an increase in the number of wrong button presses (+7%) and an increase in the reaction times (+44 ms) for the within-day presentation. We could not, therefore, test the FINGER and COMBINED conditions in the same positron emission tomography (PET) experiment. For the same reason, we could not test the TIMING and COMBINED conditions in the same PET experiment.

By contrast, learning of a FINGER sequence did not have adverse effects on learning of a TIMING sequence and vice versa. In another behavioral pilot experiment, we required five subjects to learn a FINGER sequence, and 3 min later, learn a TIMING sequence. On different days, we also required the same subjects to learn another FINGER sequence, and on the next day, learn another TIMING sequence. Comparing the within-day and between-day presentations, there was no significant difference in terms of accuracy and quickness of the performance: increase in the number of wrong button presses and in the reaction times was −1.2% and +5 ms, respectively, for the within-day presentation compared with between-day presentation. There was no significant difference in the performance when the same subjects were tested on a TIMING sequence first and then on a FINGER sequence: increase in the number of wrong button presses and in the reaction times was +1% and −10 ms, respectively.

Based on these behavioral pilot experiments, we concluded that the learning of FINGER and TIMING sequences did not interfere with each other and could be tested in the same PET experiment. However, learning of a COMBINED sequence could not be tested in the same experiment with learning of a FINGER or a TIMING sequence because of interference. We therefore conducted two PET experiments using different sets of subjects, and tested the RANDOM, FINGER, and TIMING conditions in experiment 1, and the RANDOM and COMBINED conditions in experiment 2.

Experiment 1

Twelve subjects were scanned while they performed one of three alternating conditions (RANDOM, FINGER, and TIMING), for four sessions each. The order of the three conditions was counter-balanced across subjects. Each session started with a visual cue indicating the name of the condition and lasted for 180 s (18 trials of a sequence). Before the scans, subjects practiced each of the three conditions for 3 min using sequences different from the ones used during the scans. After the scans (12 sessions), the subjects performed each of the three conditions for one more session without scans, using the same sequence as used during the scans. The subjects were also asked to reproduce the learned sequence without visual cues. Subjects reproduced the order of finger movements of the learned FINGER sequence at an arbitrary timing for 10 trials. Accuracy of the reproduced order was assessed by comparing it with the order of the original sequence and calculating the percentage of the correctly performed movements. Subjects also reproduced the timing of button presses of the learned TIMING sequence with the right index finger for 10 trials. The accuracy of the reproduced timing was assessed by comparing the intervals between the eight moves of the reproduced TIMING sequence with the inter-stimulus intervals of the original TIMING sequence. We performed a one-sample t-test on the difference between the reproduced intervals and the original intervals. The null hypothesis was that there is no difference between them.
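The one-sample t-test on interval differences can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name is hypothetical, the example numbers are invented for demonstration, and how differences were pooled across trials and subjects is not stated in the text.

```python
import math
import statistics


def one_sample_t(differences, mu0=0.0):
    """Return (t, degrees of freedom) for H0: mean(differences) == mu0.

    `differences` are reproduced-minus-original intervals in ms.
    Raises ZeroDivisionError if all differences are identical (SD of zero).
    """
    n = len(differences)
    mean = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Invented example: original and reproduced intervals for one trial (ms).
    original = [625, 1250, 625, 2500, 625, 1250, 625]
    reproduced = [640, 1230, 610, 2520, 630, 1245, 625]
    diffs = [r - o for r, o in zip(reproduced, original)]
    print(one_sample_t(diffs))
```

Converting the t value to a P value requires the t distribution with n − 1 degrees of freedom, which is omitted here to keep the sketch free of external dependencies.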

Experiment 2

Eight subjects were scanned while they performed one of two alternating conditions (RANDOM and COMBINED), for six sessions each. The order of the two conditions was counter-balanced across subjects. Each condition lasted for 120 s (12 trials of a sequence). Thus the total number of trials to learn a sequence (12 trials × 6 sessions) was equated to that in experiment 1 (18 trials × 4 sessions). The number of scans given for each condition (6 scans × 8 subjects) was also matched with that in experiment 1 (4 scans × 12 subjects), thus enabling us to compare the results across the two experiments. The subjects had prescan training and postscan sessions in the same way as in experiment 1.

Image data acquisition

The subjects lay supine in the scanner. Head movements were minimized by an adjustable helmet. PET images covering the whole brain were collected using an ECAT Exact HR+ PET scanner (CTI Siemens, Knoxville, TN) in a three-dimensional (3D) mode with the inter-detector collimating septa removed. Relative regional cerebral blood flow (rCBF) was measured by recording the regional distribution of cerebral radioactivity using H₂¹⁵O as tracer.

For each session, 9 mCi H₂¹⁵O in 3 ml saline solution was injected intravenously over 20 s at a rate of 10 ml/min through a cannula placed in the left cubital vein. Subjects started the behavioral task 30 s before each tracer administration. Each PET scan began with a 30-s background period acquired before the delivery of the tracer. Thereafter, emission data were acquired in one 90-s frame, beginning 5 s before the rising phase of the count curve. The interval between successive administrations of the radioactive tracer was 8 min. Correction for radiation attenuation was made by means of a transmission scan collected for the first emission scan. The corrected data were then reconstructed by 3D filtered back projection (Hanning filter, cutoff frequency of 0.5 cycles/pixel) and scatter correction. Sixty-three transverse planes (2.4-mm thickness) were obtained, each with a 128 × 128-pixel image matrix (size 2.1 mm), giving a resolution of 6 mm at full width half-maximum.

We also collected structural magnetic resonance images (MRIs) of all the subjects using a 2-tesla magnetic resonance scanner (Vision, Siemens, Erlangen, Germany) with a T1 MPRAGE sequence (repetition time, 9.5 s; echo time, 4 ms; inversion time, 600 ms; voxel size 1 × 1 × 1.5 mm, 108 axial slices).

Image data analysis

PREPROCESSING.

The data were analyzed with SPM99 (http://www.fil.ion.ucl.ac.uk/spm) in Matlab (Version 5; Mathworks, Sherborn, MA). The PET images were realigned with respect to the first scan of the time series to correct for head movement. Six parameters (3 translations and 3 rotations) were extracted from the rigid body transformation that minimized the difference between each image and the reference image. A mean rCBF image was also computed. Then, for each subject, the structural T1 image was coregistered to the mean rCBF image. The structural image was then spatially normalized into the system of reference of Talairach and Tournoux (1988) using as a template the averaged image from the Montreal Neurological Institute (MNI) series (Cocosco et al. 1997). Twelve linear parameters (translation, rotation, zoom, and shear) were estimated for correcting the position and size of the structural images with respect to the template image. Residual differences between each pair of images were corrected using nonlinear basis functions (Friston et al. 1995). The normalization parameters were then applied to the PET images. Finally, the PET images were subsampled to a voxel size of 2 mm and were smoothed with an isotropic Gaussian kernel of 12 mm full-width half-maximum, to conform to the multivariate Gaussian assumptions of the data on which SPM is based and also to reduce the variance due to individual anatomical variability.
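For readers unfamiliar with the smoothing parameter, a full-width half-maximum (FWHM) value relates to the Gaussian standard deviation by σ = FWHM / (2√(2 ln 2)). The Python sketch below applies this relation to the 12-mm kernel and 2-mm voxels reported above. It is an illustration only and not SPM's implementation; the function names and the truncation at three standard deviations are assumptions.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm, truncate=3.0):
    """Normalized 1-D Gaussian kernel sampled at voxel centers.

    An isotropic 3-D Gaussian is separable, so the same 1-D kernel can be
    applied along each of the three axes in turn.
    """
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    radius = int(math.ceil(truncate * sigma_vox))  # assumed truncation width
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(round(fwhm_to_sigma(12.0), 3))        # sigma in mm
    print(len(gaussian_kernel_1d(12.0, 2.0)))   # kernel length in voxels
```

With these values the standard deviation is about 5.1 mm, or roughly 2.5 voxels.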

STATISTICAL MODEL AND INFERENCE.

We compared each of the three learning conditions with the RANDOM condition: FINGER versus RANDOM and TIMING versus RANDOM in experiment 1 and COMBINED versus RANDOM in experiment 2.

We analyzed the data using a fixed-effects model that included covariates representing effects of interest and confounds. Separate covariates were given to the four learning sessions (separately for each of FINGER and TIMING conditions in experiment 1) or six learning sessions (COMBINED in experiment 2), in the order these sessions were performed. Another set of covariates was given to the four (experiment 1) or six (experiment 2) control RANDOM sessions, also in the order performed.

First, we used a statistical model that looked for a condition-by-session interaction, in which the activity in a learning condition was initially similar to the activity in the RANDOM condition but became increasingly higher over sessions than the activity in RANDOM (learning-related increment of activation). The covariates for the learning condition were given linearly increasing weightings according to their position in the time series, reflecting an increase of activity over time; correspondingly, the covariates for the RANDOM condition were given linearly decreasing weightings. Thus the model identifies voxels in which activity in the learning condition is similar to the activity in the RANDOM condition at the start of learning, but becomes significantly different from it in the final session of learning. Since learning and control conditions were given alternately and their order was counter-balanced across subjects, we were able to control for changes in the brain activation over time, which were not related to learning. As will be shown in RESULTS and Fig. 2, the decrement of reaction time was approximately linear during the learning sessions.
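The logic of this interaction contrast can be illustrated with a small Python sketch. The text specifies only that the weightings were linearly increasing for the learning condition and linearly decreasing for RANDOM; the particular mean-centered values used here (for example −3, −1, 1, 3 for four sessions), the function names, and the example session estimates are assumptions for illustration.

```python
def linear_weights(n_sessions):
    """Mean-centered, linearly increasing weights, e.g. [-3, -1, 1, 3]."""
    return [2 * i - (n_sessions - 1) for i in range(n_sessions)]


def increment_contrast(n_sessions):
    """Weights for [learning sessions..., RANDOM sessions...] in session order."""
    learning = linear_weights(n_sessions)
    control = [-w for w in learning]  # linearly decreasing for RANDOM
    return learning + control


def contrast_value(contrast, estimates):
    """Weighted sum of per-session effect estimates."""
    return sum(c * b for c, b in zip(contrast, estimates))


if __name__ == "__main__":
    c = increment_contrast(4)
    # Invented estimates: activity rises with learning, RANDOM stays flat.
    print(contrast_value(c, [50, 51, 52, 53, 50, 50, 50, 50]))
    # Invented estimates: the same drift in both conditions (pure time effect).
    print(contrast_value(c, [50, 51, 52, 53, 50, 51, 52, 53]))
```

A drift that is common to both conditions contributes equally and with opposite sign to the two halves of the contrast and therefore cancels, whereas a rise confined to the learning condition yields a positive value. Negating all weights gives the corresponding test for a learning-related decrement.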

Fig. 2.

Reaction times (A) and SD of reaction times (B). Data from experiment 1 (left) and experiment 2 (right). Mean and SE across subjects are shown for each session for each condition (FINGER, red; TIMING, green; COMBINED, blue; RANDOM, white). For each condition, sessions progressed from left to right (sessions 1–5 in experiment 1 and sessions 1–7 in experiment 2). The last postscan sessions are shown in light shading.

Second, we used a statistical model that looked for a condition-by-session interaction, in which the activity in a learning condition was initially similar to the activity in RANDOM condition but became increasingly smaller over sessions (learning-related decrement of activation). The covariates for the learning condition were given linearly decreasing weightings according to their position in the time series; correspondingly, the covariates for the RANDOM condition were given linearly increasing weightings.

Third, we used a statistical model that looked for a main effect of condition, in which the activity in a learning condition was higher than the activity in RANDOM condition. To identify sustained activations that showed no changes with learning, we excluded areas that showed significant condition-by-session interactions.

For all the models, confounding covariates were removed by modeling the differences in rCBF between subjects as effects of no interest. The effect of global differences in rCBF between scans was also removed in the following way. Global flow (gCBF) was measured as the mean rCBF over all intra-cerebral voxels. The activity at each voxel was then scaled such that the grand mean of all voxels was 50 ml/min/dl. Analysis of covariance was then used to model subject-specific gCBF values as a confounding covariate.
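The grand-mean scaling step can be sketched as below. This is a simplified illustration rather than the SPM99 procedure: each scan is represented as a flat list of intra-cerebral voxel values, the function names are hypothetical, and whether the scaling factor was computed per subject or over all scans is not stated in the text (a single factor over the supplied scans is assumed here). The subsequent analysis of covariance is not shown.

```python
def global_flow(scan):
    """gCBF: mean rCBF over the intra-cerebral voxels of one scan."""
    return sum(scan) / len(scan)


def grand_mean_scale(scans, target=50.0):
    """Scale all scans by one factor so that the grand mean equals `target`.

    Returns the scaled scans and the scaled per-scan gCBF values; the latter
    would then enter the model as a confounding covariate.
    """
    globals_ = [global_flow(s) for s in scans]
    grand_mean = sum(globals_) / len(globals_)
    factor = target / grand_mean
    scaled = [[v * factor for v in s] for s in scans]
    return scaled, [g * factor for g in globals_]


if __name__ == "__main__":
    # Invented two-voxel scans, for illustration only.
    scaled, gcbf = grand_mean_scale([[40.0, 60.0], [80.0, 120.0]])
    print(scaled, gcbf)
```

Because a single factor is applied, between-scan differences in global flow are preserved after scaling and remain available to be modeled as a covariate.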

For each comparison, a statistical parametric map of the t statistics, SPM{T}, was created using a general linear model with a threshold of P < 0.001, uncorrected. Only activities involving contiguous clusters of at least five voxels were reported.
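The combination of a height threshold and a cluster-extent criterion can be illustrated with the following Python sketch. It is not the SPM implementation. The t cutoff corresponding to P < 0.001 depends on the degrees of freedom and is therefore passed in as a parameter; the face-connectivity rule for contiguity, the dictionary representation of the map, and the function name are all assumptions.

```python
from collections import deque

# Six face-adjacent neighbors (assumed definition of "contiguous").
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surviving_clusters(t_map, t_threshold, min_voxels=5):
    """Return clusters of supra-threshold voxels with at least `min_voxels`.

    `t_map` maps (x, y, z) voxel indices to t values.
    """
    above = {v for v, t in t_map.items() if t > t_threshold}
    seen = set()
    clusters = []
    for start in sorted(above):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        while queue:  # breadth-first flood fill over supra-threshold voxels
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in above and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_voxels:
            clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    # Invented map: a 5-voxel line, an isolated pair, one sub-threshold voxel.
    demo = {(x, 0, 0): 4.0 for x in range(5)}
    demo.update({(5, 0, 0): 1.0, (10, 0, 0): 5.0, (11, 0, 0): 5.0})
    print(surviving_clusters(demo, t_threshold=3.0))
```

In the invented example only the five-voxel line survives; the two-voxel cluster exceeds the height threshold but fails the extent criterion.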

Because subjects were not treated as a random effect, it could be argued that the obtained results may only reflect the brain activation pattern of the subjects studied and cannot be generalized at a population level. However, the scan-to-scan variability within a PET session and the session-by-contrast interactions are about the same, and thus in PET, the difference between inferences based on fixed- and random-effects analyses is greatly attenuated (Friston et al. 1999).

RESULTS

Behavioral data

The mean reaction times (RTs) of the key presses to visual stimuli are shown in Fig. 2 A (left, experiment 1; right, experiment 2). The trials with incorrect button presses, which accounted for only 2.8% of the total, were excluded from the analysis. As shown in the figure, the RTs decreased from session 1 to session 4 (or session 6) (from left to right in the figure) in all three learning conditions (FINGER, red; TIMING, green; COMBINED, blue), as well as in the RANDOM condition (white). There was a significant condition-by-session interaction (P < 0.05), indicating a difference in the rate of decrease in RTs across conditions. The rate of decrease was significantly larger in FINGER and TIMING than in RANDOM (experiment 1; P < 0.05) and was significantly larger in COMBINED than in RANDOM (experiment 2; P < 0.05). In the last scan sessions, RTs were significantly shorter in COMBINED (268 ms) than in FINGER (292 ms) and TIMING (324 ms) (P < 0.05).

The RTs in the postscan sessions (shown in light shading in Fig. 2 A, session 5 in experiment 1 and session 7 in experiment 2) were not significantly different from the RTs in the last scan sessions (session 4 in experiment 1 and session 6 in experiment 2; P > 0.1). The shortening of RT had reached a plateau stage during the scanning sessions for the three learning conditions.

The across-trial variability in RTs also decreased concomitantly with learning. We calculated the SD of the RTs within each session. Figure 2 B shows its mean across subjects. There was a significant condition-by-session interaction (P < 0.05), and the SD of RTs decreased significantly more in the three learning conditions than in RANDOM (P < 0.05). Such a reduction in RT variability suggests an improvement of subjects' performance not only in terms of quickness but also in terms of stability of the performance. In the last scan sessions, the SD of RTs was significantly smaller in COMBINED (46 ms) than in FINGER (62 ms) and TIMING (70 ms) (P < 0.05).
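The two behavioral measures, mean RT and within-session SD of RT after exclusion of incorrect responses, can be computed as in the Python sketch below. The data layout, the function name, and the example values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption because the text does not specify which estimator was used.

```python
import statistics


def session_rt_summary(responses):
    """Return (mean RT, SD of RT) in ms for one session.

    `responses` is a list of (reaction_time_ms, correct) pairs; entries
    flagged as incorrect are excluded before averaging.
    """
    rts = [rt for rt, correct in responses if correct]
    return statistics.mean(rts), statistics.stdev(rts)


if __name__ == "__main__":
    # Invented responses for illustration; the last one is an error.
    demo = [(300, True), (320, True), (280, True), (900, False)]
    print(session_rt_summary(demo))
```

Per-session values obtained in this way would then be averaged across subjects to give the means and SEs plotted in Fig. 2.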

After the scanning sessions, our subjects were able to reproduce the learned sequence without visual cues. In FINGER, all the subjects correctly reproduced the order of fingers with 100% accuracy. In TIMING, the reproduced time intervals between the eight moves did not differ significantly from those of the presented sequence (P > 0.1). In COMBINED, subjects correctly reproduced the order of fingers (100% accuracy). The reproduced time intervals did not differ significantly from the inter-stimulus intervals of the original sequence (P > 0.1). These results suggest that the subjects had acquired explicit knowledge of the sequence during the scanning sessions.

Learning-related increment of activation

We first looked for a condition-by-session interaction, where the difference in brain activation between the learning and control conditions became increasingly larger as the learning sessions progressed. Figure 3 shows brain areas with significant learning-related increment of activation compared with RANDOM. Activations in FINGER, TIMING, and COMBINED are shown from top to bottom. The coordinates and the t values for the active areas are listed in Table 1.

Fig. 3.

Areas showing learning-related increment of activation compared with RANDOM. Statistical parametric maps indicating significant condition-by-session interaction (P < 0.001, uncorrected), in which activity in a learning condition was initially similar to the activity in RANDOM condition, but became increasingly higher over sessions. From top to bottom, active areas on the left and right hemisphere in FINGER, TIMING, and COMBINED conditions are shown.

Table 1.

Increment of activation

We found several brain areas that were specific to the learning of a finger sequence or to the learning of a timing sequence. The right intraparietal sulcus region (IPS; red circle in Fig. 3) and medial parietal cortex (precuneus) showed an increment of activity in FINGER but not in TIMING. In contrast, the lateral part of the right cerebellar hemisphere showed an increment of activity in TIMING but not in FINGER (green circle in Fig. 3). The right IPS, precuneus, and cerebellum all showed an increment of activity in COMBINED, where both finger sequence and timing sequence were learned.

The left panels of Fig. 4 show the activation in the right IPS on an axial section (A) and the activation in the cerebellum on a coronal section (B). Regions showing significant increment of activity in FINGER, TIMING, and COMBINED are shown in red, green, and blue, respectively. Activation in FINGER and COMBINED overlapped in the right IPS (shown in pink in Fig. 4 A), whereas activation in TIMING and COMBINED overlapped in the cerebellum (shown in cyan in Fig. 4 B). The adjusted rCBF data for the activation in the right IPS (coordinate: 50, −50, 54) and cerebellum (48, −64, −50) are shown in the center (experiment 1) and on the right (experiment 2) in Fig. 4, A and B. The rCBF in RANDOM, FINGER, TIMING, and COMBINED are colored in white, red, green, and blue, respectively. As the learning sessions progressed (from left to right in each panel), the rCBF in the right IPS gradually increased in FINGER and COMBINED but not in RANDOM and TIMING. The rCBF in the cerebellum decreased in RANDOM and FINGER but stayed constant in TIMING and COMBINED. Thus relative to RANDOM, the cerebellum in TIMING and COMBINED showed an increment of activation.

Fig. 4.

Segregation and overlap of areas showing learning-related increment of activation in learning effector and temporal sequences; A: right intraparietal sulcus region; B: cerebellum; C: left intraparietal sulcus region. Left: activation superimposed onto a section of the MNI template brain. Active areas in the 3 learning conditions are coded in red, green, and blue for FINGER, TIMING, and COMBINED conditions, respectively. The overlaps between FINGER and TIMING, FINGER and COMBINED, and TIMING and COMBINED are coded in yellow, magenta, and cyan, respectively. The overlaps between the 3 conditions are shown in white. The left side of the brain is shown on the left. Center and right: adjusted rCBF for the voxel indicated by cross hairs on the left. Their coordinates are (50, −50, 54; A), (48, −64, −50; B), and (−38, −64, 54; C). Bars in red, green, blue, and white indicate rCBF in FINGER, TIMING, COMBINED, and RANDOM, respectively. The sessions progressed from left to right.

In contrast to these activations that are specific to the effector or temporal domain, the left intraparietal sulcus region (IPS) showed an increment of activity in all of the three learning conditions (yellow circle in Fig. 3). In the left panel of Fig. 4 C, activation in FINGER, TIMING, and COMBINED overlapped in the left IPS (shown in white). As shown in the center and right panels, the rCBF in the left IPS (−38, −64, 54) decreased in RANDOM and increased in FINGER, TIMING, and COMBINED.

The prefrontal cortex and medial and lateral premotor areas showed a different pattern of activation. They showed an increment of activity only in COMBINED, but not in the other conditions (blue circle in Fig. 3). The activity in the prefrontal cortex, medial premotor area, and lateral premotor area is shown in Fig. 5, A–C. The activity in the prefrontal cortex was observed bilaterally within the middle frontal gyrus (DLPF) (Fig. 5 A, left), and this was confirmed from the structural images of each subject. This area was considered to be Brodmann's area 46 (Rajkowska and Goldman-Rakic 1995). The adjusted rCBF for the activation of DLPF (−38, 40, 20) did not change significantly across sessions in FINGER and TIMING compared with RANDOM (Fig. 5 A, center), but showed an increment over sessions in COMBINED (Fig. 5 A, right). The activity in the medial premotor area had its peak anterior to the anterior commissure and above the cingulate sulcus, and thus was considered to be the presupplementary motor area (Pre-SMA; Fig. 5 B). The activity in the lateral premotor area had its peak around the precentral sulcus at Z > 50, and thus was considered to be the dorsal premotor area (PMd; Fig. 5 C). As for the DLPF, the rCBF for the Pre-SMA (−10, 6, 56) and PMd (−22, −10, 58) showed a learning-related increment only in COMBINED.

Fig. 5.

Areas showing learning-related increment of activation specifically in learning a combined sequence. A: dorsolateral prefrontal cortex; B: presupplementary motor area; C: dorsal premotor cortex. Left: activation superimposed onto a section of the MNI template brain. Color codes are the same as in Fig. 4. Center and right: adjusted rCBF for the voxel indicated by cross hairs on the left. Their coordinates are (−38, 40, 20; A), (−10, 6, 56; B), and (−22, −10, 58; C).

Learning-related decrement of activation

We also looked for condition-by-session interactions in which brain activation in the learning conditions became progressively smaller compared with that in the control condition (coordinates shown in Table 2). We found that the medial temporal lobe area (MTL) and the inferior temporal lobe region (IT) on the right showed a learning-related decrement of activation. The left panels of Fig. 6 show activation in MTL (A) and IT (B). The decrement of activation is shown by the rCBF plots in the center and on the right in Fig. 6. The MTL showed a learning-related decrement of activation in FINGER and COMBINED (note overlap of red and blue regions in Fig. 6A, left). The peak of MTL activation (24, −22, −18) lay within the parahippocampal cortex. The area also showed a learning-related decrement of activation in TIMING at a lower threshold (P < 0.005). In contrast, IT (52, −34, −30) showed a decrement of activation only in COMBINED and not in FINGER or TIMING (Fig. 6B, center and right).
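The logic of such an interaction test can be made concrete with a linear contrast over sessions. The following is a minimal sketch under stated assumptions, not the statistical model used in this study: the function names, the number of sessions, and the rCBF values are hypothetical and chosen only for illustration. A negative contrast value corresponds to activation in a learning condition becoming progressively smaller relative to the control condition, and a drift that is common to both conditions cancels out.

```python
def linear_weights(n_sessions):
    """Mean-centred linear trend weights, e.g. [-1.5, -0.5, 0.5, 1.5] for 4 sessions."""
    centre = (n_sessions - 1) / 2
    return [s - centre for s in range(n_sessions)]


def interaction_contrast(learning, control):
    """Linear trend in the learning condition minus the linear trend in control.

    Both arguments are lists of session-wise values given in session order.
    """
    if len(learning) != len(control):
        raise ValueError("conditions must have the same number of sessions")
    weights = linear_weights(len(learning))
    trend_learning = sum(w * x for w, x in zip(weights, learning))
    trend_control = sum(w * x for w, x in zip(weights, control))
    return trend_learning - trend_control


# Hypothetical adjusted rCBF values (arbitrary units), NOT data from the study.
learning_rcbf = [62.0, 61.0, 60.0, 59.0]  # falls across sessions
control_rcbf = [60.0, 60.0, 60.0, 60.0]   # flat across sessions

decrement = interaction_contrast(learning_rcbf, control_rcbf)

# A drift shared by both conditions (a nonspecific time effect) gives zero.
shared_drift = interaction_contrast([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
```

With these illustrative numbers the contrast is negative for the falling learning condition and zero when both conditions drift by the same amount, which is why an interaction term of this kind separates learning-related change from nonspecific time effects.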

Table 2.

Decrement of activation

Fig. 6.

Areas showing learning-related decrement of activation. A: medial temporal lobe; B: inferior temporal gyrus. Left: activation superimposed onto a section of the MNI template brain. Color codes are the same as in Fig. 4. Center and right: adjusted rCBF for the voxel indicated by cross hairs on the left. Their coordinates are (24, −22, −18; A) and (52, −34, −30; B).

Sustained activation

We then looked for a main effect of learning without condition-by-session interactions. Table 3 shows coordinates of areas in which activations were significantly higher in the learning conditions than in RANDOM, but were constant across sessions. We found that the anterior fronto-polar region (APF; Fig. 7A) and the ventrolateral prefrontal cortex (VLPF; Fig. 7B) showed such sustained activation.

Table 3.

Sustained activation

Fig. 7.

Areas showing sustained activation. A: anterior fronto-polar cortex; B: ventrolateral prefrontal cortex. Left: activation superimposed onto a section of the MNI template brain. Color codes are the same as in Fig. 4. Center and right: adjusted rCBF for the voxel indicated by cross hairs on the left. Their coordinates are (−30, 60, 4; A) and (−48, 22, 12; B).

For the APF, the overlap between the three conditions (colored in white in Fig. 7A, left) lay around the inferior frontal sulcus, extending anterior to the sulcus but not extending to the orbital surface. The area was well anterior and inferior to area 46 (Rajkowska and Goldman-Rakic 1995), and thus was considered to be area 10. The adjusted rCBF for APF (−30, 60, 4) is plotted in the center and right panels of Fig. 7A. As shown, the APF was more active in all three learning conditions than in RANDOM, and the activity did not show a linear trend across learning sessions.

In contrast, the left VLPF showed sustained activation only in COMBINED (Fig. 7B). The peak lay in the pars triangularis of the inferior frontal gyrus, anterior to the ascending branch of the lateral fissure. The area was considered to be area 45 (Amunts et al. 1999). The rCBF at this peak (−48, 22, 12) showed a sustained increase of activation only in COMBINED (Fig. 7B, center and right).
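The distinction drawn in these results between learning-related change (a condition-by-session interaction) and sustained activation (a main effect of learning without an interaction) can be illustrated with a toy classifier. This is only a sketch under stated assumptions: the function names, the tolerances, and the example values are hypothetical, and simple tolerances stand in for the statistical thresholds that a real analysis would use.

```python
def mean(values):
    return sum(values) / len(values)


def slope(values):
    """Least-squares slope of values against session index 0, 1, 2, ...

    Requires at least two sessions.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def describe_pattern(learning, control, level_tol=1.0, slope_tol=0.2):
    """Label a learning-versus-control time course.

    level_tol and slope_tol are arbitrary illustrative tolerances,
    not statistical thresholds.
    """
    level_diff = mean(learning) - mean(control)
    slope_diff = slope(learning) - slope(control)
    if slope_diff > slope_tol:
        return "learning-related increment"
    if slope_diff < -slope_tol:
        return "learning-related decrement"
    if level_diff > level_tol:
        return "sustained activation"
    return "no learning effect"


# Hypothetical adjusted rCBF values (arbitrary units), NOT data from the study.
flat_control = [60.0, 60.0, 60.0, 60.0]
sustained = describe_pattern([64.0, 64.0, 64.0, 64.0], flat_control)
increment = describe_pattern([60.0, 61.0, 62.0, 63.0], flat_control)
# A flat learning condition against a falling control is an increment
# relative to control, even though the learning values do not rise.
relative = describe_pattern(flat_control, [60.0, 59.0, 58.0, 57.0])
```

In this sketch the slope difference is checked first, so a time course is labeled as sustained only when the learning condition is higher than control and the two conditions change at a similar rate across sessions.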

DISCUSSION

Our results show partial segregation of the neural mechanisms in learning of finger and timing sequences. Moreover, the results suggest that the two learning mechanisms interact in the frontal lobe. The prefrontal and premotor areas were active only when subjects learned both. Based on the results, we argue for the role of the frontal lobe in integrating the effector and timing information and implementing action-oriented representations.

Separate and overlapping mechanisms for learning effector and temporal sequences

We found that the learning of a finger sequence was associated with an increment of activity in the right intraparietal sulcus region (IPS) and medial parietal area (precuneus), whereas learning of a timing sequence was associated with an increment of activity in the lateral part of the cerebellum. The result suggests distinctive learning mechanisms for effector and temporal information.

Activation in the right IPS and precuneus in effector sequence learning has been shown by a number of previous imaging studies (Doyon et al. 2002; Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998; Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998). The IPS, especially on the right side, is thought to process spatial information (Corbetta et al. 2000; Culham and Kanwisher 2001). In our task, the relationship between the fingers and buttons was spatially compatible and a finger sequence may have been learned as a spatial sequence. The precuneus may also be related to spatial components of sequence processing (Sadato et al. 1996). Alternatively, the precuneus may be involved in visual imagery processes (Fletcher et al. 1996) in anticipation of an upcoming stimulus, which developed during learning. When a spatial component was removed by using a color-motor association or auditory-motor association, there was no activation in the right IPS and precuneus (Hazeltine et al. 1997; Sakai et al. 2000).

Cerebellar activation in learning of a timing sequence is consistent with a previous study by Ramnani and Passingham (2001), which has shown an increment of cerebellar activation concomitant with the progress of rhythm learning. The cerebellum has been regarded as one of the key structures for timing processing (Ivry and Keele 1989; Sakai et al. 1998b). Interestingly, the increment of cerebellar activation was relative rather than absolute. The area showed sustained activation during learning, whereas its activation decreased over time in the control condition. A similar finding was also observed in Ramnani and Passingham (2001). We consider the decrement of cerebellar activation in the control condition as a nonspecific time effect and the sustained activation in TIMING as reflecting learning-related activation. Another possibility is that in RANDOM and FINGER the brain initially tried to detect and learn a temporal pattern although it was impossible to learn. As a sequence was repeated, the cerebellar learning mechanisms were disengaged because there was no fixed temporal pattern. In TIMING, the cerebellar learning mechanisms kept operating throughout the sessions. It remains undetermined whether the cerebellar activation in this study reflects the representations of the learned timing or learning processes per se. Recently, Penhune and Doyon (2002) have shown a decrease of cerebellar activation when a timing sequence was learned extensively over 5 days. As they have suggested, the cerebellar activation in the present study may reflect an early phase of the timing learning mechanism, in which adjustment of movement timing to visual cues is thought to occur.

In COMBINED, where learning occurred in terms of both finger and timing sequences, the right IPS, precuneus, and the cerebellum were active, consistent with the idea that the right IPS and precuneus were involved in learning of a finger sequence and that the cerebellum was involved in learning of a timing sequence. The activations in the two sets of brain areas were simply additive when both types of sequence were learned.

We also found that the learning of an effector sequence and the learning of a temporal sequence shared an increment of activation in the left IPS, suggesting an overlap between the two learning mechanisms. The finding is consistent with Sakai et al. (2000), which showed common involvement of the left IPS in both effector selection and timing adjustment in a choice reaction time task. In fact, this area has been shown to be active in finger sequence learning (Doyon et al. 2002; Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998; Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998) and also in timing sequence learning (Ramnani and Passingham 2001; Sakai et al. 2000; Schubotz and von Cramon 2001). As suggested by Ramnani and Passingham (2001), this parietal region may play roles in sensori-motor mapping of temporal relations (converting timing cues to timed finger movements) as well as spatial relations (converting spatial cues to spatially compatible finger movements).
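The pattern of segregation and overlap reported here can be summarized with simple set operations. The sketch below is a deliberate simplification: the area labels follow the text, but the variable names are hypothetical, and the sets ignore statistical thresholds, peak coordinates, and extent of activation. It lists only areas described as showing a learning-related increment in each condition.

```python
# Areas described in the text as showing a learning-related increment.
# Labels are shorthand for illustration, not anatomical definitions.
FINGER = {"right IPS", "precuneus", "left IPS"}
TIMING = {"lateral cerebellum", "left IPS"}
COMBINED = {
    "right IPS", "precuneus", "lateral cerebellum", "left IPS",
    "DLPF", "Pre-SMA", "PMd",
}

shared = FINGER & TIMING              # overlap of the two learning mechanisms
finger_specific = FINGER - TIMING     # effector-domain areas
timing_specific = TIMING - FINGER     # temporal-domain areas
additive = FINGER | TIMING            # what simple addition would predict
combined_only = COMBINED - additive   # areas beyond simple addition

print(sorted(shared))
print(sorted(combined_only))
```

Viewed this way, the posterior areas in COMBINED are the union of those found in FINGER and TIMING, while the frontal areas appear only when both kinds of information are learned together.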

Frontal areas were not active in learning at an abstract level

Surprisingly, we did not find significant activation in the frontal lobe when subjects learned either a sequence of fingers (FINGER) or a sequence of timing (TIMING) alone. This is in marked contrast to the previous studies that showed activation in prefrontal and premotor areas for both learning of the order of a finger sequence (Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998) and learning of the timing of movements (Ramnani and Passingham 2001). In those studies, effector learning was tested while timing of movements was set constant, and timing learning was tested while the order of the fingers was set constant. Under these conditions, as subjects learned a sequence, they knew which finger to use and at which timing to execute the movements. Thus subjects were able to prepare a specific action sequence. In contrast, our experiment 1 used random timing for learning of a finger sequence and random order of fingers for learning of a timing sequence. Under these conditions, subjects did not know at which timing to execute a FINGER sequence or with which finger to execute a TIMING sequence. Subjects were therefore not able to prepare a specific action sequence; they learned the sequence only at an abstract level. We thus suggest that the frontal lobe takes part when subjects can prepare for a specific action sequence, in other words, when a sequence is learned at an action-oriented level rather than at an abstract level.

The idea is consistent with the finding in the previous studies that used a simple reaction time task (Deiber et al. 1991, 1996; Jahanshahi et al. 1995; Jenkins et al. 2000). The prefrontal cortex was not active either when the finger to be used was indicated by unpredictable cues (Deiber et al. 1991, 1996; Jahanshahi et al. 1995) or when the timing of finger movements was externally triggered by unpredictable cues (Jenkins et al. 2000). Instead, the prefrontal cortex was active when the finger to be used was chosen by the subjects and the timing of finger movements was predictable (Deiber et al. 1991, 1996; Jahanshahi et al. 1995) or when the timing was chosen by the subjects and the same finger was used (Jenkins et al. 2000). Thus the critical determinant of prefrontal activation is whether the subjects knew which finger to move and when to make the finger movement. This idea was supported by the significant activation in frontal areas when subjects learned a finger sequence and a timing sequence (COMBINED).

Interaction between effector and temporal sequence learning

We found that the mid-part of the DLPF, as well as the PMd and Pre-SMA, was specifically active in COMBINED. As discussed above, only in this condition were the subjects able to prepare a specific action in terms of effector and timing. The increment of activity in prefrontal and premotor areas may reflect an increased level of preparatory activity in these regions. Both the PMd and Pre-SMA show motor preparatory activity in monkeys and humans (Kurata 1993; Matsuzaka et al. 1992; Weilke et al. 2001). Other reports also suggest that the DLPF is involved in the active preparation for forthcoming action (Pochon et al. 2001; Quintana and Fuster 1999). There are massive projections from DLPF to the PMd and Pre-SMA (Bates and Goldman-Rakic 1993; Lu et al. 1994), and they may transmit action control signals to enable quick and accurate movements.

The prefrontal cortex is also active when subjects imagine or mentally simulate movements (Deiber et al. 1998; Gerardin et al. 2000). Such activation is thought to reflect active maintenance of action representations. The increment of the frontal activation observed in this study can also be taken to reflect the development of action representations during learning.

Because we used different subjects for experiments 1 and 2, it could be argued that the frontal activation observed in COMBINED may only reflect the difference between the subject groups. However, unlike fMRI studies, the between-subjects variability in rCBF in PET studies is similar to the within-subject, inter-session variability in rCBF (Friston et al. 1999). In addition, as shown in Fig. 5, the time course of prefrontal activation in RANDOM was similar between experiments 1 and 2. We believe that the difference between the subject groups accounted little for the different pattern of frontal lobe activation. There could be a confound due to the difference in the experimental context; subjects in experiment 1 learned two sequences, FINGER and TIMING, whereas subjects in experiment 2 learned only one sequence, COMBINED. Thus in experiment 1, there may be interference effects from learning one sequence over learning of the other. However, as far as the behavioral data were concerned, there was no interference between learning of a finger sequence and learning of a timing sequence. We believe that the contextual difference between experiments 1 and 2 had little effect on the learning processes. Interestingly, when FINGER and COMBINED were tested in the same session, we observed significant interference effects. This is probably because the two sequences used different sequences of finger movements. Thus it seems that interference occurs when the two sequences contain information of the same domain (effector or temporal information), but interference does not occur when the two sequences contain information of different domains.

The activity in frontal areas cannot be explained in terms of task difficulty because the activity increased as RTs shortened. It is also unlikely that the frontal activation is associated with quick performance per se, because the activity did not increase in FINGER and TIMING where the RTs decreased with learning (see Fig. 5). There could be a difference in the progress of learning stages between FINGER, TIMING, and COMBINED; in COMBINED learning may have occurred faster than in FINGER and TIMING. However, the shortening of reaction times reached a plateau in the final session in FINGER and TIMING as in COMBINED. As far as the reaction times were concerned, the learning stage seems to have reached an advanced level during the scanning session similarly in all three conditions. We do not think that the difference in RT at the final session of learning between COMBINED and the other two learning conditions is due to the difference in the learning stages. Instead, the difference in RT reflects the difference in the level of sequence coding; a sequence coded at an action-oriented level (in COMBINED) can be performed more quickly than a sequence coded at an abstract level (in FINGER and TIMING) because subjects can prepare a specific finger movement at a specific timing.

The DLPF and frontal premotor areas may play roles in integrating the information in effector (finger) and temporal (timing) domains. Our results suggest that the initial processing in the two domains may be carried out independently: the right IPS and precuneus for the effector information and the cerebellum for the temporal information. The DLPF may receive and integrate the two kinds of information from these domain-specific posterior areas (Cavada and Goldman-Rakic 1989; Middleton and Strick 2001; Petrides and Pandya 1984) and construct action-oriented representations for a motor sequence. By “action-oriented representations,” we do not intend to imply that the DLPF is involved in coding of specific movements. There is other evidence that information is integrated in the DLPF: Prabhakaran et al. (2000) have shown that there is more activation in the dorsolateral prefrontal cortex when letters and positions have to be integrated rather than being remembered separately.

Time course of prefrontal activation

The DLPF was less active at the beginning of learning, but became more active as the subjects learned the sequence. The direction of the time course of prefrontal activation, however, differed in different studies. Some studies on motor sequence learning have shown an increment of activation in DLPF as in the present study (Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998), whereas other studies have shown a decrement of activation in similar regions (Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998). This difference results from the difference in the strategies to learn a motor sequence. A learning-related increment of prefrontal activation was observed when subjects initially responded to externally presented stimuli, but later gained explicit knowledge of a motor sequence and became able to prepare for the next movements (Grafton et al. 1995; Hazeltine et al. 1997; Honda et al. 1998). In contrast, a decrement of prefrontal activation was observed when subjects initially learned a sequence by trial and error, and later the task performance became automatic (Jenkins et al. 1994; Jueptner et al. 1997; Sakai et al. 1998a; Toni and Passingham 1999; Toni et al. 1998). In the present study, the task was not learned by trial and error. Our subjects were required to maintain attention throughout learning sessions so as to respond as accurately and quickly as possible. It has been shown that attention to the performance of a well-learned sequence results in reactivation of the DLPF (Jueptner et al. 1997).

DLPF differs from APF and VLPF

The DLPF was distinguished from the anterior fronto-polar region (APF), in which activity was found both at an action level of sequence learning (COMBINED) and at an abstract level of sequence learning (FINGER and TIMING). This area also differed from the DLPF in that it was active throughout the learning sessions. APF may play a role in forward planning of sequence performance, which is necessary for performance of an action sequence as well as performance of an abstract sequence. A similar region was active in the Tower of London task (Baker et al. 1996), as well as in a range of prospective memory tasks (Burgess et al. 2000). Especially in sequence learning, it is essential that subjects hold in mind the entire sequence information (a main goal) while processing individual button presses (concurrent subgoals). APF activation in this study may reflect such branching processes (Koechlin et al. 1999) or the set-up of multiple processing nodes during sequence performance (Tanji and Hoshi 2001). Alternatively, the APF activation may reflect episodic retrieval of the learned sequence representations regardless of whether they are abstract or action-oriented representations (see Buckner and Koutstaal 1998).

There was also a sustained activation in the VLPF, but it differed from the APF in that it was active only in COMBINED, suggesting its specific involvement in an action-oriented representation. Passingham et al. (2000) argued that the VLPF represents the association between visual cues and action. Its sustained activation in this study might reflect active maintenance of over-learned visuo-motor association to enable quick and accurate action performance. This region showed a sustained activation across sessions when subjects performed an overlearned visuo-motor association task (Passingham et al. 2000).

A similar regional dissociation between an increment of activation and sustained activation was observed in the lateral premotor cortex: The more dorsal part of the premotor cortex (Z = 58 on the left and 60 on the right hemisphere) showed an increment of activation in COMBINED, whereas the more ventral part of the premotor cortex (Z = 46) showed sustained activation in COMBINED. This may reflect their separate inputs from the DLPF and VLPF (Lu et al. 1994; Pandya and Yeterian 1996; Preuss and Goldman-Rakic 1989).

Learning-related decrement of activation

We found that the MTL showed a decrement of activation in FINGER and COMBINED relative to RANDOM. In FINGER and COMBINED, the eight patterns of visual stimuli were presented in the same order for every trial. As subjects learned the sequence, they were able to predict the next stimulus pattern, i.e., the arrangement of one asterisk and two horizontal bars. In contrast, the stimulus pattern was unpredictable for every trial of RANDOM. The decrement of MTL activation may reflect an increase in the predictability of the stimulus pattern. The decrement of MTL activation was also found in TIMING at a lower threshold, which may suggest an involvement of this area in the prediction of timing (N. Ramnani and R. E. Passingham, unpublished observation). The decrement of MTL activation may also reflect less demanding encoding processes or more efficient retrieval processes concomitant with the learning.

We also found a decrement of activation in the IT, but only in COMBINED. IT has been shown to respond to perception and memory of visual patterns (Miyashita 1993). Only in COMBINED did stimulus occurrence become predictable both in terms of the stimulus pattern and timing of stimulus presentation. IT may be sensitive to the predictability of both visual pattern and timing. Ramnani and Passingham (2001) also found a decrement of activation in a similar region. In their study, the timing of stimulus presentation was learned and the sequence of stimulus pattern had been overlearned.

Action-oriented representation in the frontal lobe

An action-oriented representation of a motor sequence requires both effector information (with which finger to execute the motor sequence) and temporal information (at which timing to execute the motor sequence). We have only an abstract representation of the motor sequence when we learn just one of them. Our results suggest that the frontal lobe is specifically involved in action-oriented representations, whereas the posterior areas (parietal and cerebellar) are involved in abstract representations. This is consistent with the idea that the frontal areas select and implement specific motor action while the parietal areas maintain the spatial representation of the potential action (Kalaska et al. 1997; Scott et al. 1997). Furthermore, we have added the new finding that the frontal action representation also requires the timing information with which the motor action is carried out. An action-oriented representation may be an integrative product of spatial/effector and temporal information. It remains open how an abstract representation for finger order sequence and an abstract representation for temporal sequence are transformed into an action-oriented representation. We predict that dynamic interactions between the frontal lobe and posterior regions are the key to achieving this transformation.

Acknowledgments

This study was supported by the Wellcome Trust. K. Sakai was supported by the Human Frontier Science Program.

Footnotes

  • Address for reprint requests: K. Sakai, Wellcome Department of Imaging Neuroscience, Institute of Neurology, 12 Queen Square, London WC1N 3BG, UK (E-mail: ksakai{at}fil.ion.ucl.ac.uk).

REFERENCES
