The Journal of Neurophysiology Vol. 80 No. 2 August 1998, pp. 964-977
Copyright ©1998 by the American Physiological Society
Institute of Physiology, University of Fribourg, CH-1700 Fribourg, Switzerland
ABSTRACT
Tremblay, Léon, Jeffrey R. Hollerman, and Wolfram Schultz. Modifications of reward expectation-related neuronal activity during learning in primate striatum. J. Neurophysiol. 80: 964-977, 1998. This study investigated neuronal activity in the anterior striatum while monkeys repeatedly learned to associate new instruction stimuli with known behavioral reactions and reinforcers. In a delayed go-nogo task with several trial types, an initial picture instructed the animal to execute or withhold a reaching movement and to expect a liquid reward or not. During learning, new instruction pictures were presented, and animals guessed and performed one of the trial types according to a trial-and-error strategy. Learning of a large number of pictures resulted in a learning set in which learning took place in a few trials and correct performance exceeded 80% in the first 60-90 trials. About 200 task-related striatal neurons studied in both familiar and learning conditions showed three forms of changes during learning. Activations related to the preparation and execution of behavioral reactions and the expectation of reward were maintained in many neurons but occurred in inappropriate trial types when behavioral errors were made. The activations became appropriate for individual trial types when the animals' behavior adapted to the new task contingencies. In particular, reward expectation-related activations occurred initially in both rewarded and unrewarded movement trials and became subsequently restricted to rewarded trials. These changes occurred in parallel with the visible adaptation of reward expectations by the animals. The second learning change consisted in decreases of task-related activations that were either restricted to the initial trials of new learning problems or persisted during the subsequent consolidation phase. They probably reflected reductions in the expectation and preparation of upcoming task events, including reward. 
The third learning change consisted in transient or sustained increases of activations. These might reflect the increased attention accompanying learning and serve to induce synaptic changes underlying the behavioral adaptations. Both decreases and increases often induced changes in the trial selective occurrence of activations. In conclusion, neurons in anterior striatum showed changes related to adaptations or reductions of expectations in new task situations and displayed activations that might serve to induce structural changes during learning.
Several lines of evidence postulate a role of the basal ganglia in various forms of learning. Deficits in Parkinson and Huntington patients suggest an involvement in motor learning, habit formation, and procedural memory (Butters et al. 1985
The study was performed on the same two Macaca fascicularis monkeys (A and B) using the same experimental procedures with the same delayed go-nogo task as described in the preceding report (Hollerman et al. 1998
Learning behavior
The repeated learning of new stimuli within the same task structure resulted in a learning set in which each problem of three new stimuli was learned rapidly. At the onset of neuronal recordings, monkeys A and B had learned 275 and 78 problems, respectively. Thereafter, learning was relatively stable, with an overall >80% correct performance in blocks of 60-90 learning trials. Learning occurred largely within the first trials and approached an asymptote within 5-10 trials of each trial type, although animals occasionally made errors during the subsequent consolidation period (Fig. 2A; see also Figs. 13, 16, and 17). Learning curves remained stable during the course of experimentation in all three trial types (Fig. 2B), although reduced learning was occasionally observed (Fig. 2C). Medians of correct performance in the first 15 trials in each of the three trial types were, respectively, 87, 100, and 93% (monkey A) and 73, 80, and 93% (monkey B). Performance in familiar trials exceeded 95% throughout the period of neuronal recording.
Overview of neuronal changes
A total of 205 slowly discharging, task-related neurons were tested in both familiar and learning trials in the anterior striatum between the anterior commissure and 7 mm rostral to it. Their behavioral relationships were characterized in two respects. First, neuronal activations were event-related in that they temporally preceded or followed the instructions, the trigger, or the reinforcers. According to the preceding report (Hollerman et al. 1998
Adaptation of maintained neuronal activations
Activations were maintained in 150 of the 205 tested neurons (73%), with at least one event relationship in at least one trial type failing to show significant differences in magnitude between familiar and learning trials (Table 1). However, activations in only 39 neurons (19%) showed unchanged magnitudes in every event relationship in every trial type. Activations in 141 of the 205 tested neurons (69%) remained selective for the same trial types as with familiar performance. Activations reflected the type of trial actually performed by the animal, which was not necessarily the trial type indicated by the instruction. They occurred in inappropriate trial types when animals performed a different type of trial than indicated by the instruction. After a few learning trials, animals adapted their behavior to the type of trial indicated, and neuronal activations adapted accordingly. Maintained trial selectivities involved activations with unchanged, decreased, or increased magnitudes. Activations following the three task events showed mean latencies and durations of 112-380 ms and 250-2,400 ms, respectively, which did not vary significantly during learning.
RELATIONSHIPS TO REWARD.
As described in the preceding report, many of the task-related neuronal activations in the anterior striatum showed pronounced relationships to the reinforcers during familiar performance (Hollerman et al. 1998
RELATIONSHIP TO BEHAVIORAL REACTIONS.
Several task-related neuronal activations in anterior striatum differentiated between the execution and withholding of movement during familiar performance (Hollerman et al. 1998
RECORDING POSITIONS.
Neurons with maintained relationships during learning were distributed throughout the entire recording area in caudate nucleus (n = 71), putamen (n = 46), and ventral striatum (n = 24), without local preferences for any form of task relationship (Fig. 12).
Decreased neuronal activations
A total of 90 of the 205 tested neurons (44%) showed significantly decreased magnitudes of at least one event-related activation in at least one trial type. In 46 of these neurons, the decreases occurred transiently during initial learning trials until more consistent task performance was obtained. In 48 neurons, decreases were sustained, outlasting this period and often remaining present throughout the entire testing period of 60-90 trials (Table 1). Combinations of transient and sustained decreases occurred occasionally with different task events or trial types. Transient or sustained decreases resulted in the complete absence of activation in some trial types in 40 and 33 of these neurons, respectively. This entirely abolished the responsiveness of neurons activated in a single familiar trial type and increased the trial selectivity of activations in neurons activated in several familiar trial types. Neurons with decreased activations during learning were distributed without regional preference over the entire recording area in caudate nucleus (n = 48), putamen (n = 22), and ventral striatum (n = 20).
Increased neuronal activations
A total of 95 of the 205 tested neurons (46%) showed significantly increased magnitudes of at least one event-related activation in at least one trial type. The increases occurred transiently during initial learning trials in 38 of these neurons and remained sustained beyond learning, often during the entire 60-90 testing trials, in 72 neurons (Table 1). Combinations of transient and sustained increases occurred occasionally with different task events or trial types. The transient or sustained increases resulted in the appearance of new activations in some trial types in 28 and 38 of these neurons, respectively. Thus 11 neurons not activated in any familiar trial type showed task-related activations during learning, and the trial selectivity of neurons with existing task relationships was decreased. These increases do not include the transiently appearing reward-related activations when initial unrewarded movement trials were performed with parameters of rewarded movements. Neurons with increased activations were distributed without regional preference over the entire recording area in caudate nucleus (n = 54), putamen (n = 22), and ventral striatum (n = 19).
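The neuron counts reported in the Results can be cross-checked with simple arithmetic. The following Python sketch recomputes the percentages of the 205 tested neurons and verifies that the regional counts (caudate, putamen, ventral striatum) sum to each group total. It also estimates how many neurons showed both transient and sustained changes, under the assumption (made here for illustration only, not stated in the text) that every neuron in a group showed at least one of the two time courses. All variable and function names are hypothetical.

```python
TOTAL_TESTED = 205  # task-related neurons tested in familiar and learning trials

# Counts taken from the text. "maintained_selectivity" uses the 141 neurons
# that stayed selective for the same trial types; the regional counts
# reported for maintained relationships sum to this figure.
GROUPS = {
    "maintained_selectivity": {"n": 141, "regions": (71, 46, 24)},
    "decreased": {"n": 90, "regions": (48, 22, 20), "transient": 46, "sustained": 48},
    "increased": {"n": 95, "regions": (54, 22, 19), "transient": 38, "sustained": 72},
}


def percent_of_tested(n, total=TOTAL_TESTED):
    """Percentage of all tested neurons, rounded to the nearest integer."""
    return round(100 * n / total)


def both_time_courses(n, transient, sustained):
    """Neurons with both transient and sustained changes (inclusion-exclusion).

    Assumption: every one of the n neurons showed a transient change,
    a sustained change, or both.
    """
    return transient + sustained - n


for name, group in GROUPS.items():
    # Regional counts should add up to the group total.
    assert sum(group["regions"]) == group["n"], name
    line = f"{name}: {group['n']} neurons ({percent_of_tested(group['n'])}%)"
    if "transient" in group:
        overlap = both_time_courses(group["n"], group["transient"], group["sustained"])
        line += f", {overlap} with both transient and sustained changes"
    print(line)
```

Under the stated assumption, the small overlaps agree with the remark that combinations of transient and sustained changes occurred only occasionally.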
These data show that task-related neuronal activations in the anterior striatum undergo changes when animals rapidly adapt their behavioral reactions to new instruction stimuli. In the first form of neuronal change during learning, the behavioral adaptations were accompanied by adapting neuronal activations that otherwise maintained their relationships to behavioral reactions and upcoming reinforcers. In the second form, neuronal activations were decreased or abolished. In the third form, neurons were more strongly or exclusively activated during learning. Together, these data suggest an involvement of the striatum in associating new environmental signals with known behavioral reactions and reinforcers.
Learning behavior
In most learning situations, only a part of the environment changes, and learning is confined to the modified contingencies. Repeated learning in an unchanged context leads to a learning set in which new stimuli are fully learned in a few trials (Gaffan et al. 1988
Adaptation of maintained activations
Many neurons in the anterior striatum are activated in relation to the preparation of movements and the expectation of individual task events, including reward (Alexander and Crutcher 1990
Decreased neuronal activations
A sizeable fraction of the investigated striatal neurons showed decreased event-related activations during learning compared with familiar performance. Decreases occurred in relation to any of the events and often resulted in the complete disappearance of activations in particular trial types, thus making the neurons either entirely unresponsive or increasingly task selective. The shorter lasting decreases resembled the "learning-dependent" changes that occurred transiently in premotor cortex and supplementary eye field during learning and subsequently recovered to levels of familiar performance (Chen and Wise 1995a
Increased neuronal activations
Several striatal neurons showed transient or sustained increases of existing event-related activations in learning trials. Activations appearing in additional trial types during learning induced decreases in trial selectivity. These striatal neurons apparently processed unidentified instructions during learning with less selectivity and thus broadened their potential coding of task events. The transient nature of increases in several striatal neurons suggests that a comparable trial selectivity was regained after the significance of the instructions was acquired.
Comparison with responses of dopamine neurons
Midbrain dopamine neurons respond to primary rewards and to conditioned, reward-predicting stimuli in a manner compatible with the concept of an error in the prediction of reward (Ljungberg et al. 1992
INTRODUCTION
; Canavan et al. 1989; Harrington et al. 1990; Knopman and Nissen 1991; Saint-Cyr et al. 1988; Vriezen and Moscovitch 1990). This view is also supported by increases in striatal blood flow during motor learning (Seitz and Roland 1992) and, by exclusion, by preserved procedural learning after temporal lobe lesions leading to declarative memory deficits (Mishkin and Appenzeller 1987; Phillips and Carr 1987). Although a specific involvement in procedural memory has been questioned (Gaffan 1996; Wise 1996), a role in motor learning would be compatible with the known motor functions of basal ganglia, including the formation of movement sequences (Graybiel 1995; Hikosaka et al. 1995). Psychopharmacological experiments suggest a role for the basal ganglia in incentive or reward-directed learning, in particular the ventral striatum and the dopamine systems (Beninger 1983; Fibiger and Phillips 1986; Robbins and Everitt 1992; Wise 1982). Further cognitive learning functions are suggested by the effects of lesions of the monkey anterior striatum, which induce deficits in spatial delayed response and alternation learning (Bättig et al. 1960; Divac et al. 1967). Striatal lesions in rats lead to learning deficits in spatial navigation (Whishaw et al. 1987) and radial arm maze tasks (Packard et al. 1989). Finally, the basal ganglia may mediate the influence of declarative memory functions of the temporal lobe on behavioral output controlled by the frontal lobe, as visual discrimination learning remains unimpaired after section of all temporal lobe projections to prefrontal cortex and diencephalon except those via the striatum (Gaffan 1996). Taken together, the basal ganglia appear to be involved in a considerable number of learning functions, rather than subserving a single learning mechanism. This appears to be in accordance with the multiple behavioral functions attributed to these nuclei.
) and midbrain dopamine neurons (Ljungberg et al. 1992) acquired responses during appetitive learning and may be involved in reward detection, acquisition of stimulus-response associations, and reward prediction. Neurons in the tail of caudate failed to display major changes during discrimination learning (Brown et al. 1995). However, learning-related changes were found in frontal cortical areas projecting to the head and body of striatum. Neurons in dorsolateral prefrontal cortex transiently lost their behavioral selectivity during the initial learning of new instruction stimuli in delayed response tasks (Niki et al. 1990; Watanabe 1990). Selectivity reappeared when task performance reached 85-90% correct trials. Neurons in premotor cortex and supplementary and frontal eye fields showed decreased, increased, or even new task-related activations, as well as changes in behavioral selectivity, when new instruction stimuli were introduced in conditional motor tasks (Chen and Wise 1995a,b; Mitz et al. 1991).
; Harlow 1949). Individual neurons were studied during a whole learning episode and their activity compared with familiar performance. This appeared appropriate in view of the heterogeneity of task relationships of striatal neurons, rather than studying different neurons before, during, and after the learning of entirely new tasks. The learning set was based on a delayed go-nogo paradigm comparable with the tasks used in learning studies on cortical neurons. Depending on an initial instruction picture, animals executed an arm movement reinforced either by liquid or a conditioned sound, or they withheld the movement and were rewarded by liquid. Neurons in the anterior striatum showed several forms of activations related to the preparation of movement and the expectation of reward during the performance of this task with familiar instructions (Hollerman et al. 1998). The present report describes changes of neuronal activity during the association of new visual instruction stimuli with known behavioral reactions and reward. These data were previously presented as an abstract (Tremblay et al. 1994).

FIG. 1.
Examples of instruction pictures for the 3 trial types. Top row: familiar stimuli. Middle and bottom: stimuli for 2 learning problems. From left to right, instructions indicate rewarded movement trials, rewarded nonmovement trials, and unrewarded movement trials, respectively.
METHODS
). One of three colored instruction pictures was presented on a computer monitor in front of the animal for 1.0 s (13 × 13°) and specifically indicated one of three trial types (rewarded movement, rewarded nonmovement, and unrewarded movement). A red trigger stimulus presented at a random 2.5-3.5 s after instruction onset required the animal to execute or withhold a reaching movement according to each trial type (13 × 13°, same position as instruction). The trigger was the same in each trial type. In rewarded movement trials, the animal released a resting key and touched a small lever below the trigger to receive a small quantity of apple juice (0.15-0.20 ml) after a delay of 1.5 s. In rewarded nonmovement trials, the animal remained motionless on the resting key for 1.5 s and received the same liquid reward after a further 1.5 s. In unrewarded movement trials, the animal reacted as in rewarded movement trials, but correct performance was followed not by liquid reward but by a 1-kHz sound. The sound constituted a conditioned auditory reinforcer, because it visibly helped the animal to perform the task, but it was not an explicit reward, hence the simplifying term "unrewarded" movements. Thus each instruction was the unique stimulus in each trial indicating the behavioral reaction to be performed after the trigger (execution or withholding of movement) and predicting the type of reinforcer (liquid or sound). Correctly performed unrewarded movements were followed by one of the rewarded trials. Any incorrectly performed trial was repeated. Apart from that, the three trial types alternated semirandomly, with the number of consecutive trials of the same type restricted to three rewarded movement trials, one or two nonmovement trials, and a single unrewarded movement trial. Trials lasted 11-13 s, and intertrial intervals were 4-7 s.
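The semirandom alternation of trial types described above can be illustrated with a short simulation. The Python sketch below draws a sequence of trial types while capping consecutive repetitions at three rewarded movement trials, two nonmovement trials, and one unrewarded movement trial, and it draws a trigger delay of 2.5-3.5 s after instruction onset. Several points are assumptions made for illustration and are not taken from the original procedure: trial types are sampled uniformly among those allowed, the nonmovement cap is set to two, all trials are treated as correctly performed (so the repetition of incorrect trials is not modeled), and every name in the code is hypothetical.

```python
import random

# Maximum number of consecutive trials of the same type (from the task
# description; the nonmovement cap of 2 is an assumption, the text says
# "one or two").
MAX_RUN = {
    "rewarded_movement": 3,
    "rewarded_nonmovement": 2,
    "unrewarded_movement": 1,
}


def trailing_run(history):
    """Length of the run of identical trial types at the end of history."""
    if not history:
        return 0
    last = history[-1]
    run = 0
    for trial_type in reversed(history):
        if trial_type != last:
            break
        run += 1
    return run


def next_trial_type(history, rng):
    """Pick the next trial type, excluding a type whose run cap is reached.

    Assumption: uniform sampling among allowed types; the actual
    probabilities used in the experiment are not given in the text.
    """
    allowed = list(MAX_RUN)
    if history and trailing_run(history) >= MAX_RUN[history[-1]]:
        allowed.remove(history[-1])
    return rng.choice(allowed)


def generate_block(n_trials, seed=0):
    """Generate a block of trial types, assuming every trial is correct."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        history.append(next_trial_type(history, rng))
    return history


def trigger_delay(rng):
    """Delay from instruction onset to trigger, in seconds (2.5-3.5 s)."""
    return rng.uniform(2.5, 3.5)


if __name__ == "__main__":
    for trial_type in generate_block(10, seed=1):
        print(trial_type)
```

Because the unrewarded movement cap is one, every unrewarded movement trial in the generated sequence is followed by one of the two rewarded trial types, which matches the rule stated for correctly performed unrewarded movements.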

FIG. 2.
A: learning curves for the 3 trial types, averaged from 117 blocks of trials presented during neuronal recordings with monkey A. B: stability of learning curves during the course of the experiment, shown here for rewarded movement trials in monkey A. The 1st 39 problems were presented during the initial period of recording, the 2nd 39 at the midpoint, and the 3rd 39 at the end of neuronal recordings. In A and B, trial numbers start with introduction of a new learning problem, and % indicates level of correct performance. C: average learning during neuronal recordings for familiar (top) and learning trials (bottom) in monkey A. Tick marks below curves indicate learning problems, consisting of 3 new instruction stimuli. Percent correct performance was calculated from the 1st 15 trials of each type for each block.

FIG. 3.
Development of movement parameters during learning, distinguishing rewarded from unrewarded movement trials. Differences in movement parameters in rewarded vs. unrewarded movement trial types were stable throughout a block of familiar trials (left panels). In contrast, during learning the movement parameters differentiated with experience with the new stimuli (right panels). Sequential occurrence of each trial type after introduction of a new problem is indicated below curves. Curves show medians obtained from 117 problems tested during neuronal recordings with monkey B.

FIG. 4.
Development of muscle activity distinguishing rewarded from unrewarded movements during learning. Left: muscle activity accompanying the return movement from the lever to the resting key was similar in initial learning trials with movements of both types and later became distinctive in unrewarded movement trials (arrows). Right: occasionally, movement parameters in monkey A were similar in the 2 movement trial types with familiar instructions and failed to differentiate during learning. All muscle recordings were obtained during neuronal recordings from electrodes chronically implanted in the extensor digitorum communis. Dots in rasters indicate the times at which rectified muscle activity exceeded a preset level. Original trial sequence is shown from top to bottom in each part.
RESULTS

FIG. 13.
Transiently decreased activations in 2 caudate neurons during learning. A: during initial learning, decrease of the selective pretrigger activation in rewarded movement trials. B: decrease of posttrigger activation during initial learning trials with gradual buildup during subsequent trials. The posttrigger activation occurred in all 3 trial types during both familiar performance and learning. Correctly and incorrectly performed trials are indicated by plus sign and minus sign, respectively. Only rewarded movement trials are shown in A and B.

FIG. 16.
Appearance of activations with different durations in 3 caudate neurons during learning. A: transient appearance of activation following the trigger stimulus during learning in rewarded movement trials. A posttrigger activation occurred also in unrewarded movement trials and was maintained during learning (not shown). B: sustained appearance of selective posttrigger activation in initial nonmovement trials during learning, with gradual disappearance during subsequent trials. C: sustained appearance of instruction response in rewarded movement trials during learning. This neuron also showed a new instruction response in unrewarded movement trials during learning. Correctly and incorrectly performed trials are indicated by plus sign and minus sign, respectively.

FIG. 17.
Sustained increase and new appearance of instruction response in a caudate neuron during learning. With familiar instruction stimuli, the neuron responded almost exclusively in rewarded movement trials. During learning, the response was increased in rewarded movement trials. In nonmovement trials, a new response appeared and became gradually weaker during the course of learning. In unrewarded movement trials, a response appeared also but remained present during the testing period (not shown). The appearance of substantial responses in other than rewarded movement trials constituted a reduction in trial selectivity. Correctly and incorrectly performed trials are indicated by plus sign and minus sign, respectively.
TABLE 1.
Numbers of neuronal activations influenced by learning

FIG. 5.
Adaptation of reward expectation-related sustained instruction response during learning. During familiar performance, this caudate neuron showed a sustained response in rewarded movement trials (top) and a transient response in nonmovement trials (not shown). Typically, the hand returned later to the resting key in rewarded as compared with unrewarded movement trials (return times of 958-2,539 ms vs. 404-735 ms). During learning, the sustained response occurred initially also in unrewarded movement trials, which were performed with parameters of rewarded movements (return times of 1,606-2,971 ms in trials 1-9 and 1,141-2,814 ms in trials 13-16). The response disappeared when movement parameters became typical for unrewarded movements (return times of 687-888 ms in trials 10-12 and 457-700 ms in trials 17-end; brackets to the right). Rewarded movement, nonmovement, and unrewarded movement trials alternated semirandomly during the experiment and were separated for analysis. Familiar and learning trials were performed in separate blocks. Dots in rasters denote the time of occurrence of neuronal impulses, referenced to instruction onset. Each line of dots represents 1 trial. In this and the following figures, the sequence of trials is plotted chronologically from top to bottom, learning rasters beginning with the 1st presentations of new instructions. Data in familiar and learning trials are only shown from correctly performed trials, unless mentioned otherwise.

FIG. 6.
Adaptation of reward expectation-related activation following the trigger stimulus during learning. This caudate neuron was activated when a movement for liquid reward was performed, as opposed to sound reinforcement, and showed no activation in nonmovement trials. During learning, activations occurred in correct rewarded movement trials and in initial unrewarded movement trials performed with parameters of rewarded movements, as judged by the return to the resting key.

FIG. 7.
Adaptation of reward expectation-related activation during learning. This ventral striatum neuron was activated during familiar performance and learning in both rewarded trials irrespective of movement (A and B), but not in trials in which only sound reinforcement was given (C). During learning, activations occurred in addition in initial unrewarded movement trials that were performed with parameters of rewarded movements (F). Neuronal activations in unrewarded movement trials disappeared 3 trials before the animal performed with parameters of unrewarded movements, as judged by return of the hand to the resting key (arrows at right in F). In agreement with activations in familiar nonmovement trials (B), erroneous nonmovement reactions in rewarded movement trials were also accompanied by activations (G). Here, impulses were referenced to trigger offset as the last signal before potential reward delivery, preceding reward delivery by a constant 1.5 s. In erroneous nonmovement trials in which a movement was performed (H), the trigger disappeared already on erroneous key release, and the movement was aborted by the animal before lever touch (return to key in H). This suggests an absence of reward expectation and may explain the missing activation in H. In addition, activations in correctly performed rewarded movement trials during learning (D) were increased compared with familiar trials (A).
), activations preceding individual task events were probably related to the expectation of the future event or the preparation of the behavioral reaction. Activations following an event constituted a response to the event or, when following the trigger stimulus, may have been related to the execution or withholding of movement. Second, neuronal activations were trial selective in that they occurred only in single trial types or combinations of two trial types. Activations were more frequent in trials reinforced by liquid as opposed to sound. They further discriminated between movement and nonmovement trials. Several neurons showed multiple event relationships, each one often with different trial selectivity.
TABLE 2.
Numbers of activations with maintained trial selectivity

FIG. 8.
Adaptation of reward expectation-related activation during learning without changes in movement parameters. This ventral striatum neuron was activated before the liquid reward in rewarded movement and nonmovement trials during familiar and learning situations (only movement trials shown). In contrast to the preceding figure, movements in rewarded and unrewarded trials were performed with similar parameters, in both familiar and learning situations.
). Most activations related to the instructions, trigger, or reinforcers occurred only in rewarded movement trials, in both rewarded trials irrespective of movement, or occasionally only in unrewarded movement trials. Many preinstruction activations occurred selectively after rewarded trials and probably reflected an upcoming unrewarded trial.

FIG. 9.
Schematic overview of the development of reward expectation-related neuronal activations during learning. A: activations during the instruction-trigger interval related conjointly to movement preparation and expectation of liquid reward. Activations occurred inappropriately also in unrewarded movement trials during early learning but became restricted again to rewarded movement trials in later learning stages. B: activations during the trigger-reward interval related to expectation of liquid reward irrespective of movement. Activations occurred inappropriately also in unrewarded movement trials during early learning. In A and B, inappropriate neuronal activations during early learning occurred in parallel with inappropriate behavioral reactions, indicating that animals had not yet acquired a differential reward expectation from the new instructions. Filled bars indicate activations occurring appropriately according to trial types; open bars show inappropriate activations. Drawings are based on time courses of activations in individual striatal neurons during the instruction-trigger and trigger-reinforcement intervals, original examples of which are shown in Figs. 5 and 6-8, respectively. Below graphs, i, t, and r indicate occurrence of the instruction, trigger, and reinforcer, respectively.

FIG. 10.
Adaptation of sustained instruction response related to the expectation of the conditioned auditory reinforcer during learning. With familiar instructions, this caudate neuron was only activated in unrewarded movement trials (top, nonmovement trials not shown). During learning, the response was initially very weak in unrewarded movement trials and became substantial only after the 3rd trial of this type (bottom right). The initial absence of response was consistent with the animal's habitual treatment of initial movements as rewarded. However, in this instance, movements in rewarded and unrewarded trials were performed with similar parameters, in both familiar and learning situations. Only data from correctly performed trials are shown.
). This was seen in relation to all three task events by activations occurring selectively or preferentially in rewarded, unrewarded or both movement trials, or in nonmovement trials. Some preinstruction activations occurred selectively after nonmovement trials and probably reflected an upcoming movement trial.

FIG. 11.
Adaptation of movement preparatory activity during learning. With familiar instructions, this caudate neuron showed a sustained instruction response only in rewarded movement trials and a transient response in nonmovement trials. During learning, the sustained response was slightly increased in movement trials (bottom left) and occurred also in erroneous nonmovement trials in which a movement was performed (right). The response in correct nonmovement trials resembled that in familiar trials. Same neuron as Fig. 5.

FIG. 12.
Anatomic positions of neurons in both monkeys showing maintained relationships to behavior and reinforcers during learning. Positions of neurons are labeled according to the 6 possible task relationships, namely responses to and activations preceding instruction, trigger, and reinforcers. Interrupted lines within sections indicate approximate borders of striatal regions (Cd, caudate; Put, putamen; V St, ventral striatum; AC, anterior commissure). Data from both monkeys are plotted on coronal sections from the left hemisphere of one monkey, which are labeled according to the distances from the interaural line (A18-A25).

FIG. 14.
Transiently decreased reward expectation-related activations during learning. In familiar trials, this putamen neuron was activated preceding liquid reward in rewarded movement and nonmovement trials, but not in unrewarded movement trials. During learning, the activations were initially reduced in both trial types and subsequently attained the level of familiar performance. Only data from correctly performed trials are shown.

FIG. 15.
Differential sustained decreases of activations following several task events during learning. In familiar trials, this ventral striatum neuron was activated following the instructions, trigger, and liquid reward selectively in rewarded movement and nonmovement trials. During learning, the activation following the instruction was almost completely abolished, the activations following the trigger decreased considerably, and the reward responses remained largely unchanged in both rewarded movement and nonmovement trials.

FIG. 18.
Increases and new appearances of reward-related activations in 3 caudate neurons during learning. A: transient appearance of reward expectation-related activation in rewarded movement trials. This neuron also showed reward expectation-related activations in nonmovement trials during familiar performance and learning (not shown). B: sustained, massive increase of reward expectation-related activation in nonmovement trials. This neuron showed a similar activation increase in rewarded movement trials, and in initial unrewarded movement trials performed with parameters of rewarded movements (not shown). C: sustained appearance of reward response in nonmovement trials. This neuron failed to respond to liquid reward or sound reinforcer in any other trial type during familiar performance or learning. Only data from correctly performed trials are shown.
DISCUSSION
; Harlow 1949). The rapid learning is particularly appropriate for the limited periods of recordings from single neurons. The present learning paradigm was based on a conditional, delayed go-nogo task involving associations between visual stimuli and behavioral reactions without explicit sensorimotor guidance. Delay tasks typically test preparatory and expectation-related functions of the striatum (Alexander and Crutcher 1990; Apicella et al. 1992; Divac et al. 1967; Hikosaka et al. 1989). Similar conditional delay tasks were used in learning set situations for investigating neurons in prefrontal and premotor cortex (Mitz et al. 1991; Niki et al. 1990; Watanabe 1990) and supplementary and frontal eye fields (Chen and Wise 1995a,b, 1996).
; Apicella et al. 1992; Hikosaka et al. 1989; Hollerman et al. 1998; Schultz et al. 1992). Apparently, these neurons have access to predictive information stored during previous task experience. In the present experiments, neuronal activity related to movement preparation during the instruction-trigger interval maintained the behavioral relationship and reflected the actually executed behavioral reaction, which did not necessarily correspond to the trial type indicated by the instruction. Thus erroneously performed movements in nonmovement trials were accompanied by inappropriate but otherwise maintained movement-related neuronal activations. Also, the initial "default" expectation of reward with all movements was frequently paralleled by reward expectation-related neuronal activity, which in later trials became restricted to rewarded movements. This occurred in all reward expectation-related activations preceding or following the instruction, trigger, and reward. Correspondingly, activations reflecting the expectation of the auditory reinforcer were rare in initial learning trials and reappeared subsequently. It appears that initially inappropriate neuronal activations reflected inappropriate expectations evoked by instructions of guessed type. These activations adapted and became appropriate during the course of learning when expectations were matched to the new task contingencies and evoked appropriately. Interestingly, neuronal activations adapted a few trials before the behavioral reactions, which was also observed in prefrontal neurons (Watanabe 1990). Taken together, these data suggest a mechanism for adaptive learning in which existing expectation-related neuronal activity is simply matched to the new conditions, rather than acquiring all task contingencies from scratch.
; Watanabe 1990). Neurons in premotor cortex, supplementary eye field, and orbitofrontal cortex also showed maintained task relationships, albeit with lower incidence than in anterior striatum (Chen and Wise 1995a,b, 1996; Mitz et al. 1991; Tremblay and Schultz 1996). Orbitofrontal cortical neurons changed their activity in parallel with behavioral changes during reversal of visual stimuli (Rolls et al. 1996; Thorpe et al. 1983). Interestingly, the inappropriate coding of sample stimuli by hippocampal place cells was suggested to play a role in bringing about inappropriate behavioral reactions (Deadwyler et al. 1996). By contrast, task-specific expectations would not exist when a new task without preconceived structure is learned by a naive animal. Prefrontal neurons began to show expectation- and preparation-related activations during the learning of spatial delayed response tasks as soon as animals attained substantial performance (Germain and Lamarre 1993; Kubota and Komatsu 1985). Similarly, most tonically active neurons in striatum acquired responses to reward-predicting stimuli only after conditioning (Aosaki et al. 1994). Thus inputs from frontal cortex could mediate some of the maintained activations presently observed during learning set performance in anterior striatum, whereas the contributions of cortical changes during the acquisition of new tasks are difficult to evaluate.
; Mitz et al. 1991). Similar changes were also seen in prefrontal (Niki et al. 1990) and hippocampal neurons (Cahusac et al. 1993). The current longer lasting decreases resemble the "learning-static" decreases outlasting the learning phase in frontal eye field and supplementary eye field neurons (Chen and Wise 1995b).
) and might correspond to rapidly habituating responses to new visual stimuli in the same structure (Caan et al. 1984). The activations occurring exclusively during learning were similar to "learning-selective" activations in the supplementary eye field (Chen and Wise 1995a) and comparable changes in hippocampus (Cahusac et al. 1993). Such pronounced increases during learning are compatible with decreases in movement-related activations in the supplementary motor cortex with overtrained lever pressing (Aizawa et al. 1991). The present increases outlasting the learning phase resembled the learning-static increases in frontal and supplementary eye fields (Chen and Wise 1995b). Decreases in trial selectivity were also observed in prefrontal and orbitofrontal cortex (Tremblay and Schultz 1996; Watanabe 1990) and were compatible with the modification or breakdown of directional selectivity in the supplementary eye field (Chen and Wise 1996). Taken together, frontal cortical areas projecting to the striatum show learning-related changes that may contribute to the increased activations found in the anterior striatum.
).
; Mountcastle et al. 1981; Sakata et al. 1983). Inputs from these areas to the striatum might account, at least in part, for the observed learning-related increases in activation.
; Calabresi et al. 1992; Pennartz et al. 1993; Wickens et al. 1996). It could be speculated that transient increases induce synaptic changes at corticostriatal synapses and that more sustained increases participate in the consolidation of these synaptic modifications.
; Mirenowicz and Schultz 1994; Schultz et al. 1993, 1997). They are activated by unpredicted rewards, are not influenced by fully predicted rewards, and are depressed when predicted rewards are omitted. Prediction errors crucially determine the rate of associative learning (Dickinson 1980; Rescorla and Wagner 1972). It appears that the rather homogeneous and simultaneous responses of dopamine neurons may constitute a teaching signal that is propagated as a global reinforcement message along diverging axons to neurons in the striatum and frontal cortex. By contrast, many of the presently found changes of striatal neurons were related to adaptations of previously acquired expectations to new task situations, or to reductions of such expectations. This would correspond to the striatal involvement in expectation- and preparation-related cognitive processes. These adaptive changes would not constitute a reward prediction signal suitable for modifying synaptic weights. However, the increased striatal activations during learning might contribute to synaptic modifications, although their heterogeneous nature and the different architecture of axonal projections would not allow them to serve as a global reinforcement message.
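The role of prediction errors in associative learning can be illustrated with a minimal sketch of a Rescorla-Wagner-style update for a single stimulus. This is an illustration only and not an analysis performed in the present study; the function names, the learning rate of 0.2, and the coding of reward as 1.0 or 0.0 are assumptions chosen for the example.

```python
# Minimal single-stimulus sketch of a Rescorla-Wagner-style update.
# All names and parameter values are hypothetical and chosen for illustration.

def prediction_error(reward, prediction):
    """Prediction error: received reward minus predicted reward."""
    return reward - prediction


def update_prediction(prediction, reward, learning_rate=0.2):
    """Move the prediction toward the reward in proportion to the error."""
    return prediction + learning_rate * prediction_error(reward, prediction)


def train(rewards, prediction=0.0, learning_rate=0.2):
    """Apply the update over a sequence of trials; return final prediction and errors."""
    errors = []
    for reward in rewards:
        errors.append(prediction_error(reward, prediction))
        prediction = update_prediction(prediction, reward, learning_rate)
    return prediction, errors


if __name__ == "__main__":
    # The three cases described in the text:
    print(prediction_error(1.0, 0.0))  # unpredicted reward: positive error
    print(prediction_error(1.0, 1.0))  # fully predicted reward: zero error
    print(prediction_error(0.0, 1.0))  # omitted predicted reward: negative error

    # Repeated rewarded trials: the error shrinks as the prediction is acquired.
    final_prediction, errors = train([1.0] * 50)
    print(final_prediction, errors[0], errors[-1])
```

In this sketch the size of each update scales with the prediction error, so learning is fastest when rewards are least predicted and stops once the reward is fully predicted.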
| |
ACKNOWLEDGEMENTS |
|---|
We thank D. Ballard, A. Dickinson, and M. Watanabe for discussions and comments and B. Aebischer, J. Corpataux, A. Gaillard, A. Pisani, A. Schwarz, and F. Tinguely for expert technical assistance.
This study was supported by Swiss National Science Foundation Grants 31-28591.90, 31.43331.95, and NFP38.4038-43997, the Roche Research Foundation, Switzerland, and by postdoctoral fellowships from the Fondation pour la Recherche Scientifique of Quebec to L. Tremblay and National Institute of Mental Health Grant MH-10282 to J. R. Hollerman.
Present addresses: L. Tremblay, INSERM Unit 289, Hôpital de la Salpêtrière, 47 Boulevard de l'Hôpital, F-75651 Paris, France; and J. Hollerman, Dept. of Psychology, Allegheny College, Meadville, PA 16335.
| |
FOOTNOTES |
|---|
Received 11 February; accepted in final form 9 April, 1998.
| |
REFERENCES |
|---|