Research suggests that the basal ganglia complex is a major component of the neural circuitry that mediates reward-related processing. However, human studies have not yet characterized the response of the basal ganglia to an isolated reward, as has been done in animals. We developed an event-related functional magnetic resonance imaging paradigm to identify brain areas that are activated after presentation of a reward. Subjects guessed whether the value of a card was higher or lower than the number 5, with monetary rewards as an incentive for correct guesses. They received reward, punishment, or neutral feedback on different trials. Regions in the dorsal and ventral striatum were activated by the paradigm, showing differential responses to reward and punishment. Activation was sustained following a reward feedback, but decreased below baseline following a punishment feedback.
Grasping the mechanisms of motivated behavior requires an understanding of the neural circuitry underlying the processing of reward information. Such circuitry is of particular importance to studies of drug abuse and mood disorders in humans. However, most of the existing knowledge about reward processing has been acquired by animal research. The goal of the present study was to identify brain regions in humans associated with presentation of a reward and to correlate the findings with the current view on the neural circuitry underlying reward processing derived from animal research.
Recent advances in neuroimaging techniques allow reward-related processing and its neural correlates to be studied noninvasively in humans. Past studies have used money as a reinforcer and found increased dopamine release in both dorsal and ventral striatum during a video game playing task (Koepp et al. 1998), and activation of left frontal cortex, thalamus, and midbrain in a delayed go–no go task (Thut et al. 1997). Another positron emission tomography (PET) study examined the response to nonmonetary feedback in planning and guessing tasks and found activation of the caudate bilaterally when feedback about task performance was given, as opposed to blocks of trials where feedback was absent (Elliott et al. 1998).
While these past studies have demonstrated that neuroimaging can be used to study motivation and reward in humans, their interpretation is limited by the use of blocked designs. In such designs, activation is observed in reference to a block of trials rather than to individual events, as in a behavioral paradigm. Consequently, past studies are not able to clearly dissociate activation related to reward from more general task-related processing effects (e.g., differences in arousal). To overcome this problem, we used an event-related functional magnetic resonance imaging (fMRI) design involving pseudorandom presentation of trials and a simple task paradigm where participants played a card game in which the outcome of each trial was either a rewarding, punishing, or neutral event (Fig. 1 A). Areas involved with sensory and other shared components of stimulus and task processing showed similar patterns of hemodynamic responses regardless of trial type. In contrast, brain regions implicated in the reward circuitry, such as the basal ganglia (Apicella et al. 1991; Schultz et al. 1998), showed different patterns of activation in response to different outcomes.
Nine right-handed volunteers participated in this study (4 female, 5 male). Participants were mostly graduate and undergraduate students drawn from the University of Pittsburgh (average age, 25.67 ± 4, mean ± SD). Participants were asked to fill out a brief questionnaire to ensure that they had prior experience with gambling, but were not abusive or excessive in such behavior (e.g., have you played cards for money: not at all, less than once a week, or once a week or more). The questionnaire was based on the South Oaks Gambling Screen (Lesieur and Blume 1987). Information about any family history of gambling was not acquired. All participants gave informed consent according to the Institutional Review Board at the University of Pittsburgh.
The paradigm involved a series of 180 trials, divided into 12 runs of 15 trials each. Each trial began with the presentation of a visually displayed card projected onto a screen. The card had an unknown value ranging from 1 to 9, and the participant was instructed to make a guess about the value of the card. A question mark appeared in the center of the card indicating that the participant had 2.5 s to guess whether the card value was higher or lower than the number 5. Participants pressed the left or right button of a response unit to indicate their selection. After the choice-making period, a number appeared in the center of the card for 500 ms, followed by an arrow that was also displayed for another 500 ms. The appearance of a green arrow pointing upward indicated that the participant correctly guessed the card value. Each correct guess led to a reward of $1.00. The appearance of a red arrow pointing downward indicated that the participant incorrectly guessed the card value, leading to a penalty of $0.50. When the number displayed on the card was a 5, then it was followed by a neutral sign (–), which indicated that the participant neither won nor lost money (Fig. 1 A). Trials where a response was not made on time were depicted by a pound sign (#) and were excluded from analysis. After the 3-s delay between presentation of the response cue (question mark) and the reward/punishment/neutral feedback, there was an 11.5-s delay before the onset of the next trial. Thus each experimental session consisted of 180 trials of 15 s each (Fig. 1 B). Stimulus presentation and behavioral data acquisition were controlled by a Macintosh computer with PsyScope software (Macwhinney et al. 1997).
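The trial timing described above can be checked with a short sketch. The durations and dollar amounts are taken from the text; the function names and the outcome labels are hypothetical and are not part of the original PsyScope script.

```python
# Durations in seconds, as given in the task description.
GUESS_S = 2.5    # question mark: choice-making period
NUMBER_S = 0.5   # card value shown
ARROW_S = 0.5    # feedback symbol (arrow or neutral sign)
DELAY_S = 11.5   # delay before the next trial

RUNS = 12
TRIALS_PER_RUN = 15


def trial_events():
    """Return (event name, onset in s, duration in s) for one trial."""
    events = []
    onset = 0.0
    for name, duration in [
        ("guess", GUESS_S),
        ("number", NUMBER_S),
        ("feedback", ARROW_S),
        ("delay", DELAY_S),
    ]:
        events.append((name, onset, duration))
        onset += duration
    return events


def trial_length():
    """Total length of one trial in seconds."""
    return sum(duration for _, _, duration in trial_events())


def payoff(outcome):
    """Money won or lost on a trial; outcome labels are illustrative."""
    amounts = {"reward": 1.00, "punishment": -0.50, "neutral": 0.0}
    return amounts[outcome]


def session_length_s():
    """Task time for the whole session, ignoring pauses between runs."""
    return RUNS * TRIALS_PER_RUN * trial_length()
```

Under these numbers the feedback symbol appears 3 s after the response cue, each trial lasts 15 s, and the 180 trials amount to 2,700 s of task time.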
Unknown to the participants, the outcome of each trial was predetermined to be a reward, punishment, or neutral event. Card values were selected only after the participant indicated their guess on each trial. Both reward and punishment events occurred on 40% of the trials, while the neutral events consisted of 20% of the total trials in the paradigm.
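One way such a predetermined schedule could be implemented is sketched below. The 40/40/20 split and the rule that the card value is chosen only after the guess come from the text; the shuffling at the level of the whole session, the uniform choice among consistent card values, and all names are assumptions made for illustration.

```python
import random

HIGH = (6, 7, 8, 9)  # card values above 5
LOW = (1, 2, 3, 4)   # card values below 5


def make_schedule(n_trials=180, seed=0):
    """Build a shuffled list of predetermined outcomes (40/40/20 split)."""
    n_reward = n_punish = round(0.4 * n_trials)
    n_neutral = n_trials - n_reward - n_punish
    schedule = (
        ["reward"] * n_reward
        + ["punishment"] * n_punish
        + ["neutral"] * n_neutral
    )
    # Assumption: outcomes are shuffled across the whole session.
    random.Random(seed).shuffle(schedule)
    return schedule


def card_value(outcome, guess, rng):
    """Pick a card value consistent with the outcome, given the guess.

    guess is "higher" or "lower" (hypothetical labels for the two buttons).
    """
    if outcome == "neutral":
        return 5
    guess_is_correct = outcome == "reward"
    # The card is high when a "higher" guess should be correct,
    # or when a "lower" guess should be wrong.
    want_high = (guess == "higher") == guess_is_correct
    return rng.choice(HIGH if want_high else LOW)
```

For 180 trials this yields 72 reward, 72 punishment, and 36 neutral trials, and the displayed card always agrees with the feedback arrow regardless of which button was pressed.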
Data acquisition and analysis
A conventional 1.5-T GE Signa whole-body scanner and standard radio frequency coil were used to obtain 20 contiguous slices (3.75 × 3.75 × 3.8 mm voxels) parallel to the AC-PC line. Structural images were acquired in the same locations as the functional images, using a standard T1-weighted pulse sequence. Functional images were acquired using a 2-interleave spiral pulse sequence [TR = 1,500 ms, TE = 34 ms, FOV = 24 cm, flip angle = 70° (Noll et al. 1995)]. This T2*-weighted pulse sequence allowed 20 slices to be acquired every 3 s. Images were reconstructed and corrected for motion with Automated Image Registration (Woods et al. 1992), adjusted for scanner drift between runs with an additive baseline correction applied to each voxel-wise time course independently, and detrended with a simple linear regression to adjust for drift within runs. Structural images of each participant were co-registered to a common reference brain (Woods et al. 1993). Functional images were then globally mean-normalized to minimize differences in image intensity within a session and between participants, and smoothed using a three-dimensional Gaussian filter (4-mm FWHM) to account for between-subject anatomic differences.
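The two drift corrections applied to each voxel time course can be illustrated with a minimal sketch. The exact form of the corrections is not given in the text, so the choices below are assumptions: the additive correction shifts each run to the mean across all runs, and the linear detrending removes the fitted slope while keeping the run mean. Function names are hypothetical.

```python
TR_S = 1.5        # repetition time per interleave
INTERLEAVES = 2   # spiral interleaves per image
VOLUME_S = TR_S * INTERLEAVES  # one set of 20 slices every 3 s


def equalize_run_baselines(runs):
    """Additively shift each run of one voxel so all runs share one mean.

    runs is a list of lists of intensity values, one list per run.
    """
    n_samples = sum(len(run) for run in runs)
    grand_mean = sum(sum(run) for run in runs) / n_samples
    corrected = []
    for run in runs:
        run_mean = sum(run) / len(run)
        corrected.append([y + (grand_mean - run_mean) for y in run])
    return corrected


def detrend_linear(series):
    """Remove a least-squares linear trend from one run, keeping its mean."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - slope * (t - t_mean) for t, y in enumerate(series)]
```

With a volume acquired every 3 s, a 15-s trial spans five scans, so a run of 15 trials contributes 75 samples to each voxel time course.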
A repeated-measures two-way ANOVA was performed on the entire set of co-registered data, with subjects as a random factor, and condition (reward, punishment, and neutral) and time (4 sequential 3-s scans in a trial of 15 s, referred to as T2–T5) as within-subjects factors. The first scan (T1 period) represented the choice-making period and was not included in the analysis, since there should be no differences between conditions during that period. Voxels identified during the postoutcome period exhibited a main effect of time [F(3, 24) = 10.96, P < 0.0001], a main effect of condition [F(2, 16) = 17.29, P < 0.0001], and/or an interaction of condition by time [F(6, 48) = 5.98, P < 0.0001]. Regions composed of three or more contiguous voxels were selected as a precaution against type 1 errors (Forman et al. 1995). Therefore inferences were made on regions defined by strength of effect (P < 0.0001) and size (3 or more voxels). Regions of interest were transformed to standard Talairach stereotaxic space (Talairach and Tournoux 1988) using AFNI software (Cox 1996). Further evaluation of the effects of condition and time was done by analysis of event-related time-series data for each region of interest, which depict the mean fMRI intensity value for each condition for time periods T1–T5.
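The degrees of freedom reported for this design, and the cluster-size criterion, can be reproduced with a short sketch. The degrees-of-freedom formulas are the standard ones for a fully within-subjects two-way design; the cluster function assumes that "contiguous" means sharing a face, which the text does not specify. All names are hypothetical.

```python
from collections import deque


def rm_anova_dfs(n_subjects, n_conditions, n_timepoints):
    """Numerator and error degrees of freedom for each effect."""
    s = n_subjects - 1
    a = n_conditions - 1
    b = n_timepoints - 1
    return {
        "condition": (a, a * s),
        "time": (b, b * s),
        "condition_x_time": (a * b, a * b * s),
    }


def clusters_at_least(voxels, min_size=3):
    """Group suprathreshold voxels into face-connected clusters.

    voxels is an iterable of (x, y, z) integer coordinates. Only
    clusters with at least min_size voxels are returned.
    """
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    remaining = set(voxels)
    kept = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    cluster.add(neighbor)
                    queue.append(neighbor)
        if len(cluster) >= min_size:
            kept.append(cluster)
    return kept
```

With nine subjects, three conditions, and four time points, `rm_anova_dfs(9, 3, 4)` gives (2, 16) for condition, (3, 24) for time, and (6, 48) for the interaction, matching the F statistics quoted above.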
Main effect of time and condition
A two-way repeated measures ANOVA yielded regions that exhibited a significant main effect of time, listed in Table 1. These regions of interest (ROIs) represent voxel clusters that changed activity during the postoutcome period. Most ROIs exhibited an increase in activity at the onset of each trial that decayed back to baseline before the next trial. As expected, most areas also showed a similar pattern of response across the different types of trials. For instance, sensory areas such as bilateral fusiform gyrus showed a similar pattern of response regardless of the type of outcome (Fig. 2).
Three striatal regions previously associated with reward-related processing in animal studies showed a main effect of time (Hikosaka et al. 1989; Robbins and Everitt 1992; Schultz et al. 1998). Both the left and right caudate nucleus, components of the dorsal striatum, showed an increase in activation that was more sustained for trials associated with a rewarding outcome than with a punishing outcome. A left-lateralized response was also found in the ventral striatum. Much like the dorsal striatum, the ventral striatum showed a tendency to differentiate between reward and punishment trials.
No region showed a significant main effect of condition [F(2, 16) < 17.29, fewer than 3 contiguous voxels at P > 0.0001].
Interaction of condition by time
Differential striatal responses to reward and punishment trials can be best characterized by determining which voxels show a time course of activation that differs by condition (Table 2). An analysis of the interaction of condition by time yielded dorsal striatal activation bilaterally. Both the left and right caudate regions showed different responses between reward and punishment (Fig. 3 A), a result that is consistent with the event-related time-series data for the striatal regions that showed a main effect of time. At the onset of a trial, there was an increase in activation leading up to the revelation of the outcome at the second time point (T2, 3 s into the trial). When a reward was the outcome, activation was first sustained and then slowly decayed back to baseline for the onset of the next trial. In contrast, when the outcome was a punishment, activation decreased sharply below baseline.
A significant condition and time interaction in the ventral striatum, as suggested by the time series analysis of the main effects, was not found. However, activation in this area was observed in six participants. To further investigate behavior of the ventral striatum region, a less strict exploratory analysis was conducted [F(6, 48) = 4.55, P < 0.001]. At this threshold, a left ventral striatal region was identified that showed a sustained response following a reward and a decrease in activation after a punishment (Fig. 3 B), a pattern that was similar to the one observed in the caudate bilaterally.
The exploratory analysis also revealed activation in the left medial temporal lobe, which could also be identified as a significant region of activation in five participants. The activation falls rostromedial to the hippocampus and caudal to the amygdala, two structures known to project to the ventral striatum (Groenewegen et al. 1999). The pattern of activation was similar to that in the striatum, with activation being sustained after a reward, contrasted with a decrease in response associated with punishment (Fig. 3 B).
Although a clear difference between reward and punishment responses is observed in the above-mentioned areas, the same cannot be said about neutral and reward events in the dorsal striatum. In a separate post hoc analysis, we performed a repeated measures ANOVA to identify areas that significantly differed between the reward and neutral conditions. Regions of interest that showed a significant interaction of condition by time included left caudate, left ventral striatum, and left medial temporal lobe [F(3, 24) = 3.01, P < 0.05]. A significant difference was not found in the right caudate.
Finally, three frontal regions also showed an interaction of condition by time. The interactions in these areas were driven mostly by the neutral condition, which significantly differed from the reward and punishment conditions. An additional analysis performed without the neutral trials revealed that none of the previously identified frontal regions were activated [F(3, 24) < 7.55, P > 0.001].
Using an event-related fMRI design, this study attempted to isolate and measure the neural response to stimuli that arouse emotion. During a time-period where feedback with a reward, punishment, or neutral value was given, activation was observed in brain regions implicated in reward processing and nonreward related areas responding to general sensory and cognitive components of the task. While nonreward related areas (i.e., sensory regions such as bilateral fusiform gyrus) showed a similar pattern of activation irrespective of the valence of the feedback, the pattern in reward-related areas activated by the task (i.e., basal ganglia) differentiated between reward and punishment. The data provide evidence for the involvement of the basal ganglia complex, particularly the striatum, in the processing of reward-related information. The striatum is thought to be engaged in the integration of reward-related information in the brain, receiving input from cortical and limbic regions that may be further modulated or shaped by mesencephalic dopaminergic projections (Moore et al. 1999; Rolls 1999; Schultz 1998).
The strongest activation was in the dorsal striatum, localized more specifically to the caudate bilaterally. The hemodynamic response in the caudate region showed differential responses between reward and punishment outcomes. Following a reward, activation was sustained, while after a punishment activation decreased sharply below baseline. The dorsal striatum has been implicated in the processing of reward information by lesion work done in rats (Robbins and Everitt 1992; Salinas et al. 1998), primate single-cell recordings (Hikosaka et al. 1989; Kawagoe et al. 1998; Schultz 1998), and human neuroimaging studies (Elliott et al. 1998; Koepp et al. 1998). For instance, after delivery of a reward, neuronal responses in nonhuman primates have been recorded in caudate nucleus during both go and no-go trials (Apicella et al. 1991), and during a saccade-reward task (Hikosaka et al. 1989). In humans, caudate activation has been observed with reinforcers such as cocaine (Breiter et al. 1997), nicotine (Stein et al. 1998), money (Koepp et al. 1998), and even feedback about performance in a behavioral task (Elliott et al. 1998).
The ventral striatum showed a main effect of time, indicating its recruitment during the game. Furthermore, a ventral striatal region that showed a condition by time interaction was localized in an exploratory analysis. This region showed a pattern that was similar to the response of its dorsal component, characterized by a sustaining of the activation in reward trials and a decrease in activation during punishment trials. These findings are consistent with prior work in animals and humans that implicate the ventral striatum in the processing of rewarding information. For instance, neurons in this region respond to primary rewards in a go no-go task (Apicella et al. 1991), lesions of the ventral striatum abolish learned responses that lead to a conditioned reinforcer (Robbins and Everitt 1992), and increased dopamine release in the ventral striatum is observed in humans during playing of a game for money (Koepp et al. 1998). Further evidence linking ventral striatum and reward-related information comes from its intrinsic connections with orbitofrontal cortex and limbic regions such as the amygdala, regions known to be involved in processing of motivational and emotional information (Rolls 1999).
Besides striatal activation, the exploratory analysis also yielded a region of activation in the medial temporal lobe. The same difference in response between reward and punishment outcomes observed in the striatum was repeated in this region. Precise localization, however, was not possible, due to the loss of resolution that results from a group-averaging analysis. Both the amygdala and hippocampus project to nucleus accumbens, a component of the ventral striatum. There is evidence that both structures are involved in reward processing (Salinas and White 1998), although the hippocampus may be primarily involved with more contextual aspects of reward (Moore et al. 1999). The pattern of response exhibited by the medial temporal region is more in accordance with the expected behavior of the amygdala. For example, the amygdala is more effective than hippocampus at driving cells in the nucleus accumbens (Grace et al. 1998), and its activity is correlated with enhanced recognition of pleasant and aversive emotional pictures (Hamann et al. 1999).
The pattern of response observed in the dorsal striatum was also characterized by a peak in activity at time point T2 for punishment events. It is feasible that the observed activity may reflect an early response to punishment in the caudate. Participants become aware of the value of the outcome 2.5 s into the task. Also, time point T2 reflects the average of activity between 3 and 6 s into the task. Further testing of this idea is necessary, perhaps making use of faster and more powerful scanning techniques.
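The timing argument above rests on how trial events map onto the five 3-s scans. A minimal sketch of that mapping follows; the scan length and labels T1–T5 are from the text, while the function names are hypothetical.

```python
SCAN_S = 3.0   # one full set of slices every 3 s
N_SCANS = 5    # scans T1..T5 within a 15-s trial


def scan_window(k):
    """Return the (start, end) time in seconds covered by scan Tk."""
    if not 1 <= k <= N_SCANS:
        raise ValueError("scan index must be between 1 and 5")
    return ((k - 1) * SCAN_S, k * SCAN_S)


def scan_containing(t):
    """Return k such that time t (s from trial onset) falls within Tk."""
    if not 0 <= t < N_SCANS * SCAN_S:
        raise ValueError("time must fall within the 15-s trial")
    return int(t // SCAN_S) + 1
```

The card value shown at 2.5 s falls at the end of T1, while the feedback arrow at 3.0 s falls at the start of T2, whose signal averages activity from 3 to 6 s.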
Across the set of reward-related regions, a distinct pattern of lateralization was observed. In all areas that showed an interaction between condition and time, the effect was stronger in the left hemisphere. Similar lateralization has also been observed in other tasks involving monetary compensation as an incentive (Koepp et al. 1998; Thut et al. 1997). This suggests an association between the left hemisphere and the processing of secondary reinforcers, such as money. Furthermore, in this experiment the left caudate showed a stronger difference in response between reward and neutral events than the right caudate. This suggests that the left hemisphere may be more dominant when it comes to the processing of reward or positive information. This is supported by studies in normal subjects (Davidson and Irwin 1999) and patients with mood disorders that show left-lateralized regions of decreased metabolism in depressed patients (Drevets et al. 1998).
In summary, the goals of this experiment were to identify brain areas activated after presentation of a reward in a single trial, to map the temporal dynamics of such areas, and to compare the results to the existing animal literature. Using an event-related fMRI design in a simple, yet engaging paradigm allowed for the isolation of the rewards, while minimizing other possible cognitive and nonreward related confounds (such as pressing a button to make a response). Nonreward related brain regions (i.e., sensory areas) as well as dorsal and ventral striatum were recruited during the game playing. As hypothesized, however, only reward-related areas such as the striatum showed responses that differed according to the valence of trial outcomes. This study shows that in humans, the basal ganglia complex is involved in reward processing, a finding that supports existing animal literature linking the dorsal and ventral striatum with reward-related activity. It also suggests that the striatum is able to differentiate between gains and losses. Finally, this study provides a foundation for the use of neuroimaging as a technique for probing the function of the human reward circuitry, and for investigating how breakdowns in the normal circuitry could give rise to addictive and mood disorders.
The authors thank V. Ortega and H. Sypher for technical assistance, as well as L. Miner, S. Sesack, C. Carter, and B. Skaggs for comments and suggestions on an earlier version of the manuscript.
This work was supported by National Science Foundation Grant LIS 9720350 and an NSF Graduate Research Fellowship.
Address for reprint requests: M. Delgado, Dept. of Neuroscience, 446 Crawford Hall, University of Pittsburgh, Pittsburgh, PA 15260 (E-mail:).
Copyright © 2000 The American Physiological Society