Delayed rewards lose their value for economic decisions and constitute weaker reinforcers for learning. Temporal discounting of reward value already occurs within a few seconds in animals, which allows investigations of the underlying neurophysiological mechanisms. However, it is difficult to relate these mechanisms to human discounting behavior, which is usually studied over days and months and may engage different brain processes. Our study aimed to bridge the gap by using very short delays and measuring human functional magnetic resonance responses in one of the key reward centers of the brain, the ventral striatum. We used psychometric methods to assess subjective timing and valuation of monetary rewards with delays of 4.0–13.5 s. We demonstrated hyperbolic and exponential decreases of striatal responses to reward predicting stimuli within this time range, irrespective of changes in reward rate. Lower reward magnitudes induced steeper behavioral and striatal discounting. By contrast, striatal responses following the delivery of reward reflected the uncertainty in subjective timing associated with delayed rewards rather than value discounting. These data suggest that delays of a few seconds affect the neural processing of predicted reward value in the ventral striatum and engage the temporal sensitivity of reward responses. Comparisons with electrophysiological animal data suggest that ventral striatal reward discounting may involve dopaminergic and orbitofrontal inputs.
Time is a theoretical but quantifiable construct that provides a common description for the unidirectional changes of successive states and events observed in the physical and biological world. Time is of crucial importance for biological organisms that rely on time-dependent resources. Temporal delays often reduce the subjective value of rewards. Underlying factors include the frequent need for immediate energy supply of individuals, the uncertainty associated with temporal delays, and irrational and emotional factors associated with less tangible, distant rewards. The different reasons for temporal discounting have led to various concepts that range from the uniform reduction of subjective reward value by delays to the engagement of separate systems mediating the evaluation of immediate and delayed rewards (beta and delta dual processes) (Ainslie 1975; Kirby 1997; Laibson 1997; Loewenstein and Prelec 1992; Thaler 1981). Classic accounts of animal learning describe the lower efficacy of late rewards on learning (Holland 1980), possibly deriving from weaker value teaching signals. By contrast, situations involving the deferred consumption of reward require the decision maker to inhibit the natural impulse for immediate consumption, thus linking time to impulsive behavior (Ainslie 1975). Temporal delays therefore appear to weaken reward value, engage particular cognitive and emotional processes, and determine learning. Time thus appears to constitute a fundamental aspect of reward function, and organisms are sensitive to delays of reward when making decisions.
Human neuroimaging studies demonstrate consistent responses to reward in the ventral striatum (O'Doherty 2004). These signals reflect reward value by coding the quantity and probability of reward (Knutson et al. 2005; Preuschoff et al. 2006; Tobler et al. 2007b). Decreases of these signals with reward delays over hours, days, and months suggest that the coding may be subjective rather than representing the objective, physical value of rewards (Kable and Glimcher 2007). Studies based on the concept of dual process discounting also describe stronger activations for immediate as opposed to delayed rewards (McClure et al. 2004, 2007; Tanaka et al. 2004; Wittmann et al. 2007), and individual differences in ventral striatum activation correlate with subjective preferences for immediate over delayed rewards (Hariri et al. 2006). Temporal discounting is affected by lesions of the ventral striatum (Cardinal et al. 2001). Irrespective of theoretical assumptions and employed methods, these studies identify the ventral striatum as an important structure for processing temporal reward delays.
Neurophysiological single neuron studies afford high temporal and spatial resolution. They report negative influences of delay on reward value signals in dopamine and cortical neurons in parallel with behavioral discounting (Kobayashi and Schultz 2008; Roesch and Olson 2005a,b; Roesch et al. 2006, 2007). These results are compatible with the concept of a single rather than separate temporal discounting systems. As routine invasive studies are only possible in animals, the knowledge gained from these studies should be used to interpret the human imaging responses. However, the experimental conditions of the human temporal discounting studies reported so far differed in several important aspects from those employed in animals. Most human discounting studies identified separate brain systems mediating immediate and delayed rewards, except one investigation assuming scalar reward value coding (Kable and Glimcher 2007). Furthermore, the reward delays of days, weeks, and months are well beyond the range of a few seconds used in animals, and even the shortest tested delays of minutes are impractical with animals (McClure et al. 2007). Although hypothetical and real monetary rewards may produce similar discounting (Johnson and Bickel 2002), any reward paid out after long delays as a sum over many trials constitutes a less direct and less motivating event than a reward delivered after every trial. Thus the conceptual and methodological differences of the human imaging studies done so far considerably constrain the explanatory power of animal data.
The present study aimed to narrow the gap between human and animal temporal discounting studies by addressing a number of questions that are particularly relevant for animal experiments. Using time courses in the range of seconds rather than minutes, days or months, we searched for influences of reward delay on scalar, blood-oxygen-level-dependent (BOLD) brain responses to reward predicting stimuli and rewards in the ventral striatum rather than investigating the activations of separate delay systems. The ventral striatum constitutes the key component of the brain's reward value system. It receives inputs from other well-known reward structures, including dopamine neurons and orbitofrontal cortex, and often shows the strongest reward BOLD response among all brain structures (Kable and Glimcher 2007; O'Doherty 2004). We presented Pavlovian conditioned reward-predicting stimuli as the most direct and parsimonious way of eliciting and interpreting brain responses without potentially confounding choice behavior, and combined them with quantitative psychometric tests for time perception and reward valuation. We used indicators for monetary reward in every trial rather than hypothetical or very distant payoffs. Different intertrial interval schedules helped us to distinguish reward delay from reward rate discounting.
Participants performed the temporal discounting task during scanning with functional magnetic resonance imaging (fMRI). Specific visual stimuli predicted one of four delays of 4, 6, 9, or 13.5 s, after which a picture of a £ 20 UK note (or a £ 5 note in magnitude tests) appeared as reward (Fig. 1A). Participants were informed that they would receive a percentage of the displayed sum as cash money immediately after scanning. We employed two intertrial interval (ITI) schedules. The fixed ITI schedule used a constant mean ITI of 9.375 s irrespective of reward delay (Fig. 1B, top); thus shorter delays resulted in higher, trial-specific reward rates (reward/unit time). By contrast, the adjusted ITI schedule compensated longer delays by shorter ITIs to obtain constant mean cycle lengths of 17.5 s and constant reward rates (Fig. 1B, bottom). Overall reward density was the same in the two schedules.
The study used additional behavioral tasks outside the scanner. We employed a modified peak interval procedure (PIP) to assess subjective time perception and pleasantness ratings to indicate learned reward value. The intertemporal choice task served to measure the subjective value of delayed rewards at choice indifference. Here participants chose between an adjustable immediate reward and a fixed delayed reward (£ 20) using an iterative convergent staircase procedure [parameter estimation by sequential testing (PEST)]. To determine behavioral discounting, we fit the indifference values to hyperbolic and exponential functions based on the PIP estimated subjective delays.
We performed two analyses to establish the relationships between BOLD (blood-oxygen-level-dependent) responses to reward delay predicting stimuli and the subjective valuation of delayed rewards. In the first analysis, we regressed BOLD responses on intertemporal choice indifference values for each delay in each participant, rather than on fitted functions. For the second analysis, we first fit the BOLD responses to hyperbolic and exponential functions using PIP estimated subjective delays. Then we correlated the individual discounting factors between BOLD and behavioral responses.
We analyzed responses to the reception of reward in a similar way, and in addition determined their relationship to the subjective uncertainty in reward timing measured in the PIP.
Fifteen right-handed healthy individuals (mean age: 26.7 yr; range: 22–34 yr; 7 females) participated in both the behavioral and scanning tests. A further 13 individuals (mean age: 22 yr, range: 18–28 yr; 5 females) participated in a magnitude control test. All participants were right-handed and had normal or corrected-to-normal vision in the scanner. Participants were screened to ensure they satisfied MRI safety requirements. The female participants were not pregnant. We excluded persons with prior neurological or psychiatric illness but were not allowed to inquire about first-degree relatives. All participants reported being healthy and had no recent or current medication except four women using contraceptives. All participants were current or former university students and gave informed written consent. We knew many of them personally; thus they were unlikely to be drug addicts, although this was not systematically confirmed. Smoking, alcohol consumption, toxic substance exposure and menstrual cycle were not monitored. The Local Research Ethics Committee of the Cambridgeshire Health Authority approved the study.
The experiment employed three different tasks on 15 participants, namely the temporal discounting task used before and during scanning, the peak interval procedure for assessing the subjective perception of the duration of the reward delay outside the scanner, and the intertemporal choice task for assessing the subjective valuation of the delayed rewards outside the scanner. We employed a separate group of 13 additional participants to specifically investigate the effects of reward magnitude on discounting. In all tasks artificial, visual images were presented as conditioned stimuli indicating different reward delays. Rewards consisted of pictures of British monetary bank notes or numbers representing monetary values in similar ranges. Participants were instructed that a percentage of each monetary amount shown would be paid out at the end of the session. This percentage was 1% but was not indicated to the participants to prevent calculations during scanning. Throughout the training and scanning, the total points accumulated were displayed and updated at the time of reward delivery. In error trials, a red square appeared in the middle of the screen, and the trial was repeated later within the same block. Stimulus delivery on a computer monitor and operant reactions were controlled using purpose written software in Matlab 7.01 and Cogent 2000 (Mathworks, Natick, MA; Wellcome Department of Imaging Neuroscience, London, UK).
TEMPORAL DISCOUNTING TASK.
This task involved Pavlovian conditioned predictors of reward delay and a conditional motor reaction for discriminating the delay predicting stimuli. In each trial, one of four possible conditioned stimuli appeared at the center of the monitor. Within 1 s, participants pressed one of four buttons with their right hand. Each button was specifically associated with one reward delay predicting stimulus. Participants received the reward after the delay. Stimuli were counterbalanced across participants. Each stimulus terminated after delays of 4, 6, 9, or 13.5 s, respectively, and was replaced by a 1-s presentation of a £ 20 note as reward at the center of the monitor (Fig. 1A). Reward delay was defined as the interval between stimulus onset and reward onset. Incorrect or late button presses resulted in presentation of a red square and went unrewarded. Data from incorrect trials were disregarded.
We used two different ITI schedules in separate trial blocks (Fig. 1B), indicated by different background colors on the monitor and employing two different sets of visual stimuli. Within each ITI schedule, trial types varied pseudorandomly. In the fixed ITI schedule, the ITI lasted for 7.375 s plus a duration drawn from a truncated Poisson distribution with mean of 2.0 s and maximum of 8.0 s, irrespective of reward delay. Cycle length was defined as sum of: stimulus duration +1 s reward display + ITI to next stimulus onset. Mean cycle length across all four delay trial types was 17.5 s. In the adjusted ITI schedule, cycle length across trial types was constant with 17.5 s mean (15.5 s + Poisson-mean of 2.0 s and 8.0 s maximum) by compensating different stimulus durations by ITIs. Overall reward density (£/s) was identical in both ITI schedules.
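As a rough illustration of the fixed ITI schedule only, the following sketch draws ITIs as the constant 7.375 s plus a Poisson-distributed jitter with mean 2.0 that is redrawn whenever it exceeds 8.0. Treating the jitter as whole seconds and truncating by redrawing are assumptions, as are all function names; the original Matlab/Cogent code is not described at this level of detail.

```python
import math
import random

FIXED_BASE_S = 7.375   # constant part of the fixed ITI (from the text)
JITTER_MEAN_S = 2.0    # Poisson mean of the jitter (from the text)
JITTER_MAX_S = 8.0     # truncation limit of the jitter (from the text)

def poisson_draw(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def truncated_jitter(rng):
    """Assumption: truncation is done by redrawing values above the maximum."""
    while True:
        jitter = poisson_draw(JITTER_MEAN_S, rng)
        if jitter <= JITTER_MAX_S:
            return float(jitter)

def fixed_iti(rng):
    """One ITI of the fixed schedule, independent of the reward delay."""
    return FIXED_BASE_S + truncated_jitter(rng)

rng = random.Random(0)  # seeded only to make the sketch reproducible
itis = [fixed_iti(rng) for _ in range(10000)]
mean_iti = sum(itis) / len(itis)
print(f"mean fixed ITI over {len(itis)} draws: {mean_iti:.3f} s")
```

Because values above 8.0 are very rare for a Poisson mean of 2.0, the truncation changes the mean jitter only marginally.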
We tested the effects of reward magnitude on temporal discounting in the separate group of 13 participants by using 5 and 20 £ notes and three reward delays, 4, 8, and 12 s. The ITI was 12 s fixed +3 s mean (varying according to Poisson distribution truncated at 9 s), irrespective of the reward delay.
The peak interval procedure (PIP) assessed in an objective manner the subjective time perception of reward delays (Roberts 1981). We tested the main group of 15 participants, and the separate group of 13 participants for reward magnitude effects. In addition to the subjective valuation of delayed rewards, temporal delays themselves are perceived and processed in a subjective manner with variations among individuals (Meck 2005), and a comprehensive view on timing processes should incorporate both objective and subjectively estimated delays. In unrewarded PIP test trials, the stimulus outlasted the normal reward time by three times the stimulus-reward interval. In addition to pressing one of the four specific buttons associated with the predicted delays of 4, 6, 9, or 13.5 s, participants pressed a fifth button to indicate the expected time of reward (PIP button). We replaced the usual multiple button press of standard PIPs by the single press for convenience of responding and reduction of behavioral errors. Subsequently, pressing a sixth button could terminate a proportion of PIP trials. In 70% of trials, the sixth button was active only if the usual delay to reward had elapsed; in the remaining 30% of trials, it would terminate the trial even if the usual delay to reward had not yet elapsed. This arrangement prevented participants from using the active or inactive status of the sixth button as an indicator for their response (Rakitin et al. 1998).
INTERTEMPORAL CHOICE TASK.
Microeconomic theory refers to the subjective value of outcomes as utility, and temporal discounting of utility would establish a utility function of values and assess its change with delays (Kreps and Porteus 1978). For reasons of simplicity, our study followed the psychological tradition and measured the single scalar variable of value. We used the adjusting amount procedure together with the PEST procedure (Luce 2000) to assess in an objective manner the subjective value of rewards delivered after different delays (Richards et al. 1997) in the main group of 15 participants. The additional 13 participants tested only with reward magnitude did not undergo the PEST procedure; we used pleasantness ratings instead to assess their subjective reward valuation. The PEST procedure resembled a staircase method in psychophysics. It employed the same stimuli and rewards as previously learned in the discounting task, using both fixed and adjusted ITI schedules. In each trial, participants were presented with one of the previously learned stimuli predicting delayed £ 20 and an alternative stimulus predicting an immediate amount. The immediate amount started at 50% of the maximum amount, iteratively changed value to produce preference reversals while halving the step size on every reversal and thus approached the choice indifference probability of P = 0.5. Participants chose by differential button press between a standard conditioned stimulus and an adjusted amount of reward presented as a one decimal real number shown immediately after button press. The immediate amount was adjusted until participants chose the immediate and delayed options with equal probability of P = 0.5 each (choice indifference). Thus the immediate amount at choice indifference determined the subjective value of the £ 20 delivered at each delay. 
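The adjusting-amount staircase described above can be sketched as follows. This is a minimal simulation under stated assumptions, not the procedure as implemented in the study: the initial step size (£ 5), the stopping rule (step below £ 0.1), the trial limit, and the deterministic simulated participant with hyperbolic discounting are all hypothetical choices made for illustration.

```python
def hyperbolic_value(amount, k, delay):
    """Subjective value of a delayed amount, V = A / (1 + k*D)."""
    return amount / (1.0 + k * delay)

def pest_indifference(choose_immediate, amount=20.0, start_step=5.0,
                      min_step=0.1, max_trials=60):
    """Adjust the immediate amount until choices reverse around indifference.

    choose_immediate: callable taking the immediate amount and returning
    True if the (simulated) participant picks the immediate option.
    """
    immediate = 0.5 * amount      # start at 50% of the maximum amount
    step = start_step             # assumed initial step size
    previous = None
    for _ in range(max_trials):
        if step < min_step:
            break
        choice = choose_immediate(immediate)
        if previous is not None and choice != previous:
            step /= 2.0           # halve the step on every preference reversal
        # Make the immediate option less attractive if it was chosen, else more.
        immediate += -step if choice else step
        immediate = min(max(immediate, 0.0), amount)
        previous = choice
    return round(immediate, 1)    # amounts were shown with one decimal

# Hypothetical participant: hyperbolic discounter with k = 0.1 per second.
true_value = hyperbolic_value(20.0, 0.1, 13.5)
estimate = pest_indifference(lambda offered: offered > true_value)
print(f"true value {true_value:.2f}, staircase estimate {estimate:.1f}")
```

Real choices are probabilistic near indifference, so an actual staircase would show noisier reversals than this deterministic chooser.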
By adjusting the immediate rather than the delayed reward, we obtained, at choice indifference, a direct readout of the subjective value of the delayed reward as close as possible to the stimulus and choice. Trials with button press latencies >1.0 s were discarded and repeated. We used the intertemporal choice task with the PEST procedure during one behavioral training session no more than 1 wk before scanning and immediately after scanning in the same session. In each participant, we fit the immediate reward amounts at choice indifference across the delays with different functions and obtained the discounting factors by minimizing the mean squared errors. Employed functions were hyperbolic, V = A/(1 + kD), and exponential, V = A·e^(−kD), with V = value, A = amount = £ 20, D = delay (in s), and k = discounting factor. The goodness of fit was expressed by the “squared correlation coefficient” R² = 1 − (error sum of squares)/(total sum of squares) (which may become negative when fitting an imposed function).
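To make the fitting step concrete, the sketch below estimates the discounting factor k for the hyperbolic and exponential functions by minimizing the mean squared error and then computes the goodness of fit as 1 − SSE/SST. The grid search, its range (k from 0 to 1), and the example indifference values are assumptions for illustration; the study does not state which optimizer was used.

```python
import math

AMOUNT = 20.0  # A, the delayed amount in pounds (from the text)

def hyperbolic(k, delay):
    return AMOUNT / (1.0 + k * delay)

def exponential(k, delay):
    return AMOUNT * math.exp(-k * delay)

def mean_squared_error(model, k, delays, values):
    return sum((model(k, d) - v) ** 2 for d, v in zip(delays, values)) / len(values)

def fit_k(model, delays, values, k_max=1.0, n_grid=2001):
    """Grid search for the k that minimizes the mean squared error."""
    best_k, best_mse = 0.0, float("inf")
    for i in range(n_grid):
        k = k_max * i / (n_grid - 1)
        mse = mean_squared_error(model, k, delays, values)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse

def r_squared(model, k, delays, values):
    """1 - SSE/SST; can be negative because the function is imposed.

    Undefined (division by zero) if all indifference values are identical.
    """
    mean_v = sum(values) / len(values)
    sse = sum((model(k, d) - v) ** 2 for d, v in zip(delays, values))
    sst = sum((v - mean_v) ** 2 for v in values)
    return 1.0 - sse / sst

# Hypothetical indifference values (pounds) for one participant.
delays = [4.0, 6.0, 9.0, 13.5]
indifference = [16.5, 15.0, 13.5, 12.0]
for name, model in (("hyperbolic", hyperbolic), ("exponential", exponential)):
    k, _ = fit_k(model, delays, indifference)
    print(name, "k =", round(k, 4), "R2 =", round(r_squared(model, k, delays, indifference), 3))
```

In the study the delays entering the fit were the PIP estimated subjective delays rather than the nominal ones used here.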
All participants received full trial-and-error training <1 wk before scanning in one session using the temporal discounting task (30 trials for each delay for each ITI schedule), the PIP task (10–20 trial/delay for each ITI schedule), and the amount-adjusting procedure (1 PEST procedure/delay for each ITI schedule; main 15 participants only). For scanning, participants were placed on a moveable bed in the scanner with light head restraint to limit head movement during image acquisition. Participants viewed the computer monitor through a mirror fitted on top of the head coil and performed the temporal discounting task.
All participants rated the pleasantness of visual stimuli four times (before and after the training and scanning sessions) on a scale ranging from 1 = very unpleasant to 5 = very pleasant. We evaluated ratings statistically by repeated-measures ANOVA. An interaction analysis between trial type and time (before training and after scanning) assessed changes in pleasantness ratings induced by the conditioning procedure.
We acquired gradient echo T2*-weighted echo-planar images (EPIs) with BOLD contrast on a Siemens Trio 3.0 Tesla scanner (32 slices/volume, 2-s repetition time). Scanning in each participant was split into three sessions of approximately equal duration. Each session consisted of three blocks of fixed ITI and three blocks of adjusted ITI trials, and each block contained three to four randomly interspersed trials of each of the four delays. Block order was counterbalanced across participants. Thus each participant performed 30 trials for each of the four delays for each of the two ITI schedules. Depending on individual performance, 739–830 volumes were collected per session, together with 10 “dummy” volumes at the start of the scanning session. Scan onset times varied randomly relative to stimulus onset times. A high-resolution structural image (spoiled gradient recalled acquisition) was also acquired for each participant. Signal dropout in basal frontal and medial temporal structures due to susceptibility artifact was reduced by using a tilted plane of acquisition (30° to the anterior commissure-posterior commissure line, rostral > caudal). Scanning parameters were: echo time, 30 ms; field-of-view, 192 × 192 mm. The in-plane resolution was 3 × 3 mm, with a slice thickness of 3 mm and an interslice gap of 25%. High-resolution structural scans were coregistered to their mean EPIs and averaged together to permit anatomical localization of the functional activations at the group level.
Analysis of BOLD responses
Statistical Parametric Mapping (SPM2; Functional Imaging Laboratory, London, UK) served to spatially realign functional data, normalize them to a standard EPI template and smooth them using an isotropic Gaussian kernel with a full width at half-maximum of 8 mm. Time series in each block were high-pass filtered (to maximum of 1/120 Hz), and serial autocorrelations were estimated using a first-order autoregression model (AR-1). Functional data were analyzed by constructing a set of stick functions at the event-onset times for each of the four trial types, for error trials, and at the time of reward. The stick function regressors were convolved with a canonical hemodynamic response function (HRF) and its temporal derivative.
BOLD RESPONSES TO REWARD DELAY PREDICTING STIMULI.
A general linear model served to compute trial type specific parameter estimates, in particular regression slopes (betas), reflecting the strength of covariance between the measured brain activation and the modeled canonical response function for a given condition, at each voxel for each participant (Friston et al. 1995). Our standard general linear model (GLM) used a multiple linear regression described by y = α + β1*FD1 + β2*FD2 + β3*FD3 + β4*FD4 + β5*AD1 + β6*AD2 + β7*AD3 + β8*AD4 + β9*R + β10*E + β11*M1 + β12*M2 + β13*M3 + β14*M4 + β15*M5 + β16*M6 + ε, with y as BOLD response, α as y intercept, β's as slope parameter estimates, FD1-4 as stimuli following fixed ITIs 1–4, AD1-4 as stimuli following adjusted ITIs 1–4, R as reward, E as behavioral error, M1-6 as motion artifacts 1–6, and ε as residual. Reward, errors, and motion artifacts were modeled as regressors of no interest.
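How one column of such a design matrix could be built from event onsets is sketched below: a stick function at the scans nearest to the onsets is convolved with a double-gamma response function sampled at the 2-s repetition time. The double-gamma parameters (shapes 6 and 16, undershoot ratio 1/6), the 32-s kernel length, and the rounding of onsets to the nearest scan are assumptions that only approximate a canonical HRF; SPM2 models onsets at finer time resolution, and the temporal derivative is omitted here.

```python
import math

TR = 2.0  # repetition time in seconds (from the text)

def double_gamma_hrf(t):
    """Assumed double-gamma response: early peak minus a late undershoot."""
    if t <= 0.0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def stick_regressor(onsets_s, n_scans, kernel_scans=16):
    """Convolve unit sticks at the event onsets with the sampled response."""
    sticks = [0.0] * n_scans
    for onset in onsets_s:
        index = int(round(onset / TR))  # simplification: nearest scan
        if 0 <= index < n_scans:
            sticks[index] += 1.0
    kernel = [double_gamma_hrf(i * TR) for i in range(kernel_scans)]
    regressor = [0.0] * n_scans
    for i, stick in enumerate(sticks):
        if stick == 0.0:
            continue
        for j, h in enumerate(kernel):
            if i + j < n_scans:
                regressor[i + j] += stick * h
    return regressor

# Hypothetical onsets (s) of one trial type, e.g. stimuli predicting the 4-s delay.
fd1 = stick_regressor([10.0, 46.0], n_scans=40)
print([round(x, 3) for x in fd1[:12]])
```

One such column per regressor (FD1-4, AD1-4, R, E, M1-6) plus a constant would then enter an ordinary least-squares fit at each voxel.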
The analysis of BOLD responses to the reward delay predicting stimuli involved a second step in which we determined contrast estimates as linear combinations of the slope parameter estimates β1–β8, each multiplied by the discounted reward value at the corresponding delay (mean-corrected within the 4 delays of fixed and adjusted ITIs), using the behavioral indifference values from each individual participant. Effects of interest were expressed as regression slope coefficients (betas) and percentages of signal change and calculated relative to an implicit baseline. Using random-effects analysis, the relevant contrasts were entered into a series of one-way t-tests, simple regressions or repeated-measures ANOVAs with nonsphericity correction where appropriate.
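The contrast step can be illustrated with a small sketch in which the indifference values are mean-corrected within each ITI schedule and used as weights on the eight stimulus betas. The function names and all numbers are hypothetical; only the weighting scheme follows the description above.

```python
def mean_corrected(values):
    """Subtract the mean so that the weights within one schedule sum to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def value_contrast(betas_fixed, betas_adjusted, indiff_fixed, indiff_adjusted):
    """Weighted sum of beta1..beta8 using mean-corrected indifference values."""
    weights = mean_corrected(indiff_fixed) + mean_corrected(indiff_adjusted)
    betas = list(betas_fixed) + list(betas_adjusted)
    estimate = sum(w * b for w, b in zip(weights, betas))
    return estimate, weights

# Hypothetical single-voxel betas and indifference values (pounds) at 4, 6, 9, 13.5 s.
betas_fd = [0.40, 0.30, 0.20, 0.10]
betas_ad = [0.35, 0.30, 0.15, 0.10]
indiff_fd = [18.0, 16.0, 13.0, 10.0]
indiff_ad = [17.5, 16.0, 13.5, 11.0]
estimate, weights = value_contrast(betas_fd, betas_ad, indiff_fd, indiff_ad)
print("contrast estimate:", round(estimate, 3))
```

A positive estimate indicates that responses covary with the participant's own discounted values; a voxel with identical betas across delays yields an estimate of zero.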
Additional analyses served to further characterize the neural mechanisms of temporal discounting. In separate time course analyses, we made no assumptions about the shape of activations and used 16 finite impulse responses per trial, each response being separated from the next by one scan (2 s). We fit the peaks of BOLD responses to reward predicting stimuli to hyperbolic and exponential functions by the least mean squared errors method. To relate brain activation to behavioral discounting, we correlated the neural with the behavioral discounting factors.
To quantify the neural involvement in temporal discounting, we performed ROC analysis to calculate the probability with which an ideal observer could distinguish between any two different reward delays on the basis of BOLD responses to the stimuli, separately for discounters and nondiscounters (Chandrasekaran et al. 2007). We calculated for each trial the mean percentage signal change of the ventral striatal BOLD response identified by regression with the general linear model. We computed the probability of BOLD response equal to or higher than criterion for each combination of two distributions and plotted these probabilities against each other in two dimensions. The area under the ROC curve reflected the probability of discriminating between two delay predicting stimuli in the interval of P = 0.5 (chance) and P = 1.0 (perfect discrimination). We used a permutation test with 5,000 iterations to define statistical significance as the probability for the original ROC value being below or above a given percentile of the probability distribution of shuffled ROCs. For example, a P < 0.05 indicated an ROC below the 2.5th or above the 97.5th percentile of the shuffled distribution.
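A compact way to illustrate this ROC and permutation logic is given below. The area under the ROC curve is computed through its equivalence with the probability that a response drawn from one distribution exceeds a response drawn from the other (ties counted as one half), which avoids stepping through criteria explicitly. The two-sided P value based on the distance from 0.5, the seed, and the example data are assumptions and not necessarily the exact computation used in the study.

```python
import random

def roc_area(responses_a, responses_b):
    """Area under the ROC curve: P(a > b), with ties counted as 0.5."""
    wins = 0.0
    for a in responses_a:
        for b in responses_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(responses_a) * len(responses_b))

def permutation_test(responses_a, responses_b, n_iter=5000, seed=0):
    """Shuffle delay labels and compare the observed area with shuffled areas."""
    rng = random.Random(seed)
    observed = roc_area(responses_a, responses_b)
    pooled = list(responses_a) + list(responses_b)
    n_a = len(responses_a)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        shuffled = roc_area(pooled[:n_a], pooled[n_a:])
        # Two-sided: count shuffles at least as far from chance as observed.
        if abs(shuffled - 0.5) >= abs(observed - 0.5):
            extreme += 1
    return observed, extreme / n_iter

# Hypothetical single-trial signal changes (%) for a short and a long delay.
short_delay = [0.52, 0.61, 0.48, 0.70, 0.55, 0.66, 0.59, 0.63, 0.50, 0.68]
long_delay = [0.21, 0.30, 0.18, 0.35, 0.27, 0.40, 0.25, 0.33, 0.29, 0.38]
area, p_value = permutation_test(short_delay, long_delay)
print("ROC area:", area, "permutation P:", p_value)
```

The same function would be applied to every pair of delay predicting stimuli, separately for discounters and nondiscounters.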
We used the BOLD responses to predict the classification of each individual participant as a behavioral discounter or a nondiscounter with Fisher's linear discriminant analysis (Krzanowski 1988). We trained a classifier on the data obtained with the fixed ITI schedule and tested the classification on data from the adjusted ITI schedule, and vice versa. The significance of discrimination into the two groups as opposed to random classification was assessed with the χ2 test (P < 0.05).
All paired and unpaired two sample nonparametric comparisons used two-tailed values from Wilcoxon and Mann-Whitney tests on behavioral and BOLD data from groups of individual participants.
BOLD RESPONSES TO THE REWARD.
We used a similar general linear model as for the responses to the reward predicting stimuli but used the following regressors: y = α + β1*FR1 + β2*FR2 + β3*FR3 + β4*FR4 + β5*AR1 + β6*AR2 + β7*AR3 + β8*AR4 + β9*S + β10*E + β11*M1 + β12*M2 + β13*M3 + β14*M4 + β15*M5 + β16*M6 + ε, with y as BOLD response, α as y intercept, β's as slope parameter estimates, FR1-4 as rewards preceding fixed ITIs 1–4, AR1-4 as rewards preceding adjusted ITIs 1–4, S as stimuli, E as behavioral error, M1-6 as motion artifacts 1–6, and ε as residual. Stimuli, errors, and motion artifacts were modeled as regressors of no interest. The remaining analysis was identical to that used for the stimulus responses, although we fitted the BOLD responses only to exponential functions.
SELECTION OF REGION OF INTEREST (ROI).
We selected the ventral striatum as the prime a priori ROI for coding reward value, including its decrease with temporal delays (Elliott et al. 2000; Kable and Glimcher 2007; Knutson et al. 2005; Martin-Soelch et al. 2003; McClure et al. 2004, 2007; Tanaka et al. 2004; Tobler et al. 2007b; Yacubian et al. 2006). The ventral striatum includes the nucleus accumbens, the ventral caudate nucleus and putamen rostral to the anterior commissure. It was defined anatomically according to Rorden and Brett (2000), Martinez et al. (2003), and Murray et al. (2008). We report activations above a threshold of P < 0.05 with small volume (ventral striatum) correction for multiple comparisons using false discovery rate (Benjamini and Hochberg 1995) as implemented in the Pickatlas Toolbox (Maldjian et al. 2003). Reported voxels conform to Montreal Neurological Institute (MNI) coordinate space with the right-hand side of the image corresponding to the right side of the brain.
Performance in the PIP indicated that participants underestimated the shorter delays of 4, 6, and 9 s slightly, and the delay of 13.5 s by ∼1–2 s (Fig. 2A). The subjectively perceived delays, rather than their actual values, were then used as independent variables for the analysis of behavioral and brain responses.
Subjective pleasantness ratings for the eight pretrained, delay-predicting stimuli suggested mostly monotonic decreases in perceived reward value across delays (Fig. 2B; pooled fixed and adjusted ITI schedules: rho = −0.91, P = 0.02; fixed ITIs: rho = −1.0, P = 0.08; adjusted ITIs: rho = −0.8, P = 0.17; Spearman rank correlation). The ratings were not significantly affected in individual participants by brain scanning (P > 0.28 before versus after scanning, ANOVA; P > 0.1 for all post hoc two-sample comparisons, t-test).
The intertemporal choice task comprised a choice between a conditioned stimulus predicting the standard £ 20 after the previously learned delay (4, 6, 9, or 13.5 s) and an immediate reward of varying magnitude. We assessed choice preference as probability of choosing the immediate reward over the alternative, delayed reward while varying the size of the immediate reward according to the adjusting amount procedure. For all delays, choice preference for the immediate reward increased by stepping up its amount (Fig. 2C). The amount of the immediate reward in the PEST procedure converged regularly to identify the value of each delayed reward at choice indifference (probability P = 0.5 of choosing the immediate reward; Fig. 2D). Indifference values decreased monotonically across the four delays and fit both hyperbolic and exponential functions (Fig. 2, E and F; Table 1). The decreases in indifference values correlated positively with the pleasantness ratings across the four delays (pooled fixed and adjusted ITI schedules: rho = 0.22, P = 0.02; fixed ITIs: rho = 0.25, P = 0.05; adjusted ITIs: rho = 0.18, P = 0.18; Spearman rank test on individual participants).
Discounting factors k and correlation coefficients R² in the 15 participants ranged from k = 0.00 to k = 0.35 and from R² = 0.46 to R² = 0.57 for hyperbolic functions, and from k = 0.00 to k = 0.22 and from R² = 0.45 to R² = 0.59 for exponential functions (Table 1). The differences in R² did not reach statistical significance (P > 0.01) in comparisons between hyperbolically and exponentially fitted functions, between fixed and adjusted ITI schedules for both hyperbolic and exponential functions, and between actual imposed delays and the slightly shorter, PIP estimated delays.
For further analysis, we separated the seven strongest discounters from seven nondiscounters by median split of Spearman's rank coefficients rho of correlation between indifference values and delays (15 participants; means from fixed and adjusted ITIs). As with the whole population of participants, there were no significant differences in R² between hyperbolic and exponential functions, or between fixed and adjusted ITI schedules for hyperbolic and exponential functions (Table 2).
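The grouping step can be sketched as follows: a Spearman rank coefficient between delays and indifference values is computed for each participant, and participants are then split at the median. Dropping the median participant when the group size is odd (which yields 7 and 7 from 15) and returning rho = 0 for a participant whose indifference values do not vary are assumptions, as are the function names.

```python
def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for position in range(i, j + 1):
            ranks[order[position]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks; 0.0 if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumption: no variation is treated as no discounting
    return sxy / (sxx * syy) ** 0.5

def median_split(rho_by_participant):
    """Most negative rho first; the median participant is dropped if n is odd."""
    ordered = sorted(rho_by_participant, key=rho_by_participant.get)
    half = len(ordered) // 2
    return ordered[:half], ordered[len(ordered) - half:]

delays = [4.0, 6.0, 9.0, 13.5]
print(spearman_rho(delays, [18.0, 16.0, 13.0, 10.0]))  # strictly decreasing values
```

The more negative the coefficient, the more consistently a participant's indifference values fall with delay, so the first returned group corresponds to the discounters.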
These data suggest different degrees of reward value discounting at delays of only a few seconds within the investigated group of human participants. The discounting occurred despite constant reward rates in the adjusted schedule, suggesting delay rather than overall rate (amount per time) as the crucial factor determining the subjective temporal valuation of monetary rewards at delays of seconds.
BOLD responses to value predicting stimuli
GROUP AND SUBGROUP ANALYSES.
We regressed BOLD responses to the four stimuli against the behavioral indifference values of each participant measured in the intertemporal choice task, irrespective of any particular discounting model. The regression identified one large group and two small groups of voxels in the striatum in which BOLD responses decreased across the four delays according to the individual indifference values. Subsequent analysis revealed statistical significance with small volume correction in the large ventral striatal group of voxels (circle in Fig. 3 A; P < 0.05). The activation was significant for both one common set of regressors for the four delays in the two ITI schedules (8 regressors; P < 0.05) and for two separate sets of regressors for the two ITI schedules (2 × 4 regressors; P < 0.05). These data suggest that predictive reward value signals in the human ventral striatum decreased substantially when rewards were delayed by a few seconds, closely paralleling the discounting of subjective reward value measured by pleasantness ratings and behavioral indifference values.
In two control analyses, we investigated the nature of decreasing ventral striatal BOLD responses with increasing delays. First, we challenged the role of decreasing outcome values. We regressed BOLD responses to the difference between constant, undiscounted outcomes of £ 20 and individual, discounted indifference values at each delay. The regression identified a similar ventral striatal region as the discounted indifference values alone (yellow circle in Fig. 3B; P < 0.07, small volume correction ventral striatum), suggesting that discounted outcome values indeed provide better descriptors of ventral striatal activation than constant, undiscounted outcomes. Second, we assessed the influence of individual differences in behavioral discounting on BOLD responses. We regressed BOLD responses to the difference between indifference values averaged across all participants minus the individual indifference values and found a mild activation in the ventral striatum (blue circle in Fig. 3B; P < 0.05, uncorrected). Nevertheless, the data supported the notion that individual behavioral discounting provided a slightly better descriptor of ventral striatal activation than averaged indifference values. These control tests suggested that the decreases of ventral striatal BOLD responses in individual participants were indeed related to the temporal discounting of outcome value with increasing delays.
To analyze time courses, we used again the previous grouping of participants into seven discounters and seven nondiscounters based on Spearman's rank correlation. Analyses for both ITI schedules demonstrated that the temporal peaks of ventral striatal BOLD responses decreased progressively with increasing delays in the seven participants showing significant behavioral discounting. By contrast, the seven nondiscounters showed only insignificantly different peaks (Fig. 3C). Whereas this analysis used the peak voxel BOLD response, an analysis using 16 voxels centered on the peak voxel showed similarly decreased responses with increasing delays in the discounters, but only smaller and nonsystematic changes across delays in nondiscounters. Thus while bearing in mind the relatively small sample sizes of these subgroups of participants, there appeared to be differences in the decrease of BOLD responses with delays between these two subpopulations.
Regressions of peak BOLD responses to hyperbolic and exponential functions in both ITI schedules suggested differential decreases of BOLD responses with delays in discounters but not in nondiscounters (Fig. 3D). Every single comparison for different discounting functions, ITI schedules and PIP estimated versus actual delays demonstrated higher discounting factors k and higher correlation coefficients R2 in discounters compared with nondiscounters, taking data from means (Table 3) and groups of individual discounters versus nondiscounters (P < 0.01 to P < 0.07; Mann-Whitney test). Nonparametric correlations using Kendall's tau showed similarly differential decreases in BOLD responses (discounters: median tau = −0.667 for fixed and −1.0 for adjusted ITI schedules; nondiscounters: all tau = 0.0; P < 0.015 and P < 0.003, respectively; Mann-Whitney test). The correlation coefficients R2 of individual participants were slightly but significantly higher for exponential compared with hyperbolic functions (P = 0.005 and P = 0.003 for fixed and adjusted ITI schedules, respectively; P = 0.002 for both schedules combined; Wilcoxon test). Analyzing the BOLD responses relative to the PIP estimated or actual, imposed delays resulted in comparable degrees of discounting (Table 3). The data shown in Fig. 3, C and D, suggested that BOLD responses to reward delay predicting stimuli in the ventral striatum declined monotonically with increasing delay. Furthermore, the decreases of BOLD responses reflected the difference in behavioral discounting between the two groups in our sample of 15 participants.
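The Kendall's tau values can be read more concretely with a short sketch. With four delays there are six delay pairs, so in the absence of ties tau can only take values in steps of one third; a tau of −1.0 means that every pair is ordered in the decreasing direction, and −0.667 means that exactly one pair is out of order. The sketch assumes the simple tau-a variant (the text does not state which variant was used); the intermediate delays and peak values are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x2 - x1) * (y2 - y1)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical peak BOLD responses at four assumed delays (seconds).
delays = [4.0, 6.0, 9.0, 13.5]
print(kendall_tau_a(delays, [0.8, 0.6, 0.5, 0.2]))  # strictly decreasing
print(kendall_tau_a(delays, [0.8, 0.5, 0.6, 0.2]))  # one pair out of order
print(kendall_tau_a(delays, [0.5, 0.5, 0.5, 0.5]))  # flat responses
```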
The fixed ITI schedule varied both delay and rate of reward, whereas the adjusted ITI schedule varied the delay but not the rate of reward. We used the regression of BOLD responses on behavioral indifference values that identified the ventral striatal region shown in Fig. 3A to compare discounting between the two ITI schedules. Discounting differed only insignificantly between fixed and adjusted ITI schedules in all 15 participants, as shown when data from hyperbolic and exponential fitting were combined (P = 0.4) or evaluated separately (Table 4), and when evaluations from discounters and nondiscounters were separated (P = 0.41 and P = 0.35, respectively). As reward rate constituted by design the single difference between the two ITI schedules, the similarity of BOLD responses between the two ITI schedules suggests that rate coding was not a major factor explaining the decrease of ventral striatal BOLD responses with reward delay.
We next asked to what extent the temporal discounting in the ventral striatum (Fig. 3, C and D) might allow an ideal observer to discriminate between reward delays based on the BOLD responses. We used ROC analyses on averaged BOLD responses across all delay combinations and across the two ITI schedules and found significantly higher probabilities of criterion brain activations with shorter compared with longer delays in discounters (ROC area under the curve P = 0.85; P < 0.01 permutation test) but not in nondiscounters (ROC = 0.57; NS; Fig. 4A). Differences between two individual delays were frequently significant in discounters (Fig. 4B) but rarely so in nondiscounters (Fig. 4C), resulting in overall significantly higher ROC values in discounters compared with nondiscounters (P = 0.03; Mann-Whitney test). Separate analyses of the two ITI schedules also revealed significant discriminations of reward delays and differences between discounters and nondiscounters (4 of 6 ROC areas under the curve at P < 0.05 or P < 0.01 in discounters with each ITI schedule, but only 1 of 6 and 2 of 6 ROCs at P < 0.05 in nondiscounters).
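The ideal observer logic can be sketched under stated assumptions. The area under the ROC curve equals the probability that a randomly drawn response from the shorter delay exceeds a randomly drawn response from the longer delay, so 0.5 indicates no discrimination and 1.0 perfect discrimination. The sketch computes this area by pair counting and assesses it with a one-sided label permutation test; the exact procedure used for the reported values is not specified here, and the function names, sample values, and number of permutations are hypothetical.

```python
import random

def roc_auc(short_delay, long_delay):
    """Probability that a short-delay response exceeds a long-delay one.

    Ties count as one half; this equals the area under the ROC curve.
    """
    wins = 0.0
    for s in short_delay:
        for l in long_delay:
            if s > l:
                wins += 1.0
            elif s == l:
                wins += 0.5
    return wins / (len(short_delay) * len(long_delay))

def permutation_p(short_delay, long_delay, n_perm=2000, seed=0):
    """One-sided permutation P value for the observed area."""
    rng = random.Random(seed)
    observed = roc_auc(short_delay, long_delay)
    pooled = list(short_delay) + list(long_delay)
    n_short = len(short_delay)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign delay labels at random
        if roc_auc(pooled[:n_short], pooled[n_short:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical BOLD responses for a short and a long delay.
short = [1.0, 1.1, 0.7, 1.3, 0.9, 1.2]
long_ = [0.6, 0.8, 0.4, 1.0, 0.5, 0.3]
print(roc_auc(short, long_), permutation_p(short, long_))
```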
For further assessing the discrimination between stimuli predicting different reward delays, we performed linear discriminant analysis (Krzanowski 1988). We trained a classifier on data from the fixed ITI schedule pooled across all two-sample comparisons and used the obtained parameters for assessing discrimination in the adjusted ITI schedule. The results showed good discrimination in all 15 participants (P < 0.01; χ2 test). The reverse order, training on adjusted ITI schedule and testing on fixed ITI schedule data, led to a similar result (P < 0.01). Taken together with the ROC analyses, these data suggest good discrimination of BOLD responses to stimuli predicting delays ranging from 4 to 13.5 s.
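The cross-schedule logic of the classifier can be illustrated with a deliberately simplified sketch. It assumes a single feature (for example the peak BOLD response), two classes (a shorter and a longer delay), equal priors, and a shared variance, in which case linear discriminant analysis reduces to a threshold at the midpoint between the two class means. The data and function names are hypothetical, and the actual analysis pooled all two-sample comparisons.

```python
from statistics import mean

def train_lda_1d(class_a, class_b):
    """Univariate two-class LDA with equal priors and shared variance.

    Under these assumptions the decision boundary is the midpoint
    between the two class means.
    """
    mean_a, mean_b = mean(class_a), mean(class_b)
    threshold = (mean_a + mean_b) / 2.0
    sign = 1.0 if mean_a > mean_b else -1.0
    return threshold, sign

def predict(model, x):
    """Assign x to class 'a' or 'b' depending on its side of the boundary."""
    threshold, sign = model
    return "a" if sign * (x - threshold) > 0 else "b"

def accuracy(model, class_a, class_b):
    """Fraction of correctly classified responses."""
    hits = sum(predict(model, x) == "a" for x in class_a)
    hits += sum(predict(model, x) == "b" for x in class_b)
    return hits / (len(class_a) + len(class_b))

# Hypothetical responses: train on the fixed ITI schedule ...
fixed_short, fixed_long = [0.9, 1.1, 1.0, 0.8], [0.3, 0.5, 0.4, 0.6]
model = train_lda_1d(fixed_short, fixed_long)
# ... and test on the adjusted ITI schedule.
adjusted_short, adjusted_long = [1.0, 0.75, 0.9, 0.65], [0.2, 0.5, 0.72, 0.4]
print(accuracy(model, adjusted_short, adjusted_long))
```

Training on one schedule and testing on the other checks that the discrimination does not depend on schedule-specific features such as reward rate.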
The analyses separating discounters from nondiscounters suggested that the decreases of BOLD responses reflected the behavioral discounting of delayed rewards. We investigated the strength of this relationship by analyses on individual participants. The decay of BOLD responses across increasing delays differed substantially between individual participants with different degrees of behavioral discounting. Thus weak discounters showed no decay in BOLD responses, whereas participants with intermediate and strong behavioral discounting showed progressively steeper decreases of BOLD responses (Fig. 5A).
For more quantitative analyses, we performed Pearson correlations between discounting factors for BOLD and behavioral responses in individual participants, separately for hyperbolic and exponential functions. The correlations were significant with discount factors averaged across the two ITI schedules (P < 0.01; Fig. 5B). The BOLD-behavioral correlation was significantly better for exponential than hyperbolic fits (P < 0.002, z test). Similar correlation coefficients R2 and significances were obtained when discount factors were correlated separately for the two individual ITI schedules (Fixed ITI: hyperbolic R2 = 0.37, P < 0.05; exponential R2 = 0.36, P < 0.05. Adjusted ITI: hyperbolic R2 = 0.30, P < 0.05; exponential R2 = 0.31, P < 0.05). Differences in correlations between the two ITI schedules were insignificant for both hyperbolic and exponential fitting (P = 0.4 and P = 0.53, respectively; z score).
These results suggest that the degrees of hyperbolic and exponential decreases of BOLD responses to reward delay predicting stimuli matched behavioral discounting not only between the two categorical groups of discounters versus nondiscounters shown in the preceding text but also at the level of individual participants.
RELATIONSHIPS TO OBJECTIVE AND SUBJECTIVE REWARD MAGNITUDES.
The magnitude of reward influences behavioral discounting rates, and lower compared with higher magnitudes are associated with steeper discounting (Kirby and Marakovic 1995). We investigated the effects of reward magnitude on the decreases of BOLD responses in the separate group of 13 participants, using three delays of 4, 8, and 12 s and presenting £ 5 notes in random alternation with £ 20 notes for each delay. Although these participants did not undergo the PEST procedure, their pleasantness ratings suggested significantly stronger temporal discounting for £ 5 compared with £ 20 [hyperbolic: k(£5) = 0.09, k(£20) = 0.04; P < 0.02, Wilcoxon test; exponential: k(£5) = 0.06, k(£20) = 0.04; P < 0.01]. Their BOLD responses showed higher hyperbolic and exponential mean discounting factors k for the £ 5 note (k = 0.227, R2 = 0.63 and k = 0.132, R2 = 0.86, respectively) compared with the £ 20 note (k = 0.075, R2 = 0.72 and k = 0.055, R2 = 0.69, respectively). These differences were significant in group comparisons for each discounting function with discounters and nondiscounters pooled (P = 0.04; Wilcoxon test). For further analysis, we separated this group into six discounters and six nondiscounters based on median split of Kendall's tau obtained from the pleasantness ratings averaged over £ 5 and £ 20. The group of discounters showed substantially steeper, delay related decreases of BOLD responses for £ 5 compared with £ 20 (hyperbolic: P = 0.035; exponential: P = 0.068; Wilcoxon test; Fig. 6, A and B). Nondiscounters showed no BOLD decreases with £ 20 and only slight decreases with £ 5 (hyperbolic: P = 0.46; exponential: P = 0.34). Thus compatible with notions on behavioral discounting, lower compared with higher reward magnitudes appeared to be associated with steeper decreases of BOLD responses to delay predicting stimuli at delays of seconds.
A previous human imaging study suggested an influence of financial status on BOLD responses during learning, possibly through variations in the subjective valuation of reward (Tobler et al. 2007a). Our next analysis was based on the stronger effects of lower reward magnitude on BOLD decreases reported in the preceding text and the well-known reduction in subjective reward value through decreasing marginal utility with higher personal finances (Kreps 1990). We determined the assets in the main group of 15 participants and investigated the potential influence on BOLD responses. Indeed regression analysis revealed steeper decreases of ventral striatal BOLD responses in participants with higher compared with lower assets for delays of 13.5 s (P < 0.0001, R2 = 0.73; Pearson coefficient) but less so for 4-s delays (P = 0.056, R2 = 0.25; comparison 4 vs. 13.5 s: P = 0.07; z test; Fig. 6, C and D). The subgroup of seven discounters showed a similarly stronger positive effect of assets on discounting with 13.5- compared with 4-s delays, although the nondiscounters showed a substantial effect at 4 s without obvious explanation. Behavioral discounting assessed in the intertemporal choice task with the PEST procedure showed a similar relationship to assets, which, however, failed to reach significance (P = 0.3). The data indicate a possible, moderate influence of assets on the decrease of BOLD responses, suggesting steeper decreases in richer participants.
BOLD responses to reward
Given the relationships of BOLD stimulus responses to behavioral discounting, we asked whether BOLD responses to the delivery of reward might reflect similar relationships to subjective reward valuation. We regressed BOLD responses to the reward itself on individual behavioral indifference values and on linear, hyperbolic and exponential functions across the four reward delays. Each of these four regressions identified equally well an area in the ventromedial caudate nucleus in which peaks of reward responses increased with increasing delays in both the fixed and the adjusted ITI schedules (P < 0.05, ventral striatum small volume corrected; Fig. 7A). Correlation coefficients R2 differed slightly but insignificantly between fixed and adjusted ITI schedules (P = 0.73; Wilcoxon test; Fig. 7B). Thus BOLD responses to reward appeared to covary with temporal delays.
For analyzing the stimulus responses, we had split the group of participants into seven discounters and seven nondiscounters according to their subjective reward valuation at the time of the stimulus and found good differentiation of BOLD responses to the stimuli. We investigated whether this split might also affect the BOLD responses at the time of reward. However, the increases in BOLD reward responses with increasing delays occurred irrespective of the participants’ discounting at the time of the stimulus (Fig. 7C). Increases in discounters were only insignificantly stronger than in nondiscounters, with both ITI schedules (P > 0.5). Thus although the relatively small sample size limits further conclusions, the presence of increases of reward BOLD responses in nondiscounters (Fig. 7C) contrasted with the absence of decreases of stimulus BOLD responses in the same participants (Fig. 3D).
Furthermore, there were no appreciable correlations between exponential BOLD and behavioral discounting factors for both ITI schedules combined or separately (averaged fixed and adjusted ITI: beta = −0.2, R2 = 0.02, P = 0.60; fixed ITI: beta = −0.3, R2 = 0.05, P = 0.45; adjusted ITI: beta = −0.1, R2 = 0.007, P = 0.78). This result indicates an absence of correlation of BOLD responses to the reward with behavioral discounting across individual participants and contrasts strongly with the significant correlations of stimulus responses with behavioral discounting (Fig. 5B).
The differences in delay related response changes between stimuli and reward suggested that the reward responses might not be governed by temporal discounting in the same way as the stimulus responses. In searching for alternative explanations, we explored temporal variations in reward occurrence and identified a group of voxels in the ventral striatum that showed higher reward responses with increasing statistical variance of subjective timing in the PIP task (Fig. 8A; P < 0.05, ventral striatum small volume corrected). Individual contrast estimates correlated well with variance in both fixed and adjusted ITI schedules in Pearson correlations (P < 0.02) and Kendall's tau test (P < 0.05), without significant differences between the two ITI schedules (P > 0.5, z test; Fig. 8B) or between discounters and nondiscounters (P = 0.98). This ventral striatal focus overlapped largely with the region activated by reward delay predicting stimuli (Figs. 3, A and B, and 6, A and B).
These data suggest that BOLD responses to reward increased with the delay at which the reward occurred after a predictive stimulus, even in nondiscounters. However, the increase was apparently unrelated to the discounting of reward value and might be related to the temporal uncertainty about the moment of reward delivery.
This human temporal discounting study employed delays of a few seconds typical for animal experiments. The short time courses would allow us to relate the human brain processes to mechanisms investigated at the level of single neurons. Psychophysical measures demonstrated good behavioral discounting within this time frame. BOLD responses in the ventral striatum to reward predicting stimuli decreased with increasing delay length. BOLD responses correlated with subjective reward valuations and conformed slightly better to exponential than hyperbolic models. The discounting occurred with Pavlovian reward predictors irrespective of choice, became steeper with lower objective and subjective reward magnitudes, and reflected the delay rather than rate of reward. In contrast to value discounting at the time of the stimuli, the BOLD responses to the reward itself reflected the temporal uncertainty associated with delayed rewards. Taken together, these data demonstrate substantial and differential influences of reward delays on human BOLD responses in a key reward structure, the ventral striatum. The occurrence of BOLD response decreases within the time frames of single neuron studies may reflect the known temporal sensitivities of dopamine and orbitofrontal neurons projecting to the ventral striatum.
Our behavioral task comprised Pavlovian reward predictors without choice and thus contrasted with previously employed choices between delayed rewards (Hariri et al. 2006; Kable and Glimcher 2007; McClure et al. 2004, 2007; Tanaka et al. 2004; Wittmann et al. 2007). The differential time estimates in the peak interval procedure suggested good discrimination of reward delays. Earlier work, and the current intertemporal choice data, showed appropriate valuation of rewards based on Pavlovian predictors in humans (Gottfried et al. 2003; Tobler et al. 2007b) and animals (Waelti et al. 2001). The current BOLD responses during reward discounting without choice suggest neural reward valuation preceding, and irrespective of, overt choices in the ventral striatum.
Our regression model for identifying decreases of ventral striatal BOLD responses employed the measured indifference values of intertemporal choices rather than fitted discounting functions. This approach took direct advantage of the measured behavioral data, avoided assumptions of particular discounting models and provided a less noisy and more accurate basis for the regression of BOLD reward value signals. The results demonstrated discounting by better correlations of BOLD responses with the measured indifference values compared with physical reward values. By fitting the BOLD responses to hyperbolic and exponential functions, we found good relationships between brain activation and behavioral reactions across individual participants.
The limited signal-to-noise ratio of BOLD responses in reward structures usually precludes straightforward single subject analyses typical for visual studies. Nevertheless, our BOLD data fit better with individual rather than averaged behavioral indifference values. BOLD responses of participants with significant behavioral discounting showed graded time courses, good fits to hyperbolic and exponential functions and good delay discrimination in signal detection measures (ROC P = 0.85, classifier P < 0.01). All of these measures were nondifferential in nondiscounters (Figs. 3,C and D, and 4), although data from subgroup sizes of seven participants should be considered as preliminary. The considerable variations in BOLD decreases correlated well with individual behavioral discounting factors (Fig. 5). Furthermore BOLD decreases were steeper in more wealthy participants who possibly attached lower outcome value to the £ 20 (marginal utility, Fig. 6B). Previous individualized analyses showed similar variations in individual BOLD responses and good correlations with behavioral discounting at delays of days and months (Kable and Glimcher 2007). Taken together the individualized data analyses have provided rich information on key aspects of temporal discounting.
BOLD responses to reward predicting stimuli
Hyperbolic and exponential functions described the current behavioral discounting almost equally well, irrespective of ITI schedules and of actual or PIP estimated delays. Hyperbolic discounting accounts for reported preference reversals with distant reward choices and adequately describes behavioral discounting in humans (Hariri et al. 2006; Kable and Glimcher 2007; Myerson and Green 1995; Rachlin and Green 1972), even with short delays of seconds (Reynolds and Schiffbauer 2004). Hyperbolic discounting holds also in rhesus monkeys (Hayden and Platt 2007; Kobayashi and Schultz 2008) and other animals (Ho et al. 1999; Richards et al. 1997) but is steeper than in humans (Stevens and Hauser 2004). Hyperbolic functions describe human BOLD decreases well with delays of days, weeks, and months (Kable and Glimcher 2007), similar to discounting related decreases in primate dopamine neurons with delays of seconds (Kobayashi and Schultz 2008). By contrast, our discounting-related decreases of BOLD responses were described slightly but significantly better by exponential than hyperbolic functions for delays in the range of seconds, although both functions provided statistically valid descriptions. As with behavior, BOLD decreases occurred equally well with actual as with estimated subjective reward delays, possibly reflecting the minor differences between the two measures and indicating robust discounting irrespective of actual or subjective time processes. The exponential discounting might correspond to the immediate exponential decay component (beta) of the dual process model, whereas the slow (delta) component may not be engaged at all in these time ranges (McClure et al. 2007). Taken together, these data suggest that both exponential and hyperbolic discounting functions provide adequate descriptions for rapid behavioral and neural discounting in the range of seconds. Better description by one of these models may be due to the specific behavioral situation rather than representing fundamental mechanistic or conceptual differences at these short time ranges, in line with previous reasoning (Schweighofer et al. 2006).
The current behavioral indifference values and BOLD responses revealed steep decreases in time ranges of seconds that resembled discounting in human studies over days, months and years (Hariri et al. 2006; Kable and Glimcher 2007; Krishnan-Sarin et al. 2007; McClure et al. 2004; Reynolds and Schiffbauer 2004; Wittmann et al. 2007). Although these comparisons suggest similarities in many aspects of temporal discounting across different delay ranges, the degree of discounting over a few seconds found previously (Tanaka et al. 2004) and currently would lead to very low subjective values after delays of months, both for exponential and hyperbolic functions. As this was not observed, the steepness of discounting appears to be scaled to the predicted range of delays. Adaptive brain processes may adjust the discounting factors to the delay range valid in each situation and produce good discrimination among values of delayed rewards within these time ranges. Although the issue was raised before (McClure et al. 2007), investigations comparing different delay ranges in the same participants are still lacking.
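The scaling argument can be made concrete with a short calculation. The sketch takes the behavioral discounting factor of k = 0.04 reported for £ 20 pleasantness ratings, assumes that k is expressed per second, and extrapolates the standard hyperbolic and exponential forms from seconds to days. The function names are hypothetical.

```python
import math

def hyperbolic_fraction(k, delay_s):
    """Remaining fraction of value under hyperbolic discounting."""
    return 1.0 / (1.0 + k * delay_s)

def exponential_fraction(k, delay_s):
    """Remaining fraction of value under exponential discounting."""
    return math.exp(-k * delay_s)

K_PER_SECOND = 0.04           # assumed unit: per second
SECONDS_PER_DAY = 24 * 60 * 60

for label, delay_s in [("13.5 s", 13.5),
                       ("1 day", SECONDS_PER_DAY),
                       ("30 days", 30 * SECONDS_PER_DAY)]:
    print(label,
          hyperbolic_fraction(K_PER_SECOND, delay_s),
          exponential_fraction(K_PER_SECOND, delay_s))
```

Under these assumptions, roughly 60% of the value remains at 13.5 s with either function, whereas after 30 days the hyperbolic form leaves about one hundred-thousandth of the value (well under one penny for £ 20) and the exponential form leaves effectively nothing. A discounting factor fixed at the value measured over seconds would therefore not produce the graded discounting observed over months.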
The current experiment used explicit indicators of monetary gain that occurred in every trial and were paid out immediately after the experiment. These outcomes contrasted with hypothetical, unpaid monetary rewards or sums of money represented by gift certificates or credit cards to be paid out after delays of weeks or months (Kable and Glimcher 2007; McClure et al. 2004; Wittmann et al. 2007). The long delays used in these studies make more explicit outcomes in each trial unfeasible, and the differences between hypothetical and actually promised rewards do not seem to influence behavioral temporal discounting much (Johnson and Bickel 2002). By contrast, explicit monetary outcomes after very short delays represent quite direct positive reinforcers for humans. Both explicit monetary and liquid outcomes are well discounted in humans in time ranges of seconds and minutes (McClure et al. 2007; Tanaka et al. 2004; this study), although discounting is less steep in humans compared with animals including macaque monkeys (Kobayashi and Schultz 2008). These results suggest convergence between human and animal studies and allow us to use data from invasive neurophysiological studies on animals to understand more of the neural mechanisms underlying human discounting (see following text).
Behavioral studies suggest that lower reward magnitudes produce steeper temporal discounting (Kirby and Marakovic 1995). The current study shows correspondingly steeper delay related decreases of BOLD responses for rewards of £ 5 compared with £ 20. We also found steeper decreases in moderately richer compared with poorer participants, which might reflect the lower subjective reward value and marginal utility with higher assets. Thus the steeper BOLD decreases with smaller objective and possibly subjective reward values may provide a neural correlate for the observed influence of reward magnitude on discounting behavior.
In our fixed ITI schedule, the rate of reward (reward/time unit) decreased with increasing reward delay, and the observed changes of BOLD responses to the stimuli and reward might reflect a decrease in reward rate rather than an increase in delay. To address the potential confound of rate, we used the adjusted ITI schedule that compensated increasing delays by decreasing ITI durations and thus produced a constant reward rate. Comparisons of BOLD responses between the two ITI schedules would reveal the influence of reward rate. This procedure is feasible with delays in the range of seconds or minutes and has been used before in neurophysiological animal experiments (Kobayashi and Schultz 2008; Roesch et al. 2006). Interestingly, none of our ventral striatal activations showed significant differences between the two ITI schedules (Figs. 3, C and D, 7, and 8), indicating that these BOLD responses reflected the influence of reward delay rather than rate. However, these tests were not designed to investigate reward rate without the confound of delay, and future experiments are required to identify a brain structure coding reward rate irrespective of stimulus-reward delay.
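The difference between the two schedules can be sketched with simple arithmetic. The sketch treats a trial cycle as the stimulus-reward delay plus the ITI and ignores stimulus and outcome durations. The intermediate delays (6.0 and 9.0 s), the fixed ITI of 10 s, the constant cycle of 20 s, and the function names are hypothetical values chosen for illustration.

```python
def reward_rate(amount, delay_s, iti_s):
    """Reward per unit time over one trial cycle (delay plus ITI)."""
    return amount / (delay_s + iti_s)

def adjusted_iti(delay_s, cycle_s):
    """ITI that keeps the total cycle length constant for a given delay."""
    iti = cycle_s - delay_s
    if iti < 0:
        raise ValueError("delay exceeds the assumed cycle length")
    return iti

AMOUNT = 20.0                    # pounds per trial
DELAYS = [4.0, 6.0, 9.0, 13.5]   # seconds; intermediate values assumed
FIXED_ITI = 10.0                 # seconds, hypothetical
CYCLE = 20.0                     # seconds, hypothetical

for d in DELAYS:
    fixed = reward_rate(AMOUNT, d, FIXED_ITI)
    adjusted = reward_rate(AMOUNT, d, adjusted_iti(d, CYCLE))
    print(d, round(fixed, 3), round(adjusted, 3))
```

In the fixed schedule the rate falls as the delay grows, whereas in the adjusted schedule it stays the same for every delay, so any remaining decrease of responses across delays cannot be attributed to rate.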
Discounting as component of reward value coding
Our study focused on the ventral striatum because of its strong inputs from reward coding neurons in the dopaminergic substantia nigra and ventral tegmental area (Schultz et al. 1993), orbitofrontal cortex (Padoa-Schioppa and Assad 2006), and amygdala (Paton et al. 2006). The ventral striatum itself contains neurons that code reward information (Carelli et al. 2000; Schultz et al. 1992). Dopamine and orbitofrontal neurons show decreases in reward value coding during temporal discounting (Kobayashi and Schultz 2008; Roesch and Olson 2005b; Roesch et al. 2006, 2007). As synaptic activity may constitute a principal source of BOLD signals (Logothetis 2008; Logothetis et al. 2001), the observed BOLD responses may be due to combined inputs from extrastriatal reward signals and from local, synaptically connected neurons. All human reward discounting studies identified BOLD response discounting in the ventral striatum (Hariri et al. 2006; Kable and Glimcher 2007; McClure et al. 2004, 2007; Tanaka et al. 2004; Wittmann et al. 2007). Additional human brain structures showing hyperbolic or immediate (beta) exponential BOLD decreases comprised more dorsal parts of striatum, medial prefrontal and orbitofrontal cortex, and anterior and posterior cingulate gyrus, whereas late discounting (delta) involved lateral prefrontal cortex, inferior and posterior parietal cortex, and posterior insula (McClure et al. 2004, 2007; Tanaka et al. 2004; Wittmann et al. 2007). Together these findings on reward value discounting in the ventral striatum underline the role of this structure as a key component of the brain's reward system.
The current temporal discounting-related, ventral striatal BOLD responses should be compared with the behavioral deficits in temporal discounting seen after lesions or pharmacological alterations of the ventral striatum or its input systems. The steepness of temporal discounting increases in rats after lesions of the nucleus accumbens or orbitofrontal cortex (Cardinal et al. 2001; Kheramin et al. 2002) and covaries inversely with dopamine D2 and serotonin receptor stimulation (Bizot et al. 1999; Kheramin et al. 2004; Mobini et al. 2000; Wade et al. 2000). The reduction of learning with longer reward delays is exacerbated in rats with lesions of nucleus accumbens (Cardinal and Cheung 2005). Temporal discounting is steeper in human impulsivity disorders (Ainslie 1975) associated with orbitofrontal lesions (Bechara et al. 2000), attention deficit disorder (Luman et al. 2005; Solanto 2002), and dopamine receptor polymorphisms related to pathological gambling (Perez de Castro et al. 1997). The presently reported decreases of ventral striatal BOLD responses provide a neural mechanism underlying temporal discounting in the ventral striatum inferred from lesion and psychopharmacological studies.
The coding of reward is often described by the key parameters defining the value of a reward, namely its magnitude and probability and their combination (formally the expected value of the probability distribution of a reward option). Neurons in the striatum and some of its input structures code these parameters (Cromwell and Schultz 2003; Tobler et al. 2005). Although these parameters define adequately the objective, physical value of a reward, additional factors such as individually weighted preferences and temporal delays determine the subjective value of rewards for the individual decision maker. The previous and current human and animal data on temporal discounting suggest that reward value is coded in subjective terms in the ventral striatal BOLD responses and in single dopamine and orbitofrontal neurons (Kobayashi and Schultz 2008; Roesch and Olson 2005b; Roesch et al. 2007). In this way, reward delay would constitute one component parameter for the coding of subjective reward value (Kable and Glimcher 2007).
BOLD responses to reward
The present BOLD responses at the time of the reward increased with increasing delays, similar to the reward responses of primate dopamine neurons after delays in similar time ranges (Kobayashi and Schultz 2008). As the BOLD responses increased in all our participants irrespective of their individual degree of behavioral discounting, they apparently did not reflect the subjective reward value due to the degree of discounting. Alternatively, the learned reward prediction might have been only partial with longer delays due to the dependency of conditioning on stimulus-reward intervals (Holland 1980). Ventral striatal BOLD responses capture reward prediction errors (McClure et al. 2003; O'Doherty et al. 2003). Thus the BOLD reward response might have reflected a graded reward prediction error defined by the difference between the partial prediction and the delivered full reward. In addition, the peak interval procedure revealed variations in the precision of temporal prediction of reward between individual participants, in particular with longer delays. The higher imprecision with longer delays likely reflects scale invariant time estimation (Gibbon 1977; Rakitin et al. 1998). A closer look showed a good correlation between the strength of the BOLD reward response and the temporal precision across individuals, irrespective of the ITI schedule. It is possible that reward occurrence at a poorly predicted moment elicited a temporal reward prediction error that might explain the observed BOLD response. Time-sensitive dopamine prediction error responses show similar reward responses after delays and might contribute to the BOLD response (Fiorillo et al. 2008). Taken together the current data revealed substantial differences in temporal relationships of BOLD responses between delay predicting stimuli and reward and suggest distinctively different influences of delay on these two reward-related events.
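The link between delay length and temporal uncertainty can be illustrated under the scalar timing assumption, in which the standard deviation of time estimates grows in proportion to the interval (constant coefficient of variation). The sketch assumes Gaussian timing noise and a coefficient of variation of 0.2; both are hypothetical choices, as are the function names and the 1-s window.

```python
from statistics import NormalDist

def timing_sd(delay_s, weber_fraction=0.2):
    """Scalar timing: SD of the time estimate is proportional to the delay."""
    return weber_fraction * delay_s

def prob_within(delay_s, window_s=1.0, weber_fraction=0.2):
    """Probability that the expected reward time falls within
    +/- window_s of the actual delay, assuming Gaussian timing noise."""
    dist = NormalDist(mu=delay_s, sigma=timing_sd(delay_s, weber_fraction))
    return dist.cdf(delay_s + window_s) - dist.cdf(delay_s - window_s)

for d in [4.0, 13.5]:
    print(d, timing_sd(d), round(prob_within(d), 3))
```

Under these assumptions the reward is anticipated to within 1 s far less often at 13.5 s than at 4 s, which would leave a larger temporal prediction error at the time of the later reward.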
Grant support was provided by the Wellcome Trust and the Cambridge MRC-Wellcome Behavioral and Clinical Neuroscience Institute.
We thank C. Harris and S. Kobayashi for helpful comments.
* L. Gregorios-Pippas and P. N. Tobler contributed equally to this study.
Copyright © 2009 the American Physiological Society
- Ainslie 1975.
- Bechara et al. 2000.
- Benjamini and Hochberg 1995.
- Bizot et al. 1999.
- Cardinal and Cheung 2005.
- Cardinal et al. 2001.
- Carelli et al. 2000.
- Chandrasekaran et al. 2007.
- Cromwell and Schultz 2003.
- Elliott et al. 2000.
- Fiorillo et al. 2008.
- Friston et al. 1995.
- Gibbon 1977.
- Gottfried et al. 2003.
- Hariri et al. 2006.
- Hayden and Platt 2007.
- Ho et al. 1999.
- Holland 1980.
- Johnson and Bickel 2002.
- Kable and Glimcher 2007.
- Kheramin et al. 2004.
- Kheramin et al. 2002.
- Kirby 1997.
- Kirby and Marakovic 1995.
- Knutson et al. 2005.
- Kobayashi and Schultz 2008.
- Kreps and Porteus 1978.
- Krishnan-Sarin et al. 2007.
- Krzanowski 1988.
- Laibson 1997.
- Logothetis 2008.
- Logothetis et al. 2001.
- Loewenstein and Prelec 1992.
- Luce 2000.
- Luman et al. 2005.
- Maldjian et al. 2003.
- Martin-Soelch et al. 2003.
- Martinez et al. 2003.
- McClure et al. 2003.
- McClure et al. 2007.
- McClure et al. 2004.
- Meck 2005.
- Mobini et al. 2000.
- Murray et al. 2008.
- Myerson and Green 1995.
- O'Doherty 2004.
- O'Doherty et al. 2003.
- Padoa-Schioppa and Assad 2006.
- Paton et al. 2006.
- Perez-de-Castro et al. 1997.
- Preuschoff, Bossaerts, and Quartz 2006.
- Rachlin and Green 1972.
- Reynolds and Schiffbauer 2004.
- Rakitin et al. 1998.
- Richards et al. 1997.
- Roberts 1981.
- Rorden and Brett 2000.
- Roesch et al. 2007.
- Roesch and Olson 2005a.
- Roesch and Olson 2005b.
- Roesch et al. 2006.
- Schultz et al. 1993.
- Schultz et al. 1992.
- Schweighofer et al. 2006.
- Solanto 2002.
- Stevens and Hauser 2004.
- Tanaka et al. 2004.
- Thaler 1981.
- Tobler et al. 2007a.
- Tobler et al. 2005.
- Tobler et al. 2007b.
- Wade et al. 2000.
- Waelti et al. 2001.
- Wittmann et al. 2007.
- Yacubian et al. 2006.