# Reward Value Coding Distinct From Risk Attitude-Related Uncertainty Coding in Human Reward Systems

## Abstract

When deciding between different options, individuals are guided by the expected (mean) value of the different outcomes and by the associated degrees of uncertainty. We used functional magnetic resonance imaging to identify brain activations coding the key decision parameters of expected value (magnitude and probability) separately from uncertainty (statistical variance) of monetary rewards. Participants discriminated behaviorally between stimuli associated with different expected values and uncertainty. Stimuli associated with higher expected values elicited monotonically increasing activations in distinct regions of the striatum, irrespective of different combinations of magnitude and probability. Stimuli associated with higher uncertainty (variance) elicited increasing activations in the lateral orbitofrontal cortex. Uncertainty-related activations covaried with individual risk aversion in lateral orbitofrontal regions and risk-seeking in more medial areas. Furthermore, activations in expected value-coding regions in prefrontal cortex covaried differentially with uncertainty depending on risk attitudes of individual participants, suggesting that separate prefrontal regions are involved in risk aversion and seeking. These data demonstrate the distinct coding in key reward structures of the two basic and crucial decision parameters, expected value and uncertainty.

## INTRODUCTION

Every day we make decisions about which outcomes to pursue, yet we know little about how the brain processes the simplest parameters that determine our decisions. Pascal (1948) used the emerging probability theory to postulate a formal description of decision-making. He conjectured that humans tend to choose the option with the highest expected (mean) value of the probability distribution of outcomes (expected value as the sum of all probability-weighted values of the distribution, the first moment of a distribution). However, most realistic choices involve some degree of uncertainty about the outcome, and individuals need to take this uncertainty into account when making decisions. Uncertainty can be expressed by the variance of the probability distribution (variance as the sum of probability-weighted squared differences from expected value, the second moment). Variance reflects the spread of a distribution and indicates how far the possible values lie from the expected value. Variance is perceived as “risk” and refers to how much the decision-maker stands to gain, not to gain, or to lose relative to the expected (mean) value when the probabilities are known (Kreps 1990; Real 1991). Probability by itself is not a good, monotonic measure of uncertainty. For example, in a two-outcome situation (reward vs. no reward), uncertainty is maximal at *P* = 0.5 and decreases toward higher and lower probabilities as it becomes increasingly certain that something or nothing will be obtained, respectively. Modern economic decision theories, such as expected utility theory and prospect theory, build on the basic terms of expected value and uncertainty and incorporate them into the scalar decision variables of utility and prospect, respectively (Huang and Litzenberger 1988; Kahneman and Tversky 1979; Kreps 1990; Von Neumann and Morgenstern 1944).
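For the two-outcome situation described above, the non-monotonic relation between probability and uncertainty can be written out explicitly. The following is a worked sketch, assuming a reward of magnitude *m* obtained with probability *p* and a no-reward outcome of value 0:

```latex
\begin{align*}
\mathrm{EV}  &= p\,m + (1-p)\cdot 0 = p\,m,\\
\mathrm{Var} &= p\,(m - \mathrm{EV})^{2} + (1-p)\,(0 - \mathrm{EV})^{2}\\
             &= p\,m^{2}(1-p)^{2} + (1-p)\,p^{2}m^{2}\\
             &= m^{2}\,p\,(1-p)\,\bigl[(1-p) + p\bigr]
              = m^{2}\,p\,(1-p).
\end{align*}
```

Under these assumptions, expected value grows linearly with *p*, whereas variance is zero at *p* = 0 and *p* = 1 and reaches its maximum of *m*²/4 at *p* = 0.5.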

Just as we need to understand the function of the retina before investigating visual perception, we need to understand the neural processing of expected value and variance that constitute the most basic input variables for economic decision-making. As expected value is the summed product of magnitude and probability, a neural signal for expected value should reflect the product irrespective of its components. Previous studies report distinct neural signals for magnitude and probability in striatum and orbitofrontal cortex (Breiter et al. 2001; Critchley et al. 2001; Delgado et al. 2003; Dreher et al. 2006; Elliott et al. 2003; Knutson et al. 2001, 2005; O'Doherty et al. 2001; Volz et al. 2003) but without describing a common signal related to expected value irrespective of the two components (Tobler et al. 2005). The mathematical decomposition of expected utility into expected value and variance (Huang and Litzenberger 1988; Stephens and Krebs 1986) and the variations in risk attitude among different behavioral situations (Caraco et al. 1980, 1990) suggest that some brain structures might process uncertainty separately from expected value.

Various brain structures appear to be engaged in situations involving uncertainty. The altered risk sensitivity and gambling in humans and animals after brain lesions suggest that the orbitofrontal cortex is involved in processing information about the uncertainty of outcomes (Bechara et al. 1994, 2000; Hsu et al. 2005; Miller 1985; Mobini et al. 2002; Sanfey et al. 2003). The anterior cingulate is active with choice conflicts during financial risk-taking (Kuhnen and Knutson 2005), the amygdala, orbitofrontal, and dorsolateral prefrontal cortex are engaged in ambiguous situations with unknown probabilities (Hsu et al. 2005; Huettel et al. 2006), and the midbrain and striatum are involved in the coding of variance combined with magnitude or probability (Dreher et al. 2006) and in classification learning (Aron et al. 2004). However, neural signals reflecting the simple uncertainty of reward well separated from the coding of other decision variables such as expected value have not been located.

## METHODS

### Participants

Sixteen right-handed healthy participants (mean age: 27 yr; range: 20–41 yr; 8 females) were investigated. Participants were preassessed to exclude prior histories of neurological or psychiatric illness. All participants gave informed consent, and the study was approved by the Joint Ethics Committee of the National Hospital for Neurology and Neurosurgery (UK).

### Behavioral procedure

Participants were placed on a moveable bed in the scanner with light head restraint to limit head movement during image acquisition. Participants viewed a computer monitor through a mirror fitted on top of the head coil. To study the processing of economic parameters independent of choice, subjects performed in a simple conditioning paradigm in the scanner. We determined individual risk attitudes in a separate choice task outside the scanner (see following text). At the beginning of a trial in the main paradigm, single visual stimuli appeared for 1.5 s in one of the four quadrants of the monitor. Outcomes appeared 1 s after the stimulus for 0.5 s below the stimulus on the monitor such that outcome and stimulus presentation co-terminated. Intertrial intervals varied between 1 and 8 s according to a Poisson distribution with a mean of 3 s. In each trial, we randomly presented one of twelve visual stimuli, each predicting reward with a specific magnitude and probability. We used four levels of reward magnitude, which varied between 100 and 400 points in steps of 100, and five levels of reward probability, which varied between *P* = 0.0 and *P* = 1.0 in steps of 0.25. The stimuli and the rewarded versus unrewarded outcomes alternated randomly within the boundaries defined by the probabilities (48 trials for *P* = 1.0; e.g., 36 rewarded and 12 unrewarded trials for *P* = 0.75), thus producing a measured mean of reward identical to the expected value. Throughout the experiment, the total points accumulated were displayed and updated in rewarded trials at the time of reward delivery. Four percent of the total points were predictably paid out as British pence at the end of the experiment.
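One way to read the phrase "within the boundaries defined by the probabilities" is that each stimulus received an exact number of rewarded trials, presented in random order. The sketch below illustrates that reading only; the function name `outcome_schedule` and the seed are hypothetical, and the actual randomization was done in Cogent 2000 rather than with this code.

```python
import random

def outcome_schedule(magnitude, probability, n_trials=48, seed=0):
    """Shuffled outcomes (in points) with an exact proportion of rewarded trials.

    Hypothetical helper: it fixes the count of rewarded trials at
    probability * n_trials and randomizes only their order.
    """
    n_rewarded = round(probability * n_trials)
    outcomes = [magnitude] * n_rewarded + [0] * (n_trials - n_rewarded)
    random.Random(seed).shuffle(outcomes)
    return outcomes

# Example: a stimulus predicting 200 points with P = 0.75 over 48 trials.
schedule = outcome_schedule(magnitude=200, probability=0.75)
measured_mean = sum(schedule) / len(schedule)
expected_value = 200 * 0.75
```

Because the number of rewarded trials is fixed rather than sampled, the measured mean reward over the 48 trials matches the expected value exactly, as stated in the text.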

The visual stimuli were specific combinations of attributes drawn from two visual dimensions, shape and color, indicating reward magnitude and probability, respectively. For example (Fig. 1*B*), four orange circles could predict 400 points with *P* = 0.5, whereas two dark red circles could predict 200 points with *P* = 1.0. Both stimuli were associated with different combinations of magnitude and probability but the same expected value (200 points). We counterbalanced the meaning of dimensions (shape or color of stimuli) and the direction in which they changed (for shape: number of circles per stimulus; for color: relative level of yellow or red) across participants. Stimulus delivery was controlled using Cogent 2000 software (Wellcome Department of Imaging Neuroscience, London, UK) as implemented in Matlab 6.5 (Mathworks, Natick, MA).

The first two moments of the 12 probability distributions associated with the 12 respective stimuli were calculated according to the following formulae:

$$\mathrm{EV} = \sum_{i} (m_{i} \times p_{i})$$

$$\mathrm{variance} = \frac{\sum_{i} (m_{i} - \mathrm{EV})^{2}}{n},$$

which is equivalent to

$$p \times (m - \mathrm{EV})^{2} + (1 - p) \times (0 - \mathrm{EV})^{2}.$$

In the formulae, *m* is magnitude of reward, *p* is probability of reward, *n* is number of elements (outcomes associated with each stimulus), and *i* is index i = 1…*n.* Our probability distributions have *n* = 1, 2, or 4 elements for *P* = 0.0 or 1.0, *P* = 0.5, and *P* = 0.25 or 0.75, respectively.
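As a numerical check, the two forms of the variance can be compared for individual stimuli. This is a minimal sketch, assuming that each stimulus is expanded into *n* equally likely outcomes (so each outcome carries weight 1/*n*); the function names are hypothetical.

```python
from fractions import Fraction

def outcome_distribution(magnitude, probability):
    """Expand a stimulus into n equally likely outcomes (n = 1, 2, or 4)."""
    p = Fraction(probability).limit_denominator(4)
    n = p.denominator              # number of elements in the distribution
    n_rewarded = p.numerator       # elements that carry the reward
    return [magnitude] * n_rewarded + [0] * (n - n_rewarded)

def moments_from_outcomes(outcomes):
    """EV and variance from the n-element form, each outcome weighted 1/n."""
    n = len(outcomes)
    ev = sum(outcomes) / n
    variance = sum((m - ev) ** 2 for m in outcomes) / n
    return ev, variance

def moments_two_outcome(magnitude, probability):
    """EV and variance from the equivalent reward versus no-reward form."""
    ev = probability * magnitude
    variance = (probability * (magnitude - ev) ** 2
                + (1 - probability) * (0 - ev) ** 2)
    return ev, variance

# The two example stimuli from the text share an EV of 200 points
# but differ in variance.
risky = moments_from_outcomes(outcome_distribution(400, 0.5))
safe = moments_from_outcomes(outcome_distribution(200, 1.0))
```

In this sketch both example stimuli have an expected value of 200 points, but only the stimulus with *P* = 0.5 has non-zero variance, which is the dissociation the design relies on.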

The conditioning procedure comprised a training and a testing phase. In the training phase, participants learned the meaning of the stimuli and how to perform the task while each stimulus was presented in eight consecutive trials. Earnings in the training phase did not contribute to the monetary earnings of participants, but accumulated points were nevertheless displayed. Participants were in the scanner during the training phase while structural scans were taken. Functional data were acquired in the test phase, comprising two sessions, each with 24 randomly alternating presentations of each stimulus. The task remained the same as during the training phase, but outcomes contributed to total earnings. In both training and testing phase, stimuli appeared in one of the four quadrants of the screen. The quadrant of stimulus appearance varied randomly between trials. Participants were instructed to press one of four buttons corresponding to the spatial quadrant of stimulus presentation. If they failed to press the correct button within 900 ms, the trial was aborted, a red “X” appeared, and 100 points were subtracted from the accumulated earnings. Error trials were repeated, and reported results correspond to correct trials in the testing phase.

### Data acquisition and analysis

Participants rated the pleasantness of visual stimuli before and after the experiment on a scale ranging from 5 = very pleasant to −5 = very unpleasant. We evaluated ratings statistically by repeated-measures ANOVA. An interaction analysis between trial type and time (before and after the experiment) tested for changes in pleasantness ratings induced by the conditioning procedure. In addition, we tested preference of participants among two concurrently presented stimuli, both before and after the experiment. Pairs of stimuli either had the same or, in control trials, different expected value. The preference tests served to assess risk attitudes within the range of magnitudes and probabilities used. Participants chose between stimuli associated with low and high uncertainty but the same expected value. Each time the participant chose the more certain stimulus, the factor of risk aversion increased by one, whereas choosing the more uncertain stimulus decreased it by one (*n* = 4 choices). A positive average factor indicated risk aversion, a negative factor indicated risk-seeking, and a zero factor risk neutrality.
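The risk attitude factor described above amounts to a signed count over the equal-expected-value choices. The sketch below is one possible implementation under that reading; the function names and the boolean encoding of choices are assumptions made for illustration.

```python
def risk_attitude_factor(chose_certain):
    """Signed count over choices between equal-EV stimuli.

    chose_certain: iterable of booleans, True when the participant chose the
    stimulus with lower uncertainty. Each such choice adds 1 and each choice
    of the more uncertain stimulus subtracts 1.
    """
    return sum(1 if choice else -1 for choice in chose_certain)

def classify_risk_attitude(factor):
    """Positive: risk averse; negative: risk seeking; zero: risk neutral."""
    if factor > 0:
        return "risk averse"
    if factor < 0:
        return "risk seeking"
    return "risk neutral"

# Hypothetical participant who chose the more certain stimulus in 3 of 4 trials.
factor = risk_attitude_factor([True, True, True, False])
label = classify_risk_attitude(factor)
```

With four choices the factor ranges from −4 to +4, matching the scale reported for the preference tests.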

We acquired gradient echo T2*-weighted echo-planar images (EPIs) with blood-oxygen-level-dependent (BOLD) contrast on a Siemens Sonata 1.5 Tesla scanner (slices/volume, 33; repetition time, 2.97 s). Depending on performance of participants, 405–500 volumes were collected per session, together with five “dummy” volumes at the start of the scanning session. Scan onset times varied randomly relative to stimulus onset times. A T1-weighted structural image was also acquired for each participant. Signal dropout in basal frontal and medial temporal structures due to susceptibility artifact was reduced by using a tilted plane of acquisition (30° to the anterior commissure-posterior commissure line, rostral > caudal). Imaging parameters were: echo time, 50 ms; field-of-view, 192 mm. The in-plane resolution was 3 × 3 mm; with a slice thickness of 2 mm and an interslice gap of 1 mm. High-resolution T1-weighted structural scans were coregistered to their mean EPIs and averaged together to permit anatomical localization of the functional activations at the group level.

Statistical Parametric Mapping (SPM2; Functional Imaging Laboratory, London, UK) served to spatially realign functional data, normalize them to a standard EPI template and smooth them using an isotropic Gaussian kernel with a full width at half-maximum of 10 mm. We used a standard rapid event-related fMRI approach in which evoked hemodynamic responses to each trial type are estimated separately by convolving a canonical hemodynamic response function with the onsets for each trial type and regressing these trial regressors against the measured fMRI signal (Dale and Buckner 1997; Josephs and Henson 1999). This approach makes use of the fact that the hemodynamic response function summates in an approximately linear fashion over time (Boynton et al. 1996). By presenting trials in strictly random order and using randomly varying intertrial intervals, it is possible to separate out fMRI responses to rapidly presented events without waiting for the hemodynamic response to reach baseline after each single trial (Dale and Buckner 1997; Josephs and Henson 1999). Functional data were analyzed by constructing a set of stick functions at the event-onset times for each of the 12 trial types. Rewarded and unrewarded trial types were modeled separately. The stick function regressors were convolved with a canonical hemodynamic response function (HRF). In separate time course analyses, we made no assumptions about the shape of activations and used eight finite impulse responses per trial, each response separated from the next by one scan (2.97 s). Participant-specific movement parameters were modeled as covariates of no interest.
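The construction of a trial regressor can be illustrated in a few lines. The sketch below is not the SPM2 implementation: it assumes a double-gamma response with commonly used shape parameters, uses hypothetical onset times, and fits a single regressor plus intercept, whereas the actual model fitted all trial types and the movement covariates jointly.

```python
import math

TR = 2.97  # repetition time in seconds, as reported for the acquisition

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma response at time t (s) after an event.

    The shape and ratio values are widely used defaults and are an
    assumption here, not parameters taken from the study.
    """
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)

def stick_regressor(onsets_s, n_scans, tr=TR):
    """Sticks at the event onsets convolved with the HRF, sampled once per scan."""
    return [
        sum(double_gamma_hrf(k * tr - onset) for onset in onsets_s)
        for k in range(n_scans)
    ]

def ols_beta(regressor, signal):
    """Least-squares slope of the signal on one regressor with an intercept."""
    n = len(regressor)
    mean_x = sum(regressor) / n
    mean_y = sum(signal) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, signal))
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    return sxy / sxx

# Hypothetical onsets (s) for one trial type; not taken from the experiment.
onsets = [10.0, 45.0, 80.0, 130.0, 190.0]
regressor = stick_regressor(onsets, n_scans=80)

# Noise-free synthetic signal, used only to check that the fit recovers beta.
signal = [2.0 * x + 10.0 for x in regressor]
beta = ols_beta(regressor, signal)
```

Because convolving a stick (delta) function with the HRF simply shifts the HRF to the event onset, the regressor is computed here as a sum of shifted responses, which relies on the approximately linear summation noted in the text.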

The general linear model served to compute trial type-specific betas, reflecting the strength of covariance between the brain activation and the canonical response function for a given condition at each voxel for each participant (see Friston et al. 1994 for detailed descriptions). The effects of interest (betas, percent of signal change) were calculated relative to an implicit baseline. Although the numbers of stimuli were counterbalanced during the experiment, the numbers of rewarded versus unrewarded trials varied due to the different reward probabilities. To compensate for different trial numbers in the general linear model, we equalized the weights of the less-frequent events by multiplication. Using random-effects analysis, the relevant contrasts of parameter estimates were entered into a series of one-way *t*-tests, simple regressions, or repeated-measures ANOVAs with nonsphericity correction where appropriate. We used a thresholding strategy as described previously (O'Doherty et al. 2002, 2003). For each analysis, in a priori brain regions identified in previous neuroimaging studies of reward processing (O'Doherty et al. 2002, 2003), including striatum and prefrontal cortex, we report activations above a threshold of *P* < 0.001 (uncorrected) with a minimum cluster size of 5 voxels in all participants. For time course plots, we also used MarsBaR (Brett et al. 2002), making no assumptions about the shape of activations, and applying eight finite impulse responses per trial, each response separated from the next by one scan (2.97 s). The dependent measure in time course plots is percentage signal change measured within spheres of 10 mm around peak voxels. Reported voxels conform to MNI (Montreal Neurological Institute) coordinate space, with the right side of the image corresponding to the right side of the brain.

## RESULTS

We used functional magnetic resonance imaging (fMRI) to investigate how human reward structures process expected reward value separately from uncertainty. We used an orthogonal design that fully dissociated reward magnitude, probability, expected value, and uncertainty. We used all-or-none binary probability distributions in which the probability of obtaining a reward varied between *P* = 0 and *P* = 1. Different conditioned stimuli were associated with different reward magnitudes and probabilities, and thus expected values as their product (Fig. 1, *A* and *B*). Our aim was to investigate the basic parameters as potential inputs to neural decision processes rather than the decision processes themselves. Therefore we used imperative situations in which we had full control over these parameters rather than behavioral choices.

The general linear model used for data analysis comprised separate regressors for the different stimuli associated with magnitude, probability, expected value, and uncertainty. Subsequent tests assumed that expected value-related brain activations covary with expected value without discriminating between different combinations of magnitude and probability that yield the same expected value. To identify uncertainty-related brain activations, the analysis assumed that uncertainty is highest for *P* = 0.5, where receiving or not receiving a reward is equally likely, and decreases toward lower and higher probabilities (variance as inverted U function of probability, Fig. 1*C*). Consequently, brain activations reflecting uncertainty would follow variance as inverted U function of probability. By contrast, a straightforward value signal would covary monotonically with the full range of probabilities from *P* = 0.0 to 1.0 (Fig. 1*C*). The distinction between the inverted U and the linear covariations with probability guided us in searching for separate brain signals for uncertainty and expected value.
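The separation between the two predicted response profiles can be checked numerically. The sketch below assumes a single magnitude of 200 points and the five probability levels used in the task, and computes variance from the two-outcome formula given in the Methods; `pearson_r` is a hypothetical helper written out to keep the example self-contained.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

magnitude = 200  # points; one of the magnitude levels, chosen for illustration
probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]

# Linear (value) profile and inverted-U (uncertainty) profile across P.
expected_values = [p * magnitude for p in probabilities]
variances = [
    p * (magnitude - ev) ** 2 + (1 - p) * (0 - ev) ** 2
    for p, ev in zip(probabilities, expected_values)
]

r_value_vs_uncertainty = pearson_r(expected_values, variances)
```

Across probability levels that are symmetric around *P* = 0.5, the linear and inverted-U profiles are uncorrelated, which is why a region following one profile can be distinguished from a region following the other.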

Previous neurophysiological studies guided our selection of investigated brain structures. Dopamine responses covary differentially with expected value and uncertainty (Fiorillo et al. 2003; Tobler et al. 2005). As fMRI primarily reflects afferent inputs to an area (Logothetis et al. 2001) and as the main projection areas of dopamine neurons are the striatum and prefrontal cortex (Lynd-Balta and Haber 1994; Williams and Goldman-Rakic 1998), we searched for striatal and prefrontal regions showing differential relations to expected value and uncertainty. We used the results from the linear regressions to search for brain activity that increased phasically at the time of conditioned stimuli when reward magnitude, probability, expected value, or uncertainty increased.

### Behavioral performance

We measured the pleasantness of stimuli before and after conditioning. Pleasantness ratings did not vary before conditioning [ANOVA: *F*(1,11) < 0.77, *P* > 0.67; regression: *r* < 0.12, *P* > 0.29] but did afterward [*F*(1,11) = 10.01, *P* < 0.0001], as a function of magnitude and probability (for both, *r* > 0.55, *P* < 0.0001; Fig. 1, *D* and *E*; Table 1). After the experiment, pleasantness ratings were higher with higher expected value (*r* = 0.53, *P* < 0.0001; Fig. 1*F*) but did not vary within pairs of stimuli that had the same expected value but different magnitude-probability combinations (*t*_{63} = −0.14, *P* > 0.89). Reaction time was significantly shorter for the highest compared with lowest expected reward value and showed weak negative correlation with probability (590 vs. 601 ms, *t*_{2368} = 2.3, *P* < 0.05; *r* = −0.85, *P* < 0.08) without changing significantly with magnitude. In separate choice tests, stimuli with higher expected value were preferred more often following conditioning compared with before (83 vs. 50%, *t*_{47} = 4.85, *P* < 0.0001). Thus participants discriminated the stimuli according to magnitude and probability and combined these two parameters in a manner that indicated they used expected value for choice preferences.

Preference tests in separate choice trials revealed that 6 of the 16 participants were risk averse, 7 were risk seeking, and 3 were risk neutral [average preference factors (mean ± SE): risk averters 1.8 ± 0.3; risk seekers −2 ± 0.2; scale ranging from +4 to –4, see methods]. We measured the sensitivity (slope) of pleasantness ratings to variance for each participant and regressed this measure against individual risk attitude. We found a significant negative correlation with risk aversion (Fig. 1*G*), indicating that the more the participants were risk averse, the more they found uncertain outcomes unpleasant. These results suggest that the pleasantness ratings according to variance reflected individual risk attitudes.

### Coding of expected value in striatum

To first locate activations reflecting magnitude and probability, we regressed responses to reward-predicting stimuli separately against increases in these two parameters. We found significant correlations with brain activity for both parameters in caudate and ventro-medial putamen (Fig. 2, *A* and *G*). Similar increases were seen in the time courses of activations averaged across all participants (Fig. 2, *B* and *H*) and with regressions of average activations onto magnitude and probability (Fig. 2, *C* and *I*). These correlations were similar when the data were analyzed separately in rewarded and unrewarded trials (Fig. 2, *D–F* and *J–L*). In addition, a medial prefrontal region showed increasing activations only with probability but not magnitude, as described before (Knutson et al. 2005).

Having established striatal regions coding magnitude and probability, we aimed to locate striatal activations coding expected value. We regressed brain activation against expected value and found significant correlations in the medial and posterior striatum (Fig. 3, *A* and *E*), which were also apparent in the averaged time courses (Fig. 3, *B* and *F*). The activations increased significantly with expected value (Fig. 3, *C* and *G*) but not with variance (Fig. 4*D*). The inverse, decreasing activations with increasing expected value, was not found in the striatum (*P* > 0.05). Regressions of brain activation against expected value differed significantly from regressions against reaction time in ventral striatum (Fig. 3*H*) but not in dorsal striatum (higher activation with both expected value and slower reactions; Fig. 3*D*). These data suggest separate coding between expected value and the motivational effects associated with increasing rewards.

Besides covarying with magnitude and probability, the coding of expected value would require that neural changes with one parameter compensate opposite changes in the other parameter, such that activations with different magnitude-probability combinations resulting in the same multiplicative product would be indistinguishable. In applying this test, we found similar striatal activations for the same expected values resulting from different magnitude-probability combinations (Fig. 4*A*). For example, activations differed insignificantly between stimuli predicting 100 reward points with *P* = 1.0 and 200 points with *P* = 0.5 (expected value 100 points), but activations were higher than for stimuli associated with an expected value of 50 points and lower than for stimuli associated with an expected value of 150 points. Despite the constancy for different magnitude-probability combinations, the striatal regions activated with expected value (Fig. 3, *A* and *E*) were sensitive to individual variations in magnitude and probability (Fig. 4*B*). Taken together, activations in the striatum seemed to combine reward magnitude and probability multiplicatively into a common signal of expected value but were unrelated to variance.

As expected value covaries with both magnitude and probability, we found, not surprisingly, that the striatal region coding expected value overlapped partly with regions coding magnitude and probability (Figs. 2, *A* and *G*, and 4*C*). The lack of coding of expected value in the larger, nonoverlapping parts may be due to lack of sensitivity to one of the parameters or to lack of multiplicative coding, an issue requiring further experimentation.

In addition, a dorsolateral prefrontal region showed partly overlapping coding of magnitude, probability and expected value using the regression model but lesser coding when time courses were plotted and when multiplicative combinations were tested (Fig. 5). Parts of medial and orbital frontal cortex showed graded coding with expected value in the regression model (Table 2, bottom). Decreasing activations with increasing expected value were found only in the insula.

### Uncertainty coding in orbitofrontal cortex

To locate activations reflecting uncertainty separate from expected value, we used the general linear model and tested for changes in variance, separate from expected value, across the full range of probabilities for two levels of magnitude (0–100 and 0–200 points, Fig. 1, *B* and *C*). Variance followed an inverted U function across increasing probabilities, whereas expected value increased monotonically with probability (Fig. 1*C*).

We found significant correlations of averaged brain activation with variance in the lateral orbitofrontal cortex (Fig. 6*A*), which were also apparent in the time courses (Fig. 6*B*). The activations correlated with variance (Fig. 6*C*) but not with expected value (Fig. 6*D*). We then aimed to relate the activations to risk sensitivity and regressed the goodness of fit of uncertainty-related brain activation in all participants against their individual risk attitudes. We found positive correlations with risk aversion in lateral orbitofrontal cortex (Fig. 7, *A–C*) and, significantly different from these, positive correlations with risk-seeking more medially (Fig. 7, *D–F*).

These data suggest that activations in orbitofrontal cortex process uncertainty irrespective of changes in expected value. Moreover, uncertainty coding appears to be differentially related to the risk attitude of participants in medial and lateral orbitofrontal regions.

### Combined coding of expected value and uncertainty in prefrontal cortex

Whereas the results in the preceding text demonstrate discrete coding of expected value and uncertainty, we aimed to find brain regions that code both parameters in combination. A region in middle prefrontal cortex showed increased activation with expected value irrespective of risk attitude (Fig. 8, *B, C, F,* and *G*). However, activations in the same voxels decreased differentially with variance in risk-averse participants (Fig. 8, *D* and *H*) but increased with variance in risk seekers (Fig. 8, *E* and *I*). The difference in slope was statistically significant (*P* < 0.0001). Using this same regression model, we found two other prefrontal regions that showed selective uncertainty coding depending on individual risk attitude in the voxels coding reward value. An anterior superior frontal gyrus region showed decreases with variance only in risk-averse participants (Fig. 9, *A–G*), whereas a caudal inferior frontal gyrus region showed increased activation with variance only in risk seekers (Fig. 9, *F–J*), suggesting that different value-coding prefrontal areas subserve the evaluation of risk depending on individual risk attitudes.

## DISCUSSION

This study shows that the two basic economic decision parameters, expected value and uncertainty of reward, were coded in distinct structures of the human brain. The coding of expected value involved the striatum and, to a lesser extent, parts of frontal cortex. The responses covaried with expected value irrespective of different combinations of magnitude and probability, although some regions of striatum and frontal cortex coded specifically only magnitude or probability. These activations were unrelated to reward uncertainty. By contrast, the coding of reward uncertainty as measured by variance involved regions in the orbitofrontal cortex. Uncertainty responses correlated with individual risk attitudes without reflecting reward value. Although expected value and uncertainty appear to be coded mostly separately from each other, some prefrontal regions showed value-related activations that covaried with uncertainty depending on individual risk attitudes. Taken together the data suggest that crucial parameters for reward-directed decision-making were coded in the prime reward structures of the human brain.

The coding of expected value in some striatal regions occurred irrespective of different multiplicative combinations of reward magnitude and probability. This was unlikely to be due to insensitivity of these regions to magnitude or probability, as these regions showed increasing responses when these parameters varied independently (Fig. 4*B*). Neither was expected value coding due to simple coincidence or conjunction of magnitude and probability coding. Achieving expected value coding irrespective of magnitude-probability combinations would require closely matching response gains, so that response reduction with one parameter is compensated by response augmentation with the other parameter. Unmatched gains for magnitude and probability responses would not lead to unchanged brain responses when a decrease in one parameter together with an increase in the other results in the same expected value. The required matching of response gains for magnitude and probability in regions in which both variables are processed makes the coding of expected value a remarkable achievement of neural coding.

Apart from the activations reflecting expected value, we confirmed previous results indicating separate, regionally distinct relationships of striatal activations to reward magnitude (Breiter et al. 2001; Delgado et al. 2003; O'Doherty et al. 2001) and probability (Dreher et al. 2006), with the exception of a block design study that failed to find magnitude relationships (Elliott et al. 2003). A previous study found covariations with magnitude in nucleus accumbens but not with probability or expected value (Knutson et al. 2005). However, that study used an anticipatory delay between cues and outcomes in a contingent action-outcome design including loss trials, which may preclude direct comparisons with the present study. Thus it had been unclear until now whether these separate reward parameters might be coded in combination as expected value in parts of the human brain and specifically in the striatum. The present results suggest that fMRI activations reflecting reward magnitude, probability, and expected value occur in separate striatal regions and well separated from uncertainty coding.

Activations in ventromedial prefrontal regions increased with reward probability. Previous imaging studies also found no relation of medial prefrontal responses to variations in reward magnitude, irrespective of whether probability was kept constant or varied (Knutson et al. 2003, 2005). Together, these results suggest a preferential relation of ventromedial prefrontal activation with reward probability rather than magnitude. The preferential ventromedial prefrontal coding of reward probability contrasts with the distinct relationships to both reward magnitude and probability in the striatum. Thus our findings confirm that some reward structures process the basic reward components of magnitude and probability separately. It would be interesting to ask what the function of such independent coding might be. In the St. Petersburg Paradox, individuals typically refuse to pay all their finite possessions for options associated with infinite magnitude and expected value, but at near-zero probability (Bernoulli 1954). Thus they remain sensitive to independent variations in the components of expected value, and the presently observed separate coding of probability and magnitude may support such sensitivity.
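The divergence of expected value in the St. Petersburg Paradox can be made explicit. The text does not spell out the game, so the following worked equation assumes the standard textbook formulation: a fair coin is tossed until the first head appears, and a first head on toss *k* pays 2^*k* units.

```latex
\begin{align*}
P(\text{first head on toss } k) &= 2^{-k},\\
\mathrm{EV} &= \sum_{k=1}^{\infty} 2^{-k}\cdot 2^{k}
             = \sum_{k=1}^{\infty} 1 = \infty.
\end{align*}
```

In this formulation every term contributes equally to the expected value, yet the probability of reaching the large payoffs shrinks geometrically, so magnitude and probability pull in opposite directions even though their product stays constant.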

The short trial duration of 1.5 s might have compromised the separation of activations in relation to the cues and rewards. However, we analyzed rewarded trials separately from unrewarded trials and found comparable results. The separations suggest that the observed relationships to reward magnitude, probability, and expected value reflect predominantly responses to the specific cues rather than the rewards. The similar activations in rewarded and unrewarded trials would rule out major contributions of reward prediction error coding that should differ across the different degrees of positive and negative reward prediction errors in probability schedules (McClure et al. 2003; O'Doherty et al. 2003). Despite the motivating influences of expected value on behavioral reaction times, we found no correlation between expected value coding and this behavioral parameter, suggesting that the activations did not reflect simple motivational factors suggested to play a role in reward processing in monkey premotor cortex (Roesch and Olson 2003). Although penalty and perception of outcome control can influence striatal reward processes (Tricomi et al. 2004), our experiments held these variables constant and the described activations should not be due to them.

Phasic responses of dopamine neurons are consistently stronger to stimuli associated with higher reward magnitude, probability, and expected value (Fiorillo et al. 2003; Tobler et al. 2005). Conversely, striatal output neurons show equal proportions of both increasing and decreasing responses during expectation and receipt of increasing reward magnitudes (Cromwell and Schultz 2003), although probability and expected value remain to be investigated. The striatum forms the primary target region of dopamine projections (e.g., Lynd-Balta and Haber 1994), and hemodynamic responses measured by fMRI primarily reflect input activity (Logothetis et al. 2001). Accordingly, the presently observed increasing magnitude-related striatal activations resemble possible inputs from dopamine neurons more closely than local striatal activity. Moreover, the similarity between the currently observed striatal activations and phasic dopamine responses extends to probability and expected value. It is thus conceivable that the observed striatal activations are partly driven by dopaminergic inputs, although dilatory effects on the vascular system (e.g., Hughes et al. 1986) cannot be entirely excluded. In addition, nondopaminergic inputs to the striatum or intrinsic computations within the striatum might be responsible for the nonhomogeneous, differential coding of magnitude and probability separate from expected value.

A major current finding consists of separate regions in the striatum and lateral orbitofrontal cortex that show distinct activations with expected value and uncertainty. Expected value and uncertainty of choice options are important parameters that determine behavioral preferences. They often vary independently when individual risk attitudes change over situations and time (Caraco et al. 1980, 1990; Stephens and Krebs 1986). It is therefore advantageous for agents to have an independent neuronal representation of both to choose according to individual risk preference while retaining sensitivity to variations in expected value. Thus by independently representing expected value and uncertainty, the currently observed striatal and orbitofrontal activations could make independent contributions to decisions involving risky choices.
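The independence of expected value and uncertainty can be made concrete with a minimal sketch. It assumes a two-outcome gamble (a reward of a given magnitude with a given probability, otherwise nothing); the function name and the example magnitudes and probabilities are hypothetical and are not the stimulus values used in the study.

```python
def binary_gamble_moments(magnitude, probability):
    """Return (expected value, variance) of a gamble that pays
    `magnitude` with `probability` and nothing otherwise."""
    expected_value = probability * magnitude
    # Variance of a two-outcome gamble: E[X^2] - E[X]^2 = p(1 - p)m^2.
    variance = probability * (1.0 - probability) * magnitude ** 2
    return expected_value, variance


# Hypothetical options chosen to share the same expected value.
low_risk = binary_gamble_moments(magnitude=100, probability=0.5)
high_risk = binary_gamble_moments(magnitude=200, probability=0.25)

print("low risk  (EV, variance):", low_risk)
print("high risk (EV, variance):", high_risk)
```

The two options have identical expected value but different variance, so an agent that represented only expected value could not distinguish them, whereas a separate uncertainty signal could.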

The presently observed orbitofrontal activations with uncertainty relate well to the behavioral alterations in risky situations induced by lesions in orbitofrontal cortex (Bechara et al. 1994, 2000; Hsu et al. 2005; Miller 1985; Mobini et al. 2002; Sanfey et al. 2003). Our findings may also help to explain the altered orbitofrontal activations during risky decisions in drug addicts (Bolla et al. 2005; Ersche et al. 2005). The present results do not necessarily exclude a role of the striatum in coding uncertainty at longer time scales than tested presently. Dopamine neurons show a slower, more sustained uncertainty signal (Fiorillo et al. 2003) that might induce striatal uncertainty-related activations (Aron et al. 2004). Other regions coding uncertainty could include the posterior cingulate and parietal cortex (Huettel et al. 2006; McCoy and Platt 2005), although the present study failed to find substantial uncertainty-related activations in these regions.

The lateral orbitofrontal cortex showed stronger uncertainty-related activity with increasing individual risk aversion, whereas medial orbitofrontal activations correlated with increasing risk-seeking. Thus uncertainty responses are differentially modulated by individual risk attitudes in the two orbitofrontal regions. Individual risk attitudes are crucial in determining the utility of uncertain rewards. Expected utility theory postulates that the utility of a reward decreases with increasing uncertainty for a risk-averse individual but increases for a risk seeker. The negative and positive influences of uncertainty increase with increasing degrees of individual risk-avoiding and -seeking behavior, respectively. The differential orbitofrontal relationships of uncertainty coding to individual risk attitudes may contribute to the varying influences of uncertain rewards on utility for the individual decision maker.
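One simple way to illustrate how risk attitude changes the influence of uncertainty on utility is a mean-variance form. This is only a sketch under stated assumptions: the linear penalty on variance, the function name, the coefficient values, and the option values are all hypothetical and are not the model fitted in the study.

```python
def mean_variance_utility(expected_value, variance, risk_coefficient):
    """Hypothetical mean-variance utility: expected value minus
    risk_coefficient times variance.

    risk_coefficient > 0 models risk aversion, < 0 risk seeking,
    and 0 risk neutrality.
    """
    return expected_value - risk_coefficient * variance


# Two illustrative options with equal expected value but different variance.
safe_option = (50.0, 2500.0)
risky_option = (50.0, 7500.0)

attitudes = [("risk averse", 0.002), ("risk neutral", 0.0), ("risk seeking", -0.002)]
for label, coefficient in attitudes:
    safe = mean_variance_utility(*safe_option, coefficient)
    risky = mean_variance_utility(*risky_option, coefficient)
    print(f"{label}: safe={safe:.1f}, risky={risky:.1f}")
```

In this sketch the sign of the coefficient determines whether added variance lowers or raises utility, and its absolute size determines how strongly, mirroring the graded influence of risk-avoiding and risk-seeking attitudes described above.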

Different prefrontal regions showed different forms of combined coding of expected value and variance. Taylor series expansion suggests that the expected utility of an option can be approximated by its mean and variance (and additional higher moments) (Huang and Litzenberger 1988; Stephens and Krebs 1986). As a consequence, expected value and uncertainty can separately influence the expected utility of an outcome. Risk-averse individuals aim to maximize expected reward value as well as minimize variance, whereas risk-seekers tend to maximize both expected value and variance. A variety of species, such as bumblebees (Real et al. 1982) and juncos (Caraco and Lima 1985), are sensitive to both expected value and variance. The present activations directly reflect the influence of individual risk attitude on uncertainty coding in voxels that also show expected value coding, both for risk averters and risk seekers. Although these activations may involve separate individual neurons, the close proximity of value and uncertainty coding may suggest an involvement of prefrontal cortex in the computation of an integrated expected utility signal. The selective influence of individual risk aversion on decreasing uncertainty coding contrasts with the selective influence of risk seeking on positive uncertainty coding in a different prefrontal region and may suggest that activations in different prefrontal regions underlie the pronounced differences between risk averters and risk seekers in choice preferences involving uncertain outcomes.
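The Taylor series argument can be written out briefly. The notation is an assumption introduced here for illustration: $u$ is a twice-differentiable utility function, $x$ the uncertain reward, $\mu = E[x]$ its expected value, and $\sigma^2 = E[(x-\mu)^2]$ its variance. Expanding $u$ around $\mu$ to second order gives

$$
u(x) \;\approx\; u(\mu) + u'(\mu)\,(x-\mu) + \tfrac{1}{2}\,u''(\mu)\,(x-\mu)^2 ,
$$

and taking expectations, with $E[x-\mu] = 0$, yields

$$
E[u(x)] \;\approx\; u(\mu) + \tfrac{1}{2}\,u''(\mu)\,\sigma^2 .
$$

For a concave utility function ($u'' < 0$, risk aversion) the variance term reduces expected utility, whereas for a convex one ($u'' > 0$, risk seeking) it increases expected utility. Higher moments are neglected in this approximation.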

In conclusion, we show that reward structures of the human brain separately encode basic microeconomic reward parameters. Specifically, the striatum carries rather distinct representations of reward magnitude, probability, and expected value. Separate activations in the orbitofrontal cortex increase with reward uncertainty and correlate with individual risk attitudes. The data suggest largely distinct contributions of reward structures to the coding of value and uncertainty of reward-predicting stimuli. The particular prefrontal activations combining expected value and uncertainty into a single response may provide the basis for an expected utility signal. Thereby the presently observed activations may serve as a basis for economic decision-making.

## GRANTS

This study was supported by the Wellcome Trust, the Swiss National Science Foundation and the Roche Research Foundation. R. J. Dolan and W. Schultz are supported by Wellcome Trust Programme Grants, W. Schultz by a Wellcome Trust Principal Research Fellowship.

## Acknowledgments

We thank P. Bossaerts, Y. Christopoulos, L. Gregorios-Pippas, R. Henson, and K. Miyapuram for discussions and/or comments.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2007 by the American Physiological Society