Department of Physiology, Kyoto Prefectural University of Medicine, Kyoto, Japan
Submitted 11 July 2007; accepted in final form 8 October 2007
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Prospective coding of future events, such as the coding of visual objects associated with another currently present object, occurs in the cerebral cortex (Rainer et al. 1999; Sakai and Miyashita 1991), thalamus (Komura et al. 2001), and basal ganglia (Lauwereyns et al. 2002). Neuronal activity during the exploration and exploitation of spatial locations for reward occurs in the striatum (Barnes et al. 2005). Theories of reinforcement learning have proposed algorithms in which organisms estimate the reward that actions might yield and choose actions by comparing the reward values of options based on the histories of particular actions and rewards (Sutton and Barto 1998). Reinforcement learning models of the basal ganglia (Doya 2000; Houk et al. 1995; O'Doherty et al. 2004) are supported by the observations that dopamine neurons encode reward-prediction errors (Bayer and Glimcher 2005; Fiorillo et al. 2003; Morris et al. 2004; Nakahara et al. 2004; Satoh et al. 2003), and that striatal neuron activity represents reward values of external stimuli (Cromwell and Schultz 2003; Kawagoe et al. 1998) as well as reward values that actions might yield (Samejima et al. 2005). Thus the dorsal striatum may be the brain locus for history-based coding of forthcoming action outcomes, especially reward. On the other hand, it is also often necessary to predict and avoid aversion outcomes. Whether, and if so how, the striatum is involved in encoding aversion outcomes remains unclear, although striatal neurons do respond to aversive stimuli (Blazquez et al. 2002; Ravel et al. 2003; Yamada et al. 2004).
The present study addressed two specific questions to understand the roles of the striatum in coding outcomes of forthcoming behavioral responses. First, how are the histories of reward and aversion used for encoding forthcoming outcomes in the striatum during a series of instructed behavioral responses? Second, how are the behavioral responses and their instructed outcomes represented in the striatum? We examined the activity of presumed projection neurons in the striatum while two Japanese monkeys performed a visually guided lever-release task for reward, aversion, and sound outcomes, the occurrences of which could be estimated from their histories in the preceding several trials. Our results support the view that striatal projection neurons encode outcomes of forthcoming behavioral responses, especially reward, based on their histories, and modulate their behavioral response-related activity in an instructed outcome-selective manner.
| METHODS |
|---|
|
|
|---|
We used two Japanese monkeys (Macaca fuscata; monkey DA, male, 5.6 kg and monkey AI, female, 6.0 kg). All animal procedures were approved by the Animal Care and Use Committee of Kyoto Prefectural University of Medicine and conformed to National Institutes of Health guidelines.
Behavioral task
The monkeys sat on a primate chair facing a wood panel and performed a visually guided lever-release task for the following outcomes: a water reward, avoiding an airpuff on their faces, or hearing a beep sound (Fig. 1A; see also Yamada et al. 2004). They depressed a hold lever at their own pace and were required to fixate their eyes on a fixation point (FP) [a small light-emitting diode (LED) in the center of the panel]. Another larger LED, serving as an instruction stimulus for the response outcome, was turned on to the right of the fixation LED (15 and 5° apart in monkeys DA and AI, respectively), with two different colors presented sequentially. The location of the large LED was closer to the FP for monkey AI than for monkey DA because it was difficult for monkey AI to maintain her gaze on the FP with the LED at a greater distance from the FP. First, a green LED cued monkeys to wait for a subsequent instruction with the hold lever depressed (wait cue). After 1.2–1.8 s, the green LED changed color and instructed the monkeys about the outcome of the trial (outcome instruction). When the outcome instruction was turned off (GO), the monkeys released the hold lever quickly (reaction time <600 ms). When the outcome instruction was red, a drop of reward water (0.2 ml) was delivered into the mouth about 600 ms after the lever release. When it was blue, an airpuff was delivered to the right side of the face (26 psi, 30 ms) only when the lever-release responses were delayed (≥600 ms). When it was yellow, a beep sound (1,500 Hz, 200 ms, 70 dB) occurred if the lever was released quickly (<600 ms), signaling an outcome of no reward and no aversion. Trials in which the eye position deviated >4° (monkey DA) or 5° (monkey AI) from the FP were considered to be fixation errors. If a trial was aborted due to errors (fixation break, lever release before GO signal, or delayed lever release), monkeys repeated the trial with the same outcome. The outcome instruction was either preceded by the wait cue (monkey DA) or followed by it (monkey AI) (Yamada et al. 2004).
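The outcome contingencies described above can be summarized in a short sketch. The instruction colors and the 600-ms reaction-time criterion come from the task description; the function name, the return format, and the treatment of delayed responses in reward and sound trials as plain errors with no delivered event are illustrative assumptions, not the authors' control software.

```python
from typing import Optional, Tuple

RT_LIMIT_MS = 600  # reaction-time criterion stated in the task description


def trial_outcome(instruction: str, reaction_time_ms: float) -> Tuple[Optional[str], bool]:
    """Return (delivered_event, is_correct) for one completed lever release.

    Assumption: a delayed release in reward or sound trials simply counts as
    an error with no delivered event; only aversion trials deliver an airpuff.
    """
    quick = reaction_time_ms < RT_LIMIT_MS
    if instruction == "red":     # reward outcome
        return ("water" if quick else None, quick)
    if instruction == "blue":    # aversion outcome: airpuff only if delayed
        return (None if quick else "airpuff", quick)
    if instruction == "yellow":  # sound outcome
        return ("beep" if quick else None, quick)
    raise ValueError(f"unknown instruction color: {instruction}")


if __name__ == "__main__":
    for color, rt in [("red", 350), ("blue", 350), ("blue", 700), ("yellow", 400)]:
        print(color, rt, trial_outcome(color, rt))
```

The sketch makes explicit that a correct aversion trial ends with no delivered event, which is why airpuffs were rare once the task was learned.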
Recording
We recorded single-neuron discharges from the caudate nucleus and putamen in the left hemisphere of the two monkeys (the region from A13 to A22 of the atlas of Japanese monkey brain in the Horsley–Clark coordinate system). We used epoxy-coated tungsten microelectrodes to record and a template-matching algorithm to isolate single-neuron discharges. Striatal presumed projection neurons and tonically active neurons (TANs, presumed cholinergic interneurons) were identified based on their background discharge rates and action potential waveforms (Aosaki et al. 1995). Striatal projection neurons showed low spontaneous firing rates (<2 spikes/s) and phasic discharges in relation to one or more behavioral task events (Kimura et al. 1996). Although we think that most of the "presumed" projection neurons are GABAergic projection neurons, it remains possible that the population contained a very small number of γ-aminobutyric acid (GABA) interneurons. Discharges of single neurons were recorded during about 75 trials (25 trials for each outcome). Eye movements were monitored by measuring the corneal reflections of an infrared light beam with a video camera at a time resolution of 4 ms. Recordings started when the monkeys had performed the behavioral task at correct performance rates >80% in each outcome trial. This required 1 mo for monkey DA and 3 mo for monkey AI after the aversion trials were introduced last, following the reward and sound outcomes.
Analysis of behavioral measures for task performance
Neuronal and behavioral data were collected during 44,909 trials in monkey DA and 8,887 trials in monkey AI. To examine effects of outcome instructions on task performance, correct performance rates, reaction times for lever-release responses, and numbers of error trials were compared among the three outcome conditions using multiple two-sample comparisons corrected by the Bonferroni method to control the family-wise significance level. To examine effects of the histories of reward, aversion, and sound on task performance, we measured task start times (latency from release of the hold lever in the preceding trial to depression of the lever to start the next trial) during all recording sessions and the rate of errors before the outcome instructions in each single recording session. Trials with unusually long task start times (>3 s) were not included (0.3% in monkey DA and 4% in monkey AI). Almost all of the errors were due to fixation breaks before the outcome instructions. Effects of the TLR and TLA (the numbers of trials since the last reward trial and since the last aversion trial, respectively) on the behavioral measures were examined using the following regression model
![]() | (1) |
![]() | (2) |
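The history variables entering these regressions can be illustrated with a short sketch. TLR is described later in the text as the number of trials since the last reward trial, and the trial immediately after a reward trial is referred to as TLR = 1; the sketch assumes that TLA and TLS are counted in the same way for aversion and sound trials. The function name and the example sequence are hypothetical, and how repeated error trials were counted is not specified here, so they are ignored.

```python
from typing import List, Optional, Sequence


def trials_since_last(outcomes: Sequence[str], target: str) -> List[Optional[int]]:
    """For each trial, count the trials elapsed since the most recent `target` trial.

    The trial immediately after a target trial gets 1. Trials that precede the
    first target trial get None because their history is unknown.
    """
    counts: List[Optional[int]] = []
    last_index: Optional[int] = None
    for i, outcome in enumerate(outcomes):
        counts.append(None if last_index is None else i - last_index)
        if outcome == target:
            last_index = i
    return counts


# Hypothetical six-trial sequence (not data from the study)
outcomes = ["reward", "sound", "aversion", "aversion", "reward", "sound"]
tlr = trials_since_last(outcomes, "reward")
tla = trials_since_last(outcomes, "aversion")
tls = trials_since_last(outcomes, "sound")

if __name__ == "__main__":
    print("TLR:", tlr)
    print("TLA:", tla)
    print("TLS:", tls)
```

Each counter is reset by a trial of its own outcome type and grows by one on every other trial, so the three counters are necessarily correlated with one another.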
Analysis of neuronal discharge rates
Histograms of neuronal discharge rates were constructed before and after particular behavioral events. Average discharge rates were determined separately during nine task epochs in each trial: for 300 ms following the hold-lever depression, for 500 ms preceding the wait cue, for 500 ms following the wait cue, for 500 ms preceding the outcome instructions, for 500 ms following the outcome instructions, for 500 ms preceding the lever release, for 300 ms following the lever release, for 300 ms preceding the outcome, and for 500 ms following the outcome. In each epoch, an increase in discharge rate was regarded as significant when the rate was >2.0 spikes/s and higher than the background rate (Wilcoxon two-sample test, P < 0.05). The background rate was defined in this study as the lowest average discharge rate across the nine task epochs.
To examine how behavioral responses and their forthcoming outcomes affected neuronal discharge rates, multiple regression analyses were applied to neurons that were active during at least one task epoch other than the outcome epochs. Neuronal discharge rates (F) were fitted by the following regression model including TLR and TLA, and current outcomes
![]() | (3) |
5 or TLA
5. If the regression coefficients bTLR or bTLA were not zero at P < 0.05, neuronal activity was regarded as being significantly modulated by the TLR or TLA. Effects of the TLS, as a control, on the neuronal discharges were also examined instead of TLA by
![]() | (4) |
If bREW, bAVE, or bSOU values were not zero at P < 0.05, the neuronal activity was regarded as depending on current outcomes. To define which types of current outcomes a particular neuron encoded (reward, aversion, sound, or both reward and aversion), the neuronal discharge rates in reward and aversion trials were compared with sound trials (two-sample t-test, P < 0.05).
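As a rough illustration of this kind of analysis, the sketch below fits a linear model of discharge rate on TLR and TLA by ordinary least squares. The model form F = b0 + bTLR·TLR + bTLA·TLA is an assumption based on the coefficient names used in the text, since the equations themselves are not reproduced here; the current-outcome terms are omitted, the data are synthetic, and the significance test on each coefficient is not implemented.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(predictors: Sequence[Sequence[float]], response: Sequence[float]) -> List[float]:
    """Least-squares fit of response = b0 + sum_k b_k * predictor_k (normal equations)."""
    rows = [[1.0] + [float(v) for v in p] for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free trials as (TLR, TLA) pairs; not data from the study
trials = [(1, 2), (2, 3), (3, 1), (4, 2), (1, 4), (2, 1), (5, 3)]
rates = [2.0 + 3.0 * tlr - 1.0 * tla for tlr, tla in trials]
b0, b_tlr, b_tla = fit_linear(trials, rates)

if __name__ == "__main__":
    print(f"b0 = {b0:.3f}, bTLR = {b_tlr:.3f}, bTLA = {b_tla:.3f}")
```

With real spike counts the fitted slopes would be noisy, and deciding whether bTLR or bTLA differs from zero at P < 0.05 would require a t-test on each coefficient, typically taken from a statistics package.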
To examine the effect of the behavioral measures on the neuronal discharge rates, the following analyses were conducted. For neurons that were active during the task epochs before the outcome instructions, we used regression models that incorporated task start times (TSTs) into the above-mentioned models, for example
![]() | (5) |
![]() | (6) |
The numbers of neurons modulated by TLR, TLA, and TLS were compared by Fisher's exact probability test at P < 0.05.
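A comparison of this kind can be sketched with a two-sided Fisher's exact test on a 2 × 2 table of modulated and unmodulated neuron counts. The implementation below sums the probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided P value; whether the authors used exactly this variant is not stated. The function name is hypothetical, and the example counts (30 of 82 versus 11 of 82) are those reported in the Results for TLR and TLS.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Modulated vs. not modulated: 30 of 82 neurons (TLR) against 11 of 82 (TLS)
p_tlr_vs_tls = fisher_exact_two_sided(30, 82 - 30, 11, 82 - 11)
# Two identical rows, as when the same number of neurons is modulated in both groups
p_equal = fisher_exact_two_sided(11, 71, 11, 71)

if __name__ == "__main__":
    print(f"TLR vs. TLS: P = {p_tlr_vs_tls:.4f}")
    print(f"equal counts: P = {p_equal:.4f}")
```

Note that the test treats the two groups of counts as independent samples, which is a simplification when both counts come from the same set of neurons.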
Correlation between variables and linear regressions
TLR, TLA, and TLS were not independent, but were moderately correlated with each other (Table 1). Thus it was not necessarily appropriate to include all three variables for outcome histories in one formula (Grafen and Hails 2002). We first examined modulation of neuronal activity by TLR and TLA. Then, the effect of TLS as a control was examined using another model, as described earlier (Eqs. 2 and 4). Furthermore, stepwise regression, a simple method of variable selection (Padoa-Schioppa and Assad 2006), was also used to compare the effects of each variable (TLR, TLA, TLS, REW, AVE, and SOU) on neuronal discharge rates. The forward stepwise regression was performed at P < 0.05. The numbers of neurons modulated by TLR, TLA, and TLS were not significantly different between the above-mentioned models and the stepwise regression. We did not remove any regressors in the preceding analyses using Eqs. 1 to 6.
At the end of all recording experiments, small electrolytic lesions were made both in the caudate nucleus and in the putamen. Direct anodal current (20 µA) was passed for 30 s through tungsten microelectrodes. The monkeys were deeply anesthetized with Nembutal (60 mg/kg, administered intraperitoneally) and were perfused transcardially with 10% formalin in 0.9% NaCl solution. Coronal sections of the striatum, 50 µm in thickness, were stained with cresyl violet. Electrode tracks through the striatum were reconstructed on the histology sections using the electrolytic lesion marks as reference points, and the recording sites of striatal projection neurons were identified.
| RESULTS |
|---|
|
|
|---|
Monkeys learned to release a lever for three different outcomes (Fig. 1A; see also Yamada et al. 2004): a water reward, a beep sound, or avoiding an aversive airpuff on the face by quicker responses. The monkeys thus performed a single behavioral response (lever release) for three different outcomes. After several months of task performance, each of the three outcome instructions had significant influences on task performance (Table 2). In monkey DA (where the outcome instruction was preceded by the wait cue), reaction times for the lever-release response were shorter in reward and aversion trials than in sound trials, and those in reward trials were shorter than those in aversion trials (P < 0.01, Bonferroni correction). Rates of release errors before the GO signal were higher in aversion trials than in sound trials. On the other hand, monkey AI (where the wait cue followed the outcome instruction) performed the task with shorter reaction times than monkey DA, and the reaction times in aversion trials were shorter than those in sound trials. The behavioral responses became quicker after the aversion outcome was newly introduced in addition to the reward and sound outcomes (reaction times before introduction of aversion trials: 320.4 ± 1.83 ms, average of reward and sound trials ± SE). This indicated that monkey AI changed her strategy to react to the GO signal as quickly as possible after the introduction of aversion trials. Rates of release errors before the GO signal were lower in reward trials than in sound trials. Because the aversive face airpuff was applied to the monkeys only when the behavioral response was delayed, monkeys received an airpuff only a few times a day after learning (monkey DA, 2.0%; monkey AI, 0.52%). The observed behavioral measures of task performance under the three outcomes indicated that the instruction served as a predictor of outcomes, although the two monkeys did not perform in the same way.
Each of the three instructions associated with their respective outcomes occurred twice within a six-consecutive-trial block (Fig. 1B). Therefore the probability of reward increased monotonically with the increase in the TLR (Fig. 1D). On the other hand, large TLR values occurred less often than small ones because the TLR was reset by each reward trial, usually within about six trials (Fig. 1C). Likewise, the probability of aversion trials increased monotonically with the increase in the TLA. The monkeys' task start times became shorter as the TLR increased (Fig. 2, A and B, Table 3). In addition, the number of errors before the instruction of outcomes decreased as the TLR increased (Fig. 2, C and D), whereas the task start times or number of errors increased selectively in the trials with TLR = 1 (Fig. 2, A and D). Thus the reward history (TLR) significantly modulated these behavioral measures of task performance in both of the monkeys. In contrast, the aversion history (TLA) had only small influences on these behavioral measures (Table 3), and the influence of the TLA was not significantly different from that of the TLS (comparison of regression slope, P > 0.05 in all cases).
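The link between the block structure and the rising reward probability can be illustrated with a small simulation. It assumes that the order of the six trials is randomized independently within each block and ignores repeated error trials; neither detail is specified above, so the numbers are only indicative. The function name and parameters are hypothetical.

```python
import random
from typing import Dict


def simulate_reward_probability(n_blocks: int = 20000, seed: int = 0) -> Dict[int, float]:
    """Estimate P(current trial is a reward trial | TLR) for shuffled six-trial blocks."""
    rng = random.Random(seed)
    block = ["reward"] * 2 + ["aversion"] * 2 + ["sound"] * 2
    seen: Dict[int, int] = {}      # TLR -> number of trials observed at that TLR
    rewarded: Dict[int, int] = {}  # TLR -> number of those that were reward trials
    tlr = None                     # unknown until the first reward trial has occurred
    for _ in range(n_blocks):
        order = block[:]
        rng.shuffle(order)
        for outcome in order:
            if tlr is not None:
                seen[tlr] = seen.get(tlr, 0) + 1
                if outcome == "reward":
                    rewarded[tlr] = rewarded.get(tlr, 0) + 1
            if outcome == "reward":
                tlr = 1            # the next trial is the first after a reward trial
            elif tlr is not None:
                tlr += 1
    return {t: rewarded.get(t, 0) / seen[t] for t in sorted(seen)}


probs = simulate_reward_probability()

if __name__ == "__main__":
    for t, p in probs.items():
        print(f"TLR = {t}: estimated P(reward) = {p:.3f}")
```

Under these assumptions the estimated reward probability rises steadily from TLR = 1 to TLR = 5, while trials with larger TLR values become progressively rarer.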
We recorded spike discharges from 163 presumed projection neurons in the caudate nucleus and putamen. In 82 of them, the discharge rates increased during one or a few task epochs in each trial: following the hold-lever depression, preceding and following the wait cue, preceding and following the outcome instructions, and preceding and following the lever release. We examined, first, whether and how the TLR and TLA influenced striatal neuron discharges. Figure 3A shows activity of a neuron in the caudate nucleus that exhibited gradually increasing discharges up to the outcome instruction and rapid decline after the instruction. The magnitude of the discharge rate had a positive regression slope with TLR (Fig. 3B, multiple regression analysis: slope = 3.02, P = 0.032, R² = 0.116), but not with TLA (slope = –1.14, P = 0.43, Fig. 3C). Thus the increase in discharge rates across trials (TLR) seemed to occur in parallel with the increase in reward probability (Fig. 3B). Indeed, the neuronal discharge rates had a significant positive regression slope with expected reward probability calculated from the data throughout the neuronal recording period in Fig. 1D (slope = 36.6, P = 0.021, R² = 0.137). Reward history also influenced the discharge rates during depression of the hold lever, which occurred at the beginning of each trial. A putamen neuron in Fig. 3D exhibited a phasic increase in discharge rate just after the lever depression to start trials. The discharge rate had a positive regression slope with TLR (Fig. 3E, slope = 5.25, P < 0.001, R² = 0.313), whereas the regression slope with TLA was not significant (Fig. 3F, slope = –0.83, P = 0.529). In the 82 neurons that discharged during 182 task epochs, we examined the influence of TLR and TLA on discharge rates using multiple regression analyses.
In 30 of the 82 neurons (18 in monkey DA and 12 in monkey AI), TLR had significant influences in at least one task epoch, whereas TLA influenced discharge rates in a much smaller number of neurons (6 in monkey DA and 5 in monkey AI; Supplemental Table 1). The number of neurons modulated by TLR was larger than that modulated by TLS (11/82, Fisher's exact probability test; P = 0.001), whereas the number of neurons modulated by TLA was not significantly different from that by TLS (P = 0.99). Thus the neuronal discharge rates of striatal neurons were predominantly modulated by reward history compared with aversion history.
Once the outcomes of current trials were instructed, monkeys initiated behavioral responses by releasing the lever to acquire the water, avoid the airpuff, or hear the beep sound. The percentage of neurons showing a significant regression slope with TLR declined after the outcome instruction (3/39 neurons, P < 0.001 in monkey DA and 7/31, P = 0.035 in monkey AI; Fig. 4, A, B, and D). The percentage of neurons modulated by TLR after the outcome instruction was not significantly different from that of neurons modulated by TLS (3/39 vs. 6/39, P = 0.298 in monkey DA and 7/31 vs. 2/31, P = 0.147 in monkey AI). About 70% (48/70) of neurons were never modulated by any of the histories of reward, aversion, or sound. Thus we examined whether the striatal neurons encode instructed outcomes during behavioral performances after the explicit instruction of outcomes. Many neurons exhibited instructed outcome-selective activity, while exhibiting no clear modulation by TLR. For example, a putamen neuron showed burst discharges at the lever-release response after the instruction of reward outcome (Fig. 6A, reward vs. sound, P < 0.001). Following the instructions of aversion and sound outcomes, however, the neuron exhibited almost no activation (Fig. 6, C and D, aversion vs. sound, P = 0.696). The discharge rates of this neuron during trials for reward did not show a significant regression slope with TLR (Fig. 6B, slope = 0.111, P = 0.32). This is in sharp contrast to the activity surrounding the depression of the lever at the beginning of the trial, which reflects past information about TLR. We examined the modulation of discharge rates by types of outcome in current trials of 70 neurons that were active during task epochs after the outcome instruction. Significant modulation of discharge rates occurred in a large number of neurons by the outcomes of current behavioral responses, especially for the reward outcome (43/70, Fig. 8C). The reward outcome–selective neurons showed higher discharges in reward trials than in sound trials, with no significant difference between aversion and sound trials (24/43, Figs. 6 and 7B), although slightly more than half of the neurons (23/43) showed lower discharges in reward trials than in sound trials, with no significant difference between aversion and sound trials (Fig. 7, A and C). The reward preference was consistent across task epochs in most neurons (39/43). Reward outcome–dependent modulation was frequently observed during every task epoch after the outcome instruction (Fig. 8, A and B). In contrast, only a small number of neurons were selectively modulated during trials with aversion (6/70, Fig. 7, D–F) or sound outcome (6/70, Fig. 7, G–I). Discharge rates of 3 neurons were modulated by both reward and aversion outcomes at single task epochs. Thus after outcome instruction, striatal neurons maintained the instructed outcomes and modulated their behavioral response-related activity in an outcome-selective manner, especially for the reward outcome.
To test whether neuronal activity reflects behavioral measures of task performance, we examined whether the variation of task start time was correlated with neuronal discharge rates before outcome instruction by using multiple regression analysis. In a neuron that exhibited a burst of discharges at the lever depression (Fig. 3D), average discharge rates in each trial were plotted against task start time in Fig. 9A. The regression analysis showed a significant regression slope of discharge rates with TLR (slope = 5.18, P < 0.001, R² = 0.316) but not with task start time (slope = –0.01, P = 0.61). The analysis of 44 neurons that were activated before outcome instruction revealed significant modulation of activity in 21 neurons by reward history (TLR) but not by task start time, in 6 neurons by task start time but not by TLR, and in 2 neurons by both (Fig. 9B). The task start time influenced only a small number of neurons modulated by TLR (Fig. 9C). Thus the discharge rates of a subset of striatal neurons were modulated by reward history, but not by the behavioral performance, before the behavioral outcome was instructed.
Location of recorded neurons in the caudate nucleus and putamen
The recording sites of 163 neurons in the caudate nucleus and putamen of the two monkeys were histologically reconstructed (Fig. 10). During epochs before outcome instruction, 18 and 26 neurons showed increased discharge rates in the caudate nucleus and putamen, respectively (Fig. 10A). Neurons with increased discharge rates just before occurrence of the outcome instruction (triangles) were more frequently observed in the caudate nucleus (15/18) than in the putamen (3/26, Fisher's exact probability test; P < 0.001). In contrast, neurons that showed an increase in discharge rate during the lever depression to initiate trials (squares) were more frequently observed in the putamen (19/26) than in the caudate nucleus (1/18, P < 0.001). Neurons that exhibited discharges during the wait cue period for the outcome instruction (Fig. 4A) were observed both in caudate nucleus (9/18) and putamen (7/26). About half of the neurons active before the outcome instructions (24/44) had discharges with significant positive or negative regression slopes with TLR (Fig. 4C). Thus the history-based processing occurred in both caudate nucleus and putamen.
| DISCUSSION |
|---|
|
|
|---|
History-based coding of reward outcome
Rewards experienced as an outcome of previously attempted actions have strong influences on choosing subsequent actions (Thorndike 1898). In this study, the monkeys changed their behavioral responses depending on the probability of reward at current trials based on the recent history of reward (Fig. 2 and Table 3) after extensive periods of training. If the outcomes of taking the same actions vary among appetitive, aversive, and other stimuli, as occurred in the present study, estimating when the next reward trial will come (i.e., the reward value of the current state) becomes critical. We found that the activity of a subset of neurons in the caudate nucleus and putamen could encode incrementing or decrementing reward probability across trials (Figs. 3B and 5). The reward history–based modulation of striatal neuron activity seemed to reflect estimated probability of reward (Fig. 3B). Alternatively, it might simply represent the number of trials since the last reward trial (TLR). This was not likely, however, because neurons in this study exhibited either gradually increasing or decreasing activity depending on TLR, unlike neurons representing numbers in the parietal cortex (Nieder 2005; Sawamura et al. 2002).
In some neurons, discharge rates at the first trials after reward trials (TLR = 1) were much higher than those in other later trials (Fig. 5B). This type of activity may also be related to the decrease in reward values specifically at the first trials following reward trials or preceding reward (Simmons and Richmond 2007). On the other hand, it might also be related to other factors such as the schedule process across trials, marking the start and end of a long-term schedule for reward. Although neuronal activity related to reward prediction across trials has been reported in other brain areas (Hikosaka and Watanabe 2004; Ichihara-Takeda and Funahashi 2006; Shidara and Richmond 2002), the activity found in the present study differed from it: in these previous works the presence or absence of reward in current trials was explicitly instructed before actions were initiated, whereas in this study the probability of individual outcomes could be estimated only implicitly based on their histories in the last several trials.
Striatal neurons might encode reward values of task cues and intended actions in terms of magnitude, kinds, and probability of rewards (Cromwell and Schultz 2003; Hassani et al. 2001; Kawagoe et al. 1998; Samejima et al. 2005). Samejima et al. (2005) showed that the discharge rates of striatal neurons represent reward value of free-choice action (action value; probability × volume for each option) estimated by previously performed actions and their outcomes in the last several trials. This supported the reinforcement learning models of the basal ganglia (Doya 2000; Houk et al. 1995; O'Doherty et al. 2004), which proposed that action values represented in the striatum are used for the selection of the highest-value action and that the action values are updated by dopamine signals conveying reward prediction errors. The present observation of history-based coding of forthcoming behavioral outcomes might correspond to the representation of state values in reinforcement learning theories (Sutton and Barto 1998), which also play a major role in reward-oriented adaptive action selection. Dopaminergic neurons have been shown to carry signals of reward prediction errors (Bayer and Glimcher 2005; Fiorillo et al. 2003; Morris et al. 2004; Nakahara et al. 2004; Satoh et al. 2003). In addition, dopaminergic neurons signal errors of reward prediction estimated by the number of preceding unrewarded trials following a reward trial (Nakahara et al. 2004).
Coding of current outcomes of behavioral responses
After an outcome of behavioral responses was instructed, neuronal discharge rates were modulated by the types of outcome, especially reward, in a large number of presumed projection neurons (Figs. 7 and 8), consistent with previous reports (Hollerman et al. 1998; Kawagoe et al. 1998). Significantly fewer neurons (n = 9) were influenced by aversion in their responses to the task events of outcome instruction, wait cue, and lever-release responses (Fig. 8), except in monkey DA, where a considerable number of neurons were modulated by the instruction of aversion outcome (5/16, 31%, Supplemental Table 1). The percentage of the presumed projection neurons modulated by instruction of aversion outcome seemed to be similar to that of 317 tonically active neurons (TANs) in the striatum observed previously in the same monkey (25%; Yamada et al. 2004). TANs responded selectively either to outcome instruction or to lever release, whereas in projection neurons, discharge rates increased in relation to every task event not only before but also after outcome instructions. Projection neurons showed reward preference throughout task trials. Thus there was a clear difference of outcome-dependent activity between projection neurons and TANs as a population. Projection neurons in monkey AI might have a tendency to be influenced less by current aversion outcome than those in monkey DA (2/34 in monkey AI and 7/39 in monkey DA, P = 0.16). This might be because monkey AI made behavioral responses very quickly at trials with every outcome instruction (Table 2). Thus the striatal projection neurons seemed to play a role in detection and discrimination of behavioral outcomes and to signal behavioral responses in a reward-dominant manner. Dominant modulation of neuronal activity by reward outcome over aversion outcomes of behavioral responses was also observed in the lateral prefrontal cortex of monkeys (Kobayashi et al. 2006), whereas the orbitofrontal cortex encodes relative preference of reward and aversion outcomes (Hosokawa et al. 2007).
One concern regarding the different influences of reinforcers on striatal neuron activity might be that, even if "aversion outcome" was instructed, airpuffs were actually delivered in only a small percentage of trials (a few times in a day) when lever-release responses were delayed. In other words, monkeys could avoid receiving airpuffs by quicker responses, and thus the instruction of "aversion outcome" might have had a smaller impact than that of "reward outcome" on task performances, although TANs previously observed in the same monkeys responded to instruction of aversion outcome as well as reward outcome (Yamada et al. 2004). Further study is necessary to clarify how sensitive striatal projection neurons, particularly those in the dorsal striatum, are to aversion outcome, whereas a functional role of the ventral striatum in encoding aversion outcome has been reported (Setlow et al. 2003).
Influence of aversion history on activity of striatal neurons
Reward history–dependent coding was dominant over aversion history–dependent coding in the striatum (Fig. 4). There are several potential origins for this dominance of reward history. A simple one is the weaker impact of aversion history on task performance compared with reward history (Table 3), based on the actual, rather small number of airpuffs received, as described earlier for outcomes. However, the behavioral measures of task performance, such as task start time, influenced neuronal discharge rates in only a small number of striatal neurons in our sample (Fig. 9). Another possibility is that it was more important for monkeys to predict when the next reward would be available to obtain it, than to predict the next aversion trials to avoid it (Fig. 2, Table 3). Reward history–dependent coding may be dominant because aversion trials, which could be regarded as costs to obtain reward, had a relatively weak impact on neuronal discharge rates recorded in this study. If aversion trials had a stronger impact with airpuffs occurring every time at aversion trials (Blazquez et al. 2002), the percentage of neurons with aversion history–based coding might have been higher. Decision making depends on the balance between cost and reward and on relative preferences among different values of reward. Medial frontal cortex (Walton et al. 2002, 2003) and orbitofrontal cortex (Padoa-Schioppa and Assad 2006; Tremblay and Schultz 1999) seem to be involved in these processes. The striatum may also play major roles in valuation, decision, and selection of actions (Cromwell and Schultz 2003; Doya 2000; Graybiel et al. 1994; Houk et al. 1995; Kawagoe et al. 1998; O'Doherty et al. 2004; Samejima et al. 2005) in concert with processing in the frontal cortical areas through the cortico-basal ganglia loop circuits.
Functional implication of history- and instruction-based coding of forthcoming behavioral outcomes
The present study revealed that presumed projection neurons in the striatum encode behavioral outcomes, especially reward, in two different manners. Activity of a subset of neurons in the caudate nucleus and putamen encoded incrementing or decrementing reward probability across trials. The reward history–based signals may serve to guide behavior to reward by brisk responses and low rate of errors when reward is impending. When reward is not expected, however, the behavioral responses are sluggish and error-prone (Shidara et al. 1998). This could underlie response bias, the behavioral observation that subjects are biased toward selecting one particular response over another (Lauwereyns et al. 2002; Watanabe et al. 2001). Once an outcome of the current trial became evident, striatal neurons selectively represented the current outcomes before, during, and after behavioral responses. The coding of current behavioral outcomes might play a role both in detecting and in maintaining the instructed outcomes. On the other hand, activity during behavioral responses (lever release) was often late in relation to response onset and seemed too late to play a major role in initiating the responses, such as in Fig. 6A. What could be the roles played by the coding of current response outcomes during and after behavioral responses? A conceivable role is to monitor or provide feedback on recently executed actions and their outcomes by providing downstream brain areas with the signals to evaluate the executed actions and refine them for the next trials.
Previously, we showed that presumed cholinergic interneurons (tonically active neurons, TANs) in the caudate nucleus respond to stimuli associated with motivational outcomes, whereas TANs in the putamen are more related to movement-eliciting signals (Yamada et al. 2004). Consistent with this evidence, presumed projection neurons in the caudate nucleus tended to be activated predominantly around the outcome instruction, whereas those in the putamen were more active during lever depression and release movements. This supports the view that cognition- and action-related signals are processed by spatially segregated neuronal circuits within the dorsal striatum. On the other hand, local striatal circuits, such as those in the putamen that process lever depression and release signals, could also become specialized dynamically to process information about past trials before the outcome instruction and about current outcomes after their explicit instruction.
Both the history-based and the current instruction–based codes of behavioral outcomes might be neural substrates for action planning and learning in the dorsal striatum. Impairments in this coding may lead to deficits in predictive action planning for a distant goal (Dickinson and Balleine 1994) or in habit formation (Yin and Knowlton 2006), which might underlie some of the core symptoms of Parkinson's disease (Marsden 1984; Soliveri et al. 1997).
| FOOTNOTES |
|---|
1 The online version of this article contains supplemental data.
Address for reprint requests and other correspondence: H. Yamada, Department of Physiology, Kyoto Prefectural University of Medicine, Kawaramachi-Hirokoji, Kamigyo-ku, Kyoto, Japan (E-mail: hyamada{at}koto.kpu-m.ac.jp).
| REFERENCES |
|---|
|
|
|---|
Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437: 1158–1161, 2005.
Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47: 129–141, 2005.
Blazquez PM, Fujii N, Kojima J, Graybiel AM. A network representation of response probability in the striatum. Neuron 33: 973–982, 2002.
Cromwell HC, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 89: 2823–2838, 2003.
Dickinson A, Balleine B. Motivational control of goal-directed action. Anim Learn Behav 22: 1–18, 1994.
Doya K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10: 732–739, 2000.
Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898–1902, 2003.
Grafen A, Hails R. Modern Statistics for the Life Sciences. New York: Oxford Univ. Press, 2002.
Graybiel AM, Aosaki T, Flaherty AW, Kimura M. The basal ganglia and adaptive motor control. Science 265: 1826–1831, 1994.
Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol 85: 2477–2489, 2001.
Hikosaka K, Watanabe M. Long- and short-range reward expectancy in the primate orbitofrontal cortex. Eur J Neurosci 19: 1046–1054, 2004.
Hollerman JR, Tremblay L, Schultz W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J Neurophysiol 80: 947–963, 1998.
Hosokawa T, Kato K, Inoue M, Mikami A. Neurons in the macaque orbitofrontal cortex code relative preference of both rewarding and aversive outcomes. Neurosci Res 57: 434–445, 2007.
Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Models of Information Processing in the Basal Ganglia, edited by Houk JC, Davis JL, Beiser DG. Cambridge, MA: MIT Press, 1995, p. 249–270.
Ichihara-Takeda S, Funahashi S. Reward-period activity in primate dorsolateral prefrontal and orbitofrontal neurons is affected by reward schedules. J Cogn Neurosci 18: 212–226, 2006.
Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1: 411–416, 1998.
Kimura M, Kato M, Shimazaki H, Watanabe K, Matsumoto N. Neural information transferred from the putamen to the globus pallidus during learned movement in the monkey. J Neurophysiol 76: 3771–3786, 1996.
Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W, Sakagami M. Influences of rewarding and aversive outcomes on activity in macaque lateral prefrontal cortex. Neuron 51: 861–870, 2006.
Komura Y, Tamura R, Uwano T, Nishijo H, Kaga K, Ono T. Retrospective and prospective coding for predicted reward in the sensory thalamus. Nature 412: 546–549, 2001.
Lauwereyns J, Watanabe K, Coe B, Hikosaka O. A neural correlate of response bias in monkey caudate nucleus. Nature 418: 413–417, 2002.
Marsden CD. Which motor disorder in Parkinson's disease indicates the true motor function of the basal ganglia? Ciba Found Symp 107: 225–241, 1984.
Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43: 133–143, 2004.
Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine neurons can represent context-dependent prediction error. Neuron 41: 269–280, 2004.
Nieder A. Counting on neurons: the neurobiology of numerical competence. Nat Rev Neurosci 6: 177–190, 2005.
O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304: 452–454, 2004.
Padoa-Schioppa C, Assad JA. Neurons in the orbitofrontal cortex encode economic value. Nature 441: 223–226, 2006.
Rainer G, Rao SC, Miller EK. Prospective coding for objects in primate prefrontal cortex. J Neurosci 19: 5493–5505, 1999.
Ravel S, Legallet E, Apicella P. Responses of tonically active neurons in the monkey striatum discriminate between motivationally opposing stimuli. J Neurosci 23: 8489–8497, 2003.
Sakai K, Miyashita Y. Neural organization for the long-term memory of paired associates. Nature 354: 152–155, 1991.
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310: 1337–1340, 2005.
Satoh T, Nakai S, Sato T, Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci 23: 9913–9923, 2003.
Sawamura H, Shima K, Tanji J. Numerical representation for action in the parietal cortex of the monkey. Nature 415: 918–922, 2002.
Setlow B, Schoenbaum G, Gallagher M. Neural encoding in ventral striatum during olfactory discrimination learning. Neuron 38: 625–636, 2003.
Shidara M, Aigner TG, Richmond BJ. Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J Neurosci 18: 2613–2625, 1998.
Shidara M, Richmond BJ. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296: 1709–1711, 2002.
Simmons JM, Richmond BJ. Dynamic changes in representations of preceding and upcoming reward in monkey orbitofrontal cortex. Cereb Cortex doi: 10.1093/cercor/bhm034.
Soliveri P, Brown RG, Jahanshahi M, Caraceni T, Marsden CD. Learning manual pursuit tracking skills in patients with Parkinson's disease. Brain 120: 1325–1337, 1997.
Sutton RS, Barto AG. Reinforcement Learning. Cambridge, MA: MIT Press, 1998.
Thorndike EL. Animal intelligence: an experimental study of the associate processes in animals. Psychol Rev Monogr Suppl 2: 1–109, 1898.
Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature 398: 704–708, 1999.
Walton ME, Bannerman DM, Alterescu K, Rushworth MF. Functional specialization within medial frontal cortex of the anterior cingulate for evaluating effort-related decisions. J Neurosci 23: 6475–6479, 2003.
Walton ME, Bannerman DM, Rushworth MF. The role of rat medial frontal cortex in effort-based decision making. J Neurosci 22: 10996–11003, 2002.
Watanabe M, Cromwell HC, Tremblay L, Hollerman JR, Hikosaka K, Schultz W. Behavioral reactions reflecting differential reward expectations in monkeys. Exp Brain Res 140: 511–518, 2001.
Yamada H, Matsumoto N, Kimura M. Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24: 3500–3510, 2004.
Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci 7: 464–476, 2006.