J Neurophysiol 95: 567-584, 2006; doi:10.1152/jn.00458.2005
0022-3077/06 $8.00

REVIEW

Basal Ganglia Orient Eyes to Reward

Okihide Hikosaka (1), Kae Nakamura (1), and Hiroyuki Nakahara (2)

(1) Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, Maryland; and (2) Laboratory for Mathematical Neuroscience, RIKEN Brain Science Institute, Saitama, Japan

Submitted 4 April 2005; accepted in final form 23 September 2005

ABSTRACT

Expectation of reward motivates our behaviors and influences our decisions. Indeed, neuronal activity in many brain areas is modulated by expected reward. However, it is still unclear where and how the reward-dependent modulation of neuronal activity occurs and how the reward-modulated signal is transformed into motor outputs. Recent studies suggest an important role of the basal ganglia. Sensorimotor/cognitive activities of neurons in the basal ganglia are strongly modulated by expected reward. Through their abundant outputs to the brain stem motor areas and the thalamocortical circuits, the basal ganglia appear capable of producing body movements based on expected reward. A good behavioral measure to test this hypothesis is saccadic eye movement because its brain stem mechanism has been extensively studied. Studies from our laboratory suggest that the basal ganglia play a key role in guiding the gaze to the location where reward is available. Neurons in the caudate nucleus and the substantia nigra pars reticulata are extremely sensitive to the positional difference in expected reward, which leads to a bias in excitability between the superior colliculi such that the saccade to the to-be-rewarded position occurs more quickly. It is suggested that the reward modulation occurs in the caudate where cortical inputs carrying spatial signals and dopaminergic inputs carrying reward-related signals are integrated. These data support a specific form of reinforcement learning theories, but also suggest further refinement of the theory.

INTRODUCTION

The initiation of body movements can be influenced by many factors, including sensory inputs, memory, attention, and arousal. Another factor that is vital for the animal's survival is reward. Unlike the other factors, which are usually based on the past events and the present context, reward is conceived as a goal that is to be achieved in the future. This is the very nature that endows the animal with the ability to acquire a large variety of voluntary behaviors. Not surprisingly, the relationship between reward and behavior has been a central theme in psychology (Balleine and Dickinson 1998) and has now become a main theme of neuroscience. On one hand, the retroactive nature of reward (i.e., reward influences the action that leads to the reward) still challenges experimental and theoretical neuroscientists working on neuronal mechanisms of reinforcement learning (Barto 1994; Dayan and Balleine 2002; Reynolds and Wickens 2002; Schultz et al. 1997). On the other hand, the universal feature of reward-oriented behavior among animal species including humans has provoked interdisciplinary interactions between neuroscience and animal ecology (Sugrue et al. 2005) as well as human economics (Glimcher and Rustichini 2004; Montague and Berns 2002).

What is fascinating about these theoretical advances is that they have received various lines of experimental support. For example, studies using trained monkeys revealed that many neurons in what are usually called cognitive or sensorimotor areas are modulated by expected reward. They include the dorsolateral prefrontal cortex (Barraclough et al. 2004; Inoue et al. 1985; Kobayashi et al. 2002a; Leon and Shadlen 1999; Watanabe 1996; Watanabe et al. 2002), posterior parietal cortex (Glimcher 2001; Platt and Glimcher 1999; Sugrue et al. 2004), premotor cortex (Roesch and Olson 2003, 2004), and dorsal striatum (Cromwell and Schultz 2003; Hollerman et al. 1998; Kawagoe et al. 1998; Lauwereyns et al. 2002a,b; Takikawa et al. 2002a; Tremblay et al. 1998; Watanabe et al. 2003). Furthermore, other factors that influence behavior, such as learning (Dayan and Balleine 2002), memory (Baxter and Murray 2002), and attention (Maunsell 2004), seem dependent on reward. Similar findings have been reported in functional imaging studies using human subjects (for review see McClure et al. 2004b). These data appear to provide a strong background for understanding the neural mechanism of reward-modulated behavior.

Ironically, however, the widespread effects of reward in the brain make it more difficult to pinpoint the mechanism of reward-modulated behavior. Reward-dependent neural activity may not necessarily indicate that the neurons participate in the modulation of behavior based on reward. Part of the brain that may be particularly suited for controlling or generating reward-oriented behavior is the basal ganglia. The basal ganglia receive substantial reward-related information and strongly influence body movements. First, the motor function of the basal ganglia is illustrated by various kinds of movement disorders (e.g., inability to initiate or suppress movements) in basal ganglia dysfunctions (e.g., Parkinson's disease) (Denny-Brown 1962). This "motor" function is achieved by the outputs of the basal ganglia to the brain stem motor areas (e.g., superior colliculus) (Grillner et al. 2005; Takakusaki et al. 2004) and the "movement-related" areas in the cerebral cortex through the thalamus (Hoover and Strick 1999; Parent and Hazrati 1995). Second, the reward-related information to the basal ganglia is likely to be derived from substantial inputs from the limbic system (e.g., amygdala) to the ventral striatum (e.g., nucleus accumbens) (Haber and McFarland 1999; Mogenson et al. 1980), dorsal striatum (i.e., caudate nucleus and putamen) (Ragsdale and Graybiel 1988; Uno and Ozawa 1991), and to dopamine neurons in and around the substantia nigra (Fudge and Haber 2000). In particular, dopamine neurons, which appear to carry an essential signal for reward-based learning (Schultz 1998), are an important part of the basal ganglia system and project most heavily within the basal ganglia. Third, sequential and parallel inhibitory connections in the basal ganglia are thought to be suitable for the selection and learning of optimal behavior (Hikosaka et al. 1993a; Mink 1996). Fourth, the basal ganglia are thought to play an important role in learning of sensorimotor procedures or habits (Graybiel 1998; Nakahara et al. 2001; Packard and Knowlton 2002; Salmon and Butters 1995). Finally, sensorimotor-cognitive signals originating from the cerebral cortex are funneled through the basal ganglia and are returned to the cerebral cortex (Parent and Hazrati 1995). In short, the basal ganglia are located in a perfect position to control motor behaviors based on reward information.

In the present review, we first summarize findings indicating that the basal ganglia are involved in reward-oriented behavior. We then focus on studies using saccadic eye movement, which provides an excellent experimental model. Neural network models based on these studies may be applicable to other motor behaviors. Finally, we discuss the results in a wider theoretical framework, especially in relation to reinforcement learning theories.

ROLE OF THE BASAL GANGLIA IN REWARD-ORIENTED BEHAVIOR

The relationship of the basal ganglia with reward-oriented behavior may not be evident at first glance. Severe movement disorders commonly found in patients with dysfunctions of the basal ganglia, such as Parkinson's disease and Huntington's disease, demonstrate that the basal ganglia play a key role in controlling body movements (DeLong and Georgopoulos 1981; Denny-Brown 1962). However, anatomical and physiological studies cast doubt on this motor-only view of the basal ganglia. The motor functions appear to be represented in limited parts in the basal ganglia, specifically, the caudal part of the putamen and related areas that receive heavy projections from the motor areas in the cerebral cortex (Flaherty and Graybiel 1995; Künzle 1975; Takada et al. 1998). A significant portion of the basal ganglia receives inputs mainly from association cortices (Parent and Hazrati 1995; Selemon and Goldman-Rakic 1985; Yeterian and Pandya 1991) and appears to be devoted to nonmotor cognitive functions (Brown et al. 1997; Middleton and Strick 1994; Packard and Knowlton 2002). The caudate nucleus, another part of the dorsal striatum, is a key station of the cognitive functions and in addition controls saccadic eye movement (Hikosaka et al. 2000). However, the relationship to reward is not evident in this scheme of the basal ganglia.

Reward has been discussed mainly in relation to the ventral striatum and/or the nucleus accumbens (Mogenson et al. 1980; Swanson 2000). It receives massive inputs from the limbic system, especially the amygdala (Fudge et al. 2002) and the orbitofrontal cortex (Haber et al. 1995; Selemon and Goldman-Rakic 1985). Lesions in the ventral striatum result in deficits in a variety of tasks in which reward-predictive cues are used to guide subsequent responding (Everitt et al. 1991, 1999; Kelley 2004). Neurons in the ventral striatum respond to cues that predict appetitive outcomes (Nicola et al. 2004; Setlow et al. 2003). These findings, together with anatomical data, gave rise to the view that motor, oculomotor, cognitive, and motivational functions are served separately by independent loop circuits formed by subareas in the basal ganglia and subareas in the cerebral cortex (Alexander et al. 1986). However, subsequent anatomical studies have shown that there are indirect pathways that originate from the ventral striatum and reach the dorsolateral striatum through the mutual and partially overlapping connections between the striatum and midbrain dopamine neurons (Haber et al. 2000).

Indeed, recent studies using monkeys and humans have revealed that the impact of reward is almost equally as strong in the dorsal striatum as in the ventral striatum. This was shown by manipulating the amount, frequency, or kinds of reward depending on task context while the subject is engaged in a sensorimotor task. Activity of single neurons in the dorsal as well as ventral striatum is strongly influenced by the expected outcome of food or water reward (Hollerman et al. 1998) (Fig. 1). For example, many neurons in the caudate and putamen exhibit sustained activity before an expected reward is delivered (Hikosaka et al. 1989c), similarly to neurons in the ventral striatum (Schultz et al. 1992). Other neurons respond differentially to sensory stimuli that indicate the presence or absence of upcoming reward (Hollerman et al. 1998; Kawagoe et al. 1998). Such anticipatory and sensory responses are related to the amount of the expected reward (Cromwell and Schultz 2003) or to the temporal proximity of reward delivery (Bowman et al. 1996). These types of reward-related activity are found in the dorsal striatum as often as in the ventral striatum (Cromwell and Schultz 2003; Hollerman et al. 1998). In the rat, similar reward-predicting activity is found in the ventral striatum (nucleus accumbens) (Miyazaki et al. 1998; Nicola et al. 2004), but the dorsal striatum has not been examined in this respect.


FIG. 1. Responses of neurons in monkey dorsal striatum to instructions influenced by reward. Visual stimulus presented at the beginning instructed the monkey's behavior (movement: touch a lever with its hand; no movement: maintain its hand on a resting key) and determined the reward outcome (presence or absence of reward). A: sustained response of a caudate neuron in rewarded movement trials. B: sustained response of a caudate neuron restricted to unrewarded movement trials. C: sustained response of a putamen neuron in both rewarded trial types, but absence of response in unrewarded movement trials. Each dot denotes the time of a neuronal impulse and each line of dots shows one trial. Different types of trials alternated semirandomly during the experiment, but are separated for analysis and rearranged according to instruction-trigger intervals [from Hollerman et al. (1998)].

 
Similar behavioral tasks have been applied to human imaging studies, typically using monetary rewards. Many of these studies have revealed higher activation of the ventral striatum when higher amounts of reward are expected (Elliott et al. 2000; Knutson et al. 2001; Ullsperger and von Cramon 2003). Some of them also revealed activation of the dorsal striatum (Delgado et al. 2000; O'Doherty et al. 2002). In some cases significant activation was found preferentially in the dorsal striatum (Haruno et al. 2004).

Clearly, the reward-related processes are not localized in the ventral striatum but are prevalent in the entire striatum. One qualitative difference may be that many neurons in the dorsal striatum exhibit sensorimotor or cognitive activities and these activities are modulated by the nature of the expected reward (Cromwell and Schultz 2003; Kawagoe et al. 1998; Watanabe et al. 2003), whereas neurons in the ventral striatum are less selective for sensorimotor events (Schultz et al. 1992) and their responses tend to occur just before or after the delivery of reward (Hollerman et al. 1998). Thus the dorsal striatum, rather than the ventral striatum, is a place where reward-related information is integrated into specific sensorimotor/cognitive information (O'Doherty et al. 2004).

Our main goal in this review is to understand the neuronal mechanism by which behavior is influenced by reward. In this sense the dorsal striatum seems a good choice because a specific body movement can be identified to which the particular part of the dorsal striatum is related. The effect of reward and its neural mechanism can be assessed by examining the reward-dependent changes in the movement as well as in neuronal activity. Simple hand movement can be used to study the reward-related mechanism in the putamen because the putamen is known to control skeletomotor behaviors (DeLong and Georgopoulos 1981). Many studies have shown that both reaction time and movement time are significantly shorter when a big reward is expected than when a small reward is expected (Cromwell and Schultz 2003; Hollerman et al. 1998; Minamimoto et al. 2005). On the other hand, saccadic eye movement is a good behavioral measure to study the reward-related mechanism in the caudate nucleus (Takikawa et al. 2002b) because the caudate is known to control saccadic eye movement (Hikosaka et al. 2000).

In this review we focus on the mechanism of saccadic eye movement and use it as a behavioral measure of the effects of expected reward. It has been well documented that the neural circuits involving the caudate nucleus and the substantia nigra pars reticulata (SNr) exert a powerful control over the generation of saccadic eye movement (Hikosaka et al. 2000). Another advantage of using saccadic eye movement is that its mechanism in the brain stem has been studied perhaps more extensively than any other motor behavior (Sparks 2002). Furthermore, an important component of reward-modulated behavior is orienting movement (Swanson 2000) and an important component of orienting movement is saccadic eye movement (Hayhoe and Ballard 2005). The animal must orient its eye, head, and body to the location where reward is available before procuring it (Ewert 1980). Saccadic eye movement is particularly important for humans (Johansson et al. 2001; Triesch et al. 2003) and monkeys (Miyashita et al. 1996). As will be shown herein, the initiation of saccadic eye movement is clearly facilitated or suppressed depending on the reward outcome.

BASAL GANGLIA CONTROL SACCADIC EYE MOVEMENTS WITH INHIBITORY CONNECTIONS

In this section we describe the neuronal network in the basal ganglia that controls saccadic eye movement. The basal ganglia control saccadic eye movement with the GABAergic inhibitory connection from the SNr to the superior colliculus (SC) (Fig. 2A). SNr neurons project predominantly to the intermediate layer of the SC (Chevalier et al. 1984; Graybiel 1978; Hikosaka and Wurtz 1983b; Jayaraman et al. 1977; May and Hall 1984) and have synaptic connections with saccadic burst neurons or buildup neurons (Karabelas and Moschovakis 1985). Because most SNr neurons are spontaneously very active, firing tonically and rapidly, SC saccadic neurons are kept inhibited by the output of the basal ganglia (Fig. 2B). Importantly, a majority of SC-projecting SNr neurons decrease or cease firing before a saccade directed to the contralateral side (Hikosaka and Wurtz 1983b; Joseph and Boussaoud 1985). This leads to a decrease of inhibition (disinhibition) on SC saccadic neurons (Chevalier et al. 1985) and gives a strong thrust for SC neurons to fire intensely (Fig. 2B).


FIG. 2. Basal ganglia mechanism for control of saccadic eye movement. A: a simplified view of saccade control system. Many areas in the cerebral cortex send direct excitatory connections to the superior colliculus (SC), promoting the initiation of saccades. Basal ganglia system, which consists of the caudate nucleus (CD), substantia nigra pars reticulata (SNr), and other nuclei (not shown), receives inputs from the cerebral cortex and controls the SC. Excitatory and inhibitory neurons are indicated by open and filled circles, respectively. B: a cardinal saccade mechanism in the basal ganglia. SC is normally inhibited by rapid firing of GABAergic neurons in the SNr. Tonic inhibition can be interrupted by GABAergic inputs from the caudate nucleus. This disinhibition, together with excitatory cortical inputs, allows SC neurons to fire in burst, which leads to a saccade to a contralateral location. Note that there are other parallel pathways in the basal ganglia that may act antagonistically to the disinhibitory pathway shown here.

 
However, the disinhibition alone may not be sufficient to produce a saccadic burst in SC neurons. It is likely that the SNr-induced inhibition acts as a gate for saccadic outputs from the SC. Saccadic SC neurons receive excitatory inputs from many brain areas (Sparks 1986; Wurtz et al. 2001) including the cerebral cortex, cerebellum, and brain stem. Inputs from the cerebral cortex originate mainly from the so-called cortical eye fields including the frontal eye field (FEF) (Segraves and Goldberg 1987; Sommer and Wurtz 2000), supplementary eye field (SEF) (Shook et al. 1991), and parietal eye field (PEF) in the area LIP (Paré and Wurtz 2001). Notably, all these areas contain neurons that fire before a saccade.

In short, SC saccadic neurons are controlled by excitatory and inhibitory inputs from multiple sources. Before a saccade, SC neurons receive excitatory inputs from the cortical eye fields while they are freed from the tonic inhibition exerted by SNr neurons (Fig. 2A). The removal of the tonic inhibition (disinhibition) effectively opens the gate for the saccadic output.

SNr neurons cease firing before a saccade because they are inhibited by neurons in the caudate nucleus (CD) (Fig. 2B). Output neurons in the basal ganglia, which are located in the SNr and the internal segment of the globus pallidus (GPi), are innervated directly by projection neurons in the striatum (caudate and putamen) (François et al. 1994). This direct innervation is also GABAergic and inhibitory (Fonnum et al. 1978; Kita 1993; Yoshida and Precht 1971). Projection neurons in the caudate are characterized anatomically as medium-spiny neurons (Preston et al. 1980). They have deep membrane potentials at rest and exhibit action potentials only occasionally (Wilson and Kawaguchi 1996). In the caudate of awake animals, neurons with very low spontaneous activity are considered to be projection neurons (Hikosaka et al. 1989a; Kimura et al. 1990). Many of them exhibit a burst or a train of spikes in relation to saccadic eye movements, but otherwise are nearly silent (Hikosaka et al. 1989a). Electrical stimulation of the saccade-related area in the caudate induces an inhibition of saccade-related SNr neurons (Hikosaka et al. 1993b). These results suggest that the saccade-related decrease in firing in SNr neurons is caused by the saccade-related increase in firing of caudate neurons.
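
The gating logic described in this section (tonic SNr inhibition of the SC, removed when caudate neurons fire, with cortical excitation still required for a burst) can be illustrated with a toy firing-rate calculation. This is only a minimal sketch under stated assumptions: the function names `rectify` and `sc_output`, the rectified-linear rate units, and every numerical value (tonic SNr rate, synaptic weights, input rates) are hypothetical illustrations in arbitrary units, not measurements or a model taken from the studies cited here.

```python
def rectify(x):
    """Firing rates cannot be negative."""
    return max(0.0, x)


def sc_output(cortical_input, caudate_rate,
              snr_tonic=80.0, w_cd_snr=1.0, w_snr_sc=1.0):
    """Toy rate model of the caudate -> SNr -> SC chain.

    All parameter values are illustrative assumptions (arbitrary units).
    Returns a tuple (snr_rate, sc_rate).
    """
    # SNr fires tonically unless it is inhibited by caudate projection neurons.
    snr_rate = rectify(snr_tonic - w_cd_snr * caudate_rate)
    # SC is driven by cortical excitation and held down by SNr inhibition.
    sc_rate = rectify(cortical_input - w_snr_sc * snr_rate)
    return snr_rate, sc_rate


if __name__ == "__main__":
    # Gate closed: cortex is active but caudate is silent, so SNr blocks SC.
    print(sc_output(cortical_input=60.0, caudate_rate=0.0))
    # Disinhibition alone: caudate bursts without cortical drive, SC stays silent.
    print(sc_output(cortical_input=0.0, caudate_rate=80.0))
    # Gate open: caudate bursts and cortex is active, so SC can fire.
    print(sc_output(cortical_input=60.0, caudate_rate=80.0))
```

In this sketch the SC fires only in the third case, when caudate activity has silenced the SNr and cortical excitation is present at the same time, which is the sense in which the SNr acts as a gate rather than as a trigger.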

The main inputs to the caudate originate from various areas in the association cortex (Selemon and Goldman-Rakic 1985; Yeterian and Pandya 1991) and the thalamic nuclei related to the oculomotor areas (Takada et al. 1985). Saccade-related neurons are found in the part of the caudate that receives inputs mainly from the FEF (Stanton et al. 1988), SEF (Parthasarathy et al. 1992; Shook et al. 1991), and the dorsolateral prefrontal cortex (Selemon and Goldman-Rakic 1985; Yeterian and Pandya 1991). Therefore saccade-related activity of caudate neurons is probably caused by excitatory inputs from these cortical areas.1

BASAL GANGLIA GUIDE SACCADE TO WHERE REWARD IS AVAILABLE

A feature common to caudate and SNr neurons is that their activity is often strikingly dependent on the behavioral context. For example, some caudate neurons respond to a visual stimulus only when its position must be remembered (Hikosaka et al. 1989b); other caudate neurons fire before a saccade only when it is guided by memory (Hikosaka et al. 1989a). Such strong context dependency is also found in SNr neurons (Hikosaka and Wurtz 1983a). Another feature is that many caudate neurons fire tonically in an anticipatory manner before an expected task-related event occurs, such as before the onset of an expected target or the delivery of an expected reward (Hikosaka et al. 1989c; Hollerman et al. 1998; Nishino et al. 1984; Rolls et al. 1983). "Reward" turned out to be a key factor that characterizes the information processing in the basal ganglia, as described below.

To examine the effect of reward on saccadic eye movement, we used saccade tasks in which the amount of reward is unequal among possible target positions. We chose position as a cue for reward for two reasons. First, the goal of saccadic eye movement is to localize an object in space (Hayhoe and Ballard 2005; Johansson et al. 2001). Second, when an animal forages for food, the most crucial behavior is to localize a place where the food is available (Swanson 2000). Positional cues have been widely used in learning tasks such as the conditioned place preference task (Carr and White 1983; Everitt et al. 1991; Spyraki et al. 1982).

In our saccade tasks the target was presented randomly in one of two or four directions, but only one direction was associated with a big reward, whereas the others were associated with a small or no reward (Fig. 3A). The big-reward direction was fixed within a block of 20–60 trials and was changed in the next block (Fig. 3B). We call this the 1DR (one-direction-rewarded) saccade task. We used visual and memory versions. In the visual-1DR task, the monkey made a saccade to the target immediately after its onset (Fig. 3A, top) (Lauwereyns et al. 2002b). In the memory-1DR task, the target position was cued and the monkey had to make a saccade to the cued position after a delay, based on memory (Fig. 3A, bottom) (Kawagoe et al. 1998).
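
The block structure of the 1DR task can be sketched as a trial-schedule generator. The code below is a hedged illustration only, not the software used in the experiments: the function name `one_dr_schedule`, the tuple layout, the balanced shuffling of targets within a block, and the rule that the big-reward direction never repeats across consecutive blocks in the four-direction case are all assumptions made for the example. The label "small" stands for the small-or-no-reward condition.

```python
import random


def one_dr_schedule(n_blocks=4, trials_per_block=20,
                    directions=("R", "L"), seed=0):
    """Return a list of (block, big_reward_dir, target_dir, reward) tuples.

    Hypothetical sketch of a one-direction-rewarded (1DR) schedule.
    """
    rng = random.Random(seed)
    schedule = []
    big = None
    for block in range(n_blocks):
        if len(directions) == 2:
            # Two-direction condition: the big-reward side alternates.
            big = directions[block % 2]
        else:
            # Four-direction condition: pick a new big-reward direction
            # at random (assumed here to differ from the previous block).
            big = rng.choice([d for d in directions if d != big])
        # Pseudorandom target order, balanced across directions (assumption).
        targets = [directions[i % len(directions)]
                   for i in range(trials_per_block)]
        rng.shuffle(targets)
        for target in targets:
            reward = "big" if target == big else "small"
            schedule.append((block, big, target, reward))
    return schedule


if __name__ == "__main__":
    for trial in one_dr_schedule(n_blocks=2, trials_per_block=4):
        print(trial)
```

Note that the target direction is drawn independently of the reward condition, so the animal cannot predict where the target will appear; only the mapping between direction and reward size is fixed within a block.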


FIG. 3. Saccade tasks with positional bias of reward that we call one direction-rewarded (1DR) saccade tasks. A: 2 versions: visual-1DR task (top) and memory-1DR task (bottom). In both tasks the monkey first fixates at the central spot of light and then makes a saccade to the target after the fixation point goes off. Target is chosen pseudorandomly out of 2 directions (as shown here) or 4 directions. In the visual-1DR task (top) the target comes on at the same time as the fixation point goes off and thus the saccade is made to the visible target. In the memory-1DR task (bottom) the target is illuminated briefly (target cue) while the monkey is fixating and the monkey must withhold the saccade until the fixation point goes off; thus the saccade is made to the remembered target. Within a block of 20–60 trials of 1DR tasks, the saccade to one particular direction is followed by a large amount of reward, whereas the saccade to any of the other directions is followed by a small amount of reward or no reward. Even for the small or no reward direction, the monkey must make a saccade correctly to the target; otherwise, the trial is repeated until the saccade is made correctly. B: one set of experiments consists of several blocks of trials during which the big-reward direction is alternated (in the 2-direction condition, as shown here as R and L, which are indicated in A) or randomized (in the 4-direction condition). C: changes in saccade latency with the changes in the reward condition in the visual-1DR task [from Watanabe and Hikosaka (2005)]. Mean saccade latencies in one monkey are plotted across trials in 2 blocks in which saccades were followed by small and big rewards, respectively. Two cycles are shown repeated to facilitate visual impression.

 
Saccadic parameters changed dramatically in 1DR tasks. In the visual-1DR task, latencies were much shorter when saccades were followed by a big reward than when they were followed by a small reward (Lauwereyns et al. 2002b; Watanabe and Hikosaka 2005). In the schematic example shown in Fig. 3A, top, the latencies of rightward saccades are shorter in the reward condition R (rightward saccades associated with bigger rewards than leftward saccades) than in the condition L (leftward saccades associated with bigger rewards). Conversely, the latencies of leftward saccades are shorter in condition L than in condition R. During the experiment the reward conditions R and L were alternated usually every 20 trials with no external instruction (Fig. 3B), and the saccade latency changed reliably (Fig. 3C). One interesting finding common to all monkeys tested was that saccade latency decreased quickly during a small-to-big reward transition and increased more slowly during a big-to-small reward transition (Fig. 3C) (Watanabe and Hikosaka 2005). Similarly, in the memory-1DR task, latencies were shorter and peak velocities were higher when the saccades were big-rewarded than when small-rewarded (Takikawa et al. 2002b). The variations of saccadic velocity, latency, and amplitude were smaller in the big-reward condition (Takikawa et al. 2002b). Thus saccadic eye movement provides a solid and reliable behavioral measure to study the neural mechanisms of reward-oriented behavior. A series of experiments, reviewed below, suggest that the basal ganglia play a key role in the reward-dependent modulation of saccades.2

In the visual-1DR task many caudate projection neurons exhibit tonic activity (Fig. 4) (Lauwereyns et al. 2002b). In each trial a central spot came on and the monkey had to fixate it. The caudate neurons then started firing (Fig. 4A). The firing rate increased gradually until the fixation point went off (and a target appeared). A unique feature of the activity was that it occurred selectively or preferentially when one particular position (usually contralateral to the recording site) was rewarded (Takikawa et al. 2002a). Among caudate neurons that exhibited precue tonic activity, a majority showed the selectivity to the reward position. This kind of position selectivity is different from visual receptive field because the neuronal activity occurs before any target comes on. It is different from saccade movement field (Sparks et al. 1976; Wurtz and Goldberg 1972) or memory field (Funahashi et al. 1989; Williams and Goldman-Rakic 1995) because saccade direction is not specified during the fixation period. This reward-position–selective activity had not been reported in other brain areas, but subsequently was found in the SNr (Sato and Hikosaka 2002) and SC (Ikeda and Hikosaka 2003), which receive inputs from the caudate.


FIG. 4. Pretarget activity of a caudate projection neuron selective for reward position. A: action potentials of a left caudate neuron (shown by dots in a raster display) while the monkey performed the visual-1DR task. Raster is aligned on the onset of the fixation point (fix on) and the onset of the target (target on). Changes in stimulus configuration are shown at top. After the fixation period of 1.5 s, a target came on at a right or left position pseudorandomly and the monkey had to make a saccade to it. In the first block of 20 trials (indicated by a pink bar) a reward (water) was given only if the target was presented on the right and the saccade was made to it. In the second block a reward was given only if the target was on the left (indicated by a green bar). These 2 types of blocks were subsequently repeated. Caudate neuron exhibited sustained activity before target onset while the monkey was fixating at the center spot, although the activity was much stronger when the reward was expected after saccade to the right target (which was contralateral to the neuron). B: correlated changes of pretarget activity of caudate neurons (top) and saccade latency (bottom). Mean pretarget activity of caudate neurons (top) was higher in the block of trials when the reward was expected on the contralateral position (indicated by blue dots) than on the ipsilateral position (red dots). Mean latency of contralateral saccades (bottom), which the caudate neurons would preferentially control, was shorter when they were preceded by higher pretarget activity and when they were followed by reward [from Lauwereyns et al. (2002)].

 
What is the function of the reward-position–selective activity? A simple "thought experiment" using the disinhibition scheme indicates that the activity influences saccade behavior profoundly (Fig. 5). It is known that the caudate–SNr connection is ipsilateral (Tulloch et al. 1978), the SNr–SC connection is predominantly ipsilateral (Beckstead et al. 1981), and the SC–brain stem connection is predominantly contralateral (Grantyn et al. 1979). Therefore a change in caudate neuronal activity affects contralateral saccades predominantly (Itoh et al. 2003). Figure 5 shows the caudate–SNr–SC pathways in both hemispheres.


FIG. 5. A hypothetical scheme showing how pretarget activity of caudate neurons might determine saccade bias toward reward position. Caudate–SNr–SC connections in the right and left hemispheres are shown to visualize the contrast in their excitability. Red circles and lines with (+): excitatory neurons; blue circles and lines with (–): inhibitory neurons. Thickness of the line represents the firing rate of the neuron. Color saturation of the circle represents the excitability of the neuron. Top: pretarget period. If a reward is expected at a right position (shown by an apple), left caudate neurons become active and right caudate neurons are inactive, which occurs while the monkey is fixating. On the left side of the brain, SNr neurons are inhibited and thus SC neurons are disinhibited; on the right side, SNr neurons remain active and thus SC neurons are inhibited. Bottom: posttarget period. Bottom, left: if a target then appears on the left, cortical areas on the right side are activated (indicated by yellow) and this signal is sent to right SC neurons with excitatory connections. Because the right SC neurons have been tonically inhibited by SNr neurons, cortical inputs can only slowly activate the SC neurons and thus the saccade to the left (nonreward position) is delayed. Bottom, right: if a target appears on the right, cortical areas on the left side are activated and this signal is sent to left SC neurons. Because the left SC neurons have been freed from the SNr-induced inhibition, cortical inputs can quickly activate the SC neurons and thus the saccade to the right (reward position) is quickened.

 
Suppose the animal is engaged in the visual-1DR task in which a big reward is associated with saccades to the right (Fig. 5, top). First, consider a left caudate neuron with typical reward-position–selective activity (i.e., preferring a big reward on the contralateral side). The caudate neuron should be very active because the big-reward side (i.e., right) is contralateral to the neuron. This signal is transmitted to the left SNr, suppressing its neuronal activity (Sato and Hikosaka 2002). The decrease in activity of SNr neurons then leads to a disinhibition of left SC neurons (Ikeda and Hikosaka 2003). The SC neurons should then be more excitable, although they may not fire. Second, consider a right caudate neuron with typical reward-position–selective activity. Because the big-reward side (i.e., right) is ipsilateral to the neuron, the caudate neuron should be less active. SNr neurons on the right side are then not inhibited and consequently right SC neurons are kept inhibited by SNr neurons. What we see here is a strong bias in excitability between SC neurons on the left and right sides.

Note that this bias occurs before any motor instruction (i.e., target) comes in (Fig. 5, top). If a target appears on the right side (white dot in Fig. 5, bottom, right), the visual signal is registered mainly in the left cortical areas (yellow circle), which then activates the left SC. Because the excitability of the left SC neurons has been elevated by the reward-position–selective activity derived from the left caudate, the SC neurons respond quickly to the cortical input and consequently a saccade occurs quickly to the right target. This is where a big reward is available. In contrast, if a target appears on the left side (Fig. 5, bottom, left), the visual signal is registered in the right cortical areas, which then activates the right SC. Because the excitability of the right SC neurons has remained depressed, they respond slowly to the cortical input and consequently a saccade occurs slowly to the left target where a small or no reward is available. This is exactly what happened in the visual-1DR task (Fig. 4B).

When the big-reward side is changed to the left in the next block of 1DR, the situation is completely reversed. Right caudate neurons exhibit higher reward-position–selective activity and therefore left saccades occur quickly, whereas left caudate neurons exhibit weaker reward-position–selective activity and therefore right saccades occur slowly.
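The thought experiment above can be restated as a toy rate model. The sketch below is only an illustration: the function names, firing rates, weights, and the linear rise-to-threshold latency rule are all assumptions chosen for clarity, not values or mechanisms reported in the cited studies.

```python
# Toy version of the caudate -> SNr -> SC disinhibition scheme (Fig. 5).
# Every number here is an arbitrary, assumed value chosen for illustration.

def snr_rate(caudate_rate, baseline=80.0, weight=1.0):
    """SNr firing (spikes/s): tonic baseline minus inhibition from the ipsilateral caudate."""
    return max(0.0, baseline - weight * caudate_rate)


def saccade_latency_ms(snr, drive=1.0, inhibition=0.01, threshold=30.0, delay_ms=100.0):
    """Latency if SC activity rises linearly to a threshold (assumed rule).

    The net rate of rise is the cortical drive minus the tonic SNr inhibition.
    """
    net = drive - inhibition * snr
    if net <= 0:
        return float("inf")  # SC never reaches threshold
    return delay_ms + threshold / net


def latencies(reward_side, high=40.0, low=5.0):
    """Return the saccade latency (ms) for each saccade direction.

    The caudate contralateral to the big-reward side is assumed to fire at
    `high` before target onset, and the ipsilateral caudate at `low`.
    """
    caudate = {
        "left": high if reward_side == "right" else low,
        "right": high if reward_side == "left" else low,
    }
    # Each SC drives saccades to the contralateral side.
    return {
        "right": saccade_latency_ms(snr_rate(caudate["left"])),
        "left": saccade_latency_ms(snr_rate(caudate["right"])),
    }


for side in ("right", "left"):
    print(side, latencies(side))
```

Under these assumed parameters the saccade toward the big-reward side has the shorter latency, and the bias reverses when the big-reward side is switched, which is the qualitative pattern described for the visual-1DR task.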

These data suggest that the reward-position–selective activity in caudate neurons contributes to the emergence of the reward-dependent bias of saccade latency, that is, quicker saccade to a big reward and slower saccade to a small reward. This hypothesis is supported by the findings indicating that the firing rates of reward-dependent caudate neurons are correlated with saccade velocity and latency (Itoh et al. 2003; Watanabe et al. 2003). The causal role of caudate neurons is supported by our recent experiment showing that the reward-dependent bias of saccade latency is attenuated by the blockade of dopamine-D1 receptors in the caudate (Nakamura and Hikosaka 2004).

NEURONAL ACTIVITY IN THE BASAL GANGLIA IS MODULATED BY EXPECTED REWARD

In the preceding sections we have indicated that caudate neurons bias saccades toward the rewarded position before any instruction is given. In many situations, there is also a preparatory process before an action (saccade) during which some instructions or cues are presented to indicate what kind of movement is required or how rewarding the movement will be. To examine the effect of reward expectation on the preparatory process, we modified the memory-guided saccade task (Hikosaka and Wurtz 1983a) such that only one particular direction is associated with a big reward. We call it the memory-1DR task (Fig. 3A, bottom). Because the action (saccade to the cue position) as well as the reward outcome becomes predictable at the presentation of the target cue, it is very important to study the neuronal responses to the cue. If neurons are related to the preparation of reward-oriented action, their responses to the target cue should change depending on the expected reward outcome. Indeed, many caudate and SNr neurons exhibit such reward-predictive responses.

Kawagoe et al. (1998) indeed found that visual responses of caudate projection neurons to the cue were strongly modulated by the expected reward, as exemplified in Fig. 6A. The neuron responded most strongly to the cue on the left (L), which was contralateral to the neuron, when the reward was given equally for different target positions (Fig. 6A, rightmost column, indicated by ALL). However, the neuron's responses to the cues changed dramatically in four blocks of the memory-1DR task (Fig. 6A, left four columns), in each of which reward was given for only one direction out of four. Compared with the case where an equal amount of reward was given for all four targets (ALL), the neuron's response in the 1DR task was enhanced when the cue indicated an upcoming reward (e.g., cued direction: U; rewarded direction: U), whereas it was depressed when the cue indicated no upcoming reward (e.g., cued direction: U; rewarded direction: R, L, or D). The reward modulation was so strong that the neuron's original direction selectivity was shifted or even reversed. Overall, statistically significant modulation by the expected reward was observed in about 80% of visually responsive caudate neurons.


FIG. 6. Responses of a right caudate neuron (A) and a left dopamine neuron (B) to the visual cue in the memory-1DR and memory-ADR tasks. In one block of the memory-1DR task, only one particular direction was associated with reward. Because 4 target directions were used (unlike in Fig. 4), the experiment was done in 4 blocks, which are shown in 4 left columns with different reward directions (indicated by R, U, L, D). In one block of memory-ADR task (indicated by ALL), all directions were associated with reward. Neurons were recorded from the right caudate and the left substantia nigra pars compacta (SNc). In the histogram/raster display (bin width: 20 ms), the neuronal discharge aligned on cue onset is shown separately for different cue directions; the cue directions were pseudorandomized. Reward direction is indicated by a dotted circle. Caudate neuron responded to the left (L) cue most strongly in ADR (ALL), but the response was greatly enhanced and depressed in 1DR when the cue indicated reward and no reward, respectively. Dopamine neuron showed no response to any cue in ADR. In 1DR, it responded to the cue with an excitation if it indicated an upcoming reward and with an inhibition if it indicated no reward. Horizontal and vertical bars at right-bottom in each of A and B indicate 1 s and 50 spikes/s. [A from Kawagoe et al. (1998).]

 
Similar reward-dependent modulation was found among SNr neurons that usually fire tonically and rapidly (Sato and Hikosaka 2002). About half of the task-related SNr neurons exhibited decreases in activity, whereas the other half exhibited increases in activity. The decrease in activity is likely to be caused by the direct GABAergic action of caudate neurons (Fig. 2). The increase in activity may arise from the indirect effect mediated by GPe neurons and possibly by subthalamic nucleus (STN) neurons as well (not shown). The decreasing type neurons were usually more selective in space (responding to contralateral cues) and in reward (activity decreased strongly in the rewarded trials) than the increasing type neurons. These results are in agreement with the hypothesis that the basal ganglia contribute to the selection of action by removing the tonic inhibition selectively on the action to be selected (using the direct pathway) while increasing the inhibition on a wide variety of actions (using the indirect pathway) (Mink 1996).

How have the caudate and SNr neurons come to acquire activity dependent on expected reward? One possibility is that caudate neurons receive signals from other brain areas that have already been modulated by expected reward, as described in the INTRODUCTION. A second possibility is that the reward modulation first occurs in the basal ganglia. In this case, the cerebral cortex would receive reward-modulated signals from the basal ganglia through the thalamus (Middleton and Strick 2000). A recent study by Pasupathy and Miller (2005) seems consistent with this idea. In the following we will examine the hypothesis that the reward modulation occurs, at least partly, in the basal ganglia depending on the inputs from dopamine neurons.

The caudate is heavily innervated by axons releasing "modulators" such as dopamine (DA) and acetylcholine (ACh), which might carry reward-related signals (Chiara et al. 1994; Graybiel 1990). That dopamine is essential for normal functions of the basal ganglia is demonstrated by the fact that Parkinson's disease is caused by the loss of dopamine innervation in the striatum (putamen and caudate) (Selby 1968). A pure dopamine deficit can be induced experimentally by MPTP (1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine) and this model has been used widely. Human patients with MPTP-induced parkinsonism showed severe deficits in saccadic eye movements (Hotson et al. 1986). After a local and unilateral injection of MPTP into the caudate, monkeys exhibited contralateral hemineglect (Miyashita et al. 1995) and eye movements during visual search became restricted in the hemifield ipsilateral to the MPTP injection (Kato et al. 1995). Moreover, saccades became infrequent, small in amplitude, and slow; the visual search area became smaller; and the duration of eye fixation became longer. These results indicate that dopamine in the caudate is capable of modulating saccadic eye movements, but do not indicate in what context it does so.

The relation of dopamine to reward was originally proposed based on the "self-stimulation" experiments (Olds and Milner 1954). If an electrode is placed in a particular part of the brain and the animal itself is allowed to stimulate that part of the brain by pressing a key, the animal may press the key continually and obsessively. This effect is stronger if the electrode is placed closer to the ascending axon bundle of midbrain dopamine neurons (Corbett and Wise 1980). The effect is blocked if dopamine transmission is blocked in the striatum by injecting a dopamine antagonist (Mora et al. 1976; Phillips et al. 1976). A second line of evidence comes from the studies on drug addiction (Berke and Hyman 2000; Wise 1996). A variety of drugs that induce addiction effectively increase the extracellular concentration of dopamine in the striatum (Berke and Hyman 2000), especially the ventral striatum or nucleus accumbens.

More direct evidence for the relationship of dopamine to reward is based on single-unit studies on midbrain dopamine neurons in trained animals. Schultz and colleagues demonstrated that dopamine neurons respond to the delivery of reward (Schultz et al. 1993). A key finding was that the response was correlated with the difference between the expected reward and the actual reward. Thus the response to a reward is stronger if it is not expected (Mirenowicz and Schultz 1994). The response is negative (i.e., a decrease in firing) if the expected reward is not given (Hollerman and Schultz 1998). In short, dopamine neurons appear to encode "reward prediction error" (Schultz 1998). This signal corresponds nicely to a principal factor in learning theories that account for classical conditioning (Rescorla and Wagner 1972) as well as in reinforcement learning theory, which was developed more recently (Barto 1995; Dayan and Balleine 2002; Houk et al. 1995; Schultz and Dickinson 2000) (see RELATION TO REINFORCEMENT LEARNING THEORIES).

Dopamine neuronal activity in the memory-1DR task followed this principle (Kawagoe et al. 2004). In an animal trained extensively in 1DR, dopamine neurons responded to the cue with a phasic increase in firing if the cue indicated an upcoming reward; they responded to the cue with a phasic decrease in firing if the cue indicated no reward (Fig. 6B). Dopamine neurons showed no spatial selectivity: they responded to the cue at any position equally as long as it indicated an upcoming reward or as long as it indicated no reward. Dopamine neurons exhibited no response to the cue when all positions were equally rewarded (Fig. 6B, ALL). Dopamine neurons thus represented the difference between the expected value of reward cue and the actual value of reward cue, consistent with an extension of the "reward prediction error" theory (Tobler et al. 2003). Before the cue is presented, the likelihood of obtaining reward in the current trial is 25% (1 out of 4); it increases to 100% after a reward-indicating cue and decreases to 0% after a nonreward-indicating cue. Dopamine neurons' "positive" and "negative" responses are correlated respectively with the increase and decrease in reward prediction. If all positions are equally rewarded, the likelihood of reward is 100% before the cue and this is not changed by the presentation of the cue; thus dopamine neurons show no response.
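The arithmetic in the preceding paragraph can be written out as a short calculation. This is a minimal sketch; the function name and its arguments are hypothetical, and it treats the prediction error at cue onset simply as the change in reward probability produced by the cue.

```python
from fractions import Fraction


def cue_prediction_errors(n_positions, n_rewarded):
    """Change in reward probability produced by each kind of cue.

    Returns (error_after_reward_cue, error_after_no_reward_cue); an entry is
    None when that kind of cue cannot occur in the block.
    """
    before = Fraction(n_rewarded, n_positions)  # reward likelihood before the cue
    reward_cue = Fraction(1) - before if n_rewarded > 0 else None
    no_reward_cue = Fraction(0) - before if n_rewarded < n_positions else None
    return reward_cue, no_reward_cue


print(cue_prediction_errors(4, 1))  # 1DR: one of four positions rewarded
print(cue_prediction_errors(4, 4))  # ADR: all positions rewarded
```

With one rewarded position out of four, the change is +3/4 after a reward-indicating cue and -1/4 after a nonreward-indicating cue. With all four positions rewarded, the change is zero, matching the absence of a dopamine response in the ALL condition.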

Here we see an intriguing relationship between caudate projection neurons and dopamine neurons (Fig. 6) (Kawagoe et al. 2004). In both caudate and dopamine neurons, the visual responses in the 1DR task are strongly modulated by the expected reward. They are different in that caudate projection neurons, not dopamine neurons, show spatial selectivity. These results suggest mutual interactions between caudate projection neurons and dopamine neurons. In the next section we first consider the influence of dopamine activity on caudate activity and later the influence of caudate activity on dopamine activity.

POSSIBLE MECHANISM OF REWARD-BASED LEARNING IN THE CAUDATE

As mentioned earlier, the saccade-related neurons in the caudate receive inputs from the cortical eye fields and dorsolateral prefrontal cortex where neurons typically exhibit spatially selective visual or saccadic activities. Interactions between spatial signals from the cerebral cortex and reward-related signals from dopamine neurons may induce reward-modulated spatial signals in caudate neurons.

Physiological studies have suggested that dopamine inputs can modify cortically induced excitatory postsynaptic potentials (EPSPs) (Reynolds and Wickens 2002). This modification could occur instantaneously by changing the properties of many voltage-dependent ion channels (Nicola et al. 2000) or by modulating synaptic plasticity through the mechanisms of long-term potentiation (LTP) or long-term depression (LTD) (Calabresi et al. 1996; Lovinger et al. 2003; Mahon et al. 2004; Reynolds and Wickens 2002). Earlier in vitro studies on striatal projection neurons showed that LTD is common, whereas LTP is induced only when Mg is absent or dopamine is applied (Calabresi et al. 1992). More recent in vivo studies indicated that LTP is commonly induced if the recorded striatal projection neuron is depolarized (Charpier and Deniau 1997) and D1 dopamine receptors are activated (Reynolds et al. 2001). LTD is also induced in vivo if dopamine receptors are not activated (Reynolds and Wickens 2000). These findings can be interpreted by the following rules as previously proposed (Schultz 1998; Wickens and Kötter 1995), although there may be other interpretations.

RULE 1. The synaptic efficacy of the cortical input to a caudate neuron increases with LTP every time the cortical input elevates the caudate neural activity in conjunction with an increase in the dopamine input.

RULE 2. The synaptic efficacy of the cortical input to a caudate neuron decreases with LTD every time the cortical input elevates the caudate neural activity in conjunction with a decrease in the dopamine input.

Let us consider the activity of a model caudate neuron that receives two kinds of inputs from the cerebral cortex carrying visual signals from the left (L) and right (R) cues, respectively (Fig. 7). The cortical inputs make synapses probably on different sets of dendritic spines (X and Y) on the caudate neuron because the innervation of single corticostriatal axons is extremely sparse (Zheng and Wilson 2002). On the other hand, dopamine neurons are likely to have synaptic contacts diffusely with both spine X and spine Y, as suggested by anatomy (Hedreen and DeLong 1991). Suppose that reward is associated with L cue, but not R cue (Fig. 7). If the target cue appears on the left in one trial, the visual signal is fed into the caudate neuron through spine X (Fig. 7, top, left, indicated by a thick red line), which induces firing in the caudate neuron. At the same time dopamine neurons in the SNc fire because the cue indicates an upcoming reward (indicated by a thick green line). Thus Rule 1 holds for spine X and thus the efficacy of the L–X synapse should be increased. If the target cue appears on the right, the visual signal is fed into the caudate neuron through spine Y (Fig. 7, top, right), but the dopamine input is reduced because the cue indicates no reward (indicated by a thin gray line). Thus Rule 2 holds for spine Y and thus the efficacy of R–Y synapse should be decreased.


FIG. 7. A hypothetical scheme showing reward-dependent plasticity of cortico-caudate synapses. In one condition of the memory-1DR task (1) caudate neuron receives 2 kinds of visual input (L and R) from the cerebral cortex at dendritic spines (X and Y); (2) input L represents a left visual cue signal, whereas input R represents a right visual cue signal; (3) input L indicates an upcoming reward, whereas input R indicates no reward; and (4) L and R inputs come in randomly one at a time. Dopamine neurons in the SNc, which are presumed to make synapses on caudate neurons nonselectively, respond to input L with a burst of spikes (top, left) and respond to input R with a pause of spikes (top, right). After several trials the efficacy of the L–X synapse increases (bottom, indicated by a larger red circle), whereas the efficacy of the R–Y synapse decreases (bottom, indicated by a smaller red circle). Activation of caudate neuron in response to the same input L is now enhanced (bottom, left, indicated by thick blue line), whereas activation in response to input R is diminished (bottom, right, indicated by thin light blue line). Bias persists for some trials because of the plasticity, even when reward is associated equally with R and L inputs (bottom, indicated by thin green lines). Bias will be reversed in the next block when input R, not input L, indicates an upcoming reward. Red lines: excitatory connections; blue circles and lines: inhibitory neurons; green circles and lines: dopamine neuron, which is modulatory. Thickness of the line represents the firing rate of the neuron, except that the dotted green line indicates a decrease in dopamine neuron firing.

 
If trials are repeated in the same condition (rewarded on left, unrewarded on right), the L–X synapse is strengthened (indicated by a large red circle), whereas the R–Y synapse is weakened (indicated by a small red circle). Now, the same cue on the left induces a stronger response (Fig. 7, bottom, left, indicated by a thick blue line), whereas the same cue on the right induces a weaker response (Fig. 7, bottom, right, indicated by a thin light-blue line) in the caudate neuron. Consistent with this mechanism, gradual changes of visual responses were observed during several trials after the reward direction was changed in the memory-1DR task (Kawagoe et al. 1998). In short, the model caudate neuron changes the responses to the visual cues such that it responds to whichever cue indicates an upcoming reward (Fig. 7) as the actual caudate neuron did (Fig. 6A) (Nakahara et al. 2002).
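Rules 1 and 2 applied to the model neuron of Fig. 7 can be sketched as a small simulation. This is an illustrative toy, not the model of Nakahara et al. (2002): the learning rate, weight bounds, the +1/-1 coding of the dopamine burst and pause, and the strict alternation of cues are all assumptions made for the example.

```python
def run_block(weights, rewarded_cue, n_trials=40, lr=0.1, w_min=0.2, w_max=2.0):
    """Apply Rules 1 and 2 over one block; return (new_weights, responses)."""
    w = dict(weights)  # copy so the caller's weights are not modified
    responses = []
    for t in range(n_trials):
        cue = "L" if t % 2 == 0 else "R"  # alternate cues (stand-in for random order)
        response = w[cue]                 # caudate response to a unit cortical input
        dopamine = 1.0 if cue == rewarded_cue else -1.0  # burst vs. pause (assumed +/-1)
        # Rule 1 (LTP) when dopamine > 0, Rule 2 (LTD) when dopamine < 0;
        # the change is gated by the postsynaptic response.
        w[cue] = min(w_max, max(w_min, w[cue] + lr * dopamine * response))
        responses.append((cue, response))
    return w, responses


w0 = {"L": 1.0, "R": 1.0}                # efficacies of the L-X and R-Y synapses
w1, _ = run_block(w0, rewarded_cue="L")  # reward on the left
w2, _ = run_block(w1, rewarded_cue="R")  # reward direction reversed
print(w1, w2)
```

In this sketch the efficacy of the rewarded input grows and that of the unrewarded input shrinks over a block, and the preference reverses gradually after the rewarded cue is switched. The lower bound `w_min` is an added assumption that keeps a depressed synapse able to recover after reversal.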

Note that, unlike the neuron shown in Fig. 6A, there are caudate neurons that maintain their direction preference (usually contralateral) even when the reward direction is changed, although their selectivity is enhanced or depressed depending on the expected reward (Kawagoe et al. 1998). Suppose such a neuron is in the left caudate; it may then have input R exclusively or preferentially. The neuron's response to input R changes depending on the reward condition, but it may not respond to input L even if it indicates an upcoming reward. The contralateral preference of caudate neurons is thus maintained as a result of the asymmetry of cortical inputs.

What is not explained by the model is the presence of caudate neurons that respond to a no-reward–indicating cue more strongly than to a reward-indicating cue (Kawagoe et al. 1998). One solution would be to change the rules (Rules 1 and 2) such that the directions of plasticity are reversed; LTP and LTD may be deployed in the opposite conditions for these negatively modulated caudate neurons (Nakahara et al. 2002). Different dopamine receptor types might mediate opposite effects in different groups of caudate neurons. In fact, D1 and D2 receptors are expressed preferentially in direct-pathway neurons and indirect-pathway neurons, respectively (Gerfen et al. 1990; Surmeier et al. 1996). It is possible that the positively and negatively modulated caudate neurons correspond to the two types of caudate neurons belonging to the direct and indirect pathways. These ideas remain to be tested experimentally.

The proposed model is supported by recent experiments conducted in our laboratory. First, we found that electrical stimulation of the caudate can induce changes in saccade latency similar to those induced by reward modulation (Nakamura and Hikosaka 2003). While the monkey was engaged in a visually guided saccade task, the saccade-related area of the caudate was electrically stimulated only after a saccade and only when the saccade was directed to a particular position. After repeating the trials, the latency became gradually shorter for the saccade that was associated with electrical stimulation and the effect remained even after the stimulation was stopped. Second, the reward modulation of saccade behavior can be changed by blockade of dopamine receptors (Nakamura and Hikosaka 2004). A D1 antagonist or a D2 antagonist was injected into the region of the caudate where saccade-related neurons are clustered while the monkey performed the visual-1DR task. The D1 antagonist attenuated, but the D2 antagonist enhanced, the reward modulation of saccade behavior. These results suggest that 1) the reward modulation of saccadic eye movement, at least partly, originates from the caudate; 2) the dopamine input to caudate projection neurons is responsible for the reward modulation; and 3) the dopamine effect is mediated by D1 and D2 receptors in differential manners.

Thus far we have discussed how dopamine inputs might modify the cortical inputs to caudate projection neurons. However, the completely opposite relationship might also be present: Activity of dopamine neurons may be modulated by signals from caudate neurons. A majority of caudate projection neurons project to the substantia nigra (Lynd-Balta and Haber 1994b; Parent and Hazrati 1994). Although their axons make synapses mainly on GABAergic neurons in the pars reticulata (SNr), some make synapses on dopamine neurons in the pars compacta (SNc) and the overlying tegmental area (Hedreen and DeLong 1991). Furthermore, many SNr neurons emit axon collaterals that then make synapses on nearby dopamine neurons (Tepper et al. 1995; Van den Pol et al. 1985). It is then likely that the reward-modulated activity of caudate neurons modulates activity of dopamine neurons. Because both caudate projection neurons and SNr neurons are GABAergic, the direct caudate–dopamine effect is inhibitory, whereas the indirect caudate–SNr–dopamine effect is disinhibitory. In addition, signals from the caudate to dopamine neurons may be transmitted through the indirect pathways involving the GPe and STN (Iribe et al. 1999; Smith et al. 1998). The responses of dopamine neurons to reward-predicting and no-reward–predicting stimuli, which are a phasic increase and a phasic decrease in firing, might be explained by these caudate-originated signals.

Note, however, that dopamine neurons in less experienced monkeys continue to respond to rewards, but not to their predictors, in the memory-1DR task. The response to the predictor (cue) starts to appear after the monkey experiences many sessions of the memory-1DR task, but initially very slowly within a given block (Takikawa et al. 2004). These results suggest that the dopamine neuronal response to the predictor is acquired with long-term learning. We speculate that the caudate contributes to the learning in dopamine neurons or sends learned signals to dopamine neurons through the direct and indirect connections described above.

It should be noted, however, that the mutual relationship between caudate and dopamine neurons alone may not fully account for the emergence of the reward-predicting information. In fact, both caudate and dopamine neurons receive inputs from many brain areas outside the basal ganglia. For example, caudate neurons receive inputs from the thalamus, such as the parafascicular nucleus (Sadikot et al. 1992), mediodorsal nucleus (Parent et al. 1983), and nucleus ventralis anterior (Nakano et al. 1990), in addition to the cerebral cortical areas. Dopamine neurons receive inputs from the lateral hypothalamus (Fadel and Deutch 2002), amygdala (Fudge and Haber 2000; Lee et al. 2005), nucleus accumbens (Lynd-Balta and Haber 1994b), pedunculopontine nucleus (Kobayashi et al. 2002b; Lavoie and Parent 1994; Pan and Hyland 2005), superior colliculus (Dommett et al. 2005), bed nucleus of stria terminalis (Fudge and Haber 2001; Georges and Aston-Jones 2001), and habenula (Christoph et al. 1986). The reward-predicting information might be transmitted from these areas to the basal ganglia. Alternatively, it might be generated in the basal ganglia based on different kinds of information derived from these areas.

RELATION TO REINFORCEMENT LEARNING THEORIES

A prominent problem in reinforcement learning is that a reward may occur after some time delay, not immediately after a single action but after a sequence of several actions. To maximize reward acquisition in such a case, a mechanism must be devised to learn to account for such delayed rewards and choose a sequence of actions accordingly. A prominent model of reinforcement learning uses, as a learning signal, reward prediction error, which is the difference between the expected reward (with delayed rewards taken into account) and the actual reward (Sutton and Barto 1998). This model is called the "temporal difference model" or "TD model," whereas the reward prediction error is called TD error. The TD model is particularly attractive because dopamine neuronal responses are found to be similar to the simulated TD errors (Hollerman and Schultz 1998; Mirenowicz and Schultz 1994; Waelti et al. 2001).
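For reference, the simplest form of this learning signal, usually called TD(0), can be written as follows. The symbols are not used elsewhere in this article and are introduced here only for illustration: V(s) is the predicted future reward in state s, r_{t+1} is the reward received on the transition from s_t to s_{t+1}, \gamma is a discount factor between 0 and 1, and \alpha is a learning rate.

```latex
\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t), \qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

The TD error \delta_t is positive when the outcome or the new prediction is better than the previous prediction, negative when it is worse, and zero when the prediction was accurate.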

The TD hypothesis can also explain dopamine responses in the memory-1DR task (Fig. 6B). In this task, there were four possible target positions in each trial, one rewarded and the other three nonrewarded. Thus the reward probability was 1/4 before the target cue appeared. Once the cue came on, the reward probability became either 1 (after reward-indicating cue) or 0 (after no-reward-indicating cue). Reward prediction error was then either 3/4 or –1/4. Indeed, this was the time when dopamine neurons burst or stop firing. These results are consistent with the hypothesis that dopamine neurons represent TD error (Houk et al. 1995; Montague et al. 1996; Nakahara et al. 2002; Schultz et al. 1997; Suri and Schultz 1998).
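How such cue-time errors could emerge through learning can be illustrated with a tabular TD(0) sketch. This is a deliberately reduced, hypothetical version of the task: a trial is collapsed into a fixation state followed by one of two cue states, no discounting is applied, and the learning rate, number of blocks, and random seed are arbitrary assumptions.

```python
import random


def learn_1dr(n_blocks=200, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a 3-state sketch of the memory-1DR trial."""
    V = {"fixation": 0.0, "reward_cue": 0.0, "no_reward_cue": 0.0}
    rng = random.Random(seed)
    for _ in range(n_blocks):
        positions = [0, 1, 2, 3]  # position 0 stands for the rewarded direction
        rng.shuffle(positions)    # each direction once per sub-block of 4 trials
        for pos in positions:
            cue = "reward_cue" if pos == 0 else "no_reward_cue"
            reward = 1.0 if pos == 0 else 0.0
            # Cue onset: no reward is delivered yet, so the TD error is the
            # change in prediction from the fixation state to the cue state.
            V["fixation"] += alpha * (gamma * V[cue] - V["fixation"])
            # Trial outcome: the trial ends, so there is no successor value.
            V[cue] += alpha * (reward - V[cue])
    return V


V = learn_1dr()
td_reward_cue = V["reward_cue"] - V["fixation"]        # expected to approach +3/4
td_no_reward_cue = V["no_reward_cue"] - V["fixation"]  # expected to approach -1/4
print(round(td_reward_cue, 2), round(td_no_reward_cue, 2))
```

After training, the value of the fixation state settles near 1/4, so the error at cue onset is close to +3/4 for the reward-indicating cue and close to -1/4 for the no-reward-indicating cue. With a constant learning rate the values keep fluctuating slightly around these targets.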

Because TD error is computed roughly as the difference between the expected reward and the actual reward, different brain areas may provide dopamine neurons with the expected reward signal and the actual reward signal. As shown previously (Fig. 6A), activity of many caudate neurons is modulated strongly by expected reward and therefore may provide dopamine neurons with the expected reward signal. Although this hypothesis has been put forward by many researchers, the detailed comparison between caudate and dopamine neuronal responses in the memory-1DR task (Fig. 6) provides stronger support for the hypothesis.

The TD model typically has two components, called actor and critic (Barto 1995). The critic mechanism learns to predict future reward outcome, whereas the actor mechanism learns to choose an action in each trial. TD error is computed after each rewarded or nonrewarded trial and both critic and actor mechanisms adjust, albeit gradually, their states by using the TD error. It has been hypothesized that the critic mechanism is composed of the mutual connections of caudate neurons and dopamine neurons, whereas the actor mechanism is composed of other caudate neurons that also receive dopamine inputs but project to the SNr or GPi and then influence motor outputs (Barto 1995; Houk 1995; Montague et al. 1996; Schultz et al. 1997). The "actor" role of the caudate has been suggested by human neuroimaging studies (O'Doherty et al. 2004; Tricomi et al. 2004). Supporting this hypothesis are the findings in studies using saccade tasks: 1) activity of reward-dependent caudate neurons is correlated with parameters of saccadic eye movement in individual trials (Itoh et al. 2003; Watanabe et al. 2003) and 2) the reward-position–dependent activity of caudate neurons is likely to cause a bias of saccadic eye movement toward a reward position (as shown in Figs. 4 and 5).

Thus our experimental evidence and argument are largely consistent with the primary role of the basal ganglia in the TD model. Yet we also see some departures. First, note that in the TD model the critic weighs most on the immediate reward and progressively less on rewards in the more distant future; a reward in a distant future is discounted (Schultz et al. 1997). This mechanism can adapt to many situations in which rewards come after various time delays. This is particularly true if the probabilistic distribution of the reward delays is unchanged across trials. In the real world, however, the availability of reward may change over time in a certain pattern, either deterministically or probabilistically. In this case, the same sensory input may indicate different reward probabilities depending on its position in the pattern. The TD model, in its original proposal, did not address this issue and is not ideal for adapting to such a change of reward availability.

Interestingly, we found that dopamine neurons are capable of adapting to such a probabilistic context (Nakahara et al. 2004) (Fig. 8). In most of the experiments of 1DR tasks, the target position was pseudorandomized such that the conditional probability of reward in the current trial is lower if reward has been given in a more recent trial (Fig. 8, A and B). It was found that dopamine neurons can discriminate the differences in the conditional probability of reward. Dopamine neurons' excitatory response to a reward-indicating cue is stronger if reward has been given in a more recent trial (Fig. 8C, black), which occurs because reward is less likely and therefore more "surprising" if it occurs soon after a rewarded trial. Conversely, dopamine neurons' inhibitory response to a no-reward–indicating cue is weaker if reward has been given in a more recent trial (Fig. 8C, gray), a phenomenon that can be explained by dopamine neurons' coding of reward prediction error using the conditional probability of reward (Fig. 8, D and E). This indicates that dopamine neurons can gain access to rule or context beyond sensory input and can produce reward prediction error signals differentially even to the same sensory input. A different type of context dependency has recently been reported, that is, dopamine neurons can encode relative reward values even when the mean value is changed (Tobler et al. 2005).


FIG. 8. Contextual TD model. A: diagram of pseudorandomization schedule that determined the sequence of trials in the memory-1DR task (4-direction version). Cue position was chosen pseudorandomly: within each subblock of 4 trials, each of the 4 directions was chosen randomly but always once. This pseudorandom schedule introduced a specific probabilistic structure of the occurrence of a rewarded trial. One way to examine the specific probability structure is to consider the probability of reward, conditional to postreward trial number (PRN). PRN is the number of consecutive nonrewarded trials since the last rewarded trial, and is indicated in the diagram. B: probability of reward, conditional to PRN. Solid line indicates theoretical values, whereas dashed line indicates empirical values from the experiment. C: population averages of dopamine neuronal responses to the reward-indicating cue (black) and to the nonreward-indicating cue (gray) are shown with respect to PRN. Note that these dopamine neuronal responses are obtained from a monkey that had extensively experienced the task. D: schematic diagram of a contextual TD model. Contextual TD model can use context information to produce reward prediction, whereas conventional TD model (not shown) cannot use the contextual information. E: TD errors in the contextual TD model (solid lines) are shown for the reward-indicating cue (black) and the nonreward-indicating cue (gray); TD errors in the conventional TD model are shown by dashed lines [modified from Nakahara et al. (2004)].
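The probabilistic structure created by this schedule can be worked out by enumeration. The sketch below is an idealized calculation, not a reproduction of the values plotted in Fig. 8B: it assumes that the rewarded direction is equally likely to fall at any position within a subblock, that subblocks are independent, and that PRN is counted from 0 (the trial immediately after a rewarded trial has PRN = 0). The indexing used in the original figure may differ.

```python
from fractions import Fraction
from itertools import product


def reward_probability_by_prn(n_directions=4):
    """P(reward | PRN) when each direction occurs exactly once per sub-block."""
    # Over all pairs of rewarded positions (a, b) in consecutive sub-blocks,
    # count how many nonrewarded trials separate the two rewards.
    gap_counts = {}
    for a, b in product(range(1, n_directions + 1), repeat=2):
        gap = (n_directions - a) + (b - 1)
        gap_counts[gap] = gap_counts.get(gap, 0) + 1
    max_gap = max(gap_counts)
    probs = {}
    for prn in range(max_gap + 1):
        # Trials that reach this PRN come from gaps at least this long.
        reached = sum(c for g, c in gap_counts.items() if g >= prn)
        # Of those, the rewarded ones come from gaps of exactly this length.
        probs[prn] = Fraction(gap_counts.get(prn, 0), reached)
    return probs


for prn, p in reward_probability_by_prn().items():
    print(prn, p, float(p))
```

Under these assumptions the conditional probability of reward rises monotonically with PRN, from 1/16 immediately after a rewarded trial to 1 after six consecutive nonrewarded trials, which is the qualitative structure that the contextual TD model is proposed to exploit.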

 
Note, however, that the coding of conditional probability is acquired after the monkey has experienced many trials of the 1