Journal of Neurophysiology

Predictive Reward Signal of Dopamine Neurons

Wolfram Schultz

This article has a correction.

Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1–27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest that midbrain dopamine systems are involved in processing reward information and learning approach behavior. Most dopamine neurons show phasic activations after primary liquid and food rewards and conditioned, reward-predicting visual and auditory stimuli. They show biphasic, activation-depression responses after stimuli that resemble reward-predicting stimuli or are novel or particularly salient. However, only a few phasic activations follow aversive stimuli. Thus dopamine neurons label environmental stimuli with appetitive value, predict and detect rewards, and signal alerting and motivating events. By failing to discriminate between different rewards, dopamine neurons appear to emit an alerting message about the surprising presence or absence of rewards. All responses to rewards and reward-predicting stimuli depend on event predictability. Dopamine neurons are activated by rewarding events that are better than predicted, remain uninfluenced by events that are as good as predicted, and are depressed by events that are worse than predicted. By signaling rewards according to a prediction error, dopamine responses have the formal characteristics of a teaching signal postulated by reinforcement learning theories. Dopamine responses transfer during learning from primary rewards to reward-predicting stimuli. This may contribute to neuronal mechanisms underlying the retrograde action of rewards, one of the main puzzles in reinforcement learning. The impulse response releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons. This signal may improve approach behavior by providing advance reward information before the behavior occurs, and may contribute to learning by modifying synaptic transmission.
The dopamine reward signal is supplemented by activity in neurons in striatum, frontal cortex, and amygdala, which process specific reward information but do not emit a global reward prediction error signal. A cooperation between the different reward signals may assure the use of specific rewards for selectively reinforcing behaviors. Among the other projection systems, noradrenaline neurons predominantly serve attentional mechanisms and nucleus basalis neurons code rewards heterogeneously. Cerebellar climbing fibers signal errors in motor performance or errors in the prediction of aversive events to cerebellar Purkinje cells. Most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal but may reflect the absence of a general enabling function of tonic levels of extracellular dopamine. Thus dopamine systems may have two functions, the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.


When multicellular organisms arose through the evolution of self-reproducing molecules, they developed endogenous, autoregulatory mechanisms assuring that their needs for welfare and survival were met. Subjects engage in various forms of approach behavior to obtain resources for maintaining homeostatic balance and to reproduce. One class of resources is called rewards, which elicit and reinforce approach behavior. The functions of rewards were developed further during the evolution of higher mammals to support more sophisticated forms of individual and social behavior. Thus biological and cognitive needs define the nature of rewards, and the availability of rewards determines some of the basic parameters of the subject's life conditions.

Rewards come in various physical forms, are highly variable in time and depend on the particular environment of the subject. Despite their importance, rewards do not influence the brain through dedicated peripheral receptors tuned to a limited range of physical modalities as is the case for primary sensory systems. Rather, reward information is extracted by the brain from a large variety of polysensory, inhomogeneous, and inconstant stimuli by using particular neuronal mechanisms. The highly variable nature of rewards requires high degrees of adaptation in neuronal systems processing them.

One of the principal neuronal systems involved in processing reward information appears to be the dopamine system. Behavioral studies show that dopamine projections to the striatum and frontal cortex play a central role in mediating the effects of rewards on approach behavior and learning. These results are derived from selective lesions of different components of dopamine systems, systemic and intracerebral administration of direct and indirect dopamine receptor agonist and antagonist drugs, electrical self-stimulation, and self-administration of major drugs of abuse, such as cocaine, amphetamine, opiates, alcohol, and nicotine (Beninger and Hahn 1983; Di Chiara 1995; Fibiger and Phillips 1986; Robbins and Everitt 1992; Robinson and Berridge 1993; Wise 1996; Wise and Hoffman 1992; Wise et al. 1978).

The present article summarizes recent research concerning the signaling of environmental motivating stimuli by dopamine neurons and evaluates the potential functions of these signals for modifying behavioral reactions by reference to anatomic organization, learning theories, artificial neuronal models, other neuronal systems, and deficits after lesions. All known response characteristics of dopamine neurons will be described, but predominantly the responses to reward-related stimuli will be conceptualized because they are the best understood presently. Because of the large amount of data available in the literature, the principal system discussed will be the nigrostriatal dopamine projection, but projections from midbrain dopamine neurons to ventral striatum and frontal cortex also will be considered as far as the present knowledge allows.


Functions of rewards

Certain objects and events in the environment are of particular motivational significance by their effects on welfare, survival, and reproduction. According to the behavioral reactions elicited, the motivational value of environmental objects can be appetitive (rewarding) or aversive (punishing). (Note that “appetitive” is used synonymously with “rewarding” but not with “preparatory.”) Appetitive objects have three separable basic functions. In their first function, rewards elicit approach and consummatory behavior. This is due to the objects being labeled with appetitive value through innate mechanisms or, in most cases, learning. In their second function, rewards increase the frequency and intensity of behavior leading to such objects (learning), and they maintain learned behavior by preventing extinction. Rewards serve as positive reinforcers of behavior in classical and instrumental conditioning procedures. In general incentive learning, environmental stimuli acquire appetitive value following classically conditioned stimulus-reward associations and induce approach behavior (Bindra 1968). In instrumental conditioning, rewards “reinforce” behaviors by strengthening associations between stimuli and behavioral responses (Law of Effect: Thorndike 1911). This is the essence of “coming back for more” and is related to the common notion of rewards being obtained for having done something well. In an instrumental form of incentive learning, rewards are “incentives” and serve as goals of behavior following associations between behavioral responses and outcomes (Dickinson and Balleine 1994). In their third function, rewards induce subjective feelings of pleasure (hedonia) and positive emotional states. Aversive stimuli function in the opposite direction. They induce withdrawal responses and act as negative reinforcers by increasing and maintaining avoidance behavior on repeated presentation, thereby reducing the impact of damaging events. Furthermore, they induce internal emotional states of anger, fear, and panic.

Functions of predictions

Predictions provide advance information about future stimuli, events, or system states. They provide the basic advantage of gaining time for behavioral reactions. Some forms of predictions attribute motivational values to environmental stimuli by association with particular outcomes, thus identifying objects of vital importance and discriminating them from less valuable objects. Other forms code physical parameters of predicted objects, such as spatial position, velocity, and weight. Predictions allow an organism to evaluate future events before they actually occur, permit the selection and preparation of behavioral reactions, and increase the likelihood of approaching or avoiding objects labeled with motivational values. For example, repeated movements of objects in the same sequence allow one to predict forthcoming positions and to prepare the next movement while still pursuing the present object. This reduces reaction time between individual targets, speeds up overall performance, and results in an earlier outcome. Predictive eye movements improve behavioral performance through advance focusing (Flowers and Downing 1978).

At a more advanced level, the advance information provided by predictions allows one to make decisions between alternatives to attain particular system states, approach infrequently occurring goal objects, or avoid irreparable adverse effects. Industrial applications use Internal Model Control to predict and react to system states before they actually occur (Garcia et al. 1989). For example, the “fly-by-wire” technique in modern aviation computes predictable forthcoming states of airplanes. Decisions for flying maneuvers take this information into account and help to avoid excessive strain on the mechanical components of the plane, thus reducing weight and increasing the range of operation.

The use of predictive information depends on the nature of the represented future events or system states. Simple representations directly concern the position of upcoming targets and the ensuing behavioral reaction, thus reducing reaction time in a rather automatic fashion. Higher forms of predictions are based on representations permitting logical inference, which can be accessed and treated with varying degrees of intentionality and choice. They often are processed consciously in humans. Before the predicted events or system states occur and behavioral reactions are carried out, such predictions allow one to mentally evaluate various strategies by integrating knowledge from different sources, designing various ways of reaction and comparing the gains and losses from each possible reaction.

Behavioral conditioning

Associative appetitive learning involves the repeated and contingent pairing between an arbitrary stimulus and a primary reward (Fig. 1). This results in increasingly frequent approach behavior induced by the now “conditioned” stimulus, which partly resembles the approach behavior elicited by the primary reward and also is influenced by the nature of the conditioned stimulus. It appears that the conditioned stimulus serves as a predictor of reward and, often on the basis of an appropriate drive, sets an internal motivational state leading to the behavioral reaction. The similarity of approach reactions suggests that some of the general, preparatory components of the behavioral response are transferred from the primary reward to the earliest conditioned, reward-predicting stimulus. Thus the conditioned stimulus acts partly as a motivational substitute for the primary stimulus, probably through Pavlovian learning (Dickinson 1980).

Fig. 1.

Processing of appetitive stimuli during learning. An arbitrary stimulus becomes associated with a primary food or liquid reward through repeated, contingent pairing. This conditioned, reward-predicting stimulus induces an internal motivational state by evoking an expectation of the reward, often on the basis of a corresponding hunger or thirst drive, and elicits the behavioral reaction. This scheme replicates basic notions of incentive motivation theory developed by Bindra (1968) and Bolles (1972). It applies to classical conditioning, where reward is automatically delivered after the conditioned stimulus, and to instrumental (operant) conditioning, where reward delivery requires a reaction by the subject to the conditioned stimulus. This scheme applies also to aversive conditioning which is not further elaborated for reasons of brevity.

Many so-called “unconditioned” food and liquid rewards are probably learned through experience, as every visitor to foreign countries can confirm. The primary reward then might consist of the taste experienced when the object activates the gustatory receptors, but that again may be learned. The ultimate rewarding effect of nutrient objects probably consists in their specific influences on basic biological variables, such as electrolyte, glucose, or amino acid concentrations in plasma and brain. These variables are defined by the vegetative needs of the organism and arise through evolution. Animals avoid nutrients that fail to influence important vegetative variables, for example foods lacking such essential amino acids as histidine (Rogers and Harper 1970), threonine (Hrupka et al. 1997; Wang et al. 1996), or methionine (Delaney and Gelperin 1986). A few primary rewards may be determined by innate instincts and support initial approach behavior and ingestion in early life, whereas the majority of rewards would be learned during the subsequent life experience of the subject. The physical appearance of rewards then could be used for predicting the much slower vegetative effects. This would dramatically accelerate the detection of rewards and constitute a major advantage for survival. Learning of rewards also allows subjects to use a much larger variety of nutrients as effective rewards and thus increase their chances of survival in zones of scarce resources.


Cell bodies of dopamine neurons are located mostly in midbrain groups A8 (dorsal to lateral substantia nigra), A9 (pars compacta of substantia nigra), and A10 (ventral tegmental area medial to substantia nigra). These neurons release the neurotransmitter dopamine with nerve impulses from axonal varicosities in the striatum (caudate nucleus, putamen, and ventral striatum including nucleus accumbens) and frontal cortex, to name the most important sites. We record the impulse activity from cell bodies of single dopamine neurons during periods of 20–60 min with moveable microelectrodes from extracellular positions while monkeys learn or perform behavioral tasks. The characteristic polyphasic, relatively long impulses discharged at low frequencies make dopamine neurons easily distinguishable from other midbrain neurons. The employed behavioral paradigms include reaction time tasks, direct and delayed go-no go tasks, spatial delayed response and alternation tasks, air puff and saline active avoidance tasks, operant and classically conditioned visual discrimination tasks, self-initiated movements, and unpredicted delivery of reward in the absence of any formal task. About 100–250 dopamine neurons are studied in each behavioral situation, and fractions of task-modulated neurons refer to these samples.

Initial recording studies searched for correlates of parkinsonian motor and cognitive deficits in dopamine neurons but failed to find clear covariations with arm and eye movements (DeLong et al. 1983; Schultz and Romo 1990; Schultz et al. 1983) or with mnemonic or spatial components of delayed response tasks (Schultz et al. 1993). By contrast, it was found that dopamine neurons were activated in a very distinctive manner by the rewarding characteristics of a wide range of somatosensory, visual, and auditory stimuli.

Activation by primary appetitive stimuli

About 75% of dopamine neurons show phasic activations when animals touch a small morsel of hidden food during exploratory movements in the absence of other phasic stimuli, without being activated by the movement itself (Romo and Schultz 1990). The remaining dopamine neurons do not respond to any of the tested environmental stimuli. Dopamine neurons also are activated by a drop of liquid delivered at the mouth outside of any behavioral task or while learning such different paradigms as visual or auditory reaction time tasks, spatial delayed response or alternation, and visual discrimination, often in the same animal (Fig. 2, top) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz et al. 1993). The reward responses occur independently of a learning context. Thus dopamine neurons do not appear to discriminate between different food objects and liquid rewards. However, their responses distinguish rewards from nonreward objects (Romo and Schultz 1990). Only 14% of dopamine neurons show phasic activations when primary aversive stimuli are presented, such as an air puff to the hand or hypertonic saline to the mouth, and most of the activated neurons also respond to rewards (Mirenowicz and Schultz 1996). Although nonnoxious, these stimuli are aversive in that they disrupt behavior and induce active avoidance reactions. However, dopamine neurons are not entirely insensitive to aversive stimuli, as shown by slow depressions or occasional slow activations after pain pinch stimuli in anesthetized monkeys (Schultz and Romo 1987) and by increased striatal dopamine release after electric shock and tail pinch in awake rats (Abercrombie et al. 1989; Doherty and Gratton 1992; Louilot et al. 1986; Young et al. 1993). This suggests that the phasic responses of dopamine neurons preferentially report environmental stimuli with primary appetitive value, whereas aversive events may be signaled with a considerably slower time course.

Fig. 2.

Dopamine neurons report rewards according to an error in reward prediction. Top: drop of liquid occurs although no reward is predicted at this time. Occurrence of reward thus constitutes a positive error in the prediction of reward. Dopamine neuron is activated by the unpredicted occurrence of the liquid. Middle: conditioned stimulus predicts a reward, and the reward occurs according to the prediction, hence no error in the prediction of reward. Dopamine neuron fails to be activated by the predicted reward (right). It also shows an activation after the reward-predicting stimulus, which occurs irrespective of an error in the prediction of the later reward (left). Bottom: conditioned stimulus predicts a reward, but the reward fails to occur because of lack of reaction by the animal. Activity of the dopamine neuron is depressed exactly at the time when the reward would have occurred. Note the depression occurring >1 s after the conditioned stimulus without any intervening stimuli, revealing an internal process of reward expectation. Neuronal activity in the 3 graphs follows the equation: dopamine response (Reward) = reward occurred − reward predicted. CS, conditioned stimulus; R, primary reward. Reprinted from Schultz et al. (1997) with permission by American Association for the Advancement of Science.

Unpredictability of reward

An important feature of dopamine responses is their dependency on event unpredictability. The activations following rewards do not occur when food and liquid rewards are preceded by phasic stimuli that have been conditioned to predict such rewards (Fig. 2, middle) (Ljungberg et al. 1992; Mirenowicz and Schultz 1994; Romo and Schultz 1990). One crucial difference between learning and fully acquired behavior is the degree of reward unpredictability. Dopamine neurons are activated by rewards during the learning phase but stop responding after full acquisition of visual and auditory reaction time tasks (Ljungberg et al. 1992; Mirenowicz and Schultz 1994), spatial delayed response tasks (Schultz et al. 1993), and simultaneous visual discriminations (Hollerman and Schultz 1996). The loss of response is not due to a developing general insensitivity to rewards, as activations following rewards delivered outside of tasks do not decrement during several months of experimentation (Mirenowicz and Schultz 1994). The importance of unpredictability includes the time of reward, as demonstrated by transient activations following rewards that are suddenly delivered earlier or later than predicted (Hollerman and Schultz 1996). Taken together, the occurrence of reward, including its time, must be unpredicted to activate dopamine neurons.

Depression by omission of predicted reward

Dopamine neurons are depressed exactly at the time of the usual occurrence of reward when a fully predicted reward fails to occur, even in the absence of an immediately preceding stimulus (Fig. 2, bottom). This is observed when animals fail to obtain reward because of erroneous behavior, when liquid flow is stopped by the experimenter despite correct behavior, or when a valve opens audibly without delivering liquid (Hollerman and Schultz 1996; Ljungberg et al. 1991; Schultz et al. 1993). When reward delivery is delayed for 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). Both responses occur only during a few repetitions until the new time of reward delivery becomes predicted again. By contrast, delivering reward earlier than habitual results in an activation at the new time of reward but fails to induce a depression at the habitual time. This suggests that unusually early reward delivery cancels the reward prediction for the habitual time. Thus dopamine neurons monitor both the occurrence and the time of reward. In the absence of stimuli immediately preceding the omitted reward, the depressions do not constitute a simple neuronal response but reflect an expectation process based on an internal clock tracking the precise time of predicted reward.
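The three situations described so far (unpredicted reward, fully predicted reward, omitted predicted reward) reduce to the single prediction-error rule given in the Fig. 2 caption. As a minimal illustration (not part of the original study; the reward values 0 and 1 are arbitrary placeholders for "reward occurred" and "reward predicted"), the rule can be written out directly:

```python
def dopamine_response(reward_occurred: float, reward_predicted: float) -> float:
    """Prediction-error rule from the Fig. 2 caption:
    dopamine response = reward occurred - reward predicted."""
    return reward_occurred - reward_predicted

# Unpredicted reward: positive error -> phasic activation (Fig. 2, top)
assert dopamine_response(1.0, 0.0) > 0

# Fully predicted reward: zero error -> no response (Fig. 2, middle)
assert dopamine_response(1.0, 1.0) == 0

# Predicted reward omitted: negative error -> depression exactly at the
# time the reward would have occurred (Fig. 2, bottom)
assert dopamine_response(0.0, 1.0) < 0
```

Positive errors correspond to phasic activations, zero errors to the absence of a response, and negative errors to the depression at the habitual time of an omitted reward.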

Activation by conditioned, reward-predicting stimuli

About 55–70% of dopamine neurons are activated by conditioned visual and auditory stimuli in the various classically or instrumentally conditioned tasks described earlier (Fig. 2, middle and bottom) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz 1986; Schultz and Romo 1990; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). The first dopamine responses to conditioned light were reported by Miller et al. (1981) in rats treated with haloperidol, which increased the incidence and spontaneous activity of dopamine neurons but resulted in more sustained responses than in undrugged animals. Although responses occur close to behavioral reactions (Nishino et al. 1987), they are unrelated to arm and eye movements themselves, as they occur also ipsilateral to the moving arm and in trials without arm or eye movements (Schultz and Romo 1990). Conditioned stimuli are somewhat less effective than primary rewards in terms of response magnitude and fractions of neurons activated. Dopamine neurons respond only to the onset of conditioned stimuli and not to their offset, even if stimulus offset predicts the reward (Schultz and Romo 1990). Dopamine neurons do not distinguish between visual and auditory modalities of conditioned appetitive stimuli. However, they discriminate between appetitive and neutral or aversive stimuli as long as they are physically sufficiently dissimilar (Ljungberg et al. 1992; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). Only 11% of dopamine neurons, most of them with appetitive responses, show the typical phasic activations also in response to conditioned aversive visual or auditory stimuli in active avoidance tasks in which animals release a key to avoid an air puff or a drop of hypertonic saline (Mirenowicz and Schultz 1996), although such avoidance may be viewed as “rewarding.” These few activations are not sufficiently strong to induce an average population response. 
Thus the phasic responses of dopamine neurons preferentially report environmental stimuli with appetitive motivational value but without discriminating between different sensory modalities.

Transfer of activation

During the course of learning, dopamine neurons become gradually activated by conditioned, reward-predicting stimuli and progressively lose their responses to primary food or liquid rewards that become predicted (Hollerman and Schultz 1996; Ljungberg et al. 1992; Mirenowicz and Schultz 1994) (Figs. 2 and 3). During a transient learning period, both rewards and conditioned stimuli elicit dopamine activations. This transfer from primary reward to the conditioned stimulus occurs instantaneously in single dopamine neurons tested in two well-learned tasks employing, respectively, unpredicted and predicted rewards (Romo and Schultz 1990).

Fig. 3.

Dopamine response transfer to earliest predictive stimulus. Responses to unpredicted primary reward transfer to progressively earlier reward-predicting stimuli. All displays show population histograms obtained by averaging normalized perievent time histograms of all dopamine neurons recorded in the behavioral situations indicated, independent of the presence or absence of a response. Top: outside of any behavioral task, there is no population response in 44 neurons tested with a small light (data from Ljungberg et al. 1992), but an average response occurs in 35 neurons to a drop of liquid delivered at a spout in front of the animal's mouth (Mirenowicz and Schultz 1994). Middle: response to a reward-predicting trigger stimulus in a 2-choice spatial reaching task, but absence of response to reward delivered during established task performance in the same 23 neurons (Schultz et al. 1993). Bottom: response to an instruction cue preceding the reward-predicting trigger stimulus by a fixed interval of 1 s in an instructed spatial reaching task (19 neurons) (Schultz et al. 1993). Time base is split because of varying intervals between conditioned stimuli and reward. Reprinted from Schultz et al. (1995b) with permission by MIT Press.

Unpredictability of conditioned stimuli

The activations after conditioned, reward-predicting stimuli do not occur when these stimuli themselves are preceded at a fixed interval by phasic conditioned stimuli in fully established behavioral situations. Thus with serial conditioned stimuli, dopamine neurons are activated by the earliest reward-predicting stimulus, whereas all stimuli and rewards following at predictable moments afterwards are ineffective (Fig. 3) (Schultz et al. 1993). Only randomly spaced sequential stimuli elicit individual responses. Also, extensive overtraining with highly stereotyped task performance attenuates the responses to conditioned stimuli, probably because stimuli become predicted by events in the preceding trial (Ljungberg et al. 1992). This suggests that stimulus unpredictability is a common requirement for all stimuli activating dopamine neurons.
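The migration of the response to the earliest unpredicted stimulus is the behavior a temporal-difference (TD) prediction error produces, in line with the reinforcement learning theories referred to in the abstract. The sketch below is a generic tabular TD(0) simulation, not the authors' model; the trial discretization, learning rate, and unit reward magnitude are illustrative assumptions. The pre-stimulus state's value is clamped at zero because nothing predicts the conditioned stimulus itself:

```python
import numpy as np

def simulate_td(n_trials=200, n_states=5, reward_state=4, alpha=0.2, gamma=1.0):
    """Tabular TD(0) across the time steps of a trial.
    delta(t) = r(t) + gamma * V(t+1) - V(t).
    State 0 is the pre-stimulus period, whose value stays 0 because the
    conditioned stimulus (state 1) arrives unpredictably; V[n_states] = 0
    represents the intertrial interval."""
    V = np.zeros(n_states + 1)
    history = []
    for _ in range(n_trials):
        deltas = np.zeros(n_states)
        for t in range(n_states):
            r = 1.0 if t == reward_state else 0.0
            deltas[t] = r + gamma * V[t + 1] - V[t]
            if t >= 1:  # pre-stimulus value is clamped at 0
                V[t] += alpha * deltas[t]
        history.append(deltas)
    return np.array(history)

errors = simulate_td()
first, last = errors[0], errors[-1]
# First trial: the only error is at the reward (nothing is predicted yet).
# Last trial: the error has migrated to CS onset (index 0); the reward
# itself (index 4) elicits essentially no error.
```

During intermediate trials both the conditioned stimulus and the reward elicit positive errors, matching the transient learning period described under Transfer of activation; setting r = 0 at the reward step after training yields a negative error there, corresponding to the depression at the habitual reward time.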

Depression by omission of predicted conditioned stimuli

Preliminary data from a previous experiment (Schultz et al. 1993) suggest that dopamine neurons also are depressed when a conditioned, reward-predicting stimulus is predicted itself at a fixed time by a preceding stimulus but fails to occur because of an error of the animal. As with primary rewards, the depressions occur at the time of the usual occurrence of the conditioned stimulus, without being directly elicited by a preceding stimulus. Thus the omission-induced depression may be generalized to all appetitive events.

Activation-depression with response generalization

Dopamine neurons also respond to stimuli that do not predict rewards but closely resemble reward-predicting stimuli occurring in the same context. These responses consist mostly of an activation followed by an immediate depression but may occasionally consist of pure activation or pure depression. The activations are smaller and less frequent than those following reward-predicting stimuli, and the depressions are observed in 30–60% of neurons. Dopamine neurons respond to visual stimuli that are not followed by reward but closely resemble reward-predicting stimuli, despite correct behavioral discrimination (Schultz and Romo 1990). Opening of an empty box fails to activate dopamine neurons but becomes effective in every trial as soon as the box occasionally contains food (Ljungberg et al. 1992; Schultz 1986; Schultz and Romo 1990) or when a neighboring, identical box that always contains food opens in random alternation (Schultz and Romo 1990). The empty box elicits weaker activations than the baited box. Animals perform indiscriminate ocular orienting reactions to each box but approach only the baited box with their hand. During learning, dopamine neurons continue to respond to previously conditioned stimuli that lose their reward prediction when reward contingencies change (Schultz et al. 1993) or respond to new stimuli resembling previously conditioned stimuli (Hollerman and Schultz 1996). Responses occur even to aversive stimuli presented in random alternation with physically similar, conditioned appetitive stimuli of the same sensory modality, the aversive response being weaker than the appetitive one (Mirenowicz and Schultz 1996). Responses generalize even to behaviorally extinguished appetitive stimuli. Apparently, neuronal responses generalize to nonappetitive stimuli because of their physical resemblance to appetitive stimuli.

Novelty responses

Novel stimuli elicit activations in dopamine neurons that often are followed by depressions and persist as long as behavioral orienting reactions occur (e.g., ocular saccades). Activations subside together with orienting reactions after several stimulus repetitions, depending on the physical impact of the stimuli. Whereas small light-emitting diodes hardly elicit novelty responses, light flashes and the rapid visual and auditory opening of a small box elicit activations that decay gradually to baseline over <100 trials (Ljungberg et al. 1992). Loud clicks or large pictures immediately in front of an animal elicit strong novelty responses that decay but still induce measurable activations after >1,000 trials (Hollerman and Schultz 1996; Horvitz et al. 1997; Steinfels et al. 1983). Figure 4 shows schematically the different response magnitudes with novel stimuli of different physical salience. Responses decay gradually with repeated exposure but may persist at reduced magnitudes with very salient stimuli. Response magnitudes increase again when the same stimuli are appetitively conditioned. By contrast, responses to novel, even large, stimuli subside rapidly when the stimuli are used for conditioning active avoidance behavior (Mirenowicz and Schultz 1996). Very few neurons (<5%) respond for more than a few trials to conspicuous yet physically weak stimuli, such as crumbling of paper or gross hand movements of the experimenter.

Fig. 4.

Time courses of activations of dopamine neurons to novel, alerting, and conditioned stimuli. Activations after novel stimuli decrease with repeated exposure over consecutive trials. Their magnitude depends on the physical salience of stimuli as stronger stimuli induce higher activations that occasionally exceed those after conditioned stimuli. Particularly salient stimuli continue to activate dopamine neurons with limited magnitude even after losing their novelty without being paired with primary rewards. Consistent activations appear again when stimuli become associated with primary rewards. This scheme was contributed by Jose Contreras-Vidal.

Homogeneous character of responses

The experiments performed so far have revealed that the majority of neurons in midbrain dopamine cell groups A8, A9, and A10 show very similar activations and depressions in a given behavioral situation, whereas the remaining dopamine neurons do not respond at all. There is a tendency, occasionally reaching statistical significance, for higher fractions of neurons to respond in more medial regions of the midbrain, such as the ventral tegmental area and medial substantia nigra, than in more lateral regions (Schultz 1986; Schultz et al. 1993). Response latencies (50–110 ms) and durations (<200 ms) are similar among primary rewards, conditioned stimuli, and novel stimuli. Thus the dopamine response constitutes a relatively homogeneous, scalar population signal. It is graded in magnitude by the responsiveness of individual neurons and by the fraction of responding neurons within the population.

Summary 1: adaptive responses during learning episodes

The characteristics of dopamine responses to reward-related stimuli are best illustrated in learning episodes, during which rewards are particularly important for acquiring behavioral responses. The dopamine reward signal undergoes systematic changes during the progress of learning and occurs at the earliest phasic reward-related stimulus, this being either a primary reward or a reward-predicting stimulus (Ljungberg et al. 1992; Mirenowicz and Schultz 1994). During learning, novel, intrinsically neutral stimuli transiently induce responses that soon weaken and disappear (Fig. 4). Primary rewards occur unpredictably during initial pairing with such stimuli and elicit neuronal activations. With repeated pairing, rewards become predicted by conditioned stimuli. Activations after the reward decrease gradually and are transferred to the conditioned, reward-predicting stimulus. If, however, a predicted reward fails to occur because of an error of the animal, dopamine neurons are depressed at the time the reward would have occurred. During repeated learning of tasks (Schultz et al. 1993) or task components (Hollerman and Schultz 1996), the earliest conditioned stimuli activate dopamine neurons during all learning phases because of generalization to previously learned, similar stimuli, whereas subsequent conditioned stimuli and primary rewards activate dopamine neurons only transiently, while they are uncertain and new contingencies are being established.
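The gradual transfer of the activation from the reward to the reward-predicting stimulus, and the depression at the time of an omitted reward, fall out of temporal-difference (TD) learning models of the kind invoked by reinforcement learning theories. The sketch below is purely illustrative: the trial length, event times, learning rate, and tapped-delay-line stimulus representation are our own arbitrary choices, not fitted to the recordings.

```python
import numpy as np

# Illustrative TD sketch of the response transfer described above;
# all parameter values here are arbitrary, not fitted to data.
T, t_cs, t_rew = 20, 5, 15      # timesteps per trial; CS onset; reward time
alpha, gamma = 0.2, 1.0         # learning rate; discount factor
w = np.zeros(T)                 # one weight per timestep since CS onset

def value(w, t):
    """Predicted future reward at timestep t (0 before the CS has appeared)."""
    return w[t - t_cs] if t >= t_cs else 0.0

def run_trial(w, rewarded=True, learn=True):
    """Run one trial and return the prediction-error trace delta(t)."""
    r = np.zeros(T)
    if rewarded:
        r[t_rew] = 1.0
    delta = np.zeros(T)
    for t in range(T - 1):
        delta[t] = r[t] + gamma * value(w, t + 1) - value(w, t)
        if learn and t >= t_cs:
            w[t - t_cs] += alpha * delta[t]   # update the active time marker
    return delta

for _ in range(200):            # training: the CS is always followed by reward
    run_trial(w)

d_trained = run_trial(w, learn=False)
d_omitted = run_trial(w, rewarded=False, learn=False)
print(round(d_trained[t_cs - 1], 2))  # positive error at the CS transition
print(round(d_trained[t_rew], 2))     # ~0: the reward is fully predicted
print(round(d_omitted[t_rew], 2))     # negative error at the omitted reward
```

After training, the positive prediction error has moved from the time of reward to the moment the conditioned stimulus appears, the fully predicted reward elicits no error, and reward omission produces a negative error at the expected reward time, mirroring the depression described above.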

Summary 2: effective stimuli for dopamine neurons

Dopamine responses are elicited by three categories of stimuli. The first category comprises primary rewards and stimuli that have become valid reward predictors through repeated and contingent pairing with rewards. These stimuli form a common class of explicit reward-predicting stimuli, as primary rewards themselves serve as predictors of vegetative rewarding effects. Effective stimuli apparently have an alerting component, as only stimuli with a clear onset are effective. Dopamine neurons show pure activations following explicit reward-predicting stimuli and are depressed when a predicted reward fails to occur (Fig. 5, top).

Fig. 5.

Schematic display of responses of dopamine neurons to 2 types of conditioned stimuli. Top: presentation of an explicit reward-predicting stimulus leads to activation after the stimulus, no response to the predicted reward, and depression when a predicted reward fails to occur. Bottom: presentation of a stimulus closely resembling a conditioned, reward-predicting stimulus leads to activation followed by depression, activation after the reward, and no response when no reward occurs. Activation after the stimulus probably reflects response generalization because of physical similarity. This stimulus does not explicitly predict a reward but is related to the reward via its similarity to the stimulus predicting the reward. In comparison with explicit reward-predicting stimuli, activations are lower and often are followed by depressions, thus discriminating between rewarded (CS+) and unrewarded (CS−) conditioned stimuli. This scheme summarizes results from previous and current experiments (Hollerman and Schultz 1996; Ljungberg et al. 1992; Mirenowicz and Schultz 1996; Schultz and Romo 1990; Schultz et al. 1993; P. Waelti and W. Schultz, unpublished results).

The second category comprises stimuli that elicit generalizing responses. These stimuli do not explicitly predict rewards but are effective because of their physical similarity to stimuli that have become explicit reward predictors through conditioning. These stimuli induce activations that are lower in magnitude and engage fewer neurons, as compared with explicit reward-predicting stimuli (Fig. 5, bottom). They are frequently followed by immediate depressions. Whereas the initial activation may constitute a generalized appetitive response that signals a possible reward, the subsequent depression may reflect the prediction of no reward in a general reward-predicting context and cancel the erroneous reward assumption. The lack of explicit reward prediction is suggested further by the presence of activation after primary reward and the absence of depression with no reward. Together with the responses to reward-predicting stimuli, it appears as if dopamine activations report an appetitive “tag” affixed to stimuli that are related to rewards.

The third category comprises novel or particularly salient stimuli that are not necessarily related to specific rewards. By eliciting behavioral orienting reactions, these stimuli are alerting and command attention. However, they also have motivating functions and can be rewarding (Fujita 1987). Novel stimuli are potentially appetitive. Novel or particularly salient stimuli induce activations that are frequently followed by depressions, similar to responses to generalizing stimuli.

Thus the phasic responses of dopamine neurons report events with positive and potentially positive motivating effects, such as primary rewards, reward-predicting stimuli, reward-resembling events, and alerting stimuli. However, they largely fail to detect events with negative motivating effects, such as aversive stimuli.

Summary 3: the dopamine reward prediction error signal

The dopamine responses to explicit reward-related events can be best conceptualized and understood in terms of formal theories of learning. Dopamine neurons report rewards relative to their prediction rather than signaling primary rewards unconditionally (Fig. 2). The dopamine response is positive (activation) when primary rewards occur without being predicted. The response is nil when rewards occur as predicted. The response is negative (depression) when predicted rewards are omitted. Thus dopamine neurons report primary rewards according to the difference between the occurrence and the prediction of reward, which can be termed an error in the prediction of reward (Schultz et al. 1995b, 1997) and is tentatively formalized as

DopamineResponse(Reward) = RewardOccurred − RewardPredicted     (Eq. 1)

This suggestion can be extended to conditioned appetitive events, which also are reported by dopamine neurons relative to prediction. Thus dopamine neurons may report an error in the prediction of all appetitive events, and Eq. 1 can be stated in the more general form

DopamineResponse(ApEvent) = ApEventOccurred − ApEventPredicted     (Eq. 2)

This generalization is compatible with the idea that most rewards actually are conditioned stimuli. With several consecutive, well-established reward-predicting events, only the first event is unpredictable and elicits the dopamine activation.
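Eqs. 1 and 2 can be read off directly in code. The units (1 = event occurred or fully predicted, 0 = absent or unpredicted) and the function name are illustrative choices of ours, not part of the original formulation.

```python
# Direct reading of Eqs. 1 and 2 (illustrative units and names).
def dopamine_response(occurred, predicted):
    return occurred - predicted       # occurrence minus prediction

print(dopamine_response(1.0, 0.0))    # unpredicted reward -> 1.0 (activation)
print(dopamine_response(1.0, 1.0))    # fully predicted reward -> 0.0 (no response)
print(dopamine_response(0.0, 1.0))    # omitted predicted reward -> -1.0 (depression)

# With several consecutive, well-established reward-predicting events,
# each event is fully predicted by its predecessor, so only the first
# (unpredicted) event yields a response (Eq. 2).
chain_predictions = [0.0, 1.0, 1.0, 1.0]
responses = [dopamine_response(1.0, p) for p in chain_predictions]
print(responses)                      # [1.0, 0.0, 0.0, 0.0]
```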


Origin of the dopamine response

Which anatomic inputs could be responsible for the selectivity and polysensory nature of dopamine responses? Which input activity could lead to the coding of prediction errors, induce the adaptive response transfer to the earliest unpredicted appetitive event and estimate the time of reward?


The GABAergic neurons in the striosomes (patches) of the striatum project in a broadly topographic and partly overlapping, interdigitating manner to dopamine neurons in nearly the entire pars compacta of substantia nigra, whereas neurons of the much larger striatal matrix contact predominantly the nondopamine neurons of pars reticulata of substantia nigra, besides their projection to globus pallidus (Gerfen 1984; Hedreen and DeLong 1991; Holstein et al. 1986; Jimenez-Castellanos and Graybiel 1989; Selemon and Goldman-Rakic 1990; Smith and Bolam 1991). Neurons in the ventral striatum project in a nontopographic manner to both pars compacta and pars reticulata of medial substantia nigra and to the ventral tegmental area (Berendse et al. 1992; Haber et al. 1990; Lynd-Balta and Haber 1994; Somogyi et al. 1981). The GABAergic striatonigral projection may exert two distinctively different influences on dopamine neurons, a direct inhibition and an indirect activation (Grace and Bunney 1985; Smith and Grace 1992; Tepper et al. 1995). The latter is mediated by striatal inhibition of pars reticulata neurons and subsequent GABAergic inhibition from local axon collaterals of pars reticulata output neurons onto dopamine neurons. This constitutes a double inhibitory link and results in net activation of dopamine neurons by the striatum. Thus striosomes and ventral striatum may monosynaptically inhibit and the matrix may indirectly activate dopamine neurons.

Dorsal and ventral striatal neurons show a number of activations that might contribute to dopamine reward responses, namely responses to primary rewards (Apicella et al. 1991a; Williams et al. 1993), responses to reward-predicting stimuli (Hollerman et al. 1994; Romo et al. 1992) and sustained activations during the expectation of reward-predicting stimuli and primary rewards (Apicella et al. 1992; Schultz et al. 1992). However, the positions of these neurons relative to striosomes and matrix are unknown, and striatal activations reflecting the time of expected reward have not yet been reported.

The polysensory reward responses might be the result of feature extraction in cortical association areas. Response latencies of 30–75 ms in primary and associative visual cortex (Maunsell and Gibson 1992; Miller et al. 1993) could combine with rapid conduction to striatum and double inhibition of substantia nigra to induce the short dopamine response latencies of <100 ms. Whereas reward-related activity has not been reported for posterior association cortex, neurons in dorsolateral and orbital prefrontal cortex respond to primary rewards and reward-predicting stimuli and show sustained activations during reward expectation (Rolls et al. 1996; Thorpe et al. 1983; Tremblay and Schultz 1995; Watanabe 1996). Some reward responses in frontal cortex depend on reward unpredictability (Matsumoto et al. 1995; L. Tremblay and W. Schultz, unpublished results) or reflect behavioral errors or omitted rewards (Niki and Watanabe 1979; Watanabe 1989). The cortical influence on dopamine neurons would be even faster through a direct projection, which originates from prefrontal cortex in rats (Gariano and Groves 1988; Sesack and Pickel 1992; Tong et al. 1996) but is weak in monkeys (Künzle 1978).


Short latencies of reward responses may be derived from adaptive, feature-processing mechanisms in the brain stem. Nucleus pedunculopontinus is an evolutionary precursor of substantia nigra. In nonmammalian vertebrates, it contains dopamine neurons and projects to the paleostriatum (Lohman and Van Woerden-Verkley 1978). In mammals, this nucleus sends strong excitatory, cholinergic, and glutamatergic influences to a high fraction of dopamine neurons with latencies of ∼7 ms (Bolam et al. 1991; Clarke et al. 1987; Futami et al. 1995; Scarnati et al. 1986). Activation of pedunculopontine-dopamine projections induces circling behavior (Niijima and Yoshida 1988), suggesting a functional influence on dopamine neurons.


A massive, probably excitatory input to dopamine neurons arises from different nuclei of the amygdala (Gonzalez and Chesselet 1990; Price and Amaral 1981). Amygdala neurons respond to primary rewards and reward-predicting visual and auditory stimuli. The neuronal responses known so far are independent of stimulus unpredictability and do not discriminate well between appetitive and aversive events (Nakamura et al. 1992; Nishijo et al. 1988). Most responses show latencies of 140–310 ms, which are longer than in dopamine neurons, although a few responses occur at latencies of 60–100 ms.


The monosynaptic projection from dorsal raphé (Corvaja et al. 1993; Nedergaard et al. 1988) has a depressant influence on dopamine neurons (Fibiger et al. 1977; Trent and Tepper 1991). Raphé neurons show short-latency activations after high-intensity environmental stimuli (Heym et al. 1982), allowing them to contribute to dopamine responses after novel or particularly salient stimuli.


A few well-known input structures are the most likely candidates for mediating the dopamine responses, although additional inputs also may exist. Activations of dopamine neurons by primary rewards and reward-predicting stimuli could be mediated by double inhibitory, net activating input from the striatal matrix (for a simplified diagram, see Fig. 6). Activations also could arise from pedunculopontine nucleus or possibly from reward expectation-related activity in neurons of the subthalamic nucleus projecting to dopamine neurons (Hammond et al. 1983; Matsumura et al. 1992; Smith et al. 1990). The absence of activation with fully predicted rewards could be the result of monosynaptic inhibition from striosomes, canceling out the simultaneously activating matrix input. Depressions at the time of omitted reward could be mediated by inhibitory inputs from neurons in striatal striosomes (Houk et al. 1995) or globus pallidus (Haber et al. 1993; Hattori et al. 1975; Y. Smith and Bolam 1990, 1991). Convergence between different inputs before or at the level of dopamine neurons could result in the rather complex coding of reward prediction errors and the adaptive response transfer from primary rewards to reward-predicting stimuli.

Fig. 6.

Simplified diagram of inputs to midbrain dopamine neurons potentially mediating dopamine responses. Only inputs from caudate to substantia nigra (SN) pars compacta and reticulata are shown for reasons of simplicity. Activations may arise by a double inhibitory, net activating influence from GABAergic matrix neurons in caudate and putamen via GABAergic neurons of SN pars reticulata to dopamine neurons of SN pars compacta. Activations also may be mediated by excitatory cholinergic or amino acid-containing projections from nucleus pedunculopontinus. Depressions could be due to monosynaptic GABAergic projections from striosomes (patches) in caudate and putamen to dopamine neurons. Similar projections exist from ventral striatum to dopamine neurons in medial SN pars compacta and group A10 in the ventral tegmental area and from dorsal striatum to group A8 dopamine neurons dorsolateral to SN (Lynd-Balta and Haber 1994). Heavy circle represents dopamine neurons. These projections represent the most likely inputs underlying the dopamine responses, without ruling out inputs from globus pallidus and subthalamic nucleus.

Phasic dopamine influences on target structures


Divergent projections. There are ∼8,000 dopamine neurons in each substantia nigra of rats (Oorschot 1996) and 80,000–116,000 in macaque monkeys (German et al. 1988; Percheron et al. 1989). Each striatum contains ∼2.8 million neurons in rats and 31 million in macaques, resulting in a nigrostriatal divergence factor of 300–400. Each dopamine axon ramifies abundantly in a limited terminal area in striatum and has ∼500,000 striatal varicosities from which dopamine is released (Andén et al. 1966). This results in dopamine input to nearly every striatal neuron (Groves et al. 1995) and a moderately topographic nigrostriatal projection (Lynd-Balta and Haber 1994). The cortical dopamine innervation in monkeys is highest in areas 4 and 6, is still sizeable in frontal, parietal, and temporal lobes, and is lowest in the occipital lobe (Berger et al. 1988; Williams and Goldman-Rakic 1993). Cortical dopamine synapses are predominantly found in layers I and V–VI, contacting a large proportion of cortical neurons there. Together with the rather homogeneous response nature, these data suggest that the dopamine response advances as a simultaneous, parallel wave of activity from the midbrain to striatum and frontal cortex (Fig. 7).
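The divergence factor quoted above follows directly from these cell counts; as a quick arithmetic check (numbers taken from the text):

```python
# Arithmetic behind the nigrostriatal divergence factor, using the
# cell counts given in the text.
rat = 2_800_000 / 8_000                    # striatal neurons per dopamine neuron
monkey = [31_000_000 / n for n in (80_000, 116_000)]
print(round(rat))                          # 350
print([round(m) for m in monkey])          # [388, 267]
```

The rat ratio is 350, and the monkey ratios fall between roughly 270 and 390, bracketing the divergence factor of 300–400 quoted above.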

Fig. 7.

Global dopamine signal advancing to striatum and cortex. Relatively homogeneous population response of the majority of dopamine neurons to appetitive and alerting stimuli and its progression from substantia nigra to postsynaptic structures can be viewed schematically as a wave of synchronous, parallel activity advancing at a velocity of 1–2 m/s (Schultz and Romo 1987) along the diverging projections from the midbrain to striatum (caudate and putamen) and cortex. Responses are qualitatively indistinguishable between neurons of substantia nigra (SN) pars compacta and ventral tegmental area (VTA). Dopamine innervation of all neurons in striatum and many neurons in frontal cortex would allow the dopamine reinforcement signal to exert a rather global effect. Wave has been compressed to emphasize the parallel nature.

Dopamine release. Impulses of dopamine neurons at intervals of 20–100 ms lead to a much higher dopamine concentration in striatum than the same number of impulses at intervals of 200 ms (Garris and Wightman 1994; Gonon 1988). This nonlinearity is mainly due to the rapid saturation of the dopamine reuptake transporter, which clears the released dopamine from the extrasynaptic region (Chergui et al. 1994). The same effect is observed in nucleus accumbens (Wightman and Zimmerman 1990) and occurs even with longer impulse intervals because of sparser reuptake sites (Garris et al. 1994b; Marshall et al. 1990; Stamford et al. 1988). Dopamine release after an impulse burst of <300 ms is too short for activating the autoreceptor-mediated reduction of release (Chergui et al. 1994) or the even slower enzymatic degradation (Michael et al. 1985). Thus a bursting dopamine response is particularly efficient for releasing dopamine.

Estimates based on in vivo voltammetry suggest that a single impulse releases ∼1,000 dopamine molecules at synapses in striatum and nucleus accumbens. This leads to immediate synaptic dopamine concentrations of 0.5–3.0 μM (Garris et al. 1994a; Kawagoe et al. 1992). At 40 μs after release onset, >90% of dopamine has left the synapse, some of the rest being later eliminated by synaptic reuptake (half onset time of 30–37 ms). At 3–9 ms after release onset, dopamine concentrations reach a peak of ∼250 nM when all neighboring varicosities simultaneously release dopamine. Concentrations are homogeneous within a sphere of 4 μm diam (Gonon 1997), which is the average distance between varicosities (Doucet et al. 1986; Groves et al. 1995). Maximal diffusion is restricted to 12 μm by the reuptake transporter and is reached in 75 ms after release onset (half transporter onset time of 30–37 ms). Concentrations would be slightly lower and less homogeneous in regions with fewer varicosities or when <100% of dopamine neurons are activated, but they are two to three times higher with impulse bursts. Thus the reward-induced, mildly synchronous, bursting activations in ∼75% of dopamine neurons may lead to rather homogeneous concentration peaks in the order of 150–400 nM. Total increases of extracellular dopamine last 200 ms after a single impulse and 500–600 ms after multiple impulses at 20–100 ms intervals applied during 100–200 ms (Chergui et al. 1994; Dugast et al. 1994). The extrasynaptic reuptake transporter (Nirenberg et al. 1996) subsequently brings dopamine concentrations back to their baseline of 5–10 nM (Herrera-Marschitz et al. 1996). Thus in contrast to classic, strictly synaptic neurotransmission, synaptically released dopamine diffuses rapidly into the immediate juxtasynaptic area and reaches short peaks of regionally homogeneous extracellular concentrations.

Receptors. Of the two principal types of dopamine receptors, the adenylate cyclase-activating D1 type constitutes ∼80% of dopamine receptors in striatum. Of these, 80% are in the low-affinity state of 2–4 μM and 20% in the high-affinity state of 9–74 nM (Richfield et al. 1989). The remaining 20% of striatal dopamine receptors belong to the adenylate cyclase-inhibiting D2 type, of which 10–20% are in the low-affinity state and 80–90% in the high-affinity state, with affinities similar to those of D1 receptors. Thus D1 receptors overall have an ∼100 times lower affinity than D2 receptors. Striatal D1 receptors are located predominantly on neurons projecting to internal pallidum and substantia nigra pars reticulata, whereas striatal D2 receptors are located mostly on neurons projecting to external pallidum (Bergson et al. 1995; Gerfen et al. 1990; Hersch et al. 1995; Levey et al. 1993). However, the differences in receptor sensitivity may not play a role beyond signal transduction, which would reduce the differences in dopamine sensitivity between the two types of striatal output neurons.

Dopamine is released from synaptic varicosities (30–40% of release) and from extrasynaptic varicosities (60–70%) (Descarries et al. 1996). Synaptically released dopamine acts on postsynaptic dopamine receptors at four anatomically distinct sites in the striatum, namely inside dopamine synapses, immediately adjacent to dopamine synapses, inside corticostriatal glutamate synapses, and at extrasynaptic sites remote from release sites (Fig. 8) (Levey et al. 1993; Sesack et al. 1994; Yung et al. 1995). D1 receptors are localized mainly outside of dopamine synapses (Caillé et al. 1996). The high transient concentrations of dopamine after phasic impulse bursts would activate D1 receptors in the immediate vicinity of the active release sites and would activate and even saturate D2 receptors everywhere. D2 receptors would remain partly activated when the ambient dopamine concentration returns to baseline after phasic increases.

Fig. 8.

Influences of dopamine release on typical medium spiny neurons in the dorsal and ventral striatum. Dopamine released by impulses from synaptic varicosities activates a few synaptic receptors (probably of the D2 type in the low-affinity state) and diffuses rapidly out of the synapse to reach low-affinity D1 type receptors (D1?) that are located nearby, within corticostriatal synapses, or at a limited distance. Phasically increased dopamine activates nearby high-affinity D2 type receptors to saturation (D2?). D2 receptors remain partly activated by the ambient dopamine concentrations after the phasically increased release. Extrasynaptically released dopamine may become diluted by diffusion and activate high-affinity D2 receptors. It should be noted that, at variance with this schematic diagram, most D1 and D2 receptors are located on different neurons. Glutamate released from corticostriatal terminals reaches postsynaptic receptors located on the same dendritic spines as dopamine varicosities. Glutamate also reaches presynaptic dopamine varicosities, where it controls dopamine release. Dopamine influences on spiny neurons in frontal cortex are comparable in many respects.

Summary. The observed moderately bursting, short-duration, nearly synchronous response of the majority of dopamine neurons leads to optimal, simultaneous dopamine release from the majority of closely spaced striatal varicosities. The neuronal response induces a short puff of dopamine that is released from extrasynaptic sites or diffuses rapidly from synapses into the juxtasynaptic area. Dopamine quickly reaches regionally homogeneous concentrations likely to influence the dendrites of probably all striatal and many cortical neurons. In this way, the reward message in 60–80% of dopamine neurons is broadcast as a divergent, rather global reinforcement signal to the striatum, nucleus accumbens, and frontal cortex, assuring a phasic influence on a maximum number of synapses involved in the processing of stimuli and actions leading to reward (Fig. 7). Dopamine released by neuronal activations after rewards and reward-predicting stimuli would affect juxtasynaptic D1 receptors on striatal neurons projecting to internal pallidum and substantia nigra pars reticulata and all D2 receptors on neurons projecting to external pallidum. The reduction of dopamine release induced by depressions with omitted rewards and reward-predicting stimuli would reduce the tonic stimulation of D2 receptors by ambient dopamine. Thus positive reward prediction errors would influence all types of striatal output neurons, whereas the negative prediction error might predominantly influence neurons projecting to external pallidum.

Potential cocaine mechanisms. Blockade of the dopamine reuptake transporter by drugs like cocaine or amphetamine enhances and prolongs phasic increases in dopamine concentrations (Church et al. 1987a; Giros et al. 1996; Suaud-Chagny et al. 1995). The enhancement would be particularly pronounced when rapid, burst-induced increases in dopamine concentration reach a peak before feedback regulation becomes effective. This mechanism would lead to a massively enhanced dopamine signal after primary rewards and reward-predicting stimuli. It also would increase the somewhat weaker dopamine signal after stimuli resembling rewards, novel stimuli, and particularly salient stimuli that might be frequent in everyday life. The enhancement by cocaine would let these nonrewarding stimuli appear as strong or even stronger than natural rewards without cocaine. Postsynaptic neurons could misinterpret such a signal as a particularly prominent reward-related event and undergo long-term changes in synaptic transmission.


Dopamine actions on striatal neurons depend on the type of receptor activated, are related to the depolarized versus hyperpolarized states of membrane potentials and often involve glutamate receptors. Activation of D1 dopamine receptors enhances the excitation evoked by activation of N-methyl-d-aspartate (NMDA) receptors after cortical inputs via L-type Ca2+ channels when the membrane potential is in the depolarized state (Cepeda et al. 1993, 1998; Hernandez-Lopez et al. 1997; Kawaguchi et al. 1989). By contrast, D1 activation appears to reduce evoked excitations when the membrane potential is in the hyperpolarized state (Hernandez-Lopez et al. 1997). In vivo dopamine iontophoresis and axonal stimulation induce D1-mediated excitations lasting 100–500 ms beyond dopamine release (Gonon 1997; Williams and Millar 1991). Activation of D2 dopamine receptors reduces Na+ and N-type Ca2+ currents and attenuates excitations evoked by activation of NMDA or α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptors at any membrane state (Cepeda et al. 1995; Yan et al. 1997). At the systems level, dopamine exerts a focusing effect whereby only the strongest inputs pass through striatum to external and internal pallidum, whereas weaker activity is lost (Brown and Arbuthnott 1983; Filion et al. 1988; Toan and Schultz 1985; Yim and Mogenson 1982). Thus the dopamine released by the dopamine response may lead to an immediate overall reduction in striatal activity, although a facilitatory effect on cortically evoked excitations may be mediated via D1 receptors. The following discussion will show that the effects of dopamine neurotransmission may not be limited to changes in membrane polarization.


Tetanic electrical stimulation of cortical or limbic inputs to striatum and nucleus accumbens induces posttetanic depressions lasting several tens of minutes in slices (Calabresi et al. 1992a; Lovinger et al. 1993; Pennartz et al. 1993; Walsh 1993; Wickens et al. 1996). This manipulation also enhances the excitability of corticostriatal terminals (Garcia-Munoz et al. 1992). Posttetanic potentiation of similar durations is observed in striatum and nucleus accumbens when postsynaptic depolarization is facilitated by removal of magnesium or application of γ-aminobutyric acid (GABA) antagonists (Boeijinga et al. 1993; Calabresi et al. 1992b; Pennartz et al. 1993). D1 or D2 dopamine receptor antagonists or D2 receptor knockout abolish posttetanic corticostriatal depression (Calabresi et al. 1992a; Calabresi et al. 1997; Garcia-Munoz et al. 1992) but do not affect potentiation in nucleus accumbens (Pennartz et al. 1993). Application of dopamine restores striatal posttetanic depression in slices from dopamine-lesioned rats (Calabresi et al. 1992a) but fails to modify posttetanic potentiation (Pennartz et al. 1993). Short pulses of dopamine (5–20 ms) induce long-term potentiation in striatal slices when applied simultaneously with tetanic corticostriatal stimulation and postsynaptic depolarization, complying with a three-factor reinforcement learning rule (Wickens et al. 1996).

Further evidence for dopamine-related synaptic plasticity is found in other brain structures or with different methods. In the hippocampus, posttetanic potentiation is increased by bath application of D1 agonists (Otmakhova and Lisman 1996) and impaired by D1 and D2 receptor blockade (Frey et al. 1990). Burst contingent but not burst noncontingent local applications of dopamine and dopamine agonists increase neuronal bursting in hippocampal slices (Stein et al. 1994). In fish retina, activation of D2 dopamine receptors induces movements of photoreceptors in or out of the pigment epithelium (Rogawski 1987). Posttrial injections of amphetamine and dopamine agonists into rat caudate nucleus improve performance in memory tasks (Packard and White 1991). Dopamine denervations in the striatum reduce the number of dendritic spines (Arbuthnott and Ingham 1993; Anglade et al. 1996; Ingham et al. 1993), suggesting that the dopamine innervation has persistent effects on corticostriatal synapses.


An estimated 10,000 cortical terminals and 1,000 dopamine varicosities contact the dendritic spines of each striatal neuron (Doucet et al. 1986; Groves et al. 1995; Wilson 1995). The dense dopamine innervation becomes visible as baskets outlining individual perikarya in pigeon paleostriatum (Wynne and Güntürkün 1995). Dopamine varicosities form synapses on the same dendritic spines of striatal neurons that are contacted by cortical glutamate afferents (Fig. 8) (Bouyer et al. 1984; Freund et al. 1984; Pickel et al. 1981; Smith et al. 1994), and some dopamine receptors are located inside corticostriatal synapses (Levey et al. 1993; Yung et al. 1995). The high number of cortical inputs to striatal neurons, the convergence between dopamine and glutamate inputs at the spines of striatal neurons, and the largely homogeneous dopamine signal reaching probably all striatal neurons are ideal substrates for dopamine-dependent synaptic changes at the spines of striatal neurons. This also may hold for the cortex where dendritic spines are contacted by synaptic inputs from both dopamine and cortical neurons (Goldman-Rakic et al. 1989), although dopamine probably does not influence every cortical neuron.

The basal ganglia are connected by open and closed loops with the cortex and with subcortical limbic structures. The striatum receives to varying degrees inputs from all cortical areas. Basal ganglia outputs are directed predominantly toward frontal cortical areas but also reach the temporal lobe (Middleton and Strick 1996). Many inputs from functionally heterogeneous cortical areas to the striatum are organized in segregated, parallel channels, as are the outputs from internal pallidum directed to different motor cortical areas (Alexander et al. 1986; Hoover and Strick 1993). However, afferents from functionally related but anatomically different cortical areas may converge on striatal neurons. For example, projections from somatotopically related areas of primary somatosensory and motor cortex project to common striatal regions (Flaherty and Graybiel 1993, 1994). Corticostriatal projections diverge into separate striatal “matrisomes” and reconverge in the pallidum, thus increasing the synaptic “surface” for modulatory interactions and associations (Graybiel et al. 1994). This anatomic arrangement would allow the dopamine signal to determine the efficacy of highly structured, task-specific cortical inputs to striatal neurons and exert a widespread influence on forebrain centers involved in the control of behavioral action.


Dopamine neurons appear to report appetitive events according to a prediction error (Eqs. 1 and 2). Current learning theories and neuronal models demonstrate the crucial importance of prediction errors for learning.

Learning theories


Behavioral learning theories formalize the acquisition of associations between arbitrary stimuli and primary motivating events in classical conditioning paradigms. Stimuli gain associative strength over consecutive trials by being repeatedly paired with a primary motivating event

ΔV = αβ(λ − V)     (Eq. 3)

where V is the current associative strength of the stimulus, λ is the maximum associative strength possibly sustained by the primary motivating event, and α and β are constants reflecting the salience of conditioned and unconditioned stimuli, respectively (Dickinson 1980; Mackintosh 1975; Pearce and Hall 1980; Rescorla and Wagner 1972). The (λ − V) term indicates the degree to which the primary motivating event occurs unpredictably and represents an error in the prediction of reinforcement. It determines the rate of learning, as associative strength increases when the error term is positive and the conditioned stimulus does not fully predict the reinforcement. When V = λ, the conditioned stimulus fully predicts the reinforcer, and V will not increase further. Thus learning occurs only when the primary motivating event is not fully predicted by a conditioned stimulus. This interpretation is suggested by the blocking phenomenon, according to which a stimulus fails to gain associative strength when presented together with another stimulus that by itself fully predicts the reinforcer (Kamin 1969). The (λ − V) error term becomes negative when a predicted reinforcer fails to occur, leading to a loss of associative strength of the conditioned stimulus (extinction). Note that these models use the term "reinforcement" in the broad sense of increasing the frequency and intensity of specific behavior and do not refer to any particular type of learning.
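Eq. 3 can be simulated directly. Reproducing blocking (Kamin 1969) requires the standard compound-stimulus form of the Rescorla-Wagner rule, in which the error term uses the summed associative strength of all stimuli present on a trial; the parameter values and stimulus names below are arbitrary illustrations.

```python
# Rescorla-Wagner rule (Eq. 3) in its compound-stimulus form, where the
# error term uses the summed strength of all stimuli present on a trial.
# Parameter values and stimulus names are arbitrary illustrations.
alpha_beta = 0.3    # combined salience constants (alpha * beta)
lam = 1.0           # lambda: maximum strength sustained by the reinforcer

def train(V, stimuli_present, trials):
    for _ in range(trials):
        error = lam - sum(V[s] for s in stimuli_present)   # (lambda - V)
        for s in stimuli_present:
            V[s] += alpha_beta * error    # Delta-V = alpha*beta*(lambda - V)
    return V

V = {"A": 0.0, "B": 0.0}
train(V, ["A"], 50)             # phase 1: stimulus A alone paired with reward
train(V, ["A", "B"], 50)        # phase 2: compound AB paired with reward
print(round(V["A"], 2))         # 1.0 -- A fully predicts the reinforcer
print(round(V["B"], 2))         # 0.0 -- B is blocked: the error was already ~0
```

Because A alone already drives the error term to ~0, the added stimulus B gains essentially no associative strength, which is the blocking result described above.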


The Rescorla-Wagner model relates to the general principle of learning driven by errors between the desired and the actual output, such as the least mean square error procedure (Kalman 1960; Widrow and Stearns 1985). This principle has been applied to neuronal network models in the Delta rule, according to which synaptic weights (ω) are adjusted by

Δω = η(t − a)x  (Equation 4)

where t is the desired (target) output of the network, a is the actual output, and η and x are the learning rate and input activation, respectively (Rumelhart et al. 1986; Widrow and Hoff 1960). The desired output (t) is analogous to the outcome (λ), the actual output (a) is analogous to the prediction modified during learning (V), and the delta error term (δ = t − a) is equivalent to the reinforcement error term (λ − V) of the Rescorla-Wagner rule (Eq. 3) (Sutton and Barto 1981).
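As a minimal sketch of the Delta rule (Eq. 4), a single linear unit can be driven toward a target mapping; the two-input task, the target weights, and the learning rate η are illustrative assumptions.

```python
# Minimal sketch of the Delta rule (Eq. 4) for one linear unit.
# The two-input task, target weights, and learning rate are assumptions.

import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)                    # synaptic weights omega
eta = 0.1                          # learning rate
target_w = np.array([0.5, -0.3])   # defines the desired output t

for _ in range(2000):
    x = rng.uniform(-1, 1, size=2)   # input activation
    t = target_w @ x                 # desired (target) output
    a = w @ x                        # actual output
    w += eta * (t - a) * x           # Delta rule: d_omega = eta*(t - a)*x

print(np.round(w, 2))  # weights converge toward [0.5, -0.3]
```

Like the Rescorla-Wagner rule, the update vanishes once the actual output matches the desired output, so learning stops when the error is zero.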

The general dependence on outcome unpredictability relates intuitively to the very essence of learning. If learning involves the acquisition or change of predictions of outcome, no change in predictions and hence no learning will occur when the outcome is perfectly well predicted. This restricts learning to stimuli and behavioral reactions that lead to surprising or altered outcomes, and redundant stimuli preceding outcomes already predicted by other events are not learned. Besides their role in bringing about learning, reinforcers have a second, distinctively different function. When learning is completed, fully predicted reinforcers are crucial for maintaining learned behavior and preventing extinction.

Many forms of learning may involve the reduction of prediction errors. In a general sense, these systems process an external event, generate predictions of this event, compute the error between the event and its prediction, and modify both performance and prediction according to the prediction error. This may not be limited to learning systems dealing with biological reinforcers but may concern a much larger variety of neural operations, such as visual recognition in cerebral cortex (Rao and Ballard 1997).

Reinforcement algorithms


Neuronal network models can be trained with straightforward reinforcement signals that emit a prediction-independent signal when a behavioral reaction is correctly executed but no signal with an erroneous reaction. Learning in these largely instrumental learning models consists in changing the synaptic weights (ω) of model neurons according to

Δω = ɛrxy  (Equation 5)

where ɛ is the learning rate, r is reinforcement, and x and y are the activations of pre- and postsynaptic neurons, respectively, assuring that only synapses participating in the reinforced behavior are modified. A popular example is the associative reward-penalty model (Barto and Anandan 1985). These models acquire skeletal or oculomotor responses, learn sequences, and perform the Wisconsin Card Sorting Test (Arbib and Dominey 1995; Dehaene and Changeux 1991; Dominey et al. 1995; Fagg and Arbib 1992). Processing units in these models acquire properties similar to those of neurons in parietal association cortex (Mazzoni et al. 1991).
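The three-term rule of Eq. 5 can be sketched on a toy two-choice task in which reward arrives only after the correct response. The task, network size, noise level, and learning rate are illustrative assumptions, not from the article.

```python
# Sketch of the reinforcement rule of Eq. 5 (d_omega = eps*r*x*y) on a toy
# two-stimulus, two-response task; all parameters are assumptions.

import numpy as np

rng = np.random.default_rng(1)
w = np.zeros((2, 2))       # weights from 2 input units to 2 output units
eps = 0.2

for _ in range(500):
    s = rng.integers(2)                # which stimulus is present
    x = np.eye(2)[s]                   # presynaptic activation (one-hot)
    # noisy winner-take-all choice between the 2 output units
    y = np.zeros(2)
    y[np.argmax(w @ x + rng.normal(0, 0.1, 2))] = 1.0
    r = 1.0 if y[s] == 1.0 else 0.0    # reward only for the correct choice
    w += eps * r * np.outer(y, x)      # Eq. 5: only active synapses change

# after training, each stimulus drives its rewarded response most strongly
print(np.argmax(w[:, 0]), np.argmax(w[:, 1]))
```

Note the property discussed in the next paragraph: because r is prediction-independent and never negative here, weights keep growing after learning and cannot decrease when reward is omitted.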

However, the persistence of the teaching signal after learning requires additional algorithms to prevent runaway synaptic strengths (Montague and Sejnowski 1994) and to avoid acquisition of redundant stimuli presented together with reinforcer-predicting stimuli. Previously learned behavior perseveres when contingencies change, as omitted reinforcement fails to induce a negative signal. Learning speed may be increased by adding external information from a teacher (Ballard 1997) and by incorporating information about past performance (McCallum 1995).


In a particularly efficient class of reinforcement algorithms (Sutton 1988; Sutton and Barto 1981), synaptic weights are modified according to the error in reinforcement prediction computed over consecutive time steps (t) in each trial

r̂(t) = r(t) + P(t) − P(t − 1)  (Equation 6)

where r is reinforcement and P is reinforcement prediction. P(t) is usually multiplied by a discount factor γ with 0 ≤ γ < 1 to account for the decreasing influence of increasingly remote rewards. For reasons of simplicity, γ is set to 1 here. In the case of a single stimulus predicting a single reinforcer, the prediction P(t − 1) exists before the time t of reinforcement but terminates at the time of reinforcement [P(t) = 0]. This leads to an effective reinforcement signal at the time (t) of reinforcement

r̂(t) = r(t) − P(t − 1)  (Equation 6a)

The r̂(t) term indicates the difference between actual and predicted reinforcement. During learning, reinforcement is incompletely predicted, the error term is positive when reinforcement occurs, and synaptic weights are increased. After learning, reinforcement is fully predicted by a preceding stimulus [P(t − 1) = r(t)], the error term is nil on correct behavior, and synaptic weights remain unchanged. When reinforcement is omitted because of inadequate performance or changed contingencies, the error is negative and synaptic weights are reduced. The r̂(t) term is analogous to the (λ − V) error term of the Rescorla-Wagner model (Eq. 3). However, it concerns individual time steps (t) within each trial rather than predictions evolving over consecutive trials. These temporal models of reinforcement capitalize on the fact that the acquired predictions include the exact time of reinforcement (Dickinson et al. 1976; Gallistel 1990; Smith 1968).

The temporal difference (TD) algorithms also employ acquired predictions for changing synaptic weights. In the case of an unpredicted, single conditioned stimulus predicting a single reinforcer, the prediction P(t) begins at time (t), there is no preceding prediction [P(t − 1) = 0], and reinforcement has not yet occurred [r(t) = 0]. According to Eq. 6, the model emits a purely predictive effective reinforcement signal at the time (t) of the prediction

r̂(t) = P(t)  (Equation 6b)

In the case of multiple, consecutive predictive stimuli, again with reinforcement absent at the time of predictions, the effective reinforcement signal at the time (t) of the prediction reflects the difference between the current prediction P(t) and the preceding prediction P(t − 1)

r̂(t) = P(t) − P(t − 1)  (Equation 6c)

This constitutes an error term of higher order reinforcement. Similar to fully predicted reinforcers, all predictive stimuli that are fully predicted themselves are cancelled out [P(t − 1) = P(t)], resulting in r̂ = 0 at the times (t) of these stimuli. Only the earliest predictive stimulus contributes to the effective reinforcement signal, as this stimulus P(t) is not predicted by another stimulus [P(t − 1) = 0]. This results in the same r̂ = P(t) at the time (t) of the first prediction as in the case of a single prediction (Eq. 6b).
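The cases of Eqs. 6–6c can be checked numerically on a single trial timeline. In this sketch γ = 1 as in the text; the trial length and the CS and reward times are illustrative assumptions.

```python
# Numerical sketch of Eqs. 6-6c with gamma = 1, as in the text.
# The 8-step trial, CS time, and reward time are illustrative assumptions.

T = 8
reward_time, cs_time = 6, 2

r = [0.0] * T
r[reward_time] = 1.0

def rhat(r, P):
    """Effective reinforcement signal r^(t) = r(t) + P(t) - P(t-1) (Eq. 6)."""
    return [r[t] + P[t] - (P[t - 1] if t > 0 else 0.0) for t in range(len(r))]

# Before learning: no prediction, so the error appears at the reward (Eq. 6a)
P_naive = [0.0] * T
print(rhat(r, P_naive))    # spike of +1 at t = 6

# After learning: the CS fully predicts the reward; the prediction spans
# cs_time..reward_time-1 and terminates at the reward [P(reward_time) = 0]
P_trained = [1.0 if cs_time <= t < reward_time else 0.0 for t in range(T)]
print(rhat(r, P_trained))  # spike of +1 moved to the CS (Eq. 6b), 0 at reward

# Omitted reward after learning: negative error at the expected reward time
print(rhat([0.0] * T, P_trained))
```

The three printed traces correspond to the three dopamine response patterns described earlier in the article: activation by unpredicted reward, activation at the predictive stimulus with no response to the predicted reward, and depression when a predicted reward is omitted.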

Fig. 9.

Basic architectures of neural network models implementing temporal difference algorithms in comparison with basal ganglia connectivity. A: in the original implementation the effective teaching signal y − ȳ is computed in model neuron A and sent to presynaptic terminals of inputs x to neuron B, thus influencing x–B processing and changing synaptic weights at the x–B synapse. Neuron B influences behavioral output via axon y and at the same time contributes to the adaptive properties of neuron A, namely its response to reinforcer-predicting stimuli. More recent implementations of this simple architecture use neuron A rather than neuron B for emitting an output O of the model (Montague et al. 1996; Schultz et al. 1997). Reprinted from Sutton and Barto (1981) with permission by American Psychological Association. B: a recent implementation separates the teaching component A, called the critic (right), from an output component comprised of several processing units B, termed the actor (left). The effective reinforcement signal r̂(t) is computed as primary reinforcement r(t) received from the environment plus the temporal difference in weighted reinforcer prediction γP(t) − P(t − 1) (γ is the discount factor reducing the value of more distant reinforcers). Reinforcer prediction is computed in a separate prediction unit C, which is part of the critic and forms a closed loop with the teaching element A, whereas primary reinforcement enters the critic through a separate input r(t). The effective reinforcement signal influences synaptic weights at incoming axons in the actor, which mediates the output, and in the adaptive prediction unit of the critic. Reprinted from Barto (1995) with permission by MIT Press. C: the basic connectivity of the basal ganglia reveals striking similarities with the actor-critic architecture. 
Dopamine projection emits the reinforcement signal to the striatum and is comparable with the unit A in parts A and B, the limbic striatum (or striosome-patch) takes the position of the prediction unit C in the critic, and the sensorimotor striatum (or matrix) resembles the actor units B. In the original model (A), the single major deviation from established basal ganglia anatomy consists in the influence of neuron A being directed at presynaptic terminals, whereas dopamine synapses are located on postsynaptic dendrites of striatal neurons (Freund et al. 1984). Reprinted from Smith and Bolam (1990) with permission by Elsevier Press.

Taken together, the effective reinforcement signal (Eq. 6) is composed of the primary reinforcement, which decreases with emerging predictions (Eq. 6a) and is gradually replaced by the acquired predictions (Eqs. 6b and 6c). With consecutive predictive stimuli, the effective reinforcement signal moves backward in time from the primary reinforcer to the earliest reinforcer-predicting stimulus. The retrograde transfer results in a more specific assignment of credit to the involved synapses, as predictions occur closer in time to the stimuli and behavioral reactions to be conditioned, compared with reinforcement at trial end (Sutton and Barto 1981).
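The backward movement of the effective reinforcement signal can be reproduced in a small tabular simulation in which the predictions P(t) are themselves learned from the error of Eq. 6. The trial length, CS and reward times, learning rate, and trial count are illustrative assumptions.

```python
# Sketch of the retrograde transfer: a tabular TD model (gamma = 1, as in
# the text) learns predictions P(t) over repeated trials with a CS at t = 2
# and reward at t = 6. All parameter values are illustrative assumptions.

T, cs_time, reward_time, alpha = 8, 2, 6, 0.3
P = [0.0] * T          # reinforcement prediction at each time step
r = [0.0] * T
r[reward_time] = 1.0

def run_trial(P):
    """One trial: compute r^(t) (Eq. 6) and update the predictions."""
    errors = [0.0] * T
    for t in range(1, T):
        errors[t] = r[t] + P[t] - P[t - 1]
        if cs_time <= t - 1 < reward_time:   # a prediction exists only after the CS
            P[t - 1] += alpha * errors[t]
    return errors

early = run_trial(P)
for _ in range(200):
    late = run_trial(P)

# Early in learning the error occurs at the reward; after learning it has
# moved backward to the earliest reward-predicting stimulus (the CS).
print(early.index(max(early)), late.index(max(late)))
```

After training, the error at the reward time is near zero while a positive error stands at the CS, mirroring the transfer of the dopamine response from primary reward to the reward-predicting stimulus.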

Implementations of reinforcement learning algorithms employ the prediction error in two ways, for changing synaptic weights for behavioral output and for acquiring the predictions themselves to continuously compute the prediction error (Fig. 9 A) (McLaren 1989; Sutton and Barto 1981). These two functions are separated in recent implementations, in which the prediction error is computed in the adaptive critic component and changes the synaptic weights in the actor component mediating behavioral output (Fig. 9 B) (Barto 1995). A positive error increases the reinforcement prediction of the critic, whereas a negative error from omitted reinforcement reduces the prediction. This renders the effective reinforcement signal highly adaptive.
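The separation into a critic that computes the prediction error and an actor that is taught by it can be sketched on a one-step task. The task (one stimulus, two actions, reward for one of them), the softmax action selection, and all parameter values are illustrative assumptions, not taken from the models cited above.

```python
# Minimal actor-critic sketch: the critic computes the effective
# reinforcement signal and the actor uses the same signal to learn a choice.
# The task and parameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
P = 0.0                  # critic: reward prediction for the stimulus
w = np.zeros(2)          # actor: action preferences
alpha, eps = 0.2, 0.2

for _ in range(300):
    p = np.exp(w) / np.exp(w).sum()  # actor picks an action (softmax)
    a = rng.choice(2, p=p)
    r = 1.0 if a == 0 else 0.0       # action 0 is the rewarded one
    rhat = r - P                     # critic's prediction error (cf. Eq. 6a)
    P += alpha * rhat                # positive error raises the prediction,
    w[a] += eps * rhat               # and the same signal teaches the actor

print(round(P, 2), np.argmax(w))     # action 0 ends up preferred
```

As in the text, a positive error increases the critic's reward prediction, a negative error from omitted reward reduces it, and the single broadcast signal suffices to train the actor.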

Neurobiological implementations of temporal difference learning


The dopamine response coding an error in the prediction of reward (Eq. 1) closely resembles the effective error term of animal learning rules (λ − V; Eq. 3) and the effective reinforcement signal of TD algorithms at the time (t) of reinforcement [r(t) − P(t − 1); Eq. 6a], as noted before (Montague et al. 1996). Similarly, the dopamine appetitive event prediction error (Eq. 2) resembles the higher order TD reinforcement error [P(t) − P(t − 1); Eq. 6c]. The widespread, divergent projections of dopamine neurons to probably all neurons in the striatum and many neurons in frontal cortex are compatible with the notion of a TD global reinforcement signal, which is emitted by the critic to influence all model neurons in the actor (compare Fig. 7 with Fig. 9 B). The critic-actor architecture is particularly attractive for neurobiology because of its separate teaching and performance modules. In particular, it closely resembles the connectivity of the basal ganglia, including the reciprocity of striatonigral projections (Fig. 9 C), as first noted by Houk et al. (1995). The critic simulates dopamine neurons, the reward prediction enters from striosomal striatonigral projections, and the actor resembles striatal matrix neurons with dopamine-dependent plasticity. Interestingly, both the dopamine response and the theoretical error terms are sign-dependent. They differ from error terms with absolute values, which do not discriminate between acquisition and extinction and should have predominantly attentional effects.


Although originally developed on the basis of the Rescorla-Wagner model of classical conditioning, models using TD algorithms learn a wide variety of behavioral tasks through basically instrumental forms of conditioning. These tasks range from balancing a pole on a moving cart (Barto et al. 1983) to playing world-class backgammon (Tesauro 1994). Robots using TD algorithms learn to move about two-dimensional space and avoid obstacles, reach and grasp (Fagg 1993), or insert a peg into a hole (Gullapalli et al. 1994). Using the TD reinforcement signal to directly influence and select behavior (Fig. 9 A), TD models replicate foraging behavior of honeybees (Montague et al. 1995) and simulate human decision making (Montague et al. 1996). TD models with an explicit critic-actor architecture constitute very powerful models that efficiently learn eye movements (Friston et al. 1994; Montague et al. 1993), sequential movements (Fig. 10), and orienting reactions (Contreras-Vidal and Schultz 1996). A recent model added activating-depressing novelty signals to improve the teaching signal, used stimulus and action traces in the critic and actor, and employed winner-take-all rules for the teaching signal and for selecting the actor neurons with the largest activation. This reproduced in great detail both the responses of dopamine neurons and the learning behavior of animals in delayed response tasks (Suri and Schultz 1996). It is particularly interesting to see that teaching signals using prediction errors result in faster and more complete learning compared with unconditional reinforcement signals (Fig. 10) (Friston et al. 1994).

Fig. 10.

Advantage of predictive reinforcement signals for learning. A temporal difference model with critic-actor architecture and an eligibility trace in the actor was trained in a sequential two-step, three-choice task (inset, upper left). Learning advanced faster and reached higher performance when a predictive reinforcement signal was used as the teaching signal (adaptive critic, top) than when an unconditional reinforcement signal at trial end was used (bottom). This effect becomes progressively more pronounced with longer sequences. Comparable performance with the unconditional reinforcement signal would require a much longer eligibility trace. Data were obtained from 10 simulations (R. Suri and W. Schultz, unpublished observations). A similar improvement in learning with predictive reinforcement was found in a model of oculomotor behavior (Friston et al. 1994).

Possible learning mechanisms using the dopamine signal

The preceding section has shown that the formal prediction error signal emitted by the dopamine response can constitute a particularly suitable teaching signal for model learning. The following sections describe how the biological dopamine response could potentially be used for learning by basal ganglia structures and suggest testable hypotheses.


Learning would proceed in two steps. The first step involves the acquisition of a dopamine reward-predicting response. In subsequent trials, the predictive dopamine signal would specifically strengthen the synaptic weights (ω) of Hebbian-type corticostriatal synapses that are active at the time of the reward-predicting stimulus, whereas inactive corticostriatal synapses are left unchanged. This results in the three-factor learning rule

Δω = ɛ r̂ i o  (Equation 8)

where r̂ is the dopamine reinforcement signal, i is input activity, o is output activity, and ɛ is the learning rate.
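The three-factor rule of Eq. 8 can be sketched on the circuit of Fig. 11 described below, with cortical inputs i1–i4 contacting striatal neurons o1–o3. The initial weights, firing threshold, and learning rate are illustrative assumptions.

```python
# Sketch of the three-factor rule of Eq. 8 on the circuit of Fig. 11:
# cortical inputs i1-i4 onto striatal neurons o1-o3, with a global dopamine
# signal rhat gating Hebbian change. Weights and threshold are assumptions.

import numpy as np

# connectivity of Fig. 11: i1->o1, i2->o1,o2, i3->o2,o3, i4->o3
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 1]], dtype=float)
w = 0.5 * mask                 # corticostriatal synaptic weights
eps, theta = 0.5, 0.4          # learning rate, striatal firing threshold

i = np.array([0.0, 1.0, 0.0, 0.0])    # only cortical input i2 is active
o = ((w @ i) > theta).astype(float)   # striatal activity: o1 and o2 fire
rhat = 1.0                            # global dopamine signal, same at all spines

w += eps * rhat * np.outer(o, i)      # Eq. 8: only active synapses i2-o1, i2-o2 change

print(w[0, 1], w[1, 1], w[0, 0])      # strengthened, strengthened, unchanged
```

Although the dopamine signal reaches every spine unselectively, only the synapses with conjoint pre- and postsynaptic activity are modified, which is the point of the three-factor arrangement.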

In a simplified model, four cortical inputs (i1–i4) contact the dendritic spines of three medium-sized spiny striatal neurons (o1–o3; Fig. 11). Cortical inputs converge on striatal neurons, each input contacting a different spine. The same spines are unselectively contacted by a common dopamine input R. Activation of dopamine input R indicates that an unpredicted reward-predicting stimulus occurred in the environment, without providing further details (goodness signal). Let us assume that cortical input i2 is activated simultaneously with dopamine neurons and codes one of several specific parameters of the same reward-predicting stimulus, such as its sensory modality, body side, color, texture, and position, or a specific parameter of a movement triggered by the stimulus. A set of parameters of this event would be coded by a set of cortical inputs i2. Cortical inputs i1, i3, and i4 unrelated to current stimuli and movements are inactive. The dopamine response leads to unselective dopamine release at all varicosities but would selectively strengthen only the active corticostriatal synapses i2–o1 and i2–o2, provided the cortical inputs are strong enough to activate striatal neurons o1 and o2.

Fig. 11.

Differential influences of a global dopamine reinforcement signal on selective corticostriatal activity. Dendritic spines of 3 medium-sized spiny striatal neurons o1, o2, and o3 are contacted by 4 cortical inputs i1, i2, i3, and i4 and by axonal varicosities from a single dopamine neuron R (or from a population of homogeneously activated dopamine neurons). Each striatal neuron receives ∼10,000 cortical and 1,000 dopamine inputs. At single dendritic spines, different cortical inputs converge with the dopamine input. In 1 version of the model, the dopamine signal enhances simultaneously active corticostriatal transmission relative to nonactive transmission. For example, dopamine input R is active at the same time as cortical input i2, whereas i1, i3, i4 are inactive. This leads to a modification of i2 → o1 and i2 → o2 transmission but leaves i1 → o1, i3 → o2, i3 → o3, and i4 → o3 transmissions unaltered. In a version of the model employing plasticity, synaptic weights of corticostriatal synapses are long-term modified by the dopamine signal according to the same rule. This may occur when dopamine responses to a conditioned stimulus act on corticostriatal synapses that also are activated by this stimulus. In another version employing plasticity, dopamine responses to a primary reward may act backwards in time on corticostriatal synapses that were previously active. These synapses would be made eligible for modification by a hypothetical postsynaptic neuronal trace left from that activity. In comparing the basal ganglia structure with the recent TD model of Fig. 9 B, dopamine input R replicates the critic with neuron A, the striatum with neurons o1–o3 replicates the actor with neuron B, cortical inputs i1–i4 replicate the actor input, and the divergent projection of dopamine neurons R on multiple spines of multiple striatal neurons o1–o3 replicates the global influence of the critic on the actor. A similar comparison was made by Houk et al. (1995). 
This drawing is based on anatomic data by Freund et al. (1984), Smith and Bolam (1990), Flaherty and Graybiel (1993), and Smith et al. (1994).

This learning mechanism employs the acquired dopamine response at the time of the reward-predicting stimulus as a teaching signal for inducing long-lasting synaptic changes (Fig. 12 A). Learning of the predictive stimulus or the triggered movement is based on the demonstrated acquisition of the dopamine response to the reward-predicting stimulus, together with dopamine-dependent plasticity in the striatum. Alternatively, plasticity changes might occur in cortical or subcortical structures downstream from the striatum after dopamine-mediated short-term enhancement of synaptic transmission in the striatum. The retroactive effects of reward on stimuli and movements preceding the reward are mediated by the response transfer to the earliest reward-predicting stimulus. The dopamine response to predicted or omitted primary reward is not used for plasticity changes in the striatum, as it does not occur simultaneously with the events to be conditioned, although it could be involved in computing the dopamine response to the reward-predicting stimulus in analogy to the architecture and mechanism of TD models.

Fig. 12.

Influences of dopamine reinforcement signal on possible learning mechanisms in the striatum. A: predictive dopamine reward response to a conditioned stimulus (CS) has a direct enhancing or plasticity effect on striatal neurotransmission related to that stimulus. B: dopamine response to primary reward has a retrograde plasticity effect on striatal neurotransmission related to the preceding conditioned stimulus. This mechanism is mediated by an eligibility trace outlasting striatal activity. Solid arrows indicate direct effects of dopamine signal on striatal neurotransmission (A) or the eligibility trace (B), small arrow in B indicates indirect effect on striatal neurotransmission via the eligibility trace.


Learning may occur in a single step if the dopamine reward signal has a retroactive action on striatal synapses. This requires hypothetical traces of synaptic activity that last until reinforcement occurs and make eligible for modification by a teaching signal those synapses that were active before reinforcement (Hull 1943; Klopf 1982; Sutton and Barto 1981). Synaptic weights (ω) are changed according to

Δω = ɛ r̂ h(i, o)  (Equation 9)

where r̂ is the dopamine reinforcement signal, h(i, o) is the eligibility trace of conjoint input and output activity, and ɛ is the learning rate. Potential physiological substrates of eligibility traces consist in prolonged changes in calcium concentration (Wickens and Kötter 1995), formation of calmodulin-dependent protein kinase II (Houk et al. 1995), or sustained neuronal activity found frequently in striatum (Schultz et al. 1995a) and cortex.
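The eligibility-trace rule of Eq. 9 can be sketched for a single synapse: conjoint activity at the time of the stimulus leaves a decaying trace, and a dopamine signal arriving several steps later converts the remaining trace into a weight change. The trace decay rate and the timing values are illustrative assumptions.

```python
# Sketch of the eligibility-trace rule of Eq. 9: conjoint pre/postsynaptic
# activity leaves a decaying trace h(i,o), and a later dopamine signal
# converts the trace into a weight change. Decay and timing are assumptions.

eps, decay = 0.5, 0.8
w, h = 0.5, 0.0         # one corticostriatal synapse and its eligibility trace

for t in range(10):
    i = 1.0 if t == 2 else 0.0     # presynaptic activity at the stimulus
    o = 1.0 if t == 2 else 0.0     # postsynaptic activity at the stimulus
    h = decay * h + i * o          # trace of conjoint activity h(i,o)
    rhat = 1.0 if t == 6 else 0.0  # dopamine reward signal 4 steps later
    w += eps * rhat * h            # Eq. 9: d_omega = eps * rhat * h(i,o)

print(round(w, 3))  # synapse strengthened despite the delay
```

The synapse is modified even though its activity ended well before the reward, which is precisely the bridging function the hypothetical trace provides; a synapse inactive at the stimulus leaves no trace and is not modified.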

Dopamine-dependent plasticity involving eligibility traces constitutes an elegant mechanism for learning sequences backward in time (Sutton and Barto 1981). To start, the dopamine response to the unpredicted primary reward mediates behavioral learning of the immediately preceding event by modifying corticostriatal synaptic efficacy (Fig. 11). At the same time, the dopamine response transfers to the reward-predicting event. A depression at the time of omitted reward prevents learning of erroneous reactions. In the next step, the dopamine response to the unpredicted reward-predicting event mediates learning of the immediately preceding predictive event, and the dopamine response likewise transfers back to that event. As this occurs repeatedly, the dopamine response moves backward in time until no further events precede, allowing at each step the preceding event to acquire reward prediction. This mechanism would be ideally suited for forming behavioral sequences leading to a final reward.

This learning mechanism fully employs the dopamine error in the prediction of appetitive events as a retroactive teaching signal inducing long-lasting synaptic changes (Fig. 12 B). It uses dopamine-dependent plasticity together with striatal eligibility traces, whose biological suitability for learning remains to be investigated. This results in direct learning by outcome, essentially compatible with the influence of the teaching signal on the actor of TD models. The demonstrated retrograde movement of the dopamine response is used for learning earlier and earlier stimuli.


Both mechanisms described above employ the dopamine response as a teaching signal for modifying neurotransmission in the striatum. As the contribution of dopamine-dependent striatal plasticity to learning is not completely understood, another mechanism could be based on the demonstrated plasticity of the dopamine response without requiring striatal plasticity. In a first step, dopamine neurons acquire responses to reward-predicting stimuli. In a subsequent step, the predictive responses could be used to increase the impact of cortical inputs that occur simultaneously at the same dendritic spines of striatal neurons. Postsynaptic activity would change according to

Δactivity = δ r̂ i  (Equation 10)

where r̂ is the dopamine reinforcement signal, i is input activity, and δ is an amplification constant. Rather than constituting a teaching signal, the predictive dopamine response provides an enhancing or motivating signal for striatal neurotransmission at the time of the reward-predicting stimulus. With competing stimuli, neuronal inputs occurring simultaneously with the reward-predicting dopamine signal would be processed preferentially. Behavioral reactions would profit from the advance information and become more frequent, faster, and more precise. The facilitatory influence of advance information is demonstrated in behavioral experiments by pairing a conditioned stimulus with lever pressing (Lovibond 1983).
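The enhancement rule of Eq. 10 can be sketched for a single input: the predictive dopamine signal amplifies simultaneously active transmission without any weight change. The amplification constant δ and the input value are illustrative assumptions.

```python
# Sketch of the enhancement rule of Eq. 10: the predictive dopamine signal
# amplifies simultaneously active cortical input without changing synaptic
# weights. delta and the input value are illustrative assumptions.

delta = 0.8   # amplification constant

def postsynaptic_activity(i, rhat):
    """Baseline drive plus the Eq. 10 term: activity = i + delta*rhat*i."""
    return i + delta * rhat * i

# an input coinciding with the dopamine response (rhat = 1) is enhanced,
# an identical input without the dopamine response is not
print(postsynaptic_activity(1.0, 1.0))   # 1.8, enhanced
print(postsynaptic_activity(1.0, 0.0))   # 1.0, unchanged
```

Because the effect lasts only as long as the dopamine response, this is a selection or motivation signal rather than a teaching signal: nothing persists after the trial.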

A possible mechanism may employ the focusing effect of dopamine. In the simplified model of Fig. 11, dopamine globally reduces all cortical influences. This lets only the strongest input pass to striatal neurons, whereas the other, weaker inputs become ineffective. This requires a nonlinear, contrast-enhancing mechanism, such as the threshold for generating action potentials. A comparable enhancement of strongest inputs could occur in neurons that would be predominantly excited by dopamine.
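The focusing effect can be sketched with a spike threshold as the nonlinearity named above: a global dopamine-mediated reduction of all cortical influences leaves only the strongest input above threshold. The gain and threshold values are illustrative assumptions.

```python
# Sketch of the focusing effect: a global dopamine-mediated reduction of all
# cortical influences, combined with a firing threshold, lets only the
# strongest input drive the striatal neuron. Values are assumptions.

inputs = [0.9, 0.5, 0.4]   # cortical input strengths onto one striatal neuron
threshold = 0.35           # threshold for generating action potentials

def above_threshold(gain):
    """Which inputs still drive the neuron after a global gain change."""
    return [x * gain > threshold for x in inputs]

# without dopamine all 3 inputs pass the threshold; with a global dopamine
# reduction (gain 0.5) only the strongest input remains effective
print(above_threshold(1.0))   # [True, True, True]
print(above_threshold(0.5))   # [True, False, False]
```

The uniform reduction by itself carries no selectivity; the contrast enhancement arises entirely from the threshold nonlinearity acting on unequal input strengths.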

This mechanism employs the acquired, reward-predicting dopamine response as a biasing or selection signal for influencing postsynaptic processing (Fig. 12 A). Improved performance is based entirely on the demonstrated plasticity of dopamine responses and does not require dopamine-dependent plasticity in striatal neurons. The responses to unpredicted or omitted reward occur too late for influencing striatal processing but may help to compute the predictive dopamine response in analogy to TD models.

Electrical stimulation of dopamine neurons as unconditioned stimulus

Electrical stimulation of circumscribed brain regions reliably serves as reinforcement for acquiring and sustaining approach behavior (Olds and Milner 1954). Some very effective self-stimulation sites coincide with dopamine cell bodies and axon bundles in the midbrain (Corbett and Wise 1980), nucleus accumbens (Phillips et al. 1975), striatum (Phillips et al. 1976), and prefrontal cortex (Mora and Myers 1977; Phillips et al. 1979), but also are found in structures unrelated to dopamine systems (White and Milner 1992). Electrical self-stimulation involves the activation of dopamine neurons (Fibiger and Phillips 1986; Wise and Rompré 1989) and is reduced by 6-hydroxydopamine–induced lesions of dopamine axons (Fibiger et al. 1987; Phillips and Fibiger 1978), inhibition of dopamine synthesis (Edmonds and Gallistel 1977), depolarization inactivation of dopamine neurons (Rompré and Wise 1989), and dopamine receptor antagonists administered systemically (Fouriezos and Wise 1976) or into nucleus accumbens (Mogenson et al. 1979). Self-stimulation is facilitated by cocaine- or amphetamine-induced increases in extracellular dopamine (Colle and Wise 1980; Stein 1964; Wauquier 1976). Self-stimulation directly increases dopamine utilization in nucleus accumbens, striatum, and frontal cortex (Fibiger et al. 1987; Mora and Myers 1977).

It is intriguing to imagine that electrically evoked dopamine impulses and release may serve as an unconditioned stimulus in associative learning, similar to stimulation of octopamine neurons in honeybees learning the proboscis reflex (Hammer 1993). However, dopamine-related self-stimulation differs in at least three important aspects from the natural activation of dopamine neurons. First, rather than only activating dopamine neurons, natural rewards usually activate several neuronal systems in parallel and allow the distributed coding of different reward components (see further text). Second, electrical stimulation is applied as unconditional reinforcement without reflecting an error in reward prediction. Third, electrical stimulation is only delivered like a reward after a behavioral reaction, rather than at the time of a reward-predicting stimulus. It would be interesting to apply electrical self-stimulation in exactly the same manner as dopamine neurons emit their signal.

Learning deficits with impaired dopamine neurotransmission

Many studies investigated the behavior of animals with impaired dopamine neurotransmission after local or systemic application of dopamine receptor antagonists or destruction of dopamine axons in ventral midbrain, nucleus accumbens, or striatum. Besides observing locomotor and cognitive deficits reminiscent of Parkinsonism, these studies revealed impairments in the processing of reward information. The earliest studies argued for deficits in the subjective, hedonic perception of rewards (Wise 1982; Wise et al. 1978). Further experimentation revealed impaired use of primary rewards and conditioned appetitive stimuli for approach and consummatory behavior (Beninger et al. 1987; Ettenberg 1989; Miller et al. 1990; Salamone 1987; Ungerstedt 1971; Wise and Colle 1984; Wise and Rompre 1989). Many studies described impairments in motivational and attentional processes underlying appetitive learning (Beninger 1983, 1989; Beninger and Hahn 1983; Fibiger and Phillips 1986; LeMoal and Simon 1991; Robbins and Everitt 1992, 1996; White and Milner 1992; Wise 1982). Most learning deficits are associated with impaired dopamine neurotransmission in nucleus accumbens, whereas dorsal striatum impairments lead to sensorimotor deficits (Amalric and Koob 1987; Robbins and Everitt 1992; White 1989). However, the learning of instrumental tasks in general and of discriminative stimulus properties in particular appear to be frequently spared, and it is not entirely resolved whether some of the apparent learning deficits may be confounded by motor performance deficits (Salamone 1992).

Degeneration of dopamine neurons in Parkinson's disease also leads to a number of declarative and procedural learning deficits, including associative learning (Linden et al. 1990; Sprengelmeyer et al. 1995). Deficits are present in trial-and-error learning with immediate reinforcement (Vriezen and Moscovitch 1990) and when associating explicit stimuli with different outcomes (Knowlton et al. 1996), even in early stages of Parkinson's disease without cortical atrophy (Canavan et al. 1989). Parkinsonian patients also show impaired time perception (Pastor et al. 1992). All of these deficits occur in the presence of L-Dopa treatment, which restores tonic striatal dopamine levels without reinstating phasic dopamine signals.

These studies suggest that dopamine neurotransmission plays an important role in the processing of rewards for approach behavior and in forms of learning involving associations between stimuli and rewards, whereas an involvement in more instrumental forms of learning could be questioned. It is unclear whether these deficits reflect a more general behavioral inactivation due to tonically reduced dopamine receptor stimulation rather than the absence of a phasic dopamine reward signal. To resolve this question, and to elucidate more specifically the role of dopamine in different forms of learning, it would be helpful to study learning in those situations in which the phasic dopamine response to appetitive stimuli actually occurs.

Forms of learning possibly mediated by the dopamine signal

The characteristics of dopamine responses and the potential influence of dopamine on striatal neurons may help to delineate some of the learning forms in which dopamine neurons could be involved. The preferential responses to appetitive as opposed to aversive events would favor an involvement in the learning of approach behavior and mediating positive reinforcement effects, rather than withdrawal and punishment. The responses to primary rewards outside of tasks and learning contexts would allow dopamine neurons to play a role in a relatively wide spectrum of learning involving primary rewards, both in classical and instrumental conditioning. The responses to reward-predicting stimuli reflect stimulus-reward associations and would be compatible with an involvement in reward expectation underlying general incentive learning (Bindra 1968). By contrast, dopamine responses do not explicitly code rewards as goal objects, as they only report errors in reward prediction. They also appear to be insensitive to motivational states, thus disfavoring a specific role in state-dependent incentive learning of goal-directed acts (Dickinson and Balleine 1994). The lack of clear relationships to arm and eye movements would disfavor a role in directly mediating the behavioral responses that follow incentive stimuli. However, comparisons between discharges of individual neurons and learning of whole organisms are intrinsically difficult. At the synaptic level, phasically released dopamine reaches many dendrites on probably every striatal neuron and thus could exert a plasticity effect on the large variety of behavioral components involving the striatum, which may include the learning of movements.

The specific conditions in which phasic dopamine signals could play a role in learning are determined by the kinds of stimuli that effectively induce a dopamine response. In the animal laboratory, dopamine responses require the phasic occurrence of appetitive, novel, or particularly salient stimuli, including primary nutrient rewards and reward-predicting stimuli, whereas aversive stimuli do not play a major role. Dopamine responses may occur in all behavioral situations controlled by phasic and explicit outcomes, although higher order conditioned stimuli and secondary reinforcers have not yet been tested. Phasic dopamine responses would probably not play a role in forms of learning not mediated by phasically occurring outcomes, and the predictive response would not be able to contribute to learning in situations in which phasic predictive stimuli do not occur, such as relatively slow changes of context. This leads to the interesting question of whether the sparing of some forms of learning by dopamine lesions or neuroleptics might simply reflect the absence of phasic dopamine responses in the first place because the effective stimuli eliciting them were not used.

The involvement of dopamine signals in learning may be illustrated by a theoretical example. Imagine dopamine responses during acquisition of a serial reaction time task when a correct reaction suddenly leads to a nutrient reward. The reward response subsequently is transferred to progressively earlier reward-predicting stimuli. Reaction times improve further with prolonged practice as the spatial positions of targets become increasingly predictable. Although dopamine neurons continue to respond to the reward-predicting stimuli, the further behavioral improvement might be mainly due to the acquisition of predictive processing of spatial positions by other neuronal systems. Thus dopamine responses would occur during the initial, incentive part of learning in which subjects come to approach objects and obtain explicit primary, and possibly conditioned, rewards. They would be less involved in situations in which the progress of learning goes beyond the induction of approach behavior. This would not restrict the dopamine role to initial learning steps, as many situations require initial learning from examples and only later involve learning by explicit outcomes.
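The transfer described in this example is the hallmark behavior of temporal difference (TD) learning models. The following sketch shows it with a tabular TD(0) error; trial length, event times, learning rate, and discount factor are illustrative assumptions, not values from the recordings. Early in training the error (the modeled dopamine response) peaks at the reward; after training it has moved to the step at which the predictive stimulus appears.

```python
import numpy as np

# Illustrative parameters (assumptions, not experimental values)
T, t_stim, t_rew = 12, 3, 9      # trial length, stimulus onset, reward time
alpha, gamma = 0.2, 0.95         # learning rate, discount factor

V = np.zeros(T + 1)              # learned reward prediction per time step;
                                 # V stays 0 before the stimulus because its
                                 # timing is assumed unpredictable

def run_trial():
    """One trial: compute the TD error at each step and update predictions."""
    delta = np.zeros(T)
    for t in range(T):
        r = 1.0 if t == t_rew else 0.0
        delta[t] = r + gamma * V[t + 1] - V[t]   # prediction error
        if t >= t_stim:                          # only post-stimulus steps learn
            V[t] += alpha * delta[t]
    return delta

first = run_trial()              # error on the very first trial
for _ in range(500):
    last = run_trial()           # error after extensive training

# first peaks at the reward time; last peaks at the stimulus instead.
```

The depression for omitted rewards falls out of the same equation: if the reward were withheld after training, delta at the reward time would be negative.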


Prediction error

The prediction error signal of dopamine neurons would be an excellent indicator of the appetitive value of environmental events relative to prediction but would fail to discriminate among foods, liquids, and reward-predicting stimuli and among visual, auditory, and somatosensory modalities. This signal may constitute a reward alert message by which postsynaptic neurons are informed about the surprising appearance or omission of a rewarding or potentially rewarding event without indicating further its identity. It has all the formal characteristics of a powerful reinforcement signal for learning. However, information about the specific nature of rewards is crucial for determining which of the objects should be approached and in which manner. For example, a hungry animal should primarily approach food but not liquid. To discriminate relevant from irrelevant rewards, the dopamine signal needs to be supplemented by additional information. Recent in vivo dialysis experiments showed higher food-induced dopamine release in hungry than in satiated rats (Wilson et al. 1995). This drive dependence of dopamine release may not involve impulse responses, as we have failed to find clear drive dependence with dopamine responses when comparing early and late periods of individual experimental sessions during which animals became fluid-satiated (J. L. Contreras-Vidal and W. Schultz, unpublished data).
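One hedged, minimal reading of this description is a scalar signal that keeps only reward value relative to prediction and discards reward identity; the values below are purely illustrative:

```python
def reward_prediction_error(delivered: float, predicted: float) -> float:
    """Dopamine-like scalar error: positive when the outcome is better than
    predicted, zero when fully predicted, negative when worse (e.g., omitted).
    Only value relative to prediction survives; identity of the reward does not.
    """
    return delivered - predicted

# Surprising reward -> activation; predicted reward -> no response;
# omitted predicted reward -> depression.
surprising = reward_prediction_error(1.0, 0.0)
predicted = reward_prediction_error(1.0, 1.0)
omitted = reward_prediction_error(0.0, 1.0)
```

Note that equally valued juice and food would produce the same error here, which is exactly the failure to discriminate that the paragraph describes.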

Reward specifics

Information concerning liquid and food rewards also is processed in brain structures other than dopamine neurons, such as dorsal and ventral striatum, subthalamic nucleus, amygdala, dorsolateral prefrontal cortex, orbitofrontal cortex, and anterior cingulate cortex. However, these structures do not appear to emit a global reward prediction error signal similar to dopamine neurons. In primates, these structures process rewards as 1) transient responses after the delivery of reward (Apicella et al. 1991a,b, 1997; Bowman et al. 1996; Hikosaka et al. 1989; Niki and Watanabe 1979; Nishijo et al. 1988; Tremblay and Schultz 1995; Watanabe 1989), 2) transient responses to reward-predicting cues (Aosaki et al. 1994; Apicella et al. 1991b, 1996; Hollerman et al. 1994; Nishijo et al. 1988; Thorpe et al. 1983; Tremblay and Schultz 1995; Williams et al. 1993), 3) sustained activations during the expectation of immediately upcoming rewards (Apicella et al. 1992; Hikosaka et al. 1989; Matsumura et al. 1992; Schultz et al. 1992; Tremblay and Schultz 1995), and 4) modulations of behavior-related activations by predicted reward (Hollerman et al. 1994; Watanabe 1990, 1996). Many of these neurons differentiate well between different food rewards and between different liquid rewards. Thus they process the specific nature of the rewarding event and may serve the perception of rewards. Some of the reward responses depend on reward unpredictability and are reduced or absent when the reward is predicted by a conditioned stimulus (Apicella et al. 1997; Matsumoto et al. 1995; L. Tremblay and W. Schultz, unpublished data). They may process predictions for specific rewards, although it is unclear whether they signal prediction errors, as their responses to omitted rewards are unknown.

Maintaining established performance

Three neuronal mechanisms appear to be important for maintaining established behavioral performance, namely the detection of omitted rewards, the detection of reward-predicting stimuli, and the detection of predicted rewards. Dopamine neurons are depressed when predicted rewards are omitted. This signal could reduce the synaptic efficacy related to erroneous behavioral responses and prevent their repetition. The dopamine response to reward-predicting stimuli is maintained during established behavior and thus continues to serve as advance information. Although fully predicted rewards are not detected by dopamine neurons, they are processed by the nondopaminergic cortical and subcortical systems mentioned above. This would be important for avoiding extinction of learned behavior.

Taken together, it appears that the processing of specific rewards for learning and maintaining approach behavior would profit strongly from a cooperation between dopamine neurons signaling the unpredicted occurrence or omission of reward and neurons in the other structures simultaneously indicating the specific nature of the reward.


Noradrenaline neurons

Nearly the entire population of noradrenaline neurons in locus coeruleus in rats, cats, and monkeys shows rather homogeneous, biphasic activating-depressant responses to visual, auditory, and somatosensory stimuli eliciting orienting reactions (Aston-Jones and Bloom 1981; Foote et al. 1980; Rasmussen et al. 1986). Particularly effective are infrequent events to which animals pay attention, such as visual stimuli in an oddball discrimination task (Aston-Jones et al. 1994). Noradrenaline neurons discriminate very well between arousing or motivating and neutral events. They rapidly acquire responses to new target stimuli during reversal and lose responses to previous targets before behavioral reversal is completed (Aston-Jones et al. 1997). Responses occur to free liquid outside of any task and transfer to reward-predicting target stimuli within a task as well as to primary and conditioned aversive stimuli (Aston-Jones et al. 1994; Foote et al. 1980; Rasmussen and Jacobs 1986; Sara and Segal 1991). Responses are often transient and appear to reflect changes in stimulus occurrence or meaning. Activations may occur only for a few trials with repeated presentations of food objects (Vankov et al. 1995) or with conditioned auditory stimuli associated with liquid reward, aversive air puff, or electric foot shock (Rasmussen and Jacobs 1986; Sara and Segal 1991). During conditioning, responses occur to the first few presentations of novel stimuli and reappear transiently whenever reinforcement contingencies change during acquisition, reversal, and extinction (Sara and Segal 1991).

Taken together, the responses of noradrenaline neurons resemble the responses of dopamine neurons in several respects, being activated by primary rewards, reward-predicting stimuli, and novel stimuli and transferring the response from primary to conditioned appetitive events. However, noradrenaline neurons differ from dopamine neurons by responding to a much larger variety of arousing stimuli, by responding well to primary and conditioned aversive stimuli, by discriminating well against neutral stimuli, by rapidly following behavioral reversals, and by showing decrementing responses with repeated stimulus presentation, which may require 100 trials for solid appetitive responses (Aston-Jones et al. 1994). Noradrenaline responses are strongly related to the arousing or attention-grabbing properties of stimuli eliciting orienting reactions, whereas most dopamine neurons are focused more narrowly on appetitive stimulus properties. Noradrenaline neurons are probably driven more by the attention-grabbing than by the motivating components of appetitive events.

Serotonin neurons

Activity in the different raphe nuclei facilitates motor output by setting muscle tone and stereotyped motor activity (Jacobs and Fornal 1993). Dorsal raphe neurons in cats show phasic, nonhabituating responses to visual and auditory stimuli of no particular behavioral meaning (Heym et al. 1982; LeMoal and Olds 1979). These responses resemble responses of dopamine neurons to novel and particularly salient stimuli. Further comparisons would require more detailed experimentation.

Nucleus basalis Meynert

Primate basal forebrain neurons are activated phasically by a large variety of behavioral events including conditioned, reward-predicting stimuli and primary rewards. Many activations depend on memory and associations with reinforcement in discrimination and delayed response tasks. Activations reflect the familiarity of stimuli (Wilson and Rolls 1990a), become more important with stimuli and movements occurring closer to the time of reward (Richardson and DeLong 1990), differentiate well between visual stimuli on the basis of appetitive and aversive associations (Wilson and Rolls 1990b), and change within a few trials during reversal (Wilson and Rolls 1990c). Neurons also are activated by aversive stimuli, predicted visual and auditory stimuli, and movements. They respond frequently to fully predicted rewards in well established behavioral tasks (Mitchell et al. 1987; Richardson and DeLong 1986, 1990), although responses to unpredicted rewards are more abundant in some studies (Richardson and DeLong 1990) but not in others (Wilson and Rolls 1990a,c). In comparison with dopamine neurons, they are activated by a much larger spectrum of stimuli and events, including aversive events, and do not show the rather homogeneous population response to unpredicted rewards and its transfer to reward-predicting stimuli.

Cerebellar climbing fibers

Probably the first error-driven teaching signal in the brain was postulated to involve the projection of climbing fibers from the inferior olive to Purkinje neurons in the cerebellar cortex (Marr 1969), and many cerebellar learning studies are based on this concept (Houk et al. 1996; Ito 1989; Kawato and Gomi 1992; Llinas and Welsh 1993). Climbing fiber inputs to Purkinje neurons transiently change their activity when loads for movements or gains between movements and visual feedback are changed and monkeys adapt to the new situation (Gilbert and Thach 1977; Ojakangas and Ebner 1992). Most of these changes consist of increased activity rather than the activation versus depression responses seen with errors in opposing directions in dopamine neurons. If climbing fiber activation were to serve as a teaching signal, conjoint climbing fiber-parallel fiber activation should lead to changes in parallel fiber input to Purkinje neurons. This indeed occurs as long-term depression of parallel fiber input, mainly in in vitro preparations (Ito 1989). However, comparable parallel fiber changes are more difficult to find in behavioral learning situations (Ojakangas and Ebner 1992), leaving the consequences of potential climbing fiber teaching signals open at the moment.

A second argument for a role of climbing fibers in learning involves aversive classical conditioning. A fraction of climbing fibers is activated by aversive air puffs to the cornea. These responses are lost after Pavlovian eyelid conditioning using an auditory stimulus (Sears and Steinmetz 1991), suggesting a relationship to the unpredictability of primary aversive events. After conditioning, neurons in the cerebellar interpositus nucleus respond to the conditioned stimulus (Berthier and Moore 1990; McCormick and Thompson 1984). Lesions of this nucleus or injections of the GABA antagonist bicuculline into the inferior olive prevent the loss of inferior olive air puff responses after conditioning, suggesting that monosynaptic or polysynaptic inhibition from interpositus to inferior olive suppresses responses after conditioning (Thompson and Gluck 1991). This might allow inferior olive neurons to be depressed in the absence of predicted aversive stimuli and thus report a negative error in the prediction of aversive events similar to dopamine neurons.

Thus climbing fibers may report errors in motor performance and errors in the prediction of aversive events, although this may not always involve bidirectional changes as with dopamine neurons. Climbing fibers do not appear to acquire responses to conditioned aversive stimuli, but such responses are found in nucleus interpositus. The computation of aversive prediction errors may involve descending inhibitory inputs to inferior olive neurons, in analogy to striatal projections to dopamine neurons. Thus cerebellar circuits process error signals, albeit differently than dopamine neurons and TD models, and they might implement error learning rules like the Rescorla-Wagner rule (Thompson and Gluck 1991) or the formally equivalent Widrow-Hoff rule (Kawato and Gomi 1992).
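The Rescorla-Wagner rule mentioned above can be sketched in a few lines: all stimuli present on a trial share a single prediction error, which is why a stimulus added to an already predictive compound gains little strength (the classic blocking effect). The learning rate and outcome magnitude below are illustrative assumptions.

```python
def rescorla_wagner_trial(present, V, lam, alpha=0.1):
    """Update associative strengths V for the stimuli present on one trial.

    lam is the outcome magnitude; the error shared by all present stimuli
    is lam minus their summed prediction. Returns that error.
    """
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += alpha * error
    return error

V = {"A": 0.0, "B": 0.0}
for _ in range(200):                       # phase 1: A alone predicts outcome
    rescorla_wagner_trial(["A"], V, lam=1.0)
for _ in range(200):                       # phase 2: A+B compound, same outcome
    rescorla_wagner_trial(["A", "B"], V, lam=1.0)
# A already predicts the outcome, so the shared error is near zero in
# phase 2 and B gains almost no associative strength ("blocking").
```

The formally equivalent Widrow-Hoff rule is the same update written for a linear unit, which is why the two are interchangeable in this context.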


Impaired dopamine neurotransmission in Parkinson's disease, after experimental lesions, or during neuroleptic treatment is associated with many behavioral deficits in movement (akinesia, tremor, rigidity), cognition (attention, bradyphrenia, planning, learning), and motivation (reduced emotional responses, depression). The range of deficits appears too wide to be explained simply by a malfunctioning dopamine reward signal. Most deficits are considerably ameliorated by systemic dopamine precursor or receptor agonist therapy, although such treatment cannot in a simple manner reinstate the phasic information transmission by neuronal impulses. However, many appetitive deficits are not restored by this therapy, such as pharmacologically induced discrimination deficits (Ahlenius 1974) and parkinsonian learning deficits (Canavan et al. 1989; Knowlton et al. 1996; Linden et al. 1990; Sprengelmeyer et al. 1995; Vriezen and Moscovitch 1990).

From these considerations, it appears that dopamine neurotransmission serves two separate functions in the brain, the phasic processing of appetitive and alerting information and the tonic enabling of a wide range of behaviors without temporal coding. Deficits in a similar double dopamine function may underlie the pathophysiology of schizophrenia (Grace 1991). It is interesting to note that phasic changes of dopamine activity may occur at different time scales. Whereas the reward responses follow a time course on the order of tens to hundreds of milliseconds, dopamine release studies with voltammetry and microdialysis concern time scales of minutes and reveal a much wider spectrum of dopamine functions, including the processing of rewards, feeding, drinking, punishments, stress, and social behavior (Abercrombie et al. 1989; Church et al. 1987b; Doherty and Gratton 1992; Louilot et al. 1986; Young et al. 1992, 1993). It appears that dopamine neurotransmission follows at least three time scales with progressively wider roles in behavior, from the fast, rather restricted function of signaling rewards and alerting stimuli via a slower function of processing a considerable range of positively and negatively motivating events to the tonic function of enabling a large variety of motor, cognitive, and motivational processes.

The tonic dopamine function is based on low, sustained, extracellular dopamine concentrations in the striatum (5–10 nM) and other dopamine-innervated areas that are sufficient to stimulate extrasynaptic, mostly D2 type dopamine receptors in their high affinity state (9–74 nM; Fig. 8) (Richfield et al. 1989). This concentration is regulated locally within a narrow range by synaptic overflow and extrasynaptic dopamine release induced by tonic spontaneous impulse activity, reuptake transport, metabolism, autoreceptor-mediated release and synthesis control, and presynaptic glutamate influence on dopamine release (Chesselet 1984). The importance of ambient dopamine concentrations is demonstrated experimentally by the deleterious effects of unphysiologic levels of receptor stimulation. Reduced dopamine receptor stimulation after lesions of dopamine afferents or local administration of dopamine antagonists in prefrontal cortex leads to impaired performance of spatial delayed response tasks in rats and monkeys (Brozoski et al. 1979; Sawaguchi and Goldman-Rakic 1991; Simon et al. 1980). Interestingly, increases of prefrontal dopamine turnover induce similar impairments (Elliott et al. 1997; Murphy et al. 1996). Apparently, the tonic stimulation of dopamine receptors should be neither too low nor too high to ensure optimal function of a given brain region. Changing the influence of well-regulated, ambient dopamine would compromise the correct functioning of striatal and cortical neurons. Different brain regions may require specific levels of dopamine for mediating specific behavioral functions. It may be speculated that ambient dopamine concentrations are also necessary for maintaining striatal synaptic plasticity induced by a dopamine reward signal. A role of tonic dopamine in synaptic plasticity is suggested by the deleterious effects of dopamine receptor blockade or D2 receptor knockout on posttetanic depression (Calabresi et al. 1992a, 1997).

Numerous other neurotransmitters exist also in low ambient concentrations in the extracellular fluid, such as glutamate in striatum (0.9 μM) and cortex (0.6 μM) (Herrera-Marschitz et al. 1996). This may be sufficient to stimulate highly sensitive NMDA receptors (Sands and Barish 1989) but not other glutamate receptor types (Kiskin et al. 1986). Ambient glutamate facilitates action potential activity via NMDA receptor stimulation in hippocampus (Sah et al. 1989) and activates NMDA receptors in cerebral cortex (Blanton and Kriegstein 1992). Tonic glutamate levels are regulated by uptake in cerebellum and increase during development, influencing neuronal migration via NMDA receptor stimulation (Rossi and Slater 1993). Other neurotransmitters exist as well in low ambient concentrations, such as aspartate and GABA in striatum and frontal cortex (0.1 μM and 20 nM, respectively) (Herrera-Marschitz et al. 1996), and adenosine in hippocampus where it is involved in presynaptic inhibition (Manzoni et al. 1994). Although incomplete, this list suggests that neurons in many brain structures are permanently bathed in a soup of neurotransmitters that has powerful, specific, physiological effects on neuronal excitability.

Given the general importance of tonic extracellular concentrations of neurotransmitters, it appears that the wide range of parkinsonian symptoms would not be due to deficient transmission of reward information by dopamine neurons but would reflect a malfunction of striatal and cortical neurons due to impaired enabling by reduced ambient dopamine. Dopamine neurons would not be actively involved in the wide range of processes deficient in parkinsonism but simply provide the background concentration of dopamine necessary to maintain proper functioning of striatal and cortical neurons involved in these processes.


I thank Drs. Dana Ballard, Anthony Dickinson, Francois Gonon, David D. Potter, Traverse Slater, Roland E. Suri, Richard S. Sutton, and R. Mark Wightman for enlightening discussions and comments, and also two anonymous referees for extensive comments.

The experimental work was supported by the Swiss National Science Foundation (currently 31.43331.95), the Human Capital and Mobility and the Biomed 2 programs of the European Community via the Swiss Office of Education and Science (CHRX-CT94–0463 via 93.0121 and BMH4-CT95–0608 via 95.0313–1), the James S. McDonnell Foundation, the Roche Research Foundation, the United Parkinson Foundation (Chicago), and the British Council.

