1Computation and Neural Systems and 2Division of Biology, California Institute of Technology, Pasadena, California
Submitted 10 January 2005; accepted in final form 13 April 2005
|
|
ABSTRACT |
|---|
|
200-ms delay following the target-acquiring saccade in the memory task but often fired concurrently with the target-acquiring saccade in the object task. The hypothesis that this postsaccadic bursting activity reflects the expectation of a reward was tested with a series of manipulations to the memory-guided saccade task. It was found that although the timing of the bursting activity corresponds to a visual feedback stimulus, the visual feedback is not required for the neurons to discharge a burst. Second, blocks of no-reward trials reveal an extinction of the bursting activity as the monkeys come to understand that they would not be rewarded for properly generated saccades. Finally, the delivery of unexpected rewards confirmed that in many of the neurons, the activity is not related to a motor plan to acquire the reward (e.g., licking). Thus we conclude that reward expectancy is represented by the activity of SMA neurons, even in the context of an oculomotor task. These results suggest that the reward expectancy signal is broadcast over a large extent of motor cortex, and may facilitate the learning of new, coordinated behavior between different body parts. |
|
INTRODUCTION |
|---|
|
The dorsomedial frontal cortex (DMFC) has been shown to participate in volitional (Schlag and Schlag-Rey 1985) or goal-oriented motor acts (Mann et al. 1988). It contains at least three well-studied motor-representation areas that are thought to be involved in higher-order control of behavior: the supplementary eye field (SEF) (Schlag and Schlag-Rey 1985, 1987), the supplementary motor area (SMA) (Luppino et al. 1991; Matsuzaka et al. 1992), and the presupplementary motor area (pre-SMA) (Fujii et al. 2002; Nakamura et al. 1998; Shima and Tanji 2000). These three motor areas of DMFC can be distinguished based on anatomical connectivity (Luppino et al. 1993; Parthasarathy et al. 1992) and physiological responsivity (Matsuzaka et al. 1992). The SMA and pre-SMA are located in the DMFC on and above the medial wall in the frontal lobe. An orofacial region occupies the rostral end of the SMA, and further rostral is the pre-SMA. Intracortical microstimulation evokes movements in both areas, although evoking movements in the pre-SMA requires longer trains of pulses, and the resulting movements are more complex (Fujii et al. 2002). The pre-SMA has been implicated in planned motor acts (Matsuzaka and Tanji 1996) and in the acquisition (Nakamura et al. 1998), planning, and regulation (Shima and Tanji 2000) of sequential procedures. Additionally, pre-SMA neurons respond more often to visual stimulation than SMA neurons do. The SMA consists of a rostrocaudal progression of orofacial, forelimb, and hindlimb movement representations (Mitz and Wise 1987). Lateral to the SMA, microstimulation evokes eye movements. This area is defined as SEF (Fujii et al. 2002). Several studies have shown that electrical microstimulation at low currents (<50 µA and sometimes as low as 10 µA) will elicit saccades in SEF (Chen and Wise 1995; Fujii et al. 1995; Mann et al. 1988; Russo and Bruce 1993; Tehovnik and Sommer 1996).
Three recent studies have explicitly connected the SEF to reward variables. Amador et al. (2000) discovered reward-predicting and reward-detecting neuronal activity in SEF. Schall and colleagues used the countermanding task to characterize three different types of neurons in the SEF (error, conflict, and reinforcement neurons) and suggested that these could serve a performance monitoring function (Stuphorn et al. 2000). Roesch and Olson (2003, 2004) found modulations of neural activity in response to both reward and punishment and concluded that these modulations during the early stages of the trials correlate with motivation rather than reward expectation. In this study, we present neural activity reflecting reward expectation during a later stage of the trials, specifically after the monkey performs the instructed behavior. Our findings are similar to the reports of Amador and colleagues and Stuphorn and colleagues; however, the reward expectancy signal we describe is found in the SMA, whereas these other studies recorded from the nearby SEF. Taken together, these results suggest that a reward expectancy signal may be present throughout the DMFC.
In this study, we present evidence that a reward-expectancy signal is expressed in the neural activity of the SMA during the performance of an oculomotor task. The signal is not related to the metrics of the eye movement. Rather, it encodes expectation of the reward after the successful completion of the instructed behavior. The results presented here began as a discovery during a project that was originally intended to investigate the contribution of the SEF to saccades to objects. Some early recordings in the SMA uncovered a postsaccadic bursting activity that we hypothesized might be related to the expectation of reward, and experiments devised during the course of the project confirmed this hypothesis. While future studies of reward expectancy in SMA might use tasks that are specifically designed to investigate reward variables, with the two oculomotor tasks employed here we are able to establish two novel findings. First, we show that a reward expectancy signal is present during an oculomotor task in the SMA, an area that has been thought to be concerned only with movements of the body and limbs. Second, we show the coupling of the signal's onset time with a secondary reinforcer. These findings suggest a general learning mechanism that would reinforce all motor representations in DMFC that are active just before the animal can expect to receive a reward. A preliminary account of this study has appeared previously (Campos et al. 2003).
|
|
METHODS |
|---|
|
Stimuli and tasks
Monkeys were seated in a dimly lit room, 42 cm from a tangent screen. Stimuli were rear-projected with 800 x 600 resolution and a refresh rate of 72 Hz using a custom-built software display client with OpenGL. Task logic was controlled by National Instruments real-time LabView software.
Two eye-movement tasks were used: a memory-guided saccade task and an object-based saccade task. In both tasks, the monkey was instructed to perform a saccade from a central fixation point to 1 of 43 targets placed at regular intervals to cover the entire visual field out to 17° of visual angle in every direction from central fixation.
In the memory-guided saccade task (Fig. 1A), monkeys were required to maintain central fixation while a peripheral target was briefly flashed, wait until the central fixation point extinguished, and then saccade to the remembered location. After the monkey successfully held fixation at the target location, the target reappeared to provide visual feedback of the correct eye position. The monkey then had to maintain fixation on the visible target for an additional interval of 250 ms before receiving a juice reward of 0.2 ml.
|
0.2 ml) of juice. The memory-guided and object-based saccade tasks were designed to investigate the neural computations supporting object-based saccades; however, the important difference between the tasks for the purposes of this study was actually what happened after the saccade was completed. In the object task, the target was visible at the time of the saccade, and the monkey could acquire a visible target. In the memory task, the target reappeared 250 ms after the saccade to the remembered location. Thus in the object task the animals received earlier feedback from a secondary reinforcement stimulus.
In a recording session, a block of memory-guided saccades preceded a block of object-based saccades. The memory-guided saccade block consisted of three correct saccades to each location. The object-based saccade block consisted of 12 correct saccades to each location. Control trials were performed during the memory-guided saccade block at the discretion of the experimenter.
Recording procedure
Neurons were accessed on vertical penetrations with glass-coated platinum-iridium electrodes (Fred Haer). The electrodes were advanced with a Fred Haer or Narishige microdrive system through a blunt stainless steel guide tube pressed against the dura. Neurons were generally found 13 mm beneath the exterior of the dura.
Waveforms were amplified and isolated on-line with a commercial hardware and software package (Plexon). Cell activity was monitored with custom-built on-line data visualization software written in Matlab.
Data analysis
Bursting activity was identified using a burst detection algorithm (Hanes et al. 1995; Thompson et al. 1996). Bursts were initially detected by a threshold crossing of a surprise index (SI), which is the negative of the log of a calculated significance level. The significance level describes the likelihood that the observed number of spikes would occur in a given interval, considering the average firing rate of the cell, based on the assumption that spike counts follow a Poisson distribution. The significance level used to calculate the threshold for the SI was 0.01. The mean rate of the Poisson distribution was calculated as the number of spikes in the trial divided by the duration of the trial. Because the mean can change from trial to trial, the algorithm assumes stationarity only over the duration of a single trial, and the threshold adapts to changes in the baseline firing rate of the neuron over time. After the initial threshold crossing, the beginning and end of the burst were precisely identified, and multiple bursts could be identified in a single spike train (Thompson et al. 1996).
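The surprise-index threshold test described above can be sketched in a few lines. This is a minimal illustration, not the published algorithm of Hanes et al. (1995): it assumes the natural logarithm, counts every spike in a candidate window (first and last included), and simply returns the single most surprising window instead of refining burst boundaries. The function names, the `min_spikes` parameter, and the example spike times are all hypothetical.

```python
import math

def poisson_tail(n, mu):
    """P(X >= n) for X ~ Poisson(mu), summed directly from the upper tail."""
    if n <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    # First term: exp(-mu) * mu**n / n!, computed in log space for stability.
    term = math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))
    total, k = 0.0, n
    while term > 0.0:
        total += term
        k += 1
        term *= mu / k
        if term < total * 1e-16:  # remaining terms are negligible
            break
    return min(total, 1.0)

def surprise_index(n_spikes, duration, rate):
    """SI = -log(P), where P is the chance of >= n_spikes in `duration` at `rate`."""
    p = poisson_tail(n_spikes, rate * duration)
    return math.inf if p == 0.0 else -math.log(p)

def find_burst(spike_times, trial_duration, alpha=0.01, min_spikes=3):
    """Return (start, end, SI) of the most surprising window, or None.

    The mean rate is re-estimated per trial (spikes / trial duration), so the
    threshold tracks slow changes in baseline firing, as in the text.
    """
    rate = len(spike_times) / trial_duration
    threshold = -math.log(alpha)  # significance level 0.01 by default
    best = None
    for i in range(len(spike_times)):
        for j in range(i + min_spikes - 1, len(spike_times)):
            window = spike_times[j] - spike_times[i]
            if window <= 0:
                continue
            si = surprise_index(j - i + 1, window, rate)
            if si > threshold and (best is None or si > best[2]):
                best = (spike_times[i], spike_times[j], si)
    return best

# Hypothetical 2-s trial: sparse background firing with a tight cluster near 1 s.
example_trial = [0.1, 0.5, 1.00, 1.01, 1.02, 1.03, 1.04, 1.6]
burst = find_burst(example_trial, trial_duration=2.0)
```

The exhaustive window search is quadratic in the number of spikes, which is acceptable for single-trial spike trains; the published method instead grows and trims a burst around the initial threshold crossing.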
For ANOVA of firing activity in task intervals, the intervals were defined as follows. The baseline period was the interval between the acquisition of the fixation point and the cue appearance. The cue period was the interval that the cue was visible, and the memory period was the interval between the cue disappearance and the fixation point disappearance (the signal for the monkey to make the saccade). The saccade period was the 200-ms interval preceding the acquisition of the target, and the postsaccadic period was the interval from the target acquisition until the delivery of reward. All intervals were defined by these same events in both the memory and object-based tasks. The duration of the postsaccadic interval was 500 ms in the memory task, and 250 ms in the object-based saccade task.
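As an illustration of the interval definitions above, the following sketch maps trial events to analysis windows and computes a firing rate within one of them. It is a minimal example under stated assumptions: the event names, the millisecond timestamps, and the helper functions are hypothetical and are not taken from the original analysis code.

```python
SACCADE_WINDOW_MS = 200  # from the text: the 200-ms interval before target acquisition

def task_intervals(events):
    """Map hypothetical event timestamps (ms) to (start, end) analysis windows."""
    return {
        "baseline": (events["fixation_acquired"], events["cue_on"]),
        "cue": (events["cue_on"], events["cue_off"]),
        "memory": (events["cue_off"], events["fixation_off"]),
        "saccade": (events["target_acquired"] - SACCADE_WINDOW_MS,
                    events["target_acquired"]),
        "postsaccadic": (events["target_acquired"], events["reward"]),
    }

def firing_rate_hz(spike_times_ms, window):
    """Mean firing rate (spikes/s) inside a half-open [start, end) window."""
    start, end = window
    n_spikes = sum(1 for t in spike_times_ms if start <= t < end)
    return n_spikes / ((end - start) / 1000.0)

# Hypothetical memory-task trial: reward arrives 500 ms after target acquisition.
memory_trial = {
    "fixation_acquired": 0, "cue_on": 250, "cue_off": 500,
    "fixation_off": 1500, "target_acquired": 2000, "reward": 2500,
}
windows = task_intervals(memory_trial)
rate = firing_rate_hz([2100, 2200, 2300, 2600], windows["postsaccadic"])
```

Because every window is defined by the same events in both tasks, only the event timestamps differ between a memory trial and an object-based trial.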
Electrical stimulation
A BAK instruments stimulator was used to deliver biphasic currents at 330 Hz of typically <200 µA in 100- to 500-ms trains through the recording electrodes.
Electromyography
Electromyographic (EMG) recordings were performed in one monkey with a World Precision Instruments (DAM 80) AC/DC amplifier, and paired hook-wire electrodes (44 ga x 100 mm) from Viasys healthcare.
MR imaging
Magnetic resonance (MR) imaging was performed at the Caltech Brain Imaging Center on a 3 T Siemens Trio. Anatomical images were acquired sagittally with 0.7 mm slice thickness using an in-plane field of view of 168 x 168 mm on a 256 x 256 base matrix, yielding a final native voxel resolution of 0.656 x 0.656 x 0.7 mm. These images were realigned via multi-planar reformat to recording chamber landmarks using Siemens Syngo software (version MR 2003T DHHS). This rotated volume was resliced at 0.7 mm spacing along the z axis of the chamber and visualized using the AFNI software package (Cox 1996).
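The stated in-plane voxel size follows directly from the field of view and the base matrix; the short calculation below is included only as a check on the quoted figures, with the third dimension given by the slice thickness.

```latex
\[
\frac{168\ \text{mm}}{256} = 0.65625\ \text{mm} \approx 0.656\ \text{mm},
\qquad
\text{voxel} \approx 0.656 \times 0.656 \times 0.7\ \text{mm}.
\]
```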
|
|
RESULTS |
|---|
|
|
|
The sites of all of the electrode penetrations included in this study are superimposed on axial MRI scans in Fig. 2 (A and B). While recordings were taken on the surface of cortex, MRI sections for anatomical localization were chosen at a depth appropriate to clearly show the locations of the penetrations relative to surrounding sulci.
|
While much of SMA is in F3 on the mesial surface, there is also a portion of F3 on the dorsal surface, within 3 mm of the midline, that is also considered SMA proper (Luppino et al. 1991; Matsuzaka et al. 1992). The neurons of interest in this report, indicated in red on the axial slices in Fig. 2, mostly cluster within this distance to the left (monkey's right) of the midline for monkey S and to the right (monkey's left) of the midline for monkey R. No recordings were performed in SEF. In both monkeys, some of the recordings were in area F2, lateral to SMA proper (Luppino et al. 1991). In monkey S, the majority of the recordings were directly medial to the genu of the arcuate sulcus, whereas in monkey R, the recordings were medial and somewhat posterior. The SEF is medial to the arcuate sulcus and somewhat anterior, although there is some variability in the precise location of SEF as described in previous studies. See Sommer and Tehovnik (1999) for review.
Microstimulation
Electrical stimulation experiments show the progression of body movement responses typical of the SMA (Mitz and Wise 1987). Stimulation did not elicit eye movements in either monkey at 50 µA, which is the upper limit of the low-threshold criterion for eliciting eye movements in the SEF (Russo and Bruce 1993), or even at currents as high as 200 µA. We therefore conclude that the recordings were not in the oculomotor area SEF.
Population characteristics
The average spike activity recorded during memory saccades from all sites for each monkey is summarized in Fig. 2, C and D. The average firing rate aligned to the target acquire event (end of saccade) is shown. The activity from monkey R (Fig. 2D) is exclusively postsaccadic. In monkey S, saccadic and memory period activity was also observed (Fig. 2C); this could be due to the more anterior placement of the chamber. In monkey S, the dominant peak of activity still occurs after a delay following the target acquire event.
Although there were many neurons with modulated activity during different epochs of the task (see Table 2), very few were spatially tuned. A two-way ANOVA between baseline firing rates and 1) the firing rates from different task intervals and 2) the spatial locations of the targets was used to confirm this observation. A very small number of neurons passed the significance test (P < 10⁻³) for dependence of firing rate on task interval and target location (cue: 1; memory: 7; saccade: 6).
Shift in burst onset times
The postsaccadic burst in both trial types (Fig. 3, A and B: memory; C and D: object) for one of these neurons is illustrated with raster plots of spike traces aligned to the target acquire event (A and C) and the reward delivery (B and D). Bursts of activity identified with the burst detection algorithm (METHODS) are shown as horizontal blue lines beneath the spike trains. The bursts in the object task (C) begin at a time that could be related to saccade generation. However, the bursts of activity in the memory task (A) come substantially later, revealing that these bursts do not participate in the generation of a saccade, or at least not in the context of the memory saccade task. For this neuron, the postsaccadic firing terminates with reward delivery. Other neurons (see Fig. 5 for example) were also observed to terminate just before or soon after reward delivery.
|
|
|
30% of the trials in both tasks (n = 30) is plotted as a histogram in Fig. 4B. In general, the bursting activity came later, relative to the target acquire event, in the memory task compared with the object-based task. Neurons in this category showed a mean shift in the onset time of the burst of 202 ms. This number is comparable to, though slightly less than, the amount of the time the animal was required to fixate the remembered target location in the memory task before the reappearance of the target (250 ms). The onset time of the burst corresponded to the appearance of the visual feedback, which was immediate in the object-based task but delayed in the memory-guided saccade task. In both cases, the visual feedback could serve as a predictor of a reward. The hypothesis that bursting activity reflects an expectation of reward was then tested in a series of control experiments outlined in the following text.
Bursting does not accompany nonrewarded target acquisitions
It could be argued that the neurons simply signal the acquisition of any target, regardless of the expectation of reward. We tested this possibility by comparing the activity of the neurons after the initial acquisition of the fixation point with the activity after the reappearance of the target in the memory-guided saccade task. We used ANOVA on two intervals: the first interval was between the fixation acquire event and the appearance of the cue (250 ms), and the second interval was between the target onset and the reward delivery (250 ms). This analysis reveals that of the 50 neurons with a significant postsaccadic modulation, none were significantly active during the initial acquisition of the fixation point.
Bursting is not a visual response
Control trials of the memory-guided saccade task in which the visual feedback was withheld were run to test whether or not the bursting activity is related to the visual feedback signal. As shown in Fig. 3 and again in Fig. 5, removal of the visual feedback (indicated in the figures with the green bar composed of green stars) does not eliminate the onset of the bursting activity, though it may reduce the intensity, or vary the onset time. Figure 5A shows an example in which the postsaccadic bursting activity was slightly extended by this control, although otherwise unchanged.
The bursting signal is therefore not indicating the reappearance of the target, although the visual reinforcement serves to sharpen and intensify the neural discharge. This control was run on 34 neurons, and 12 of them showed no significant difference in the mean firing rate from the time the target appeared (or should have appeared) until the end of the trial (ANOVA, P < 10⁻⁵) in control versus normal trials. Of the remaining neurons, many exhibited a temporal shift in their active periods, or a decrease in firing, but only one showed an extinction of the bursting activity. This control shows that visual feedback could be dissociated from the reward delivery, and the neural response remained.
Bursting properties in the absence of reward
To see whether, all else being equal, the absence of reward would affect the neural activity, we occasionally withheld the reward for a block of trials during the memory-guided saccade task, even for correctly performed trials. In the no-reward blocks, the monkeys generally continued to correctly perform the task for 30 trials before stopping, and this comprehension of changing task conditions was reflected in the recorded neural activity. After a few trials, bursting activity would stop altogether. This control was run while recording 11 neurons, and all of these showed a significant difference in the mean firing rate from the target acquire event until the end of the trial (ANOVA, P < 10⁻⁵) in control versus normal trials. The vast majority (10) ceased firing during the prereward interval in the no-reward blocks, and the remaining neuron (of the 11 that were modulated) increased its firing rate after the reward period. An example neuron is shown in Fig. 5B. The firing activity is gradually extinguished in the no-reward block (black bar). In contrast, the activity during the no-visual-feedback trials (green bar) has a less precise onset time but does not extinguish. While the no-visual-feedback trials show the effect of removing a predictor of reward, the no-reward blocks reveal the dynamic effects of the monkeys coming to understand that they should no longer expect a reward.
Unexpected reward trials
By removing the reward, the possibility that the bursting activity encoded an orofacial (e.g., licking) motor response was not eliminated. Every time the monkey expected a reward, he would presumably also plan a licking movement to acquire it. The monkey would often stop licking the juice tube in the blocks of trials in which the reward was turned off, and this would correspond to the termination of the bursting activity. Of course, if the monkey no longer expected to be rewarded for the eye movements, he also had no reason to lick the juice tube.
To dissociate licking movements from the expectation of reward, control trials were run in which an unexpected reward was delivered. A bonus reward would be delivered with a 5% probability at the end of the fixation interval, just before the cue presentation. To quantify a response, an ANOVA (P < 10⁻⁵) compared firing rates in the 200-ms interval during the bonus reward delivery with the corresponding 200-ms period at the end of the fixation interval in normal trials. This first interval is the actual interval during which the valve regulating the flow of reward was open. While running this control, 25 neurons with reward-related activity were recorded, and 23 of these demonstrated no correlated activity in the unexpected juice-delivery period. This control shows that the majority of the recorded neurons are not responsive to rewards when they are not expected, ruling out the possibility that the neural activity is attributable to motor commands required to obtain the reward, such as licking and swallowing.
To address the possibility that the neural activity reflected the monkey's postural responses, attempted postural responses, or preparation for either, we recorded muscle activity in three muscles active during postural adjustments. We recorded EMG (see METHODS) from left and right Latissimus dorsi and right Semitendinosus of monkey S during the performance of both tasks. We observed that these muscles were active during trunk movements and leg movements. While there was activity recorded from these muscle groups during the execution of the task, we found that it was not temporally locked to reward expectation. These negative EMG results rule out the possibility that the monkey is consistently making postural adjustments in anticipation of the reward delivery.
|
|
DISCUSSION |
|---|
|
Onset of reward-expectancy signal corresponds to a secondary reinforcer
The activity generally started with the secondary reinforcer and stopped with the delivery of the reward. The secondary reinforcer in this context was visual feedback that occurred before reward delivery. In the memory task, the target reappears after 250 ms of fixation on the remembered target location. This visual feedback helped ensure accuracy in the initial learning of the task but also became a predictor of the upcoming reward. In the object task, the saccade target is visible, and so the monkey can be sure that he made a saccade to the target because he can see it. The onset time of the reward expectancy signal corresponds to the onset of the visual feedback in the tasks, either 250 ms after the correct saccade in the memory task or immediately during the correct saccade in the object task.
As shown in control experiments, although the secondary reinforcer helped to synchronize the timing of the bursting activity, it was not necessary for the neurons to burst. This and other controls discussed in the following text confirm that the bursting discharge carries a reward expectancy signal.
Control experiments establish the reward expectancy interpretation
In a series of control experiments, the argument was built that this activity reflects an expectation of reward. First, the bursting activity was dissociated from visual feedback with the demonstration that visual feedback is not required for the neurons to discharge, although it regularizes the timing of the onset. Second, when the reward was removed for a block of trials, the reward expectancy activity gradually disappeared, showing that this activity represents a dynamic variable corresponding to the comprehension of a changed task condition. Finally, the possibility that the neural trace signified a licking plan or a detection of reward was ruled out because there was generally no response to unexpected reward delivery.
Reward-related activity in the SEFs
Reward-related neural signals have already been described in the SEF (Amador et al. 2000; Roesch and Olson 2003; Stuphorn et al. 2000). We found a reward expectancy signal in the SMA that appears very similar to types of activity found by Amador and colleagues and Stuphorn and colleagues.
Reward expectancy and reward prediction
Reward prediction (RP) neurons have been described in SEF along with a set of complementary reward detecting (RD) neurons (Amador et al. 2000). The neural activity we are describing as reflecting reward expectation is similar to the RP neural activity. We did not find evidence of RD activity.
The firing rates of RP neurons increase before the occurrence of a reward, then abruptly cease firing at reward delivery, just as we found in the bursting activity of many of the neurons in this study. Our results, combined with the results of Amador and colleagues, are therefore evidence that reward expectation can be found in both SMA and SEF and likely throughout the DMFC. Our study adds to the findings of RP neurons by recording neural responses during the unexpected delivery of reward and submitting the monkey to short blocks of no-reward trials.
We choose to use the term reward expectancy because "expectancy" captures the way the neural activity continues until reward delivery. Furthermore, this designation separates itself from reward prediction nomenclature found in the dopamine neuron literature. To predict is to foretell on the basis of experience, whereas to expect is to await or look forward to the coming or occurrence. The reward prediction signal found in midbrain dopamine neurons and the reward expectancy signal in the DMFC likely play different roles in learning and goal-oriented behavior (see following text).
Reward expectancy and reinforcement
Reinforcement signals have been found in SEF using a countermanding saccade task (Stuphorn et al. 2000). The reinforcement neurons were shown to increase activation while awaiting reward. The term reward expectancy describes the function of this activity. Again, the results of this study are evidence that the reward expectation signal found in the SMA is also present in the SEF.
Reward expectancy versus enhanced motivation
The postsaccadic burst cannot be a correlate of motivation (Roesch and Olson 2003, 2004) simply because it comes after the behavior it would presumably motivate. Our use of the burst detection algorithm (Hanes et al. 1995; Thompson et al. 1996) establishes that this bursting activity comes in the postsaccadic interval.
In the Roesch and Olson study, the preferred direction of a neuron was first identified, and then a memory-guided saccade task was run to and away from the preferred direction of the cell. Because we rarely found neurons to be spatially tuned, we may have been recording from different types of neurons in the SMA.
Interestingly, the authors noted that in areas in which reward effects were common, such as the SMAr, neurons "fired more strongly than reward-insensitive neurons during the period extending from the completion of the saccade to delivery of the ingested reward." The authors did not think their paradigm capable of distinguishing between various interpretations of the significance of this effect, such as preparation and execution of licking movements, or increased intensity of reward anticipation. In the present study, our control experiments show that the postsaccadic activity in SMA reflects reward expectation.
Reward expectancy versus attention
The expected value of a reward has been shown to modulate activity during the performance of a task (Ikeda and Hikosaka 2003; Platt and Glimcher 1999; Shidara and Richmond 2002; Watanabe 1996). It has been argued that these studies have not yet included sufficient controls to determine whether the cognitive state being manipulated is expected value or attention, because these two states likely occur together and can be easily confounded. Likewise, studies examining attention may have recorded the effects of expected value (Maunsell 2004).
In the current study, it is unlikely that the reward expectancy signal is actually an attention signal. The signal occurs after the task and is not spatially tuned and thus cannot reflect attention to the saccade location. It also cannot reflect attention to the reward because there was no activity when the reward was presented unexpectedly, even though novelty is a powerful attractor of attention.
Reward signals and reinforcement learning algorithms
A reinforcement learning algorithm (Sutton and Barto 1988) has been proposed to account for different reward-related signals that have been found in the brain, such as the error of reward prediction in midbrain dopamine neurons (Schultz et al. 1997). The prediction error signal is widely recognized as evidence for the implementation of a reinforcement learning algorithm in the brain. For example, the prediction error can serve to update action value estimates, so that the animal can have accurate estimates of the reward that can be expected for an action. In this formalism, the expected value, V, is updated after every trial according to the experienced reward by the equation
$$V_{t+1} = V_t + \alpha\,(R_t - V_t)$$
where $V_t$ is the current estimate of expected value, $R_t$ is the reward experienced on trial $t$, $\alpha$ is the learning rate, and $V_{t+1}$ is the updated estimate of expected value at time $t+1$. In this formulation, the time steps are individual trials, and the signal that corresponds to the error of reward prediction found in dopamine neurons (Schultz et al. 1997) is the difference term $(R_t - V_t)$.
The dynamics of the reward expectancy signal in SMA correspond to the dynamics of the expected value of the action, V. Specifically, this algorithm captures the way the postsaccadic firing activity in the SMA gradually dissipates in no-reward blocks. When the reward, R, is zero for a series of trials, the preceding equation will diminish the expected value of the action, V, until it reaches the new value of the reward, 0. The learning rate parameter, $\alpha$, determines how quickly the estimate of the expected value approaches the new value. This equation also describes how the neural activity will return to normal firing when the reward is again delivered as usual.
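The extinction and recovery dynamics described above can be illustrated with a short simulation. This is a minimal sketch, assuming the standard delta-rule form in which the expected value moves toward the experienced reward by a fraction (the learning rate) of the prediction error on each trial. The learning rate of 0.3, the block lengths, and the coding of reward as 0 or 1 are illustrative assumptions, not values fitted to the recorded neurons.

```python
def update_value(value, reward, learning_rate):
    """One trial of the delta rule: V <- V + alpha * (R - V)."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error

def simulate_value(rewards, initial_value=1.0, learning_rate=0.3):
    """Return the expected value before each trial plus the final estimate."""
    values = [initial_value]
    for reward in rewards:
        values.append(update_value(values[-1], reward, learning_rate))
    return values

# Hypothetical session: rewarded block, no-reward block, rewarded block.
rewards = [1.0] * 20 + [0.0] * 30 + [1.0] * 20
values = simulate_value(rewards)

value_after_rewarded_block = values[20]   # still at the asymptote
value_after_no_reward_block = values[50]  # decays geometrically toward 0
value_after_recovery = values[70]         # climbs back toward 1
```

With a larger learning rate the estimate collapses within a few unrewarded trials, and with a smaller one it decays over tens of trials, so the parameter sets how quickly the modeled expectancy follows a change in task conditions.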
Functional significance of reward expectancy in DMFC: a signal to guide learning
As in neural network models of reinforcement learning (Mazzoni et al. 1991; Suri and Schultz 1999), the reward signal found in DMFC could be used to train other parts of cortex to perform visuospatial tasks requiring arbitrary sensorimotor transformations. The reward expectancy signal found in the SEF (Amador et al. 2000; Stuphorn et al. 2000) is in position to shape future oculomotor behavior through its connections with the frontal eye fields (FEF) (Schall et al. 1993) and the superior colliculus (SC) (Fries 1984).
A reward-expectancy signal is better than the detection of the reward itself for training purposes for two reasons. First, the reward-expectancy signal implies that there is an internal model with an expected sensory outcome for a behavior, in this case, the secondary reinforcer. This model can be matched with a reward signal and refined as often as rewards are delivered or unexpectedly withheld. Second, the reward expectancy signal comes at a time that is more proximal to the behaviors which earned the reward, and thus may be able to reinforce the high level motor signals in DMFC related to those behaviors.
Usefulness of reward expectancy in SMA during an eye-movement task
A reward-expectancy signal present in DMFC could maintain and enhance the high-level representations of behaviors that earn a reward. But why would this activity be present in the SMA, which is concerned with movements of the body and limbs, during an eye-movement task? It is possible that the reward-expectancy signal is maintained throughout DMFC so that it can enhance any volitional motor acts that precede reward. For instance, in hand-eye coordination tasks, the reward signal can reinforce activity in the limb area of SMA and in the SEF together. Other areas of SMA, in which motor-map activation and the reward signal do not converge, would not be reinforced and would not produce learning (Sutton and Barto 1988). Thus the expectation signal may be broadcast more widely than the motor activations of a particular behavior. This broader signal may serve in learning new coordinations of different body parts for particular tasks.
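The convergence argument can be sketched as follows, assuming a simple multiplicative rule in which a weight changes only when local motor activity coincides with a globally broadcast expectancy signal. This is an illustrative assumption, not a mechanism demonstrated by the present data; the function name `reinforce`, the three hypothetical units, and all numerical values are invented for the example.

```python
def reinforce(weights, motor_activity, expectancy_signal, learning_rate=0.1):
    """Update each weight by rate * broadcast signal * local motor activity."""
    return [
        w + learning_rate * expectancy_signal * a
        for w, a in zip(weights, motor_activity)
    ]


# Hypothetical units: [eye region, limb region, inactive body region].
weights = [0.5, 0.5, 0.5]
motor_activity = [1.0, 1.0, 0.0]  # eye and limb regions active together

# The same expectancy signal reaches every unit.
updated = reinforce(weights, motor_activity, expectancy_signal=1.0)
```

In this sketch the two co-active units are strengthened together, whereas the inactive unit is unchanged even though it receives the same broadcast signal.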
|
|
GRANTS |
|---|
|
|
|
FOOTNOTES |
|---|
Address for reprint requests and other correspondence: M. Campos, Computation and Neural Systems, California Institute of Technology, MC 216-76, Pasadena, CA 91125 (E-mail: mcampos{at}caltech.edu)
|
|
REFERENCES |
|---|
|
Barraclough DJ, Conroy ML, and Lee D. Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7: 404-410, 2004.
Campos M, Breznen B, and Andersen RA. Reward expectancy in dorsomedial frontal cortex of the macaque monkey. Soc Neurosci Abstr 187.3, 2003.
Chen L and Wise S. Neuronal activity in the supplementary eye field during acquisition of conditional oculomotor associations. J Neurophysiol 73: 1101-1121, 1995.
Cox RW. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res 29: 162-173, 1996.
Cromwell HC and Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 89: 2823-2838, 2003.
Fries W. Cortical projections to the superior colliculus in the macaque monkey: a retrograde study using horseradish peroxidase. J Comp Neurol 230: 55-76, 1984.
Fujii N, Mushiake H, Tamai M, and Tanji J. Microstimulation of the supplementary eye field during saccade preparation. Neuroreport 6: 2565-2568, 1995.
Fujii N, Mushiake H, and Tanji J. Distribution of eye- and arm-movement-related neuronal activity in the SEF and in the SMA and pre-SMA of monkeys. J Neurophysiol 87: 2158-2166, 2002.
Hanes D, Thompson K, and Schall J. Relationship of presaccadic activity in frontal eye field and supplementary eye field to saccade initiation in macaque: Poisson spike train analysis. Exp Brain Res 103: 85-96, 1995.
Hassani OK, Cromwell HC, and Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J Neurophysiol 85: 2477-2489, 2001.
Hikosaka K and Watanabe M. Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cereb Cortex 10: 263-271, 2000.
Ikeda T and Hikosaka O. Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39: 693-700, 2003.
Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, and Hikosaka O. Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. J Neurophysiol 87: 1488-1498, 2002.
Luppino G, Matelli M, Camarda R, and Rizzolatti G. Corticocortical connections of area F3 (SMA-proper) and area F6 (pre-SMA) in the macaque monkey. J Comp Neurol 338: 114-140, 1993.
Luppino G, Matelli M, Camarda R, Gallese V, and Rizzolatti G. Multiple representations of body movements in mesial area 6 and the adjacent cingulate cortex: an intracortical microstimulation study in the macaque monkey. J Comp Neurol 311: 463-482, 1991.
Mann S, Thau R, and Schiller P. Conditional task-related responses in monkey dorsomedial frontal cortex. Exp Brain Res 69: 460-468, 1988.
Matsumoto K, Suzuki W, and Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301: 229-232, 2003.
Matsuzaka Y, Aizawa H, and Tanji J. A motor area rostral to the supplementary motor area (presupplementary motor area) in the monkey: neuronal activity during a learned motor task. J Neurophysiol 68: 653-662, 1992.
Matsuzaka Y and Tanji J. Changing directions of forthcoming arm movements: neuronal activity in the presupplementary and supplementary motor area of monkey cerebral cortex. J Neurophysiol 76: 2327-2342, 1996.
Maunsell JHR. Neuronal representations of cognitive state: reward or attention? Trends Cognit Sci 8: 261-265, 2004.
Mazzoni P, Andersen RA, and Jordan MI. A more biologically plausible learning rule than backpropagation applied to a network model of cortical area 7a. Cereb Cortex 1: 293-307, 1991.
Mitz A and Wise S. The somatotopic organization of the supplementary motor area: intracortical microstimulation mapping. J Neurosci 7: 1010-1021, 1987.
Musallam S, Corneil BD, Greger B, Scherberger H, and Andersen RA. Cognitive control signals for neural prosthetics. Science 305: 258-262, 2004.
Nakamura K, Sakai K, and Hikosaka O. Neuronal activity in medial frontal cortex during learning of sequential procedures. J Neurophysiol 80: 2671-2687, 1998.
Parthasarathy H, Schall J, and Graybiel A. Distributed but convergent ordering of corticostriatal projections: analysis of the frontal eye field and the supplementary eye field in the macaque monkey. J Neurosci 12: 4468-4488, 1992.
Platt ML and Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature 400: 233-238, 1999.
Roesch MR and Olson CR. Impact of expected reward on neuronal activity in prefrontal cortex, frontal and supplementary eye fields and premotor cortex. J Neurophysiol 90: 1766-1789, 2003.
Roesch MR and Olson CR. Neuronal activity related to reward value and motivation in primate frontal cortex. Science 304: 307-310, 2004.
Russo G and Bruce C. Effect of eye position within the orbit on electrically elicited saccadic eye movements: a comparison of the macaque monkey's frontal and supplementary eye fields. J Neurophysiol 69: 800-818, 1993.
Satoh T, Nakai S, Sato T, and Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci 23: 9913-9923, 2003.
Schall JD, Morel A, and Kaas JH. Topography of supplementary eye field afferents to frontal eye field in macaque: implications for mapping between saccade coordinate systems. Vis Neurosci 10: 385-393, 1993.
Schlag J and Schlag-Rey M. Unit activity related to spontaneous saccades in frontal dorsomedial cortex of monkey. Exp Brain Res 58: 208-211, 1985.
Schlag J and Schlag-Rey M. Evidence for a supplementary eye field. J Neurophysiol 57: 179-200, 1987.
Schultz W, Dayan P, and Montague PR. A neural substrate of prediction and reward. Science 275: 1593-1599, 1997.
Shidara M and Richmond BJ. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296: 1709-1711, 2002.
Shima K and Tanji J. Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. J Neurophysiol 84: 2148-2160, 2000.
Sommer MA and Tehovnik EJ. Reversible inactivation of macaque dorsomedial frontal cortex: effects on saccades and fixations. Exp Brain Res 124: 429-446, 1999.
Stuphorn V, Taylor T, and Schall J. Performance monitoring by the supplementary eye field. Nature 408: 857-860, 2000.
Sugrue LP, Corrado GS, and Newsome WT. Matching behavior and the representation of value in the parietal cortex. Science 304: 1782-1787, 2004.
Suri RE and Schultz W. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91: 871-890, 1999.
Sutton RS and Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1988.
Tehovnik E and Sommer M. Compensatory saccades made to remembered targets following orbital displacement by electrically stimulating the dorsomedial frontal cortex or frontal eye fields of primates. Brain Res 727: 221-224, 1996.
Thompson K, Hanes D, Bichot N, and Schall J. Perceptual and motor processing stages identified in the activity of macaque frontal eye field neurons during visual search. J Neurophysiol 76: 4040-4055, 1996.
Tremblay L, Hollerman JR, and Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate striatum. J Neurophysiol 80: 964-977, 1998.
Tremblay L and Schultz W. Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. J Neurophysiol 83: 1864-1876, 2000.
Watanabe K, Lauwereyns J, and Hikosaka O. Neural correlates of rewarded and unrewarded eye movements in the primate caudate nucleus. J Neurosci 23: 10052-10057, 2003.
Watanabe M. Reward expectancy in primate prefrontal neurons. Nature 382: 629-632, 1996.