1Neuroscience Laboratory, Institute for Medical Sciences, Ajou University School of Medicine, Suwon, Korea; and 2Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut
Submitted 20 March 2007; accepted in final form 14 October 2007
ABSTRACT |
|---|
|
|
|
INTRODUCTION |
|---|
|
In contrast, the mechanisms responsible for updating the value functions based on the reward prediction errors are not well understood. Clearly, this updating mechanism has to integrate multiple types of signals, such as value functions and reward prediction errors. In addition, the process of reinforcement learning would be greatly facilitated if memory signals related to the animal's recent actions were also available in the same anatomical structure involved in updating the value functions, because a reward or penalty resulting from a particular action is often revealed only after a substantial temporal delay. This problem, referred to as temporal credit assignment, is not trivial for several reasons. First, information about the consequences of a particular action is necessarily delayed in the brain, compared with when the corresponding motor command is issued. Second, a substantial delay can also result from the complex physical properties of the animal's environment. The problem of temporal credit assignment is particularly challenging when the long-term consequence of an action needs to be assessed. Therefore, to update the value function of a chosen action correctly after a temporal delay, memory for previously executed actions must be available in the brain structures that are involved in updating value functions.
Several lines of anatomical and physiological evidence suggest that the striatum might play a key role in the process of updating value functions. First, the convergence of cortical inputs and dopaminergic projections to the striatum provides the anatomical substrate necessary for integrating reward prediction errors and value functions. Inputs from cortical areas related to planning and execution of motor responses, such as the prefrontal cortex, may provide the signals related to the actions chosen by the animal and their value functions (Baeg et al. 2003; Barraclough et al. 2004; Lee et al. 2007; Leon and Shadlen 1999; Watanabe 1996), whereas dopamine neurons provide reward prediction errors (Schultz 1998). Second, striatal neurons modulate their activity according to a variety of factors related to the value functions and actions chosen by the animal, suggesting that the striatum plays an important role in action selection (Cromwell and Schultz 2003; Kawagoe et al. 1998; Nicola et al. 2004; Samejima et al. 2005). However, it is not known whether neurons in the striatum encode signals related to the animal's previous actions.
In the present study, we investigated the time course of signals related to the rat's goal selection behavior in the ventral striatum (VS) during a visual discrimination task. Our task did not allow the animal to make its choices freely, and hence it could not be determined whether striatal signals related to the selected goals reflect the animal's choice or simply the movement direction. Nevertheless, in this paper, we refer to the direction of the animal's movement toward a goal as "choice" for the sake of brevity. Rats were rewarded with a constant amount of water for visiting the lit side of a figure-8-shaped maze. This made it possible to distinguish choice-related signals from the signals related to the motivational significance of the animal's behavior because correct actions were always rewarded with the same reward. The results show that many VS neurons modulate their activity according to the animal's goal choice in the previous trial, suggesting that neural signals related to previous actions, which are necessary for updating value functions, exist in the VS.
|
|
METHODS |
|---|
|
The experimental protocol was approved by the Ethics Review Committee for Animal Experimentation of the Ajou University School of Medicine. Experiments were performed with young male Sprague-Dawley rats (9–11 wk old, 250–330 g, n = 3). Animals were individually housed in the colony room and initially allowed free access to food and water. Once behavioral training began, animals were restricted to 30 min access to water after finishing one behavioral session per day. Experiments were performed in the dark phase of a 12-h light/dark cycle.
Behavioral task
The animals were trained to perform a visual discrimination task. This was an imperative (forced-choice) task in which the animals were rewarded with water (0.05 ml) only for visiting the side of a figure-8-shaped maze indicated by the visual cue. The overall dimension of the maze was 90 × 50 cm, and the width of the track was 9–13 cm. It was elevated 40 cm from the floor with 5-cm-high walls along the entire track. The visual cue was delivered by one of the two green light-emitting diodes (diameter: 5 mm) that were located above the upper left and upper right arms of the maze (8 cm above the maze floor and 3 cm lateral to the midline; Fig. 1A). The sequence of light signals (left vs. right) was chosen randomly across trials. After visiting one of the goal locations, the animal was required to return to the starting location at the center of the maze by completing the remaining track (Fig. 1A). When the animal entered the central section of the maze, one of the visual cues was turned on and the next trial began immediately. The visual cue was extinguished 2 s after its onset, or when the animal exited the central section of the maze, whichever came first. The animals performed the task for 30–40 trials per day. Presentation of a visual stimulus and the delivery of water were triggered by infrared light beam sensors along the maze.
|
A microdrive array (Neuro-hyperdrive, Kopf Instruments, Tujunga, CA) loaded with 12 tetrodes was implanted in the left or right VS (1.9 mm A, 1.0 mm L, 6.5–8.0 mm V from bregma) under deep anesthesia with sodium pentobarbital (50 mg/kg body wt). Tetrodes were fabricated by twisting together four strands of polyimide-insulated nichrome wire (H. P. Reid, Palm Coast, FL) and gently heating them to fuse the insulation without short-circuiting the wires (final overall diameter: 40 µm). The electrode tips were cut and gold-plated to reduce impedance to 0.3–0.6 MΩ measured at 1 kHz. After 7–10 days of recovery from surgery, tetrodes were gradually advanced toward the VS (maximum 320 µm/day). Once the tetrodes entered the intended recording region, they were advanced only 20–40 µm/day. When new unit signals that were different from those recorded on the previous day were obtained, recordings were made without advancing the tetrode. The identity of unit signals was determined based on the clustering pattern of spike waveform parameters (Fig. 2), averaged spike waveforms, baseline discharge frequencies, autocorrelograms, and interspike interval histograms (Baeg et al. 2007). Unit signals were collected via a headstage of complementary metal oxide semiconductor (CMOS) operational amplifiers (Neuralynx, Bozeman, MT), amplified with a gain of 5,000–10,000, band-pass-filtered between 0.6 and 6 kHz, digitized at 32 kHz, and stored on a SUN4u workstation using the Cheetah data acquisition system (Neuralynx). Single units were isolated by examining various two-dimensional projections of the relative amplitude data in all channels of each tetrode (Fig. 2A), and manually applying boundaries to each subjectively identified unit cluster using custom software (Xclust, M. Wilson). Spike width was also used as an additional feature of spike waveforms for unit isolation. Only those clusters that were clearly separable from each other and from background noise throughout the recording session were included in the analysis. The head position of the animals was also recorded at 60 Hz by tracking an array of light-emitting diodes mounted on the headstage. Unit signals were recorded with the animals placed on a pedestal (resting period) for 10 min before and after experimental sessions to examine the stability of recorded unit signals. Unstable units were excluded from the analysis. When recordings were completed, small marker lesions were made by passing an electrolytic current (50 µA, 30 s, cathodal) through one channel of each tetrode, and recording locations were verified histologically as previously described (Baeg et al. 2001).
|
UNIT CLASSIFICATION.
We separated recorded units, using a nonhierarchical k-means clustering algorithm (SPSS 10.0), into two groups based on average firing rate, spike duration, and the ratio between the peak and valley amplitudes of a filtered spike waveform (Fig. 2B). Of 523 units that were subject to analysis (≥0.1 Hz), the type 1 neurons (n = 483) had low firing rates (2.5 ± 0.1 and 2.6 ± 0.1 Hz during resting and running periods, respectively), long spike durations (257.6 ± 4.5 µs), and high peak-to-valley ratios (1.57 ± 0.01). In contrast, the type 2 neurons (n = 40) had relatively high firing rates (26.3 ± 2.1 and 27.9 ± 1.9 Hz during resting and running periods, respectively), short spike durations (201.6 ± 13.7 µs), and low peak-to-valley ratios (1.39 ± 0.07). The type 1 and 2 neurons correspond to putative medium spiny neurons (MSNs) and local interneurons, respectively (Wilson 2004). Although both types of neurons were included in the analyses, essentially the same results were obtained when putative local interneurons were excluded from the analyses (data not shown).
BEHAVIORAL STAGES. To determine the time course of neural signals related to different variables manipulated in this study, we divided each trial into four stages, and the mean spike rate in each stage was analyzed separately. These four stages correspond to response selection, approach to reward, reward consumption, and return to the center of the maze (Fig. 1A). The response selection stage corresponds to the central section of the maze. The reward approach stage spans the period between the end of the response selection stage and the time of the animal's arrival at one of the reward sites. The reward consumption stage was the time period in which the animal stayed in the reward site. The return stage started as the animal departed from a reward site and ended when the animal entered the central section of the maze. Average durations of the response selection, reward approach, reward consumption, and return stages were 0.81 ± 0.12, 1.26 ± 0.04, 4.76 ± 0.37, and 4.78 ± 0.59 s, respectively. Each trial consisted of all four behavioral stages beginning with the response selection stage.
ANALYSIS OF CHOICE-RELATED ACTIVITY.
To test how the neural signals related to the animal's choice changed across different stages, we applied a multiple linear regression analysis in which the mean firing rate (St) of a neuron during a particular behavioral stage of trial t was given by a linear function of the animal's behavioral choice in the same trial (Ct) and the three previous trials (Ct–1, Ct–2, and Ct–3), as follows

$$S_t = a_0 + a_1 C_t + a_2 C_{t-1} + a_3 C_{t-2} + a_4 C_{t-3} + \varepsilon_t \qquad (1)$$

where a0–a4 are the regression coefficients and εt represents the error term. This regression analysis was carried out separately for each behavioral stage including all trials and also including only correct trials. Both correct and error trials were included in the subsequent analyses.
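The regression on current and lagged choices can be sketched with ordinary least squares. This is a minimal illustration, not the authors' analysis code: the 0/1 coding of left versus right choices, the function name, and the simulated neuron (whose rate depends only on the previous choice) are assumptions.

```python
import numpy as np

def lagged_choice_regression(rate, choice, n_lags=3):
    """Fit rate_t = a0 + a1*C_t + a2*C_{t-1} + ... + a_{n_lags+1}*C_{t-n_lags} + error.

    Returns the coefficient vector [a0, a1, ...] and the matching t statistics.
    """
    rate = np.asarray(rate, dtype=float)
    choice = np.asarray(choice, dtype=float)
    t = np.arange(n_lags, len(choice))  # trials with a full choice history
    columns = [np.ones(len(t))] + [choice[t - lag] for lag in range(n_lags + 1)]
    X = np.column_stack(columns)
    y = rate[t]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se

# Hypothetical neuron: choices coded 0 (left) or 1 (right) by assumption,
# with firing that depends only on the choice made one trial earlier.
rng = np.random.default_rng(0)
choice = rng.integers(0, 2, size=200)
prev = np.roll(choice, 1)  # the wrapped first element falls in the dropped trials
rate = 2.0 + 3.0 * prev + rng.normal(scale=0.5, size=200)
coef, tstat = lagged_choice_regression(rate, choice)
print(np.round(coef, 2))
```

In this simulation the coefficient on the previous choice (index 2) should dominate, which is the pattern the text reports for the response selection stage.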
We then tested whether the effect of the animal's previous choice on neural activity could merely reflect small differences in the animal's trajectory in the central portion of the maze that were correlated with the animal's previous choice (Euston and McNaughton 2006). We divided the response selection stage of each trial into 12 bins of equal distances, obtained the mean lateral position of the animal's head in each bin, and calculated the coefficient for the first principal component of these values for each trial. To avoid the influence of spurious head movement, we included only those trials in which the mean lateral head position lay within 2 SD from the value averaged across all trials for each bin. Overall, 8.5% of the trials were excluded by this criterion. We then applied the following regression analysis

$$S_t = a_0 + a_1 C_t + a_2 C_{t-1} + a_3 P_t + \varepsilon_t \qquad (2)$$

where Pt is the coefficient for the first principal component of the animal's trajectory in trial t and εt represents the error term. This analysis was applied to the activity during the response selection stage only because we did not find any systematic effect of the animal's previous choice on its movement trajectory for other behavioral stages.
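The trajectory covariate can be sketched as follows: trials are screened with the 2-SD criterion and each remaining trial is projected onto the first principal component of the binned lateral positions. Reading the criterion as "within 2 SD in every bin", the function names, and the synthetic trajectories are assumptions of this sketch.

```python
import numpy as np

def within_2sd(lateral_pos):
    """Boolean mask of trials whose position lies within 2 SD of the
    across-trial mean in every spatial bin (one reading of the criterion)."""
    x = np.asarray(lateral_pos, dtype=float)
    z = np.abs(x - x.mean(axis=0)) / x.std(axis=0)
    return np.all(z <= 2.0, axis=1)

def first_pc_scores(lateral_pos):
    """Per-trial coefficient on the first principal component, plus the
    fraction of total variance that component explains."""
    x = np.asarray(lateral_pos, dtype=float)
    xc = x - x.mean(axis=0)                       # center each bin
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = xc @ vt[0]                           # projection onto PC1
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained

# Synthetic trajectories: 100 trials x 12 bins, a per-trial lateral offset
# that shrinks along the central stem (hypothetical shape).
rng = np.random.default_rng(0)
offset = rng.normal(size=100)
profile = np.linspace(1.0, 0.2, 12)
traj = offset[:, None] * profile[None, :] + rng.normal(scale=0.1, size=(100, 12))
keep = within_2sd(traj)
scores, explained = first_pc_scores(traj[keep])
print(int(keep.sum()), round(float(explained), 3))
```

The resulting per-trial scores would then enter the regression as an additional regressor alongside the choice terms.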
If VS neurons encode the value of reward expected from a particular action (Samejima et al. 2005), their activity might be influenced by the conjunction of the animal's previous choice and its outcome (Barraclough et al. 2004). Therefore the possibility that neural signals for the previous choice actually reflect the action value functions for the left or right choice was tested by applying the following regression analysis that includes the interaction between the animal's choice and reward in the previous trial
$$S_t = a_0 + a_1 C_t + a_2 C_{t-1} + a_3 R_{t-1} + a_4 \left(C_{t-1} \times R_{t-1}\right) + \varepsilon_t \qquad (3)$$

where Rt–1 denotes the reward in the previous trial and εt represents the error term. We also tested the possibility that the signals seemingly related to the animal's previous choice actually encode the previous visual cue, namely, whether the previous cue indicated the leftward or rightward turn. To this end, we applied a multiple regression analysis that includes the visual cue and behavioral choice in the previous trial (Vt–1 and Ct–1, respectively) as independent variables. In other words
$$S_t = a_0 + a_1 V_{t-1} + a_2 C_{t-1} + \varepsilon_t \qquad (4)$$

where εt is the error term.
DISCRIMINANT ANALYSIS. We used a linear discriminant analysis with a leave-one-out cross-validation procedure to examine further how reliably the information about the animal's behavioral choice in a given trial can be inferred from the neuronal ensemble activity during the response selection stage of each trial. In this analysis, a single trial was removed, and a linear discriminant function was generated based on the neuronal ensemble activity in the remaining trials separated according to the animal's choice in the current trial (trial lag = 0) or in each of the three previous trials (trial lag = 1–3). The removed trial was then classified based on this discriminant function. This procedure was repeated for all trials and the percentage of correct classification was calculated for each session. We then used a t-test to determine whether this is significantly different from the chance level (50% correct classification) across the entire sessions. This analysis was applied separately to the neuronal ensemble activity during the first and last 500 ms of the response selection stage as well as the activity during the entire response selection stage.
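The leave-one-out procedure can be sketched with a two-class Fisher discriminant. This is a minimal illustration under stated assumptions (equal class priors, a pooled covariance with a small ridge term added for numerical stability, and synthetic ensemble activity); the paper does not specify these details.

```python
import numpy as np

def loo_lda_accuracy(activity, labels, ridge=1e-6):
    """Fraction of trials classified correctly by leave-one-out LDA.

    activity: (n_trials, n_neurons) firing rates; labels: 0/1 choice per trial.
    """
    x = np.asarray(activity, dtype=float)
    y = np.asarray(labels).astype(int)
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i                 # hold out trial i
        xt, yt = x[train], y[train]
        m0 = xt[yt == 0].mean(axis=0)
        m1 = xt[yt == 1].mean(axis=0)
        d0 = xt[yt == 0] - m0
        d1 = xt[yt == 1] - m1
        # Pooled within-class covariance; the ridge term is an assumption
        # that keeps the matrix invertible for small sessions.
        cov = (d0.T @ d0 + d1.T @ d1) / (len(yt) - 2)
        cov += ridge * np.eye(x.shape[1])
        w = np.linalg.solve(cov, m1 - m0)         # discriminant direction
        threshold = w @ (m0 + m1) / 2.0           # equal-prior boundary
        pred = int(x[i] @ w > threshold)
        correct += int(pred == y[i])
    return correct / n

# Hypothetical session: 40 trials, 5 neurons, two of which carry choice signals.
rng = np.random.default_rng(0)
labels = np.tile([0, 1], 20)
activity = rng.normal(size=(40, 5))
activity[:, :2] += 3.0 * labels[:, None]
acc = loo_lda_accuracy(activity, labels)
print(acc)
```

To decode a previous choice instead of the current one, the same function would be called with the label vector shifted by the desired trial lag.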
We also used a discriminant analysis to test further whether the activity during the response selection stage was related more closely to the visual cue or the animal's choice in the previous trial. In this analysis, the discriminant function was determined based on the activity in all correct trials, and the percentage of trials in which the error trials were classified correctly according to the animal's previous choice was computed. This analysis was performed on the activity of individual neurons as well as ensemble activity. The statistical significance was evaluated at the level of 0.05, unless noted otherwise, and all the data are expressed as means ± SE.
|
|
RESULTS |
|---|
|
Rats were trained in a visual discrimination task (Fig. 1A) until they performed correctly in >70% of the trials for three consecutive days. Recordings began once the animals reached this criterion. On average, the rats performed 35.4 ± 0.3 trials/session, and the average rate of correct trials was 90.3 ± 0.8%. While the animals were performing the task, a total of 572 well-isolated and stable neurons were recorded in the VS (Fig. 2A). To improve the reliability of the analyses further, neurons with overall activity lower than 0.1 spikes/s were excluded from the analysis (n = 49). Thus a total of 523 units were included in the analyses described in the following text.
Signals related to animal's choice
Consistent with the findings from previous studies in rats (e.g., Chang et al. 2002; Daw 2003; Lavoie and Mizumori 1994; Mulder et al. 2004; Shibata et al. 2001; Woodward et al. 1999), neurons in the VS displayed diverse patterns of activity during the task performance. Elevated neuronal activities were observed across all behavioral stages and over the entire maze, as illustrated by the firing rate maps in Fig. 3, which were constructed as described previously (Song et al. 2005). Furthermore, many of these neurons displayed modulations in their activity according to the animal's choice in the current or previous trials. For example, the neuron illustrated in Fig. 3A modulated its activity reliably according to the animal's choice in the previous trial during the response selection stage (Fig. 3A, previous trial), whereas its activity did not show differential activity related to the animal's upcoming choice in the same trial (Fig. 3A, current trial). The remaining example neurons shown in Fig. 3 modulated their activity according to the animal's choice in the current trial during the reward approach (Fig. 3B), reward consumption (Fig. 3C), or return (Fig. 3D) stages. Therefore the activity of these neurons reflected the animal's choice after its chosen action has been executed.
|
20 correct trials (39 sessions, 307 units). The animal's choice in the current trial significantly affected activity in between one-fifth and one-third of neurons in the reward approach (25.4% for all and 21.2% for correct trials), reward consumption (31.6 and 31.9%, for all and correct trials, respectively), and return (27.2 and 23.8%, for all and correct trials, respectively) stages. In contrast, in the response selection stage, <3% of neurons were significantly modulated by the animal's choice in the current trial (i.e., impending behavioral choice), which was below the level expected by chance (alpha = 0.05, horizontal lines in Fig. 4). On the other hand, the animal's choice in the immediately preceding trial (Ct–1) modulated the activity of many neurons (17.0 and 16.9% for all and correct trials, respectively) in the response selection stage. The proportion of such neurons then decreased gradually during the subsequent reward approach (11.7 and 10.1%, for all and correct trials, respectively), reward consumption (7.5% for both), and return (5.4 and 8.8%, for all and correct trials, respectively) stages. Nevertheless, the number of neurons that significantly modulated their activity according to the animal's choice in the previous trial during the reward approach and reward consumption stages was significantly higher than the level expected by chance (binomial test, P < 0.05). This was also true for the return stage when only correct trials were included in the analysis. The animal's choices two or three trials earlier (Ct–2 and Ct–3) modulated the activity of only small numbers of neurons in all behavioral stages. None of these numbers was significantly different from the level expected by chance (binomial test, P > 0.05).
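The comparison against chance can be sketched as a one-sided binomial test: with a per-neuron significance criterion of 0.05, about 5% of neurons are expected to reach significance by chance alone. The function names and the example count (89 neurons, roughly 17.0% of 523, which assumes that percentage refers to all 523 units) are illustrative assumptions, not figures taken from the paper.

```python
from math import exp, lgamma, log

def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability of exactly i successes."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))

def binomial_upper_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    'significant' neurons out of n when each has false-positive rate p."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

n_units = 523          # units analyzed
k_observed = 89        # hypothetical count, about 17.0% of 523
p_value = binomial_upper_tail(k_observed, n_units)
print(p_value < 0.001)
```

Working in log space avoids overflow in the binomial coefficients for populations of this size.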
|
χ² test, P < 0.001). In the analysis that included only the correct trials, the number of such neurons was 18 (of 65 neurons, 27.7%), which was also significantly higher than expected by chance (χ² test, P < 0.001). Thus if a neuron significantly modulated its activity according to the animal's choice in a given trial, this neuron was also likely to modulate its activity according to the animal's choice in the previous trial during the reward approach stage. This provides additional evidence that the effect of the animal's behavior in the previous trial on neural activity during the reward approach stage was not due to chance. We also tested whether the activity seemingly related to the animal's previous choice might have simply resulted from the variability in the animal's trajectory during the response selection stage that was systematically related to the animal's previous choice. This analysis was performed using the coefficients for the first principal component of the animal's trajectory (see METHODS) that accounted for 61.0% of the total variance of the lateral head position during the response selection stage. We found that the previous choice significantly influenced neuronal activity during the response selection stage more frequently (n = 73 of 523, 14.0%) than the component coefficient (n = 31, 6.0%). Moreover, the coefficients for the previous choice (a2, Eq. 2) estimated with and without the first principal component (Pt) were highly correlated (r = 0.953). Therefore the effect of previous choice on neural activity in the response selection stage cannot be entirely explained by the variability in the animal's trajectory. Because the position of the rat was determined by tracking an array of diodes mounted on the rat's head, it is possible that part of the animal's body was outside the central section of the maze (i.e., in the return stage) although the animal's position was determined to be in the response selection stage. To test whether this factor influenced the results, we performed the same analysis using only the data from the second half of the response selection period (i.e., the last 6 of 12 spatial bins), where the rat's body is expected to be fully contained in the central section. This analysis yielded essentially the same result (data not shown).
To test whether the activity of VS neurons related to the animal's previous choice might contribute to the process of updating action value functions, we applied a regression model that includes the interaction between the animal's choice and reward in the previous trial (see METHODS, Eq. 3) to neuronal activity during the response selection stage. The numbers of neurons that significantly modulated their activity according to the animal's choice in the previous trial, its reward and their interaction were 69 (of 523 neurons, 13.2%), 35 (6.7%), and 38 (7.3%), respectively. Hence the percentage of neurons carrying the signals related to the animal's previous choice per se was higher than that of neurons that showed the interaction between the animal's choice and reward in the previous trial.
These results indicate that once the animal made a behavioral choice (i.e., making a left or right turn from the central section), the choice signal was reflected in neural activity in the reward approach stage, and it persisted through the reward consumption and return stages. When the animal returned to the central section (the response selection stage) and was ready to begin a new trial, the choice signal was somewhat reduced but still persisted. Thus neural activity in the response selection stage carried signals related to the animal's behavioral choice in the previous trial (Ct–1). Once the animal made its choice in the new trial, signals related to the animal's new choice modulated the neural activity in the VS strongly during the reward approach stage, but at the same time, the previous choice signal still modulated neural activity, albeit to a lesser degree. The signals related to the animal's previous choice (Ct–1) were further diminished as the animal proceeded to the reward consumption and return stages. By the time the animal proceeded further to the response selection stage in the next trial (Ct+1), there was no longer any evidence for the signals related to the animal's previous choice (Ct–1).
Choice signal in different phases of the response selection stage
The lack of neural signals related to impending behavioral choice is somewhat at odds with the results from previous studies suggesting the role of the VS in action selection (Nicola 2007; Nicola et al. 2004; Pennartz et al. 1994; Setlow et al. 2003; Taha and Fields 2006). Thus we further tested the possibility that signals related to the animal's upcoming action are conveyed largely during a particular phase of the response selection stage. In particular, we tested whether such signals are confined to a late phase of the response selection stage, immediately before the reward approach stage began. Therefore we applied the same multiple regression analysis described in the preceding text separately for the activity during the first and last 500 ms of the response selection stage. As shown in Fig. 5A, this analysis yielded essentially the same results as the analysis based on the activity during the entire response selection stage. Few neurons carried signals related to the impending behavioral choice (Ct), whereas significant proportions of neurons (first 500 ms: 16.8%, P < 0.001; last 500 ms: 16.1%, P < 0.001, binomial test) modulated their activities according to the behavioral choice in the previous trial (Ct–1). Choices made two or three trials ago (Ct–2 or Ct–3) did not significantly influence neuronal activity during these time periods.
|
Signals related to cue versus choice
The results described so far suggest that signals related to the animal's previous action are encoded by a substantial fraction of VS neurons during the response selection stage. However, the animals in our study produced a relatively small number of errors, resulting in a relatively high correlation between sensory cues and behavioral choices. We therefore tested the possibility that the neural activity seemingly related to the animal's previous choice actually encodes the information about the visual cue in the previous trial using a regression analysis (see METHODS, Eq. 4). The results showed that the activity of 5.4 and 12.6% of neurons was significantly modulated by the sensory cue and the animal's choice in the previous trial, respectively. Hence the effect of the previous behavioral choice on neural activity cannot be fully explained by neural activity related to the previous sensory cue.
Next we constructed a linear discriminant function based on neural activity during the response selection stage of correct trials and used this function to classify error trials. If neural activity is related to the animal's choice, previous error trials should be classified according to the previous choice. Otherwise, they should be classified according to the previous cue. Unlike the preceding multiple regression analysis, in which neuronal activity can be significantly modulated by either, neither or both of the previous choice and cue, this analysis forced the error trials to be classified according to either the previous choice or the previous cue. When the discriminant analysis was based on activities of single neurons (n = 523), 57.3% of error trials (1,062 of 1,855) were classified according to the previous choice. Similarly, when we used neuronal ensemble activity for the discriminant analysis, 67.1% of all error trials (141 of 210) were classified according to the previous choice. Both of these values were significantly higher than expected by chance (binomial test, P < 0.001). Therefore signals related to the animal's previous choices are unlikely to result from the influence of the sensory cue in the previous trial.
|
|
DISCUSSION |
|---|
|
The results from the present study show that some neurons in the VS carry memory signals for previous actions. Multiple regression analysis on individual neuronal activity revealed that about one-sixth of VS neurons significantly changed their activities during the response selection stage depending on the animal's choice in the previous trial. Similar results were obtained with the discriminant analysis applied to the ensemble activity. The effect of the animal's previous choice on neural activity was found for the initial as well as last 500-ms interval during the response selection stage, suggesting that the memory signal for the previous choice is maintained throughout the entire response selection stage. Although this effect diminished gradually over time during the following behavioral stages, a statistically significant number of neurons showed signals related to the animal's choice in the previous trial during the subsequent stages (reward approach, reward consumption, and return). Interestingly, the activity of many neurons in the VS encoded the signals related to the animal's previous choice as well as the signals related to the animal's choice in the current trial, suggesting that memory signals related to two different actions are maintained in the VS simultaneously.
The optimal behavioral strategy for the animal during the visual discrimination task used in this study was relatively simple and only required the animal to move to the direction indicated by the visual cue. Therefore once the animals were fully trained, there was no further need for learning, and the outcomes from their choices were always fully predictable. The fact that the neurons in the VS still encoded the signals related to the animal's previous choices therefore suggests that such signals are transmitted to the VS even when they are no longer needed. Because the animal's environment can always change unpredictably, the presence of signals related to the animal's previous choices in the VS might be useful when the contingencies between the animal's actions and reward change. It is also possible that such neural signals might play a role in preventing extinction.
Role of memory signals related to previous choices
The exact role of choice-related memory signals observed in the VS is not clear. Nevertheless, these signals can potentially play an important role in bridging the temporal gap between the time when the animal decides to take a particular action and the time when the outcome of such an action is revealed. Theoretical studies have proposed two different mechanisms to solve this problem of so-called temporal credit assignment (Fig. 1B). Both of these mechanisms were originally proposed as a possible means to account for the fact that a conditional stimulus can acquire the ability to predict the occurrence of an unconditional stimulus even when there is a temporal gap between them (Sutton and Barto 1990). However, similar mechanisms might be used to bind the signals related to the animal's action and its consequences. First, the sensory representation of the animal's action might be augmented by a series of transient signals that are known as complete serial compound stimuli or tapped delay lines (Montague et al. 1996; Pan et al. 2005; Sutton and Barto 1990) (Fig. 1B, TDL) or by signals that gradually decay with a particular time constant (Suri and Schultz 1999). Second, the problem of binding the animal's action and its subsequent outcome might be resolved through an eligibility trace (Fig. 1B, ET), as proposed in the TD(λ) algorithm, where the parameter λ controls the decay rate for the eligibility trace (Sutton and Barto 1998). In contrast to the one-step temporal difference learning [or TD(0)] algorithm, memory traces for previous actions are maintained across many trials in the TD(λ) algorithm. In this regard, a recent study that combined unit recording and modeling suggested a process similar to the TD(λ) algorithm (Pan et al. 2005). A main difference between the representation of the serial compound stimulus and the eligibility trace is that the latter is explicitly related to the gating of the learning process.
A potential biological substrate for either serial compound stimulus representation or eligibility might be provided by sustained neural activity or biochemical processes in the synaptic terminals (Houk et al. 1995
). Our results suggest that they might be represented in the form of sustained neural activity within a population of VS neurons. If memory signals in the present study indeed correspond to eligibility trace for previous actions, then it spans at least two trials in the current task. It would be important to test in a future study whether the duration of memory signals in the VS can be adjusted depending on the demands of a specific task. It should be noted, however, that some of the results from our study are not consistent with eligibility trace. Whereas eligibility trace for an action should sum over the same repeated choices (Sutton and Barto 1990
, 1998
), some neurons showed the opposite preference for the current and previous choices (e.g., Fig. 3B). Hence, memory signals in the VS might represent the serial compound stimulus related to the animal's previous action or multiple processes including the serial compound stimulus as well as eligibility trace. It is also possible that VS memory signals may serve other unknown functions. Further work is required to clarify this issue.
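The prediction that an eligibility trace sums over repeated choices can be illustrated with a small sketch, assuming an accumulating trace with one component per action. The function name `action_traces` and the decay value are hypothetical and are not estimated from the recorded data.

```python
def action_traces(choices, n_actions, decay):
    """Accumulating eligibility trace, one component per action.

    On every trial all components decay by `decay`, then the component
    of the chosen action is incremented by 1. Returns the trace after
    each trial.
    """
    e = [0.0] * n_actions
    history = []
    for c in choices:
        e = [decay * v for v in e]  # all traces decay
        e[c] += 1.0                 # chosen action becomes more eligible
        history.append(list(e))
    return history


# Two identical choices (action 0) followed by the other choice (action 1).
for trial, trace in enumerate(action_traces([0, 0, 1], n_actions=2, decay=0.5)):
    print(trial, trace)
```

In this sketch the trace for action 0 grows from 1.0 to 1.5 when the same choice is repeated and then decays to 0.75 after the switch. A neuron with opposite preferences for the current and previous choices would not follow this summing pattern.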
Source of memory signals in the VS
Currently, the source of signals related to the animal's previous action in the VS is not known. Nevertheless, corticostriatal projections originating from the prefrontal cortex are likely to convey such information to the VS. Consistent with this possibility, neurons in the rat medial prefrontal cortex, which sends direct projections to the VS (see Vertes 2004
and references therein), often displayed signals more strongly related to the animal's previous action than to its future action in a spatial delayed alternation task, especially while the animal was still learning the task (Baeg et al. 2003
). It is possible that memory signals for previous actions exist in the loop consisting of the prefrontal cortex, VS, ventral pallidum/substantia nigra pars reticulata, and thalamus. Therefore it would be important to test whether memory signals for previous actions also exist in the ventral pallidum/substantia nigra pars reticulata and the mediodorsal/ventromedial thalamic nuclei, which receive input projections from the VS and project back to the prefrontal cortex (Groenewegen et al. 1999
). Neurons recorded from the dorsolateral prefrontal cortex (DLPFC) of monkeys that were trained to make stochastic choices in a simulated competitive game also displayed robust signals related to the animal's choice in the previous trial (Barraclough et al. 2004
). The activity of DLPFC neurons was sometimes influenced by the animal's choice two or three trials before the current trial (Seo et al. 2007). Whether signals related to previous actions also exist in the primate striatum is currently unknown. Nevertheless, the presence of such signals in the DLPFC raises the possibility that memory signals for previous actions are also maintained in the cortico-basal ganglia loop of the primate brain. Although the DLPFC does not send direct projections to the VS (Haber and McFarland 1999
), it projects heavily to the medial and orbital prefrontal cortices (Cavada et al. 2000
), which in turn send direct projections to the VS (Haber et al. 1995
, 2006
). The signals related to previous actions in the DLPFC may be transmitted to the VS via such indirect projections. This possibility needs to be tested in future studies.
Role of the VS in action selection
Our results show that VS neurons do not carry signals related to the animal's choice of future action, which is at odds with the proposed role of the VS in action selection (e.g., Nicola 2007
; Pennartz et al. 1994
). One line of supporting evidence for this proposal is the finding that some neurons in rat VS selectively responded to a particular cue-response combination (Nicola et al. 2004
; Setlow et al. 2003
). In these studies, however, one of the sensory cues signaled an available reward, whereas the other cue signaled either no reward or an aversive stimulus. Thus it is not clear whether the observed pattern of activity represents stimulus-action association or stimulus-reward association (i.e., reward expectancy) (Knutson and Cooper 2005
; O'Doherty 2004
; Schultz 2006
). In this regard, Daw (2003)
has shown stimulus-evoked anticipatory responses of VS neurons that were independent of the rat's choice of action but dependent on the type of reward, supporting the latter possibility. Our results also indicate that VS neurons do not carry information about specific sensory cues or associated behavioral responses when correct choices always lead to the same amount of reward. Similarly, Chang et al. (2002)
have shown that VS neuronal ensemble activity 1 s before lever press did not readily differentiate correct/error or left/right lever presses that led to the same amount of reward. In monkey VS, a considerable number of neurons responded to reward-predicting stimuli, whereas few neurons showed future action-selective activity in a go-no-go task (Schultz et al. 1992
). The results from these studies therefore do not support the proposed role of the VS in action selection and are consistent with behavioral studies suggesting a role for the VS in Pavlovian rather than instrumental conditioning (reviewed in Cardinal et al. 2002
). They are also consistent with the human brain-imaging studies that did not find VS activation in association with action selection. Instead, activation of other brain regions, such as the dorsal striatum (DS), was correlated with action selection (reviewed in O'Doherty 2004
), which is in line with numerous physiological studies demonstrating DS neural signals related to impending movement direction of the monkey (Alexander and Crutcher 1990
; Hikosaka et al. 1989
; Kobayashi et al. 2007
; Pasquereau et al. 2007
; Pasupathy and Miller 2005
; Samejima et al. 2005
). Thus with the caveat that the VS may contribute to action selection under some circumstances (e.g., Nicola 2007
; Redgrave et al. 1999
), our results suggest that action selection generally takes place elsewhere in the brain.
|
|
GRANTS |
|---|
|
|
|
ACKNOWLEDGMENTS |
|---|
|
Present addresses: Y. B. Kim, Dept. of Neuroscience, University of Pittsburgh, Pittsburgh, PA 15260; and E. H. Baeg, Dept. of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213.
|
|
FOOTNOTES |
|---|
Address for reprint requests and other correspondence: M. W. Jung, Neuroscience Laboratory, Institute for Medical Sciences, Ajou University School of Medicine, Suwon 443-721, Korea (E-mail: min@ajou.ac.kr).
|
|
REFERENCES |
|---|
|
Baeg EH, Kim YB, Huh K, Mook-Jung I, Kim HT, Jung MW. Dynamics of population code for working memory in the prefrontal cortex. Neuron 40: 177–188, 2003.
Baeg EH, Kim YB, Jang J, Kim HT, Mook-Jung I, Jung MW. Fast spiking and regular spiking neural correlates of fear conditioning in the medial prefrontal cortex of the rat. Cereb Cortex 11: 441–451, 2001.
Baeg EH, Kim YB, Kim J, Ghim JW, Kim JJ, Jung MW. Learning-induced enduring changes in the functional connectivity among prefrontal cortical neurons. J Neurosci 27: 909–918, 2007.
Barraclough DJ, Conroy ML, Lee D. Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci 7: 404–410, 2004.
Cardinal RN, Parkinson JA, Hall J, Everitt BJ. Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neurosci Biobehav Rev 26: 321–352, 2002.
Cavada C, Company T, Tejedor J, Cruz-Rizzolo RJ, Reinoso-Suarez F. The anatomical connections of the macaque monkey orbitofrontal cortex. A review. Cereb Cortex 10: 220–242, 2000.
Chang JY, Chen L, Luo F, Shi LH, Woodward DJ. Neuronal responses in the frontal cortico-basal ganglia system during delayed matching-to-sample task: ensemble recording in freely moving rats. Exp Brain Res 142: 67–80, 2002.
Cromwell HC, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J Neurophysiol 89: 2823–2838, 2003.
Daw ND. Reinforcement Learning Models of the Dopamine System and Their Behavioral Implications (PhD thesis). Pittsburgh, PA: Carnegie Mellon University, 2003.
Daw ND, Doya K. The computational neurobiology of learning and reward. Curr Opin Neurobiol 16: 199–204, 2006.
Euston DR, McNaughton BL. Apparent encoding of sequential context in rat medial prefrontal cortex is accounted for by behavioral variability. J Neurosci 26: 13143–13155, 2006.
Groenewegen HJ, Galis-de Graaf Y, Smeets WJ. Integration and segregation of limbic cortico-striatal loops at the thalamic level: an experimental tracing study in rats. J Chem Neuroanat 16: 167–185, 1999.
Haber SN, Kim KS, Mailly P, Calzavara R. Reward-related cortical inputs define a large striatal region in primates that interface with associative cortical connections, providing a substrate for incentive-based learning. J Neurosci 26: 8368–8376, 2006.
Haber SN, Kunishio K, Mizobuchi M, Lynd-Balta E. The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci 15: 4851–4867, 1995.
Haber SN, McFarland NR. The concept of the ventral striatum in nonhuman primates. Ann NY Acad Sci 877: 33–48, 1999.
Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J Neurophysiol 61: 780–798, 1989.
Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Models of Information Processing in the Basal Ganglia, edited by JC Houk, JL Davis, DG Beiser. Cambridge, MA: MIT Press, 1995, p. 249–270.
Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1: 411–416, 1998.
Knutson B, Cooper JC. Functional magnetic resonance imaging of reward prediction. Curr Opin Neurol 18: 411–417, 2005.
Kobayashi S, Kawagoe R, Takikawa Y, Koizumi M, Sakagami M, Hikosaka O. Functional differences between macaque prefrontal cortex and caudate nucleus during eye movements with and without reward. Exp Brain Res 176: 341–355, 2007.
Lavoie AM, Mizumori SJ. Spatial, movement- and reward-sensitive discharge by medial ventral striatum neurons of rats. Brain Res 638: 157–168, 1994.
Lee D. Neural basis of quasi-rational decision making. Curr Opin Neurobiol 16: 191–198, 2006.
Lee D, Rushworth MFS, Walton ME, Watanabe M, Sakagami M. Functional specialization of the primate frontal cortex during decision making. J Neurosci 27: 8170–8173, 2007.
Leon MI, Shadlen MN. Effect of expected reward magnitude on the response of neurons in the dorsolateral prefrontal cortex of the macaque. Neuron 24: 415–425, 1999.
Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16: 1936–1947, 1996.
Mulder AB, Tabuchi E, Wiener SI. Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation. Eur J Neurosci 19: 1923–1932, 2004.
Nicola SM. The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology 191: 521–550, 2007.
Nicola SM, Yun IA, Wakabayashi KT, Fields HL. Cue-evoked firing of nucleus accumbens neurons encodes motivational significance during a discriminative stimulus task. J Neurophysiol 91: 1840–1865, 2004.
O'Doherty JP. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr Opin Neurobiol 14: 769–776, 2004.
Pan WX, Schmidt R, Wickens JR, Hyland BI. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 25: 6235–6242, 2005.
Pasquereau B, Nadjar A, Arkadir D, Bezard E, Goillandeau M, Bioulac B, Gross CE, Boraud T. Shaping of motor responses by incentive values through the basal ganglia. J Neurosci 27: 1176–1183, 2007.
Pasupathy A, Miller EK. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433: 873–876, 2005.
Pennartz CM, Groenewegen HJ, Lopes da Silva FH. The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioural, electrophysiological and anatomical data. Prog Neurobiol 42: 719–761, 1994.
Redgrave P, Prescott TJ, Gurney K. The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89: 1009–1023, 1999.
Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310: 1337–1340, 2005.
Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1–27, 1998.
Schultz W. Behavioral theories and the neurophysiology of reward. Annu Rev Psychol 57: 87–115, 2006.
Schultz W, Apicella P, Scarnati E, Ljungberg T. Neuronal activity in monkey ventral striatum related to the expectation of reward. J Neurosci 12: 4595–4610, 1992.
Seo H, Barraclough DJ, Lee D. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. Cereb Cortex 17: i110–i117, 2007. doi:10.1093/cercor/bhm064.
Setlow B, Schoenbaum G, Gallagher M. Neural encoding in ventral striatum during olfactory discrimination learning. Neuron 38: 625–636, 2003.
Shibata R, Mulder AB, Trullier O, Wiener SI. Position sensitivity in phasically discharging nucleus accumbens neurons of rats alternating between tasks requiring complementary types of spatial cues. Neuroscience 108: 391–411, 2001.
Song EY, Kim YB, Kim YH, Jung MW. Role of active movement in place-specific firing of hippocampal neurons. Hippocampus 15: 8–17, 2005.
Suri RE, Schultz W. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91: 871–890, 1999.
Sutton RS, Barto AG. Time-derivative models of Pavlovian reinforcement. In: Learning and Computational Neuroscience: Foundations of Adaptive Networks, edited by M Gabriel, J Moore. Cambridge, MA: MIT Press, 1990.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
Taha SA, Fields HL. Inhibitions of nucleus accumbens neurons encode a gating signal for reward-directed behavior. J Neurosci 26: 217–222, 2006.
Vertes RP. Differential projections of the infralimbic and prelimbic cortex in the rat. Synapse 51: 32–58, 2004.
Watanabe M. Reward expectancy in primate prefrontal neurons. Nature 382: 629–632, 1996.
Wilson CJ. Basal ganglia. In: The Synaptic Organization of the Brain, edited by GM Shepherd. New York: Oxford, 2004.
Woodward DJ, Chang JY, Janak P, Azarov A, Anstrom K. Mesolimbic neuronal activity across behavioral states. Ann NY Acad Sci 877: 91–112, 1999.