Goal-directed and habit-based behaviors are driven by multiple but dissociable decision making systems involving several different brain areas, including the hippocampus and dorsal striatum. On repetitive tasks, behavior transitions from goal directed to habit based with experience. Hippocampus has been implicated in initial learning and dorsal striatum in automating behavior, but recent studies suggest that subregions within the dorsal striatum have distinct roles in mediating habit-based and goal-directed behavior. We compared neural activity in the CA1 region of hippocampus with anterior dorsolateral and posterior dorsomedial striatum in rats on a spatial choice task, in which subjects experienced reward delivery changes that forced them to adjust their behavioral strategy. Our results confirm the importance of the hippocampus in evaluating predictive steps during goal-directed behavior, while separate circuits in the basal ganglia integrated relevant information during automation of actions and recognized when new behaviors were needed to continue obtaining rewards.
Keywords: neural ensemble
The process of learning and automating actions is driven by multiple but dissociable neural circuits that instantiate different decision making systems (Balleine et al. 2007; Everitt and Robbins 2013; Graybiel 2008; Hikosaka et al. 1995; Jog et al. 1999; Johnson et al. 2007; van der Meer et al. 2012; Miyachi et al. 1997, 2002; O'Keefe and Nadel 1978; Packard and McGaugh 1996; Redish 1999, 2013; Yin and Knowlton 2004). Initial learning and adaptation to changes in the environment are driven by goal-directed systems, in which an organism engages in evaluative and predictive steps, integrating past experience and potential future outcomes (Balleine et al. 2007; Buckner and Carroll 2007; Killcross and Coutureau 2003; van der Meer et al. 2012; Redish 2013). Thus goal-directed behavior is cognitively intensive but flexible, since planning for multiple forthcoming options occurs at or before the time of action selection. Automated behavior is driven by habit-based systems and develops with increasing experience on a task, wherein specific situations trigger specific action chains (Jog et al. 1999; Packard and McGaugh 1996; Smith and Graybiel 2013; Yin and Knowlton 2004, 2006). Because future outcomes are not considered at the time of action selection in the habit system, situations release actions quickly, but once these associations are well established they are difficult to change (e.g., insensitivity to devaluation; Adams 1982; Adams and Dickinson 1981).
Several studies have reported the hippocampus (HC) as a structure important for goal-directed behavior (Johnson and Redish 2007; Maguire and Hassabis 2011; O'Keefe and Nadel 1978; Schacter et al. 2011). HC neurons that are spatially tuned form a cognitive map, allowing for integration of past and potential future experiences in order to plan behavior (O'Keefe and Nadel 1978; McNaughton et al. 2006; Redish 1999; Wikenheiser and Redish 2015a). Neurons in the dorsolateral striatum also respond to spatial cues (Mizumori et al. 2004; Schmitzer-Torbert and Redish 2004, 2008; Yeshenko et al. 2004), but only when spatial cues contain information about obtaining rewards (Berke et al. 2009; Schmitzer-Torbert and Redish 2008). Neural activity in the dorsolateral striatal neurons is related to specific motor movements and actions (Alexander and DeLong 1985a, 1985b; Carelli and West 1991; Cho and West 1997; Jog et al. 1999; Schmitzer-Torbert and Redish 2008), likely ones that have consistently led to reinforcement.
Recent studies have discovered anatomical (Berendse et al. 1992; McGeorge and Faull 1989; Swanson 2000) and functional (Devan et al. 1999; Yin et al. 2004, 2005a; Yin and Knowlton 2004) differences between dorsolateral and dorsomedial striatum. Anterior dorsolateral striatum (aDLS) receives input from motor and sensory areas (Alexander and Crutcher 1990; Berendse et al. 1992; McGeorge and Faull 1989) and regulates motor control and habit-based behaviors (Carelli and West 1991; Cho and West 1997; Hikosaka et al. 1995; Miyachi et al. 1997; Smith and Graybiel 2013). Dorsomedial striatum has been implicated as playing a role in goal-directed behavior (Devan et al. 1999; Gremel and Costa 2013; Yin et al. 2005b; Yin and Knowlton 2004), such as reversal learning (Castañé et al. 2010; Kirkby 1969; Ragozzino 2007; Ragozzino and Choi 2004) and changing strategies (Ragozzino 2007; Ragozzino et al. 2002a, 2002b).
Importantly, recent studies have found anatomical (Berendse et al. 1992; McGeorge and Faull 1989) and functional (Corbit and Janak 2010; Yin et al. 2005b; Yin and Knowlton 2004) differences between anterior (aDMS) and posterior (pDMS) dorsomedial striatum and, consequently, in their role in behavior. While aDMS receives input from anterior cingulate cortex, dorsal prelimbic area, and some motor/sensory areas, pDMS receives input from the orbitofrontal cortex, ventral prelimbic area, and entorhinal cortex. The aDMS has been postulated to be involved with certain goal-directed behaviors (Clarke et al. 2008; Devan et al. 1999; Ragozzino et al. 2002a, 2002b), but neural correlates in the aDMS of these behaviors have not been found (Kimchi and Laubach 2009; Thorn et al. 2010). The anatomical inputs to pDMS are from structures involved in reversal learning, strategy changing, and action-outcome learning. Interestingly, studies have implicated the pDMS in many of these specific types of learning (Lex and Hauber 2010a, 2010b; Lucantonio et al. 2014; Stalnaker et al. 2012; Yin et al. 2005a, 2005b; Yin and Knowlton 2004).
We therefore hypothesized that just as the aDLS integrates information from sensorimotor areas to translate this information into behavior, the pDMS likely plays more of a role in goal-directed behavior than the aDMS, integrating information from goal-oriented cortical areas and translating this information into action. Neural representational comparisons have already been made between aDLS and aDMS (Thorn et al. 2010). However, although the lesion data suggest a stronger role of pDMS than aDMS in these types of goal-oriented learning (Yin et al. 2005b; Yin and Knowlton 2004), to date neural ensemble recordings from the pDMS have not been directly compared with those from other striatal subregions. In this article, we report results from simultaneous recordings of aDLS and pDMS.
When rats come to decision points, they sometimes pause, orient toward a goal, and then reorient back and forth. This behavioral phenomenon is termed vicarious trial-and-error (VTE) behavior and has been hypothesized to reflect an underlying search process (Johnson and Redish 2007; van der Meer et al. 2012; Muenzinger 1938; Tolman 1938). VTE primarily occurs during goal-directed behaviors (Gardner et al. 2013; Papale et al. 2012; Schmidt et al. 2013; Tolman 1938). Changes to the reward contingencies within an environment consistently produce an increase in the occurrence of VTE (Blumenthal et al. 2011; Powell and Redish 2014; Schmidt et al. 2013), likely reflecting deliberative behavior as subjects form new or different strategies. Deliberation entails the search and evaluation of potential possibilities (Buckner and Carroll 2007; Daw et al. 2005; Johnson and Redish 2007; Redish 2013). During VTE behaviors, HC neural ensembles sweep forward ahead of the animal toward the potential goals (Gupta et al. 2012; Johnson and Redish 2007). If these sweeps of spatial representation are reflective of the search and evaluation process, then they would not be expected to occur in structures involved in the habit-based components, such as aDLS. Previous experiments have found that aDLS representations do not show forward sweeps (van der Meer et al. 2010), but it remains unknown whether pDMS representations do.
In contrast, in stable environments, as actions become more automated (e.g., habit based), control shifts to sensorimotor circuits capable of encoding action chains (Dezfouli et al. 2014; Graybiel 2008; Yin and Knowlton 2006), which include the aDLS (Alexander and Crutcher 1990; Berendse et al. 1992; Carelli and West 1991; McGeorge and Faull 1989). Graybiel and colleagues have reported that the development of these action chains on a cued T-maze aligns with the development of preferential firing in dorsolateral striatum at the beginning and end of their T-maze (task bracketing; Barnes et al. 2005; Jog et al. 1999; Smith and Graybiel 2013; Thorn et al. 2010). Task bracketing is thought to underlie behavioral “chunking,” or the bracketing of a sequence of action chains on a task (Graybiel 1998; Jog et al. 1999; Miller 1956). Smith and Graybiel (2013) recently found that task bracketing was anticorrelated with VTE behaviors. If task bracketing is a consequence of the development of action chains within a habit-based (automated) behavioral decision system, then one would hypothesize that structures involved in the goal-directed (deliberative) system, such as HC and pDMS, should not show task bracketing.
Multiple systems interact to produce appropriate behavioral outputs, each of these systems forming neural circuits that run both in parallel and in conjunction with one another. To better understand how these decision making systems interact, we recorded neural ensembles from three different structures, simultaneously from pDMS and aDLS in six rats and from CA1 in another six rats, on a spatial navigation task that required rats to make decisions based on guidance from internal cues. In the analyzed probe trials, we introduced a change in the reward contingency without any physical change in the environment, forcing a change in behavior. On this task, rats begin each day showing goal-directed behaviors but develop an automated stereotypy of their path through the day (Schmitzer-Torbert and Redish 2004). On encountering this unsignaled change in reward contingency, rats typically return to goal-directed behaviors and reautomate their path under the new contingency (Blumenthal et al. 2011; Gupta et al. 2012; Powell and Redish 2014; Steiner and Redish 2012). This automation/reversal/reautomation allowed us to observe both goal-directed and habit-based behavior in a single session and to measure the neural correlates of these different types of behavior.
Eleven Fischer Brown Norway rats and one Brown Norway rat were trained to perform a modified version of a Hebb-Williams maze (HWM; Hebb and Williams 1946), similar to the multiple-T left, right, alternate (LRA) task (Blumenthal et al. 2011; Powell and Redish 2014; Steiner and Redish 2012). The maze was a rectangular wooden box with a carpeted floor and LEGO brick walls that could be rearranged to change the internal maze portion (Fig. 1). The internal maze formed a series of low-cost choice points, which we refer to as the navigation sequence. At the end of the navigation sequence, rats came to a high-cost choice point and had to turn left or right. If a rat made the correct choice at the choice point, it received a food reward (2 unflavored food pellets, 45 mg each; Research Diets, New Brunswick, NJ) at a side feeder location and at a center feeder location (end zone). The pellets were delivered with automatic pellet dispensers (Med Associates, St. Albans, VT). If a rat made an incorrect choice, it received no food reward and had to continue down the return arm to the end zone, where no food was delivered. Returning to the end zone started a new lap whether or not the previous lap was rewarded; rats ran the task continuously for 30 min. Lap identification was used for analysis only; no explicit event signaled the beginning or end of a lap.
Three different reward contingencies were used [left (L), right (R), or alternating (A)]. During training sessions, the reward contingency was held constant through an entire session but changed randomly from session to session. During the experimental phase, each session began with one reward contingency but the reward contingency changed at approximately the halfway point of the session (the reward contingency switch). Rats ran a subset of the six possible combinations (LR, LA, RL, RA, AL, AR) pseudorandomly. Every session lasted 30 min; rats earned their daily food intake on the task (∼12 g/day).
All procedures were conducted in accordance with National Institutes of Health guidelines for animal care and approved by the Institutional Animal Care and Use Committee at the University of Minnesota. Care was taken to minimize the number of animals used in these experiments and to minimize suffering.
After pretraining on the HWM, rats were chronically implanted with multitetrode hyperdrives [6 rats were implanted with 14-tetrode hyperdrives (made in house, 12 electrodes for recording, 2 for references) targeting the right dorsal HC, 3 rats were implanted with 14-tetrode hyperdrives targeting aDLS and pDMS unilaterally, and 3 rats were implanted with 28-tetrode hyperdrives (made in house, 24 electrodes for recording, 4 for references) targeting the aDLS and pDMS bilaterally]. See Table 1.
Nine rats were initially anesthetized with Nembutal (pentobarbital sodium, 40–50 mg/kg; Abbott Laboratories, North Chicago, IL), and three rats were anesthetized with isoflurane. All rats were maintained on isoflurane (0.5–2% isoflurane vaporized in medical-grade O2) during the implantation. All rats were situated on a stereotaxic apparatus (Kopf) and received Dual-Cillin (Phoenix Pharmaceutical, St. Joseph, MO) intramuscularly in each hindlimb. The dorsal parts of the rats' heads were shaved and disinfected with alcohol (70% isopropyl) and Betadine (Purdue Frederick, Norwalk, CT), and the skin overlying the skull was removed. Several jewelers' screws were used to anchor the hyperdrive to the skull, and one of the screws was used as a recording ground. In six rats one craniotomy was opened (HC, targeting CA1); in three rats two craniotomies were opened (unilateral implantation of aDLS and pDMS), and in three rats four craniotomies were opened (bilateral implantation of aDLS and pDMS). Craniotomies were opened with a surgical trephine. The bundles for aDLS were centered at 0.7 mm anterior of bregma and 3.5 mm lateral of midline, and bundles for pDMS were centered at 0.4 mm posterior of bregma and 2.5 mm lateral of midline, in accordance with the study by Yin and Knowlton (2004). The bundles for HC were centered at 3.8 mm posterior of bregma and 3.0 mm lateral of midline.
The craniotomies around the hyperdrive were protected with Silastic (Dow Corning, Midland, MI). Dental acrylic (Perm Reline and Repair Resin, The Hygenic Corporation, Akron, OH) secured the hyperdrive to the skull. Immediately after surgery, all tetrodes were turned down 640 μm. After tetrodes were turned down, rats were given subcutaneous injections (5–10 ml) of sterile saline and oral administration of Tylenol (1 ml). To prevent infections, rats received subcutaneous injections of Baytril (enrofloxacin, 1.1 mg/kg) on the day of surgery and for 7 days after surgery.
After surgery, tetrodes were advanced 40–640 μm per day until reaching the striatum or HC. Initial entry into the HC or striatum was identified by passage through the corpus callosum, which is electrophysiologically quiet compared with the cortex, HC, and striatum. The HC pyramidal layer was identified by the size and reversal point of sharp-wave ripples, as well as by burst firing of cells in synchrony with the ripple portions of the sharp-wave ripple complexes. The striatum was further identified by the observation of medium spiny neurons, which have long interspike intervals and short bursts of firing.
In nine rats (3 striatum, 6 HC), neural activity during task performance was recorded with a 64-channel analog Cheetah system (Neuralynx, Bozeman, MT); for the other three rats (striatum), a 96-channel digital Cheetah system was used. Spikes were detected and recorded online with built-in filters and then clustered offline. Spikes were separated into putative single units on the basis of waveform properties with KlustaKwik (K. D. Harris) and MClust 3.5 or MClust 4.0 (A. D. Redish). Only clusters with Lratio < 0.20 and isolation distance > 20 were used for analysis.
The position of the rat was monitored with LEDs on the head stage during experimental recording sessions, captured by an overhead camera. Position of the rat was recorded at 60 Hz with a video input to the Cheetah recording system, time-stamping the sampled position of the LEDs. Control of the experiment was performed with in-house code written in MATLAB. Events (e.g., feeder click and food delivery) were recorded and time-stamped by the Cheetah recording system and by MATLAB.
After the experiment was completed, tetrode locations were marked with small lesions by passing a small amount of anodal current (5 μA for 10 s) through each tetrode. After at least 2 days had passed, rats were anesthetized and perfused transcardially with saline followed by 10% formalin. Brains were stored in formalin followed by 30% sucrose formalin until slicing. Coronal slices were made through the area of the implantation and stained with cresyl violet to visualize tetrode tracks. Locations from the three regions were confirmed histologically (Fig. 2).
Striatal cells were classified into phasic-firing neurons, high-firing neurons, and tonic-firing neurons on the basis of the proportion of time spent in long (>2 s) interspike intervals (Schmitzer-Torbert and Redish 2004, 2008). HC cells were classified into putative pyramidal neurons and putative interneurons with a threshold of 6 Hz over the recording session. HC cells firing at an average of <6 Hz were classified as putative pyramidal neurons, while HC cells firing at an average of >6 Hz were classified as putative interneurons.
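The classification criteria above reduce to two simple statistics. The analyses in this study were run in MATLAB; the following NumPy sketch is illustrative only, and the function names and input conventions are our own assumptions.

```python
import numpy as np

def classify_hc_cell(spike_times, session_dur_s):
    # HC units firing below 6 Hz on average over the session are treated
    # as putative pyramidal neurons; faster units as putative interneurons.
    mean_rate = len(spike_times) / session_dur_s
    return "pyramidal" if mean_rate < 6.0 else "interneuron"

def prop_time_in_long_isis(spike_times, thresh_s=2.0):
    # Striatal classification keys on the proportion of time a unit
    # spends in long (>2 s) interspike intervals.
    isis = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    return isis[isis > thresh_s].sum() / isis.sum()
```

Phasic-, high-, and tonic-firing striatal classes would then be assigned by thresholding the returned proportion; the specific cutoffs are given in Schmitzer-Torbert and Redish (2004, 2008).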
The choice point was defined as the top zone of the maze (see Figs. 4 and 6), based on behavioral navigation of the subjects. We defined the choice point as the point where overall path of the animal diverged as rats turned either left or right. As shown in Fig. 4, this point occurred near the end of the navigation sequence.
Firing rate at specific points on the maze over laps was obtained similarly to Thorn et al. (2010). Eight events on the maze were identified [start of the navigation sequence, middle of the navigation sequence, choice point, feeder click, side feeder (enter and exit), return arm, center feeder click, and end zone (start/end of each lap)]. Firing rate was measured over a 2-s time window (±1 s around each event). Firing rates were z-scored by taking the mean firing rate of each bin, subtracting the mean firing rate for the rest of the maze, and dividing by the standard deviation of firing for the rest of the maze. z-Scored firing rate was then divided into 500-ms time bins (4 bins for each event) within session over five-lap bins for all structures before and after the contingency switch. A measure of overall firing rate across laps was obtained by taking the mean z-scored firing rate for each five-lap bin, and a measure of overall firing rate across the maze was obtained by taking the mean z-scored firing rate for each 500-ms time bin.
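A minimal sketch of this event-aligned z-scoring, assuming spike times in seconds and a precomputed sample of rest-of-maze firing rates (the original analysis was done in MATLAB; names here are illustrative):

```python
import numpy as np

def event_zscored_rate(spike_times, event_times, rest_rates, win=1.0, bin_s=0.5):
    # Firing rate in bin_s-wide bins over +/- win s around each event,
    # z-scored against mean and SD of firing over the rest of the maze.
    mu, sd = np.mean(rest_rates), np.std(rest_rates)
    edges_rel = np.arange(-win, win + bin_s, bin_s)   # 4 bins for win=1, bin_s=0.5
    rates = np.zeros((len(event_times), len(edges_rel) - 1))
    for i, t in enumerate(event_times):
        counts, _ = np.histogram(spike_times, bins=t + edges_rel)
        rates[i] = counts / bin_s
    return (rates - mu) / sd
```

Averaging the result over events within each five-lap bin would give the lap-by-lap measure described above.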
A measure of task bracket-like effects was calculated based on an analysis by Smith and Graybiel (2013), who took the mean firing rate at the start and end of the maze minus the mean firing rate at the auditory cue at the choice point, a measure they called the task-bracketing index. Similarly, in the present study, a normalized task-bracketing index was calculated by taking the mean firing rate of the last two bins of the end zone epoch (which marked the end and beginning of each lap on our task), subtracting the mean firing rate for the rest of the maze, and dividing by the standard deviation of the mean firing rate for the rest of the maze (z-scored as described above). This was done for early laps (1–15) and late laps (16–30) before and after the switch. A two-way ANOVA was performed to test for main effects and interactions; post hoc tests were performed when there was a significant main effect or interaction and corrected with the Bonferroni-Holm procedure.
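The normalized index is thus a z-score of end-zone firing against the rest of the maze; a sketch under the same illustrative assumptions as above:

```python
import numpy as np

def task_bracket_index(endzone_rates, rest_rates):
    # Mean firing at the end zone (lap start/end on this task) minus mean
    # firing over the rest of the maze, in units of the rest-of-maze SD.
    rest = np.asarray(rest_rates, dtype=float)
    return (np.mean(endzone_rates) - rest.mean()) / rest.std()
```

Computing this separately for early (1–15) and late (16–30) laps before and after the switch yields the four values entering the two-way ANOVA.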
A measure of tuning curves was obtained by taking neuronal firing at each time point and position of the rat for each neuron. Tuning curve information was normalized by occupancy to adjust for the amount of time the rat was at each position/time point. The maze was linearized with 1,000 points by creating an ideal path around the maze and then associating tuning curves with those points around the maze. This was done on a lap-by-lap basis.
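A sketch of the occupancy normalization, assuming position samples at 60 Hz (the video rate reported above) and spike positions already mapped to the 1,000-point linearized maze; the function name and inputs are illustrative:

```python
import numpy as np

def linearized_tuning_curve(spike_pos_idx, occupancy_idx, n_points=1000, dt=1/60):
    # spike_pos_idx: linearized bin index (0..n_points-1) at each spike.
    # occupancy_idx: linearized bin index at each 60-Hz position sample.
    spikes = np.bincount(spike_pos_idx, minlength=n_points).astype(float)
    occ = np.bincount(occupancy_idx, minlength=n_points) * dt   # seconds per bin
    with np.errstate(invalid="ignore", divide="ignore"):
        # Firing rate (Hz) where the rat actually visited; NaN elsewhere.
        rate = np.where(occ > 0, spikes / occ, np.nan)
    return rate
```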
Correlation of tuning curves.
After tuning curve information was obtained for each neuron in each structure (pDMS, aDLS, and CA1) on individual laps, four distributions of correlation coefficients were obtained for each region. The four different distributions (for each condition) were obtained by taking an average of left laps before the switch vs. average of left laps after the switch, an average of right laps before the switch vs. average of right laps after the switch, an average of left vs. average of right laps before the switch, and an average of left vs. average of right laps after the switch. Averages were calculated across laps, and maze locations were preserved.
To determine whether there was a significant difference between regions, we conducted a bootstrap analysis to determine whether the mean correlations were significantly far apart from each other in Euclidean distance. The analysis was applied to each of the four conditions separately. The bootstrap was performed by resampling the total distribution of correlations (across the 2 regions being compared) and redividing the total distribution into sample sizes matching the real sample sizes of the two distributions. (Thus if there were 800 cells providing 800 correlations divided between 500 cells in aDLS and 300 in CA1, we would take the 800 correlations, redivide them into groups of 500 and 300, and then find the Euclidean distance between the mean correlations.) One thousand bootstraps were performed for each comparison. Finally, we compared the actual distance for each condition to the resampled distribution for that condition; the P value was the proportion of resampled distances that exceeded the actual distance between mean correlations.
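This resampling procedure is a permutation-style bootstrap over the pooled correlations. A sketch assuming each cell contributes a pair of correlation coefficients (a point on the two-dimensional plane described in results); the seed and function signature are our own conventions:

```python
import numpy as np

def bootstrap_mean_distance(corrs_a, corrs_b, n_boot=1000, seed=0):
    # corrs_a, corrs_b: (n_cells, 2) arrays of per-cell correlation pairs.
    rng = np.random.default_rng(seed)
    a, b = np.asarray(corrs_a), np.asarray(corrs_b)
    pooled = np.vstack([a, b])
    n_a = len(a)
    # Observed Euclidean distance between the two regions' mean correlations.
    observed = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    null = np.empty(n_boot)
    for i in range(n_boot):
        # Redivide the pooled correlations into groups of the original sizes.
        idx = rng.permutation(len(pooled))
        null[i] = np.linalg.norm(pooled[idx[:n_a]].mean(axis=0)
                                 - pooled[idx[n_a:]].mean(axis=0))
    p = np.mean(null >= observed)
    return observed, p
```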
A Change in Behavioral Strategy
We observed the behavior of 12 rats on a complex navigation task that required subjects to recognize a reward contingency and then to recognize and switch to a different reward contingency approximately midway through the session. Behavioral performance results from all 12 rats indicated that only a brief exploratory period was necessary in order to learn the starting contingency (Fig. 3).
When the contingency switch was introduced, performance dropped to the rate expected if rats perseverated in the original contingency but then recovered to an accuracy similar to that observed before the switch for the rest of the session (Fig. 3). VTE behavior appeared along with this sharp decrease in performance over the first 10 laps after the switch (Fig. 3). The occurrence of VTE behavior after the switch indicated that rats returned to flexible, deliberative behaviors when they were forced to reevaluate their current strategy.
Neurophysiological Data Sets
We recorded neuronal activity from subregions of the dorsal striatum (simultaneous recording from aDLS and pDMS) and HC (CA1). In the dorsal striatum, we recorded a total of 1,027 neurons. The majority of neurons were putative medium spiny neurons [MSP, 831 (81%) of 1,027 total neurons], 158 (15%) were putative high-firing interneurons (HFN), and 38 (4%) were putative tonic-firing neurons (TFN). By region, we recorded 541 MSP, 96 HFN, and 22 TFN from aDLS and 290 MSP, 62 HFN, and 16 TFN from pDMS (Table 1). In CA1, the majority of neurons were putative pyramidal cells [265 (95%) of 280 total neurons; Table 1]. Recording locations from all three data sets were confirmed histologically (Fig. 2).
Spatial Decoding of Forward Location at Choice Point
During VTE events at the choice point, HC neural activity tends to represent paths ahead of the rat (Gupta et al. 2010; Johnson and Redish 2007), while dorsolateral striatum activity does not (van der Meer et al. 2010). Johnson and Redish (2007) did not find any reliable relationship between the chosen side and the neural activity during these VTE events, but other studies have found representations of paths to chosen goals during more automated behaviors (Gupta et al. 2010; Wikenheiser and Redish 2015b). To compare the forwardness of spatial representations in aDLS, pDMS, and CA1, we used a Bayesian spatial decoding algorithm, which estimates the rat's location from ensemble spiking activity (Zhang et al. 1998).
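The one-step Bayesian decoder of Zhang et al. (1998), with a flat spatial prior, can be sketched as follows; counts are spikes per cell in a decoding window of width tau, tuning is the matrix of occupancy-normalized rate maps, and the specific variable names are illustrative:

```python
import numpy as np

def bayes_decode(counts, tuning, tau=0.25):
    # counts: (n_cells,) spike counts in the decoding window of width tau (s).
    # tuning: (n_cells, n_pos) firing-rate maps (Hz).
    # Posterior (flat prior): P(x|n) ∝ prod_i f_i(x)^n_i * exp(-tau * sum_i f_i(x)).
    eps = 1e-9                          # avoid log(0) where a cell is silent
    log_post = counts @ np.log(tuning + eps) - tau * tuning.sum(axis=0)
    log_post -= log_post.max()          # numerical stability before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

Summing the resulting posterior over the positions forward of the choice point gives the forwardness measure used below.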
We decoded position as animals proceeded through a VTE event and found that the HC representation swept ahead of the animal to the two options. Figure 4B shows average decoding across a single non-VTE and a single VTE lap. Decoding on both laps shows mostly local decoding with forward components, consistent with previous experiments (Gupta et al. 2012; Johnson and Redish 2007; Wikenheiser and Redish 2015b). Also consistent with those previous experiments, decoding on the non-VTE lap proceeded primarily toward the chosen side, representing the current goal of the animal, while decoding on the VTE lap included decoding to both sides (Johnson and Redish 2007). This decoding analysis was performed as an average over the entire pass because we were interested specifically in how much decoding went ahead of the animal to the unchosen side. Examination of single theta cycles found that the representation of each side occurred serially, consistent with previous experiments (Johnson and Redish 2007; data not shown).
To examine the forwardness of representation at the choice point in each structure, we calculated the sum of the decoding probability in forward paths from the choice point (Fig. 4). A one-way ANOVA found a significant effect of regions [F(3) = 13.525, P < 0.0001]. Multiple comparisons with Bonferroni-corrected paired t-tests found higher forwardness in CA1 than in aDLS (P < 0.0001) and pDMS (P < 0.0001).
To address whether the forward representation reflected the succeeding choice, we calculated the difference of decoding probability between the chosen side and the unchosen side (chosen − unchosen). Because VTE has been found to anticorrelate with automation (Smith and Graybiel 2013), we separately examined the difference of the forward representation in VTE and non-VTE laps. One-sample t-tests revealed that CA1 represented the chosen side more than the unchosen side on non-VTE laps (P < 0.05), but this difference was not observed on VTE laps. Both striatal sets (aDLS and pDMS) showed higher forwardness on the chosen side more than on the unchosen side regardless of whether the lap included VTE or not (P < 0.005). The lack of representation of the chosen side in our HC recordings specifically during VTE laps is consistent with previous experiments (Johnson and Redish 2007).
Development of Ensemble Firing in Dorsolateral Striatum That Tracks Behavioral Performance
To examine whether neuronal ensembles from the different structures dynamically changed along with behavioral performance, we measured the average firing rate of each structure over laps. This measure of general neuronal activity revealed differences between the aDLS and the rest of the structures. Only in aDLS was there an increase in average firing rate before and after the switch (Fig. 5).
To obtain a more accurate measure of where this change was occurring, we adopted a method used by Graybiel and colleagues (Barnes et al. 2005; Jog et al. 1999; Smith and Graybiel 2013; Thorn et al. 2010), measuring firing rate at several maze events over laps. Specifically, we identified key maze locations (start zone, navigation sequence, choice point, feeder cue, feeders, return arm, and end zone) and obtained the firing rate ±1 s around entry into each maze location. We then plotted firing rate around each maze location across laps before and after the switch (for example, see Fig. 6). In the aDLS, there was an increase in firing rate primarily at the end zone, which marked the end and beginning of each lap (Fig. 6). We did not observe noticeable changes at any of the maze locations in any of the other structures.
Development of Task Bracketing Within Session in aDLS but Not pDMS or HC
Development of aDLS firing rate at the beginning and end of action sequences is similar to the effect seen by Graybiel and colleagues (Barnes et al. 2005; Jog et al. 1999; Smith and Graybiel 2013; Thorn et al. 2010), called task bracketing. Graybiel and colleagues observed that firing rate in the aDLS increasingly “bracketed” action chains as experience and performance increased across several sessions on a cued T-maze, a pattern recently reported to underlie habit-based behavior (Smith and Graybiel 2013). Our task allowed for the development of automated behavior within a single session, behavior that was disrupted by the contingency switch until subjects readjusted to the new contingency and then, again, automated their behavior. Thus we applied their task-bracketing index (see methods; compare Smith and Graybiel 2013) to examine whether task bracketing would develop within session, before and after the contingency switch.
We found that a development of task bracketing was evident only in aDLS (Fig. 6). A two-way ANOVA [region (aDLS vs pDMS) × laps (early vs late)] revealed a significant main effect of region before [F(1) = 6, P = 0.0144] and after [F(1) = 5.57, P = 0.0184] the switch and a significant interaction of region × laps [F(1) = 4.43, P = 0.0354] before the switch. No significant task bracketing was found in the HC recordings.
Development of task bracketing in the aDLS tracked behavioral performance, such that as a subject's performance increased within session so did task bracketing in the aDLS (a Bonferroni-Holm-corrected paired t-test showed that aDLS late laps > aDLS early laps before the switch, P = 0.013). When performance decreased after the change of reward contingency was introduced, so too did task bracketing in the aDLS (a Bonferroni-Holm-corrected paired t-test showed that aDLS late laps before the switch > aDLS early laps after the switch, P = 0.001). These effects were not observed in any of the other structures. On late laps, aDLS task bracketing was higher than pDMS task bracketing before (P < 0.0001) and after (P = 0.009) the switch of reward contingency.
To investigate whether this was a general effect, we applied the task-bracketing index to the other maze locations (Fig. 7). In the HC, there were no significant increases or decreases at any of the additional locations. In the striatum, there were no additional significant interactions, such that the rate of firing did not develop or decline over laps in either the aDLS or the pDMS; however, there were several instances where one region or another showed overall higher task bracketing at a given location. Specifically, a main effect of region was found at the navigation sequence, where pDMS firing rate was greater than aDLS, both before [F(1) = 9.71, P = 0.0019] and after [F(1) = 4.52, P = 0.0336] the switch, at the choice point both before [F(1) = 10.32, P = 0.0013] and after [F(1) = 30.74, P < 0.0001] the switch, and at the feeder entry both before [F(1) = 48.05, P < 0.0001] and after [F(1) = 77.35, P < 0.0001] the switch. In addition to the overall elevated firing of aDLS compared with pDMS at the end zone (see above), a main effect of region was found at the feeder exit both before [F(1) = 77.49, P < 0.0001] and after [F(1) = 96.41, P < 0.0001] the switch. These additional findings indicate that, while pDMS showed elevated firing at events in the middle of the maze and aDLS showed elevated firing at the feeder exit and end zone, only at the end zone (the maze location that marked the beginning and end of each lap) was there a development of aDLS task bracketing.
Individual Neurons from Different Regions Displayed Distinct Patterns of Firing Rate Response
To understand how the different structures were responding to the reward contingency change in the task, we examined how the firing pattern of individual neurons changed with the contingency change (Figs. 8–10). To do this, we created tuning curves from individual neurons by linearizing the maze and measuring neuronal response on a lap-by-lap basis. In aDLS, cell responses tended to be biased to one side of the maze or the other (Fig. 8). In pDMS, firing rate appeared to be altered by the change of contingency, introduced approximately midway through the session, regardless of maze side (Fig. 9). In CA1, the location of the animal on the maze appeared to govern the neuronal response (Fig. 10).
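The lap-by-lap tuning-curve construction described above can be sketched roughly as follows. This is a minimal illustration under assumed data structures, not the authors' code; the function name, bin count, and toy data are ours.

```python
import numpy as np

def lap_tuning_curves(spike_pos, occupancy_pos, n_laps, n_bins=50, track_len=1.0):
    """Build a lap-by-lap tuning curve (firing-rate map) for one neuron.

    spike_pos     : list (one entry per lap) of linearized positions at spike times
    occupancy_pos : list (one entry per lap) of linearized position samples
    Returns an (n_laps, n_bins) array of rates in spikes per position sample."""
    edges = np.linspace(0.0, track_len, n_bins + 1)
    curves = np.zeros((n_laps, n_bins))
    for lap in range(n_laps):
        spikes, _ = np.histogram(spike_pos[lap], bins=edges)
        occ, _ = np.histogram(occupancy_pos[lap], bins=edges)
        curves[lap] = spikes / np.maximum(occ, 1)  # avoid divide-by-zero in unvisited bins
    return curves

# toy example: a "place cell" firing near linearized position 0.5 on every lap
rng = np.random.default_rng(0)
occ = [np.linspace(0, 1, 200) for _ in range(4)]
spk = [rng.normal(0.5, 0.05, size=30) for _ in range(4)]
curves = lap_tuning_curves(spk, occ, n_laps=4)
print(curves.shape)  # (4, 50)
```

Stacking such per-lap rows (one per lap, ordered within the session) yields the kind of lap-by-lap tuning-curve plots referred to in Figs. 8-10.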
To investigate whether firing patterns of individual neurons differed between structures, we calculated, for each neuron, correlations of firing patterns across changes in goal (left- vs. right-side responding) and across changes in reward contingency (pre- vs. postswitch). Specifically, a correlation coefficient was obtained for before- vs. after-switch laps and for left vs. right laps. To control for lap side and contingency switch, we analyzed correlation coefficients in four conditions (left laps before the switch vs. left laps after the switch, right laps before the switch vs. right laps after the switch, left vs. right laps before the switch, and left vs. right laps after the switch). We plotted these on a two-dimensional plane to observe interactions between maze side and the change of contingency. aDLS neurons displayed different patterns of firing for left and right laps more than pDMS and HC neurons did.
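The four-condition comparison amounts to Pearson correlations between a neuron's average tuning curves in each condition. A minimal sketch (the data layout, names, and toy neuron are ours; the toy cell is contingency-sensitive but side-invariant, the pattern described below for pDMS):

```python
import numpy as np

def condition_correlations(tuning):
    """tuning: dict keyed by (side, epoch) with side in {'L','R'} and epoch in
    {'pre','post'}, each value an average tuning curve (1-D array of firing
    rates over linearized maze bins). Returns Pearson r for the four
    comparisons described in the text."""
    r = lambda a, b: np.corrcoef(tuning[a], tuning[b])[0, 1]
    return {
        "L pre vs L post": r(("L", "pre"), ("L", "post")),
        "R pre vs R post": r(("R", "pre"), ("R", "post")),
        "L vs R pre":      r(("L", "pre"), ("R", "pre")),
        "L vs R post":     r(("L", "post"), ("R", "post")),
    }

# toy neuron: identical tuning on left and right laps, but shifted after the switch
x = np.linspace(0, 1, 50)
pre = np.exp(-((x - 0.3) ** 2) / 0.01)
post = np.exp(-((x - 0.6) ** 2) / 0.01)
cc = condition_correlations({("L", "pre"): pre, ("L", "post"): post,
                             ("R", "pre"): pre, ("R", "post"): post})
```

For this toy cell the left-vs-right correlations are near 1 while the pre-vs-post correlations are low; plotting the two kinds of coefficient against each other on a plane separates side-selective from contingency-selective cells.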
Correlation coefficients of aDLS cells were consistently lower when correlating firing rates on left vs. right laps, with a broad range of correlation coefficients for before- vs. after-switch firing rates in all four conditions (Fig. 11). Compared with pDMS and CA1, correlation coefficients for left vs. right firing rates were lowest in aDLS (P < 0.0001, compare Figs. 12 and 13, see Fig. 14). The HC had several place cells with low correlations between left and right laps; however, these cases all occurred when the place field was on a return path. Whenever a place cell was located on the navigation sequence, it appeared that firing rate was consistent between left and right laps as well as before and after the switch (Fig. 10).
Previous experiments have reported that HC place cells sometimes change their place fields (remap) or the firing rate within a given field (rate modulation) on shared paths. Cells showing such changes are called "splitter cells" because they split the cell's spatial response by a nonspatial trajectory component (Catanese et al. 2014; Wood et al. 2000). In CA1, we observed only a few examples of place field remapping on the navigation sequence (2/17 CA1 cells with place fields on the navigation sequence), consistent with our observations of the tuning curve plots. The other cells whose place fields were at the same location showed minimal rate modulation either between right and left turns [ratio of firing rate, 1.77 ± 1.32 SD, not significant (NS)] or between before and after the switch of reward contingency (1.92 ± 0.73 SD, NS). For the splitter-cell and rate-modulation analyses, HC and striatal (see below) results were analyzed by comparing the ratio of firing rates to 1, since an absence of change in firing rate would equal a 1-to-1 ratio. t-Tests were corrected with Bonferroni statistics [0.05/n (regions)].
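The rate-modulation test described above (comparing per-cell firing-rate ratios against 1 with a Bonferroni-corrected one-sample t-test) can be sketched as follows. The function name and example rates are ours; the 0.05/n threshold follows the across-regions Bonferroni correction described in the text.

```python
import numpy as np
from scipy import stats

def rate_modulation_test(rates_a, rates_b, n_regions=3, alpha=0.05):
    """Test whether per-cell firing-rate ratios differ from 1 (no modulation).

    rates_a, rates_b : per-cell mean in-field firing rates in two conditions
                       (e.g., left vs. right laps). The ratio is taken with
                       the larger rate in the numerator, so 1 means no
                       modulation. Significance is Bonferroni-corrected
                       across brain regions (alpha / n_regions)."""
    a, b = np.asarray(rates_a, float), np.asarray(rates_b, float)
    ratios = np.maximum(a, b) / np.minimum(a, b)
    t, p = stats.ttest_1samp(ratios, popmean=1.0)
    return ratios.mean(), ratios.std(ddof=1), p < alpha / n_regions

# toy demo: four cells with strong left/right rate modulation (illustrative values)
mean_r, sd_r, sig = rate_modulation_test([10, 12, 9, 11], [4, 5, 3.5, 4.2])
```

Taking the larger rate as the numerator makes the ratio direction-agnostic, so the test asks only whether firing rate changed, not in which condition it was higher.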
To compare striatal responses to task and behavioral changes, we applied the same splitter-cell analyses to the striatal data. In both aDLS and pDMS, almost half of the cells on the navigation sequence showed some trajectory-related modulation: in aDLS, 15 of 37 cells changed their maze-related response on the navigation sequence, and in pDMS, 7 of 16 cells changed their response. A χ2-test found a trend toward a main effect of region on the number of splitter cells in aDLS, pDMS, and HC, but it did not reach significance [χ2(2) = 5.106, P = 0.08]. Of the cells that did not change their maze responses, aDLS showed more rate modulation between left and right turns than either HC or pDMS [ratio of firing rate, 2.63 ± 1.85 SD, t(21) = 4.14, P < 0.008], while pDMS showed less (ratio of firing rate, 1.25 ± 0.25 SD, NS). A one-way ANOVA showed a main effect when comparing the ratio of firing rates between left and right laps across all regions [F(2) = 3.17, P = 0.05]. On the other hand, pDMS appeared to show more rate modulation between before and after the switch of reward contingency. Surprisingly, higher variation in pDMS resulted in nonsignificant rate modulation (2.37 ± 2.06 SD pDMS, NS), while rate modulation in aDLS was significant [1.97 ± 1.11 SD aDLS, t(15) = 3.47, P < 0.008]; however, a one-way ANOVA did not show a main effect when comparing the ratio of firing rates across all regions before and after the switch of reward contingency [F(2) = 0.35, P = 0.71].
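The reported χ2 statistic can be reproduced from the splitter-cell counts given in the text (aDLS 15/37, pDMS 7/16, and CA1 2/17, taking the CA1 remapping count above as the HC splitter count). A sketch using SciPy:

```python
import numpy as np
from scipy import stats

# rows: aDLS, pDMS, CA1; columns: splitter cells, non-splitter cells
table = np.array([[15, 37 - 15],
                  [7, 16 - 7],
                  [2, 17 - 2]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")  # close to the reported chi2(2) = 5.106, P = 0.08
```

This kind of contingency-table test asks whether the proportion of splitter cells differs across regions, treating each recorded cell as an independent observation.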
During probe sessions, we introduced a change to the task requirements for reinforcement that brought about a change in behavioral strategy (Fig. 3). If an area partially mediated this behavioral change, we would expect its neuronal responses to reflect the strategy shift. Observation of individual neurons found that pDMS neurons showed different firing rates before and after the switch (Fig. 12). Results from the correlation analysis showed that firing rate on before- vs. after-switch laps was altered more in pDMS than in aDLS and CA1 (lower correlations, Fig. 14, for each comparison P < 0.0001). Although the firing rates of many pDMS neurons were equally correlated on left vs. right laps and before- vs. after-switch laps, a number of pDMS neurons were less correlated on before- vs. after-switch laps than on left vs. right laps (Fig. 12), and significantly more so than in any other structure (Fig. 14). This was the case even when subjects were performing the same actions during the same sequences, as evidenced by example cells showing differential firing patterns before and after the switch at a location of the maze where the rat performed similar movements (Fig. 12).
Neurons in pDMS were less likely than aDLS neurons to change their neuronal response based on lap side (Fig. 14). Whereas aDLS neurons have been shown to respond to specific actions, such as taking a left turn or arriving at the left feeder, results from the present study indicate that pDMS neurons may encode specific strategies, reflecting the current action-reward contingency. Although some reports have suggested that HC neuronal activity changes in response to a behavioral strategy shift, we did not observe this: neurons in CA1 did not remap to new locations after the contingency change (Fig. 13). HC cells did show some rate modulation (Fyhn et al. 2007) on the navigation sequence across the contingency changes, but the modulation levels were not significant. Thus, unlike those of pDMS neurons, the changes in CA1 tuning curves were likely a consequence of the change in behavior (being more likely to go right vs. left or vice versa) and did not reflect the behavioral strategy change. This is likely due to the animals having been trained to expect multiple reward contingencies within a single environment (Fuhs 2006).
Current theories of decision making suggest that there are multiple decision making processes, including goal-oriented (action-outcome, deliberative) and habit-based (chunked action chains, procedural) processes, that accomplish tasks through different information processing algorithms instantiated in different interacting anatomical structures. This hypothesis implies that different structures should provide different representations and that those representations should reflect their different information processing. Current theories suggest that HC plays an important role in the goal-oriented system, while aDLS plays an important role in the habit-based system. Although lesion data have suggested a role for pDMS in the goal-oriented system, neural ensembles therein have not been explored.
We found marked differences in aDLS, pDMS, and HC neuronal responses to a reward contingency change on a spatial navigation task on which rats automated their behavior, reverted to a goal-oriented decision process, and then reautomated their behavior. Neuronal firing in pDMS reflected changes in reward contingency, more so than either aDLS or HC (CA1) neurons. In contrast, aDLS developed firing at the beginning and end of laps that tracked behavioral performance on the task (task bracketing). CA1 tuning curves did not appear to change with changes in behavior. Instead, CA1 neurons displayed typical place cell activity at different locations on the maze that remained stable within sessions, consistent with a cognitive map that had already been formed for this well-learned task.
Looking at how information changed through the decision making process, neuronal ensembles recorded from CA1 showed more forward representation at the choice point compared with aDLS and pDMS. Importantly, CA1 represented sides equally on VTE laps but not non-VTE laps, while aDLS and pDMS represented the chosen side more than the unchosen side on all laps, suggesting that CA1 reflected the searching process itself, while the striatal representations reflected the selected action. On non-VTE laps, we expect that the rat was already aware of its target destination, and thus CA1 ensembles reflected the path to the current goal of the rat (Gupta et al. 2012; Wikenheiser and Redish 2015a, 2015b). However, on VTE laps, we expect that the rat was deliberating over multiple possibilities and the CA1 ensembles reflected the search process examining both goals. This distinction is consistent with previous observations in HC neural ensembles (Johnson and Redish 2007). Consistent with previous experiments, aDLS ensembles did not show strong forward representations (van der Meer et al. 2010), and what little forward information they did represent reflected the chosen option. Interestingly, although pDMS has been identified as playing an important role in goal-oriented (flexible, deliberative) decision processes (Lex and Hauber 2010a, 2010b; Yin et al. 2005a, 2005b; Yin and Knowlton 2004), the pDMS ensembles appeared more like aDLS ensembles than HC ensembles, with limited forward representations and a preference for the chosen side, even on VTE laps.
Recognizing a change in the environment and adjusting behavior appropriately is essential for survival. Different cortical substrates are involved in different environmental/reward-related behavioral changes. For example, adjusting behavioral strategies, reversal learning, and contingency degradation are all mediated by different cortical areas (Corbit et al. 2002; Lex and Hauber 2010a, 2010b; Lucantonio et al. 2014; Ragozzino 2007; Schoenbaum et al. 2002). Common to the evaluation of each change to the environment is the necessity of flexibly associating the outcome with the preceding action. In the HWM-LRA task, this entails the recognition that an action no longer leads to reward and the subsequent adjustment of behavior. Several recent studies have implicated the pDMS and its related input structures in mediating these sorts of reward-related behavioral shifts (Corbit et al. 2002; Izquierdo et al. 2004; Killcross and Coutureau 2003; Lex and Hauber 2010a, 2010b; Shiflett et al. 2010; Yin et al. 2005a, 2005b), making pDMS/orbitofrontal and pDMS/prelimbic circuits important for altering behavior when an unexpected variation occurs in the environment. Recent studies indicate that cortical areas may evaluate state changes (for reviews, see Lucantonio et al. 2014; Ragozzino 2007; Torregrossa et al. 2008) and that pDMS may integrate information from cortical areas into appropriate actions (Kimchi and Laubach 2009; Stalnaker et al. 2012). Results from the present study support this idea, with pDMS neuronal patterns reflecting different behavioral strategies more than aDLS or HC neurons.
Under stable reward delivery contingencies, as the animal realizes that the same actions consistently lead to desired outcomes, goal-directed behavior typically transitions to more automated behavior. Goal-directed behavior is cognitively intensive, since planning for future outcomes is a computationally expensive operation that must occur before action selection. Automating behavior is a way to optimize benefit from an environment. Thus situations (e.g., stimuli) associated with actions that have consistently led to reinforcement eventually come to release appropriate action chains (Adams 1982; Daw et al. 2006; Dezfouli et al. 2014; van der Meer et al. 2012). These stimulus-action associations are cached and controlled by sensorimotor circuits in the basal ganglia, such as the aDLS (Everitt and Robbins 2013; Graybiel 1998, 2008; Hikosaka et al. 2002; Miyachi et al. 2002; Packard and McGaugh 1996; Yin and Knowlton 2006).
Previous studies have reported the reorganization of neuronal activity in the aDLS with increased experience on a task (Barnes et al. 2005; Jog et al. 1999; van der Meer et al. 2010; Thorn et al. 2010). Graybiel and colleagues have reported increases in neuronal activity across several stages of training at the beginning and end of action sequences (task bracketing) on a cued T-maze, recently reported to underlie habit-based behavior (Smith and Graybiel 2013). We found that task bracketing in the aDLS developed along with increased performance within the task but was disrupted when a change of reward contingency was introduced, correlating with both decreased performance and increased revaluation.
Studies in primates suggest anatomical and functional differences between the rostral/caudal regions of the caudate and putamen (Miyachi et al. 1997, 2002), similar to the dorsolateral vs. dorsomedial striatal differences in rodents (Devan et al. 1999; Yin and Knowlton 2004, 2006), with recent rodent studies suggesting more of a role of posterior dorsomedial (pDMS) than anterior dorsomedial (aDMS) in flexible learning (Yin et al. 2005b; Yin and Knowlton 2004). In primates, a recent study reported that the head of the caudate is important for flexible learning and the tail is important for more stable information processing (Kim et al. 2014); our results suggest more stable processing in the rodent aDLS and more flexible processing in the rodent pDMS.
Previous work has reported that both HC (O'Keefe and Nadel 1978; Redish 1999) and aDLS (Mizumori et al. 2004; Schmitzer-Torbert 2004; Yeshenko et al. 2004) neurons were spatially tuned on a task in which spatial cues provided information about how to obtain rewards; however, on a spatial task in which spatial cues did not provide information about how to obtain rewards, only HC neurons were spatially tuned (Berke et al. 2009; Wikenheiser and Redish 2011), while aDLS neurons were not (Berke et al. 2009; Schmitzer-Torbert and Redish 2008). We found that HC and aDLS neurons responded in a similar fashion to spatial context in the present study, such that many neurons in both structures were spatially tuned to one side of the maze or the other. Neurons in the aDLS had more of a tendency to respond differently to left and right laps than neurons in any of the other structures, even on the navigation sequence. In contrast, HC place fields only differentiated left and right laps when the fields were on the return arms. This is inconsistent with previous studies that have found rate modulation and splitter cells (Ferbinteanu and Shapiro 2003; Frank et al. 2000; Wood et al. 2000) and may be due to differences in training and proficiency on the tasks, since the differentiation of activity by context correlates with task performance (Ferbinteanu and Shapiro 2003). Further differentiation of aDLS and HC was evident in the task-bracketing index measure, where only aDLS (and not HC) neurons developed task bracketing. Interestingly, pDMS did not develop task bracketing either, suggesting that among these three structures aDLS plays a unique role in the action chunking of the habit-based (procedural) decision system.
Our results suggest that separate circuits in the basal ganglia integrate relevant cortical information during automation of actions and the recognition of when new behaviors are needed to continue obtaining rewards. Subregions of the dorsal striatum, such as the aDLS and pDMS, integrated different information, with aDLS neurons developing bracketing patterns of firing along with behavioral performance and pDMS correlating with changes in the reward delivery contingency. HC neurons played a different role entirely, with an already available cognitive map (on this well-learned maze), on which search processes could play out. Interestingly, these search processes were not seen in either aDLS or pDMS.
Funding for this work was provided by National Institutes of Health (NIH) Grants MH-080318 and DA-030672 (A. D. Redish), a training fellowship on NIH T32 Grant DA-007234 (P. S. Regier), and Japan Society for the Promotion of Science (JSPS) KAKENHI-11J06508 (S. Amemiya).
No conflicts of interest, financial or otherwise, are declared by the author(s).
Author contributions: P.S.R., S.A., and A.D.R. conception and design of research; P.S.R. and S.A. performed experiments; P.S.R., S.A., and A.D.R. analyzed data; P.S.R., S.A., and A.D.R. interpreted results of experiments; P.S.R., S.A., and A.D.R. prepared figures; P.S.R., S.A., and A.D.R. drafted manuscript; P.S.R., S.A., and A.D.R. edited and revised manuscript; P.S.R., S.A., and A.D.R. approved final version of manuscript.
Present address of P. S. Regier: Center for Studies of Addiction, Department of Psychiatry, University of Pennsylvania, Philadelphia, PA.
- Copyright © 2015 the American Physiological Society