Operant conditioning of neural activity has typically been performed under controlled behavioral conditions using food reinforcement. This has limited the duration and behavioral context for neural conditioning. To reward cell activity in unconstrained primates, we sought sites in nucleus accumbens (NAc) whose stimulation reinforced operant responding. In three monkeys, NAc stimulation sustained performance of a manual target-tracking task, with response rates that increased monotonically with increasing NAc stimulation. We recorded activity of single motor cortex neurons and documented their modulation with wrist force. We conditioned increased firing rates with the monkey seated in the training booth and during free behavior in the cage using an autonomous head-fixed recording and stimulating system. Spikes occurring above baseline rates triggered single or multiple electrical pulses to the reinforcement site. Such rate-contingent, unit-triggered stimulation was made available for periods of 1–3 min separated by 3–10 min time-out periods. Feedback was presented as event-triggered clicks both in-cage and in-booth, and visual cues were provided in many in-booth sessions. In-booth conditioning produced increases in single neuron firing probability with intracranial reinforcement in 48 of 58 cells. Reinforced cell activity could rise more than five times that of non-reinforced activity. In-cage conditioning produced significant increases in 21 of 33 sessions. In-cage rate changes peaked later and lasted longer than in-booth changes, but were often comparatively smaller, between 13 and 18% above non-reinforced activity. Thus intracranial stimulation reinforced volitional increases in cortical firing rates during both free behavior and a controlled environment, although changes in the latter were more robust.
NEW & NOTEWORTHY Closed-loop brain-computer interfaces (BCI) were used to operantly condition increases in muscle and neural activity in monkeys by delivering activity-dependent stimuli to an intracranial reinforcement site (nucleus accumbens). We conditioned increased firing rates with the monkeys seated in a training booth and also, for the first time, during free behavior in a cage using an autonomous head-fixed BCI.
- intracranial reinforcement
- operant conditioning
- neural activity
- free behavior
Volitional control of neural activity is critical for reliable and robust control of brain-machine interfaces (BMI). Indeed, BMIs can be seen as a form of neurofeedback that allows the user to see the consequences of neural activity and change that activity to optimize control of the external device (Fetz 2007). However, BMI control is only a subset of the possible range of volitional control of neural activity that can be explored directly with operant conditioning. Traditional techniques for operant conditioning of behavior in monkeys have limited the scope of investigation to specific tasks, using food reward and visual feedback delivered in a training booth. Constrained, task-related movements differ from natural behavior, and correlations between neural activity and movement established under particular task conditions may not hold under nontask conditions (Aflalo and Graziano 2006; Caminiti et al. 1990; Jackson et al. 2007). The vast majority of nonhuman primate research involving trained behavior has employed rewards in the form of food or water (Carmena et al. 2003; Jackson et al. 2006; Taylor et al. 2002), further limiting the circumstances in which neural activity was explored. Here we present a novel mechanism for rewarding neural activity during natural behavior using a closed-loop system delivering neurally contingent brain stimulation reward (BSR).
Olds and Milner (Olds 1958; Olds and Milner 1954) demonstrated that rats would press bars and navigate mazes for BSR, which could reinforce operant responding as effectively as more conventional food and liquid rewards. Later work by David Hiatt (1972) attempted to condition increases in single-unit activity using burst-triggered BSR in rats. As candidates for conditioning, he sought cells in hippocampus, cerebellum, midbrain and superior colliculus that were not movement related. Recently, BSR was used to elicit rate increases in prefrontal cortex neurons of freely behaving rats (Widge and Moritz 2014). The ability of freely moving rats to differentially control small groups of cortical neurons was demonstrated with food reward and continuous auditory feedback (Koralek et al. 2012).
Several studies have explored the efficacy of BSR in nonhuman primates. In a freely behaving chimpanzee, Delgado et al. (1970) deployed wireless closed-loop stimulation of reticular formation sites contingent on oscillations in amygdala field potentials. The triggering neural oscillations disappeared after a day of activity-dependent stimulation, indicating that this form of stimulation was aversive. Later work showed that monkeys will perform simple bar-press tasks for BSR in several structures, including the orbitofrontal cortex, lateral hypothalamus, amygdala, medio-dorsal nucleus of the thalamus and nucleus accumbens (NAc) (Bichot et al. 2011; Bowden et al. 2015; Briese and Olds 1964; Rolls et al. 1980; Routtenberg et al. 1971).
An interesting open question is whether monkeys can learn to control activity of single neurons with intracranial electrical stimulation as the sole source of reinforcement. This would allow operant conditioning to be performed during prolonged periods of free behavior, providing extended time and behavioral range to learn volitional control of neural response patterns. BSR would enable delivery of reinforcement that is temporally more precise than food or water rewards, and less disruptive of ongoing behavior. In this study, we sought to operantly condition activity of motor cortex neurons and electromyographic (EMG) activity of proximal limb muscles, using activity-contingent BSR at sites confirmed to sustain behavior in a target-tracking task. To compare the effects of the environment, we conditioned these activities both in the training booth and as the monkeys moved freely about their home cage.
Subjects and Training
We used three male Macaca nemestrina monkeys: P, D and J (4–6 yr old, weight 6.0, 5.6 and 4.0 kg). All surgical, training and handling procedures were approved by the University of Washington Institutional Animal Care and Use Committee.
Before surgeries, monkeys were trained to perform a one-dimensional center-out force target-tracking (FTT) task in which isometric wrist torque controlled the position of a cursor on a screen. When the cursor entered a target and remained inside for the required time (≤1 s), a fruit sauce reward signaled completion of the trial. Target placement on the screen determined the required direction and magnitude of flexion or extension torque about the wrist. Peripheral targets were presented in random order with equal probability. Training was complete when monkeys moved directly from center to each target and held it inside for at least 1 s. During experiments, the FTT task was performed daily to elicit task-related cell firing in motor cortex.
Surgery and Implantation
Cranial microwires and arrays of up to 16 cannulas were implanted in each monkey. The microwire arrays (Jackson and Fetz 2007) were positioned to advance along layer V in the caudal bank of the precentral gyrus, where somata of many force-correlated cells (including corticomotoneuronal cells) have been identified (Rathelot and Strick 2009; Smith and Fetz 2009). The cannulas were positioned stereotaxically to guide subsequent stimulating electrodes to the NAc. Cannula-length stylets were placed in all guide tubes, and the protruding surface of the array was sealed in silastic. The open space between craniotomy and array was packed with antibiotic-infused gelfoam. An acrylic base around the implantation site and surrounding cranial screws formed the base for a cylindrical titanium chamber enclosing the microwire cannula arrays and neurochip (NC) (Zanos et al. 2011). Rhodes SNEX-100 concentric bipolar electrodes were inserted subsequently into the cannulas after cold-sterilization of the chamber interior and electrodes with cidex.
To identify potential intracranial reinforcement electrode implant sites, we coregistered a magnetic resonance image (MRI) and digitized brain atlas data (National Primate Research Center, 1991-present; Bowden et al. 2017) to determine the stereotaxic coordinates of prospective midbrain reinforcement loci (Fig. 1). Monkey P underwent MRI scanning before surgical implantation. Monkeys D and J were similar in size to the atlas subjects, so MRIs were not deemed necessary. We selected coronal image slices located 3 mm rostral to the anterior commissure that contained the largest cross section of the NAc. Stereotaxic coordinates of the target locus were measured relative to medial-lateral center and ear-bar zero. A straight-line diagonal path to the target locus (center of NAc) that was 15° lateral right with respect to the dorsoventral axis in the right hemisphere avoided major blood vessels and regions governing autonomic function. To address the possibility of positioning error of entry sites, we implanted an array of 16 parallel cannulas spaced 1–1.5 mm apart in a 10 × 10 mm grid, centered at the best point of entry. Thus, in cases of slight misalignment of angle or entry location, the target locus might still be reachable by an electrode inserted in one of the neighboring cannulas. Following implantation, unused cannulas were occluded with stylets and sealed with silastic to block potential cranial infection.
In monkey P, in addition to cranial implant procedures, we implanted pairs of EMG wires in three proximal muscles of the monkey’s right arm: the biceps brachii, triceps brachii and lateral deltoid. Muscle activity was first operantly conditioned to verify efficacy of BSR in free behavior. The EMG wires were routed subcutaneously around the shoulder, up the back and neck and terminated in connectors located inside the cranial chamber for signal processing by the NC.
Verification of BSR
To identify intracranial brain sites whose stimulation sustains operant responding, we compared response rates occurring during reinforcement (R) and visual feedback-only (FO) blocks in an FTT task. During R blocks, each completed flexion or extension target hold triggered BSR. In FO blocks, no stimulation was delivered, regardless of task performance, but the FTT task could be performed. R and FO blocks were interleaved with non-reinforcing (NR) blocks in which neither feedback nor reward was available. Stimulation consisted of trains of symmetric biphasic square-wave current pulses. A low-frequency tone during R blocks served as a discriminative stimulus (in addition to FTT task auditory cues for target acquisition). Candidate sites were considered to be “positively reinforcing” when monkeys performed wrist FTT at a significantly greater rate during R blocks than during FO blocks.
Rate-Contingent Spike-Triggered Stimulation
Validated BSR sites were used to operantly condition cortical cell and muscle activity in two different settings: a traditional in-booth setting using rack-mounted equipment for recording and stimulation, and an in-cage setting using the NC system (Fig. 2, A and B). The NC employs an autonomous, battery-powered computer chip programmed to detect and reward cell and muscle activity while monkeys moved freely about their cages (Mavoori et al. 2005). It discriminated cortical cell or EMG activity patterns using dual time-amplitude window discrimination and delivered stimuli contingent on discriminated events in real time. The high-voltage neurochip 2 (NC2-HV) is a second-generation version with improved capabilities for storage, processing and stimulus range (Zanos et al. 2011). Alternating R/NR reinforcement schedules were used to distinguish the effects of BSR in the operant conditioning paradigm. FO blocks were not used during these experiments. The in-booth experiments utilized audio and visual feedback to distinguish between the periods, whereas the in-cage experiments relied solely on audio feedback. The in-booth experiments lasted between 1 and 6 h, whereas the in-cage free-behavior sessions lasted considerably longer: 3–20 h.
During alternating R/NR conditioning, we approximated instantaneous firing rate in real-time using two methods, depending on the environment (Fig. 2). For most in-booth sessions, spikes were discriminated with two time-amplitude windows, and each spike event triggered a 1-ms-wide square pulse. The pulse train output (Fig. 2C, bottom) was low-pass filtered (τ = 50 ms) and amplified using an analog leaky integrator. These operations produced a continuous signal (Fig. 2C, green trace) that controlled cursor movements on the display in front of the animal, providing visual feedback of rate relative to target (Fig. 2A). When the activity-controlled cursor entered the target, all subsequent in-target spike events triggered stimulation of the reinforcement site. Stimulation events were often also used to trigger auditory clicks. We initially set the target position just above baseline firing rate and gradually raised its position over the course of conditioning to elicit higher spike rates. Targets were presented only during R periods of the alternating R/NR task.
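The in-booth rate estimate described above can be illustrated with a discrete-time sketch. The original used an analog leaky integrator; the 1-ms simulation step, the unit-area pulse scaling (chosen so the output reads directly in Hz) and the function name are assumptions made for illustration only.

```python
def leaky_integrator_rate(spike_times_s, duration_s, tau_s=0.050, dt_s=0.001):
    """Low-pass filter a spike-triggered pulse train to approximate firing rate (Hz).

    Assumption: each spike becomes one 1-ms pulse of unit area, so the
    steady-state filter output equals the mean spike rate in Hz.
    """
    n_steps = int(round(duration_s / dt_s))
    pulses = [0.0] * n_steps
    for t in spike_times_s:
        idx = int(round(t / dt_s))
        if 0 <= idx < n_steps:
            pulses[idx] = 1.0 / dt_s  # 1-ms-wide square pulse, unit area
    alpha = dt_s / tau_s  # first-order low-pass coefficient for tau = 50 ms
    rate, y = [], 0.0
    for x in pulses:
        y += alpha * (x - y)  # leaky integration step
        rate.append(y)
    return rate
```

In such a sketch, the returned trace plays the role of the cursor signal: a spike would be eligible for reinforcement only while the trace sits inside the target range.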
For in-cage sessions (Fig. 2B), we preprogrammed the NC to perform a real-time sliding window operation to estimate instantaneous spike rate (Fig. 2C, blue trace). The NC counted the number of spike events within a 500-ms-wide moving window that advanced every 10 ms. The NC delivered spike-triggered stimuli on spike events that occurred when this estimated rate exceeded a threshold frequency (Fig. 2C, red dashed line). Threshold was determined from FTT or in-booth R/NR task response averages that revealed baseline and maximum firing rates of the particular cell. Typically, in-cage stimulation thresholds were set at 75% of the observed maximum firing rate of the candidate cell. In later sessions, the NC governed operant conditioning sessions in both the training booth and cage, to directly compare the effects of environment.
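The sliding-window rule can be sketched as follows. This is not the NC firmware; it is a minimal illustration that assumes integer millisecond timestamps, a window that closes at the most recent 10-ms update, and a strict "greater than" comparison against the threshold. The function name is hypothetical.

```python
from bisect import bisect_right

def rate_contingent_triggers(spike_times_ms, threshold_hz, window_ms=500, step_ms=10):
    """Return the spike times (ms) that would trigger a stimulus.

    The rate estimate is the spike count in a 500-ms window that advances
    every 10 ms; a spike triggers stimulation when the latest estimate
    exceeds the threshold (assumed strict inequality).
    """
    spikes = sorted(spike_times_ms)
    triggers = []
    for t in spikes:
        update = (t // step_ms) * step_ms  # most recent window update
        lo = bisect_right(spikes, update - window_ms)
        hi = bisect_right(spikes, update)
        rate_hz = (hi - lo) * 1000.0 / window_ms  # count in (update - window, update]
        if rate_hz > threshold_hz:
            triggers.append(t)
    return triggers
```

With a 500-ms window, a threshold of 60 Hz corresponds to more than 30 counts per window. A threshold set at 75% of a cell's observed maximum rate would simply be passed as `threshold_hz=0.75 * max_rate_hz`.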
Before conditioning, durations of alternating R and NR periods were randomly selected, with replacement, from uniform distributions spanning 1–2 min for R and 3–5 min for NR. We employed random period durations, within limits, to reduce the monkeys’ ability to anticipate transitions in the reinforcement schedule.
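A schedule of this kind could be generated as sketched below. The choice to begin with an NR period, the truncation of the final period at the session end and the function name are assumptions, not details given in the text.

```python
import random

def make_schedule(total_s, r_range_s=(60.0, 120.0), nr_range_s=(180.0, 300.0), seed=None):
    """Build an alternating NR/R schedule as (label, start_s, end_s) tuples.

    Durations are drawn from uniform distributions (1-2 min for R,
    3-5 min for NR). Assumption: the session starts with NR.
    """
    rng = random.Random(seed)
    schedule, t, label = [], 0.0, "NR"
    while t < total_s:
        lo, hi = r_range_s if label == "R" else nr_range_s
        end = min(t + rng.uniform(lo, hi), total_s)  # truncate the last period
        schedule.append((label, t, end))
        t = end
        label = "R" if label == "NR" else "NR"
    return schedule
```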
Time series analysis detects rate changes in the alternating R/NR task.
To determine whether firing rates during R and NR periods were significantly different, we calculated time-averaged rates during R and NR periods over each conditioning session (e.g., see Fig. 5, left) and pooled them to show rate difference between R and NR periods overall (see Fig. 5, right). Confidence intervals for the time-averaged means were computed using a nonparametric bootstrap method based on the Poissonian property of independent interspike intervals (Dayan and Abbott 2001). Specifically, interspike intervals from each period were randomly drawn with replacement and then summed until their cumulative duration nearly matched the period duration. The number of events comprising the drawn sample divided by period duration produced an estimate of time-averaged rate. Repeating the process 499 times generated a bootstrap distribution of time-averaged rates from which the surrounding 95% confidence interval was determined for each period (T-bars, see Fig. 5). To detect statistically significant patterns in neural activation produced by reinforcement, we computed serial correlation and von Neumann ratio test statistics on the sequence of alternating R-NR-R … time-averaged rates for each conditioning session. These statistics and methods of significance appraisal are described in detail in Eaton (2014).
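The interspike-interval bootstrap can be sketched as below. The stopping rule (stop before the summed intervals would exceed the period) and the simple percentile interval are assumptions; the full procedure is the one described in Eaton (2014).

```python
import random

def bootstrap_rate_ci(spike_times_s, period_s, n_boot=499, alpha=0.05, seed=None):
    """Percentile bootstrap CI for the time-averaged rate of one period.

    Interspike intervals are resampled with replacement and accumulated
    until the next draw would exceed the period duration; the number of
    intervals drawn divided by the period gives one bootstrap rate.
    """
    rng = random.Random(seed)
    isis = [b - a for a, b in zip(spike_times_s, spike_times_s[1:]) if b > a]
    if not isis:
        raise ValueError("need at least two distinct spike times")
    rates = []
    for _ in range(n_boot):
        total, count = 0.0, 0
        while True:
            isi = rng.choice(isis)
            if total + isi > period_s:
                break
            total += isi
            count += 1
        rates.append(count / period_s)
    rates.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(int((1 - alpha / 2) * n_boot), n_boot - 1)
    return rates[lo_idx], rates[hi_idx]
```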
Peritransition spike activity plots and spike shuffling.
To document changes in neural activity around the transitions between R and NR periods, we compiled peritransition histograms of spike activity (see Figs. 6–8). Snippets of the spike trains from 75 s before to 75 s after each transition were extracted and combined into perievent spike histograms (binwidth = 50 ms) (e.g., see Fig. 6, black histograms) and consolidated into a single dense train that was convolved with a Gaussian kernel (see Fig. 6, solid red). Shuffled spike rates were obtained by drawing samples with replacement from the list of observed spike events and similarly smoothed (see Fig. 6, solid gray). The process was repeated 199 times to generate a bootstrapped distribution of rate traces from which confidence interval boundaries were calculated (Davison and Hinkley 1997; Eaton 2014) (see Fig. 6, dashed gray). Domains in which the observed rates diverged outside the confidence interval of the shuffled rates indicate features in peritransition spike activity that could not be explained as random fluctuation.
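The peritransition histogram and its smoothing can be sketched as follows. The kernel width is not stated in the text, so `sigma_bins` is a free, hypothetical parameter, and the surrogate (shuffled) confidence band is omitted because its exact construction is given only in Eaton (2014).

```python
import math

def peritransition_histogram(spike_times_s, transitions_s, half_width_s=75.0, bin_s=0.05):
    """Average firing rate (Hz) in 50-ms bins from -75 s to +75 s around transitions."""
    n_bins = int(round(2 * half_width_s / bin_s))
    counts = [0] * n_bins
    for tr in transitions_s:
        for t in spike_times_s:
            rel = t - tr
            if -half_width_s <= rel < half_width_s:
                idx = min(int((rel + half_width_s) / bin_s), n_bins - 1)
                counts[idx] += 1
    norm = len(transitions_s) * bin_s  # convert counts to spikes per second
    return [c / norm for c in counts]

def gaussian_smooth(values, sigma_bins):
    """Smooth with a Gaussian kernel truncated at 3 sigma, renormalized at the edges."""
    radius = int(math.ceil(3 * sigma_bins))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    n, out = len(values), []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                num += w * values[j]
                den += w
        out.append(num / den)
    return out
```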
Accumbens Stimulation Reinforces Target-Tracking Behavior
We tested the efficacy of candidate reinforcing sites by measuring the monkeys’ rate of responding in a manual FTT task, which they had been trained to perform with applesauce reward. At effective sites, trains of stimuli (twenty-five 1-mA pulses at 50 Hz) delivered upon completion of 1-s force holds reinforced further responding. As shown in Fig. 3A, the monkey responded at regular rates during R periods when target completions triggered trains of BSR. Response rates during R periods were significantly higher (P < 0.001) compared with interleaved periods during which only feedback was presented and no stimulation was delivered (FO periods). At the onset of the R periods, which were cued by a tone, response rates often returned quickly to those of the previous R period. As a comparison, FTT task response rates for applesauce reward typically ranged between 10 and 13 responses per minute for the three monkeys.
Target-Tracking Rates as Function of BSR Parameters
To determine appropriate stimulation parameters for conditioning cortical cell activity, we documented rates of target-tracking responses for different values of three BSR parameters: current intensity, pulse frequency and number of pulses per stimulus train. Each of these parameters was varied, while the other two remained fixed. Fixed values were 1 mA for current intensity, 50 Hz for pulse frequency and 25 pulses per train. For each varied parameter, the values in the desired range were repeated 10 times, delivered in a randomized sequence, to eliminate possible “history effects.”
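The randomized presentation order can be sketched in a few lines. The tested values shown in the usage comment are hypothetical placeholders; the text does not list the actual values used.

```python
import random

def randomized_sequence(values, repeats=10, seed=None):
    """Repeat each tested value `repeats` times and shuffle the order."""
    rng = random.Random(seed)
    sequence = [v for v in values for _ in range(repeats)]
    rng.shuffle(sequence)
    return sequence

# Hypothetical current intensities (mA), with frequency and pulse count held fixed:
# order = randomized_sequence([0.25, 0.5, 0.75, 1.0, 1.25], repeats=10, seed=0)
```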
Figure 3B depicts target-tracking response rates as a function of each stimulus parameter in monkeys P and D. In all cases, the response rates R as a function of the tested stimulation parameter r were well characterized by nonlinear-regression-fitted curves of the Law of Effect model (Eq. 1), where r_th is the threshold level, or lowest value at which the stimulus parameter supported self-stimulation, and r_e represents the aggregate reinforcement for all nonoperant responses (Herrnstein 1970). Table 1 summarizes fit statistics for each of the plots. The response curves indicate that ∼80–90% of maximal responding (horizontal asymptote of each plot) occurred for stimulation parameters 1 mA and 50 Hz.
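The expression for Eq. 1 is not reproduced in the text above. One form consistent with the stated definitions, assuming Herrnstein's (1970) hyperbola with the stimulus parameter offset by its threshold, would be the following; the asymptote symbol R_max is an assumption introduced here for the horizontal asymptote mentioned in the text.

```latex
R(r) =
\begin{cases}
R_{\max}\,\dfrac{r - r_{th}}{(r - r_{th}) + r_{e}}, & r > r_{th} \\[1.5ex]
0, & r \le r_{th}
\end{cases}
```

Under this assumed form, responding reaches half of R_max when r - r_th = r_e, and 80–90% of R_max when r - r_th is between 4 and 9 times r_e.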
In subsequent cell and muscle conditioning experiments, pulse amplitude was set to 1 mA. Bursts of elevated spike rates triggered pulse trains at frequencies approaching 50 Hz. For slowly firing cells (e.g., <10 Hz), multiple stimulus pulses (delivered at 50 Hz) were triggered for each rate-contingent spike-triggered (RCST) stimulus event.
Muscle Activity Reinforced during Free Behavior with BSR
To confirm the efficacy of BSR sites during free behavior, we tested in-cage conditioning using EMG activity of upper limb muscles as the operant in monkey P. The time-amplitude window discriminator detected biphasic waveforms in the multiunit EMG signal (Fig. 4A, right) and generated acceptance pulses whose frequency increased with intensity of muscle contraction. During R periods, the mean rates of biceps EMG-generated pulses were significantly larger than during intervening NR periods (Fig. 4A, left), and the monkey was observed to flex his arm during R periods. With biceps conditioning, these differences were maintained for up to 20 h of conditioning. Significant differences were also seen with triceps conditioning (Fig. 4A).
The transitions between periods of R and NR showed further evidence of learning to perform the biceps responses. Separate averages around these transitions for the initial, middle and final third of the session (Fig. 4B) show progressive changes in responding over the course of the conditioning session. For the NR-to-R transitions, rate increases were comparatively low and gradual during the first 6 h, moderate during the middle period, and greatest and fastest during the last 6 h. Interestingly, the R-to-NR transitions exhibited a brief increase in responding after the cessation of R for the first and middle thirds of conditioning (arrow), and no such peak in the last third. Since the monkey had no discriminative stimulus to distinguish R and NR, this behavior is consistent with initial attempts to sustain reinforcement that drop out after sufficient experience with the transition. The raster plots in Fig. 4C show color-coded rates for the individual transitions and their variability in more detail. These data confirm that BSR can effectively reinforce an operant, muscle activity, for long periods of time during free behavior.
Overview of Cell Conditioning Sessions
Table 2 summarizes results from all sessions in which cortical cell activity was conditioned with BSR for the three monkeys, categorized by environment: booth or cage. Given sufficient stability and unit isolation, we often conditioned the same cell over repeated sessions. While we were determining appropriate conditioning procedures, ~70% of in-cage attempts were deemed invalid for one or more of the following reasons: 1) NC malfunction; 2) loss of action potential isolation; and 3) improper conditioning parameters.
Spike-Triggered NAc Stimulation Reinforces Increased Motor Cortex Cell Activity
During R periods, the monkeys received spike-triggered BSR when the instantaneous spike rate exceeded a predetermined threshold. Table 3 summarizes conditioning parameters used for each of the illustrated sessions.
Figure 5, A–C, shows average motor cortex neuron spike rates during three representative conditioning sessions performed in the training booth. Robust increases in firing rates were observed during R periods compared with the intervening NR periods, showing successful acquisition of the neural operant. In all plots, rates were significantly greater in R than NR periods, as indicated by predominantly non-overlapping confidence intervals. Figure 5D shows an in-cage conditioning session in which monkey J moved freely about his home cage, and the NC2 delivered RCST accumbens stimulation in an alternating R/NR schedule over 8 h. Average firing rates were significantly greater in R periods than in NR periods; however, these differences were smaller than those observed for typical in-booth conditioning sessions.
The alternating rate patterns described above give rise to robust, statistically significant time series measures, namely serial correlation and von Neumann’s ratio (Eaton 2014). Alternating rates are obvious from inspection of in-booth conditioning sessions, but are less apparent for the in-cage session. Serial correlation and von Neumann’s ratios measure pattern in time series from which statistical significance can be approximated through randomization and Monte Carlo approximation methods. These analysis techniques confirm significant patterns in these series of time averages that might otherwise not be evident (Eaton 2014).
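The two time-series statistics and a randomization test can be sketched as below. This is a generic permutation version using the textbook definitions of lag-1 serial correlation and von Neumann's ratio, written under the assumption that shuffling the order of period averages is an acceptable null; the specific procedures used in the study are those of Eaton (2014).

```python
import random

def serial_correlation(x):
    """Lag-1 serial correlation; strongly negative for an alternating series."""
    n, m = len(x), sum(x) / len(x)
    den = sum((v - m) ** 2 for v in x)
    return sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / den

def von_neumann_ratio(x):
    """Sum of squared successive differences over the sum of squared deviations."""
    n, m = len(x), sum(x) / len(x)
    den = sum((v - m) ** 2 for v in x)
    return sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / den

def permutation_p_value(x, statistic, n_perm=999, tail="greater", seed=None):
    """One-sided Monte Carlo p-value from random reorderings of the series."""
    rng = random.Random(seed)
    observed = statistic(x)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        s = statistic(shuffled)
        if (tail == "greater" and s >= observed) or (tail == "less" and s <= observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```

For a sequence of period averages that alternates high (R) and low (NR), the serial correlation approaches -1 and the von Neumann ratio approaches 4, so the relevant tails are "less" and "greater", respectively.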
Box plots (Fig. 5, right) summarize distributions of R and NR time averages across each session. For both monkeys, NR distributions had lower medians and were less variable than the R distributions. These differences are statistically significant in all four examples, as assessed by the Kruskal-Wallis test.
Peritransition Activity Patterns
For further insight into behavioral mechanisms, we documented the changes in firing rates associated with transitions between R and NR periods. Figure 6 shows histograms and smoothed rate traces of neuron spike trains during NR-R and R-NR transitions. For comparison, the overall average rates and 95% confidence intervals are illustrated by gray solid and dashed lines, respectively. Statistically significant deviations from chance occur where the red rate trace exceeds the “chance band.” Two sets of peritransition averages, one for monkey J and one for monkey D, exemplify the robust rate increases observed across NR-R transitions as the animals underwent RCST conditioning under restraint in the training booth. In session J1–1 (Fig. 6B), monkey J produced a fourfold increase in motor cortex cell spike rate and kept rates elevated, on average, for the full duration of R. During in-booth sessions, activity peaked early, usually within 10 s following the NR-R schedule transition, and then decayed over the remainder of each reinforced period. During in-cage conditioning, activity peaked later in the R period. Spike activity dropped quickly following R-NR transitions, both in-booth and in-cage. However, as shown in Fig. 6C, NR spike activity tended to be more variable in the cage than in the booth.
Instrumentation in the training booth allowed us to record wrist torque during unit conditioning. In all examples, motor cortex neurons modulated their activity during dynamic and/or static phases of the FTT task. Peritransition averages of the isometric torque signals show increased torques during R periods that accompanied spike rate increases and corresponding reduction of torque generation during NR with lower cortical spike rates (Fig. 6, A and B).
Consistent with the parallel analysis of sequential time-averages (Fig. 5, left), the increases in spike rates across NR-R transitions were greater in cells conditioned in-booth than in cells conditioned in-cage.
Rate Changes of Motor Cortex Cell Spike Activity Conditioned In-Cage
As with EMG activity (Fig. 4), for in-cage unit conditioning, the relative increases in BSR-reinforced spike activity were smallest, compared with NR period activity, during the first third and greatest during the final third of the session (Fig. 7). A transient increase in spike rate also followed R-NR transitions, when high-frequency spike bursts no longer triggered NAc stimulation. A similar postextinction burst effect was seen in R-NR peritransition averages of in-cage conditioned biceps activity (Fig. 4B) of the first and middle third session averages. Unlike muscle conditioning, however, the extinction burst in spike activity, although markedly reduced, did not completely disappear during the final third of the unit-conditioning session.
Cell Conditioned in Both Environments Reveals Greater Efficacy of In-Booth Conditioning
The above evidence suggests that greater conditioning effects were obtained during in-booth conditioning with restraint and visual feedback than during in-cage sessions with free behavior. This could have been due to the slight difference in reinforcement paradigms (Fig. 2), as well as environment. For a definitive comparison, we conditioned the same cell, using identical conditioning parameters, both in the training booth and as monkey J moved freely about his cage. Figure 8 shows rates when spikes from a motor cortex neuron triggered NAc stimulation during elevated firing rates. Stimulation was available during 2-min R periods alternating with 5-min NR periods. During the first hour, the monkey underwent unit conditioning while he moved freely about his cage; he was then transferred within 6 min to his training booth and restrained. The NC delivered identical conditioning stimulation in both environments. During R periods, single 1-mA biphasic pulses were delivered to NAc on each event that exceeded 30 counts within a 500-ms-wide sliding window updated every 10 ms. Figure 8A plots cell spike activity as time-averaged rates. Horizontal dashed lines show group means of R and NR intervals for each environment (red and black, respectively). The NC generated an auditory click on each stimulation pulse event to provide a discriminative stimulus. No visual feedback was provided in either environment.
The progression of alternating time averages of R and NR cortical cell firing rates shows statistically significant increases during R periods compared with the intervening NR periods, both in the training booth and at the end of in-cage conditioning. Comparisons between distributions of pooled R and NR time averages show statistically significant increases during R (Fig. 8C), in both the cage and the booth. The group median of NR period averages during in-cage conditioning (25 Hz) was substantially greater than the median of the NR group during in-booth conditioning (13 Hz), indicating higher baseline rates during free behavior. Peritransition firing rates (Fig. 8B) also show higher baseline activity during in-cage than in-booth NR periods and show that cell firing peaked midway through the 2-min reinforcement interval.
Accumbens Stimuli Do Not Evoke Cortical Responses
Recent anatomical investigations (Miyachi et al. 2005, 2006) suggest a pathway through which input from the NAc could reach primary motor cortex more directly than the well-established striatal-pallidal-thalamo-cortical circuit (Alexander et al. 1990; Parent and Hazrati 1995). To address this possible confound of direct stimulus-evoked effects in cell firing, we delivered continuous 5-Hz test pulses to the BSR site while recording spike activity of the candidate cell before each conditioning session. None of the candidate cells exhibited statistically significant increases in firing probability at any latency between 0 and 200 ms following single-pulse stimuli delivered to NAc at the current intensity (1 mA) used for BSR. The four representative cases in Fig. 9 show that the 95% confidence intervals surrounding kernel-smoothed traces of the observed spike event sequences (red) did not exceed chance levels (gray), indicating that the modest transient fluctuations in spike probability in these histograms did not achieve statistical significance. Thus striato-cortical linkage did not contribute directly to increases in cortical cell spike activity during unit conditioning with BSR.
This study shows that firing rates of motor cortical neurons and muscle activity can be operantly reinforced through delivery of rate-contingent stimulation of ventral striatum in nonhuman primates.
We identified BSR sites in NAc whose stimulation reinforced performance of a target-tracking task with reward efficacy comparable to fruit sauce. Systematic testing of stimulus parameters (current intensity, pulse frequency and number of pulses per train) with the FTT task demonstrated response rates consistent with the Law of Effect (Herrnstein 1970). Our stimulation of NAc probably activated fibers that evoked dopamine release, including fibers from the medial forebrain bundle, which connects the ventral tegmental area to NAc and whose stimulation supports operant responding (German and Fetz 1976). Axon terminals of the medial forebrain bundle release dopamine within the NAc on receipt of unconditioned rewards (Hernandez and Hoebel 1988; Wise 1978). Moreover, the reinforcing effects of stimuli that are normally rewarding, such as food, water, drugs of abuse and stimulation of the medial forebrain bundle, are blocked in animals given dopamine antagonists (Wasserman et al. 1982). A significant proportion of macaque NAc neurons modulated their activity during task-contingent delivery of juice rewards (Apicella et al. 1991). Thus the reinforcing effects of our stimuli were likely mediated by activating fibers that released dopamine.
Functional Relationships between Motor Cortex and Striatum
The functional relations between the ventral striatum and motor cortex have been elucidated by anatomical, electrophysiological, and behavioral studies. Polysynaptic projections from NAc to motor cortex have been revealed by retrograde transsynaptic transport of rabies virus (Miyachi et al. 2006). Conversely, the motor cortex is one of the cortical areas from which the ventral striatum receives input (Takada et al. 1998; Tokuno et al. 1999). Simultaneous recordings of cortical surface electrocorticography and local field potentials in NAc showed evidence for electrophysiological interactions, in a study demonstrating that NAc plays a significant role in recovery of motor function after corticospinal lesions (Sawada et al. 2015). Temporally precise coherence between output-relevant neuronal populations in motor cortex and dorsal striatum developed during learning to control cortical cell activity (Koralek et al. 2013). Despite this evidence for close relations, we found no evidence that our NAc stimuli produced any poststimulus modulation of motor cortex neurons, indicating that the effect of stimulation on firing rates was mediated by behavioral reinforcement.
Activity Correlated with Conditioned Neurons
While BSR was delivered contingent on increases in firing of a single motor cortex cell, larger neuronal populations would obviously have to be coactivated; in particular, other neurons that provide direct and indirect input to the conditioned neuron would also be recruited to drive its rate increases. Such coactivation of large populations was evidenced by associated muscle contractions and neighboring cell activity. During in-booth sessions, the monkey’s conditioned changes in neural activity were often correlated with isometric torques produced around the wrist. This is not surprising, since the neurons chosen for conditioning were modulated during the wrist task. A previous study found that chaired animals allowed to move limbs freely generated a variety of movements associated with operant bursts of the same cell (Fetz and Baker 1973). Given this variability, we did not attempt to document the monkeys’ movements during the in-cage neural conditioning sessions. A more systematic analysis of movements related to operant bursts during free behavior could be pursued using simultaneous neural and video recordings.
In some sessions the activity of a neighboring cell was recorded simultaneously with the reinforced neuron. As illustrated previously (Eaton 2014), neurons whose cross-correlograms had central peaks, indicative of common synaptic drive from upstream sources to both cells, could be either coactivated or modulated reciprocally in the R/NR periods. These results are consistent with previous studies of synaptic linkages between motor cortex neurons, showing that common inputs are seen for both coactivated and reciprocally activated pairs (Smith and Fetz 2009).
Comparison of Neural Conditioning In-Booth and In-Cage
Learning to control neural activity progressed more slowly during in-cage than in-booth conditioning sessions. In addition, rate increases in R vs. NR periods were smaller and harder to discern in the cage. Several differences between the two conditioning environments could have contributed to this disparity. First, during in-booth sessions, the monkeys were restrained, with their heads and contralateral arms secured. We believe such restraint effectively reduced activity of the movement-related cells during NR periods, providing a lower “baseline” against which increases were measured. Second, most in-booth sessions involved stronger discriminative stimuli (e.g., auditory clicks and a rate-controlled computer cursor) than the barely audible clicks produced by the NC during in-cage sessions. More intense discriminative stimuli are more likely to be effective secondary reinforcers during the conditioning task. Third, the lack of restraint during in-cage conditioning permitted monkeys to explore a much broader range of motor activities. The greater behavioral repertoire provided more distractions when forming response-reward associations, thus requiring a longer time to demonstrate acquisition. In contrast, in the training booth, where monkeys had spent many hours performing the FTT task for both food reward and BSR, monkeys likely drew from a much smaller pool of potential reward-eliciting responses when forming neural-response-reward associations. Fourth, the low-pass filtering of neural activity used for most in-booth experiments may have been more effective than the sliding-window method used for in-cage NC sessions (Fig. 2C). A control session argued against this possibility: when the sliding-window method was used in both environments, the monkey’s performance was still more robust in the booth, where baseline firing rate was lower (Fig. 8).
Finally, consistent with the r_e parameter of the Law of Effect model, the in-cage environment introduced additional reinforcers (for example, food, toys, the presence of neighboring monkeys, and grooming activities) that served to increase behaviors competing with the spike-rate operant. As the collective contribution from all nontask reinforcers, r_e, increases, the influence of the task-associated reinforcer, r (BSR in our case), on operant responding is effectively reduced, as shown by the mathematical expression of the Law of Effect for response rate (Eq. 1), in which the sum of the two terms, r + r_e, forms the denominator. Since fewer nontask-reinforced response alternatives are available to monkeys in the training booth, the Law of Effect predicts that the rewards paired with the operant response should be more effective there than in the cage, where there are many distractions.
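The dependence on r_e can be made concrete with a short numerical sketch. Eq. 1 is not reproduced in this section, so the code assumes the standard hyperbolic form of Herrnstein's (1970) Law of Effect, R = k·r / (r + r_e), where k is the asymptotic response rate. The function name and all numerical values are hypothetical and chosen only to illustrate the direction of the effect, not to model the data.

```python
def response_rate(r, r_e, k=1.0):
    """Herrnstein's hyperbola (assumed form of Eq. 1): R = k * r / (r + r_e).

    r   : reinforcement obtained for the operant (BSR here)
    r_e : reinforcement from all other, nontask sources
    k   : asymptotic response rate (arbitrary units)
    """
    if r < 0 or r_e < 0 or k < 0:
        raise ValueError("r, r_e and k must be non-negative")
    total = r + r_e
    return 0.0 if total == 0 else k * r / total

# Same task reinforcer in both settings; only the nontask term differs.
# These values are illustrative assumptions, not measurements.
booth = response_rate(r=10.0, r_e=2.0)   # few competing reinforcers
cage = response_rate(r=10.0, r_e=20.0)   # many competing reinforcers
print(f"booth: {booth:.2f}  cage: {cage:.2f}")
```

With r held fixed, raising r_e lowers the predicted response rate, which is the sense in which the model predicts weaker control of the operant by BSR in the cage than in the booth.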
Most of the above reasons for reduced conditioning efficacy during free behavior should also have applied to EMG conditioning. However, increased EMG responses proved quite robust for almost 20 h (Fig. 4A). This difference raises the possibility that conditioning of neural activity might be more difficult than that of muscle activity; however, that conclusion would be contradicted by many successful unit conditioning studies using conventional rewards (Fetz and Baker 1973; Fetz and Finocchio 1975; Moritz and Fetz 2011). It is possible that task acquisition itself was faster for EMG conditioning, specifically in the context of free behavior. Thus, while the target muscles were normally active in the monkey’s natural movement repertoire, the relevant neural activity may not have been as readily discoverable in the cage. Since bursts of motor cortex neurons are typically related to many different movements (Fetz and Baker 1973; Fetz and Finocchio 1975), these diverse relations could have undermined the acquisition of any particular effective movement. These hypotheses clearly deserve further investigation.
Investigating neural coding.
Reinforcement of neural activity with BSR during free behavior offers a way to investigate mechanisms of neural coding. In contrast to the conventional coding of information in neural firing rates, the hypothesis that information could be coded in the precise timing of spike activity remains to be proven. The operation of such temporal coding would significantly expand the bandwidth for neural computation (Fetz 1997). While we have demonstrated the ability of BSR to reward increases in firing rates, BSR could also be used to test the volitional control of precise spatiotemporal patterns. If the brain uses such patterns during normal behavior, many of them should be volitionally controllable. The use of BSR to instantly reward the appearance of specific patterns under free conditions would provide ample time for the monkey to discover and repeat the relevant behavioral or cognitive state. This would represent a significant test of the existence of temporal coding in the brain.
This work was supported by National Institutes of Health Grants NS-12542 and RR-00166, and by National Science Foundation Grant DGE-1256082.
No conflicts of interest, financial or otherwise, are declared by the authors.
R.W.E. and T.L. performed experiments; R.W.E. and T.L. analyzed data; R.W.E., T.L., and E.E.F. interpreted results of experiments; R.W.E. and T.L. prepared figures; R.W.E., T.L., and E.E.F. drafted manuscript; R.W.E., T.L., and E.E.F. edited and revised manuscript; R.W.E., T.L., and E.E.F. approved final version of manuscript; E.E.F. conceived and designed research.
We thank Steve Perlmutter, Chet Moritz, Timothy Lucas, Andrew Jackson, and Yukio Nishimura for surgical assistance. Stavros Zanos helped run the in-cage muscle conditioning experiments. Zachary Roberts and Gerick Lee assisted with monkey handling and recording, and Leah Bakst assisted with analysis. We thank Douglas Bowden, Christopher Fiorillo, and Paul Phillips for helpful discussions.
Copyright © 2017 the American Physiological Society