Choice behavior on simple sensory-motor tasks can exhibit trial-to-trial dependencies. For perceptual tasks, these dependencies reflect the influence of prior trials on choices that are also guided by sensory evidence, which is often independent across trials. Here we show that the relative influences of prior trials and sensory evidence on choice behavior can be shaped by training, such that prior influences are strongest when perceptual sensitivity to the relevant sensory evidence is weakest and then decline steadily as sensitivity improves. We trained monkeys to decide the direction of random-dot motion and indicate their decision with an eye movement. We characterized sequential dependencies by relating current choices to weighted averages of prior choices. We then modeled behavior as a drift-diffusion process, in which the weighted average of prior choices provided an additive offset to a decision variable that integrated incoming motion evidence to govern choice. The average magnitude of offset within individual training sessions declined steadily as the quality of the integrated motion evidence increased over many months of training. The trial-by-trial magnitude of offset was correlated with signals related to developing commands that generate the oculomotor response but not with neural activity in either the middle temporal area, which represents information about the motion stimulus, or the lateral intraparietal area, which represents the sensory-motor conversion. The results suggest that training can shape the relative contributions of expectations based on prior trends and incoming sensory evidence to select and prepare visually guided actions.
Performance on trial-based sensory-motor tasks can exhibit numerous forms of sequential dependence. For example, both response times and choice are sensitive to effects of the previous trial, including repeated actions, task switching, and errors (Botvinick et al. 2001; Cho et al. 2002; Laming 1968, 1979; Luce 1986). For tasks in which reward probability depends on sequential patterns of choices, these dependencies can reflect rational strategies for choosing future outcomes based on prior history (Barraclough et al. 2004; Behrens et al. 2007; Corrado et al. 2005; Davidson and McCarthy 1988; Herrnstein 1961; Kennerley et al. 2006; Lau and Glimcher 2005; Sugrue et al. 2004; Williams 1988). In contrast, for perceptual tasks in which the present stimulus is the exclusive factor determining the rewarded choice, these dependencies can only hinder performance. The goal of this study was to better understand how these dependencies evolve as perceptual sensitivity to sensory cues that instruct the rewarded choice improves with training.
We trained monkeys on a one-interval, two-alternative direction-discrimination task in which they decided the direction of motion and indicated their decision with an eye movement to a visual target located in the chosen direction. After learning the visuomotor association, the monkeys became increasingly able to discriminate weaker motion signals as training progressed over many months, a form of perceptual learning (Law and Gold 2008). This improvement in sensitivity was quantified by fitting performance data from individual sessions to psychometric functions describing the relationship between the motion stimulus and choice, assuming the independence of choices across trials. In the present study we extended these analyses to examine dependencies in the trial-by-trial sequences of choices. Our approach included adapting for use with perceptual data a form of Wiener kernel analysis that has been useful for describing sequential choice behavior in nonperceptual or “free-choice” tasks (Corrado et al. 2005; Dayan and Abbott 2001).
We show that the monkeys' strategies for distributing choices, at least early in training on the direction-discrimination task, were similar to strategies used in free-choice tasks, taking into account both the choices and outcomes from recent trials (Corrado et al. 2005; Kennerley et al. 2006; Lau and Glimcher 2005; Lee et al. 2004). However, for our task, information from past trials did not provide a reliable cue for the rewarded alternative on the present trial, which was determined exclusively by the direction of motion of the visual stimulus. Accordingly, as training progressed over many months, the monkeys learned to use information from the motion stimulus more effectively to govern their choices and, with a similar time course, to suppress the sequential dependencies. We present electrophysiological data suggesting that this interplay between prior history and sensory evidence does not directly affect the representation or interpretation of the stimulus in the brain, but is evident in commands that generate the appropriate oculomotor response. The results imply that training can help to calibrate the relative contributions of sensory and nonsensory factors to select and prepare actions.
We used two adult male (monkeys At and Cy) and two adult female (monkeys Av and ZZ) rhesus monkeys (Macaca mulatta). All behavioral, surgical, and electrophysiological procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the University of Pennsylvania Institutional Animal Care and Use Committee.
Preparation for experiments
All four monkeys were naïve to behavioral and neurophysiological experiments prior to this study. In preparation for these experiments, each monkey was, in chronological order: 1) trained to sit comfortably in a custom-built primate chair; 2) surgically implanted with a head-holding device, recording cylinder(s) (Crist Instrument, Damascus, MD), and (for monkeys Av and At) eye coil, and given time to recover; 3) imaged using magnetic resonance imaging (MRI) to visualize the three-dimensional trajectories of the surgically implanted recording cylinders relative to the underlying cortical targets to help guide and confirm electrode placement (Kalwani et al. 2008); and 4) trained for several weeks to perform simple visually guided saccade tasks.
The monkeys performed a one-interval, two-alternative forced-choice task that required them to decide the direction of random-dot motion and indicate their decision with an eye movement (Fig. 1). For two of the monkeys (At and Av), behavioral testing was paired with a technique for assessing ongoing oculomotor activity via electrical microstimulation-evoked eye movements (Gold and Shadlen 2000, 2003). For the other two monkeys (Cy and ZZ), behavioral testing was paired with electrophysiological recordings in the middle temporal visual (MT) and lateral intraparietal (LIP) areas. In both cases, the electrophysiological technique both limited the time spent performing the task, which occurred only when a microstimulation or recording site was found, and affected the geometry of the stimulus used, which was typically designed to match certain properties of the microstimulation or recording site (Law and Gold 2008).
The visual display was generated in MATLAB on a Macintosh computer, using the Psychophysics Toolbox extensions (Brainard 1997; Pelli 1997) and our own software to draw the motion stimulus on a 21-in. CRT (Viewsonic) positioned 60 cm directly in front of the monkey. The motion stimulus, which is described in detail elsewhere (Gold and Shadlen 2003), consisted of bright white dots (19 cd/m² luminance, 16.7 dots per degree² per second density) on a black background. A percentage of the dots were plotted, erased, and replotted at fixed displacements over time in three interleaved sets to promote the perception of coherent motion; the remainder were replotted at random locations. The lifetimes of individual coherently moving dots were minimized to avoid spatially localized features in the stimulus. On a given trial, a control computer running REX software (Hays et al. 1982) on the QNX operating system (http://www.qnx.com) pseudorandomly (using the built-in rand function) chose the direction of coherent motion (from two equally balanced alternatives separated by 180°), percentage of coherently moving dots (0, 3.2, 6.4, 12.8, 25.6, 51.2, or 99.9%, with the lower of these values being introduced gradually as training progressed), and viewing duration (chosen from an exponential distribution bounded between 0.1 and 1.5 s, to avoid trials that were too brief or too long but also approximate a flat hazard function and therefore minimize the ability to anticipate stimulus offset; the distribution had a mean value of between 0.2 and 0.8 s in a given session, with smaller values used later in training).
Task timing and feedback were customized for each monkey to maximize their motivation and productivity. Each trial began with onset of the fixation point, which would remain on for ≤10 s either fixed (monkeys Cy and ZZ) or pseudorandomly changing color and diameter every 2 s (monkeys At and Av) until the monkey attained fixation. Failure to attain fixation or broken fixation during a trial would result in a “time-out” period of about 2 s. Correct responses were rewarded with an audible tone paired with 0–5 drops of apple juice (median value = 3 drops per correct trial for each monkey) chosen at random. The volume of juice per drop was adjusted by trial-and-error to maximize each monkey's motivation. Correct trials were followed by a brief intertrial interval of 1–2 s. Erroneous responses were followed by an additional time-out period of 1–3 s.
We fit behavioral choice data from each session to a psychometric function describing the relationship between the strength, duration, and direction of the motion stimulus and the monkey's choices. The function is based on a drift-diffusion process with a drift term that decays exponentially as a function of viewing time and has been shown to provide good fits to the behavioral data (see Eckhoff et al. 2008, especially “ddExp3a” of Eq. 37, for more detailed descriptions of the behavioral models and fitting procedures). Briefly, choice is based on a decision variable x, the value of which evolves as a function of a time-varying drift rate A(t) and a noise term $c\,dW \sim \mathcal{N}(0, c^{2}\,dt)$

$$dx = A(t)\,dt + c\,dW \tag{1a}$$

where the drift rate depends on motion coherence (C) as a power law and decays exponentially with time (with free parameters a, m, and α)

$$A(t) = a\,C^{m}\,e^{-\alpha t} \tag{1b}$$

and the noise scales by a factor φ with the average drift rate (with free parameters φ and r0) (1c)
Thus the decision variable x is a normally distributed random variable with a mean and variance that both scale with several factors including motion strength and viewing time. Choice depends on the value of x at the end of motion viewing: the correct choice is made when x > 0, and an error otherwise. The psychometric function describing accuracy as a function of coherence and time is therefore

$$P(C, T) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\mu(T)}{\sqrt{2\,v(T)}}\right)\right] \tag{2}$$

where $\mu(T) = \int_{0}^{T} A(s)\,ds$ and $v(T) = c^{2}T$. Finally, lapses (λ, errors at the highest motion strengths) are accounted for by scaling the entire function to match its measured upper asymptote

$$P(C, T) = \frac{1}{2} + \left(\frac{1}{2} - \lambda\right)\operatorname{erf}\!\left(\frac{\mu(T)}{\sqrt{2\,v(T)}}\right) \tag{3}$$
We fit session-by-session behavioral data to P(C, T) using three free parameters a, α, and λ. The remaining parameters were set to values used previously (m = 1.25, φ = 0.3, and r0 = 10 spikes/s; see Eckhoff et al. 2008; Gold and Shadlen 2000, 2003).
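As a rough illustration of how the psychometric function can be evaluated, the sketch below computes predicted accuracy for one coherence and viewing time. It is not the fitting code used in the study. It assumes a drift of the form a·C^m·e^(−αt), integrates it in closed form, and takes the noise coefficient c as a caller-supplied number rather than deriving it from φ and r0. The function and argument names are hypothetical.

```python
import math

def predicted_accuracy(coh, T, a, alpha, lapse, m=1.25, c=1.0):
    """Probability of a correct choice for coherence `coh` and viewing time T (s).

    Assumptions (not taken from the source): the drift is
    A(t) = a * coh**m * exp(-alpha * t), and the noise coefficient c is
    passed in directly instead of being computed from phi and r0.
    """
    if alpha > 0:
        # Closed-form integral of A(s) from 0 to T.
        mu = a * coh**m * (1.0 - math.exp(-alpha * T)) / alpha
    else:
        # No decay: the drift accumulates linearly with time.
        mu = a * coh**m * T
    var = c**2 * T
    # Lapses shrink the function toward chance so that it tops out at 1 - lapse.
    return 0.5 + (0.5 - lapse) * math.erf(mu / math.sqrt(2.0 * var))
```

With zero coherence the mean of the decision variable is zero, so the predicted accuracy is chance (0.5); with very strong motion it approaches 1 − λ.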
We measured sequential choice dependencies by analyzing the trial-by-trial residuals from the fits to the psychometric function (Fig. 2). The residuals can be thought of as the portion of the monkeys' trial-by-trial choices that were not accounted for by the average effects of the motion stimulus. The psychometric function provided a predicted outcome for each trial, expressed as a value between 0 and 1 describing the probability of making a rightward choice for the given stimulus. The choice residuals were the differences between these predictions and the actual, binary choices (0 for a leftward choice, 1 for a rightward choice). Thus the values of the residuals spanned the range of −1 (a leftward choice on a trial in which a rightward choice was predicted) to +1 (a rightward choice on a trial in which a leftward choice was predicted). For example, a correct leftward choice on a low-coherence, short-viewing-time trial with a predicted proportion correct of 0.62 would correspond to a choice residual of −0.38. Note that the results were not affected by instead computing the residual deviance, based on the log-likelihood ratio of the best-fitting and saturated models (Wichmann and Hill 2001).
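The residual computation can be sketched in a few lines. The helper names below are hypothetical; the only logic is the one described in the text, namely the actual binary choice minus the predicted probability of a rightward choice.

```python
def p_right_from_accuracy(p_correct, motion_is_rightward):
    """Convert predicted accuracy into the predicted probability of a rightward choice."""
    return p_correct if motion_is_rightward else 1.0 - p_correct

def choice_residual(chose_right, p_right):
    """Actual choice (1 = rightward, 0 = leftward) minus predicted P(rightward)."""
    return (1.0 if chose_right else 0.0) - p_right

# Example from the text: a correct leftward choice with predicted accuracy 0.62.
p_right = p_right_from_accuracy(0.62, motion_is_rightward=False)
residual = choice_residual(chose_right=False, p_right=p_right)
```

Here `residual` comes out at approximately −0.38, matching the worked example in the paragraph above.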
We also fit the behavioral data to several versions of the diffusion model that explicitly accounted for choice biases. Biases were assumed to correspond to offsets to the initial (or, equivalently, final) positions of the drift-diffusion process by adding an additional term to Eq. 2 (Eckhoff et al. 2008) (4) where D is −1 for leftward motion and 1 for rightward motion and the bias B is computed in one of five different ways (see Fig. 7): 1) as a constant, representing an overall bias for an entire session, $B = \beta_s$, where $\beta_s$ is a fit parameter; 2) directly from the previous trial, $B = \beta_c C_c + \beta_e C_e$, where $\beta_c$ and $\beta_e$ are fit parameters, $C_c$ is the choice on the previous trial if it was correct (−1 for leftward, +1 for rightward, 0 for an error), and $C_e$ is the choice on the previous trial if it was an error (−1 for leftward, +1 for rightward, 0 for a correct choice); 3) using a reinforcement learning (RL) rule–like algorithm that updates an estimate of bias based on the previous trial (Sutton and Barto 1998), $B_{t+1} = \beta_0 B_t + \beta_1 C_{c,t} + \beta_2 C_{e,t}$, where $\beta_0$, $\beta_1$, and $\beta_2$ are fit parameters; $B_t$ is the value of the bias on trial t; and $C_{c,t}$ and $C_{e,t}$ are the choices on trial t if it was correct or an error, respectively, as above; 4) using the biases computed from the Wiener kernel analysis ($S_t$, the “sequential bias” on trial t, described in the following text), $B = \beta_{wk} S_t$, in which $\beta_{wk}$ is a fit parameter; and 5) a combination of models 1 and 4, $B = \beta_s + \beta_{wk} S_t$, where $\beta_s$ and $\beta_{wk}$ are fit parameters.
Note that model 1 assumes a constant bias across the entire session, whereas the other methods include terms that compute biases based on the recent trial history. Models 3 and 4 are derived differently but result in similar functions: the weighted sum of two filtered sequences of choices ($C_{c,t}$ and $C_{e,t}$). For model 3, $\beta_1$ and $\beta_2$ are the weighting factors and the filter kernel is a decaying single-exponential function with a time constant specified by $\beta_0$: $W_{RL} = e^{-t/\tau}$, where t is the trial lag and $\tau = -1/\log(\beta_0)$. Model 4 is described in the following text (see in particular Eq. 6). Because the decision variable x can be thought of as an accumulation of evidence in units of spikes/s (from r0 in Eq. 1c), the units of B can be thought of as a change in spikes/s that can be either positive (a bias toward rightward choices) or negative (a bias toward leftward choices). For the average magnitude of bias within a given session, we report the mean ± SE of the absolute value of B in Eq. 4.
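The equivalence between the recursive update of model 3 and a single-exponential filter can be checked numerically. The sketch below is illustrative only: it assumes the bias starts at zero, assumes 0 < β0 < 1, and indexes the most recent trial as lag 1 with unit weight (a different lag convention only rescales β1 and β2). Function names are hypothetical.

```python
import math

def rl_bias(cc, ce, beta0, beta1, beta2):
    """Recursive form: B[t+1] = beta0*B[t] + beta1*Cc[t] + beta2*Ce[t], with B[0] = 0."""
    b = [0.0]
    for c_corr, c_err in zip(cc, ce):
        b.append(beta0 * b[-1] + beta1 * c_corr + beta2 * c_err)
    return b

def filtered_bias(cc, ce, beta0, beta1, beta2):
    """Same bias written as an exponentially weighted sum over past trials."""
    tau = -1.0 / math.log(beta0)  # time constant implied by beta0 (requires 0 < beta0 < 1)
    b = [0.0]
    for t in range(1, len(cc) + 1):
        total = 0.0
        for lag in range(1, t + 1):
            weight = math.exp(-(lag - 1) / tau)  # equals beta0 ** (lag - 1)
            total += weight * (beta1 * cc[t - lag] + beta2 * ce[t - lag])
        b.append(total)
    return b
```

Both functions return the same sequence of biases for any input, which is why the RL-like rule can be read as a weighted sum of two exponentially filtered choice sequences.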
The goal of the sequential analysis of behavior was to determine the extent to which past trials could predict the current choice residual. The primary assumption we made was that the (output) sequence of residuals y(t) was related to the (input) sequence of choices x(t − τ) via a causal, linear filter g(τ). We computed g(τ) as the first-order kernel of the Wiener expansion of the functional G relating the input and output sequences, y(t) = G[x(t − τ)] (Rieke et al. 1999), using the Wiener–Hopf equation in matrix form

$$W_{opt} = R_{XX}^{-1}\,R_{yX} \tag{5}$$

where $W_{opt}$ are the optimal weights (coefficients) of the finite impulse-response filter that minimizes mean-squared error between the filtered input sequence and the output sequence, $R_{XX}$ is the autocorrelation matrix of the input sequence, and $R_{yX}$ is the cross-correlation matrix of the input and output sequences.
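One way to compute such a kernel is sketched below with NumPy. This is an assumption-laden illustration, not the original analysis code: it estimates the autocorrelation and cross-correlation matrices from a lagged design matrix and solves the resulting linear system. The function name and the choice to drop the first `n_lags` trials are hypothetical.

```python
import numpy as np

def wiener_kernel(x, y, n_lags):
    """Causal FIR kernel w such that y[t] ~ sum over k = 1..n_lags of w[k-1] * x[t-k]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # The row for trial t holds the inputs on trials t-1, t-2, ..., t-n_lags.
    X = np.array([x[t - n_lags:t][::-1] for t in range(n_lags, len(x))])
    target = y[n_lags:]
    r_xx = X.T @ X / len(target)       # autocorrelation matrix of the input
    r_yx = X.T @ target / len(target)  # cross-correlation of input and output
    # Solving R_XX w = R_yX is equivalent to w = inv(R_XX) @ R_yX but more stable.
    return np.linalg.solve(r_xx, r_yx)
```

Because only lags of one trial or more enter the design matrix, the estimated filter is causal by construction.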
Each computed kernel was fit using least-squares fitting to the following double-exponential equation

$$W_{fit}(t) = \frac{a_1}{n_1}\,e^{-t/\tau_1} + \frac{a_2}{n_2}\,e^{-t/\tau_2} \tag{6}$$

where t is the trial lag; $a_1$, $a_2$, $\tau_1$, and $\tau_2$ are free parameters; and $n_1$ and $n_2$ are normalization constants such that $\sum_{i=1}^{N} (1/n_a)\,e^{-i/\tau_a} = 1$ for a = 1, 2. The kernel $W_{fit}$ was truncated at lags between 1 and 801 trials in 10-trial increments and then used to filter the input sequence. The final version of $W_{fit}$ used was the shortest truncated version that corresponded to the maximum correlation coefficient between the filtered input sequence and the actual output sequence (the choice residuals).
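A normalized double-exponential kernel of this kind can be constructed as in the sketch below. It assumes that each exponential component is scaled to sum to one over the kernel length, so that the amplitudes set the total weight of each component. That reading of the normalization, along with the function and argument names, is an assumption.

```python
import math

def double_exp_kernel(a1, tau1, a2, tau2, n_lags):
    """Kernel weights at lags 1..n_lags for a sum of two normalized exponentials."""
    lags = range(1, n_lags + 1)
    e1 = [math.exp(-t / tau1) for t in lags]
    e2 = [math.exp(-t / tau2) for t in lags]
    # Normalization constants: each unit-amplitude component sums to 1 over the lags.
    n1, n2 = sum(e1), sum(e2)
    return [a1 * u / n1 + a2 * v / n2 for u, v in zip(e1, e2)]
```

Under this normalization the weights sum to a1 + a2, which makes amplitudes comparable across kernels truncated at different lengths.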
Two versions of the fit kernel Wfit were computed for each session. For the first kernel, which measured the effect of past correct choices on the sequence of residuals, the input sequence was encoded as −1 for correct leftward choices, 0 for errors, and 1 for correct rightward choices (e.g., Fig. 5A). For the second kernel, which measured the effect of past error choices on the sequence of residuals, the input sequence was encoded as −1 for incorrect leftward choices, 0 for correct trials, and 1 for incorrect rightward choices (e.g., Fig. 5B). The final predicted sequence of residuals—the “sequential bias” (St for trial t in Eq. 4, models 4 and 5)—was computed as the sum of the filtered outputs from the two kernels.
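The encoding and filtering steps described above can be sketched as follows. The kernels are assumed to be lists of weights indexed from a lag of one trial, and all names are hypothetical.

```python
def encode_choices(choices, correct):
    """Split signed choices (-1 left, +1 right) into correct-only and error-only sequences."""
    cc = [c if ok else 0 for c, ok in zip(choices, correct)]
    ce = [0 if ok else c for c, ok in zip(choices, correct)]
    return cc, ce

def causal_filter(seq, kernel):
    """Output on trial t is the sum over lags k >= 1 of kernel[k-1] * seq[t-k]."""
    out = []
    for t in range(len(seq)):
        max_lag = min(t, len(kernel))  # cannot look back before the first trial
        out.append(sum(kernel[k - 1] * seq[t - k] for k in range(1, max_lag + 1)))
    return out

def sequential_bias(choices, correct, kernel_correct, kernel_error):
    """Sum of the two filtered sequences: the predicted residual on each trial."""
    cc, ce = encode_choices(choices, correct)
    from_correct = causal_filter(cc, kernel_correct)
    from_error = causal_filter(ce, kernel_error)
    return [u + v for u, v in zip(from_correct, from_error)]
```

For example, with choices `[1, -1, 1]`, outcomes `[True, False, True]`, a correct-trial kernel `[0.5, 0.25]`, and an error-trial kernel `[-0.2, -0.1]`, the bias on the third trial is 0.25 (from the correct rightward choice two trials back) plus 0.2 (from the erroneous leftward choice one trial back).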
Oculomotor measurements and analysis
For monkeys At and Av, eye position was monitored using a scleral search coil system (CNC Engineering, Seattle, WA) sampled at 1,000 Hz. For monkeys Cy and ZZ, eye position was monitored using a video-based system (Applied Science Laboratories, Bedford, MA) sampled at 240 Hz. While the fixation point was illuminated (e.g., throughout motion viewing), fixation was enforced to within <3°. Following fixation-point offset, choice was determined by comparing the endpoint of the first voluntary saccade (required to occur between 80 and 500 ms following fixation-point offset) to the locations of the two choice targets. Trials with broken fixations or saccadic endpoints located >3.5° from either target were excluded from further analysis.
For monkeys At and Av, behavioral testing was combined with a technique for assessing oculomotor preparation (Fig. 1A; for more details, see Gold and Shadlen 2000, 2003). A single, glass-covered tungsten microelectrode (Alpha Omega USA, Atlanta, GA) was advanced into the frontal eye field (FEF) using a NAN microdrive (Plexon, Dallas, TX) until a site was found where electrical microstimulation could elicit saccadic eye movements with a consistent trajectory using <50 μA of current (0.25-ms-long biphasic pulses applied at a rate of about 350 Hz for 60 ms) applied in darkness. Once a site was found, the task geometry was adjusted for that session such that the axis of motion of a foveally presented stimulus was roughly perpendicular to the trajectory of the evoked saccade. Eye movements were evoked on 10–90% of trials in a given session, chosen at random. Microstimulation pulses started at the simultaneous offset of the motion stimulus and fixation point and typically evoked a saccade with a latency of about 40 ms, which was followed within about 100 ms by a second, voluntary saccade to one of the two choice targets. Evoked saccade endpoints were measured from the stable eye position between the evoked and voluntary saccades.
Evoked-saccade trajectories were quantified as the magnitude of deviation, in degrees of visual angle, of their endpoints along the axis of motion. For most, but not all, sessions (159/162 for At, 158/213 for Av), the mean evoked saccade deviated in the same direction as the subsequent voluntary saccade. For these sessions, deviations toward the chosen target were assigned positive values; the rest were assigned negative values. For the remaining sessions, deviations toward the chosen target were assigned negative values; the rest were assigned positive values. Thus in all cases a positive deviation implied the same direction as the average deviation measured for that session.
We measured the relationship between the evoked-saccade deviations and sequential choice dependencies by computing the Spearman's (partial) rank correlation between the trial-by-trial magnitudes of deviation and the choice dependencies computed using the Wiener kernel analyses (St). St was signed according to the actual choice made on the given trial: a positive value for biases in the direction of the actual choice made, a negative value for biases in the other direction. We computed this correlation separately for left and right choices for each session, to account for possible differences in deviation magnitude for the two choices and avoid the confounding influence of sequential dependencies on choice behavior. We used rank correlations to standardize across different average magnitudes of both variables across sessions. We used partial correlations to account for effects of the strength and duration of the motion stimulus on the evoked-saccade deviations (Gold and Shadlen 2000, 2003). Specifically, we computed the correlation coefficient after controlling for the effects of both viewing time alone and the multiplicative interaction between motion strength and viewing time (this multiplicative interaction is consistent with an accumulation of motion information over time, as in the psychometric functions; Eckhoff et al. 2008).
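A partial rank correlation of this kind can be computed by rank-transforming every variable, regressing the covariate ranks out of both variables of interest, and correlating what remains. The NumPy sketch below follows that recipe. It is one common formulation and may differ in detail from the procedure used in the study; the function names are hypothetical.

```python
import numpy as np

def rank(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def partial_spearman(x, y, covariates):
    """Correlation of ranked x and y after removing linear effects of ranked covariates."""
    rx, ry = rank(x), rank(y)
    design = np.column_stack([np.ones(len(rx))] + [rank(c) for c in covariates])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

In the setting described above, `x` would hold the evoked-saccade deviations, `y` the signed sequential bias, and `covariates` the viewing time and the product of motion strength and viewing time, computed separately for each choice and session.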
Saccade latency, velocity, and accuracy were measured from the first voluntary saccade for all trials from Cy and ZZ and only for trials without electrical microstimulation from At and Av. Spearman (partial) rank correlations were computed to describe the trial-by-trial relationships between these parameters and sequential bias (St), using the same procedures as the deviation data, described earlier.
Electrophysiological measurements and analysis
For monkeys Cy and ZZ, behavioral testing was combined with recordings of neural activity in areas MT and LIP. To begin each session quartz-coated platinum–tungsten microelectrodes were advanced into MT and LIP via a pair of Mini Matrix microdrive systems (Thomas Recording, Giessen, Germany). Extracellular action potential waveforms were stored and sorted off-line (Plexon). If a direction-tuned MT neuron was found, the motion stimulus was placed in its receptive field and shown at the neuron's preferred direction (and 180° opposite) and speed. If no MT neuron was found, the modal location, direction, and speed from previous sessions were used. If an LIP neuron with spatially tuned activity during the delay period of a delayed saccade task was found, one of the two choice targets was placed in its response field. If no LIP neuron was found, the targets were placed at their modal locations from previous sessions. The monkeys performed the task only while at least one MT or LIP neuron was recorded. Also, unlike in the version of the task used in the microstimulation experiments, there was a delay period of 0.3–0.8 s between offset of the motion stimulus and offset of the fixation point.
We quantified the relationship between MT and LIP activity and sequential choice dependencies (St) using Spearman's (partial) rank correlations. These correlations were computed separately for each choice, using correct trials only. For responses measured during motion viewing, partial correlations were computed by first controlling for the effects of motion strength. For LIP responses during motion viewing, responses were quantified not using raw spike rates but instead using the rate of rise (parameter γ1 in Eq. 7) of the responses as a function of viewing time (T), as determined from a trial-by-trial fit to a simple piecewise linear model (7) where γ0, γ1, γ2, and τ are fitted parameters (γ0 and γ1 were fit to spike-rate data smoothed using an alpha function from each trial; τ and γ2 were fit using average spike-rate data from each coherence for the given session). Choice indices were computed as the area under the receiver operating characteristic (ROC) curve obtained from the two distributions of spike rate from correct trials corresponding to saccade choices made into and away from the neuron's response field (Green and Swets 1966; Shadlen and Newsome 2001). These indices were computed separately for trials with high (>20%) and low (<20%) motion coherence, independent of choice bias, and separately for trials in which the choice bias (B from Eq. 4, model 5) was toward and away from the actual choice, independent of coherence (similar results to those shown in Fig. 14 were found using only trials with high coherence or only trials with low coherence).
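The area under the ROC curve for two sets of spike rates equals the probability that a rate drawn from one set exceeds a rate drawn from the other, with ties counted as one half. The sketch below uses that pairwise formulation. It is an illustration with a hypothetical function name, not the original analysis code.

```python
def roc_area(rates_into, rates_away):
    """Probability that a random 'into' rate exceeds a random 'away' rate (ties count 0.5)."""
    wins = 0.0
    for r_in in rates_into:
        for r_out in rates_away:
            if r_in > r_out:
                wins += 1.0
            elif r_in == r_out:
                wins += 0.5
    return wins / (len(rates_into) * len(rates_away))
```

A value of 0.5 indicates fully overlapping distributions (no choice information), and a value of 1.0 indicates that every response for choices into the response field exceeded every response for choices away from it.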
We measured sequential dependencies of choice behavior in four monkeys learning the direction-discrimination task (Fig. 1; monkey At: 281,638 trials in 187 sessions over 518 days; Av: 382,788 trials in 232 sessions over 637 days; Cy: 114,404 trials in 160 sessions over 641 days; ZZ: 69,028 trials in 130 sessions over 416 days). Below, we first identify sequential choice dependencies in individual sessions by analyzing the trial-by-trial residuals of choice data fit to a psychometric function that assumes trial independence. We then incorporate these dependencies into a model of decision formation that allows us to compare directly the relative contributions of prior choices and incoming sensory information to performance as perceptual sensitivity improves with training. We finally compare the behavioral choice dependencies to signals related to the preparation and execution of the oculomotor response and to neural activity in cortical areas MT and LIP, two brain regions linked to task performance.
Analysis of choice behavior
Behavioral choices tended to exhibit biases within individual sessions. Figure 3 illustrates data from a single session. Overall for this session, the monkey tended to choose rightward more often than leftward (57.1% rightward choices), despite nearly balanced stimulus presentations (49.5% rightward stimuli). This effect is seen in smoothed versions of both the trial-by-trial choices and trial-by-trial choice residuals from the unbiased model (Eqs. 1–3) plotted in chronological order, which tended toward positive values (Fig. 3A). The residuals are particularly informative because they have taken into account the average effects of the stimulus shown on the current trial and thus are a more sensitive measure of the relationship between choices across trials. Autocorrelation functions of both the choice and choice-residual sequences also reflect similar patterns, in both cases peaking at a lag of one trial and then declining slightly but remaining in general positive, implying a tendency to make the same (in this case, rightward) choice (Fig. 3B).
All four monkeys exhibited choice biases, which we summarize using two statistics (Fig. 4). The first statistic is the absolute value of the mean magnitude of choice (or choice residual) within a session, which reflects the overall preponderance to make one choice versus the other. The second statistic is the absolute value of the autocorrelation function at a lag of one trial, which is a measure of similarity between pairs of trials (the signed value of this statistic was >0 for 153/161 sessions for At, 206/219 sessions for Av, 79/151 for Cy and 64/127 sessions for ZZ, consistent with a tendency of At and Av to repeat choices but of Cy and ZZ to both repeat and alternate choices; compare closed and open symbols in Fig. 4). These two measures are not necessarily independent, because pairwise autocorrelation is expected when one choice predominates. However, the two measures serve as useful benchmarks for later analyses that consider the source of these biases (see Fig. 7). In all four monkeys, both statistics were larger for choices than for stimulus directions (Wilcoxon paired-sample test for equality of medians of choice vs. stimulus direction or choice residual vs. stimulus direction, P < 0.01). Thus the monkeys all tended to make more of one choice than the other in individual sessions and, on a shorter timescale, either repeat or alternate choices.
To account for these choice biases, we computed first-order Wiener kernels describing a causal transformation from the sequences of past choices to the sequence of choice residuals for each session. This kernel describes a linear filter that minimizes the mean-squared error between the filtered input (the sequence of choices on previous trials convolved with the kernel) and the output (the choice residuals). This computation involved several assumptions. First, the kernel was used to predict the sequence of analog choice residuals, not the sequence of binary choices. Thus the kernel represents the relationship between the sequence of past choices and the portion of the current choice not accounted for by the average effects of the current motion stimulus. Second, the kernel was assumed to be causal and therefore took into account only those choices occurring at least one trial in the past. Third, the filter was assumed to be linear. Higher-order kernels can be computed but were not considered here. Fourth, kernels were computed separately for each session. This procedure reflected an effort to provide enough trials for a reliable estimate of the kernel but not so many to obscure possible changes in the kernel over time.
Two example kernels computed from a single session are depicted in Fig. 5, A and B. One was computed using data only from past correct choices (that is, sequences consisting of +1 for correct rightward choices, −1 for correct leftward choices and 0 for incorrect choices; Fig. 5A), the other using data only from past error choices (sequences consisting of 1 for incorrect rightward choices, −1 for incorrect leftward choices and 0 for correct choices; Fig. 5B). In both cases, the kernel coefficient was relatively large at a lag of one trial and then tended to be noisy but on average declined steadily toward zero. Such a kernel with mostly positive values suggests that the current choice reflects a running, weighted average of past choices. Figure 6 summarizes the session-specific kernels computed separately using correct (left) or error (right) trials for each of the four monkeys. On average, the kernel coefficients tended to be largest at the shortest lag and then steadily approach zero over lags of tens of trials, although individual kernel coefficients could take a fairly wide range of values at all lags.
A critical problem in interpreting these kernels is overfitting the data. The problem is illustrated in Fig. 5C, which shows the average correlation coefficients between the filtered past choices (that is, the outputs of the kernel convolved with past choices) and the actual choice residuals, a measure of how well the kernels describe the data. These correlation coefficients are plotted as a function of the size of the kernel, which determines how far in the past there were trials that could, in principle, exert a measurable influence on the current choice. For all four monkeys, the correlation increased monotonically up to values >0.5 as a function of kernel size, implying that increasing the kernel size always takes advantage of idiosyncratic structure in the data. A consequence of this problem is that choosing the kernel size becomes arbitrary, with larger sizes providing better fits to these finite-length data sets—but only by capturing specious sequential patterns.
To overcome the problem of overfitting while still providing session-specific estimates of the kernel and capturing its shape at both short (∼1 trial) and longer lags, we fit each raw kernel with a double-exponential function (e.g., dashed lines in Fig. 5, A and B). This function typically provided better fits than either a single-exponential (F-test, P < 0.05 for 505/588 sessions from all four monkeys) or power-law (the evidence ratio of Akaike's information criterion [AIC] was >20 for 367/588 sessions; Burnham and Anderson 2004) function and was similar in shape to the raw kernels averaged across sessions (compare green and black curves in Fig. 6). The fit kernels produced outputs that correlated with the actual choice residuals in a manner that rose with kernel size initially but, in contrast to the raw kernels, not indefinitely: the correlation (rseq) between filtered input and actual choice residuals reached a plateau for kernels using lags of <600 trials for all monkeys and all sessions, with a median [interquartile range, or IQR] value of the minimum lag providing the maximum rseq of 61  trials for monkey At, 81  trials for Av, 41  trials for Cy, and 31  trials for ZZ (Fig. 5, C and D).
The double-exponential kernels accounted for at least some of the sequential structure in the choice residuals. The value of rseq was higher than would be expected by chance from randomly ordered sequences of the same trials for 554 of 693 (80%) total sessions from all four monkeys (Monte Carlo simulations, P < 0.05), with a median [95% CIs] value across sessions of 0.16 [0.05–0.40] for monkey At, 0.11 [0.03–0.33] for Av, 0.14 [0.03–0.34] for Cy, and 0.12 [0.03–0.25] for ZZ. These values of rseq, which were computed using kernels determined separately from correct and error trials, were higher than when correct and error trials were combined together (that is, using a single sequence of choices, encoded as −1 for a leftward choice and +1 for a rightward choice; Wilcoxon test using the distributions of rseq computed from individual sessions, P < 0.05 for Av, Cy, and ZZ, P = 0.23 for At). This result implies that, at least for three of the four monkeys, the sequential dependencies tended to differ following correct versus error trials, an effect that can be seen by comparing the shapes of the average kernels for the two conditions (Fig. 6, compare left and right columns; the kernel coefficient at a lag of one trial had a different sign when computed using correct vs. error trials in 18% of sessions for At, 28% of sessions for Av, 51% of sessions for Cy, and 46% of sessions for ZZ). The kernels were not affected by using choices scaled by the amount of reward received on correct trials or the difficulty (coherence) of each trial (P > 0.05 for both cases, all four monkeys). Thus kernels computed separately using choice data from past correct and error trials appeared to be the best predictors of future choices.
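The comparison against randomly ordered sequences can be sketched as a shuffle test. This is a simplified illustration, not the procedure from methods: the kernel, the synthetic data, and the function names are assumptions, and the shuffle reorders whole trials (choice and residual together) before recomputing the filtered choices.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def filter_past(choices, kernel):
    # kernel[0] weights the previous trial, kernel[1] the one before, etc.
    return [sum(w * choices[t - lag]
                for lag, w in enumerate(kernel, start=1) if t - lag >= 0)
            for t in range(len(choices))]

def r_seq(choices, residuals, kernel):
    return pearson(filter_past(choices, kernel), residuals)

def shuffle_p_value(choices, residuals, kernel, n_shuffles, rng):
    observed = r_seq(choices, residuals, kernel)
    order = list(range(len(choices)))
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(order)  # same trials, random order
        c = [choices[i] for i in order]
        e = [residuals[i] for i in order]
        if r_seq(c, e, kernel) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)

# Synthetic session in which residuals really do depend on the previous choice.
rng = random.Random(1)
choices = [rng.choice([-1, 1]) for _ in range(300)]
residuals = [rng.gauss(0, 0.5)] + [
    0.5 * choices[t - 1] + rng.gauss(0, 0.5) for t in range(1, 300)]
kernel = [1.0, 0.5, 0.25]  # hypothetical decaying kernel

observed, p = shuffle_p_value(choices, residuals, kernel, 200, rng)
print(f"r_seq = {observed:.2f}, shuffle P = {p:.3f}")
```

Shuffling destroys the trial order while preserving the set of trials, so the shuffled distribution estimates the values of rseq that arise from chance alone.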
Inclusion of choice bias in the psychometric function
We incorporated the sequential dependencies identified by the Wiener kernel analysis as a bias in a drift-diffusion model of performance (Eqs. 1–4). This procedure allowed us to analyze the relative contributions of the sequential choice dependencies and incoming sensory information on the monkeys' behavioral performance and to determine how these contributions change with training.
The magnitude of the bias in the drift-diffusion model was determined by two fit terms that together accounted for the behavioral biases (model 5 in Fig. 7). One term (βs) was fixed for an entire session and accounted primarily for the overall asymmetry between left and right choices in a given session (compare models 1 and 5 in Fig. 7). The other term (βw) scaled the trial-by-trial sequence of choices filtered by the Wiener kernels and accounted for sequential dependencies in the choice residuals and, to a lesser extent, the overall asymmetry between left and right choices (compare models 4 and 5 in Fig. 7).
We fit the behavioral data using two other models to provide further intuition into the nature of the bias term. For one model, the bias was based on the choice and outcome of the previous trial only (model 2 in Fig. 7). The fits to this model tended to show the least improvement over the unbiased model. Thus despite the fact that the Wiener kernel coefficients tended to be strongest at a lag of one trial (Fig. 6), information from that trial alone did not appear to be sufficient to account for the behavioral biases. The second model was based on an RL algorithm in which the bias term was updated on each trial based on the choice and outcome of the most recent trial (model 3 in Fig. 7). This model appeared to perform approximately as well as a model using only the Wiener kernels to compute biases (model 4), reflecting the fact that the bias terms in both models are consistent with weighted averages of recent correct and error choices (see methods for details). In fact, the trial-by-trial biases estimated by the two models were highly correlated (the median [IQR] correlation coefficient between the trial-by-trial values of B in Eq. 4 using the two models fit to data from individual sessions was 0.98 [0.04] for At, 0.98 [0.03] for Av, 0.95 [0.06] for Cy, and 0.90 [0.08] for ZZ). This analysis suggests that the biases might arise, at least in part, from a relatively simple updating process that takes into account the choice and outcome of the previous trial.
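The correspondence between a trial-by-trial update and a weighted average of past choices can be made concrete with a simplified sketch. It assumes a delta rule with a single learning rate `alpha` and ignores trial outcome, which the RL model described above does use; the function names are hypothetical.

```python
def delta_rule_bias(choices, alpha):
    # Bias in effect on each trial, updated toward the most recent choice.
    bias, out = 0.0, []
    for c in choices:
        out.append(bias)            # value used on this trial
        bias += alpha * (c - bias)  # update after the choice is made
    return out

def exponential_kernel_bias(choices, alpha):
    # Same quantity written as an exponentially decaying kernel over past choices.
    return [sum(alpha * (1 - alpha) ** (lag - 1) * choices[t - lag]
                for lag in range(1, t + 1))
            for t in range(len(choices))]

choices = [1, 1, -1, 1, -1, -1, 1, 1, 1, -1]
rl = delta_rule_bias(choices, alpha=0.3)
kernel = exponential_kernel_bias(choices, alpha=0.3)
max_diff = max(abs(a - b) for a, b in zip(rl, kernel))
print(f"largest difference between the two forms: {max_diff:.2e}")
```

Under these assumptions the two computations agree up to floating-point error, which is one way to see why an update based only on the most recent trial can still behave like a kernel spanning many past trials.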
We also considered two ways of integrating the bias term in the model. The first caused an offset in the initial (or, equivalently, final) position of the decision variable (Eq. 4), which is consistent with previous models of choice bias (Ashby 1983; Carpenter and Williams 1995; Link 1992; Ratcliff et al. 1999). The second method caused an offset in the drift rate (a in Eq. 1b) that governs stimulus sensitivity, which might be expected if the choice dependencies arose from a process, like attention, that directly affected the representation of sensory evidence used as input to the decision variable (see Fig. 6 in Gold and Shadlen 2007). The first method provided a consistently better fit to the behavioral data than the second (the evidence ratio of the AIC was >1, indicating that the first model was more likely than the second, for 574 of 652 total sessions from all four monkeys). Thus the sequential choice dependencies were consistent with a process that provided a dynamically modified offset to the decision variable that converts sensory evidence into the categorical choice.
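The AIC evidence ratio used for model comparisons of this kind can be computed as follows. This is a minimal sketch using the standard definitions from Burnham and Anderson (2004); the log-likelihoods and parameter counts in the example are made up.

```python
import math

def aic(log_likelihood, n_params):
    # Akaike's information criterion; lower values indicate a better model.
    return 2 * n_params - 2 * log_likelihood

def evidence_ratio(aic_a, aic_b):
    # How many times more likely model A is than model B; >1 favors A.
    return math.exp((aic_b - aic_a) / 2)

# Hypothetical fits with equal numbers of parameters.
aic_offset = aic(log_likelihood=-100.0, n_params=5)
aic_drift = aic(log_likelihood=-103.0, n_params=5)
er = evidence_ratio(aic_offset, aic_drift)
print(f"evidence ratio = {er:.1f}")
```

When the two models have the same number of parameters, the evidence ratio reduces to the likelihood ratio, so a log-likelihood difference of 3 corresponds to a ratio of about 20.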
Changes in bias with training
The magnitude of the choice bias tended to decline as sensitivity to the motion stimulus improved with training. Two example sessions from early and late in training are shown in Fig. 8. The psychometric functions depict the percentage of rightward choices as a function of signed coherence (negative values indicate leftward motion, positive values rightward motion) at long viewing times. This function is steeper for the later versus the earlier session, indicating increased sensitivity to the motion stimulus. The trial-by-trial choice biases (computed here and in subsequent analyses from B in Eq. 4, model 5) show decreased fluctuations in the later versus the earlier session.
The relationship between choice bias and perceptual sensitivity throughout training is summarized in Fig. 9. We quantified sensitivity per session as the best-fitting value of the scale factor a of the drift rate A(t) (Eq. 1b), which is inversely related to discrimination threshold and is the most informative parameter of the model for describing changes in sensitivity with training (Eckhoff et al. 2008). For all four monkeys, the value of a increased steadily with training (Fig. 9, A–D; weighted linear regression of a vs. session number, H0: slope = 0, P < 0.01). Note that a was estimated using Eq. 4, which includes a bias term; not doing so would treat the sequential structure as noise and thus tend to underestimate a (median [IQR] percentage reduction in a computed without the bias term = 6% [10%] for At, 3% [5%] for Av, 7% [13%] for Cy, and 12% [19%] for ZZ). We quantified bias per session as the mean absolute value of B in Eq. 4 (model 5), which corresponds to the average magnitude of the offset to the decision variable. For all four monkeys, the bias decreased steadily with training (Fig. 9, E–H; weighted linear regression, P < 0.01). Consistent with these trends, the relative contributions of the bias and sensory-driven activity to the decision variable [B/μ(T) from Eq. 4] declined over the course of training, eventually appearing to approach an asymptote by the end of training for Cy but not for the other three monkeys (Fig. 9, I–L; single-exponential fits, tau = 329 sessions for At, 253 sessions for Av, 36 sessions for Cy, and 106 sessions for ZZ).
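A weighted linear regression of a per-session parameter against session number can be sketched as below. The choice of weights is an assumption (here, the inverse variance of each session's estimate), as are the function name and the example values.

```python
import math

def weighted_linear_fit(x, y, w):
    # Weighted least-squares line; w is assumed to be 1/variance of each y.
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    slope_se = 1 / math.sqrt(sxx)  # valid only if w really is 1/variance
    return slope, intercept, slope_se

sessions = [1, 2, 3, 4]
sensitivity = [2.5, 3.0, 3.5, 4.0]   # made-up values of a
weights = [1.0, 2.0, 3.0, 4.0]       # made-up inverse variances
slope, intercept, slope_se = weighted_linear_fit(sessions, sensitivity, weights)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, SE = {slope_se:.2f}")
```

Testing H0: slope = 0 then amounts to comparing the ratio of slope to its standard error against the appropriate reference distribution.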
To give a better understanding of how the bias term in the model affected choice behavior, we computed the percentage of choices that were predicted correctly using only the value of the bias term (i.e., predict a rightward choice for B >0, otherwise a leftward choice; Fig. 10). The accuracy of this prediction was highest when the information from the motion stimulus was weakest and declined steadily as stimulus information increased. Moreover, consistent with the analyses in Fig. 9, this prediction was strongest early in training, when the magnitude of B was largest, and then declined steadily as training progressed. Thus the biases tended to have the strongest effects on behavior when the stimulus was weak and perceptual sensitivity was low.
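The bias-only prediction described above follows a simple rule, sketched here with made-up data. Choices are encoded as +1 (rightward) and −1 (leftward); the coherence values and function names are illustrative assumptions.

```python
from collections import defaultdict

def predict_from_bias(bias):
    # Rightward (+1) if B > 0, otherwise leftward (-1).
    return [1 if b > 0 else -1 for b in bias]

def percent_correct(predicted, choices):
    hits = sum(p == c for p, c in zip(predicted, choices))
    return 100.0 * hits / len(choices)

def percent_correct_by_coherence(bias, choices, coherences):
    groups = defaultdict(lambda: ([], []))
    for b, c, coh in zip(bias, choices, coherences):
        groups[coh][0].append(b)
        groups[coh][1].append(c)
    return {coh: percent_correct(predict_from_bias(bs), cs)
            for coh, (bs, cs) in groups.items()}

bias = [0.3, -0.1, 0.0, 0.2]        # trial-by-trial B (made up)
choices = [1, -1, 1, -1]            # actual choices
coherences = [0.0, 0.0, 0.5, 0.5]   # unsigned motion strength

overall = percent_correct(predict_from_bias(bias), choices)
by_coh = percent_correct_by_coherence(bias, choices, coherences)
print(overall, by_coh)  # 50.0 {0.0: 100.0, 0.5: 0.0}
```

Grouping by motion strength is what exposes the pattern reported in Fig. 10, in which the bias term predicts choices best when the stimulus carries the least information.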
For monkeys performing the direction-discrimination task, there appears to be a close relationship between forming the direction decision and preparing the oculomotor response. For example, the temporal accumulation of motion information used to form the direction decision is reflected in neural activity in several brain regions linked to oculomotor preparation, including LIP, FEF, and the superior colliculus (Horwitz and Newsome 1999, 2001; Kim and Shadlen 1999; Roitman and Shadlen 2002; Shadlen and Newsome 2001). We tested whether the sequential dependencies are also reflected in oculomotor-related signals.
To assess oculomotor preparatory activity, we measured the trajectories of saccadic eye movements evoked with electrical microstimulation of the FEF at the end of motion viewing. These trajectories are determined primarily by the site of microstimulation but are sensitive to ongoing oculomotor activity, for example, deviating in the direction of a planned saccade (Mays and Sparks 1980; Schlag et al. 1989). Saccades evoked at the end of motion viewing during the direction-discrimination task reflect the link between decision formation and oculomotor preparation, deviating in the direction of the monkey's impending saccadic choice (Fig. 11, A and B) with a magnitude that depends on the strength and duration of the motion evidence used to arrive at that choice (Fig. 11C; Gold and Shadlen 2000, 2003). To test whether these evoked-saccade deviations also reflect the choice bias inferred from behavior, we computed partial rank correlations between the trial-by-trial magnitudes of deviation and bias (see methods).
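A partial rank correlation of this kind can be sketched as a Spearman correlation between two variables after removing the rank-based influence of a third. Which covariates were partialed out is specified in methods; using motion strength as the single covariate here is an assumption, as are the function names and data.

```python
import math

def ranks(values):
    # Ranks starting at 1, with tied values given their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_rank_correlation(x, y, z):
    # Correlation of ranks of x and y, controlling for ranks of z.
    # Undefined if x or y is a perfectly monotonic function of z.
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

deviation = [2.0, 4.0, 1.0, 3.0]   # evoked-saccade deviation (made up)
bias = [0.2, 0.4, 0.1, 0.3]        # bias magnitude (made up)
strength = [1.0, 2.0, 3.0, 4.0]    # covariate to control for (assumed)
rho = partial_rank_correlation(deviation, bias, strength)
print(f"partial rank correlation = {rho:.2f}")
```

Controlling for the covariate matters because the deviations also depend on the motion evidence, so a raw correlation with bias could otherwise be inflated or masked by that shared dependence.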
There was a slight, positive relationship between the deviation magnitude and choice bias magnitude in both monkeys. For the example site shown in Fig. 11D, the deviations in the direction of the chosen target tended to be larger on trials in which the monkey was biased in that direction. For monkey At, the correlation coefficient was significantly >0 (P < 0.05) for 70 of 253 cases (each of the two choices from 168 sessions with >50 trials) and <0 (P < 0.05) for only 6 cases, and for all sessions had a median value that was significantly >0 (Mann–Whitney test, P < 0.01; Fig. 11E and Table 1). For monkey ZZ, these values were >0 for 101 of 346 cases (P < 0.05) and <0 for 53 cases (P < 0.05), and for all sessions also had a median slope that was significantly >0 (P < 0.01; Fig. 11E and Table 1). Moreover, there was a positive, linear relationship between the value of this correlation coefficient and the absolute magnitude of bias measured on the same trials. Thus there was a tendency for the evoked eye movements to deviate slightly more toward the right target when the monkey was biased in that direction and slightly more toward the left target when the monkey was biased in that direction.

There was a similar, slight relationship between the latency and velocity of the voluntary eye-movement response and choice bias in monkeys At and Av (Table 1). These monkeys tended to initiate their saccadic response sooner (following the instruction to do so via simultaneous offset of the motion stimulus and fixation point) and execute it more slowly when their choice was in the same direction as the current estimate of choice bias. In contrast, these trends were not apparent in monkeys Cy and ZZ, possibly because their version of the task had a delay period between dots offset and fixation-point offset that effectively separated decision processing from oculomotor preparation. There was also no systematic relationship between bias and saccade accuracy in any of the four monkeys. These results are consistent with the idea that when saccade initiation occurs immediately following the process of saccade selection, biases that affect the selection process can be reflected in the time to initiate and execute the saccade.
We looked for correlates of choice bias in areas MT and LIP. Neurons in area MT are tuned for the direction of visual motion and are thought to provide the sensory evidence for the direction decision (Gold and Shadlen 2007). In principle, trial-by-trial shifts in MT responsiveness of particular subsets of MT neurons could affect choices in a manner analogous to the effects of electrical microstimulation, which biases choices in the preferred direction of the stimulated neurons (Salzman et al. 1990, 1992). Neurons in area LIP have been associated with sensory, motor, and cognitive functions and in decision tasks are thought to represent the accumulation of sensory evidence into a decision variable that governs the monkey's behavioral choice (Roitman and Shadlen 2002; Shadlen and Newsome 2001). In principle, choice biases could result from trial-by-trial changes in the value of this decision variable, such as an additive offset predicted by the diffusion model analysis.
MT activity was not correlated with sequential choice dependencies in a systematic manner before, during, or after motion viewing. Figure 12, A and B shows an example MT neuron. Its responses were modulated strongly by the motion stimulus during motion viewing, with increasingly strong leftward motion eliciting increasingly strong responses and increasingly strong rightward motion eliciting increasingly weak responses. We computed rank correlations between MT activity and sequential biases, using a partial correlation for data from the stimulus-viewing epoch that accounted for the (linear) relationship between response magnitude and motion strength. The value of this correlation did not differ significantly from zero in any of the three epochs (P > 0.05).
The relationship between MT activity and choice bias across the population of 71 neurons recorded in monkey Cy and 38 neurons recorded in monkey ZZ is summarized in Fig. 12, C and D. For each session, sequential bias (St) was encoded with respect to the preferred direction of the given MT neuron. Thus a positive (negative) correlation between MT activity and sequential bias would imply that stronger MT responses were associated with an increasing (decreasing) tendency to choose the preferred direction of the neuron. For each of the three epochs analyzed (shaded regions in Fig. 12A) for both monkeys, the median of the distribution of correlation coefficients did not differ significantly from zero (Mann–Whitney, P > 0.4). Moreover, there was no systematic, linear relationship between the correlation coefficient and the magnitude of bias measured from the same trials (i.e., the mean, absolute value of B from Eq. 4, model 5).
Likewise, LIP activity was not correlated with choice bias in a systematic manner. Figure 13, A and B shows an example LIP neuron. Its responses were modulated by the motion stimulus during motion viewing and remained separated by choice until the saccadic response. After accounting for these task-related modulations, the partial rank correlations between LIP activity and sequential bias were not significantly different from zero in any epoch before, during, or after motion viewing (P > 0.05).
The relationship between LIP activity and choice bias across the population of 123 neurons recorded in Cy and 99 neurons recorded in ZZ is summarized in Fig. 13, C and D. For each session, sequential bias (St) was encoded with respect to the choice associated with the target in the response field of the given LIP neuron; thus a positive (negative) correlation between LIP activity and bias would imply that stronger LIP responses were associated with an increasing (decreasing) tendency to choose the target in the neuron's response field. For each of the six epochs analyzed (shaded regions in Fig. 13A) for both monkeys, the median of the distribution of correlation coefficients did not differ significantly from zero (Mann–Whitney, P > 0.05). Moreover, there was no systematic, linear relationship between the correlation coefficient and the magnitude of bias measured from the same trials (i.e., the mean, absolute value of B from Eq. 4, model 5).
Because LIP activity during motion viewing has been shown to relate closely to task performance (Roitman and Shadlen 2002; Shadlen and Newsome 2001), we examined in more detail how LIP responses during this epoch related to sensory input, sequential bias, and choice behavior. For each neuron, we computed an ROC-based choice index that quantifies the degree to which the LIP responses are separate for choices into versus out of the neuron's response field (Shadlen and Newsome 2001). A value of 0.5 implies that the distributions of responses were completely overlapping; a value of 1.0 implies that the distributions were completely nonoverlapping. To compare the effects of motion evidence and choice bias on the responses, we computed the difference in choice indices between trials with high versus low motion strengths and between trials that resulted in a choice toward versus away from the direction of the choice bias (Fig. 14). Consistent with previous reports, the coherence dependence grew over the first several hundred milliseconds of viewing time then tended to remain positive throughout motion viewing, particularly later in training (Kiani et al. 2008; Law and Gold 2008; Roitman and Shadlen 2002; Shadlen and Newsome 2001). In contrast, there was no consistent dependence on choice bias at any time in either monkey, even early in training and early in motion viewing when an initial, additive offset would be expected to dominate the value of the decision variable.
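An ROC-based choice index can be computed as the area under the ROC curve, which equals the probability that a response drawn from one distribution exceeds a response drawn from the other, counting ties as one half. The sketch below uses made-up spike counts and a hypothetical function name; it is not the analysis code used here.

```python
def roc_choice_index(responses_in, responses_out):
    # Probability that a response on an "into the response field" trial
    # exceeds a response on an "out of the response field" trial.
    wins = 0.0
    for a in responses_in:
        for b in responses_out:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(responses_in) * len(responses_out))

separated = roc_choice_index([5, 6, 7], [1, 2, 3])      # no overlap
overlapping = roc_choice_index([1, 2, 3], [1, 2, 3])    # identical distributions
print(separated, overlapping)  # 1.0 0.5
```

The difference in this index between two sets of trials (for example, high versus low motion strength, or choices toward versus away from the bias) then measures how much that factor changes the separation of the responses.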
We measured sequential dependencies in choice behavior of monkeys learning a demanding perceptual task that required them to indicate the direction of visual motion with a saccadic eye movement. Their saccadic choices were determined primarily by the direction, strength, and duration of the motion stimulus, which were chosen randomly from trial to trial. Nevertheless, there were clear dependencies of choices across trials. These dependencies tended to be strongest early in training and then diminished gradually as perceptual sensitivity to the motion stimulus improved. In addition, the dependencies were slightly correlated with an inferred plan to generate the saccadic response and the latency and velocity of the response itself. In contrast, there was no consistent relationship between the sequential choice dependencies and neural activity in MT, which encodes visual motion, and LIP, which represents the transformation of motion information into a saccadic choice. The results suggest that training can calibrate the relative influence of sensory and nonsensory factors used for action selection, particularly during perceptual learning when perceptual sensitivity to relevant sensory signals improves.
Behavioral choice dependencies
The behavioral choice dependencies persisted throughout months of training despite the fact that they provided no benefit to the monkeys in terms of obtaining reward. In fact, because reward depended only on whether the monkey chose the correct direction of motion for the given trial and directions were chosen at random, biasing choices based on past trials could only hinder performance. However, because the sequential dependencies reflected in most cases only a fraction of the influence of the given stimulus on choice (Fig. 9, I–L), their effect on the amount of reward received was perhaps not large enough to motivate a more rapid change in strategy (the percentage of correct responses in individual sessions estimated from the behavioral fits to Eq. 4 was improved when assuming no biases by a median [IQR] of only 1 % for At, 2 % for Av, 1 % for Cy, and 1 % for ZZ even when considering only the first 20 sessions for each monkey, when the biases were greatest). Instead, the dependencies appeared to reflect a default strategy that did not require reinforcement to persist and thus seems likely to be present under a wide variety of behavioral conditions.
Consistent with this idea, the sequential choice dependencies we measured for the motion-discrimination task were similar to those measured on free-choice tasks. The most striking similarity is the form of the weighting function describing the relationship between past events and the current choice, with the strongest weights corresponding to one or two trials in the past and decaying weights moving further into the past (Figs. 5 and 6; Corrado et al. 2005; Lau and Glimcher 2005). Such a function, when all the weights are positive and applied to past events, generates a running average that can be used to estimate recent rates of particular choices or rewards (Killeen 1994). The temporal decay might reflect limitations in memory, an innate emphasis on recent events that has evolved in a dynamic world, and possibly a common mechanism for discounting both past and future rewards (Corrado et al. 2005; Cowie 1977). The decay also highlights the local nature of the underlying computations that can drive behavior (Herrnstein and Vaughan 1980; Vaughan 1981).
There were also differences between our results and previous reports using free-choice tasks, including the source of past information used to guide choices. For free-choice tasks, behavior is typically analyzed with respect to various forms of reward rate (e.g., Corrado et al. 2005; Gallistel et al. 2001; Sternberg 2001). However, a recent study highlighted the usefulness of accounting separately for the history of rewards and choices, demonstrating that monkeys performing a free-choice task tended to make choices biased toward recently reinforced choices but biased away from recent choices in general (Lau and Glimcher 2005). For our task, behavior was most strongly predicted by a function of past choices, reflecting in some cases a tendency to repeat and in other cases a tendency to switch choices (see Fig. 6). Taking into account whether the past choices were rewarded (i.e., were correct or incorrect) improved the predictions, but taking into account the magnitudes of previous rewards did not.
Our modeling results are consistent with the idea that these dependencies correspond to an additive offset to a quantity, known as a decision variable, that accumulates motion information to arrive at a decision (Ashby 1983; Carpenter and Williams 1995; Link 1992; Ratcliff et al. 1999; this mechanism is also closely related to criterion changes in signal detection theory: Green and Swets 1966; Maddox 2002). In our decision model (Eqs. 1–4), the accumulated motion information is thought to represent the logarithm of the likelihood ratio describing the relative probabilities of obtaining the current motion evidence given the possible directions of motion (e.g., left vs. right; Gold and Shadlen 2001). An additive offset to the decision variable that is related to the relative frequencies of recent occurrences of each alternative implies an ongoing estimate of prior probability used in the context of Bayesian inference (Kersten et al. 2004; Rao 1999; Tassinari et al. 2006; Tenenbaum and Griffiths 2001). In this framework, the sequential dependencies might be thought of as a necessary component of a Bayesian decision process and not simply a reflection of a suboptimal or lazy strategy, thus explaining their prevalence in perceptual and cognitive tasks (Botvinick et al. 2001; Cho et al. 2002; Kirby 1976; Laming 1968; Remington 1969; Soetens et al. 1985).
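The link between an additive offset and prior probability follows directly from Bayes' rule written in log-odds form. In the sketch below, R and L denote rightward and leftward motion and e denotes the accumulated motion evidence; these symbols are chosen for illustration and are not taken from Eqs. 1–4.

```latex
\begin{equation}
\log\frac{P(R \mid e)}{P(L \mid e)}
  = \underbrace{\log\frac{P(e \mid R)}{P(e \mid L)}}_{\text{accumulated evidence}}
  + \underbrace{\log\frac{P(R)}{P(L)}}_{\text{additive offset}}
\end{equation}
```

The first term on the right is the log likelihood ratio that the decision variable is thought to represent, and the second term depends only on the prior odds, so it shifts the decision variable by a constant amount regardless of the strength of the current evidence. When the two directions are judged equally likely, the second term is zero and the offset vanishes.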
Our search for oculomotor correlates of the sequential choice dependencies was motivated by the notion that high-order cognitive and perceptual functions are intimately linked to behavioral output (Clark 1997; Merleau-Ponty 1962; O'Regan and Noe 2001). For example, both spatial attention and perceptual decision making have been shown to be reflected in signals related to the preparation of eye movements (Gold and Shadlen 2000, 2003; Kustov and Robinson 1996; Moore et al. 2003). Even more relevant to the present study, saccadic latencies on a simple visually guided saccade task can reflect prior probabilities in a manner roughly consistent with our decision model, causing an additive offset to the value of a decision variable that builds up over time (Carpenter and Williams 1995).
The present results provide further support for these ideas, demonstrating that not just sensory but also sequential information used to select a saccadic response is reflected in signals related to the preparation and execution of that response. The effects were quite weak, which is consistent with the idea that the sequential effects played a minor role in determining where and when to plan the saccadic response relative to the influence of stimulus information from the current trial. Nevertheless, the presence of these sequential effects implies that the oculomotor system has at least some access to the constellation of factors that govern the monkey's saccadic choices.
We targeted areas MT and LIP because of previous studies linking their activity to performance on the direction-discrimination task (reviewed in Gold and Shadlen 2007). MT represents motion evidence used to form the direction decision (Britten et al. 1992; Newsome and Paré 1988; Pasternak and Merigan 1994; Salzman et al. 1990). LIP is thought to represent the decision variable that converts sensory evidence into a plan to generate the appropriate oculomotor response (Roitman and Shadlen 2002; Shadlen and Newsome 2001). LIP has also been implicated in a host of other perceptual, oculomotor, and cognitive functions, including the valuation of recent rewards on a free-choice task that further suggested it might reflect the sequential choice dependencies we measured from behavior (Platt 2002; Snyder et al. 1997; Sugrue et al. 2004). Moreover, motion-driven responses in LIP, but not MT, accompany improvements in perceptual sensitivity on the direction-discrimination task (Law and Gold 2008).
However, we found no systematic effects in either MT or LIP. The lack of effects in LIP was particularly striking, given its close relationship to task performance both during and after training (Hanks et al. 2006; Law and Gold 2008; Roitman and Shadlen 2002; Shadlen and Newsome 2001). It is possible that the effects were too small or noisy to measure from individual neurons or reflected more complex influences than monotonic changes in spike rates in either area. Alternatively, the lack of effects in MT and LIP might imply that the sequential dependencies are processed separately from the judgment about motion direction. That is, task-related activity in LIP might represent only part of the decision variable that ultimately governs saccadic choices for this task, especially early in training when the sequential effects are strongest.
We do not know where or how the sequential aspects of the decision variable are computed. The oculomotor results suggest that activity in other structures that prepare and execute the saccadic response (possibly the superior colliculus or FEF) is at least influenced by the sequential choice dependencies. The fact that the dependencies typically involve multiple trials in the past suggests memory and strategic requirements often attributed to the dorsolateral prefrontal cortex (Barraclough et al. 2004; Funahashi et al. 1989; Fuster 2000; Goldman-Rakic 1995; Tanji and Hoshi 2001). The differences in the effects of previous rewarded versus unrewarded trials suggest the involvement of brain regions that predict and evaluate rewards or punishments, such as the orbitofrontal and anterior cingulate cortex or midbrain dopamine neurons (Nakahara et al. 2004; Rolls 2004; Seo and Lee 2007; Shidara and Richmond 2002; Tremblay and Schultz 1999). Imaging or multisite recording studies will ultimately be needed to understand the roles that these different brain areas play in computing and expressing sequential biases on choice behavior.
This work was supported by National Institutes of Health Grants EY-015260, MH-062196, and P30 EY-001583; the McKnight Foundation; the Burroughs-Wellcome Fund; and the Sloan Foundation.
We thank L. Ding, C.-L. Teng, P. Holmes, and P. Eckhoff for helpful comments and J. Zweigle, F. Letterio, M. Supplick, and A. Callahan for technical assistance.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2008 by the American Physiological Society