Journal of Neurophysiology

Relationships Between the Threshold and Slope of Psychometric and Neurometric Functions During Perceptual Learning: Implications for Neuronal Pooling

Joshua I. Gold, Chi-Tat Law, Patrick Connolly, Sharath Bennur

Abstract

Perceptual learning involves long-lasting improvements in the ability to perceive simple sensory stimuli. Some forms of perceptual learning are thought to involve an increasingly selective readout of sensory neurons that are most sensitive to the trained stimulus. Here we report novel changes in the relationship between the threshold and slope of the psychometric function during learning that are consistent with such changes in readout and can provide insights into the underlying neural mechanisms. In monkeys trained on a direction-discrimination task, perceptual improvements corresponded to lower psychometric thresholds and slightly shallower slopes. However, this relationship between threshold and slope was much weaker in comparable, ideal-observer “neurometric” functions of neurons in the middle temporal (MT) area, which represent sensory information used to perform the task and whose response properties did not change with training. We propose a linear/nonlinear pooling scheme to account for these results. According to this scheme, MT responses are pooled via linear weights that change with training to more selectively read out responses from the most sensitive neurons, thereby reducing predicted thresholds. An additional nonlinear (power-law) transformation does not change with training and causes the predicted psychometric function to become shallower as uninformative neurons are eliminated from the pooled signal. We show that this scheme is consistent with the measured changes in psychometric threshold and slope throughout training. The results suggest that some forms of perceptual learning involve improvements in a process akin to selective attention that pools the most informative neural signals to guide behavior.

INTRODUCTION

Performance on simple perceptual tasks can improve with training, a phenomenon called perceptual learning (Fahle 2005; Gilbert et al. 2001; Goldstone 1998; Seitz and Watanabe 2005). Perceptual learning is often measured as increasing discriminability for a given stimulus or decreasing stimulus strength required for a given level of performance, corresponding to horizontal shifts of the psychometric function describing performance accuracy as a function of stimulus strength (Fig. 1; Fine and Jacobs 2002; Gilchrist et al. 2005; Strasburger 2001). Here we examine how training can also affect the slope of this function. Based on previous studies linking psychometric slope to uncertainty about which signals in the brain to use to guide task performance, we hypothesized that changes in slope might accompany decreases in threshold that arise from training-induced changes in how sensory activity in the brain is read out to guide behavior (Kontsevich and Tyler 1999; Pelli 1985, 1987; Tyler and Chen 2000).

Fig. 1.

Psychometric functions. Top row: time-independent cumulative Weibull function (Eq. 1), fit to data binned by viewing time. Middle row: time-dependent cumulative Weibull function (Eq. 2), fit to data as a function of both motion strength and viewing time. Bottom row: decision model (Eq. 3), fit to data as a function of both motion strength and viewing time. Left column: percentage correct plotted vs. stimulus strength. Middle column: discriminability (d′) plotted vs. signal strength on log–log coordinates. Right column: relationship between psychometric slope and threshold for changes in the values of key parameters of each model, as indicated (arrows point to larger values for each parameter). The dash–dotted line in A indicates psychometric slope, defined as the steepness of the function plotted on a logarithmic abscissa at threshold. The dashed lines in left and middle columns indicate threshold, defined as the stimulus strength corresponding to d′ = 1. The grayscale in D, E, G, and H depicts viewing time (darker lines correspond to longer times). The inset in B depicts the relationship between percentage correct and d′ according to signal-detection theory: for a 2-alternative task, percentage correct is the proportion of a normally distributed random variable >0 (gray) (Green and Swets 1966; Klein 2001; Macmillan and Creelman 2004). In this case, the random variable is assumed to reflect the net accumulated evidence in favor of the correct (positive values) vs. the incorrect (negative values) choice. d′ is the mean divided by the SD of this random variable (scaled by √2 because it is assumed to represent a difference between signals representing the 2 choices). Parameters of the cumulative Weibull functions correspond directly to threshold and slope and are therefore useful for describing the data. Parameters of the decision model are more complicated and are thought to more closely reflect the underlying neural mechanisms.

We trained monkeys to decide the direction of random-dot motion and respond with an eye movement. Their discrimination thresholds decreased steadily with training (Law and Gold 2008). These improvements in sensitivity corresponded to changes in motion-driven responses in the lateral intraparietal (LIP) area, which encodes sensory, motor, and cognitive signals and is thought to represent the conversion of motion evidence into a decision that guides the saccadic response, but not in the middle temporal area (MT), which encodes the motion evidence itself (Britten et al. 1992; Hanks et al. 2006; Newsome and Paré 1988; Pasternak and Merigan 1994; Platt 2002; Roitman and Shadlen 2002; Salzman et al. 1990; Shadlen and Newsome 2001; Snyder et al. 1997; Sugrue et al. 2004). These results suggest that for this task, training shapes how MT output is read out to form the decision (Law and Gold 2008). Such changes in readout are consistent with a pooling process that learns to weigh outputs selectively from the most informative sensory neurons (Jacobs 2009; Law and Gold 2009; Petrov et al. 2005). However, the nature of this selective pooling process remains unknown.

In principle, selective pooling could be implemented via dynamic linear weights that scale the outputs of individual neurons, depending on the strength of their contribution to the decision (Geisler and Albrecht 1997; Hol and Treue 2001; Jazayeri and Movshon 2007; Parker and Newsome 1998; Pouget et al. 2003; Seung and Sompolinsky 1993). These changes in pooling weights might act to reduce “channel uncertainty” or “distraction of attention” that affects the relative contributions of informative and uninformative signals to a perceptual judgment. Combined with certain nonlinear pooling operations, such a scheme can affect psychometric slope (Kontsevich and Tyler 1999; Pelli 1985, 1987; Tyler and Chen 2000). Here we report a novel finding, involving slight decreases in psychometric slope that accompanied perceptual learning in the absence of comparable changes in the underlying sensory representation. We also show for the first time that these learning-induced changes are consistent with a pooling process with a dynamic linear component and a static nonlinear component that generates an increasingly selective readout of the most informative sensory neurons with training.

METHODS

We trained four adult rhesus monkeys (Macaca mulatta) on a direction-discrimination task, two male (monkey At: 281,638 trials in 187 sessions over 518 days; Cy: 114,404 trials in 160 sessions over 641 days) and two female (Av: 382,788 trials in 232 sessions over 637 days; ZZ: 69,028 trials in 130 sessions over 416 days). For two of these monkeys, we also recorded the activity of individual MT neurons before and during training (n = 50 neurons recorded before training and 71 recorded during training for Cy; 47 before and 38 during training for ZZ; Law and Gold 2008). All were naïve to behavioral and electrophysiological testing before the experiments began. All behavioral, surgical, and electrophysiological procedures were carried out in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the University of Pennsylvania Institutional Animal Care and Use Committee.

Each monkey was prepared for experimental testing in a single surgical session, in which a head-holding device, recording cylinder (Crist Instrument, Damascus, MD), and single scleral search coil used to monitor eye movements (Judge et al. 1980; Robinson 1963) were implanted. For monkeys Cy and ZZ, training was combined with unit recordings in areas MT and LIP (Gold et al. 2008; Law and Gold 2008). For At and Av, training was combined with a technique for assessing ongoing oculomotor activity during decision formation using saccadic eye movements evoked with electrical microstimulation of the frontal eye field (FEF; Connolly et al. 2009; Gold et al. 2008).

Behavioral task and training

The direction-discrimination task is described in detail elsewhere (Fig. 2; Gold and Shadlen 2003; Gold et al. 2008; Law and Gold 2008). Briefly, visual stimuli were generated in MATLAB on a Macintosh computer, using the Psychophysics Toolbox software (Brainard 1997; Pelli 1997) with custom additions to draw the motion stimulus, and presented on a 21-in. cathode ray tube monitor (Viewsonic) positioned 60 cm directly in front of the monkey. The task required the monkey to fixate a central spot while viewing a random-dot motion stimulus and then, after fixation-point offset, make a saccadic eye movement to foveate one of two choice targets located along the axis of motion. A correct choice of the target in the direction of motion was followed immediately by one or more audible tones and zero to five drops of juice, based on a reward schedule that encouraged multiple correct responses. An incorrect choice was followed by a time-out period of 1–3 s. Trials in which the monkey broke fixation early or failed to select one of the two choice targets were also followed by a time-out period of 1–3 s but were not included in the data analysis. For each trial, a control computer running REX software (Hays et al. 1982) on the QNX operating system (QNX Software Systems) pseudorandomly chose the direction of coherent motion (from two equally balanced alternatives separated by 180°), percentage of coherently moving dots (0, 3.2, 6.4, 12.8, 25.6, 51.2, or 99.9%), and viewing time (chosen from an exponential distribution with a mean of between 0.2 and 0.8 s and bounded between 0.1 and 1.5 s, to approximate a flat hazard function and thus minimize the ability to anticipate stimulus offset). As performance improved with training, the distributions of motion coherences and viewing durations were adjusted to maintain a relatively stable overall percentage of correct responses (∼70–75% correct), and thus overall feedback and motivation, for each session.
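The viewing-time schedule described above can be sketched as follows. This is a minimal illustration only: the text does not say how the exponential distribution was bounded, so redrawing out-of-range values (rejection sampling) is an assumption, as are the function name `sample_viewing_time` and the default mean of 0.4 s.

```python
import random

def sample_viewing_time(mean_s=0.4, t_min=0.1, t_max=1.5, rng=random):
    """Draw one viewing time (in seconds) from a bounded exponential distribution.

    Assumption: values outside [t_min, t_max] are redrawn. The original
    bounding procedure is not specified in the text.
    """
    while True:
        t = rng.expovariate(1.0 / mean_s)  # exponential with the requested mean
        if t_min <= t <= t_max:
            return t

# Example: viewing times for one hypothetical session of 1,000 trials.
rng = random.Random(0)
times = [sample_viewing_time(rng=rng) for _ in range(1000)]
print(min(times), max(times))
```

An exponential distribution has a constant hazard rate, which is why it approximates a flat hazard function. Note that the mean of the bounded samples differs from `mean_s`, because redrawing removes both very short and very long times.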

Fig. 2.

Direction-discrimination task. A: task design. While the monkey fixates a central spot, 2 targets appear, indicating the 2 possible directions of motion. The motion stimulus is then shown for a variable duration, controlled by the experimenter. Once the fixation spot is extinguished, the monkey must shift gaze to the target in the direction of coherent motion to receive a liquid reward. B–E: distributions of viewing times used in all sessions for each of the 4 monkeys, as indicated.

Eye position was monitored throughout each experimental session using a scleral search coil technique (CNC Engineering, Seattle, WA) with a coil implanted monocularly in each monkey. Eye position signals were sampled at 1,000 Hz. During motion viewing, the monkeys were required to maintain fixation to within ±2.5° of visual angle. After offset of the motion stimulus and fixation point, the monkeys had 80–700 ms in which to initiate a voluntary saccadic response to one of the two targets.

MT recordings

For monkeys Cy and ZZ, behavioral testing was combined with recordings of neural activity in area MT (and area LIP, not described here; for more details, see Gold et al. 2008; Law and Gold 2008). To begin each session, quartz-coated platinum–tungsten microelectrodes were advanced into MT via a Mini-Matrix microdrive system (Thomas Recording, Giessen, Germany). Extracellular action potential waveforms were stored and sorted off-line (Plexon, Dallas, TX). If a direction-tuned MT neuron was found, the motion stimulus was placed in its receptive field and shown at the neuron's preferred direction (and 180° opposite) and speed. If no MT neuron was found, the approximately modal location, direction, and speed from previous sessions were used. We include for analysis only those cells with responses to strong motion that differed substantially for the two directions [i.e., the direction of motion could be determined from neural responses to 99.9% coherence stimuli >70% of the time, using an ideal-observer receiver operating characteristic (ROC) analysis (Britten et al. 1992)].

Psychometric analyses

Behavioral data were analyzed using three different forms of psychometric function (Fig. 1). The first two, which were based on a cumulative Weibull function, were used because they fit these kinds of data well and are parameterized in a way that relates directly to the threshold and slope of the psychometric function (Fig. 1, A–F; Britten et al. 1992; Gold and Shadlen 2003; Law and Gold 2008; Quick 1974). Thus fits to these functions provided a clear, descriptive account of changes in psychometric threshold and slope with training. In contrast, the third function, which is a form of sequential sampling or “drift diffusion” model, is based on a more complicated parameterization that is thought to relate more directly to the underlying neural mechanisms (Fig. 1, G–I; Eckhoff et al. 2008; Gold and Shadlen 2007).

The first function characterized psychometric threshold and slope as a function of motion strength but independent of viewing time (Fig. 1, A–C). Data from each session were divided into four equally populated bins of viewing time and each bin was fit using the following function of motion strength

$$P(C) = 0.5 + (0.5 - \lambda)\left[1 - k^{-(C/\alpha_1)^{\beta_1}}\right] \tag{1}$$

where C is the fraction of coherently moving dots, λ is the lapse rate measured as the fraction of errors at the highest motion strength (typically 99% coherence [coh]) and long viewing times (greater than the median value for the given session), k was set to 2.0851 (so that the point of maximum slope of the function, corresponding to its value at α1, was at d′ = 1, or P ≅ 76%), and α1 and β1 are fit values corresponding to the threshold and slope, respectively, of the function. We used these fits to assess how each parameter depended on viewing time by computing for each session weighted linear fits (Draper and Smith 1998), using the uncertainty associated with the maximum-likelihood estimates of each parameter (Meeker and Escobar 1995), of the logarithm of α1 or β1 versus the logarithm of the median viewing time for each bin. For these and other weighted linear fits, we used a likelihood ratio test of a nested model to test the null hypothesis H0: slope = 0 (Draper and Smith 1998; Mendenhall et al. 1986).

The remaining two functions characterized performance as a function of both motion strength and viewing time. These functions were fit to behavioral data from nonoverlapping bins of five sequential sessions, which included 1,986–15,340 trials (individual sessions included 110–3,658 trials). This binning procedure was used for two reasons. First, it provided sufficiently high sample sizes to minimize SEs in fit parameter estimates and sufficiently high resolution to capture systematic changes in performance over time (Eckhoff et al. 2008). Second, we found that smaller sample sizes tended to bias measurements of psychometric slope upward, with stable estimates occurring only for sets of >1,000 trials: fitting Eq. 2 to randomly selected subsets of data from individual sessions with ≥1,500 trials yielded slope estimates that were higher than estimates from the full data set by a median (interquartile range [IQR]) percentage of 7.4 [47.4]% when using <500 trials, 0.9 [20.5]% when using <1,000 trials, 0.0 [9.8]% when using <1,500 trials, and 0.0 [1.6]% when using >1,500 trials. Because many individual sessions had <1,000 trials, binning data across sessions helped to avoid this problem.

Because the results of analyses using Eq. 1 suggested that viewing time affected α1 but not β1, the second function extended Eq. 1 to include a time-dependent threshold (Fig. 1, D–F)

$$P(C,T) = 0.5 + (0.5 - \lambda)\left[1 - k^{-\left(C/(\alpha_2 T^{\gamma})\right)^{\beta_2}}\right] \tag{2}$$

where T is viewing time (in seconds); λ and k are as in Eq. 1; and α2, β2, and γ are fit parameters that govern the time-dependent threshold (α2 and γ) and time-independent shape (β2) of the function.
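The time-dependent cumulative Weibull function can be written out as a short sketch. The function name `weibull_pc` and the parameter values below are hypothetical and chosen only for illustration; the constant k = 2.0851 is taken from the text. Setting T = 1 s recovers the time-independent form with α1 = α2.

```python
K = 2.0851  # constant from the text; places d' = 1 (P of about 76%) at threshold

def weibull_pc(coh, t, alpha2, beta2, gamma, lapse=0.0, k=K):
    """Probability correct for coherence `coh` (0..1) and viewing time `t` (s)."""
    threshold = alpha2 * t ** gamma  # time-dependent threshold
    return 0.5 + (0.5 - lapse) * (1.0 - k ** (-((coh / threshold) ** beta2)))

# Hypothetical parameter values (not fitted values from the study).
alpha2, beta2, gamma = 0.13, 1.2, -0.3
for t in (0.2, 0.4, 0.8):
    thr = alpha2 * t ** gamma
    p_at_thr = weibull_pc(thr, t, alpha2, beta2, gamma)
    print(f"T = {t:.1f} s  threshold = {thr:.3f}  P(threshold) = {p_at_thr:.3f}")
```

With a negative γ the threshold falls as viewing time grows, whereas the shape parameter β2 is unaffected, so the function shifts along the log-coherence axis without changing steepness.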

The third psychometric function was based on a model of decision formation that has been shown to relate closely to both behavior and physiology for this task (Fig. 1, G–I; Bogacz et al. 2006; Ditterich 2006; Eckhoff et al. 2008; Shadlen et al. 2007). In this model, the decision variable represents an accumulation of noisy motion information over time. Because the accumulation is noisy, the decision variable can be thought of as a random variable with a mean and variance that grow as a function of both motion strength and viewing time. For the version of the discrimination task used in this study, in which viewing time was controlled by the experimenter, the model assumes that choice accuracy depends on the distribution of this random variable at the given time. More specifically, the model assumes that a correct response occurs when the value of this normally distributed decision variable is >0 (see Fig. 1B, inset). Thus the resulting psychometric function describing the probability of a correct response at time T for stimulus S can be expressed with respect to the z-score of the mean net integrated evidence and associated SD

$$P[z(S,T)] = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right] \tag{3a}$$

See Eckhoff et al. (2008) for details of this derivation and Ricciardi and Sacerdote (1979) and Bogacz et al. (2006) for more in-depth analyses of the relationship between these kinds of dynamic processes and neural decision variables.

Discriminability (or d′, where d′ = z√2), and thus performance, is determined by the signal-to-noise ratio of the decision variable at the time the choice is made (Green and Swets 1966; Macmillan and Creelman 2004). Here we model this decision variable after the accumulated difference between the pooled responses of MT neurons (in spikes/s) tuned to the two directions of motion (Britten et al. 1992; Gold and Shadlen 2003; Law and Gold 2008; Shadlen et al. 1996), using a form that has been used previously to model these kinds of data. Specifically, we assume that the expected value of this decision variable rises with motion coherence (C) governed by a linear (a) and a nonlinear (m) term (aC^m), which provides a good match to real MT data (Britten 2003; Law and Gold 2008). The temporal accumulation is based on a time-varying (power-law) drift rate (T^n), which can account for improved behavioral sensitivity as stimulus viewing time increases (Eckhoff et al. 2008; Gold and Shadlen 2000, 2003). The variance of the pooled signal includes both an additive (κ) and multiplicative (ϕ) component (Shadlen et al. 1996; Zohary et al. 1994b). Thus

$$d' = z\sqrt{2} = \frac{aC^{m}T^{n}}{\sqrt{\kappa + \phi\,aC^{m}T^{n}}}\,\sqrt{2} \tag{3b}$$

We also included in the decision model several parameters to account for possible choice biases, i.e., a propensity to choose one of the two alternatives, independent of stimulus direction. Such biases, which we previously showed were apparent in the data from all four monkeys, particularly early in training (Gold et al. 2008), can in principle affect both psychometric threshold and slope. As described in detail in Gold et al. (2008), we modeled biases as an additive offset to the decision variable (Eq. 3b): B × D, where D is −1 for leftward motion and 1 for rightward motion, and the bias B = Bs + Bwk × St, where Bs is a fit parameter describing the overall offset per session and Bwk is another fit parameter that scales St, the “sequential bias” describing a weighted average of recent choices on the current choice.

Fits of Eq. 3 to five-session bins had nine free parameters: a, m, and n from Eq. 3b and six bias parameters (a separate Bs for each of the five sessions and a single parameter Bwk), described earlier. The lapse rate λ was computed as for Eq. 1. Multiplicative noise (ϕ in Eq. 3b) did not appear to change with training (Law and Gold 2008) and was therefore fixed to 0.3, a reasonable value for these kinds of weakly correlated neurons (Law and Gold 2008; Shadlen et al. 1996; Zohary et al. 1994a). Additive noise (κ in Eq. 3b) essentially trades off directly with a, the linear scaling of signal strength (Fig. 1I), and therefore was fixed to 10 spikes/s, which was derived from the average activity of MT neurons to 0% coherence motion.

For all three forms of psychometric function (Eqs. 1–3), we computed discrimination threshold (α) as the motion coherence corresponding to d′ = 1. For Eqs. 1 and 2, α is simply a fit parameter. For the decision model (Eq. 3), we can rearrange Eq. 3b and solve for α3

$$2a^{2}T^{2n}\left(\alpha_3^{m}\right)^{2} - d'^{2}\phi\,aT^{n}\left(\alpha_3^{m}\right) - \kappa d'^{2} = 0 \tag{4a}$$

and therefore (for positive α3)

$$\alpha_3 = \left[\frac{d'^{2}\phi + d'\sqrt{d'^{2}\phi^{2} + 8\kappa}}{4aT^{n}}\right]^{1/m} \tag{4b}$$

We also computed the slope of each function (β′) as the change in percentage correct as a function of a change in loge (coherence) computed at α. For Eqs. 1 and 2, β′ can be computed directly from β (Gilchrist et al. 2005; Strasburger 2001)

$$\beta' = \frac{\beta \log_e(k)}{k} \tag{5}$$

For the decision model, finding the slope β′ (assuming no choice biases) requires computing the derivative of P with respect to log (C). By the chain rule, dP/dC = (dP/dz) × (dz/dC) and dP/dC = [dP/d log (C)] × [d log (C)/dC]. Therefore [and using d log (C)/dC = 1/C and evaluating at α3, where d′ = 1]

$$\frac{dP}{d\log C} = \frac{dP}{dz}\,\frac{dz}{dC}\,\alpha_3 = \frac{1}{\sqrt{2\pi}}\,e^{-(d')^{2}/4} \times \frac{d\left[\dfrac{a\alpha_3^{m}T^{n}}{\sqrt{\kappa + \phi\,a\alpha_3^{m}T^{n}}}\right]}{dC}\,\alpha_3 = \frac{m}{2\sqrt{\pi}}\,e^{-1/4}\,\frac{\kappa + 0.5\,\phi\,a\alpha_3^{m}T^{n}}{\kappa + \phi\,a\alpha_3^{m}T^{n}} \tag{6}$$

Maximum-likelihood fits were computed for each function to determine the best value of each free parameter (Watson 1979). We also report the SE for each parameter, computed using analytic methods (Meeker and Escobar 1995). Confidence intervals (CIs) computed using bootstrap methods (Wichmann and Hill 2001b) yielded similar results, typically reflecting less precise estimates of psychometric slope than of threshold; e.g., for fits to Eq. 1, the ratio of SE to the best-fitting value of α had median [IQR] values using analytic methods of 0.08 [0.02], 0.07 [0.03], 0.11 [0.05], and 0.14 [0.05] and using bootstrap methods of 0.08 [0.02], 0.07 [0.03], 0.11 [0.05], and 0.14 [0.06] for monkeys At, Av, Cy, and ZZ, respectively (Wilcoxon paired-samples test for H0: median difference = 0, P > 0.01 in each case); the ratio of SE to the best-fitting value of β′ had median [IQR] values using analytic methods of 0.13 [0.05], 0.12 [0.04], 0.18 [0.10], and 0.25 [0.18] and using bootstrap methods of 0.13 [0.04], 0.12 [0.05], 0.18 [0.11], and 0.24 [0.13] for monkeys At, Av, Cy, and ZZ, respectively (P > 0.01 in each case, except monkey Cy).
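The closed-form threshold and slope of the decision model can be checked numerically. The sketch below implements the model without choice biases, using the fixed values κ = 10 spikes/s and ϕ = 0.3 given in the text; the values of a, m, n, and T and all function names are hypothetical. It confirms that d′ equals 1 at the computed threshold and that the analytic slope agrees with a finite-difference derivative of P with respect to loge (C).

```python
import math

def z_score(coh, t, a, m, n, kappa=10.0, phi=0.3):
    """z-score of the decision variable (d' = z * sqrt(2)), no choice bias."""
    u = a * coh ** m * t ** n
    return u / math.sqrt(kappa + phi * u)

def p_correct(coh, t, a, m, n, kappa=10.0, phi=0.3):
    """Probability correct given the z-score of the decision variable."""
    z = z_score(coh, t, a, m, n, kappa, phi)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def threshold(t, a, m, n, kappa=10.0, phi=0.3, d=1.0):
    """Coherence at which d' equals `d` (positive root of the quadratic)."""
    x = (d * d * phi + d * math.sqrt(d * d * phi * phi + 8.0 * kappa)) / (4.0 * a * t ** n)
    return x ** (1.0 / m)

def slope_at_threshold(t, a, m, n, kappa=10.0, phi=0.3):
    """Analytic dP/dlog(C) evaluated at the d' = 1 threshold."""
    u = a * threshold(t, a, m, n, kappa, phi) ** m * t ** n
    ratio = (kappa + 0.5 * phi * u) / (kappa + phi * u)
    return m / (2.0 * math.sqrt(math.pi)) * math.exp(-0.25) * ratio

# Hypothetical parameter values, for illustration only.
a, m, n, t = 40.0, 1.2, 0.6, 0.5
alpha3 = threshold(t, a, m, n)
h = 1e-5  # step in log(C) for the central difference
numeric_slope = (p_correct(alpha3 * math.exp(h), t, a, m, n)
                 - p_correct(alpha3 * math.exp(-h), t, a, m, n)) / (2.0 * h)
print(alpha3, math.sqrt(2.0) * z_score(alpha3, t, a, m, n))
print(slope_at_threshold(t, a, m, n), numeric_slope)
```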

We plot psychometric functions in two ways: 1) percentage correct versus motion strength with a logarithmic abscissa, which yields a sigmoid-shaped function (e.g., Fig. 1, A, D, and G); and 2) d′ versus motion strength with a logarithmic scale on both axes, which yields a roughly linear function whose position and orientation correspond primarily to the threshold and slope, respectively, of the sigmoidal function (e.g., Fig. 1, B, E, and H; Gilchrist et al. 2005; Klein 2001; Strasburger 2001). Discriminability was computed from the probability of a correct response (P) using the inverse error function, such that d′ = 2 erf⁻¹(2P − 1) (Green and Swets 1966; Klein 2001; Macmillan and Creelman 2004).
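The conversion between probability correct and d′ can be sketched with the Python standard library. Because the inverse error function and the inverse normal cumulative distribution Φ⁻¹ are related by erf⁻¹(2P − 1) = Φ⁻¹(P)/√2, the expression 2 erf⁻¹(2P − 1) equals √2 Φ⁻¹(P). The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1

def dprime_from_pc(p):
    """d' = 2 * erfinv(2p - 1) = sqrt(2) * Phi^{-1}(p), for 0 < p < 1."""
    return math.sqrt(2.0) * _STD_NORMAL.inv_cdf(p)

def pc_from_dprime(d):
    """Inverse mapping: probability correct for a given d'."""
    return _STD_NORMAL.cdf(d / math.sqrt(2.0))

for p in (0.55, 0.76, 0.95):
    print(p, dprime_from_pc(p))
```

Chance performance (P = 0.5) maps to d′ = 0, and d′ = 1 corresponds to roughly 76% correct, the threshold criterion used throughout.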

Neurometric analyses

Neural data were first analyzed using an ideal-observer technique that determines the probability of correctly determining the direction of motion based on an ROC analysis of the responses of the given neuron (Britten et al. 1992; Hanley and McNeil 1982; Parker and Newsome 1998). These probabilities, computed as a function of viewing time (using cumulative spike counts up to the given time) and coherence, were then fit to Eqs. 1–3. We included for analyses only cells that yielded reliable estimates of discrimination threshold (i.e., α2 from Eq. 2 <99.9% coh, which was obtained for 106 of 110 cells for Cy and 72 of 76 cells for ZZ). We fit data from individual cells, as opposed to data binned across sessions (or cells), because 1) different cells, even when recorded in sequential sessions, could have quite different response properties and therefore were not necessarily appropriate to be combined directly, and 2) unlike the behavioral data, which yielded a single data point per trial (the choice for the given coherence and viewing time), the neural data yielded multiple data points (cumulative spike counts corresponding to a single coherence but multiple viewing times) and therefore did not suffer as badly from the problem of estimating slopes from single-session data.
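The ROC-based ideal-observer probability can be illustrated with a minimal sketch. The area under the ROC curve equals the probability that a response drawn from the preferred-direction distribution exceeds one drawn from the null-direction distribution, with ties counted as one half. The spike counts and the function name `roc_area` below are hypothetical, and this pairwise form is one standard way to compute the area, not necessarily the procedure used in the original analysis.

```python
def roc_area(pref, null):
    """Area under the ROC curve for two samples of spike counts.

    Computed as the fraction of (pref, null) pairs in which the preferred
    response is larger, counting ties as one half.
    """
    wins = 0.0
    for x in pref:
        for y in null:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(pref) * len(null))

# Hypothetical cumulative spike counts for one coherence and viewing time.
pref_counts = [12, 15, 9, 14, 11, 16]
null_counts = [8, 10, 9, 7, 11, 6]
print(roc_area(pref_counts, null_counts))
```

Repeating this calculation for each coherence and each cumulative counting window yields the neurometric probabilities that are then fit with the same functions as the behavioral data.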

Pooling model

To test whether pooled MT responses could account for the relationship between psychometric threshold and slope measured during training, we implemented two pooling models, one analytical and the other simulated.

The analytical model used several simplifying assumptions to test the effects of a nonlinear pooling scheme on the predicted psychometric function. In particular, we assumed that the pooled response includes contributions from a single motion-sensitive neuron and many independent, motion-insensitive ones. We tested how changes in a static nonlinearity that governs how the response of each neuron contributes to the decision variable affect the shape of the psychometric function, as increasingly fewer insensitive neurons are included in the pooled response.

Responses of the motion-sensitive neuron (S) were modeled as a linear function of coherence (C, [0…1]): 〈S〉 = κ + aC, where 〈·〉 denotes expectation, κ is additive noise, and a is a linear scale factor (Britten et al. 1993). Responses of the motion-insensitive neurons were modeled as 〈N〉 = κ. Responses were assumed to be Poisson, with variances that scaled with their expected values. To generate a decision variable with an expected value of zero for 0% coherent motion, the pooled response (R) was based on the difference in activity between two equal-sized populations of neurons: one that includes the signal neuron and many noisy ones, the other that includes just noisy neurons. Moreover, the pooling scheme included a power-law nonlinearity (q). Thus

$$\langle R\rangle = \langle S^{q}\rangle - \langle N^{q}\rangle \tag{7a}$$

$$\operatorname{var}[R] = \operatorname{var}[S^{q}] + (2n+1)\operatorname{var}[N^{q}] = \langle S^{2q}\rangle - \langle S^{q}\rangle^{2} + (2n+1)\left\{\langle N^{2q}\rangle - \langle N^{q}\rangle^{2}\right\} \tag{7b}$$

We computed the expected value of the Poisson-distributed random variable raised to the qth power (i.e., the qth raw moment) using Dobinski's formula (http://mathworld.wolfram.com/PoissonDistribution.html). We then computed d′ = 〈R〉/√var[R] and determined psychometric threshold (the coherence at d′ = 1) and slope (Δ% correct/Δcoherence at threshold) numerically.
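A rough numerical sketch of the analytical pooling model follows. Several choices here are assumptions rather than details from the text: the raw moments are obtained by directly summing k^q times the Poisson probability mass (taken here to be the series form of Dobinski's formula), n is interpreted as the number of motion-insensitive neurons accompanying the signal neuron, and the values of κ, a, and q are hypothetical.

```python
import math

def poisson_moment(lam, q):
    """E[X**q] for X ~ Poisson(lam), lam > 0 and q > 0, by direct summation."""
    upper = int(lam + 20.0 * math.sqrt(lam) + 50)  # far into the upper tail
    total = 0.0
    for k in range(1, upper + 1):  # the k = 0 term is zero for q > 0
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += k ** q * math.exp(log_pmf)
    return total

def pooled_dprime(coh, n_noise, q, kappa=10.0, a=40.0):
    """d' of the pooled response: mean of R divided by the SD of R."""
    lam_s = kappa + a * coh  # expected response of the signal neuron
    mean_r = poisson_moment(lam_s, q) - poisson_moment(kappa, q)
    var_s = poisson_moment(lam_s, 2 * q) - poisson_moment(lam_s, q) ** 2
    var_n = poisson_moment(kappa, 2 * q) - poisson_moment(kappa, q) ** 2
    return mean_r / math.sqrt(var_s + (2 * n_noise + 1) * var_n)

# Effect of removing insensitive neurons from the pool (hypothetical values).
for n_noise in (50, 10, 0):
    print(n_noise, pooled_dprime(0.256, n_noise, q=2.0))
```

Threshold and slope can then be found by evaluating `pooled_dprime` over a grid of coherences and locating the point at which d′ crosses 1.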

The simulations used MT responses measured previously to generate pooled signals that, in turn, were used to generate predicted psychometric functions throughout training (Law and Gold 2008). The MT responses were taken from the complete data set from each monkey, including cells recorded during and after training, and used to compare with performance for the same monkey. For each monkey, we generated a database of responses to 1-s duration stimuli for each coherence, including simulated responses of neurons tuned to 18 different directions of motion spaced equally around 360°. For each direction, 200 neurons were chosen randomly from the raw data, with replacement. If the direction corresponded to the preferred or null direction tuning of the given neuron (i.e., the directions used to measure the responses), its responses were stored directly. All other direction-specific responses were determined by interpolating the given responses with the neuron's responses to 0% coherence motion, using a scale factor based on a Gaussian-shaped direction tuning curve (width = 30°) describing how the responses decline as a function of motion direction.

For example, to simulate responses to 25.6% coherence motion of a neuron tuned 20° away from the preferred direction of the recorded neuron (corresponding to a point on the direction tuning curve whose height is 41% of the maximum value), we started with two distributions of responses from the recorded neuron: 1) to 0% coherence motion and 2) to 25.6% coherence motion in the preferred direction. Each distribution consisted of 100 trials selected randomly with replacement from the real data. We then sorted each distribution in ascending order of spike rates. Finally, we generated an interpolated data set that took each corresponding pair of responses from the two distributions and computed the value 41% of the distance from the first to the second response. For responses of simulated neurons tuned to within 90° of the null direction of the recorded neuron, the interpolation was between recorded responses to 0% motion and responses to null-direction motion (which were typically suppressed relative to baseline). This procedure produced a population response as a function of direction preference that matched the direction tuning properties of the recorded neurons.
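The interpolation step in this example can be sketched as follows. The function name `interpolate_responses`, the spike-rate values, and the use of Python's `random.choices` for resampling are illustrative assumptions; the scale factor (0.41 in the example above) is passed in directly rather than derived from the tuning curve.

```python
import random

def interpolate_responses(baseline, driven, frac, n_trials=100, rng=random):
    """Simulate responses of a neuron tuned away from the recorded direction.

    baseline: recorded responses to 0% coherence motion
    driven:   recorded responses to the given coherence (preferred or null)
    frac:     height of the direction tuning curve at the simulated offset
    """
    # Resample each distribution with replacement, then rank-order it.
    b = sorted(rng.choices(baseline, k=n_trials))
    d = sorted(rng.choices(driven, k=n_trials))
    # Move each baseline response `frac` of the way toward its ranked partner.
    return [x + frac * (y - x) for x, y in zip(b, d)]

# Hypothetical spike rates (spikes/s) for one recorded neuron.
rng = random.Random(1)
baseline = [8, 10, 12, 9, 11, 10]
driven = [18, 22, 25, 20, 19, 24]
simulated = interpolate_responses(baseline, driven, 0.41, rng=rng)
print(min(simulated), max(simulated))
```

Pairing the responses by rank keeps the shape of the simulated distribution between those of the two recorded distributions, rather than mixing high baseline rates with low driven rates.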

The order of the trials per neuron was then shuffled and a pooled response was computed per trial using Eq. 8. Initially, most neurons with a rightward direction preference had a uniform positive linear weight (w in Eq. 8), whereas neurons with a leftward direction preference had a uniform negative weight. Final weights were proportional to the neurometric sensitivity (d′) of the given neuron for 25.6% coherence stimuli. In addition, the pooled response was computed using four free parameters. The first parameter added noise to the initial weights to degrade initial performance, by determining the fraction of randomly selected weights whose sign was flipped (set to 0.36 for Cy and 0.43 for ZZ). The second parameter reflected the fact that the neural responses were assumed to be weakly correlated. For simplicity, the correlations were assumed not to depend on signal strength and thus could be modeled as a simple scale factor on the signal-to-noise ratio (discriminability) of the overall population response (set to 0.20 for both Cy and ZZ; Bair et al. 2001; Shadlen et al. 1996; Sompolinsky et al. 2001; Zohary et al. 1994b). This simplified scheme leaves open the possibility that other, more complicated correlations exist that affect the pooled signal in a nonlinear fashion with respect to signal strength, which are exactly the effects we have tried to identify using these simulations. The third parameter was an additive offset to the SD of the distribution of pooled signals across trials to simulate pooling noise (set to 0.022 for Cy and 0.030 for ZZ in units of spikes/s, scaled by the normalized linear weights; Shadlen et al. 1996). The fourth parameter was the nonlinear term q in Eq. 8, which was set as a single free parameter whose value was determined by finding the maximum-likelihood fit of the output of the pooling model to the psychometric slope versus threshold data (with SE computed analytically; Meeker and Escobar 1995). Discriminability was computed as the ratio of the mean to SD of this pooled response across trials.

RESULTS

We analyzed the effects of training on psychometric and neurometric functions in monkeys learning a visual motion direction-discrimination task. In the following text we first describe fits of behavioral data to three different psychometric functions (Eqs. 1–3 and Fig. 1) to quantify how threshold and slope changed with training. We then describe similar analyses of MT neurometric functions measured before and during training. Finally, we show that these behavioral and neural data are consistent with a pooling scheme that reads out MT responses using linear weights that change, but a nonlinear component that remains constant with training.

Training affected psychometric threshold and slope

We previously showed that training causes systematic decreases in psychometric threshold for monkeys performing the direction-discrimination task (Connolly et al. 2009; Law and Gold 2008). In this section we show that these training-induced decreases in threshold were accompanied by slight decreases in psychometric slope.

Characterizing the slope of the relationship between performance accuracy and motion strength was complicated by the fact that not just motion strength but also viewing time were varied throughout each experiment (Fig. 2). Longer viewing times typically result in lower thresholds, but their effects on slope are unknown for this task (Britten et al. 1992; Gold and Shadlen 2000, 2003; Roitman and Shadlen 2002). Therefore we characterized performance first in terms of both motion strength and viewing time and then as a function of training.

Example psychometric fits of Eqs. 1 (dashed lines) and 2 (solid lines) to data from monkey Av are shown in Fig. 3. Both early and late in training, increasing viewing times caused lower psychometric thresholds but no systematic changes in slope, corresponding to leftward shifts of psychometric functions describing log (d′) versus log (coherence). In contrast, training caused a decrease in psychometric threshold from the early to the late sessions, which was accompanied by a slight decline in slope. This decline in slope occurred despite the fact that the data were slightly noisier early in training, which typically results in shallower slopes (mean residual deviance of fits to Eq. 2: 1.29 for A, 1.11 for B; Wichmann and Hill 2001a).

Fig. 3.

Example psychometric functions plotted as log (d′) vs. log (coherence) from blocks of 15 sessions early (A) and late (B) in training for monkey Av. Points are data binned by viewing time. Error bars are 68% confidence intervals (CIs), assuming binomial errors. Dashed lines are fits of data from each time bin to Eq. 1. Insets indicate for each of these time-binned fits the mean viewing time (T, in ms) and best-fitting values of α1 (in units % coh) and β1 [in units % correct/loge (coh)]. Solid lines are fits of the full data set to Eq. 2 [α2 = 32% coh, β2 = 0.27% correct/loge (coh) for At; α2 = 13% coh, β2 = 0.25% correct/loge (coh) for Av].

For all four monkeys, increasing viewing time tended to decrease psychometric threshold without systematically affecting the slope (Fig. 4). For each session, we analyzed the relationships between viewing time and the best-fitting values of threshold (α1) and slope (β1) from Eqs. 1 and 5, describing a time-independent psychometric function, fit to time-binned data. We computed a linear fit to α1 versus the mean viewing time in the given bin on a log–log plot and report the slope of this fit, describing a power-law relationship between sensitivity and viewing time (Fig. 4A). The slope of these fits had a median [IQR] value across all sessions of −0.20 [0.28] (Wilcoxon signed-rank test for H0: median = 0, P < 0.01) for At, −0.38 [0.24] (P < 0.01) for Av, −0.10 [0.43] (P < 0.01) for Cy, and −0.33 [0.30] (P < 0.01) for ZZ. These results imply that sensitivity tended to increase (i.e., discrimination threshold tended to decrease) as viewing time increased, but not quite at the rate of √time (i.e., a log–log slope of −0.5) predicted for a perfect accumulation of incoming motion information (Gold and Shadlen 2007). In contrast, a linear fit of log (β1) versus log (viewing time) had a median [IQR] slope of 0.08 [0.42] (P = 0.02) for At, 0.03 [0.31] (P = 0.06) for Av, 0.05 [0.66] (P = 0.81) for Cy, and 0.07 [0.66] (P = 0.12) for ZZ, implying minimal systematic changes of slope with viewing time (Fig. 4, C–F).
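The log–log fit used to summarize the time dependence of threshold can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name, the optional `log_se` argument (standard errors of the log-transformed thresholds), and the synthetic data are all assumptions.

```python
import numpy as np

def power_law_exponent(viewing_time_ms, threshold, log_se=None):
    """Slope of a linear fit of log(threshold) versus log(viewing time).

    log_se (hypothetical argument) holds the SE of each log-threshold; when
    given, points are weighted by 1/SE, the convention np.polyfit expects.
    """
    x = np.log(np.asarray(viewing_time_ms, dtype=float))
    y = np.log(np.asarray(threshold, dtype=float))
    w = None if log_se is None else 1.0 / np.asarray(log_se, dtype=float)
    slope, _intercept = np.polyfit(x, y, deg=1, w=w)
    return slope

# Synthetic thresholds that fall exactly as 1/sqrt(time), the prediction
# for perfect accumulation of motion information
t = np.array([100.0, 200.0, 400.0, 800.0])
thr = 300.0 / np.sqrt(t)
slope = power_law_exponent(t, thr)
```

For these synthetic data the fitted exponent is −0.5; measured exponents closer to zero, like the session medians reported here, indicate less than perfect accumulation.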

Fig. 4.

Summary of the effects of viewing time on behavior. A: log–log plot of threshold vs. viewing time from an individual session for monkey Av. Points and error bars are best-fitting values and SE, respectively, of α1 from Eq. 1 fit to time-binned data. Line is a weighted linear fit (slope [95% CIs] = −0.65 [−0.83 −0.47]). B: log–log plot of slope vs. viewing time from the same session. Points and error bars are best-fitting values and SE, respectively, of β1 from Eqs. 1 and 5 fit to time-binned data. Line is a weighted linear fit (slope [95% CIs] = −0.01 [−0.16 0.14]). C–F: summary of the time dependence of psychometric threshold and slope for the 4 monkeys, as indicated. Each point represents data from one session. Abscissa is the slope of the linear fit to loge (α1) vs. loge (viewing time), as in A. Negative values imply threshold decreased with increasing viewing time. Ordinate is the slope of the linear fit to loge (β1) vs. loge (viewing time), as in B. Negative values imply slope decreased with increasing viewing time. Arrows indicate medians.

For all four monkeys, training tended to decrease psychometric threshold while also slightly decreasing slope (Fig. 5). We analyzed changes in the best-fitting values of threshold (α2) and slope (β2) from Eqs. 2 and 5 fit to data from bins of five consecutive sessions across training (see methods). Weighted linear fits of log (α2) versus mean session number had negative slopes that were significantly different from zero for all four monkeys (Fig. 5, A–D; weighted linear regression, H0: slope = 0, P < 0.01), as reported previously (Connolly et al. 2009; Law and Gold 2008). Weighted linear fits of β2 versus session number also had slightly negative best-fitting slopes that were significantly different from zero in all four monkeys (Fig. 5, E–H; weighted linear regression, H0: slope = 0, P ≤ 0.02). Consistent with these results, direct comparisons of session-by-session values of log (α2) and β2 showed a significant, linear relationship between the two, with lower thresholds corresponding to shallower slopes (Fig. 5, I–L; weighted total least-square linear regression, H0: slope = 0, P < 0.01 for all four monkeys).
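Total least-squares regression, used for the slope-versus-threshold comparison because both variables carry measurement error, can be sketched as an orthogonal fit via the singular value decomposition. This sketch is unweighted, whereas the analysis described here was weighted, and the function name and example values are illustrative assumptions.

```python
import numpy as np

def total_least_squares_line(x, y):
    """Slope and intercept of the unweighted orthogonal-regression line.

    The line passes through the centroid along the principal axis of the
    centered points. A vertical principal axis (dx == 0) is not handled.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    centered = np.column_stack([x - xm, y - ym])
    # First right singular vector = direction of greatest variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx
    return slope, ym - slope * xm

# Synthetic, noise-free example: slope parameter rising linearly with log threshold
log_thr = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
beta = 0.1 + 0.05 * log_thr
slope, intercept = total_least_squares_line(log_thr, beta)
```

Unlike ordinary least squares, this fit minimizes perpendicular distances, so its result depends on the relative scaling of the two axes; a weighted version would rescale each axis by its measurement error first.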

Fig. 5.

Summary of the effects of training on behavior. Columns are data from each monkey, as indicated. In each panel, points indicate best-fitting values of the given parameter from nonoverlapping blocks of 5 contiguous sessions; error bars are SE. A–D: threshold (α2 in Eq. 2, plotted on a logarithmic axis) vs. session number. E–H: slope (β2 from Eqs. 2 and 5, plotted on a linear axis) vs. session number. I–L: β2 vs. α2. For each panel, lines are weighted linear fits (least squares for A–H; total least squares for I–L). The slope [in units of log (coh)/100 sessions for A–D, % correct/log (coh)/100 sessions for E–H, and % correct/log (coh)² for I–L] of this line, [95% CIs], and the adjusted r² are shown in each panel. The P values for H0: slope = 0 were <0.01 for all panels except G, where P = 0.02.

We also characterized the relationship between psychometric threshold and slope using a decision model (Eq. 3), for three reasons. First, these analyses demonstrated that this relationship was not specific to a particular psychometric function. There was a similar, significant relationship between psychometric threshold and slope using fits to the decision model (Eq. 3) as for the time-dependent Weibull function (Eq. 2): weighted total least-square linear regression of slope versus threshold using Eq. 3 had slopes [95% CIs] of 0.15 [0.10 0.20], 0.17 [0.13 0.22], 0.13 [0.06 0.21], and 0.26 [0.13 0.40]% cor/log (coh)² for At, Av, Cy, and ZZ, respectively, H0: slope = 0, P < 0.01 for all four monkeys (compare with Fig. 5, I–L for Eq. 2).

Second, we used the decision model to account for sequential choice biases. Such trial-by-trial fluctuations in performance, which we previously identified in these data sets (Gold et al. 2008), can affect psychometric slope. In particular, trial-by-trial, stimulus-independent changes in choice behavior can add variability to psychometric data, resulting in shallower slopes. However, because these biases tended to decrease with training in our monkeys (Gold et al. 2008), their expected effects (steeper slopes with training) were opposite to what we measured (shallower slopes with training). Accordingly, the fits to Eq. 3, which accounted for choice biases (see methods), tended to show a stronger relationship between threshold and slope than the fits to Eq. 2, which did not account for choice biases. Thus we conclude that our measured decreases in psychometric slope with training did not result from changes in choice biases.

Third, we used the decision model to better understand how the decision process changes with training. This model is a form of sequential-sampling model that has been used extensively to infer properties of the internal decision variable used by the monkeys to perform this task (Bogacz et al. 2006; Ditterich 2006; Eckhoff et al. 2008; Grossberg and Pilly 2008; Shadlen et al. 2007; Wong and Wang 2006). We were particularly interested in understanding how changes in slope related to both linear and nonlinear properties of the inferred decision variable. As indicated by Eq. 6, the psychometric slope derived from the decision model is directly proportional to m from Eq. 3b, the exponent describing the power-law relationship between motion strength and the decision variable, but related in a more complex manner to other parameters of the model. Accordingly, psychometric slope was correlated nearly perfectly with m (Pearson's r >0.99, H0: r = 0, P < 0.01, for all four monkeys), but not systematically correlated with any other fit parameter of the decision model (P values for Pearson's r were >0.05 for all other parameters in all four monkeys except for Bwk for At, n for Av, and a for Cy). Thus changes in slope with training appeared to involve changes in the nonlinearity of the pooled signal, in contrast to changes in threshold that likely reflected changes in the linear component a (Eckhoff et al. 2008).

Training had little effect on MT neurometric threshold and slope

We quantified the motion sensitivity of each MT neuron measured before and during training in monkeys Cy and ZZ using an ideal-observer “neurometric” analysis. This method is based on an ROC analysis that describes the ability of an ideal observer to determine the direction of motion solely from the accumulated responses of the neuron to stimuli moving toward and away from its preferred direction, computed separately for different coherences and viewing times (Britten et al. 1992; Green and Swets 1966). As we did for the psychophysical data, we fit these data to all three models (Eqs. 1–3) and analyzed neurometric threshold (α) and slope (β′). Previous studies showed that MT neurometric threshold decreases with increasing viewing time but does not change systematically with training (Britten et al. 1992; Law and Gold 2008). In this section we analyze how viewing time and training also affect neurometric slope.
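The ROC step of a neurometric analysis can be sketched in a few lines. The area under the ROC curve equals the probability that a randomly drawn preferred-direction spike count exceeds a randomly drawn null-direction count. The function names, the example counts, and the equal-variance Gaussian conversion from area to d′ are assumptions for illustration and are not taken from the equations of this study.

```python
import numpy as np
from statistics import NormalDist

def roc_area(pref_counts, null_counts):
    """Area under the ROC curve from two sets of spike counts.

    Computed as P(pref > null) over all trial pairs, with ties counted as 1/2.
    """
    pref = np.asarray(pref_counts, dtype=float)[:, None]
    null = np.asarray(null_counts, dtype=float)[None, :]
    return float(np.mean((pref > null) + 0.5 * (pref == null)))

def dprime_from_area(area):
    """Assumed equal-variance Gaussian conversion: d' = sqrt(2) * z(area).

    inv_cdf is undefined at exactly 0 or 1, so perfectly separated
    distributions would need to be clipped before conversion.
    """
    return 2 ** 0.5 * NormalDist().inv_cdf(area)

# Hypothetical spike counts for one coherence and one viewing time
pref_counts = [12, 15, 14, 18, 16]
null_counts = [10, 11, 13, 12, 14]
area = roc_area(pref_counts, null_counts)   # 22 of 25 pairs (ties as 1/2) -> 0.88
dprime = dprime_from_area(area)
```

Repeating this for each coherence and each spike-count window yields the points to which a neurometric function is then fit.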

Example neurometric fits to a single MT neuron recorded from monkey Cy are shown in Fig. 6. Ideal-observer analyses of cumulative spike counts measured over increasing viewing times corresponded to systematic decreases in neurometric threshold. Like the psychometric data, these increases in sensitivity with viewing time did not correspond to systematic changes in neurometric slope.

Fig. 6.

Example neurometric functions plotted as log (d′) vs. log (coherence) for a single MT neuron from monkey Cy. Points are data binned by viewing time. Error bars are SE (Hanley and McNeil 1982). Dashed lines are fits of data from each time bin to Eq. 1. Insets indicate for each of these time-binned fits the mean viewing time (T, in ms) and best-fitting values of α1 (in units % coh) and β1 [in units % correct/loge (coh)]. Solid lines are fits of the full data set to Eq. 2 [α2 = 13% coh, β2 = 0.28% correct/loge (coh)].

For MT recordings in both monkeys, increasing viewing time tended to decrease neurometric threshold without systematically affecting the slope (Fig. 7). For each neuron, we computed spike rates in increasingly long epochs, all starting at motion onset and ending before motion offset, and analyzed the relationships between epoch duration and the best-fitting values of neurometric threshold (α1) and slope (β1) from Eqs. 1 and 5. We computed linear fits to α1 or β1 versus epoch duration on log–log plots and report the slope of these fits, describing a power-law relationship between sensitivity or slope versus duration (example fits are shown in Fig. 7, A and B). For the population of MT neurons, linear fits of log (α1) versus log (time) had a median [IQR] slope of −0.36 [0.40] (Wilcoxon signed-rank test for median = 0, P < 0.01) for Cy (Fig. 7C, abscissa) and −0.39 [0.41] (P < 0.01) for ZZ (Fig. 7D, abscissa), implying that threshold decreased as duration increased. In contrast, linear fits of log (β1) versus log (time) had a median [IQR] slope of −0.07 [0.37] (P = 0.06) for Cy (Fig. 7C, ordinate) and 0.05 [0.39] (P = 0.30) for ZZ (Fig. 7D, ordinate), implying minimal systematic changes of neurometric slope with viewing time.

Fig. 7.

Summary of the effects of viewing time on MT neurometric functions. A: log–log plot of threshold vs. viewing time from an individual MT neuron for monkey Cy. Points and error bars are best-fitting values and SE, respectively, of α1 from Eq. 1 fit to time-binned data. Line is a weighted linear fit [slope [95% CIs] = −0.69 [−0.75 −0.63] log (coh)/ms]. B: log–log plot of slope vs. viewing time from the same MT neuron. Points and error bars are best-fitting values and SE, respectively, of β1 from Eqs. 1 and 5 fit to time-binned data. Line is a weighted linear fit [slope = −0.01 [−0.09 0.07] % cor/log (coh)/ms]. C and D: summary of the time dependence of neurometric threshold and slope for the 2 monkeys, as indicated. Each point represents data from one session. Abscissa is the slope of the linear fit to loge (α1) vs. loge (viewing time), as in A. Ordinate is the slope of the linear fit to loge (β1) vs. loge (viewing time), as in B. Arrows indicate medians.

There did not appear to be a consistent, systematic relationship between the threshold and slope of MT neurometric functions across the population of neurons recorded before and during training for monkeys Cy and ZZ that could directly explain the psychometric data (Fig. 8). As reported previously, neurometric threshold was slightly lower during training than before training, probably because of differences in attention, although the difference was not significant in either monkey (geometric median α2 = 20.0% coh before and 18.5% coh during training, Wilcoxon test for H0: equal geometric medians, P = 0.57 for Cy; 15.3% coh before and 15.6% coh during training, P = 0.38, for ZZ) (Law and Gold 2008; Seidemann and Newsome 1999). There was also no systematic difference in slope before versus during training [median β2 = 0.25% correct/log (coh) before and 0.24% correct/log (coh) during training, Wilcoxon test for H0: equal medians, P = 0.89 for Cy; 0.24% correct/log (coh) before and 0.23% correct/log (coh) during training, P = 0.39, for ZZ]. For Cy, threshold and slope tended to increase slightly as a function of session number (Fig. 8, A and C), but there was no significant linear relationship between threshold and slope (Fig. 8E). For ZZ, there was no systematic relationship between threshold or slope and session number (Fig. 8, B and D) but a slight, positive relationship between threshold and slope that was substantially smaller in magnitude than the relationship measured from psychometric data (compare Figs. 8F and 5L).

Fig. 8.

Summary of the relationship between MT neurometric threshold and slope before and during training. Left column: monkey Cy; right column: monkey ZZ. In each panel, each point represents the best-fitting value of the given parameter from a single neuron; error bars are SE. A and B: threshold (α2 in Eq. 2, plotted on a logarithmic axis) vs. session number. Vertical dashed line separates pretraining (in which the monkeys merely maintained fixation while the motion stimulus was shown) from training. C and D: slope (β2 from Eqs. 2 and 5, plotted on a linear axis) vs. session number. Vertical dashed line separates pretraining from training. E and F: β2 vs. α2. For each panel, lines are weighted linear fits (least squares for A–D; total least squares for E and F). The slope [in units of log (coh)/100 sessions for A and B; % correct/log (coh)/100 sessions for C and D; and % correct/log (coh)² for E and F] of this line, [95% CIs], P value for H0: slope = 0, and the adjusted r² are shown in each panel.

Like psychometric slopes, neurometric slopes can be sensitive to fluctuations in activity within a session. To test whether such fluctuations might have biased our results, we analyzed data from the first and last 200 trials from each session. We used these data previously to show that MT neurometric thresholds were slightly higher at the end compared with those at the beginning of each session, consistent with previous reports (Law and Gold 2008; Zohary et al. 1994a). We found that, despite these changes in threshold, there was no difference in neurometric slope measured in these early and late epochs compared with the slope measured from data from the full session (Wilcoxon paired-sample test for H0: median difference in slope = 0, P > 0.54 for early vs. late, early vs. all, and late vs. all for both monkeys). Therefore it seems unlikely that fluctuations in MT activity within a session had a substantial effect on the neurometric thresholds and slopes we report in Fig. 8.

We also fit MT data using the decision model (Eq. 3) to show that the relationships between threshold and slope shown in Fig. 8 were, like the comparable psychometric analyses, not specific to a particular neurometric function. For Cy, there was no significant linear relationship between threshold and session number (H0: slope of a weighted linear regression = 0, P = 0.20, adjusted r² = 0.01) or slope and session number (P = 1.0, r² = 0.00), but a slight negative relationship between slope and threshold [slope [95% CIs] = −0.03 [−0.09 0.02]% cor/log (coh)², P < 0.01, r² = 0.01] from these fits. For ZZ, there was likewise no significant, linear relationship between threshold and session number (P = 1.0, r² = 0.00) or slope and session number (P = 1.0, r² = 0.00), but a slight positive relationship between slope and threshold [slope = 0.14 [0.05 0.22]% cor/log (coh)², P < 0.01, r² = 0.12]. Like that for the psychometric data, neurometric slope was correlated nearly perfectly with m in Eq. 3 (Pearson's r > 0.99, H0: r = 0, P < 0.01, for both monkeys).

A decision model with dynamic linear weights and a static nonlinearity can account for the results

Training caused systematic decreases in psychometric slope and threshold, consistent with changes in both linear and nonlinear components of an inferred decision variable that uses pooled motion information to guide behavior. Training did not have comparable effects on the response properties of MT neurons, which are thought to provide sensory input to the decision variable. Therefore one possible explanation is that training causes changes in both linear and nonlinear components of the pooling scheme that converts MT responses into the decision variable. Here we show that an even simpler scheme can account for the results. According to this scheme, changes in linear scaling of MT responses in the presence of a static nonlinearity can lead to appropriate linear and nonlinear changes in the pooled signal that governs behavior.

This scheme is related to the effects of uncertainty and attention on psychometric functions under assumptions of probability summation. According to probability summation, a signal is detected if it is detected by any one of several independent sources of information in the brain (“channels”). Under this scheme, the slope of the psychometric function is unaffected by changes in the number of equally sensitive channels, but can vary with the number of uninformative channels that can contribute to the decision (Pelli 1985, 1987; Tyler and Chen 2000). This idea suggests that an increasingly selective readout of the most informative neurons in MT might similarly affect psychometric slope as threshold improves, if the pooling scheme includes nonlinear effects that are similar to probability summation.

To test this idea, we simulated direction decisions using real MT responses, recorded in our previous study (Law and Gold 2008), as input to the following pooling scheme

$$R = \frac{\sum_i w_i x_i^{q}}{\sum_i x_i^{q-1}} \tag{8}$$

where R is the pooled response, xi is the response of the ith neuron, wi is the linear weight associated with that neuron (normalized such that ∑ wi² = 1), and q controls the nonlinear scaling. This kind of pooling scheme can account for numerous nonlinear neural computations (Britten and Heuer 1999; Carandini and Heeger 1994; Carandini et al. 1997; Kouh and Poggio 2008; Miller and Troyer 2002; Purushothaman and Bradley 2005; Rust et al. 2005; Yu et al. 2002). An appealing property of Eq. 8 is that as q grows from unity to increasingly positive values, R changes from representing the mean to the maximum value of the weighted signals, two common forms of pooling (Parker and Newsome 1998). Performance was modeled by computing d′ from R as a function of motion strength for values of wi that changed with training and a fixed value of q.
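The mean-to-maximum behavior of this pooling rule can be checked numerically. The sketch below assumes that Eq. 8 has the form R = ∑ wi·xi^q / ∑ xi^(q−1), which matches the description of a weighted mean at q = 1 and a max-like operation at large q; the function name and the example responses are hypothetical.

```python
import numpy as np

def pooled_response(x, w, q):
    """Pooled response under the form assumed here for Eq. 8.

    R = sum_i(w_i * x_i**q) / sum_i(x_i**(q - 1)), for nonnegative x.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.sum(w * x ** q) / np.sum(x ** (q - 1))

x = np.array([2.0, 4.0, 10.0])        # hypothetical responses (spikes/s)
w = np.ones(3) / np.sqrt(3)           # uniform weights with sum(w**2) == 1

r_mean = pooled_response(x, w, q=1)   # denominator is N, so R is the mean of w*x
r_max = pooled_response(x, w, q=50)   # dominated by the largest response
```

With q = 1 the denominator reduces to the number of neurons and R is the mean weighted response; with q = 50 the largest response dominates both sums and R approaches the weighted maximum, here 10/√3.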

The values of the weights wi, which like a of the pooled decision variable in Eq. 3b cause a linear scaling of the decision variable, were chosen to simulate changes across training. For relatively poor performance early in training, each wi was set to a common value (positive for neurons with a rightward component of direction tuning and negative otherwise, with a fraction of these values randomly selected to flip in sign to add noise, then normalized as before). For better performance later in training, each wi was proportional to d′ of the associated neuron for 25.6% coherence, thus giving the most sensitive neurons the greatest influence on the decision variable (Gu et al. 2007; Law and Gold 2008). Intermediate stages of training were modeled as weights scaled linearly between these two extremes. The values of xi were based on MT responses measured in our data set, simulated, and compared with behavioral data separately for each monkey (Law and Gold 2008).
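The weight schedule described above can be sketched as follows. The function names, the use of signed d′ values (negative for leftward-preferring neurons), and the renormalization of intermediate weights are assumptions made for illustration; the example population is synthetic.

```python
import numpy as np

def normalize(w):
    """Scale weights so that sum(w**2) == 1."""
    return w / np.sqrt(np.sum(w ** 2))

def early_weights(prefers_right, flip_fraction, rng):
    """Uniform signed weights, with a random fraction flipped in sign as noise."""
    w = np.where(prefers_right, 1.0, -1.0)
    n_flip = int(round(flip_fraction * w.size))
    flipped = rng.choice(w.size, size=n_flip, replace=False)
    w[flipped] *= -1.0
    return normalize(w)

def late_weights(signed_dprime):
    """Weights proportional to each neuron's (signed, by assumption) d'."""
    return normalize(np.asarray(signed_dprime, dtype=float))

def weights_at(stage, w_early, w_late):
    """Linear interpolation from early (stage=0) to late (stage=1) weights."""
    return normalize((1.0 - stage) * w_early + stage * w_late)

# Synthetic population of 10 neurons
rng = np.random.default_rng(1)
prefers_right = np.array([True, True, False, False, True,
                          False, True, False, True, False])
signed_dprime = np.array([1.2, 0.3, -0.8, -0.1, 0.5,
                          -1.5, 0.2, -0.4, 0.9, -0.6])

w0 = early_weights(prefers_right, flip_fraction=0.4, rng=rng)
w1 = late_weights(signed_dprime)
w_mid = weights_at(0.5, w0, w1)
```

Here `flip_fraction` plays the role of the first free parameter, the fraction of initial weights whose sign is flipped to degrade early performance.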

When q = 1 in Eq. 8, this scheme adds no further nonlinearity to the pooled signal than what is already present in the individual neural responses. Thus changes in the linear weights cause improvements in psychometric threshold without altering the slope. In contrast, when q >1, the pooling scheme acts more like a max-rule, similar to probability summation, and thus can affect slope as the contribution of insensitive neurons to the pooled response declines.

To illustrate how this kind of pooling scheme can affect the psychometric function, we developed a simple analytic model relating the pooled signal to task performance (Eq. 7, Fig. 9). In this model, a single motion-sensitive neuron is combined with many motion-insensitive ones. Training involves reducing the number of insensitive neurons that contribute to the pooled response (by setting the appropriate wi to zero), which causes improvements in predicted discrimination threshold. The range of predicted thresholds depends on the linear scaling of the motion-sensitive neuron (the curves in Fig. 9B are shifted left compared with those in Fig. 9C). However, changes in the psychometric slope depend on the value of the nonlinear factor q. When q = 1, changes in threshold occur with little change in slope. As q increases, changes in threshold are accompanied by increasingly larger changes in slope (see the family of curves in Fig. 9, B and C).

Fig. 9.

Analytic pooling model. A: simple scheme for analytic evaluation. Responses (x) from a single motion-sensitive (s) and many insensitive (ni) neurons are pooled via linear weights (wj) and a static nonlinearity (q). B and C: adjusting the linear weights to more selectively read out responses from the most sensitive MT neuron (i.e., setting increasing numbers of the weights wj with j > 1 to zero) causes systematic improvements in psychometric threshold (curves move leftward along the abscissa), with concomitant changes in psychometric slope (ordinate) that depend on the magnitude of the static nonlinearity (values of q shown for each curve). The linear scaling of the responses of the motion-sensitive neuron as a function of motion strength is twice as large in B as in C.

We used this pooling scheme to directly compare the neural and behavioral data from monkeys Cy and ZZ. We first simulated the responses of a population of MT neurons based on the real data; an example pattern of activation to a strong rightward stimulus is shown in Fig. 10A (Law and Gold 2008). We then simulated learning by computing a pooled signal based on changes in the linear weights (wi) from Eq. 8: early in training, rightward- and leftward-sensitive neurons were weighted uniformly but with opposite signs (corrupted by noise); later in training, the most sensitive neurons had the largest absolute weights (Fig. 10, B and C). The pooled signal was based on four additional parameters that remained static throughout simulated training (see methods for details): a term governing the initial values of the linear weights, a linear scaling factor, an additive offset, and the nonlinear term q in Eq. 8. Of these parameters, only q systematically affected the relationship between the predicted psychometric threshold and slope. The remaining parameters were used to simulate the appropriate ranges of psychometric threshold (Fig. 10, D–G).

Fig. 10.

Pooling model simulations. Responses of 3,600 simulated MT neurons, determined from previously recorded data (Law and Gold 2008), are pooled using linear weights (wi) that can change with training and a power-law nonlinearity (q) that remains constant. A: responses from a single simulated trial to strong rightward motion. Neurons are sorted according to sensitivity (measured as d′ for 25.6% coherence stimuli; ordinate) and simulated direction preference (the 2 directions of motion to discriminate are marked with arrows; abscissa). B and C: linear weight profiles for the neuronal pool depicted in A. B: early in training, most neurons with a direction preference with a rightward component have a uniform positive linear weight, whereas most neurons with a direction preference with a leftward component have a uniform negative weight. C: late in training, the linear weights are scaled according to the discriminability of each neuron. D–G: effect of 4 free parameters on the predicted relationship between threshold and slope. In each panel, the 3 colors represent 3 different values of the given parameter. D: fraction of initial weights multiplied by −1. E: linear scale factor on the pooled signal. F: additive pooling noise (in units of spikes/s scaled by the normalized linear weights). G: nonlinearity (q from Eq. 8). H and I: comparison of the predicted relationship between psychometric threshold and slope based on the MT pooling scheme (black line) and the actual relationship measured from the behavioral data (gray points and error bars from Fig. 5, K and L) for monkeys Cy (H) and ZZ (I). J and K: real (gray) and predicted (black) psychometric functions for behavior divided into 4 bins from early to late in training for monkeys Cy (J) and ZZ (K). Predicted functions were derived from the black lines in H and I.

This kind of pooling scheme, with a static nonlinearity (q >1) in tandem with increasingly optimal linear weights, can account for the relationship between threshold and slope we measured in the psychometric data. For Cy, a best-fitting value of q = 2.3 ± 0.38 yielded a pooled signal that, as the linear weights (wi) were adjusted, caused simulated performance to change from a psychometric threshold and slope of roughly 40% coherence and about 0.29% correct/log (coh), respectively, to about 10% coherence and about 0.23% correct/log (coh), respectively (Fig. 10H). Similarly, for ZZ, a best-fitting value of q = 1.59 ± 0.42 corresponded to changes in simulated performance from a psychometric threshold and slope of about 33% coherence and nearly 0.24% correct/log (coh), respectively, to about 14% coherence and about 0.18% correct/log (coh), respectively (Fig. 10I). For both monkeys, these pooling parameters generated simulated psychometric functions describing log (d′) versus log (coh) that were qualitatively similar to the real functions measured throughout training (Fig. 10, J and K).

DISCUSSION

We analyzed both behavioral and neural sensitivities to visual motion stimuli during perceptual learning, to better understand how the neural signals are pooled to influence behavior. We trained monkeys to decide the direction of random-dot motion and characterized changes in the psychometric function describing performance accuracy as a function of stimulus strength. As discrimination threshold improved, the functions became slightly shallower. In contrast, neuronal sensitivity to the motion stimulus in area MT did not change consistently with training and there was no consistent relationship between neurometric threshold and slope across the population of recorded neurons in two monkeys (Law and Gold 2008). We reconciled these findings with a simple pooling model, in which the direction decision is based on an increasingly selective readout of the most sensitive MT neurons using dynamic linear weights and a static nonlinear operation.

Evidence for changes in readout of MT

The idea that the behavioral improvements might arise from changes in the readout of MT responses has several lines of support. Motion-driven responses of MT neurons, which contribute sensory evidence to the direction decision, do not change systematically with training (Britten et al. 1992; Law and Gold 2008; Newsome and Paré 1988; Salzman et al. 1990). However, choice probability, which is a measure of the correspondence between trial-to-trial fluctuations in neural activity and in the monkey's choices, tends to increase slightly in MT as performance improves, particularly for the most sensitive neurons (Britten et al. 1996; Law and Gold 2008). Moreover, neurons in LIP, which receive direct and indirect input from MT and represent formation of the direction decision from the evidence, become increasingly sensitive to the motion stimulus with training (Law and Gold 2008; Lewis and Van Essen 2000; Roitman and Shadlen 2002; Shadlen and Newsome 2001).

These results are consistent with a refinement of functional connectivity between MT and LIP that, with training, generates an increasingly selective readout of the most sensitive MT neurons to form the decision that guides behavior (Law and Gold 2009). However, it seems likely that the real neural implementation is substantially more complicated. LIP is just one of several brain areas, including the superior colliculus and frontal eye field, that exhibit similar decision-related activity in monkeys performing the discrimination task (Horwitz and Newsome 2001; Kim and Shadlen 1999). Moreover, other brain areas such as the medial superior temporal area (MST) are likely also to contribute evidence to the direction decision (Celebrini and Newsome 1994, 1995). Therefore the changes in sensory readout that accompany perceptual learning on this task are likely to involve extensive changes in connectivity between numerous brain areas. Nevertheless, given the importance of MT and LIP to task performance, it is instructive to consider a simplified scheme examining how changes in readout possibly implemented by these two areas could, in principle, affect perceptual performance.

Evidence for a linear/nonlinear pooling scheme

Our analyses suggest that the pooling scheme that converts MT activity into a decision variable includes a linear component that changes with training and a nonlinear component that does not necessarily change with training. The linear weights are appealing because of their computational tractability and capacity to solve this kind of problem. In a two-layer network like the one we consider here, such weights effectively allow the patterns of activity in MT to be divided into two sets, corresponding to the two directions of motion (Law and Gold 2009). Changing such linear weights could, in principle, be implemented in a variety of ways, including changes in anatomical connections or synaptic efficacy from MT to its targets including LIP (Destexhe and Marder 2004; Knudsen 2002; Stanton 1996).

The nonlinear pooling scheme that we used (Eq. 8) is similar to a canonical circuit proposed recently to implement numerous nonlinear operations in neural circuitry (Britten and Heuer 1999; Kouh and Poggio 2008; Sclar et al. 1990). At one extreme (q ≫ 1), our version produces a max-like operation, choosing the largest of the pooled responses. At the other extreme (q = 1), this scheme is equivalent to linear pooling. This parameterization is appealing because these two extremes represent well-established forms of neuronal pooling used in different circuits in the brain (Parker and Newsome 1998). We found that the relationship between MT activity and behavior throughout training was consistent with a pooling scheme using a value of q ≃ 2, implying an expansive nonlinearity that tends to emphasize the largest responses in the pooled signal. This value is also consistent with pooling models that include a squaring term, like that found in several models of motion processing (Adelson and Bergen 1985; Reichardt 1986; Rust et al. 2005).
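The two extremes of the exponent q can be illustrated with a minimal sketch. It assumes a generic power-law pooling rule, (sum_i w_i r_i^q)^(1/q), and omits any normalization term; it is not a reproduction of Eq. 8. The function name, responses, and weights are hypothetical values chosen only for illustration.

```python
import numpy as np

def power_law_pool(responses, weights, q):
    """Pool nonnegative responses as (sum_i w_i * r_i**q) ** (1/q).

    Assumed generic form for illustration; not the paper's Eq. 8.
    """
    r = np.asarray(responses, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * r ** q) ** (1.0 / q))

# Hypothetical responses (e.g., spikes/s) and uniform weights.
responses = np.array([2.0, 5.0, 9.0, 10.0])
weights = np.ones_like(responses)

linear = power_law_pool(responses, weights, q=1)     # plain weighted sum
squared = power_law_pool(responses, weights, q=2)    # emphasizes larger responses
max_like = power_law_pool(responses, weights, q=50)  # close to max(responses)

print(linear, squared, max_like)
```

With q = 1 the pooled value is the weighted sum of all responses; as q grows, the largest response increasingly dominates and the pooled value approaches the maximum. An intermediate value such as q = 2 lies between these cases.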

There are several possible sources of this kind of nonlinearity. On the level of single neurons, such nonlinearities can result from noise in the value of the membrane potential that affects spike generation (Anderson et al. 2000; Miller and Troyer 2002). Populations of neurons are also thought to implement similarly complex transformations between input and output, possibly resulting from interneuronal correlations that might depend on signal strength (de la Rocha et al. 2007; Shamir and Sompolinsky 2004). Numerous neural maps, especially in visual cortex, are also thought to implement various forms of divisive normalization as found in Eq. 8, possibly via shunting inhibition, that might act as a nonlinear transformation of the pooled signal (Heeger 1993).

Relationship to other studies of perceptual learning and attention

The motivation for this study comes from analyses of the slope of the psychometric function under probability summation. Probability summation assumes that detection of a signal occurs when it is detected by one of several independent channels (Graham 1985). Several further assumptions—including the idea that only signals, and not noise, are detected (the “high-threshold” assumption) and that the channels are homogeneous—have led to widespread use of the cumulative Weibull psychometric function to describe performance on tasks thought to involve spatial or temporal probability summation (Quick 1974; Watson 1979). Under these assumptions, this is the only function for which the shape of the psychometric function describing performance accuracy as a function of stimulus strength using any one channel is identical to that describing performance for many channels, with only its horizontal position (i.e., threshold) affected by changing the number of channels (Green and Luce 1975).
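The shape invariance of the cumulative Weibull under these assumptions can be checked numerically. The sketch below assumes a single-channel detection probability of 1 - exp(-(x/alpha)^beta) with no guessing, and n independent, identical channels; the parameter values and function names are hypothetical. Under these assumptions, pooling by probability summation should give a Weibull with the same slope parameter beta and a threshold scaled by n^(-1/beta).

```python
import numpy as np

def weibull_detect(x, alpha, beta):
    """Single-channel detection probability (high-threshold, no guessing)."""
    return 1.0 - np.exp(-(x / alpha) ** beta)

def prob_summation(x, alpha, beta, n):
    """Detection occurs if at least one of n independent channels detects."""
    p_single = weibull_detect(x, alpha, beta)
    return 1.0 - (1.0 - p_single) ** n

# Hypothetical threshold, slope, and channel count.
alpha, beta, n = 10.0, 1.5, 8
x = np.linspace(0.1, 30.0, 200)  # stimulus strengths

pooled = prob_summation(x, alpha, beta, n)
# Same beta, threshold shifted to alpha * n**(-1/beta).
shifted = weibull_detect(x, alpha * n ** (-1.0 / beta), beta)

max_abs_diff = float(np.max(np.abs(pooled - shifted)))
print(max_abs_diff)
```

The two curves agree to within floating-point error, so adding channels moves the function horizontally without changing its shape. This holds only while the channels are homogeneous and the high-threshold assumption applies.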

However, it has been noted that empirical psychometric functions do not always conform to the assumptions that underlie the use of the cumulative Weibull to describe probability summation. First, changes in psychometric slope and threshold appear to vary with the “guessing” parameter, one of several lines of evidence discrediting the high-threshold assumption (Green and Swets 1966). Second, psychometric functions can have quite different slopes for different tasks and different subjects, discrediting a strong assumption of homogeneity for all channels (Nachmias 1981). Thus more recent uses of the cumulative Weibull function probably have more to do with the fact that it tends to fit empirical data well than with its analytic appeal under assumptions of probability summation (e.g., Britten et al. 1992).

Nevertheless, more realistic assumptions about the heterogeneity of neural channels and the role of signal and noise in perceptual processing have led to a reinterpretation of the shape of the cumulative Weibull and other psychometric functions (Kontsevich and Tyler 1999; Pelli 1985, 1987; Tyler and Chen 2000). Specifically, psychometric slope is sensitive to the number of uninformative channels that are combined with informative ones via probability summation to make the perceptual decision. This phenomenon has been interpreted in terms of either uncertainty about or lack of attention to the appropriate signal-driven channels. Here we showed that a similar phenomenon might underlie some forms of perceptual learning, in which improvements in perceptual sensitivity involve an increasingly selective readout of the most informative sensory neurons (Jacobs 2009; Law and Gold 2009).

Several models have been proposed to implement such a dynamic readout scheme. For example, feedback-driven changes in linear weighting functions that control the input to decision-making mechanisms can account for several forms of visual perceptual learning (Law and Gold 2009; Petrov et al. 2005). However, the explanatory scope of such schemes remains unclear. For example, several forms of perceptual learning, including those involving fine discriminations between similar features like nearby directions of motion or orientations, have been shown to involve at least some changes in the tuning properties of sensory neurons thought to provide the sensory evidence for the decision in monkeys and humans (Crist et al. 2001; Furmanski et al. 2004; Maertens and Pollmann 2005; Raiguel et al. 2006; Schoups et al. 2001; Schwartz et al. 2002; Walker et al. 2005; Watanabe et al. 2002; Yang and Maunsell 2004; Yotsumoto et al. 2008). Moreover, other forms of perceptual learning do not appear to require the kind of explicit feedback used in the readout models (Watanabe et al. 2001, 2002), although internal evaluations and task-irrelevant feedback might help to drive learning (Herzog and Fahle 1998; Seitz and Watanabe 2003; Seitz et al. 2009). More work is needed to determine how these results relate to the kind of dynamic readout scheme described herein.

The forms of perceptual learning that do involve changes in readout like those we describe appear to be closely related to some mechanisms of selective attention (Ahissar and Hochstein 1993; Dosher and Lu 1999; Ito et al. 1998; Yu et al. 2004). In particular, experience appears to shape the process of pooling the activity of a population of sensory neurons that, like feature-based attention, increasingly takes into account the most informative neurons for the given task (Maunsell and Treue 2006). When this pooling process includes particular kinds of nonlinear operations, these selective changes in readout seem to have identifiable effects on the slope of the psychometric function as perceptual sensitivity improves with training. These effects might serve as a useful diagnostic tool for identifying similar forms of selective pooling for other stimuli and task conditions.

GRANTS

This work was supported by National Eye Institute Grants EY-015260 and P30 EY-001583, the McKnight Endowment Fund for Neuroscience, the Burroughs-Wellcome Fund, and the Sloan Foundation.

ACKNOWLEDGMENTS

We thank L. Ding, A. Churchland, B. Heasly, P. Holmes, J. Nachmias, and C.-L. Teng for helpful comments and discussions and J. Zweigle, F. Letterio, M. Supplick, and A. Callahan for technical assistance.

REFERENCES
