Our ability to make rapid decisions based on sensory information belies the complexity of the underlying computations. Recently, “accumulator” models of decision making have been shown to explain the activity of parietal neurons as macaques make judgments concerning visual motion. Unraveling the operation of a decision-making circuit, however, involves understanding both the responses of individual components in the neural circuitry and the relationships between them. In this functional magnetic resonance imaging study of the decision process in humans, we demonstrate that an accumulator model predicts responses to visual motion in the intraparietal sulcus (IPS). Significantly, the metrics used to define responses within the IPS also reveal distinct but interacting nodes in a circuit, including early sensory detectors in visual cortex, the visuomotor integration system of the IPS, and centers of cognitive control in the prefrontal cortex, all of which collectively define a perceptual decision-making network.
Perceptual decision making is a fundamental aspect of cognition in which sensory information provides the basis for the selection of one among many possible actions. In the simplest case, the link between sensory evidence and a behavioral choice is a binary one: an oncoming car suddenly veers into one's lane and a quick decision must be made—to swerve left or right. Much of our understanding of this critical ability to link sensation and action in the service of decisions is built on quantitative models of information processing in which evidence accumulates over time until a threshold indicating one or another choice is reached. Such “accumulator” models, including Ratcliff's “diffusion” model (Ratcliff 1978), can successfully account for the speed and accuracy with which subjects make simple binary decisions (Ratcliff 2006; Ratcliff and McKoon 2008).
Over the past ten years, work in nonhuman primates has revealed that such models may also explain the neurophysiological implementation of perceptual decisions. In a series of studies by Shadlen and others (Huk and Shadlen 2005; Roitman and Shadlen 2002; Shadlen and Newsome 2001), neuronal firing rates in the lateral intraparietal area (LIP) of macaques engaged in a dot-motion coherence task increased over time in proportion to the weight of the evidence for the two alternative directions and peaked at roughly the same magnitude for different levels of motion coherence. These results suggest that neurons in LIP integrate sensory input encoded by the motion-sensitive area MT (middle temporal) in a manner consistent with the accumulator models (Gold and Shadlen 2007). Consequently, such dot-motion stimuli, in concert with the accumulator model, provide a particularly compelling system for studying perceptual decision making in humans. Much is known of primate visual processing—and specifically of the responses of neurons within LIP, MT, and other areas implicated in decision making (e.g., Schall 2003)—thereby providing both greater control over and more constrained predictions of the responses within these areas in human functional magnetic resonance imaging (fMRI) studies. Recently, studies in humans have begun to search for evidence of accumulation using paradigms in which the quality of sensory inputs is varied (Forstmann et al. 2008; Heekeren et al. 2004, 2006; Philiastides and Sajda 2007; Philiastides et al. 2006; Ploran et al. 2007). Here we combine the motion-coherence paradigm developed in the macaque with the specific parametric manipulations possible in both the diffusion model and the dot-motion coherence paradigm to define the full visual decision-making network.
In the experiments described here, using this same dot-motion decision-making paradigm, we derived predictions for the differential blood oxygenation level–dependent (BOLD) responses in the presumed human homologues of macaque LIP within the intraparietal sulcus, including middle IPS (mIPS) (Astafiev et al. 2003; Grefkes and Fink 2005; Sereno et al. 2001; Snyder et al. 1997, 2000; Stark and Zohary 2008; Tosoni et al. 2008). Specifically, the average rate at which sensory evidence accumulates to threshold (the “drift rate”) should vary inversely with integrated neural activity, for the simple reason that slower drift rates lead to longer integration times. Because the hemodynamic response should correlate with summed neural activity, the BOLD signal should vary inversely with the average drift rate. We therefore predicted that if mIPS functions as an accumulator in humans, BOLD responses in mIPS should show an inverse relationship with increasing motion coherence and that mIPS activity should decrease with the drift rate across trials in which motion coherence was held constant. Moreover, again based on the diffusion model (Ratcliff and McKoon 2008), we predicted that activity in this area should be greater for errors than for correct responses.
We were able to confirm each of these hypotheses, lending further experimental support to the role of accumulator models in the explanation of perceptual decision making. In addition, we demonstrated that mIPS functions in the context of a wider network of areas involved in the decision process. By using the parametric mIPS response as a signature of task-related activity, we were able to fractionate BOLD responses in additional regions, such as the middle frontal gyrus (MFG) and occipital pole, by their sensitivities to sensory (i.e., motion coherence) and behavioral (i.e., reaction time and accuracy) variables. In so doing, we were able to preliminarily locate them, with mIPS and MT+, along a sensory–integrative–motor continuum involved in the production of a simple decision.
Six subjects (ages 23–37 yr; four males) participated in the study and gave written informed consent in accordance with the Committee for the Protection of Human Subjects at the University of California, Berkeley. All subjects had normal neural anatomy, were right-handed, and had normal or corrected-to-normal vision. Before scan sessions, subjects were trained on the task for a minimum of four 1-h sessions to reduce both the number of invalid trials (see following text) and learning effects in the scanner. Once trained, all subjects underwent five 1-h fMRI sessions, each of which consisted of four runs of 70 trials for a total of 4 × 70 × 5 = 1,400 trials per subject.
Subjects performed a visual dot-motion task in which they viewed coherent dot motion on a background of randomly moving dots. They were required to identify the direction of motion as quickly and accurately as possible. A trial began with an interstimulus interval during which subjects fixated a central cross for 4,000–12,000 ms (Fig. 1). At the end of the interval, the fixation cross faded and the stimulus appeared for 2,500 ms. Both dot-motion coherence and the direction of motion (either leftward or rightward) were consistent throughout the trial. To indicate their choice, subjects made a left or right button-press before the end of the 2,500-ms interval. At the end of the stimulus, the central fixation cross returned to its baseline contrast over 500 ms and the next interstimulus interval began.
Of note, the stimulus persisted for the entire interval, regardless of the timing of the subject's response, to avoid otherwise perfectly confounding our dependent measures (i.e., BOLD signal) with stimulus duration. This design permitted us to distinguish brain areas whose activity was more sensitive to the timing of the response, from areas whose activity was more sensitive to the duration of the stimulus. We note that this design does not strongly constrain postdecision activity, although a recent primate experiment in which stimulus duration was controlled by the experimenter has begun to address this issue (Kiani et al. 2008). Moreover, our subjects responded via button-press, not saccade; the effect of response modality on accumulation-related spiking activity remains an open question, not evaluated here (but see Heekeren et al. 2006).
In keeping with a previous study (Palmer et al. 2005), subjects were trained to maintain an eye position within 3° of visual angle of the central fixation cross, to refrain from blinking, and to make a button-press response during the 2,500-ms window. By ensuring fixation and blink inhibition during trials, we attempted to reduce the possibility that BOLD responses were related to either eye movements or a lack of visual input. Eye-movement data from previous experiments for three subjects (101–103) were acquired at the Neuroimaging Center at the UCSF Medical Center with an ASL Eye-Trac 6 LRO (http://www.a-s-l.com). Eye-movement data for another two subjects (105 and 106) were collected during preliminary scan sessions at the same facility. Another subject (104) had undergone extensive eye-movement training at a different facility for experiments not conducted by the current authors and was not further evaluated here.
Because our current scanner configuration at the U.C. Berkeley Imaging Center did not permit us to monitor eye movements during the scan sessions themselves, relatively stringent criteria were used to train subjects outside the scanner to maintain fixation and to avoid blinks. Blinks were classified as any instances in which the pupil aspect ratio was equal to zero for >8.3 ms. Eye movements were defined as any period lasting >180 ms in which the eye position was >3° from fixation. Finally, eye position was correlated with the displayed direction of motion for each trial to ensure that even subthreshold eye movements were not correlated with the direction of stimulus motion. By these criteria, the subjects performed quite well (Table 1).
For fMRI sessions, the ordering of dot-motion trials was computed for 20 separate sessions using OptSeq (http://surfer.nmr.mgh.harvard.edu/optseq/) (Dale 1999). Each motion stimulus was presented for 2,500 ms, regardless of subject response time. Stimuli were programmed in Matlab in the PsychToolbox environment (Brainard 1997; Pelli 1997), adapted from code originally written by McKinley and Shadlen and downloaded from the PsychToolbox website (http://psychtoolbox.org/PTB-2/). Dot-motion frames were presented within a 7.5° aperture at 60 frames/s (fps). Dot density was fixed at 16.7 dots·deg−2·s−1 and dot velocity was fixed at 5°/s. To avoid “blurring” effects from dot motion, the set of dots was broken into three subsets, interleaved such that the dot positions for only one subset were updated for each display frame. (This interleaved presentation, however, was not perceptible to subjects due to the rapid frame rate.) Dot-motion coherence was set at 0, 2, 4, 8, 16, 32, or 64% for a given trial. Actual coherence for a single display frame was determined by sampling from a uniform distribution, which was then thresholded by the coherence of the current trial to yield an integral number of coherently moving dots for each frame. All other dots (i.e., the complement of the coherently moving subset) were given a random motion direction, from 0 to 360°. Over the course of many display frames in a given trial, the mean coherence across all frames approached the desired coherence for the trial, with an empirically determined SD of 1.4% for the full 2,500 ms across all coherence levels. For each frame, the subset of coherently moving dots was randomly selected from the full dot set. Thus the percept of motion was distributed across the full dot set instead of being confined to a particular subset of dots.
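The per-frame coherence sampling described above can be illustrated with a minimal sketch. It assumes that each dot draws one uniform random number per frame and is treated as coherent when the draw falls below the trial coherence; the dot count, the function names, and the omission of the three interleaved dot subsets are simplifications introduced here, not details taken from the stimulus code.

```python
import random

def coherent_dot_mask(n_dots, coherence, rng):
    """Return one boolean per dot: True if the dot moves coherently this frame.

    Each dot draws a uniform number in [0, 1); draws below the trial
    coherence mark the dot as coherent. The count is therefore an integer
    and the coherent subset is re-selected on every frame.
    """
    return [rng.random() < coherence for _ in range(n_dots)]

def mean_frame_coherence(n_dots, coherence, n_frames, rng):
    """Average the realized per-frame coherence over one trial."""
    total = 0
    for _ in range(n_frames):
        total += sum(coherent_dot_mask(n_dots, coherence, rng))
    return total / (n_dots * n_frames)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
N_DOTS = 100             # hypothetical dot count, not given in the text
N_FRAMES = 150           # 2,500 ms at 60 frames/s
realized = mean_frame_coherence(N_DOTS, 0.16, N_FRAMES, rng)
print(f"target coherence 0.16, realized {realized:.3f}")
```

Because the coherent subset is redrawn on every frame, the realized coherence fluctuates from frame to frame but converges on the nominal value over the trial, which is the behavior the text describes.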
To ensure that additional cues related to directionality or coherence were not present in our stimuli, we placed additional constraints on dot position. Dots were initially located randomly (based on a uniform two-dimensional distribution) throughout the viewing window. To ensure an even starting distribution of dots throughout the aperture, we divided the viewing window into 16 regions and counted the number of dots within each region. If the counts within the regions showed a ≥95% chance of deviating from the expected chi-square distribution, a new random placement of the dots was generated. This process was repeated until the above constraint was satisfied. Additionally, to ensure that motion energy was uniform throughout the viewing window, each dot was constrained to move the same distance from frame to frame (but in a random direction for those dots not part of the coherently moving subset). Finally, if the motion vector applied to a dot placed it outside the plot area, it was repositioned in a random position along the opposite side. This boundary condition prevented dots from collecting in any particular region of the aperture over time.
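The placement check and boundary rule can be sketched as follows. This is an illustration under stated assumptions rather than the stimulus code itself: the viewing window is treated as a square divided into a 4 × 4 grid, a placement is rejected when the chi-square statistic of the cell counts exceeds the 95th percentile for 15 degrees of freedom, and the dot count and all function names are hypothetical.

```python
import random

# 95th percentile of the chi-square distribution with 15 degrees of freedom
CHI2_CRIT_DF15_P95 = 24.996

def grid_counts(dots, size, n_side=4):
    """Count dots in each cell of an n_side x n_side grid over a square window."""
    counts = [0] * (n_side * n_side)
    for x, y in dots:
        col = min(int(x / size * n_side), n_side - 1)
        row = min(int(y / size * n_side), n_side - 1)
        counts[row * n_side + col] += 1
    return counts

def chi_square_stat(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def place_dots(n_dots, size, rng, max_tries=1000):
    """Redraw uniform random positions until the uniformity check passes."""
    for _ in range(max_tries):
        dots = [(rng.uniform(0.0, size), rng.uniform(0.0, size))
                for _ in range(n_dots)]
        if chi_square_stat(grid_counts(dots, size)) < CHI2_CRIT_DF15_P95:
            return dots
    raise RuntimeError("no acceptable placement found")

def step_dot(x, y, dx, dy, size, rng):
    """Move a dot; if it exits, re-enter at a random point on the opposite side."""
    x, y = x + dx, y + dy
    if x < 0.0:
        return size, rng.uniform(0.0, size)
    if x > size:
        return 0.0, rng.uniform(0.0, size)
    if y < 0.0:
        return rng.uniform(0.0, size), size
    if y > size:
        return rng.uniform(0.0, size), 0.0
    return x, y

rng = random.Random(1)
dots = place_dots(n_dots=160, size=7.5, rng=rng)  # 160 dots is a hypothetical count
print(f"chi-square of accepted placement: {chi_square_stat(grid_counts(dots, 7.5)):.2f}")
```

Re-entry at a random position along the opposite edge, rather than a simple wrap-around, avoids reintroducing the same spatial pattern that just left the window.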
Following the work of Palmer and colleagues (2005), we fit the behavioral data in this experiment with a proportional rate-diffusion model. This model provided a principled way of fitting both reaction time (RT) and accuracy data with a single set of parameters, thereby simultaneously constraining the fit of both behavioral variables and providing a parsimonious explanation for the data. As compared with Ratcliff's model, the model of Palmer and colleagues provides a further constraint, in that, by tying the drift rate linearly to the motion coherence level, it requires these same parameters to fit all the behavioral data, at the expense of modeling the full RT distribution at each level (Palmer et al. 2005; Ratcliff and McKoon 2008). In this special case of Ratcliff's model, the data are thus described by three parameters: 1) A′, bearing on the decision threshold; 2) k, a constant describing the relationship between the motion coherence in the stimulus and the mean drift rate in the model; and 3) tR, the mean residual time in seconds, representing a fixed processing duration independent of evidence accumulation (e.g., for low-level sensory processing or implementation of motor commands). Accuracy and RT data for each subject were used as input. Accuracy was calculated as the percentage of correct responses for all trials in which the subject made a response, whereas only RTs from correct responses were analyzed, in accord with Palmer and colleagues. Parameters were determined by an iterative procedure in which the maximum likelihood of multiple parameter sets was determined at each step. Once the most likely parameter set for a given iteration was determined, that set was chosen as the center of a smaller but more finely sampled search space. Ten steps of this procedure sufficed to identify a well-fitting model. The mean values for A′, k, and tR determined by Palmer et al. were used as the starting point for the process, but the final parameter values were not dependent on a precise initial condition (data not shown).
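For orientation, the two predictions that this model makes from its three parameters can be written compactly. The sketch below uses the closed-form expressions for proportion correct and mean RT given by Palmer et al. (2005) for the proportional-rate model, with coherence expressed as a fraction; the parameter values are hypothetical placeholders and are not the fitted values for any subject.

```python
import math

def p_correct(coh, a, k):
    """Predicted proportion correct: logistic in the product a * k * coherence."""
    return 1.0 / (1.0 + math.exp(-2.0 * a * k * abs(coh)))

def mean_rt(coh, a, k, t_r):
    """Predicted mean RT in seconds: mean decision time plus residual time t_r."""
    drive = a * k * abs(coh)
    if drive < 1e-9:
        # Limit as coherence approaches zero: decision time tends to a squared
        return a * a + t_r
    return (a / (k * abs(coh))) * math.tanh(drive) + t_r

# Hypothetical parameters chosen only for illustration
A_PRIME, K, T_R = 0.7, 10.0, 0.4

for pct in (0, 2, 4, 8, 16, 32, 64):
    c = pct / 100.0
    print(f"{pct:2d}% coherence: P(correct) = {p_correct(c, A_PRIME, K):.3f}, "
          f"mean RT = {mean_rt(c, A_PRIME, K, T_R):.3f} s")
```

Because the same A′ and k appear in both expressions, accuracy and RT cannot be fit independently, which is the joint constraint described above.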
To generate predictions for the relationship between beta values and RT for IPS, we assumed that the BOLD response could be estimated by the integral under the curve for the mean diffusion rate. Although there exist more complex and realistic models of the link between neural activity and BOLD response (reviewed in Buxton et al. 2004), the above-cited formulation embodies the simple assumption, consistent with the more complex models, that this relationship is monotonic. To generate predictions for the BOLD response of area MT+, we applied the work of Britten et al. (1993) as incorporated in the work of Rees and colleagues (2000) in humans and the recent modeling work of Niwa and Ditterich (2008). In short, the population response across the 360° representation of motion direction for 0% motion coherence was assumed to be untuned and the population response for 100% motion coherence was assumed to be a well-tuned Gaussian. To model intermediate motion coherences, we linearly interpolated between these two endpoints across each motion direction, in keeping with data describing the linear component of the change in firing rate across motion coherence at preferred and null directions (Britten et al. 1993). To model the effect of changes in motion coherence at off-preferred orientations, we took advantage of the fact that Britten and colleagues measured a minimal, slightly negative change in tuning curve bandwidth with increasing motion coherence (Britten and Newsome 1998). With knowledge of these three empirically derived linear components—the changes in preferred and null firing rates across motion coherence and the change in tuning curves across motion coherence—we were able to generate a linear estimate of the space of MT responses, plus/minus a constant baseline firing rate (see following text). Because the stimulus presentation time was always 2,500 ms, time served only as a linear scaling factor in this simple formulation and could be omitted. 
Additionally, the responses of MT+ are known to be more complex—e.g., due to attentional effects (Treue and Maunsell 1996), sublinear summation (Britten 1999), and stimulus adaptation (Kohn and Movshon 2004), among other factors—but the above-cited parameters provided an empirical account of the data that sufficed to generate testable predictions within the scope of the current experiment.
For this simple formulation, the condition defining an unchanging neural response across motion coherence could be determined for the maximum spike rate response to random motion (Rrandom), the maximum spike rate response to motion in the preferred direction (Rpref), and the full-width at half-maximum (FWHM) of the population response in MT+ (FWHMPOP–MT+). For example, setting a maximum spike rate (Rrandom) of 10 spikes/s above the baseline for the response to the random noise, and a maximum spike rate (Rpref) of 40 spikes/s above the baseline for the response to each cell's preferred motion direction, based on Britten and colleagues (1993) and on a personal communication from Britten noted in the work of Rees and colleagues (2000), a FWHMMT+ of 67.6° divides the parameter regime between monotonically decreasing (FWHMMT+ < 67.6°) and monotonically increasing BOLD responses. [Note that we implicitly use a baseline of zero spikes/s. With this background rate, further suppression (and therefore increasingly smaller responses) at high motion coherences for those neurons tuned around the null direction would not be possible because of the rectification nonlinearity. However, the linear estimates provided by Britten and colleagues are such that neural responses reach (exactly) zero only at 100% motion coherence.] Rees and colleagues used a value of about 53° for the FWHM of the MT+ population response, one that in our model would argue for monotonically decreasing responses. Further assuming, as they do, an independent error of ≤30% in the measurement of each of these parameters, a monotonically decreasing result was found throughout 80% of the parameter space. Larger/smaller estimates for the FWHM of the population response and its estimation error will correspondingly decrease/increase the percentage of the parameter space in which this monotonically decreasing result is seen.
MRI scanning was conducted on a Siemens MAGNETOM Trio 3T MR Scanner at the Henry H. Wheeler Jr. Brain Imaging Center at the University of California, Berkeley. Anatomical images consisted of 160 slices acquired using a T1-weighted magnetization-prepared rapid acquisition with gradient echo (MP-RAGE) protocol (repetition time [TR] = 2,300 ms, time to echo [TE] = 2.98 ms, field of view [FOV] = 256 mm, matrix size = 256 × 256, voxel size: 1 × 1 × 1 mm). Functional images consisted of 24 slices acquired with a gradient echoplanar imaging protocol (TR = 1,370 ms, TE = 27 ms, FOV = 225 mm, matrix size = 96 × 96, voxel size: 2.3 × 2.3 × 3.5 mm). A projector (Avotec SV-6011, http://www.avotec.org/) was used to display the image on a translucent screen placed within the scanner bore behind the head coil. A mirror was used to allow the subject to see the display. The distance from the subject's eye to the screen was 28 cm.
fMRI functional images were converted to 4D NIfTI format and corrected for slice-timing offsets using SPM5 (http://www.fil.ion.ucl.ac.uk). Motion correction was carried out with the AFNI (Analysis of Functional NeuroImages) program 3dvolreg with a reference volume specified as the mean image of the first run in the series. Images were then smoothed with a 6-mm FWHM Gaussian kernel. Coregistration was performed with the AFNI program 3dAllineate using the local Pearson correlation cost function optimized for fMRI to structural MRI alignment. The inverse transformation was then used to warp the high resolution MRI to the functional image space, after which it served as an anatomical underlay for the display of statistical parametric maps.
Cortical surface generation and intersubject registration
Freesurfer version 4.0 (Dale et al. 1999; Fischl et al. 1999) was used to create cortical surfaces from each subject's high-resolution MRI. Each surface mesh was inflated to a sphere and registered to a spherical template representing the average sulcal and gyral curvature across a sample of normal brains. The AFNI program MapIcosahedron was then used to resample each subject's spherically registered surface mesh onto a regularly sampled icosahedron to achieve a one-to-one mapping between the nodes of each subject's spherically aligned surface. Volumetric data could then be mapped for each subject to this standard regularly sampled surface and group statistics computed for every node on the mesh.
fMRI data analysis
ANALYSIS OF PARAMETRIC EFFECT OF MOTION COHERENCE.
A series of voxelwise fMRI statistical analyses, each with a different aim, were carried out for each subject using the general linear model (GLM) framework implemented in the AFNI program 3dDeconvolve. To assess the overall effect of motion coherence, the seven levels of coherence (0, 2, 4, 8, 16, 32, and 64%) and two levels of direction (left, right) were modeled with separate regressors, each of which was derived by convolving a gamma probability density function (peak = 6 s) with a vector of stimulus onsets for each of the conditions. Tests of linear trends were carried out using the appropriate contrast vectors (linear vector: [−0.32, −0.28, −0.25, −0.17, −0.04, 0.25, 0.81]) applied to the estimated beta coefficients in each voxel for each motion-coherence level. The resulting t-statistics were mapped to each subject's spatially normalized cortical surface and group level analyses were performed.
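The reported linear contrast weights appear consistent with mean-centering the seven coherence levels and scaling the result to unit length. That derivation is an inference from the numbers, not something stated in the text; the sketch below reproduces it and compares the derived weights with the reported ones. The function names are introduced here for illustration.

```python
import math

COHERENCES = [0, 2, 4, 8, 16, 32, 64]  # percent, as listed in the text

def linear_contrast(levels):
    """Mean-center the levels and scale them to unit length."""
    mean = sum(levels) / len(levels)
    centered = [v - mean for v in levels]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def contrast_value(weights, betas):
    """Weighted sum of the per-level beta estimates for one voxel."""
    return sum(w * b for w, b in zip(weights, betas))

weights = linear_contrast(COHERENCES)
reported = [-0.32, -0.28, -0.25, -0.17, -0.04, 0.25, 0.81]
for w, r in zip(weights, reported):
    print(f"derived {w:+.3f}   reported {r:+.2f}")
```

The weights sum to zero, so a voxel responding equally at all coherence levels yields a contrast value of zero, whereas a response that grows or shrinks with coherence yields a positive or negative value, respectively.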
Because we collected a large amount of data on a relatively small number of subjects, statistical power was relatively weak at the group level (when using a “random effects” approach) relative to the single-subject level. Thus for the purposes of a group activation summary, we assessed significance using a fixed-effects summary statistic. We computed a t-statistic for the linear contrast for every node on the surface and divided this value by the square root of the number of subjects (n = 6), which was compared against a standard normal null distribution (McNamee and Lazar 2004) using an alpha value of P < 0.0001 for the full group. To ensure that this group map did not obscure inconsistent responses across subjects, we also evaluated parametric responses on a single-subject level (Supplemental Fig. S1). In contrast to the whole-brain group summary analyses, all statistics performed on region-of-interest (ROI)–extracted data were submitted to random effects t-test or repeated-measures ANOVA on independently defined voxels (see following text).
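One reading of the fixed-effects summary statistic is sketched below. It assumes that the per-subject t-statistics at a node are summed across subjects before division by the square root of the number of subjects, and that the result is evaluated two-sided against the standard normal distribution; both points are assumptions, as are the example values.

```python
import math

def fixed_effects_z(subject_t):
    """Sum of per-subject t-statistics divided by sqrt(number of subjects).

    Assumes each t-statistic has enough degrees of freedom to be treated
    as approximately standard normal under the null hypothesis.
    """
    return sum(subject_t) / math.sqrt(len(subject_t))

def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical t-statistics for one surface node, one value per subject
node_t = [-2.1, -1.8, -2.5, -1.2, -2.9, -2.0]
z = fixed_effects_z(node_t)
print(f"z = {z:.2f}, p = {two_sided_p(z):.2e}, significant: {two_sided_p(z) < 1e-4}")
```

Under this reading, consistent moderate effects across subjects can reach the group threshold even when no single subject does, which is why the single-subject maps serve as a useful check.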
COMPARISON OF CORRECT VERSUS INCORRECT TRIALS.
Conditions for which there were significant numbers of errors (>20) were split into correct and incorrect sets. Over the course of the entire experiment, all subjects had >20 errors for 2 and 4% motion coherence and all subjects but one exceeded 20 errors for 8% motion coherence. Conditions for which the number of errors did not meet this criterion were modeled separately to exclude them from contributing to the baseline activity, but they were not further addressed in this analysis. A comparison between correct and incorrect trials was computed as a contrast between the estimated coefficients for correct and incorrect regressors, averaged across motion coherence. For the one subject with <20 error trials for the 8% condition, the analysis was performed in the same way except that correct and incorrect trials for only 2 and 4% motion coherence were compared.
ESTIMATION OF BOLD TIME COURSES.
To estimate the temporal profile of the hemodynamic response across different levels of motion coherence and in different regions of the brain, we used a deconvolution approach with (“tent”) basis functions convolved with the stimulus event onsets (Saad et al. 2006). This method allowed for an unbiased estimate of the time course of the fMRI response for each of the seven motion coherences. Note that stimulus onset times were not locked to the onset of the TTL pulse. Consequently, we were able to sample the time courses at multiple points (i.e., not solely at integer values of the TR), which permitted us to estimate these average BOLD responses across the experiment at a time resolution much smaller than the TR itself. The deconvolution procedure was performed separately for each scanning run due to computer memory constraints, yielding one time course estimate for each condition and each run. In subsequent ROI analyses, all of the estimated time courses were concatenated, averaged within the ROI, and then fit with a locally smooth polynomial regression estimator (Loader 1999). The use of a flexible regression fit avoids assumptions that the shape of the hemodynamic response must remain consistent across varying reaction times. Time to peak was identified as that time corresponding to the maximum amplitude of the response between 4 and 10 s after stimulus onset.
CORRELATION BETWEEN BOLD ACTIVITY AND REACTION TIME.
To assess the relation between RT and trial-to-trial variation in the hemodynamic response, we carried out a robust linear regression analysis (Venables and Ripley 2003) in which each trial in the experiment was modeled with a separate regressor, yielding 200 beta coefficients for each level of coherence. The relation between BOLD activity and RT was then estimated by regressing the 200 beta coefficients against the corresponding set of RTs for each level of motion coherence. This approach yielded two parameter estimates for each level of coherence: 1) an intercept that represented the estimated BOLD effect when RT = 0 and 2) a slope, or the estimated change in the BOLD effect per unit change in RT. Using these parameters, we were then able to estimate the relationship between RT and BOLD activity across the duration of the stimulus presentation (i.e., from 0 to 2,500 ms).
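The second-stage regression of trialwise beta coefficients on RT can be sketched as follows. The text specifies a robust linear regression but not the estimator; the sketch assumes Huber weights fit by iteratively reweighted least squares with a median-absolute-deviation scale estimate, which is one common choice. The data values and function names are hypothetical.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def huber_line(x, y, k=1.345, n_iter=50):
    """Robust line fit: Huber weights, iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        intercept, slope = weighted_line(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute residual
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        cutoff = k * scale
        # Full weight inside the cutoff, shrinking weight for large residuals
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
    return intercept, slope

# Hypothetical RTs (s) and trialwise betas; the last trial is an outlier
rts = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 2.1]
betas = [0.42, 0.51, 0.58, 0.71, 0.77, 0.90, 0.97, 1.08, 3.00]
b0, b1 = huber_line(rts, betas)
print(f"intercept (BOLD effect at RT = 0): {b0:.3f}")
print(f"slope (change in BOLD effect per second of RT): {b1:.3f}")
print(f"predicted BOLD effect at RT = 2.5 s: {b0 + b1 * 2.5:.3f}")
```

The intercept and slope together give the estimated BOLD effect at any RT between 0 and 2,500 ms, as described above.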
DEFINITION OF ROIS.
To avoid a selection bias in the definition of ROIs, a subset of the fMRI data was separately analyzed for the exclusive purpose of defining independent functional ROIs. Regions were defined on the normalized spherical surface as connected clusters of activity showing a significant (P < 0.005) inverse effect of motion coherence across the group of six subjects for runs 1, 7, 13, and 20 (one from each day of testing). Because the region of the IPS formed a large connected cluster of activity, extending ventrally into the occipital lobe and dorsally into the parietal lobe, the IPS was subdivided into anterior, medial and posterior subdivisions following the criteria of Stark and Zohary (2008). The anterior IPS was defined as the anteriormost third of the sulcus, the medial IPS was defined as the dorsalmost half of the posterior portion of the IPS, and the posterior IPS was defined as the ventralmost half of the posterior portion of the IPS (see Table 3 in the following text for centroids in Montreal Neurological Institute [MNI] space). Other ROIs (e.g., MT+ and anterior insula) were defined as the intersection of the functional activations with known neuroanatomy. Additionally, for MT+ and IPS we evaluated ROI coordinates determined by a number of independent studies (see results). Once ROIs were formed, they were reverse normalized from the cortical surface to each subject's native volumetric space using the AFNI program 3dSurfToVol. For each reverse normalized ROI volume, masked to include only voxels demonstrating a positive main effect of task, the voxel with the peak (negative) t-statistic for the subject's motion coherence contrast, and the eight additional most significant voxels within a 6-mm radius of the peak, were taken as the voxels of interest for each region, and applied only to the 16 runs not used to define the ROIs to ensure independence of the ROI definitions and the data tested. 
This method ensured that the same number of voxels was used for each ROI. Two additional ROIs that showed a main effect of stimulus, but no reliable effect of motion coherence—the occipital pole (OPOLE) and left motor cortex (M1)—were defined using the main effect localizer contrast. However, for the sake of consistency and for comparison with the other ROIs, the voxels of interest within these two main effect-defined ROIs were also selected with reference to the peak (negative) t-statistic for the motion coherence contrast.
FUNCTIONAL CONNECTIVITY ANALYSES.
To evaluate the correlation between MT+ and other brain regions across all motion coherence values, the series of trialwise beta estimates for the ten voxels identified as before were averaged together to produce a reference beta series. This vector was subsequently correlated with the beta series for every other voxel in the brain (Rissman et al. 2004). To eliminate any spurious effects of different motion-coherence levels on the computed correlation, all beta values for each motion coherence were separately normalized to a mean of zero and SD of one prior to computing the correlation. We then applied Fisher's r-to-z transform to the correlation values and divided by the square root of the variance (equal to the reciprocal of the number of degrees of freedom minus 3) to produce Z-scores for every voxel in the brain.
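The beta-series correlation steps can be illustrated with a short sketch. It assumes that the sample size entering the variance term is the number of trials, so that the Fisher-transformed correlation is scaled by the square root of (n − 3); the example values and function names are hypothetical.

```python
import math

def zscore(values):
    """Standardize to mean 0 and sample SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def normalize_within_levels(betas, levels):
    """Z-score betas separately within each coherence level, keeping trial order."""
    out = [0.0] * len(betas)
    for level in set(levels):
        idx = [i for i, lev in enumerate(levels) if lev == level]
        for i, z in zip(idx, zscore([betas[i] for i in idx])):
            out[i] = z
    return out

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_score(r, n):
    """Fisher r-to-z transform divided by its standard error, 1 / sqrt(n - 3)."""
    return math.atanh(r) * math.sqrt(n - 3)

# Hypothetical seed (MT+) and target beta series for two coherence levels
levels = [2, 2, 2, 2, 64, 64, 64, 64]
seed = [0.9, 1.4, 1.1, 1.6, 0.2, 0.5, 0.3, 0.7]
target = [1.0, 1.3, 1.4, 1.5, 0.1, 0.6, 0.2, 0.5]
r = pearson_r(normalize_within_levels(seed, levels),
              normalize_within_levels(target, levels))
print(f"r = {r:.3f}, Z = {fisher_z_score(r, len(levels)):.2f}")
```

Normalizing within each coherence level removes the shared dependence of both regions on coherence, so that the remaining correlation reflects trial-to-trial covariation.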
To search for regions whose connectivity varied with motion coherence, we computed the same values as before for every brain voxel, but separately for each motion coherence. For each voxel, we then weighted the seven resulting r-to-z transformed correlation values (one for each of the motion-coherence levels) by the same contrast vector used in the GLM analyses and divided by the square root of the weighted variance to produce Z-scores (Polk et al. 2007).
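A minimal sketch of this coherence-dependent connectivity statistic follows. It assumes that the variance of each Fisher-transformed correlation is 1/(n − 3), with n the number of trials at that coherence level, and that the weighted variance is the sum of the squared weights times these variances; the correlation values and the trial count are hypothetical.

```python
import math

def connectivity_trend_z(r_by_level, n_by_level, weights):
    """Contrast of Fisher-transformed correlations divided by its standard error."""
    numerator = sum(w * math.atanh(r) for w, r in zip(weights, r_by_level))
    variance = sum(w * w / (n - 3) for w, n in zip(weights, n_by_level))
    return numerator / math.sqrt(variance)

# Linear contrast vector reported for the GLM analyses
WEIGHTS = [-0.32, -0.28, -0.25, -0.17, -0.04, 0.25, 0.81]

# Hypothetical per-coherence correlations between a seed and one voxel
r_values = [0.40, 0.38, 0.37, 0.33, 0.30, 0.24, 0.15]
n_trials = [160] * 7  # hypothetical number of trials per coherence level

z = connectivity_trend_z(r_values, n_trials, WEIGHTS)
print(f"Z for the linear trend in connectivity: {z:.2f}")
```

A negative Z under this contrast indicates that coupling with the seed region weakens as motion coherence increases.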
To better understand the neural mechanisms underlying a simple and well-studied perceptual decision-making task, we acquired fMRI data from six subjects performing a dot-motion directional-discrimination task. As described in methods and in Fig. 1, stimuli were randomized across seven different motion coherence levels (0, 2, 4, 8, 16, 32, and 64%) and across two response directions (left and right) and displayed in a jittered, event-related design for 2,500 ms each. Highly trained subjects were instructed to press one of two response buttons to indicate whether the motion was leftward or rightward. No performance feedback was provided during the scanning session. Subjects completed a total of 20 scanning runs, each consisting of 70 trials, for a total of 1,400 trials divided evenly across the seven coherence levels.
Behavioral data for all subjects can be seen in Fig. 2. In keeping with previous work in both humans and macaques, accuracy improved and reaction time declined as motion coherence increased. Across the group of subjects, accuracy varied somewhat for the most difficult conditions (2% motion coherence: mean proportion correct = 0.63, min = 0.60, max = 0.70), but was significantly greater than chance for all nonzero motion coherences.
Subject behavioral data were fit to a modification of Ratcliff's diffusion model (Ratcliff and McKoon 2008) described by Palmer and colleagues (2005) and presented in Fig. 3A. Depicted in gray is a hypothetical sample path of accumulating evidence taken from a single “trial” of the diffusion model. Starting at zero (indicating no bias for either decision 1, represented by threshold T, or decision 2, represented by threshold −T), the evidence for decision 1 increases with some noise around a mean velocity vector, shown in black, that represents the so-called drift rate. Once the evidence reaches T, a decision is made. If we assume that the drift rate v is proportional to motion coherence m (v = km, where k is known as the sensitivity), the data can be fit with a simplified diffusion model consisting of the following three parameters: A′, the normalized threshold; k, the sensitivity; and tR, the mean residual time for all nondecision related processes (Palmer et al. 2005). Our data were well described by this “proportional rate” diffusion model. The values of each of the parameters for each of our subjects, as well as the negative log-likelihood Lp of the fits, are shown in Table 2.
fMRI predictions and analysis
The proportional rate diffusion model also permitted us to make predictions about expected BOLD signal change in task-responsive areas. In previous work in the macaque by Shadlen and others (Gold and Shadlen 2007), the diffusion model described not only the performance of the monkey, but also the spiking behavior of single neurons in LIP. With fMRI, however, neither spatial nor temporal resolution is sufficient to acquire such data. Instead, each fMRI voxel records BOLD activity over several cubic millimeters of brain tissue and thus measures not the instantaneous firing rate of a single neuron, but rather the aggregate activity of many thousands of neurons (Rainer et al. 2001) integrated over several seconds and filtered by the hemodynamic response function. Under this assumption, how would the BOLD response in a putative “accumulator” region, such as the IPS, change as a function of motion coherence in the present task?
We made the simple prediction that the BOLD response should be monotonically related to the summed neural activity, as we and others assume (Anderson 2007; James and Gauthier 2006) and as accords with more detailed models of BOLD responses (Buxton et al. 2004). In the intraparietal sulcus (IPS), which contains the presumed homologues of areas such as the macaque lateral intraparietal area (LIP) (Grefkes and Fink 2005), we hypothesized further that neurons should show responses consistent with the summed neural activity predicted by an accumulation process. Under this hypothesis, the neural response for trials of high motion coherence (e.g., 64%) should rise to threshold quickly, owing to the fast accumulation of sensory evidence, whereas for trials of low motion coherence (e.g., 2%) the accumulation process should take longer to reach threshold (Fig. 3B, top). Consequently, the integral under each of the curves (i.e., the summed neural activity, represented by the semitransparent filled areas) is largest for low coherence stimuli and smallest for high coherence stimuli. As a result, the mean BOLD response in the mIPS is predicted to vary inversely with the degree of motion coherence, as demonstrated in the bottom panel of Fig. 3B. It should be emphasized that the direction of this effect is the opposite of that observed in single-unit recordings from area LIP, where instantaneous firing rate, not integrated neural activity, is the relevant measure.
However, such a prediction needs to account for the population response. Why, for example, should the responses of neurons whose activity increased with a particular direction of motion not have been offset by neurons whose activity decreased, leading to a minimal overall response? Macaque neurophysiology demonstrates that responses of LIP neurons show two effects: a modest initial asymmetry in their firing rate responses that favors the preferred over the antipreferred direction (Churchland et al. 2008; Huk and Shadlen 2005; Kiani et al. 2008; Roitman and Shadlen 2002; Shadlen and Newsome 2001) and a more pronounced increase for the preferred direction that is significantly greater just prior to the time of the motor response (Churchland et al. 2008; Huk and Shadlen 2005; Kiani et al. 2008; Roitman and Shadlen 2002; Shadlen and Newsome 2001). For 54 neurons represented in Fig. 7A of Roitman and colleagues, for example, firing rates for the preferred motion stimulus show a larger difference with respect to the prestimulus firing rate of about 35 spikes/s than do the antipreferred firing rates (see also Fig. 9 in Shadlen and Newsome; Fig. 8 in Kiani and colleagues; a single representative neuron in Fig. 5 of Huk and colleagues; and the asymmetry in buildup rates for neurons recorded by Churchland and colleagues in their Fig. 4). Similarly, when measured at and within 200 ms before the time of the motor response (typically, a saccade), these differences are amplified. In Fig. 8 of Kiani and colleagues, for example, firing rates are about 15 spikes/s greater than the prestimulus spike rate of about 30 spikes/s for the preferred direction, compared with a decline of no more than 5 spikes/s for the antipreferred direction (see also Fig. 7C in Roitman and Shadlen; a single neuron in Fig. 5 of Huk and colleagues; and Fig. 5 in Churchland and colleagues). Moreover, these findings are consistent with computational studies (Beck et al. 2008; Wang 2002; Wong 2006), including studies that model the population response of LIP (Beck et al. 2008). Based on these modeling reports, the initial asymmetry may relate to a rectification nonlinearity (i.e., the fact that firing rates can decline to no lower than 0 spikes/s, but can increase by greater amounts), and the latter effect may be due to attractor dynamics that lead to selection of only the favored direction. (The former effect may even be enhanced if prestimulus firing rates have not been primed by a saccade target, as in our study and unlike that in most macaque studies.) Because of limitations in the time resolution of MRI, both responses will be incorporated into the BOLD signal, with a likely more notable effect of the latter. Thus the BOLD response was predicted to be positive for all motion coherences, but to vary in a negative parametric fashion with motion coherence, based on the model shown in Fig. 3A.
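The inverse relationship between coherence and summed activity can be illustrated with a deliberately reduced sketch. It assumes (our simplification, not the model used to generate Fig. 3B) that activity ramps linearly from baseline to a fixed threshold over the mean decision time of the proportional-rate model and returns to baseline immediately afterward, so that the summed activity is the area of a triangle. The parameter values are hypothetical.

```python
import math

def mean_decision_time(coherence, k, a_norm):
    """Mean decision time (s) of the proportional-rate diffusion model."""
    x = abs(coherence)
    if x == 0.0:
        return a_norm ** 2  # limit as coherence -> 0
    return (a_norm / (k * x)) * math.tanh(a_norm * k * x)

def integrated_ramp_activity(coherence, k, a_norm, threshold=1.0):
    """Area under a linear ramp from 0 to `threshold` lasting the mean decision time."""
    return 0.5 * threshold * mean_decision_time(coherence, k, a_norm)

# Hypothetical sensitivity and normalized bound (assumptions for illustration).
K, A_NORM = 10.0, 0.9
COHERENCES = [0.0, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64]

areas = [integrated_ramp_activity(c, K, A_NORM) for c in COHERENCES]
for c, area in zip(COHERENCES, areas):
    print(f"coherence={c:4.2f}  summed activity={area:.4f} (arbitrary units)")
```

Because the peak is held fixed while the ramp duration shrinks, the summed activity falls monotonically with coherence even though the instantaneous slope of the ramp increases.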
We next predicted that areas involved in the sensory analysis of motion stimuli, but not evidence accumulation per se, would show a similar pattern—but for different reasons. For human area MT+, whose macaque homologue provides inputs to LIP (Shipp and Zeki 1995), primate neurophysiology offers some predictions. Based on data reported by Britten and colleagues (1993) and discussed by Niwa and Ditterich (2008), we modeled neurons in MT+ representing 360° of motion direction preference, where the expected normalized firing rate response is demonstrated for a leftward motion stimulus for coherences ranging from 0 to 100% (Fig. 3B, top right). At 0% motion coherence, all cells fire weakly in response to the pure noise stimulus. At 100% motion coherence, there is a gradient of responses, with high firing rates for those few cells tuned to directions near 180°.
For parameters (see methods) taken directly from a previous fMRI study of BOLD responses to motion (Rees et al. 2000), this pattern of responses gives rise to the expected integrated response shown in Fig. 3B (bottom right panel). We make the assumption that the voxel measurements from fMRI for area MT+ are very likely to represent a pooled response of neurons tuned to many motion directions. Consequently, the individual neural responses at low coherence will be low, although many neurons will be active. At high coherence, the maximal neural response will be high, but responses will be limited to a smaller number of cells, proportional to the tuning bandwidth of the population of MT+ neurons. Just as for IPS, these parameters predict that the BOLD response should decrease with increasing coherence; but contrary to the sigmoidal relationship predicted for mIPS, the shape of the predicted MT+ curve is convex (representing the underlying linear relationship with firing rate, as plotted in logarithmic coordinates).
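The pooling argument can be made concrete with a toy population. The sketch below is not the model described in methods: it assumes Gaussian direction tuning, a firing rate that changes linearly with coherence (steeper for preferred than for antipreferred directions), and equal weighting of all cells within a voxel. Every numerical value is a hypothetical placeholder.

```python
import math

def tuning_weight(pref_deg, stim_deg, sigma_deg):
    """Gaussian weight (0..1) on the circular distance between preferred and stimulus direction."""
    diff = abs(pref_deg - stim_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return math.exp(-0.5 * (diff / sigma_deg) ** 2)

def firing_rate(pref_deg, coherence, stim_deg=180.0, sigma_deg=20.0,
                baseline=20.0, gain_pref=40.0, gain_null=10.0):
    """Rate (spikes/s) that is linear in coherence; all parameter values are assumptions."""
    w = tuning_weight(pref_deg, stim_deg, sigma_deg)
    rate = baseline + coherence * (gain_pref * w - gain_null * (1.0 - w))
    return max(rate, 0.0)  # firing rates cannot fall below zero

def pooled_response(coherence, sigma_deg=20.0, step_deg=10):
    """Mean rate over cells whose preferred directions tile 360 degrees."""
    prefs = range(0, 360, step_deg)
    return sum(firing_rate(p, coherence, sigma_deg=sigma_deg) for p in prefs) / len(prefs)

for c in (0.0, 0.16, 0.64, 1.0):
    print(f"coherence={c:4.2f}  preferred cell={firing_rate(180.0, c):5.1f}  "
          f"pooled (narrow)={pooled_response(c):5.2f}  "
          f"pooled (broad)={pooled_response(c, sigma_deg=60.0):5.2f}")
```

With narrow tuning the few strongly driven cells are outweighed by the many weakly suppressed ones, so the pooled response declines with coherence; broadening the tuning curve reverses the sign of the effect, which is why the prediction depends on the assumed tuning width.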
This model of MT+ is of course quite reduced. Although based on previously published parameters and models, its behavior simply reflects the fact that the spatiotemporally pooled neural activity decreases as the motion coherence increases. Other factors not accounted for here could alter the behavior of the model. For example, if the total neural activity in MT+ were homeostatically constrained to a particular level, any effect of spatial pooling would be eliminated, thereby leaving changes in neural activity dependent on differences in the duration of neural firing with changes in motion coherence. Nonlinear changes in neural activity with response magnitude, such as those due to surround suppression (Huang et al. 2008; Rust et al. 2006), could likewise have notable effects not modeled here, although the (linear) estimates were empirically derived. Increases in the width of the population tuning curve taken from Rees and colleagues (2000) would also produce more positive population responses; we note that Britten and Newsome (1998) found substantially larger tuning widths that would lead our model to produce primarily monotonically increasing responses. We opted for the simple model based on linear estimates demonstrated by others to describe MT responses, but we acknowledge that further detail based on the neurophysiology of human MT+ might alter it.
Correlations of BOLD activity with motion coherence
To test for correlation between dot-motion coherence and the BOLD response, we performed a linear trend analysis on the estimated regression coefficients for each of the seven coherence levels (0, 2, 4, 8, 16, 32, and 64%). Because we also predicted positive responses in all of the regions due to the simple presence of the visual stimulus, we inclusively masked the result of the trend analysis with the main effect of task to reveal only those regions demonstrating a correlation with coherence and a positive BOLD response to the visual motion stimulus. As predicted, along the length of the IPS (divided into posterior, medial, and anterior regions, denoted pIPS, mIPS, and aIPS, based on Stark and Zohary 2008), the BOLD response was seen to increase with decreasing motion coherence. In MT+ and several frontal regions, including the frontal eye fields (FEFs), the middle frontal gyrus (MFG), the anterior cingulate cortex/supplementary motor area (ACC/SMA), and anterior insula (aINS), a similar inverse variation was noted. A full listing of areas showing a negative correlation with motion coherence is presented in Table 3 and a surface rendering of the group average (not itself used to generate ROIs; see methods) is shown in Fig. 4. [The locations of these areas are strongly consistent with previous results. The peak of the MT+ activation, for example, lies within an average of 10.2 mm of previous MT+ coordinates (Luks and Simpson 2004; Sunaert et al. 1999; Tootell et al. 1995; Zeki et al. 1991). Similarly, the anterior, middle, and posterior IPS activations lie within an average of 13.8, 17.6, and 20.6 mm of independently determined coordinates summarized in Culham et al. (2006) and Stark and Zohary (2008).]
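For readers unfamiliar with trend contrasts, the idea can be sketched as follows. The block assumes zero-mean, equally spaced contrast weights over the seven ordered coherence levels; the weights actually used are specified in methods and may differ. The beta values are invented solely to show the sign convention.

```python
def linear_trend_weights(n_levels):
    """Zero-mean, equally spaced contrast weights, e.g., -3..3 for seven levels (an assumption)."""
    center = (n_levels - 1) / 2.0
    return [i - center for i in range(n_levels)]

def trend_score(betas):
    """Weighted sum of betas; negative when the response falls as coherence rises."""
    weights = linear_trend_weights(len(betas))
    return sum(w * b for w, b in zip(weights, betas))

# Invented betas (percent signal change), ordered from 0% to 64% coherence.
example_betas = [0.62, 0.60, 0.57, 0.52, 0.45, 0.36, 0.30]
print(linear_trend_weights(7))
print(round(trend_score(example_betas), 3))
```

A voxel would survive the inclusive mask described above only if its trend score differed reliably from zero and its overall task response was positive.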
Although many other regions showed a positive correlation with coherence (see Supplemental Figs. S1 and S2), such as the precuneus and lateral parietal cortex, none of these regions showed both a positive main effect of task and an increasing effect of coherence. As first described by Tosoni and colleagues (2008), a region previously found to vary directly with the sensory evidence for the decision—the superior frontal sulcus (Heekeren et al. 2004, 2006)—did not show a positive main effect of task in the current study.
Regional differences in peak and magnitude of BOLD response
A follow-up region of interest (ROI) analysis was performed to examine the precise pattern of effects observed across the seven levels of coherence. Five main ROIs—the occipital pole (OPOLE), MT+, the middle IPS, MFG, and M1 (Fig. 5A)—were chosen on the basis of their presumed contribution to different aspects of the decision task along the continuum running from sensory analysis to decision. Our evaluation of middle IPS was based on a tool-grasping study by Stark and Zohary (2008), suggesting that posterior IPS is more involved in the representation of visual location of the tool, whereas more anterior and middle regions represent the contralateral hand. In the current study, responses were made via button press (although others have used different response modalities; Heekeren et al. 2006; Tosoni et al. 2008). Three of these five ROIs were derived from a linear trend analysis (MT+, mIPS, MFG; see methods), whereas OPOLE was chosen as a low-level sensory control based on the main effect analysis because it did not show a significant main effect of coherence and M1 was selected as a low-level motor control. (All ROIs, here and elsewhere in this study, were derived from independent data; see methods.) As illustrated by the contrast shown in Fig. 4 and as plotted in Fig. 5B, MFG, mIPS, and MT+ all show a significant negative parametric effect of motion coherence.
Another implication of the changes in response profile across brain areas concerns changes in timing, as opposed to magnitude, of the BOLD response. Those regions more closely tied to the response should show greater changes in the time-to-peak of the BOLD response than should those areas more tightly linked to sensory processing. The intuition behind this prediction arises from the nature of the task: because the stimulus is present for 2,500 ms, but a subject's responses can occur at any time within that window, activity in more action-oriented brain regions should better correlate with response timing. On the other hand, activity in more stimulus-oriented brain regions should better reflect the constant 2,500-ms stimulus duration, but vary in magnitude as a function of stimulus strength—i.e., motion coherence.
This gradient in the degree of the temporal shift in the BOLD peak is evident in the data. In Fig. 5C the relationship between the magnitude and peak of the BOLD response is plotted for each of the five ROIs across all seven levels of motion coherence. It can be seen from this plot that the ROIs differ in the degree of variance seen in the horizontal (time to peak) and vertical (BOLD magnitude) dimensions, respectively. To quantify the relative variance along these two dimensions, we computed the ratio between the standard deviation (SD) of the group-averaged time to peak and the SD of the group-averaged magnitude (denoted “P/H” in the following text). To account for the different scales (seconds vs. percentage change) of the two variables, we first normalized the time-to-peak and magnitude values before computing the ratio.
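The P/H computation can be sketched as below. The normalization step is not specified in this section, so the block assumes one simple possibility: each variable is divided by its grand mean across all ROIs and coherence levels, which removes the units while preserving differences in spread between ROIs. The ROI names and values are invented.

```python
import statistics

def ph_ratios(time_to_peak, magnitude):
    """Return {roi: SD(normalized time to peak) / SD(normalized magnitude)}.

    Both arguments map an ROI name to its group-averaged values, one per
    coherence level. Grand-mean normalization is an assumption of this sketch.
    """
    mean_t = statistics.mean(v for vals in time_to_peak.values() for v in vals)
    mean_m = statistics.mean(v for vals in magnitude.values() for v in vals)
    ratios = {}
    for roi in time_to_peak:
        sd_t = statistics.stdev([v / mean_t for v in time_to_peak[roi]])
        sd_m = statistics.stdev([v / mean_m for v in magnitude[roi]])
        ratios[roi] = sd_t / sd_m
    return ratios

# Invented values for two hypothetical ROIs (seconds and percent signal change).
toy_peaks = {"timing_dominated": [5.0, 5.3, 5.6, 5.9, 6.2, 6.5, 6.8],
             "magnitude_dominated": [5.0, 5.05, 5.0, 5.05, 5.0, 5.05, 5.0]}
toy_mags = {"timing_dominated": [0.50, 0.51, 0.50, 0.51, 0.50, 0.51, 0.50],
            "magnitude_dominated": [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]}
print(ph_ratios(toy_peaks, toy_mags))
```

Ratios above 1 indicate that coherence mainly shifts the timing of the response, whereas ratios below 1 indicate that it mainly changes the amplitude.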
In MFG, most of the variance in BOLD activity is present in the horizontal dimension, representing the time to peak (P/H = 3.59). M1 and mIPS exhibit variance that is slightly more apparent in time to peak (M1: P/H = 1.68; mIPS: P/H = 1.38). In MT+ most of the variance is concentrated in the vertical dimension, representing BOLD magnitude (P/H = 0.54). In OPOLE, there is little variance in either time to peak or magnitude (P/H = 0.93). The corresponding average time courses across each level of coherence and each of the five ROIs are displayed in Fig. 5D (see Supplemental Fig. S3 for SEs of these estimates). In MFG the peak of the BOLD response is shifted later in time as motion coherence decreases from 64 to 0% (peak shift from 64% → 0% = 1.91 s, P < 0.04). This effect is also present in mIPS (peak shift = 1.22 s, P < 0.007) and MT+ (peak shift = 0.53 s, P < 0.02), whereas in OPOLE the effect is negligible (peak shift = 0.08 s).
BOLD correlations with response time
Because the recorded response times (RTs) were highly (negatively) correlated with the level of motion coherence, RTs have little independent explanatory value with respect to changes in BOLD activation across levels of coherence. Nevertheless, changes in RT within a given coherence offer a window into the stochastic variation in the decision-making process. On one extreme (as noted earlier), we might expect that activation in a purely sensory region should be poorly correlated with RT, due to the constant 2,500-ms visual input. At the other extreme, a purely “decision” or action-oriented region should exhibit a level of activity that is much more dependent on the amount of accumulated evidence—which covaries with RT—assuming that once a decision has been made, decision-related activity in that region declines. (This dependence may be stronger for longer reaction times; see Ratcliff et al. 2009.) Finally, a region that is involved in the decision-making process itself, but is also responsive to variations in sensory input such as motion coherence, should show a response that varies both with perceptual properties of the input and with variation in RT. For instance, if the mIPS acts as a sensory evidence accumulator and variations in RT are correlated with changes in the drift rate of the accumulator process, then longer RTs—because they reflect a slower accumulation of evidence and therefore more integrated neural activity—should be associated with a greater degree of activation in mIPS. Notwithstanding the predicted relationship between RT and BOLD activity in a sensory accumulator, this correlation should coexist with changes in activity that are purely attributed to motion coherence and thus independent of RT. 
This result follows from the model prediction that accumulated neural activity in mIPS is determined by the drift rate of the diffusion process, which is directly related to stimulus coherence (although it is only one of many sources of RT variation; Ratcliff et al. 2009).
To examine changes in RT independently of changes in motion coherence, we performed separate robust linear regressions for each level of motion coherence in which RT served as the independent variable and BOLD activity (estimated as trialwise beta coefficients) represented the dependent variable. Each of these seven regression fits yielded an estimate of the slope, reflecting the change in activation as a function of RT, and an intercept, which provided an estimate of BOLD activity when RT is zero. If a region is exclusively “driven” by RT, for example, the slopes for each level of motion coherence should be nonzero and constant, and there should be no difference in the intercept value across coherences, provided that the RT slopes do not differ as a function of motion coherence. Alternatively, if RT does not fully explain the effect of motion coherence on activation, the estimate of the amount of BOLD activation at RT of zero should demonstrate a negative parametric effect of motion coherence and slopes should be close to zero.
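The per-coherence regression can be sketched as follows. The specific robust estimator is not named in this section, so the block substitutes the Theil-Sen estimator (the median of all pairwise slopes), which is one standard outlier-resistant choice and not necessarily the method used here. The trial values are invented.

```python
import statistics
from itertools import combinations

def theil_sen(x, y):
    """Outlier-resistant line fit: median pairwise slope, then median residual intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    slope = statistics.median(slopes)
    intercept = statistics.median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

def fit_by_coherence(trials):
    """trials: iterable of (coherence, rt_seconds, beta). Returns {coherence: (slope, intercept)}."""
    grouped = {}
    for coherence, rt, beta in trials:
        rts, betas = grouped.setdefault(coherence, ([], []))
        rts.append(rt)
        betas.append(beta)
    return {coh: theil_sen(rts, betas) for coh, (rts, betas) in grouped.items()}

# Invented trials at one coherence: beta = 2 + 0.5 * RT, plus a single outlier.
toy_rts = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
toy_trials = [(0.04, rt, 2.0 + 0.5 * rt + (5.0 if i == 3 else 0.0))
              for i, rt in enumerate(toy_rts)]
print(fit_by_coherence(toy_trials))
```

The slope estimates the change in BOLD activity per second of RT within a coherence level, and the intercept is the extrapolated activity at RT = 0; comparing intercepts across coherence levels then isolates the coherence effect that RT does not explain.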
In Fig. 6A we show group means of the parametric effect in the intercept (i.e., matched at RT = 0 ms; left side) and the average within-coherence slope (right side) for the 11 ROIs (as well as OPOLE and M1) that showed a negative correlation with motion coherence in the standard analysis. Signatures of primarily sensory, integrative, and motor areas can be readily identified. Regions in occipital cortex, with the exception of OPOLE, show large coherence effects for the intercept estimates paired with weak and nonsignificant RT slope effects. In the IPS, on the other hand, there is a statistically significant parametric trend in the intercept estimates, as well as a trend in RT slope for mIPS; that is, activity in the three IPS regions reflects a property of the stimulus (motion coherence), and activity in mIPS additionally shows a trend related to the temporal factors associated with the behavioral response. Further anterior, the intercept effects in MFG, ACC/SMA, and aINS are relatively small and nonsignificant (P > 0.05) when the linear effect of coherence is estimated after extrapolating the effect at an RT of zero. [These parametric intercept effects for MT+, mIPS, and FEF are not particularly sensitive to our extrapolation of the effect at RT = 0 (rather than, for example, 1,000 ms), indicating that the responsiveness to motion coherence is consistent across RT; see Supplemental Table S1.] However, these areas show a large and significant relationship between RT and activation level (Fig. 6A, right plot). Finally, to characterize the differences between areas, we pooled the areas anatomically, dividing them into frontal, parietal, and occipitotemporal regions. There was a significant difference across these three regions for slope [main effect of region for slope: F(2,64) = 10.85, P < 0.0001], with post hoc t-tests showing significant frontal–occipital (t = 4.84, P < 0.0001) and parietal–occipital (t = 2.78, P = 0.007) differences. 
There was a trend for the main effect of region for the intercept estimate [F(2,64) = 2.8259, P = 0.0667]. Post hoc t-tests revealed a significant difference only between frontal and parietal regions (frontal–parietal difference: t = 2.33, P = 0.023).
Decision accuracy and BOLD activation
Another prediction, derived not from neurophysiology but from the diffusion model, pertains to expected differences in BOLD activity on correct versus incorrect trials when speed is not emphasized. Intuitively, for a given motion coherence, trials with a high drift rate are associated with both more accurate and more rapid responses. In contrast, trials with slower drift rates generally represent weaker evidence for the correct response and are thus more likely to result in errors (Ratcliff and McKoon 2008). Thus, among a set of trials that are “equally difficult” from the experimenter's perspective (e.g., 4% motion coherence), those trials in which the subject commits an error are on average more likely to have occurred for relatively slower drift rates, and therefore to be correlated with greater BOLD responses. It follows that an evidence accumulator should show greater activity on incorrect than on correct trials. Because greater responses in error trials could result from other processes, such as error monitoring (although previous work suggests that error monitoring may be mediated by distinct regions such as the medial frontal gyrus and ventral/dorsal ACC; Wheeler et al. 2008), no feedback was given during the task to minimize the possibility of contributions from these other processes.
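The reasoning above can be checked with a short calculation. The sketch assumes an unbiased diffusion process with unit diffusion coefficient and symmetric bounds, for which the error probability and mean decision time have closed forms and, for a fixed drift rate, correct and error responses have the same decision-time distribution. Trial-to-trial variation in drift is represented by a small set of equally likely drift rates. The drift values and the bound are hypothetical.

```python
import math

def p_error(drift, bound):
    """Probability of reaching the wrong bound (unit diffusion, start at 0, bounds at +/- bound)."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * bound))

def mean_decision_time(drift, bound):
    """Mean decision time; identical for correct and error trials at a fixed drift."""
    if drift == 0.0:
        return bound ** 2
    return (bound / drift) * math.tanh(drift * bound)

def decision_time_by_outcome(drifts, bound):
    """Mean decision time for (correct, error) trials when drift varies across trials."""
    w_err = [p_error(v, bound) for v in drifts]
    w_cor = [1.0 - w for w in w_err]
    times = [mean_decision_time(v, bound) for v in drifts]
    dt_cor = sum(w * t for w, t in zip(w_cor, times)) / sum(w_cor)
    dt_err = sum(w * t for w, t in zip(w_err, times)) / sum(w_err)
    return dt_cor, dt_err

# Hypothetical spread of drift rates within one nominal coherence level.
toy_drifts = [0.2, 0.6, 1.0, 1.4]
dt_correct, dt_error = decision_time_by_outcome(toy_drifts, bound=1.0)
print(f"correct: {dt_correct:.3f} s   error: {dt_error:.3f} s")
```

Errors are drawn disproportionately from low-drift trials, which also take longest to reach a bound, so the mean decision time (and hence the integrated activity of an accumulator) is larger for errors. With a single fixed drift rate the two means coincide, which shows that the prediction depends on across-trial variability in drift.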
Behaviorally, these predictions held true for those motion coherences (2, 4, 8%) at which enough errors were committed. These differences were significant except for 2% motion coherence, which showed a trend in that direction [2%: correct 1.696 s, incorrect 1.750 s (T = 2.04, P = 0.096); 4%: correct 1.622 s, incorrect 1.756 s (T = 7.33, P = 0.00074); 8%: correct 1.478 s, incorrect 1.800 s (T = 5.92, P = 0.0019)]. Similarly, as can be seen in Fig. 6B, BOLD responses were indeed larger for incorrect than correct trials in the mIPS and the frontal ROIs: aINS, SMA, MFG, and FEF, excluding PMC. Of note, these findings in mIPS closely parallel those of Wheeler and colleagues (2008); their accuracy-sensitive mIPS region (Talairach coordinates −26 −68 38) is separated by about 8 mm from the region reported here. None of the earlier visual areas, including MT+, showed significant differences for correct and incorrect trials.
Functional connectivity between IPS and MT+
To investigate whether the aforementioned regions might form an interacting network, we computed the voxel-by-voxel correlation between MT+ and the rest of the brain. We considered two cases: correlations irrespective of motion coherence level, and correlations that varied parametrically with changes in motion coherence. In keeping with our predictions, the former analysis demonstrated that the trialwise BOLD estimates in MT+ were correlated with responses in a number of areas including MFG, aINS, and IPS. Intriguingly, the latter analysis revealed that correlations between MT+ and only one other region—the border between the anterior and medial IPS—varied significantly (and inversely) with motion coherence (Fig. 7). (This region also showed a significant increase in BOLD response to incorrect over correct trials.) A second, more lateral and inferior parietal area (not illustrated) showed correlations that varied directly with increasing motion coherence; however, the BOLD time course for this region deactivated with each stimulus presentation (Singh and Fawcett 2008) and we thus suspect that it is less likely to participate in the perceptual decision.
In this study of perceptual decision making, we identified a network of areas whose activity verified the predictions of simple physiologically and psychologically based models. Using a motion-coherence task, we were able to take advantage of quantitative predictions for responses throughout a perceptual decision-making circuit for the first time, including both specific sensory input and integrative areas. Moreover, task-active brain regions, characterized by BOLD activity that varied inversely with motion coherence, could be organized into a functional gradient along a sensorimotor continuum according to their sensitivity to stimulus strength, time to peak of the BOLD response, correlation with response times, and level of activity during error trials. Although consistent with other research on both macaques and humans (Gold and Shadlen 2007; Heekeren et al. 2008; Ploran et al. 2007; Schall 2003), these data expand our understanding of how parametric changes in perceptual discriminability, as well as trial-to-trial variation in accuracy and RT, jointly define the activity patterns in multiple regions that constitute the human visual decision-making network.
A growing body of work on perceptual decision making in humans complements the work in macaques and suggests that more anterior regions are significant components of the decision network. In a magnetoencephalographic study of a motion-detection task, Donner and colleagues (2007) noted that responses in the beta (12–24 Hz) frequency range were more active before correct than incorrect responses in areas located in the dorsolateral prefrontal and posterior parietal cortices. In an fMRI study of a fear–disgust discrimination, Thielscher and Pessoa (2007) found that trial-by-trial fluctuations in activity in anterior insula, but also in middle frontal gyrus and anterior cingulate cortex, correlated with reaction times for decisions in which the sensory input—a facial expression—was neutral/identical. They also identified the importance of anterior insula and anterior cingulate in tracking reaction times related to the degree of fear or disgust present in the face stimuli. Ploran and colleagues (2007) took an innovative approach based on hierarchical cluster analysis to segregate 32 regions of interest into sensory, accumulator, and recognition areas that appear to follow a similar posterior-to-anterior gradient as we observe here (e.g., their Fig. 5). Their results are quite complementary to ours, in that the prolonged time course of their stimulus presentation (images were gradually revealed over 14 s) permitted them to classify regions based on time courses but not to correlate BOLD data for individual trials with reaction times.
One aspect of our results—our finding of a model-based inverse relationship between task-active areas and predictions based on motion coherence—is consistent with hypotheses developed in other paradigms, in that “the BOLD response [in the context of the accumulator model] is considered proportional to the cumulative underlying neural activity” (James and Gauthier 2006). However, at first glance this finding appears at odds with past fMRI results in studies most directly similar to ours. In an fMRI study of motion processing in early visual regions, Rees and colleagues (2000) demonstrated that MT+ showed a positive linear relationship with motion coherence. In addition to the fact that two nonfoveal fields of dots were presented, including one that was irrelevant to the task, their stimulus durations were quite short (250 ms) relative to ours (2,500 ms), potentially favoring more transient responses. For such short time durations, for example, it is possible that accumulation in higher-level areas was captured in a subthreshold state for all motion coherences, in which case higher motion coherences would be expected to be closer to threshold, and thus with a higher integrated response, than lower coherences—a finding quite in keeping with our results. However, such responses were not reported in the parietal regions, but in MT+ and other occipital areas.
Heekeren and colleagues (2006), in evaluating a dot-motion–based decision task for two motion-coherence levels (12.8 and 51.2%), briefly noted the presence of nonmonotonic responses to additional motion-coherence levels in MT+. A methodological difference may relate to our decision to repeatedly image six subjects: it is possible that we were able to reliably identify the same parametric effect of coherence in all six subjects for several areas, including IPS and MT+, rather than having to rely solely on group-averaged estimates from single scanning sessions for eight subjects, as Heekeren and colleagues did. This additional power allowed us not only to identify additional stimulus-responsive areas, but also to organize active regions into a functional gradient. Heekeren et al. (2006), for example, demonstrated that some (such as FEF and the medial frontal cortex) but not all (such as the MFG and anterior insula) of the decision-related areas identified here did show greater activity for 12.8% motion coherence than for 51.2%.
More generally, consideration should be given to the classes of neuronal responses in mIPS that could potentially produce our BOLD findings: a positive BOLD signal response for each motion coherence level, but a negative parametric variation in this signal related to an increasing and more durable response to progressively lower motion coherences. A number of potential variables exist: the relative contributions of excitatory and inhibitory neuronal metabolic activity to the positive BOLD signal (Arthurs and Boniface 2002; Sotero and Trujillo-Barreto 2007); the balance of increases in neuronal firing rate for cells tuned to the motion stimulus with decreases for neurons tuned to off-preferred directions; neuronal interactions influencing the shape of the population tuning curve to direction of motion across IPS (Beck et al. 2008); the relative contributions of different phases of processing to the BOLD signal (e.g., evidence integration, motor preparation); the influence of nonlinearities (e.g., rectification nonlinearities, synaptic depression) on firing rates (Kayser et al. 2001); the effect of top-down (e.g., attentional) influences on firing rates across time (Treue and Maunsell 1996, 1999); and nonneural (e.g., hemodynamic) effects on the BOLD response related to nonlinearities in the transform from neural activity to blood flow (Buxton et al. 2004), among others. With respect to neuronal firing, the excellent data available for single-cell responses leave open primarily the question of population-level responses. Assuming the single-cell responses to preferred and antipreferred motion directions found in IPS, our data could hypothetically be consistent with a zero-sum excitatory response, but greater inhibitory neuronal activity for lower motion coherences due to suppression of a larger number of motion directions within the stimulus (translating to a larger BOLD signal at lower motion coherences secondary to increased metabolic activity). 
Alternatively, the same effect could be achieved if the width of the population tuning curve within IPS were sharper for higher motion coherences, leading to lower total responses for higher motion coherences. Based on the current data, a parsimonious explanation requiring only greater integration time for progressively lower motion coherences, as well as greater preferred than nonpreferred firing rates, would nonetheless be greatly aided by finer knowledge of the population response within IPS.
These considerations also pertain to the etiology of the negative parametric effect of motion coherence in MT+, which does not show activity related to evidence accumulation in macaques (Gold and Shadlen 2007). Different explanations would invoke bottom-up, top-down, or a combination of effects. One bottom-up explanation, for example, would posit that at low motion coherence values, MT neurons are minimally active. However, because neurons of all motion directions are pooled within the BOLD response (due to limitations in spatial resolution of the BOLD signal), the summed activity may be larger in this case than for high motion coherences, in which a significantly smaller fraction of neurons is highly active. Additionally, other mechanisms, such as synaptic depression, might differentially decrease responses to stronger perceptual stimuli. Alternatively, top-down signals that enhance the BOLD response in MT might be greater for low motion coherence stimuli than for high motion coherence stimuli, thereby altering the population tuning curve (Scolari and Serences 2009; Serences et al. 2009). Future experiments in which requirements for top-down control of MT representations are varied (e.g., through attentional manipulations) would be informative: if the bottom-up hypothesis holds true, for example, such manipulations would have no effect on the parametric MT response.
Time on task and difficulty
A significant question for these results concerns the roles of other processes, such as time on task and difficulty. Time-on-task arguments would posit that the explanation for these findings relates simply to the duration between stimulus onset and response—in other words, that BOLD responses to lower motion coherence are different solely because these trials are generally correlated with longer reaction times. One caution is that increases in response time as motion coherence decreases are a predicted consequence of the slower rate of information accrual in the diffusion process—and thus “time on task” is not a nuisance factor to be explained away, but rather a fundamental measure of the temporal evolution of the decision-making process. Nevertheless, in both MT+ and IPS, even after correcting for variation in response times, the parametric effect of motion coherence persisted (Fig. 6).
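The claim that longer response times follow directly from slower evidence accrual can be illustrated with a toy simulation. The sketch below is not the analysis used in this study. It assumes a symmetric two-bound diffusion with unit noise and a drift rate proportional to motion coherence (drift = k × coherence). The values of k, the bound, the time step, and the coherence levels are hypothetical choices made only for illustration.

```python
import random

def simulate_trial(drift, bound=1.0, dt=0.001, noise_sd=1.0, rng=random, max_t=10.0):
    """Simulate one diffusion trial between bounds at +bound and -bound.

    Returns (decision_time_in_seconds, reached_upper_bound).
    A trial that hits neither bound before max_t is cut off and counted
    as not having reached the upper bound.
    """
    x, t = 0.0, 0.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with sqrt(dt)
    while abs(x) < bound and t < max_t:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return t, x >= bound

def mean_decision_time(coherence, k=10.0, n_trials=200, seed=0):
    """Average decision time over simulated trials (k is a hypothetical gain)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    times = [simulate_trial(k * coherence, rng=rng)[0] for _ in range(n_trials)]
    return sum(times) / len(times)

if __name__ == "__main__":
    for coherence in (0.032, 0.128, 0.512):  # hypothetical coherence levels
        print(f"coherence {coherence:.3f}: mean decision time "
              f"{mean_decision_time(coherence):.3f} s")
```

In this toy setting the mean decision time falls as coherence rises, with no separate "time on task" mechanism added. This is the sense in which response time is a product of the accumulation process rather than a nuisance variable.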
A corollary of the time-on-task argument is that regions that vary with reaction time, but not with the sensory input, might be related to nonspecific “waiting” processes irrelevant to the perceptual decision itself. In both the MFG and the ACC/SMA, for example, activity might yet be characterized as a “waiting” process inasmuch as these regions showed only very small effects of motion coherence after response time was taken into account. Although one cannot rule out the possibility that these regions are idle bystanders in the decision-making process, it seems far more likely that they are operating at a level sufficiently removed from the analysis of sensory features that these factors have little impact on the levels of neural activity (Binder et al. 2004; Thielscher and Pessoa 2007). Alternatively, there is no reason to believe that accumulator behavior cannot occur in regions outside the IPS (Schall 2003). Lesion studies, perhaps using transcranial magnetic stimulation, could provide additional evidence for the role(s) that these areas play in the task.
Task difficulty is yet another explanation that could be advanced for these findings. Of course, difficulty can serve as a proxy for a number of related concepts of varying specificity, including but not limited to perceptual salience, confidence, the need for cognitive control, and “mental effort” (Grinband et al. 2006); moreover, care should be taken to specify which processes might be intended. It is certainly true that trials with lower motion coherence were more “difficult” in terms of accuracy. Regardless of the definition, however, the diffusion model offers a characterization of the decision process that operationalizes “difficulty” as emerging from the interaction of a small number of parameters, most important of which is the drift rate of the accumulation process.
One might also characterize many of the effects of variations in motion coherence as arising primarily from an increase in attentional deployment—either top-down or bottom-up—when the signal-to-noise ratio of the stimulus decreases. Our task design attempted to minimize bottom-up influences: every stimulus remained on the screen for the same period of time and subjects were required to maintain fixation without blinking throughout. Consequently, bottom-up visual attentional mechanisms would not clearly differentiate the various motion-coherence conditions. Nonetheless, to address the possibility that random fluctuation in the background motion noise might be important to behavior, we measured the actual motion coherence for 60 trials at each of the seven nominal motion coherence values and then correlated those values with response time for two subjects (101 and 103). Across the 2,500 ms of each motion-coherence presentation, the empirically derived average SD across motion coherence was 1.4%. Within each defined motion-coherence level, each subject showed a small negative correlation in reaction time with actual motion coherence [r = −0.12, P = 0.03 and r = −0.11, P = 0.10 (ns), respectively, by Wilcoxon's signed-rank test]. These data suggest that only 1–1.5% of the variance in reaction time might be explained by variation in the sensory stimulus within a nominal motion-coherence level.
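The step from the reported correlations to the quoted proportion of variance can be made explicit. Assuming a simple linear relationship, the proportion of variance in reaction time accounted for by actual motion coherence is the squared correlation coefficient. The short sketch below applies this to the two r values reported above; the function name is introduced here for illustration only.

```python
def variance_explained(r):
    """Proportion of variance accounted for under a simple linear fit (r squared)."""
    return r ** 2

# Correlations reported in the text for subjects 101 and 103, respectively.
reported = {"subject 101": -0.12, "subject 103": -0.11}

for subject, r in reported.items():
    share = variance_explained(r)
    print(f"{subject}: r = {r:+.2f}, r^2 = {share:.4f} ({share:.1%})")
```

The squared values are 0.0144 and 0.0121, which fall within the 1–1.5% range quoted above.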
It is also certainly possible that “top-down” attentional processes are deployed more for low coherence than for high coherence trials (Buschman and Miller 2007; Gazzaley et al. 2005). One approach used by Thielscher and Pessoa (2007) to control for “overall attentional and/or task difficulty contributions” in decision areas was 1) to correlate RT with trialwise BOLD responses, 2) to evaluate the covariance of RT with respect to BOLD activity in a condition-specific (motion-coherence–specific) manner, 3) to ensure that significant correlations in these parameters were also seen for the null (0% motion coherence) condition, and 4) to search for these changes only for voxels that showed the expected mean parametric effects. These constraints accomplished the following goals (numbered correspondingly): 1) to reduce the possibility that the relationship of RT to each motion coherence level was mediated only by mean RT, which would be indistinguishable from global attention or variation in task difficulty; 2) to greatly decrease the consistent (i.e., constant) within-condition influence of global attentional/difficulty demands by examining second-order (variance-related) rather than first-order (constant, or mean-related) properties of the BOLD response; 3) to control for the presence of motion-related, so-called bottom-up attentional effects by evaluating the condition in which minimal/no coherent motion was present; and 4) to ensure that these effects were not spurious by limiting evaluation to only task-responsive areas. By satisfying each of these conditions, Thielscher and Pessoa (2007) argued that the influence of global attentional effects (and difficulty effects, for that matter) could be minimized. Using this approach, they showed decision-related activity within hypothesized decision-sensitive areas in ACC, MFG, and IFG/anterior insula.
Figure 6A captures these arguments. For the trialwise correlation analysis (criterion 1) in which the covariation of RT and BOLD response was examined (criterion 2) for individual motion coherences across areas showing the expected mean effect (criterion 4), we found, as did Thielscher and Pessoa (2007), that the MFG, aINS, and ACC/SMA (as well as PMC and FEF) showed a significant and consistent correlation between BOLD response and RT within all seven motion coherences that was present even when matched for RT. [Of note, mIPS showed a trend in this direction (P < 0.058).] When we explicitly evaluated RT slope solely for 0% motion coherence (criterion 3), only these same areas (with the exception of aINS) showed slopes significantly greater than zero: PMC (P = 0.047), FEF (P = 0.045), MFG (P = 0.02), and ACC/SMA (P = 0.01), with mIPS again showing a trend (P = 0.065). These findings suggest that global attentional effects alone are not sufficient to explain the parametric variations in BOLD activity in these areas.
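The logic of the within-condition analysis described above can be sketched in a few lines. This is a schematic illustration only and reproduces neither our pipeline nor that of Thielscher and Pessoa (2007). It assumes that each trial has already been reduced to a (coherence, RT, BOLD amplitude) triple for a single region, and it fits an ordinary least-squares slope of BOLD on RT separately within each coherence level, including 0%. The function names and the toy numbers are hypothetical.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (xs must not all be identical)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def within_condition_slopes(trials):
    """Map each coherence level to the slope of BOLD amplitude on RT.

    Fitting within a level removes that level's mean RT and mean BOLD,
    so a constant per-condition difficulty or attention offset cannot
    produce the slope.
    """
    by_condition = defaultdict(list)
    for coherence, rt, bold in trials:
        by_condition[coherence].append((rt, bold))
    return {
        coherence: ols_slope([rt for rt, _ in pairs], [b for _, b in pairs])
        for coherence, pairs in by_condition.items()
    }

# Synthetic triples (coherence, RT in s, BOLD in arbitrary units); not real data.
toy_trials = [
    (0.0, 0.9, 1.10), (0.0, 1.2, 1.40), (0.0, 1.5, 1.70),
    (0.256, 0.5, 0.60), (0.256, 0.6, 0.70), (0.256, 0.7, 0.80),
]

if __name__ == "__main__":
    for coherence, slope in sorted(within_condition_slopes(toy_trials).items()):
        print(f"coherence {coherence:.3f}: BOLD-on-RT slope = {slope:.2f}")
```

In a real analysis, the per-condition slopes would then be tested against zero across sessions or subjects, with the 0% coherence level serving as the check on motion-driven bottom-up effects.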
It might be further suggested that such arguments do not exclude attentional effects within a given level of motion coherence. We attempted to establish a consistent level of attention—i.e., to minimize the influence of variations within a motion-coherence level (e.g., due to fatigue)—through the randomized, counterbalanced distribution of different trials, such that a given motion coherence trial was equally likely to occur during the beginning, middle, and end of a scan session. Moreover, subjects were very successful (see methods) at maintaining fixation, without eyeblinks, for the duration of each stimulus presentation outside the scanner—another mechanism for enforcing consistent attention throughout each trial. Finally, although studies in both macaque (reviewed in Maunsell and Treue 2006; Reynolds and Chelazzi 2004) and human (Beck and Kastner 2009; Silver et al. 2005) suggest that visual responses throughout posterior cortices are modulated by attentional state, we are not aware of evidence to support the notion that such modulatory processes are as exquisitely well tuned to stimulus discriminability as we and others have observed. Nonetheless, extensions of our perceptual decision-making research in which the attentional state is further manipulated will clearly be important, especially insofar as accumulator processes in parietal cortex are influenced by reentrant or top-down modulation from control centers in frontal and/or parietal cortex. It is conceivable, for example, that attentional processes are necessary for the implementation of accumulator mechanisms in the brain, although our data do not address this possibility.
Accumulators and future directions
In keeping with predictions of the diffusion model, we have shown that the behavioral data in our task are well fit by the parametric diffusion model developed by Palmer and colleagues (2005) and that, as this simple model predicts, responses in IPS decrease as the motion coherence increases. One limitation of our current approach is that reaction time is dictated by the subject, permitting us to sample beta values/BOLD activity only for trials in which neural responses have presumably integrated to threshold. By fixing the viewing duration at various lengths, it might be possible to “catch” accumulation at various time points across various motion-coherence values, independently of motion coherence. As mentioned earlier, another important direction would be to explicitly vary the attentional state. A further limitation of the present study is the small number of subjects tested. Our decision to scan a small number of highly trained subjects across multiple days resulted in excellent fits of the diffusion model and robust single-subject activation maps. Moreover, the large number of sessions gave us sufficient power to separately analyze a portion of each subject's data for the purpose of defining ROIs in a statistically independent and unbiased manner. Nevertheless, in future work it will be important to test larger samples of trained subjects that will offer greater statistical power for both whole-brain random-effects statistics and individual differences analyses (e.g., correlations between brain activations and diffusion model parameters, such as drift rate).
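For reference, the behavioral predictions of the diffusion model mentioned at the start of this section can be written compactly. The sketch below assumes the proportional-rate form usually associated with Palmer and colleagues (2005). In that form, accuracy is a logistic function of coherence, and mean response time is a scaled hyperbolic tangent of coherence plus a residual (nondecision) time. The parameter names (k for sensitivity, a for the normalized bound, t_r for residual time) and all numerical values are hypothetical and are not the fitted values from this study.

```python
import math

def predicted_accuracy(coherence, k, a):
    """Probability of a correct choice: 1 / (1 + exp(-2 * a * k * |C|))."""
    return 1.0 / (1.0 + math.exp(-2.0 * a * k * abs(coherence)))

def predicted_mean_rt(coherence, k, a, t_r):
    """Mean response time: (a / (k * |C|)) * tanh(a * k * |C|) + t_r.

    As coherence approaches zero the decision-time term tends to a**2,
    which is handled explicitly to avoid dividing by zero.
    """
    c = abs(coherence)
    if c == 0.0:
        return a * a + t_r
    return (a / (k * c)) * math.tanh(a * k * c) + t_r

if __name__ == "__main__":
    k, a, t_r = 10.0, 1.0, 0.3  # hypothetical parameter values
    for coherence in (0.0, 0.032, 0.128, 0.512):  # hypothetical levels
        print(f"C = {coherence:.3f}: "
              f"P(correct) = {predicted_accuracy(coherence, k, a):.3f}, "
              f"mean RT = {predicted_mean_rt(coherence, k, a, t_r):.3f} s")
```

Under these assumptions, lower coherence yields both lower accuracy and longer mean response time, the behavioral counterpart of the longer integration period discussed above.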
In summary, these data support a simple model in which motion information from primary visual cortex is extracted by MT+ and transformed in the IPS under the control of top-down signals from sources including MFG. Areas that are sensitive to task set (aINS, ACC/SMA; Dosenbach et al. 2006), to uncertainty (aINS; Grinband et al. 2006), and/or to conflict (ACC/SMA; Botvinick 2007) would then reduce the space of possible responses in preparation for action production by motor structures (SMA, premotor cortex). Further modifications of this task in both macaques and humans might work to distinguish both the complex nature of the processing within, and the relationships between, these multimodal areas in the generation of this rapid perceptual decision.
This work was supported by National Institute of Mental Health Grant MH-63901 and start-up funds provided by the State of California to A. S. Kayser.
The online version of this article contains supplemental data.
Copyright © 2010 the American Physiological Society