Sequential sampling models provide a useful framework for understanding human decision making. A key component of these models is an evidence accumulation process in which information is accrued over time to a threshold, at which point a choice is made. Previous neurophysiological studies on perceptual decision making have suggested accumulation occurs only in sensorimotor areas involved in making the action for the choice. Here we investigated the neural correlates of evidence accumulation in the human brain using functional magnetic resonance imaging (fMRI) while manipulating the quality of sensory evidence, the response modality, and the foreknowledge of the response modality. We trained subjects to perform a random dot motion direction discrimination task by either moving their eyes or pressing buttons to make their responses. In addition, they were cued about the response modality either in advance of the stimulus or after a delay. We isolated fMRI responses for perceptual decisions in both independently defined sensorimotor areas and task-defined nonsensorimotor areas. We found neural signatures of evidence accumulation, a higher fMRI response on low coherence trials than high coherence trials, primarily in saccade-related sensorimotor areas (frontal eye field and intraparietal sulcus) and nonsensorimotor areas in anterior insula and inferior frontal sulcus. Critically, such neural signatures did not depend on response modality or foreknowledge. These results help establish human brain areas involved in evidence accumulation and suggest that the neural mechanism for evidence accumulation is not specific to effectors. Instead, the neural system might accumulate evidence for particular stimulus features relevant to a perceptual task.
- decision making
- functional magnetic resonance imaging
- random dot motion
- sequential sampling
Many decisions ranging from simple perceptual and memory decisions (Ratcliff and McKoon 2008) to complex economic decisions (Busemeyer and Townsend 1993) appear to be based on an evidence accumulation process. During this process, agents sequentially sample noisy information and accumulate it as evidence until a threshold is reached, initiating a response. Given the success of sequential sampling models in explaining behavior, it is important to know if and how this evidence accumulation process is instantiated in the brain (Gold and Shadlen 2007). Single-unit recording studies have suggested that sensorimotor areas accumulate evidence in simple perceptual decisions (Hanes and Schall 1996; Kim and Shadlen 1999; Romo et al. 2004; Shadlen and Newsome 2001). These findings support an effector-specific hypothesis where the neural mechanisms responsible for carrying out an action are also responsible for deciding to execute the action (Gold and Shadlen 2007; Shadlen et al. 2008).
However, single-unit recording only provides a spatially restricted, albeit precise, view of the neural circuits underlying evidence accumulation. Functional magnetic resonance imaging (fMRI) can be used to address this limitation and extend these investigations to humans. Several fMRI studies have manipulated the quality of the sensory information (which translates to the drift rate in the sequential sampling model, see Fig. 1) to look for the neural correlates of evidence accumulation but have obtained contradictory results (e.g., Heekeren et al. 2006; Ho et al. 2009; Tosoni et al. 2008). The complication arises because investigators have adopted different criteria in relating changes in fMRI response to changes in neuronal activity. Here we reasoned that if higher drift rate corresponds to a steeper ramp-up in neuronal activity (Shadlen and Newsome 2001), we would expect a smaller fMRI response for these trials relative to low drift rate trials (Fig. 1). In other words, the fMRI signal reflects the total amount of accumulated evidence, not the change in instantaneous firing rate (Ho et al. 2009; Kayser et al. 2010). We used this criterion to examine the effector specificity of evidence accumulation in a simple perceptual decision task: random dot motion (RDM) direction discrimination.
A limitation of previous studies on evidence accumulation processes is that the agent has foreknowledge of which response mode they are going to use during decision formation. A neural system containing effector-specific decision modules would appear to be an efficient solution in this situation. However, sometimes the agent has no foreknowledge of which response they are going to use, much like a person might not know which action to take when they decide to buy a product behind a glass display case at a store. Once a choice is made, the person could express his preference to a clerk via a glance or a gesture. An outstanding issue is how an effector-specific decision mechanism would operate under such a situation. To be sure, an effector-specific decision system could still operate in these conditions. For instance, in the extreme case, each decision module in each effector area would accumulate evidence simultaneously (Shadlen et al. 2008). However, an alternative effector-general mechanism in which evidence accumulates without regard to the response mode would seem more efficient and adaptive for this situation. We manipulated foreknowledge of response mode to examine the generality of the effector-specific hypothesis.
MATERIALS AND METHODS
Nine right-handed subjects (4 female, mean age = 27 yr) participated in the experiment; all had normal or corrected-to-normal vision and reported normal neurological history. Two of the subjects were authors; the rest were graduate and undergraduate students at Michigan State University, all of whom gave written informed consent and were compensated for their participation. The experimental procedures were approved by the Institutional Review Board at Michigan State University and adhered to safety guidelines for MRI research.
Visual Display and Stimuli
Visual stimuli were generated using MGL (http://gru.brain.riken.jp/doku.php/mgl/overview), a set of custom OpenGL libraries running in Matlab (Mathworks, Natick, MA). For training in the psychophysical laboratory, images were presented on a SONY 21-in. CRT monitor (resolution: 1,024 × 768 and refresh rate: 60 Hz). Subjects viewed the display from a distance of 57 cm, with their heads stabilized by a chin rest. In the scanner, images were projected on a rear-projection screen located in the scanner bore by a Toshiba TDP-TW100U projector outfitted with a custom zoom lens (Navitar, Rochester, NY). The screen resolution was set to 1,024 × 768, and the display was updated at 60 Hz. Subjects viewed the screen via an angled mirror attached to the head coil at a viewing distance of 60 cm.
We employed the RDM stimulus, originally developed in neurophysiological studies (Newsome and Pare 1988). The RDM stimulus consisted of moving dots (dot size: 0.1°) in a circular aperture (10°), presented on a dark background (0.01 cd/m²). The dots were plotted in three interleaved sets of equal number, with each set plotted in a single video frame. Three video frames later, a proportion of the dots in a set was plotted with a small displacement of 0.25°, producing apparent motion with a speed of 5°/s, while the rest of the dots were plotted in random locations within the aperture. This yielded an effective dot density of 16.8 dots·deg⁻²·s⁻¹. The proportion of coherently moving dots (motion coherence) is the key stimulus parameter that controls the accumulation rate of sensory evidence and task performance.
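The quoted dot speed follows directly from the displacement and the three-frame replot interval at a 60-Hz refresh rate. The short check below is only a worked arithmetic sketch; the function and constant names are illustrative and do not come from the stimulus code.

```python
# Worked check of the apparent-motion speed quoted in the text.
REFRESH_HZ = 60.0        # display refresh rate (from the text)
FRAME_LAG = 3            # a dot set is replotted three video frames later
DISPLACEMENT_DEG = 0.25  # displacement of coherently moving dots (degrees)


def dot_speed_deg_per_s(displacement_deg, frame_lag, refresh_hz):
    """Speed implied by one displacement every `frame_lag` video frames."""
    interval_s = frame_lag / refresh_hz   # 3 frames at 60 Hz = 50 ms
    return displacement_deg / interval_s


speed = dot_speed_deg_per_s(DISPLACEMENT_DEG, FRAME_LAG, REFRESH_HZ)
print(f"apparent speed: {speed:.2f} deg/s")
```

A 0.25° step every 50 ms corresponds to the 5°/s reported above.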
RDM Direction Discrimination Task: Training
Before the imaging session, each subject trained in the psychophysical laboratory on the RDM direction discrimination task for 4–8 h. The purpose of training was threefold: 1) to familiarize subjects with the task; 2) to stabilize their performance on the task; and 3) to collect a sufficient amount of data to allow model fitting (see below). On each trial, an RDM stimulus was presented for 0.5 s, after which subjects pressed a key on a computer keyboard to indicate the direction of coherent motion. On these training trials, subjects made an immediate response without delay. They pressed the “z” key with the left index finger for leftward motion and the “1” key (on the numeric keypad) with the right index finger for rightward motion (Fig. 2A). We used the method of constant stimuli to measure the motion coherence threshold. Subjects completed six or seven levels of coherence. The number of levels and the coherence values were adjusted for individual subjects to include the dynamic range of their psychometric functions. Overall, the range of motion coherence used across subjects was 0.25–64%. Subjects were trained on the task until their threshold stabilized across at least two 1-h sessions (∼600 trials/session). The first two subjects in the experiment completed 120 trials per level, and all subsequent subjects completed 160 trials per level.
In a separate, 1-h training session before the first scanning session, we obtained at least two runs of data for both the response known and response unknown conditions (see below for the description of the task). The purpose of this training was to familiarize subjects with the scanner task and to obtain data for the saccade responses. During this part of the training, we monitored subjects' eye position using an Eyelink II eye tracker (SR Research, Ontario, Canada). The position of the right eye was tracked at 250 Hz, and data were analyzed offline using custom code written in Matlab.
Fitting the Training Data with a Drift Diffusion Model
We used a drift diffusion model to investigate whether manipulations in the coherence level resulted in changes in the mean drift rate. In the drift diffusion model, the probability of a correct response, P(Rcorrect), is given by Eq. 1. The parameters are described in Table 1. Note that the experimental design implied no bias, so z was set to 0. The diffusion coefficient σ can be interpreted as a scaling coefficient, so it was set to 1. The probability of an incorrect response is P(Rincorrect) = 1 − P(Rcorrect). The joint cumulative probability distribution for obtaining a response time less than T and responding correctly is given by Eq. 2, where Td is the decision time and Te is the nondecision time (see Table 1). The drift diffusion model only accounts for the decision time, so the nondecision time is subtracted off (Luce 1986). The function corresponding to an incorrect response can be found by replacing (θ − z) with (θ + z) and P(Rcorrect) with P(Rincorrect). The derivation for these expressions can be found in Busemeyer and Diederich (2010).
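To make the role of the parameters concrete, the sketch below evaluates two standard closed-form results for a Wiener process that starts at z between absorbing boundaries at −θ and +θ: the probability of reaching the correct boundary and, for an unbiased start, the mean decision time. These are textbook expressions and may differ in algebraic form from the expressions used here; the full response time distribution is not attempted. The argument names (`drift`, `theta`, `z`, `sigma`) are assumptions chosen for illustration.

```python
import math


def p_correct(drift, theta, z=0.0, sigma=1.0):
    """P(absorb at +theta) for a Wiener process started at z in [-theta, theta].

    Standard first-passage result; illustrative, not the paper's own code.
    """
    if drift == 0.0:
        # Zero-drift limit: probability depends only on the start point.
        return (theta + z) / (2.0 * theta)
    a = 2.0 * drift / sigma ** 2
    return (1.0 - math.exp(-a * (theta + z))) / (1.0 - math.exp(-2.0 * a * theta))


def mean_decision_time(drift, theta, sigma=1.0):
    """Mean decision time for an unbiased start (z = 0)."""
    if drift == 0.0:
        return theta ** 2 / sigma ** 2   # limit of the expression below
    return (theta / drift) * math.tanh(drift * theta / sigma ** 2)


# Hypothetical drift rates standing in for low and high coherence.
for drift in (0.5, 2.0):
    print(drift, round(p_correct(drift, theta=1.0), 3),
          round(mean_decision_time(drift, theta=1.0), 3))
```

With the threshold held fixed, a higher drift rate gives both higher accuracy and a shorter mean decision time, which is the pattern the coherence manipulation is meant to produce.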
We used the data from the final two training sessions to fit different drift diffusion models. During these final two sessions, participants experienced six to seven different levels of coherence. For each subject, we removed trials that were >4 SD above the mean response time for the given coherence level. This removed on average 1 trial per condition per subject.
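The trimming rule can be sketched as follows, applied separately to the response times of each coherence level. The text does not say whether the sample or population SD was used; the sample SD below is an assumption, and the function name is illustrative.

```python
import statistics


def trim_slow_trials(rts, n_sd=4.0):
    """Drop response times more than `n_sd` SDs above the mean.

    Only the slow tail is trimmed, as described in the text.
    """
    mean_rt = statistics.mean(rts)
    sd_rt = statistics.stdev(rts)        # sample SD (assumption)
    cutoff = mean_rt + n_sd * sd_rt
    return [rt for rt in rts if rt <= cutoff]


# Toy data for one coherence level: 40 ordinary trials plus one very slow trial.
rts = [0.50 + 0.01 * (i % 5) for i in range(40)] + [10.0]
kept = trim_slow_trials(rts)
print(len(rts), "->", len(kept))
```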
The models were fit using the quantile maximum probability estimation method (Heathcote et al. 2002). Roughly, this method entails calculating five quantiles (0.1, 0.3, 0.5, 0.7, and 0.9) from the response time distributions for correct and incorrect responses for each level of coherence. These quantiles form category boundaries for response times that can be fed into a multinomial likelihood function with the drift diffusion model determining the probability of each category of response times. The Nelder-Mead simplex method was used to find parameters that maximize the function (Nelder and Mead 1965). In terms of degrees of freedom (df), this method produces 8 df for the response times of each coherence level (4 for correct and 4 for incorrect choices) and 1 additional df for the choice (correct or incorrect). Thus there were 63 df for the average subject with seven coherence levels. Table 2 lists the different models we fit with this method, the hypotheses the models test regarding the effect of coherence, and the number of free parameters each has.
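The quantile-based likelihood can be sketched as below for a single response type (correct or incorrect) at a single coherence level. This is a minimal illustration of the idea, not the fitting code used in the study: `joint_cdf` is a hypothetical stand-in for the model's joint cumulative distribution, and the toy exponential form in the example is an assumption chosen only so the code runs.

```python
import numpy as np

QUANTILES = (0.1, 0.3, 0.5, 0.7, 0.9)


def quantile_bin_counts(rts, probs=QUANTILES):
    """Return the observed RT quantiles and the trial count in each bin."""
    rts = np.asarray(rts, dtype=float)
    edges = np.quantile(rts, probs)
    # Bin 0 holds rt <= q1, bin k holds q_k < rt <= q_(k+1), last bin rt > q5.
    idx = np.searchsorted(edges, rts, side="left")
    counts = np.bincount(idx, minlength=len(probs) + 1)
    return edges, counts


def qmpe_log_likelihood(edges, counts, joint_cdf, p_response):
    """Multinomial log-likelihood (up to a constant) for one response type.

    joint_cdf(t) -> model P(RT <= t and this response);
    p_response   -> model probability of this response (limit of joint_cdf).
    """
    cum = np.concatenate(([0.0], [joint_cdf(t) for t in edges], [p_response]))
    bin_probs = np.clip(np.diff(cum), 1e-12, None)   # guard against log(0)
    return float(np.sum(counts * np.log(bin_probs)))


# Toy example: evenly spread RTs and a made-up model CDF (both hypothetical).
rts = np.linspace(0.3, 1.2, 100)
edges, counts = quantile_bin_counts(rts)


def toy_cdf(t):
    return 0.8 * (1.0 - np.exp(-(t - 0.2) / 0.4)) if t > 0.2 else 0.0


ll = qmpe_log_likelihood(edges, counts, toy_cdf, p_response=0.8)
print(counts, ll)
```

The full objective would sum this quantity over correct and incorrect responses and over coherence levels; its negative could then be passed to a simplex optimizer such as `scipy.optimize.minimize(..., method="Nelder-Mead")`.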
RDM Direction Discrimination Task: fMRI
At the beginning of each fMRI session, we ran another 280–560 training trials while subjects lay in the scanner. This was done to precisely calibrate subjects' performance, as the viewing conditions and display system differed between the scanner room and the psychophysical laboratory. We generally obtained very similar motion coherence thresholds from the two settings. From these calibration trials, we computed two levels of motion coherence, such that each subject's performance on the RDM task was expected to be ∼60% (low coherence) and ∼80% (high coherence) correct.
We manipulated three factors in the decision task during fMRI: response mode (saccade vs. button press), foreknowledge of response mode (known vs. unknown), and motion coherence (low vs. high). All factors were within-subject, with foreknowledge blocked by runs and response mode and motion coherence interleaved within a run.
A trial in the response known condition is illustrated in Fig. 3A. At the beginning of a trial, a cue (0.25 s) indicated that an RDM stimulus would appear and also the response to be used (oval: saccade and square: button press). After a 2-s fixation period, an RDM stimulus (same characteristics as in training, see above) along with two peripheral circular targets (0.6°, at 10° eccentricity) appeared for 0.5 s. The motion coherence of the RDM stimulus was either low or high, as determined by the calibration trials for each subject in the scanner (see above). A fixation period of 9.5 s followed the stimulus offset, after which the fixation point turned green, signaling the subject to make the required response. We used a long delay between stimulus and response to isolate neural activity for decision formation from that for motor responses. For saccade trials, subjects were instructed to move their eyes to the location of the peripheral target in the direction of coherent motion in the RDM stimulus. For button press trials, they were instructed to indicate the motion direction by pressing buttons held in the corresponding hand (leftward motion: left-hand buttons and rightward motion: right-hand buttons). For each response, they pressed their index finger and middle finger sequentially to indicate their choice. We used two-finger responses because in pilot studies we found that they more readily activated the motor and premotor regions for manual responses in the functional localizer experiment (see below). An intertrial interval of 4–10 s, randomly jittered, followed the response cue.
For the response unknown condition (Fig. 3B), the trials were identical with two exceptions. First, the initial cue was a slightly larger circle at fixation that did not indicate the response mode. Second, the response cue was shown at the end of the delay period (green oval: saccade and green square: button press) to instruct subjects to respond with a particular motor modality.
Each subject completed two fMRI sessions (2 h per session) in the scanner. In each session, they completed 8–12 runs of the decision task (344 s per run). In the first session, they also completed two functional localizer runs for saccade and button press (see below). In total, we obtained 18–20 runs of data for the decision task, resulting in an average of 43 trials for each cell of the design (foreknowledge, response mode, and motion coherence).
Functional Localizers (for Saccade and Button Press) and Retinotopic Mapping
For each subject, we localized brain areas involved in making saccade and button press responses in an independent localizer scan in which subjects performed a memory-delayed response task. Subjects made either saccade or button press responses in 16-s blocks, each followed by a 12-s fixation period (Fig. 4). The response mode was indicated by the shape of the fixation point (oval: saccade and square: button press). Within each 16-s block, multiple trials of the memory-delayed response task were performed. A white circle (0.8°) appeared in the periphery (10° eccentricity on the horizontal meridian) for 0.3 s, followed by a delay interval (1–3 s), after which the fixation point turned green, signaling subjects to make the response. After a 1.2-s intertrial interval, the next trial started. In the saccade trials, subjects were instructed to remember the target location and to make a saccade to the memorized location of the peripheral target. In the button press trials, they were instructed to press the index and middle fingers sequentially with the corresponding hand (left target: left hand and right target: right hand). Thus, on average, subjects made five to six responses in each 16-s block. Each fMRI run contained 12 blocks (6 saccade and 6 button press) with an 8-s fixation period at the beginning (344 s total), and each subject completed two fMRI runs at the beginning of their first fMRI session.
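As a consistency check on the stated run length, assume that each run consists of an initial 8-s fixation period followed by 12 units of 28 s each (a 16-s response block plus a 12-s fixation period). This reading is an interpretation of the description above, and the volume count uses the 2-s repetition time given in the imaging protocol:

```latex
\[
T_{\text{run}} = 8\,\text{s} + 12 \times (16\,\text{s} + 12\,\text{s}) = 344\,\text{s},
\qquad
N_{\text{volumes}} = \frac{344\,\text{s}}{2\,\text{s}} = 172 .
\]
```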
For each subject, we also mapped early visual cortex as well as several parietal areas that contain topographic maps in a separate scanning session. We used rotating wedges and expanding/contracting rings to map the polar angle and radial component, respectively (DeYoe et al. 1996; Sereno et al. 1995). Borders between visual areas were defined as phase reversals in a polar angle map of the visual field. Phase maps were visualized on computationally flattened representations of the cortical surface, which were generated from the high-resolution anatomical image using FreeSurfer (http://surfer.nmr.mgh.harvard.edu/) and custom Matlab code. Multiple runs of the wedge and ring stimuli were collected and averaged to increase signal-to-noise ratio. We incorporated an attentional tracking task in the mapping procedure where subjects tracked the moving stimulus with covert attention and detected a luminance decrement in the stimulus via button press. The amount of luminance decrement was controlled by an adaptive staircase procedure. This attentional tracking task helped us identify topographic areas in the parietal cortex [intraparietal sulcus (IPS)1–4], consistent with previous reports (Silver et al. 2005; Swisher et al. 2007). In a separate run, we also presented moving vs. stationary dots in alternating blocks and localized the human motion-sensitive area, MT+, as an area near the junction of the occipital and temporal cortex that responded more to moving than stationary dots (Watson et al. 1993). Thus, for each subject, we identified the following areas: V1, V2d, V2v, V3d, V3v, V3A/B, V4, V7, and hMT+, and four full-field maps in the IPS: IPS1, IPS2, IPS3, and IPS4. We did not observe a consistent boundary between V3A and V3B; hence, we defined an area that contained both and labeled it V3A/B. We adopted the definition of V4 as a hemifield representation anterior to V3v (Wandell et al. 2007).
Magnetic Resonance Imaging Protocol
Imaging was performed on a GE Healthcare (Waukesha, WI) 3T Signa HDx MRI scanner, equipped with an 8-channel head coil, in the Department of Radiology at Michigan State University. For each subject, high-resolution anatomical images were acquired using a T1-weighted MP-RAGE sequence (field of view = 256 × 256 mm, 180 sagittal slices, and 1-mm isotropic voxels). Functional images were acquired using a T2*-weighted echo planar imaging sequence (TR = 2 s, TE = 30 ms, flip angle = 77°, matrix size = 64 × 64, in-plane resolution = 3.3 × 3.3 mm, and slice thickness = 4 mm, interleaved, no gap). Thirty axial slices covering the whole brain were acquired every 2 s. In each scanning session, we also acquired a T1-weighted anatomical image that had the same slice prescription as the functional scans but with higher in-plane resolution (1.6 × 1.6 × 4 mm). This image was used to align the functional data to the high-resolution anatomical images for each subject.
fMRI Data Analysis
Functional MRI data were analyzed using mrTools running in Matlab (http://www.cns.nyu.edu/heegerlab/wiki/doku.php?id=mrtools:top), as well as custom code. Data for each run were first motion corrected (Nestares and Heeger 2000), linearly detrended and high-pass filtered at 0.01 Hz to remove low-frequency drift, and then converted to percent signal change by dividing the time course of each voxel by its mean signal over a run. Spatial smoothing was also applied with a 4-mm FWHM Gaussian blur. Data from the two sessions of the decision task were then aligned and concatenated for subsequent analysis.
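Two of these steps, linear detrending and conversion to percent signal change, can be sketched for a single voxel as follows. This is a simplified illustration rather than the mrTools implementation: motion correction, the 0.01-Hz high-pass filter, and spatial smoothing are omitted, and keeping the run mean while removing the trend is an assumption made so that the division by the mean is well defined.

```python
import numpy as np


def percent_signal_change(ts):
    """Linearly detrend one voxel's run and express it as % change from its mean."""
    ts = np.asarray(ts, dtype=float)
    t = np.arange(ts.size)
    slope, intercept = np.polyfit(t, ts, 1)      # least-squares line
    mean_signal = ts.mean()
    # Remove the fitted line but keep the run mean (assumption, see lead-in).
    detrended = ts - (slope * t + intercept) + mean_signal
    return 100.0 * (detrended / mean_signal - 1.0)


# Toy voxel: 172 volumes (344 s at TR = 2 s) with a slow drift and a 1% oscillation.
n_vols = 172
t = np.arange(n_vols)
raw = 1000.0 + 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 28.0)
psc = percent_signal_change(raw)
print(psc.min(), psc.max())
```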
For the functional localizer data, we fit each voxel's time series with a general linear model containing two sets of regressors (saccades vs. button presses). Each regressor was constructed by convolving a boxcar function of the active blocks with a canonical hemodynamic response function parameterized as a difference between two gamma functions (Friston et al. 1995).
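A block regressor of this kind can be sketched as below. The text specifies only that the hemodynamic response function is a difference of two gamma functions; the shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used defaults adopted here as assumptions, and the block onsets in the example assume that saccade blocks come first in an alternating sequence.

```python
import math
import numpy as np


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Difference of two unit-scale gamma densities at times t (seconds)."""
    t = np.asarray(t, dtype=float)

    def gamma_pdf(x, shape):
        out = np.zeros_like(x)
        pos = x > 0
        out[pos] = x[pos] ** (shape - 1) * np.exp(-x[pos]) / math.gamma(shape)
        return out

    return gamma_pdf(t, peak_shape) - under_ratio * gamma_pdf(t, under_shape)


def block_regressor(n_vols, tr, onsets_s, block_dur_s):
    """Boxcar over the active blocks convolved with the HRF, sampled every TR."""
    boxcar = np.zeros(n_vols)
    for onset in onsets_s:
        start = int(round(onset / tr))
        stop = int(round((onset + block_dur_s) / tr))
        boxcar[start:stop] = 1.0
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, tr))
    return np.convolve(boxcar, hrf)[:n_vols]


# Hypothetical onsets: 8-s lead-in, then one saccade block every 56 s.
saccade_onsets = [8 + 56 * k for k in range(6)]
reg = block_regressor(n_vols=172, tr=2.0, onsets_s=saccade_onsets, block_dur_s=16.0)
print(reg.shape, float(reg.max()))
```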
To localize cortical areas differentially involved in making saccadic and button press responses, we performed a linear contrast (saccade minus button press) after first removing the common variance associated with both regressors. Two values were obtained for each voxel: the difference in the fitted coefficients (beta weights), and the amount of variance in the time series explained by the model (r2). The r2 value indicates how well a voxel's time course is explained by the experimental paradigm. We evaluated the statistical significance of the r2 value by a permutation test (see below for details) and chose an r2 threshold value corresponding to a P value of 0.01 (uncorrected for multiple comparisons). The exact threshold value was not critical for the results presented here. For each subject, we defined the areas preferring saccades and button presses by thresholding the r2 value in conjunction with the sign of the contrast (positive contrast: saccade preferring, negative contrast: button press preferring, see Fig. 6).
We analyzed the known and unknown data separately by fitting each voxel's time series with a general linear model containing four sets of regressors, corresponding to the two coherence levels (low vs. high) crossed with the two response modalities (saccade and button press). Each regressor was composed of 13 columns of time-shifted 1's (essentially a finite impulse response filter), modeling the fMRI response in a 26-s window after the onset of a trial. The design matrix was then pseudoinverted and multiplied by the time series to obtain an estimate of the hemodynamic response evoked by each of the four conditions. This deconvolution approach assumes linearity in temporal summation (Boynton et al. 1996; Dale 1999) but not a particular shape of the hemodynamic response. To obtain a measure of response amplitude for a brain area, the deconvolved responses were averaged across the voxels in that area. This analysis was performed on all predefined areas, whether defined by retinotopy, the functional localizer, or overall task-driven activity (see below).
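The deconvolution step can be sketched as follows: 13 time-shifted indicator columns per condition (13 × 2-s TR = 26 s), with the response estimated by multiplying the pseudoinverse of the design matrix by the time series. This is an illustrative reimplementation, not the mrTools code; onsets are expressed in volumes, only two conditions are simulated for brevity, and the r2 definition (one minus residual variance over total variance) is an assumption.

```python
import numpy as np


def fir_design_matrix(n_vols, onsets_by_condition, n_lags=13):
    """Stack n_lags time-shifted indicator columns for each condition."""
    X = np.zeros((n_vols, n_lags * len(onsets_by_condition)))
    for c, onsets in enumerate(onsets_by_condition):
        for onset in onsets:                  # trial onset, in volumes
            for lag in range(n_lags):
                row = onset + lag
                if row < n_vols:
                    X[row, c * n_lags + lag] = 1.0
    return X


def deconvolve(ts, X, n_lags=13):
    """Least-squares response estimate per condition and r^2 of the fit."""
    ts = np.asarray(ts, dtype=float)          # assumed already in % signal change
    beta = np.linalg.pinv(X) @ ts
    resid = ts - X @ beta
    r2 = 1.0 - resid.var() / ts.var()
    return beta.reshape(-1, n_lags), r2


# Toy check: simulate two conditions with known responses and recover them.
rng = np.random.default_rng(0)
onsets = [[5, 40, 80, 130], [20, 60, 100, 160]]   # hypothetical trial onsets
X = fir_design_matrix(200, onsets)
true_resp = rng.normal(size=(2, 13))
ts = X @ true_resp.ravel()
est, r2 = deconvolve(ts, X)
print(np.allclose(est, true_resp), r2)
```

Because each lag has its own column, no response shape is imposed; when trial windows overlap, the least-squares solution separates the overlapping responses under the linearity assumption.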
We also looked for voxels that showed a significantly modulated response in the decision task, regardless of the sign of the response or the relative response amplitude among conditions (equivalent to an omnibus F-test). This was done by using the goodness-of-fit measure, r2, of the deconvolution model, which was the amount of variance explained by the model. The statistical significance of the r2 value was evaluated via a permutation test (Gardner et al. 2005; Nichols and Holmes 2002). Event times were randomized and r2 values were recalculated for the deconvolution model. Ten such randomizations were performed; the resulting distributions of r2 values for all voxels were then combined to form a single distribution of r2, which we took as the distribution of r2 values expected by chance. Note that each of the 10 distributions was computed for all voxels (∼100,000); thus combining 10 of them produced a sufficiently large sample to estimate the null distribution. Each voxel's P value was then calculated as the proportion of values in the null distribution that exceeded the r2 value of that voxel. Using a cut-off P value of 0.01 (uncorrected for multiple comparisons), we defined three frontal areas that were active during the RDM task but were not part of the areas defined by retinotopy and the functional localizer: inferior frontal sulcus (IFS), anterior insula (aINS), and middle insula (mINS), all bilaterally (see Fig. 7). We used a relatively liberal threshold in analyzing individual subject data to define brain areas of sufficient size.
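Given a pooled null distribution of r2 values, the P value computation reduces to counting how many null values exceed each observed r2. The sketch below shows only that final step; the randomization of event times and the refitting of the deconvolution model are not reproduced, and the function name is illustrative.

```python
import numpy as np


def permutation_p_values(observed_r2, null_r2):
    """P value per voxel: fraction of pooled null r^2 values above the observed r^2."""
    null_sorted = np.sort(np.asarray(null_r2, dtype=float).ravel())
    observed = np.asarray(observed_r2, dtype=float)
    # Number of null values less than or equal to each observed value.
    n_le = np.searchsorted(null_sorted, observed, side="right")
    return 1.0 - n_le / null_sorted.size


# Toy null distribution (hypothetical): 100 evenly spaced values in [0, 0.99].
null = np.arange(100) / 100.0
p = permutation_p_values([0.985, -1.0, 2.0], null)
print(p)
```

In the analysis described above, `null_r2` would hold the r2 values from all voxels across the 10 randomizations, and voxels with P below 0.01 would be retained.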
Visualization of group data.
All analyses were performed on individual subject data, and all quantitative results reported were based on averages across individual subject results. We also performed group averaging of the individual maps to provide a visualization of the overall pattern of brain activity. Each subject's two hemispheric surfaces were first imported into Caret and affine-transformed into the 711-2B space of Washington University in St. Louis (Buckner et al. 2004). The surface was then inflated to a sphere and six landmarks were drawn, which were used for spherical registration to the landmarks in the Population-Average, Landmark- and Surface-based (PALS) atlas (Van Essen 2005). We then transformed individual maps to the PALS atlas space and performed group averaging before visualizing the results on the PALS atlas surface. To correct for multiple comparisons, we thresholded the maps based on the individual voxel-level P value in combination with a cluster constraint.
For the group maps (see Figs. 6 and 7), we derived a voxel-level P value based on aggregating the null distributions generated from the permutation test for each individual subject. Specifically, we randomly drew 10,000 r2 values from the 10 randomization distributions for each subject and combined them. This combined distribution served as the null distribution for the averaged r2 value across subjects. The P value of each individual voxel was thus the proportion of values in the null distribution with a higher r2 value. We then performed 10,000 Monte Carlo simulations with AFNI's AlphaSim program to determine the appropriate cluster size given a particular voxel-level P value to control for the whole-brain false positive rate. For the maps shown in results (see Figs. 6 and 7), we used a voxel-level P value of 0.001 and a cluster size of six voxels. This corresponded to a whole-brain corrected false positive rate of 0.005 according to AlphaSim.
RESULTS
Behavior in Psychophysics Laboratory
We trained subjects extensively in the RDM task until their performance stabilized before fMRI scans. We then fit a set of drift diffusion models to the data from the final two sessions (see materials and methods). Table 2 lists the four different nested drift diffusion models we fit to the data and the hypotheses they test. The results of the model comparison are also listed in Table 2. We used the Bayesian Information Criterion (BIC; Raftery 1995) to make model comparisons, where lower BICs represent better model fits. As a heuristic, differences in BIC scores >10 are generally taken as very strong support for the model with the lower BIC. We report the fits to the average data, but the conclusions are the same with the fits to the individual data. The model assuming only drift rate changed with coherence level provided the best fit to the data (see Table 2). The best fitting parameters are listed in Table 1. Figure 2B displays the fit of the model to the mean choice proportions and response times. These results confirm that our manipulation of motion coherence impacts drift rate, consistent with previous modeling studies (Ho et al. 2009; Palmer et al. 2005; Ratcliff and McKoon 2008).
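The comparison logic can be sketched with the usual form of the criterion, BIC = −2 ln L + k ln N, where k is the number of free parameters and N the number of observations. The exact form used for Table 2 may differ, and all numbers in the example are made-up placeholders, not values from this study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion in its usual form (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits of two nested models to the same 1,000 observations.
bic_simple = bic(log_likelihood=-1500.0, n_params=9, n_obs=1000)
bic_complex = bic(log_likelihood=-1495.0, n_params=15, n_obs=1000)

delta = bic_complex - bic_simple
print(round(bic_simple, 1), round(bic_complex, 1), round(delta, 1))
# By the heuristic in the text, delta > 10 would count as very strong
# support for the simpler model in this made-up example.
```

The penalty term grows with the number of parameters, so a more flexible model is preferred only when its gain in log-likelihood outweighs the added complexity.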
Before the imaging sessions, each subject completed the experimental task in the psychophysics laboratory. During the task, subjects indicated their direction discrimination judgment by either making an eye movement (saccade) or pressing buttons (button-press), and they were cued either before (known) or after (unknown) the RDM stimulus regarding the response modality to use on that trial (see Fig. 3). There were two levels of motion coherence (low and high), determined for each individual subject via the constant stimuli task to achieve a low (∼60% correct) and high (∼80% correct) performance (see materials and methods). The behavioral results are shown in Fig. 5A. As we would predict with a drift diffusion model of the decision process, there was a main effect of coherence on accuracy [F(1,8) = 178.9, P < 10⁻⁴] with no other significant effects. An interaction effect between coherence and response mode was marginally significant [F(1,8) = 5.16, P = 0.053]; this was caused by a slightly elevated accuracy for saccade responses made on low coherence trials, leading to a smaller coherence effect on saccade than button-press trials. Although it seems that saccades could be more accurate responses than button presses, without more confirmatory data we refrain from interpreting this effect. Importantly, there were robust coherence effects on both saccade and button-press trials in both known and unknown conditions (t-test, all P < 0.01). Due to the stimulus-response delay, reaction times were not meaningful measures of performance.
Behavior in the Scanner
Recall that during the first of two scanning sessions subjects again completed a block of RDM trials with several different levels of coherence. Performance on these trials was used to select two levels of coherence (low and high) for the decision task. Behavioral results in the scanner during the decision task confirmed that our main manipulation of coherence was effective. Accuracy on the RDM task with button-press responses is shown in Fig. 5B. For both known and unknown trials, low coherence stimuli were discriminated at lower accuracy than high coherence stimuli [F(1,8) = 535.2, P < 10⁻⁷]; no other effects were significant (all P > 0.35). Reaction time measures again were not meaningful here, given that subjects made their responses after a long delay (the latency to press buttons with respect to the onset of the green report cue was ∼0.4 s on the known trials and ∼0.5 s on the unknown trials, with no difference between low and high coherence). Because our scanner was not equipped with an MR-compatible eye tracker, we were not able to collect eye movement data in the scanner. However, there is no reason to believe performance during the saccade trials was different for the well-trained subjects, especially given their consistent performance in the psychophysics laboratory (see Fig. 5A).
During the first scanning session, subjects completed a functional localizer experiment that allowed us to localize brain areas involved in making the motor responses in our task. Subjects made memory-delayed saccade or button-press responses in alternating blocks (see Fig. 4 for the trial sequence). Figure 6A shows a group-averaged contrast map of saccade vs. button press conditions. Positive values (yellow-red) indicate larger responses for saccades and negative values (cyan-blue) indicate larger responses for button presses.
Saccades evoked stronger responses in three clusters in frontal and parietal cortex: along the precentral sulcus, in posterior medial frontal cortex, and along the IPS. In addition, an area in the lateral occipital cortex also showed a higher response to saccades than to button presses. The IPS and lateral occipital areas coincided with retinotopically defined IPS1–4 and MT+, respectively. We split the precentral sulcus activity into two areas: a dorsal portion that was located in the caudal part of the superior frontal sulcus and precentral sulcus, the putative human frontal eye field (FEF; Paus 1996), and a ventral portion (vPrCS). The medial frontal area was consistent with the location of the human supplementary eye field (Grosbras et al. 1999).
Button presses evoked stronger responses in areas along the central sulcus, including parts of the pre-/postcentral gyri, and an area in the posterior medial frontal cortex. In individual subjects, button presses activated areas near a posterior convexity of the central sulcus on the lateral surface. This anatomical feature has been shown to be a reliable landmark for sensorimotor areas related to hand/digit (“hand area”) (White et al. 1997; Yousry et al. 1997). We defined four cortical regions involved in button press responses. The dorsal premotor area (PMd) was defined as the most anterior portion of the hand area, on the precentral gyrus; the primary motor area (M1) was defined as the hand area in the anterior wall of the central sulcus (Moore et al. 2000). The activity posterior to M1 was defined as a single region (PoC) containing parts of the postcentral gyrus and sometimes extending into the postcentral sulcus, corresponding to the somatosensory cortex related to the hand/digit (Moore et al. 2000). Finally, on the medial surface, the button-press preferring area was defined as the supplementary motor area (SMA; Picard and Strick 1996). Figure 6B shows the locations of saccade and button press areas on the lateral surface relative to major anatomical landmarks in a single subject.
The overall activity patterns were largely symmetric across hemispheres (Fig. 6A). We were able to define most saccade and button-press areas in both hemispheres in all subjects (all motor areas were defined in at least one hemisphere). For the time-course analyses (below), we have combined the corresponding areas in the left and right hemispheres, as there was no qualitative difference between their time courses.
Decision Task: Overall Active Cortical Network
Whole-brain general linear model (GLM) analysis revealed a network of areas that showed systematic modulation in their activity during the decision task (Fig. 7). The active areas for response known and unknown conditions were very similar (compare Fig. 7, A and B); these included the occipital visual areas, areas along the IPS, and the motor and premotor areas surrounding the central sulcus. These areas overlapped with the functional areas individually defined via retinotopy and the functional localizer (Fig. 7C). In addition, there were three active areas that were not part of the functionally defined areas: a region in the posterior end of the IFS, a region in the aINS, and a region in the mINS. We thus defined these three additional areas for each subject (see Fig. 7C for their locations). For each subject, we therefore defined 21 areas: retinotopically defined occipital visual areas (V1, V2, V3, V3A/B, V4, V7, and MT+), retinotopically defined IPS areas (IPS1–4, all of which also showed a higher response to saccade than button press in the localizer scan, and will be referred to as saccade areas), localizer-defined areas (FEF, vPrCS, supplementary eye field for saccade, PMd, M1, PoC, and SMA for button press), and task-defined amodal areas (IFS, aINS, and mINS). We next examined time courses during the decision task in these areas.
Decision Task: Region-Based Time-Course Analysis
Time-course data from select brain areas are shown for the response known condition (Fig. 8) and response unknown condition (Fig. 9). These were representative of the four categories of brain areas defined in our experiment: visual areas (V1, V2, and MT+), saccade areas (IPS1, IPS2, and FEF), button-press areas (PMd, SMA, and M1), and amodal areas (IFS, aINS, and mINS). The majority of brain areas, including all the visual areas, the saccade areas, IFS, and aINS showed a bimodal response. The first mode peaked ∼6 s after the RDM stimulus (8 s after trial onset) and the second mode peaked ∼6 s after the response cue (18 s after trial onset). Notably, the button press areas and mINS showed no apparent peak response for the first mode but only showed a peak for the second mode. Given the hemodynamic delay, the first peak corresponds to stimulus processing and the ensuing delay period, whereas the second peak corresponds to cue processing and motor response. The fact that all areas exhibited the second mode of response underscores the importance of separating motor- and decision-related neural activity. Here we focus on the first response (peaking ∼6 s after stimulus) as the time window to examine neural activity associated with accumulation of sensory evidence.
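The mapping from trial events to the two response modes can be illustrated with a minimal sketch. It assumes a linear response model, a single-gamma hemodynamic response function peaking 6 s after an event (an illustrative choice, not the function used in the analysis), and brief neural events at stimulus onset (2 s) and response cue (12 s), as implied by the peak latencies reported above. The names `hrf` and `predicted_time_course` are hypothetical.

```python
import math
import numpy as np

DT = 0.1            # time step in seconds (assumed)
STIM_ONSET = 2.0    # RDM stimulus onset relative to trial onset, in seconds
CUE_ONSET = 12.0    # response cue onset relative to trial onset, in seconds

def hrf(t, peak=6.0):
    """Single-gamma response peaking `peak` seconds after an event (illustrative)."""
    t = np.clip(t, 0.0, None)  # no response before the event
    return t**peak * np.exp(-t) / math.gamma(peak + 1.0)

def predicted_time_course(stim_gain, cue_gain, duration=30.0):
    """Sum of the responses to a stimulus-locked and a cue-locked event."""
    t = np.arange(0.0, duration, DT)
    return t, stim_gain * hrf(t - STIM_ONSET) + cue_gain * hrf(t - CUE_ONSET)

# An area driven by both events versus an area driven only by the cue/response.
t, bimodal = predicted_time_course(stim_gain=1.0, cue_gain=1.0)
_, motor_only = predicted_time_course(stim_gain=0.0, cue_gain=1.0)

first_peak = t[np.argmax(bimodal[t < CUE_ONSET])]
second_peak = t[np.argmax(bimodal * (t >= CUE_ONSET))]
print(f"first mode peaks near {first_peak:.1f} s, second near {second_peak:.1f} s")
```

Under these assumptions, an area driven by both events shows two modes, whereas an area driven only by the cue and response shows only the second, which is the qualitative pattern described for the button press areas and mINS.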
Decision Task: Region-Based Amplitude Analysis
We used the average activity of time points 3–6 (2–8 s after stimulus onset) to quantify response amplitude (using other points around the peak gave similar results). Thus for each foreknowledge condition (known and unknown) and each brain area, we obtained response amplitude measures for four conditions, crossed by response mode (saccade vs. button press) and coherence (low vs. high, see Fig. 10). We then performed three-way repeated-measures ANOVAs on data from each brain area; the statistical significance of these tests is summarized in Table 3. Recall that the effector-specific hypothesis would predict an interaction between coherence and response mode, perhaps particularly in the response known condition, whereas an effector-general mechanism would predict no such interaction. Across all brain areas, we found no significant coherence by response mode interactions, nor were there any significant three-way interactions. To examine whether statistical power was an issue, we also calculated the effect sizes for these interactions with a generalized eta squared (ηG2) (Olejnik and Algina 2003), which puts the effect sizes from within-subjects designs and other designs on comparable grounds. We focused on the saccade areas as they showed a significant coherence effect, whereas the button-press areas showed no coherence effect (and for that matter did not really exhibit a response associated with stimulus processing, see Figs. 8 and 9). The average ηG2 for the response mode by coherence interaction in the saccade areas was 0.002 (SD = 0.002), and the average ηG2 for the three-way interaction in the same areas was 0.008 (SD = 0.006). These constitute extremely small effects, falling well below the heuristic of 0.01 for a small effect (Cohen 1988), and give little support for any type of effector specificity. Note that the amplitudes in the saccade areas in the response known conditions do appear to show a larger effect of coherence for saccade trials (see Fig. 10). However, even this effect was quite small. Conditionalizing solely on the response known trials, the effect size for the response mode by coherence interaction had an average value of 0.003 (SD = 0.004) in the saccade areas, again an extremely small effect.
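As a rough illustration of how a generalized eta squared is obtained in a fully within-subjects design, the sketch below computes it for a balanced two-factor case (response mode by coherence). It is a simplified stand-in for the three-way design used here, not the analysis code: the function name `generalized_eta_squared` is hypothetical and the data are synthetic. With all factors manipulated, the denominator is the effect sum of squares plus every sum of squares involving subjects (the subject main effect and all subject-by-effect error terms).

```python
import numpy as np

def generalized_eta_squared(y):
    """Generalized eta squared for a balanced two-factor within-subjects design.

    y has shape (n_subjects, n_modes, n_coherences); both factors are manipulated.
    """
    y = np.asarray(y, dtype=float)
    n, a, b = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(0, 2))   # response-mode means
    mean_b = y.mean(axis=(0, 1))   # coherence means
    mean_ab = y.mean(axis=0)       # cell means

    ss_total = ((y - grand) ** 2).sum()
    ss_a = n * b * ((mean_a - grand) ** 2).sum()
    ss_b = n * a * ((mean_b - grand) ** 2).sum()
    ss_ab = n * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()

    # What the manipulated effects do not explain involves subjects:
    # the subject main effect plus all subject-by-effect error terms.
    ss_subjects_and_error = ss_total - ss_a - ss_b - ss_ab

    effects = {"mode": ss_a, "coherence": ss_b, "mode_x_coherence": ss_ab}
    return {name: ss / (ss + ss_subjects_and_error) for name, ss in effects.items()}

# Synthetic, purely additive data: subject offsets 0..3,
# a 0.5 response-mode effect, and a 1.0 coherence effect (no interaction).
subjects = np.arange(4.0)[:, None, None]
mode = np.array([0.0, 0.5])[None, :, None]
coherence = np.array([0.0, 1.0])[None, None, :]
eta = generalized_eta_squared(subjects + mode + coherence)
print(eta)
```

Because the synthetic data are additive, the interaction effect size is zero while the coherence effect is not, which mirrors the logic of comparing the interaction against the coherence main effect.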
The only type of interaction effect observed was a foreknowledge × response mode interaction in V1, V4, and SMA (average ηG2 = 0.04, SD = 0.02). This was manifested as a higher fMRI response for button press than saccade trials in the response known condition but no difference in the response unknown condition (see V1 and SMA in Fig. 10). It is unclear what drove this interaction effect in visual areas. In SMA, this effect likely reflects a motor preference for button press responses, consistent with our functional definition of SMA. Indeed, other button press areas like PMd and M1 also showed a similar, although statistically nonsignificant, trend (Fig. 10). Given that the most critical manipulation is coherence, we will not further consider this interaction effect.
Finally, we observed a coherence main effect in seven brain areas: IPS1, IPS2, IPS3, FEF, vPrCS, IFS, and aINS (average ηG2 = 0.05, SD = 0.03), with low coherence trials showing a larger fMRI response than high coherence trials. Five of these areas (all except IFS and aINS) also showed a main effect of foreknowledge, with a larger fMRI response for response known than unknown trials (average ηG2 = 0.20, SD = 0.06). This might be related to response preparation/maintenance during the delay period, which was presumably more likely to occur in the response known than response unknown condition. We will return to this point in the discussion.
In this study, we aimed to test the effector-specific hypothesis that the neural mechanisms that carry out specific actions (e.g., saccade) are also responsible for making decisions about a sensory stimulus (Gold and Shadlen 2007; Shadlen et al. 2008). We used fMRI to examine the effector specificity of evidence accumulation across different conditions of stimulus strength, response mode, and foreknowledge in a simple perceptual decision task. We found that the posterior IPS (IPS1–3), areas in and near the precentral sulcus (FEF, vPrCS, and IFS), and aINS showed the fMRI signature of evidence accumulation: a higher fMRI response on low coherence trials (Fig. 1). This difference between coherence levels was similar for oculomotor (saccade) and manual (button press) responses and did not vary with foreknowledge of the to-be-used effector. Thus our data suggest that evidence accumulates similarly for the two response modes, regardless of foreknowledge of the response mode. These results speak against the effector specificity of evidence accumulation in the RDM task and instead support an effector-general view of evidence accumulation for decision making.
fMRI Signature of Evidence Accumulation
Our modeling results showed that manipulation of motion coherence changed the drift rate of evidence accumulation in drift diffusion models (Fig. 2). Proponents of these models have long speculated that the sequential sampling process could be mapped to differential changes in neural firing rates (Luce 1986; Townsend and Ashby 1983). Indeed during the RDM task with a saccadic response, neurons in eye movement related areas (LIP and FEF) of monkeys show a ramp-up of activity during decision formation with the ramp profile changing monotonically with motion coherence (Kim and Shadlen 1999; Roitman and Shadlen 2002; Shadlen and Newsome 2001). In addition to a coherence-dependent ramp-up, the neural activity also predicts choice and response times thus linking the neural activity to the sequential sampling process (Purcell et al. 2010). Whereas such a link between model and neuronal activity is relatively straightforward, linking model and fMRI measures is not.
One difficulty in establishing this link is that the fMRI blood-oxygen-level-dependent (BOLD) signal does not reflect instantaneous firing rate; instead, it integrates neural activity over time (and space). If evidence drifts to threshold and returns to baseline, then low drift rate trials should produce a larger fMRI response than high drift rate trials (Fig. 1), a pattern we observed in the current study. However, it is also possible that evidence can be maintained at the threshold level over time. For example, Shadlen and Newsome (2001) have reported such maintained activity over short durations (0.5 and 2.0 s) at the single cell level in monkey LIP. If a similar maintenance occurs in humans, then high drift rate trials could in principle produce a larger fMRI response than low drift rate trials. Indeed, Tosoni et al. (2008) found such a pattern in sensorimotor areas during a 10.5-s delay period between stimulus and response in a perceptual decision task.
Thus the precise fMRI signature of evidence accumulation might depend on how long evidence is maintained. As discussed above, opposite signs for this difference are expected at the extremes (evidence either goes back to baseline immediately or is maintained over a long time). A corollary of this situation is that if evidence is maintained for some intermediate duration, similar levels of fMRI response for high and low drift rate trials would be expected. This creates ambiguities in prediction, as it is difficult to know precisely how long accumulated evidence is maintained in practice. We believe the uncertainty regarding the duration of the maintained evidence might underlie the differences between our results and those of Tosoni et al. (2008). In our experiment, we never observed a higher fMRI response for high drift rate trials, suggesting that evidence was not maintained over the delay period, whereas subjects in the experiment of Tosoni et al. might have maintained the accumulated evidence for a longer period of time. It should be noted that we did observe an overall higher delay period activity in the response known than the response unknown condition in saccade-related areas (Table 3, main effect of foreknowledge). Thus there might be some partial maintenance of evidence in our response known condition, but not strong enough to reverse the sign of the difference. Taken together, our results suggest that although a delayed response affords the possibility of maintaining accumulated sensory evidence, it does not necessarily lead to such maintenance. Methodologically, the results imply that if one seeks to experimentally separate motor- and decision-related fMRI signals in sensorimotor areas, then foreknowledge of the response should be minimized.
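The dependence of the predicted sign on maintenance duration can be illustrated with a minimal simulation. It is a sketch under simplifying assumptions that are not part of the original analysis: a noiseless linear ramp to a fixed threshold, activity held at threshold until a fixed time after stimulus onset (the hypothetical parameter `hold_until`), an immediate return to baseline, and a linear BOLD response with an illustrative gamma-shaped kernel. The drift rates are arbitrary.

```python
import numpy as np

DT = 0.01  # simulation time step in seconds (assumed)

def accumulator_activity(drift, hold_until, threshold=1.0, duration=30.0):
    """Ramp to threshold, hold until `hold_until` (if later), then return to baseline."""
    t = np.arange(0.0, duration, DT)
    crossing_time = threshold / drift
    offset_time = max(crossing_time, hold_until)
    activity = np.minimum(drift * t, threshold)
    activity[t >= offset_time] = 0.0
    return t, activity

def bold_peak(drift, hold_until):
    """Peak of the activity convolved with a gamma-shaped kernel (peak at 5 s)."""
    t, activity = accumulator_activity(drift, hold_until)
    kernel = t**5 * np.exp(-t) / 120.0
    bold = np.convolve(activity, kernel)[: t.size] * DT
    return bold.max()

LOW_DRIFT, HIGH_DRIFT = 0.5, 2.0  # threshold crossings at 2.0 s and 0.5 s
for hold_until in (0.0, 1.25, 10.0):
    low = bold_peak(LOW_DRIFT, hold_until)
    high = bold_peak(HIGH_DRIFT, hold_until)
    print(f"hold_until={hold_until:5.2f} s  low drift={low:.3f}  high drift={high:.3f}")
```

In this toy setup, no maintenance gives the larger peak to the low drift rate, maintenance until 10 s reverses the ordering, and the intermediate value of 1.25 s (the mean of the two crossing times) equates the time-integrated activity so that the two peaks nearly coincide.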
A potential confound with the standard prediction (Fig. 1) is that a higher fMRI response on low drift rate trials might be attributed to neural processes other than evidence accumulation. For instance, one possibility is that low drift rate trials are more difficult and require more effort and hence evoke greater overall brain activity. Several properties of our results speak against this possibility. In particular, a pure effort/attention account would predict an increased fMRI response on low coherence trials in visual cortex, which is known to be subject to top-down modulations (Kastner and Ungerleider 2000; Reynolds and Chelazzi 2004). We did not, however, observe any coherence-related effects in the visual cortex. Instead, the coherence effects were largely constrained to a set of independently defined sensorimotor areas. Importantly, these regions included the IPS areas and FEF, which are likely the human analogs of monkey LIP and FEF (Grefkes and Fink 2005; Silver and Kastner 2009). Thus our results converge with neurophysiological results on evidence accumulation in nonhuman primates (Kim and Shadlen 1999; Shadlen and Newsome 2001). Nevertheless, the intrinsic temporal blur introduced by fMRI BOLD measurement does make it difficult to directly examine whether an area exhibits ramp-like neural activity. Future research is necessary to further clarify the fMRI correlate of evidence accumulation.
Effector Specificity of Evidence Accumulation
We took advantage of fMRI's spatial resolution as well as its ability to monitor neural activity in a wide range of brain areas to examine the effector-specific hypothesis of evidence accumulation. To this end, we not only independently localized motor and premotor areas for each response modality but also manipulated response mode and foreknowledge of the to-be-used response mode. Across these manipulations, we found a network of brain areas showing fMRI signatures consistent with evidence accumulation, but such signatures did not depend on response mode or foreknowledge. Importantly, our effect size calculations showed that the interaction effects predicted by the effector-specific hypothesis were generally much smaller than the coherence main effect (1/5th the size). Thus the absence of significant interactions was not simply due to a lack of statistical power. These results are consistent with an effector-general mechanism of evidence accumulation.
Two aspects of our data point to an effector-general mechanism of evidence accumulation during the RDM task. First, we observed fMRI signatures for evidence accumulation in amodal areas such as anterior insula and IFS. These areas are generally thought of as nonmotor areas. Consistent with this notion, neither area showed differential activity for saccades and button presses in our localizer experiment (Fig. 6). Their participation in evidence accumulation violates a strict interpretation of effector specificity in which only sensorimotor areas accumulate evidence (Tosoni et al. 2008). Previous fMRI studies have also implicated the anterior insula and inferior frontal areas in different perceptual decision tasks, including facial expression discrimination (Pessoa and Padmala 2005; Thielscher and Pessoa 2007), auditory speech discrimination (Binder et al. 2004), and masked picture identification (Ploran et al. 2007). Furthermore, Ho et al. (2009) also identified the right anterior insula as an evidence accumulator for both saccadic and reaching responses in a variant of the RDM task. Thus there is converging evidence that the anterior insula and inferior frontal regions accumulate sensory evidence across stimulus types and response modalities.
Second, there was a lack of effector specificity in sensorimotor areas in that we observed evidence accumulation only in saccade-preferring areas and none in button-press-preferring areas (Table 3). This finding stands in contrast to Ho et al. (2009), who reported evidence accumulation not only in amodal areas but also in different brain areas accumulating evidence only for saccades or reaches (also see Tosoni et al. 2008, although they used the opposite criterion to define evidence accumulation, see discussion above). There are at least two possible explanations for this discrepancy. First, Ho et al. (2009) conducted a whole-brain analysis to search for voxels showing the predicted pattern without independently verifying their motor preference. Thus it is not clear whether those areas are indeed sensorimotor areas. Second, there were differences in the motor demands of the manual response. In our experiment, subjects made simple, two-alternative, button-press responses with either hand, whereas subjects in Ho et al. made more elaborate responses by pressing one of four buttons in a spatial array corresponding to the motion direction. It seems plausible that requiring a higher spatial precision of the response would engage the effector-specific motor areas more strongly.
While effector-specific evidence accumulation might operate under certain conditions, our results suggest an alternative scheme in which evidence accumulation might not be effector specific but sensory specific. That is, instead of accumulating evidence for a particular motor response, the IPS and FEF areas we found could be accumulating evidence for a particular stimulus feature (motion direction in this case). These two possibilities are difficult to dissociate, especially when the motor response and the stimulus feature are coupled (e.g., look to the left when dots move to the left). Interestingly, a recent single-unit study using the RDM task decoupled such mapping and found that LIP neurons carried direction information before the monkey knew where to saccade for a response (Bennur and Gold 2011). This finding is consistent with recent evidence showing these parietal areas can carry feature information in category learning and attentional selection tasks (Freedman and Assad 2009; Liu et al. 2011). Indeed, the fronto-parietal network we found to accumulate sensory evidence contains essentially the same brain areas active during many attention tasks (Corbetta and Shulman 2002). Thus what matters more might be stimulus/feature-specificity rather than effector-specificity during evidence accumulation.
T. J. Pleskac was supported by National Science Foundation Grant 0955410.
No conflicts of interest, financial or otherwise, are declared by the author(s).
We thank Scarlett Doyle and David Zhu for help in data collection and the Department of Radiology at Michigan State University for generous support of imaging research. We also thank Dr. Susan Ravizza for helpful comments on an earlier version of the manuscript.
- Copyright © 2011 the American Physiological Society