Multiple visual tasks can be performed on the same visual input, with different tasks presumably engaging different neuronal populations. The modular layout of the visual system implies that specific cortical regions carry more information about certain stimulus attributes than others. Thus it is reasonable to assume that decisions during a task will be optimal if they are based on the responses of the most informative neuronal signals, which presumably originate in regions with the sharpest tuning for the relevant stimulus feature. Previous studies have supported this position. Here we present the results of two fMRI experiments that confirm these findings and expand on earlier investigations by addressing the effects of the physical properties of an attended stimulus on task-related modulations in human visual cortex. Specifically, we ask whether performing two-alternative forced-choice speed- and color-discrimination tasks (and other attentional processes) can modulate neural activity independent of visual stimulation and whether the effect of spatial attention depends on which task is being performed. The results indicate that 1) when stimulation and spatial attention are constant, responses in V4 and MT+ depend on the task being performed and are independent of the tested physical properties of the selected stimulus, 2) this task-dependent modulation appears to require a stimulus (task-specific preparatory mechanisms alone are not sufficient to drive responses), and 3) independent of which task is being performed, spatial attention adds a baseline shift to responses in MT+ and V4 when a stimulus is present.
- MT+, V4
- spatial attention
Multiple visual tasks can be performed on the same visual input. When viewing an object such as a tree, one can analyze its shape for its utility to climb, its color to assess the season, its motion to estimate wind speed, or the shadow it casts to determine the position of the sun. While the visual input remains the same, the different tasks presumably engage different populations of neurons in different computational strategies. How attention can engage these populations to achieve optimal task performance has been a long-standing question in visual neuroscience.
One principle of visual system organization that may aid in this process is the largely modular layout of visual cortex (Zeki and Bartels 1998). While the degree of functional modularity is open to debate, it is well known that different regions of visual cortex contain neurons that are tuned to different visual features. For example, color and motion are processed in two interacting but largely separate streams of processing (Gegenfurtner and Hawken 1996; Vaina 1994; Van Essen and Gallant 1994). In particular, there is evidence that neuronal responses in V4 and MT+ underlie at least some of the perceptual attributes of color and motion, respectively (Brouwer and Heeger 2009; Conway et al. 2007; Salzman et al. 1990; Zeki and Bartels 1999). As a result of a modular organization, a specific neuronal population will carry more information about a particular stimulus component, such as direction of motion, than will others. Consequently, during a specific visual task, such as speed or color discrimination, it is reasonable to assume that decisions will be optimal if they are based on the responses of the most informative neuronal signals, which presumably originate in the populations with the sharpest tuning for the relevant stimulus feature(s).
Two decades ago, Corbetta et al. (1990, 1991) performed the first investigations into the effects of selectively attending to different stimulus attributes on responses in human visual cortex. Estimating neuronal responses with positron emission tomography (PET), they reported different foci of activation during tasks involving discrimination of speed, color, and shape of an otherwise physically unchanging stimulus. Subsequent functional magnetic resonance imaging (fMRI) experiments expanded upon these findings. For example, Beauchamp et al. (1997) demonstrated that responses in MT+ increase when observers discriminate the speed of a moving field of dots compared with when they are discriminating color. Similarly, Chawla et al. (1999) found that when observers were cued to detect a motion or color change in an otherwise stationary monochromatic dot field, responses in MT+ and V4, respectively, increased even if no change occurred. Huk and Heeger (2000) compared responses in V1, V3A, and MT+ during speed and contrast discrimination of both moving dots and moving gratings. Regardless of the stimulus, responses in MT+ were higher during speed compared with contrast discrimination. Together, these studies suggest that the behavioral goal of the observer increases the gain in neuronal populations tuned to task-relevant information, perhaps increasing the signal-to-noise of informative neurons in order to optimize task performance. Note that these findings stand in contrast to results suggesting that attention to any feature of an object facilitates the neuronal representation of all features of that object (Desimone and Duncan 1995; O'Craven et al. 1999; Valdes-Sosa et al. 1998).
In addition to these task-related modulations, previous studies have shown that selecting a surface or object results in area-specific modulations. For example, O'Craven et al. (1997) demonstrated that when covert attention is directed to a moving random dot field, fMRI responses in MT+ increase relative to when attention is directed to a spatially superimposed static random dot field. Such surface-specific effects of selective attention have also been found in single neurons in area MT of the macaque (Wannig et al. 2007). These results suggest that if an attended object or surface contains the physical properties for which a particular population of neurons is selective, the response of those neurons increases.
Just as the visual cortex is organized in a modular fashion for specific stimulus attributes, the retinotopic organization of the early visual areas means that these areas are in a sense modularly organized for spatial position. Similarly, just as performing a task modulates fMRI signals in areas associated with the attended stimulus attributes, spatial attention modulates responses within retinotopic maps associated with the attended position (Gandhi et al. 1999; Martinez et al. 1999; Somers et al. 1999). These spatial attention effects appear to be largely independent of the properties of the physical stimulus and have been typically modeled with a baseline shift in response magnitude (Buracas and Boynton 2007; Murray 2008; however, see Li et al. 2008 for evidence that inclusion of contrast-gain mechanisms accounts for a larger proportion of spatial attention effects than a baseline shift alone). Thus it appears that performing a task on a stimulus at a specific spatial location modulates cortical regions associated with both the task and the spatial location. What is not firmly established is the relation between task-specific and spatial attention mechanisms, specifically, whether or not they are independent.
Here we present the results of two fMRI experiments that independently varied the task, spatial attention, and properties of the attended stimulus in order to reveal the individual contributions of these factors to the population responses of different visual areas. We focus exclusively on two-alternative forced-choice speed- and color-discrimination tasks. In experiment 1 we address how the effects of performing these tasks are related to the physical properties of the attended item. For example, does performing a color task on a moving stimulus result in task-related modulations (e.g., in V4), stimulus-related modulations (e.g., in MT+), or some combination of the two? In experiment 1, we manipulated task type and selective attention to one of two superimposed dot surfaces. The results strongly suggest that task, and not the physical properties of a selected stimulus, is the primary contributor to attention-driven modulations in V4 and MT+. In experiment 2 we ask whether task and spatial attention can modulate neural activity independent of visual stimulation and whether the effect of spatial attention depends on task. We manipulated these three factors independently in an event-related design: on a given trial, observers could perform either task on a stimulus of variable dot density and attend to the left or the right visual field. The results of experiment 2 suggest that a stimulus is required for task-specific modulation, that the effects of spatial attention do not depend on stimulus strength (as long as a stimulus is present), and that response modulation by spatial attention is independent of which task is being performed. Overall, the results show that task, stimulus strength, and spatial attention each have independent effects on fMRI responses within a given visual area.
This lack of dependence gives us confidence that results from previous studies that manipulated only one of these factors (task, stimulus strength, or spatial attention) generalize to changes in the other factors.
Materials and methods.
Eight subjects participated (4 men, 4 women), ranging in age from 23 to 41 yr. One of the observers was author E. Runeson. All subjects gave written and informed consent to participate in protocols reviewed and approved by the human subjects Institutional Review Board at the University of Washington, had normal or corrected-to-normal vision, and were compensated $20/h. Each observer took part in the following: two 1-h sessions in the lab for practicing the experimental conditions, two 1-h psychophysics sessions in the scanner, and two scanning sessions. One of the scanning sessions consisted only of retinotopic mapping scans, and the other consisted of experimental scans and localizer scans. One observer was unable to complete the study, and data from another observer were unusable because of excessive head motion and were excluded from analyses, leaving six observers in the analyses.
Two overlapping surfaces of limited-lifetime dots were presented peripherally within a circular aperture (centered at 4° above and 7° to the left or right of fixation) on a black background. The diameter of the aperture was 6°, and the overall dot density within the aperture was 2.65 dots/deg². The fields were distinguished by their direction of motion, such that one field was moving upward at an average of 2.0°/s and the other remained nearly static (a small amount of threshold horizontal motion energy was sometimes added to permit the speed-discrimination task; see below). The fields also differed in color, such that if one field was red the other was green (Fig. 1A). Stimuli for all experiments were created with MATLAB software (MathWorks) and presented with the Psychophysics Toolbox (Brainard 1997; Pelli 1997).
During practice in the laboratory, the stimuli were generated and displayed via a Dell Inspiron 530 desktop computer and presented on a 41-cm ViewSonic 690fB CRT monitor. During threshold measurements in the scanner and during fMRI data acquisition, the stimuli were generated with a Dell Latitude D610 laptop and back-projected onto a fiberglass screen via an Epson Powerlite 7250 projector.
Figure 1B outlines the procedure for a given trial. During lab practice, observers performed blocks of two-alternative forced-choice (2AFC) discrimination trials on either the speed or color of the moving or static surface. As either surface could be red or green and the stimuli could be presented in either the left or right visual field, a total of 16 condition blocks were necessary to include all possible combinations of attended stimulus and task (2 surfaces × 2 tasks × 2 color combinations × 2 locations). Practice was distributed over 2 days.
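The count of 16 practice blocks follows from fully crossing the four two-level factors. As a minimal illustration (written in Python rather than the MATLAB used for the experiment; the label strings are placeholders and not the authors' condition names), the enumeration might look like this:

```python
from itertools import product

# Factor levels as described in the text. The label strings are
# illustrative placeholders, not names taken from the original code.
surfaces = ["moving", "static"]
tasks = ["speed", "color"]
color_combinations = ["moving-red/static-green", "moving-green/static-red"]
locations = ["left", "right"]

# Fully crossed design: one practice block per combination.
practice_blocks = list(product(surfaces, tasks, color_combinations, locations))

print(len(practice_blocks))  # 16
```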
Within a single practice block, consisting of 64 trials, the attended surface, color combinations, and the task being performed remained constant. A trial began with a brief fixation period, followed by two 1,400-ms stimulus intervals separated by a 200-ms interstimulus interval. Between the intervals, the color and speed of both surfaces independently varied; if observers performed the wrong task or attended to the wrong stimulus, performance would be at chance (50%). After the second interval, observers responded by pressing the key corresponding either to the interval that contained the faster-moving dots (if discriminating speed) or the interval when the color of the dots was chromatically more red or chromatically more green (if discriminating color). To permit speed discrimination of the static surface, a small amount of horizontal motion energy (a few pixels per interval) was added during one of the intervals. Color was varied for each surface by adding or subtracting RGB increments within a predetermined range, so that the perceived chromaticity of the surfaces never overlapped but remained within the perceptual domains of green and red, respectively. The magnitude of the change in the task-relevant feature of the attended surface was determined by a three-down one-up staircase and was closely matched by the change magnitudes in the unattended features. Responses were collected with the number pad on a standard keyboard. Immediately afterwards, accuracy feedback was provided by changing the color of the fixation mark (green = correct, red = wrong).
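The three-down one-up rule can be sketched as follows. This is an illustrative sketch only: the step size, the floor, and the function name are assumptions, because the text does not specify them. A rule of this kind settles at the stimulus level where three consecutive correct responses are as likely as not, that is, where p^3 = 0.5, or roughly 79% correct.

```python
def update_staircase(level, streak, correct, step=0.1, floor=0.0):
    """Return (new_level, new_streak) after one trial of a 3-down 1-up rule.

    level   -- current increment in the task-relevant feature
    streak  -- number of consecutive correct responses so far
    correct -- whether the current response was correct
    step and floor are hypothetical values chosen for illustration.
    """
    if correct:
        streak += 1
        if streak == 3:
            # Three correct in a row: make the task harder.
            return max(level - step, floor), 0
        return level, streak
    # Any error: make the task easier and reset the streak.
    return level + step, 0


# Accuracy targeted by the rule: the level at which p ** 3 == 0.5.
target_accuracy = 0.5 ** (1 / 3)
print(round(target_accuracy, 3))  # 0.794
```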
PSYCHOPHYSICAL THRESHOLD MEASUREMENTS.
In an effort to equate difficulty across conditions, we estimated psychophysical thresholds for each subject and condition. Psychophysical thresholds were measured in the scanner prior to the fMRI experiment. While lying in the bore of the scanner, observers performed the exact same experimental blocks that were performed in the lab. Responses were collected with a magnet-compatible fiber-optic key-press device.
After the scanner psychophysics sessions were completed, Weibull functions were fit to the data with the use of a maximum likelihood procedure to estimate the speed or color increment that would produce 79% correct performance during each condition. These thresholds were then implemented in the experimental fMRI session; implemented thresholds were averaged across visual fields but not across the color of the attended surface, which could be either red or green.
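As a rough illustration of this step, the sketch below fits a two-parameter Weibull function by maximum likelihood and reads off the increment that yields 79% correct. Several choices here are assumptions rather than details from the text: the particular Weibull parameterization, the fixed 50% guess rate, the grid search used in place of a numerical optimizer, and the example counts, which are invented.

```python
import math


def weibull(x, alpha, beta, guess=0.5):
    """Probability correct at increment x (2AFC, guess rate 0.5 assumed)."""
    return guess + (1 - guess) * (1 - math.exp(-((x / alpha) ** beta)))


def neg_log_likelihood(params, data):
    """Binomial negative log-likelihood; data holds (x, n_correct, n_total)."""
    alpha, beta = params
    nll = 0.0
    for x, n_correct, n_total in data:
        # Clip to avoid log(0) at the extremes.
        p = min(max(weibull(x, alpha, beta), 1e-6), 1 - 1e-6)
        nll -= n_correct * math.log(p) + (n_total - n_correct) * math.log(1 - p)
    return nll


def fit_weibull(data, alphas, betas):
    """Grid-search maximum likelihood estimate; returns (alpha, beta)."""
    grid = ((a, b) for a in alphas for b in betas)
    return min(grid, key=lambda ab: neg_log_likelihood(ab, data))


def threshold(alpha, beta, target=0.79, guess=0.5):
    """Increment at which the fitted function reaches the target accuracy."""
    return alpha * (-math.log((1 - target) / (1 - guess))) ** (1 / beta)


# Invented example counts: (increment, number correct, number of trials).
example_data = [(0.25, 27, 50), (0.5, 31, 50), (1.0, 41, 50),
                (1.5, 47, 50), (2.0, 49, 50)]
alpha_grid = [0.5 + 0.1 * i for i in range(16)]
beta_grid = [1.0 + 0.5 * i for i in range(7)]
alpha_hat, beta_hat = fit_weibull(example_data, alpha_grid, beta_grid)
print(threshold(alpha_hat, beta_hat))
```

A grid search keeps the sketch dependency-free; in practice a numerical optimizer would usually replace it.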
fMRI: EXPERIMENTAL SESSION.
Four main conditions were included. These were products of two independent variables (attended surface, task) with two levels each. Observers attended to either the moving or static field of dots and performed either a speed-discrimination or color-discrimination task by attending to the appropriate feature of the stimulus.
The experimental session consisted of eight experimental and two localizer scans. Each experimental scan lasted 5 min, 36 s and consisted of twelve 20-s condition blocks preceded by 8-s fixation intervals. During the last 3 s of each fixation interval, the condition of the upcoming block was cued by changing the color of the fixation mark (green: attend to greenish surface, color task; red: attend to reddish surface, color task; yellow border: attend moving surface, speed task; yellow fill: attend static surface, speed task). Each block was made up of five trials of the cued condition. Each trial proceeded exactly as described above, but the increment change of each feature (attended and unattended) between intervals was constant across trials and determined by the estimated thresholds. Each condition was repeated 3 times during each scan and 24 times in total (12 repetitions per hemifield). Stimuli were always presented in one side of the visual field within each scan and in alternating sides between scans. The order of conditions within and across scans was arranged so that each condition succeeded the other conditions as equal a number of times as possible.
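The scan duration and repetition counts above can be checked with a few lines of arithmetic. The variable names are illustrative, and the 4-s trial duration is only implied by the text (a 20-s block divided into five trials), not stated in it.

```python
block_s = 20          # duration of one condition block (s)
fixation_s = 8        # fixation interval preceding each block (s)
blocks_per_scan = 12
n_conditions = 4      # 2 attended surfaces x 2 tasks
n_scans = 8
trials_per_block = 5

scan_s = blocks_per_scan * (fixation_s + block_s)
minutes, seconds = divmod(scan_s, 60)

reps_per_scan = blocks_per_scan // n_conditions
total_reps = reps_per_scan * n_scans
reps_per_hemifield = total_reps // 2

# Implied, not stated in the text: time available per trial within a block.
trial_s = block_s / trials_per_block

print(minutes, seconds)                              # 5 36
print(reps_per_scan, total_reps, reps_per_hemifield)  # 3 24 12
print(trial_s)                                        # 4.0
```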
DEFINING REGIONS OF INTEREST.
The fMRI experimental sessions also included a localizer scan that was repeated twice. These scans were designed to localize both MT+ and regions of cortex that responded preferentially to chromatic over isoluminant gray stimuli. The scan alternated between 20 s of a blank screen, 20 s of coherently moving dots, 20 s of static dots, and 20 s of static chromatic dots. The stimulus was always isoluminant and restricted to the areas of visual space stimulated in the experimental scans.
Visual area MT+ was defined as the voxels showing a greater response to moving than to isoluminant static dots at a Bonferroni-corrected threshold. Only those voxels exceeding the threshold and located near the typical anatomical location of MT+ [posterior to the intersection of the lateral occipital sulcus and the inferior temporal sulcus (Watson et al. 1993)] were considered.
Color-selective regions were defined as the voxels responding more to chromatic than to isoluminant gray dots at a Bonferroni-corrected threshold. With this contrast, the most significant voxels were consistently located on the ventral surface of the occipital cortex and corresponded to a subregion of V4, which was defined in a separate retinotopy session (see below). This localizer could only reliably define color-selective regions in four of the six observers (possibly because of factors related to vascular interference in this region of cortex; see Winawer et al. 2010).
On a separate day prior to the experimental scanning session, we defined visual areas V1, V2, V3, V4, and V3A, using standard retinotopic mapping procedures. Once defined, we restricted the regions of interest (ROIs) within each of these regions to the area of visual space stimulated during the experimental scans. In light of recent experimental evidence, we assumed a full hemifield representation in V4 (Winawer et al. 2010).
fMRI DATA ACQUISITION.
fMRI data were acquired in a Philips 3T scanner at the Diagnostic Imaging Science Center at the University of Washington. Functional images were acquired with an echo planar sequence and an 8-channel head coil. We used a repetition time of 2 s and echo time of 30 ms. Thirty-two axial slices (64 × 64 matrix, 220-mm field of view, 0.5-mm gap) were collected per volume (voxel size: 3.5 × 3.4 × 3.4 mm). Anatomical images were acquired with a standard T1-weighted gradient echo pulse sequence. All preprocessing (anatomical-functional coregistration, conversion to standardized Talairach space, slice-scan time correction, motion correction, and linear trend removal) was performed with BrainVoyager. Subsequent analyses were carried out with custom MATLAB code.
fMRI DATA ANALYSIS.
Preprocessed experimental time courses were imported into MATLAB for analysis. In experiment 1, individual voxel time courses within ROIs were averaged together to produce a mean time course for each ROI per scan. These time courses were segmented into their constituent blocks and normalized to percent signal change from the average of the last two time points of the preceding fixation period. Normalized time points 4–10 were then averaged together for each block and across blocks of the same condition to produce one summary data point per ROI per condition.
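The block-wise normalization can be sketched as follows. This is not the authors' MATLAB code: the function names are hypothetical, and the sketch assumes that "time points 4–10" are 1-indexed and inclusive within each 10-point block (20 s at a repetition time of 2 s).

```python
def percent_signal_change(fixation, block):
    """Express a block relative to the last two fixation time points."""
    baseline = sum(fixation[-2:]) / 2.0
    return [100.0 * (value - baseline) / baseline for value in block]


def block_summary(fixation, block, first=4, last=10):
    """Mean percent signal change over time points first..last (1-indexed)."""
    psc = percent_signal_change(fixation, block)
    window = psc[first - 1:last]
    return sum(window) / len(window)


def condition_summary(blocks):
    """Average the block summaries of one condition.

    blocks -- list of (fixation, block) pairs of ROI-averaged signal values.
    """
    values = [block_summary(fixation, block) for fixation, block in blocks]
    return sum(values) / len(values)
```

For example, a block whose signal sits 2% above the preceding fixation baseline throughout time points 4–10 yields a summary value of 2.0.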
During the experimental scanning sessions, eye position was monitored with an ASL LRO6 eye tracking system to ensure that observers maintained fixation for the duration of the scan and that any potential differences in signal change between conditions could not be accounted for by differences in eye movements. Eyenal (version 2.93) was used to convert the data into a text file, and custom MATLAB code was then used to determine whether the extent of gaze deviation differed significantly between the experimental conditions. Because of problems with lens refraction, these measurements were noisy in three of the six observers and could not be analyzed reliably. The data from the remaining three observers were analyzed in two different ways. First, the proportion of total frames during which eye position deviated by more than a criterion distance of 1° from the center of the fixation mark was computed for each observer. This proportion did not differ between the four main conditions for any of the three data sets that were analyzed. Second, we compared mean horizontal eye position for when the stimulus was presented on the right versus when it was presented on the left. Fixation position did not vary as a function of which hemifield was stimulated, and thus attended, in any of the three observers. We also looked for, and failed to find, a difference in mean horizontal eye position between the two stimulation intervals.
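The first of these two measures can be sketched as below. The sketch assumes gaze samples already converted to degrees of visual angle relative to the fixation mark, and it treats a frame as deviating when its distance exceeds the 1° criterion; both the function name and the data layout are hypothetical.

```python
import math


def proportion_off_fixation(gaze_xy, criterion_deg=1.0, center=(0.0, 0.0)):
    """Fraction of frames whose gaze position lies beyond the criterion.

    gaze_xy -- list of (x, y) gaze positions in degrees of visual angle.
    """
    off = sum(
        1 for x, y in gaze_xy
        if math.hypot(x - center[0], y - center[1]) > criterion_deg
    )
    return off / len(gaze_xy)


# Invented samples: two within 1 deg of fixation, two beyond it.
samples = [(0.0, 0.0), (0.5, 0.5), (1.2, 0.0), (0.0, -2.0)]
print(proportion_off_fixation(samples))  # 0.5
```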
We monitored behavioral performance during the imaging experiment. Overall, observers' performance across conditions matched well with their estimated thresholds. Average percent correct was 81.96% and 79.70% when the moving surface was attended and speed- and color-discrimination tasks were performed, respectively. When the static surface was attended, average percent correct was 80.83% and 77.25% for speed- and color-discrimination tasks. Performance did not significantly vary as a function of task-surface combination (1-way ANOVA, P = 0.34) or as a function of surface color (1-way ANOVA, P = 0.65). Performance data for individual observers are presented in Table 1.
fMRI data were collected while observers performed speed- and color-discrimination tasks at threshold difficulty. Eight scans were collected for each observer; during four of these the stimuli were always presented in the left visual field and during the other four in the right visual field. Results did not differ as a function of visual field or color of the attended surface, so data were collapsed across hemispheres and surface color. Figure 2 shows percent signal change for all conditions in all ROIs, averaged across subjects. Error bars indicate 1 SE.
To test for any effects of attended surface and task, we conducted a two-way repeated-measures (2WRM) ANOVA on the percent signal change in each ROI (V1, V2, V3, V4, V3A, and MT+). A significant effect of task was found in MT+ (P < 0.0005, n = 6) but not in any of the other ROIs. Percent signal change from baseline was significantly higher in MT+ during speed-discrimination blocks than during color-discrimination blocks. Surprisingly, we found no modulation by attended surface in MT+ [contrary to previous findings (O'Craven et al. 1997)] or in any other ROI. Percent signal change in MT+ was similar during the “attend moving” and “attend static” conditions, in terms of both magnitude and difference between the task conditions (Fig. 3A); the difference in MT+ response between speed task and color task conditions was slightly larger when attention was directed to the static surface (not significant), which might be due to noise or a small task difference between conditions. The task during the speed task/attend static condition is essentially a motion-detection task (during which interval was motion present?), whereas it is a discrimination task in the other three conditions. The overall task-driven response pattern, regardless of which surface was attended, was largely consistent across observers, despite some variability in overall signal change magnitude. Interestingly, MT+ showed a pattern of responses in the unstimulated hemisphere similar to that in the stimulated hemisphere (Fig. 3B): we analyzed the data from MT+ when observers were attending to a stimulus isolated to the ipsilateral visual field and found a significant effect of task (P < 0.0005, n = 6) and no effect of attended surface. The only reliable difference between the stimulated and unstimulated hemispheres was in terms of overall signal change magnitude.
We expected to find the complementary task dependence in area V4 (color discrimination > speed discrimination). Corbetta et al. (1991) found foci of activation in an area roughly corresponding to V4 when observers performed a color-discrimination task. In our experiment, all the ROIs were defined by finding the intersection between retinotopic areas and the most significant voxels with the motion > static localizer contrast. This was done to restrict the data to the areas of visual cortex corresponding to the stimulated region of the visual field. We performed a second localizer contrast (chromatic > isoluminant gray) that isolated color-selective voxels located on the ventral surface of the occipital cortex, which corresponded to a contiguous subregion of V4. We analyzed the data restricted to these explicitly color-selective voxels and found a trend toward an activation pattern similar to that of MT+, but with the complementary task dependence (color > speed; Fig. 4). This trend did not reach significance in the stimulated hemisphere (P = 0.056, n = 4). The localizer contrast was reliable in only four of the six observers, and the analysis was restricted to those four.
The results of experiment 1 indicate that task is the primary factor modulating the responses in MT+ and V4 and that the physical properties of the attended stimulus have little or no effect. This raises the question of whether task and stimulus properties are completely independent factors or interact in some way. To address this question we designed experiment 2 to have multiple levels of stimulus density, including no stimulus. It is possible that preparing for a particular discrimination task—in the absence of any stimulus—may change responses in a similar manner as the act of actually performing the task. Chawla et al. (1999) found increases in the response of areas MT+ and V4 when a preferred stimulus feature (speed, color) was cued for discrimination relative to a nonpreferred feature; similar effects were found by Puri et al. (2009) in the fusiform face area and the parahippocampal place area for faces and places, respectively, and by Giesbrecht et al. (2006) for color and location. However, Shulman et al. (2002) did not find any differential effects between cuing for a color- or motion-discrimination task within any region of occipital cortex during a preparatory period but did find differential effects within MT+ during the subsequent discrimination period. A more recent study also failed to find any effects of preparing for a color or motion task (McMains et al. 2007). Given the ambiguous nature of previous results, we cannot determine the extent to which the task-driven modulations found in experiment 1 and in previous studies were due to anticipation effects and the extent to which they were dependent on stimulation. Experiment 2 was designed to address this possibility—using an event-related design with 12 different conditions, independently varying stimulus density, attended side, and task.
Materials and methods.
Seven observers were included in experiment 2 (4 men, 3 women), ranging in age from 24 to 31 yr. Four of the observers also participated in experiment 1, one of whom was E. Runeson. All subjects again gave written and informed consent to participate in protocols reviewed and approved by the human subjects Institutional Review Board at the University of Washington, had normal or corrected-to-normal vision, and were compensated $20/h. As in experiment 1, each observer took part in two 1-h sessions in the lab for practicing the experimental conditions, two 1-h psychophysics sessions in the scanner, and two scanning sessions. Both sessions contained a mix of experimental, retinotopic mapping, and localizer scans. Data from one of the seven observers were excluded from analyses because of unreliable ROI definitions.
Within each quadrant of the visual field, an identical number of limited-lifetime moving dots occupied a circular aperture (5° diameter, centered 7° to the left/right of fixation, 4° above/below fixation; Fig. 5A). Each dot occupied 0.25° of visual angle and was randomly replotted every 100 frames or when it exceeded the aperture bounds. The color (reddish) and speed (average: 4.5°/s) of the dots were identical within each aperture but varied between apertures. In all phases of the experiment, the dots moved upward.
The machines and monitors used in the various phases of experiment 2 were the same as those used in experiment 1.
Each observer began the experiment by participating in a 1-h practice session in the lab, intended to produce familiarity with the task and to estimate appropriate initial parameters for a staircased threshold-estimation procedure later completed in the scanner.
During the imaging phase, there were 12 main conditions. These were products of three independent variables (stimulus density: 3 levels; task: 2 levels; spatial attention: 2 levels). On a given trial, each quadrant aperture contained 0, 3, or 10 dots, while observers performed either a speed- or color-discrimination task between the two apertures in either the left or right visual field. The observers' task was to determine which of the two apertures contained dots moving at a higher velocity (speed task) or appearing more reddish (color task). The 0-dot conditions were not included during practice or threshold measurements. Figure 5B outlines the procedure for a given trial.
Practice consisted of 8 blocks of 80 trials each (1 block per condition), during which observers performed the appropriate discrimination task. Each trial began with a 500-ms cue interval separated from a single stimulus interval by 300–700 ms (randomly varied so that stimulus onset could not be predicted). The stimulus appeared on the screen for 750 ms and was followed by a 1,000-ms response window. Observers indicated which aperture on the attended side contained either the faster-moving dots (if performing speed discrimination) or the dots with the more reddish hue (if performing color discrimination). Feedback was given, and the difficulty of the next trial was determined by a three-down one-up staircase. The intertrial interval lasted, on average, 1,250 ms. The magnitude of the between-aperture difference in the untracked dimension on the attended side was kept similar to the magnitude of the attended dimension on a trial-by-trial basis; both dimensions in the unattended hemifield were modulated in a similar fashion. The “correct” aperture on each trial was independent between hemifields and feature dimensions, such that if the observer was performing the wrong task or attending to the wrong hemifield performance would be at chance (50%).
PSYCHOPHYSICAL THRESHOLD MEASUREMENTS.
As explained for experiment 1, it was important to equate the difficulty of the various task conditions. To this end, we again estimated the difference increment on each dimension that yielded threshold accuracy. Observers completed blocks of single-condition trials while lying in the bore of the scanner and responded with a fiber-optic key-press device. The initial values for each staircase were loosely based on performance during the practice session. The procedure was otherwise identical to the lab practice. As in experiment 1, Weibull functions were fit to the data by using a maximum likelihood procedure to estimate the speed or color difference that would produce 79% correct performance during each block of trials. These threshold differences were then averaged across visual fields and implemented in the experimental fMRI session.
On a separate day before scanning, each participant spent ∼1 h in the lab practicing the fMRI version, until performance on each condition was reliably above chance and the participant felt comfortable with the procedure. The fMRI experiment differed from the psychophysical sessions in several ways. First, the task on each trial during imaging was to decide whether the two stimuli in the attended hemifield were the same or different in the attended dimension. Second, the magnitude of any color or speed differences between apertures was determined by the estimated psychophysical thresholds and remained constant across trials and scans. Third, an event-related design was used in which every trial type was interleaved during a single scan. For analysis purposes, it was also necessary to include blank trials, during which only the fixation mark remained on the screen and no attention-directing cue appeared. Finally, participants did not receive feedback after responding.
Each participant took part in two fMRI sessions over two separate days. On the first day, four event-related scans were administered, along with two spot localizer scans. The event-related scans contained multiple repetitions of the 12 unique conditions described above, along with blank trials, presented in sequences designed for efficiency in the time series deconvolution procedure described below. Each event-related scan contained 128 trials (32 blank trials, 8 repetitions of each of the 12 conditions) and followed the sequence outlined in Fig. 5B. On each trial, the participant foveated the fixation mark, followed the instructions provided by the attention cue (red left arrow: attend left, color task; red right arrow: attend right, color task; yellow left arrow: attend left, speed task; yellow right arrow: attend right, speed task), viewed the stimulus, and made a response with a magnet-compatible fiber-optic key-press device.
The second session also consisted of four event-related scans, as well as two standard retinotopic mapping scans (rotating wedge and expanding ring). Across the 2 days, the data from each participant included 64 trials of each condition.
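The trial counts given above can be checked with a short sketch. The factorization of the 12 conditions into 2 tasks × 2 attended sides × 3 stimulus densities is an assumption inferred from the four cue types and the 0-, 3-, and 10-dot conditions reported in the results; the variable names are hypothetical.

```python
from itertools import product

# Assumed factorization of the 12 conditions.
tasks = ["speed", "color"]
sides = ["left", "right"]
densities = [0, 3, 10]          # stimulus density levels reported in the results
conditions = list(product(tasks, sides, densities))

reps_per_scan = 8               # repetitions of each condition per scan
blank_trials = 32               # blank trials per scan
scans_per_session = 4
sessions = 2

trials_per_scan = blank_trials + reps_per_scan * len(conditions)
trials_per_condition = reps_per_scan * scans_per_session * sessions
```

Under these assumptions each scan contains 128 trials and each condition is sampled 64 times across the two sessions, matching the figures in the text.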
The initial fMRI experimental session included two repetitions of the same localizer scan utilized in experiment 1. The procedures for defining ROIs were identical to those for experiment 1.
fMRI DATA ACQUISITION.
fMRI data were acquired in the same Philips 3T scanner with the same sequence and head coil as in experiment 1. However, the acquisition parameters were slightly different. Data were acquired with a repetition time of 1 s and echo time of 22 ms. Eighteen axial slices (64 × 64 matrix, 220-mm field of view, no gap) were collected per volume (voxel size: 3.4 × 2.75 × 2.75 mm). Preprocessing steps were performed with BrainVoyager, and custom MATLAB code was used for subsequent analyses. Time series were low-pass filtered and normalized by subtracting and dividing by the mean.
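The normalization step (subtracting and dividing by the mean) can be sketched as below. Scaling by 100 to express the result as percent signal change is an assumption, as are the function name and the made-up input values; the low-pass filter is omitted because its parameters are not given.

```python
def percent_signal_change(ts):
    """Subtract the mean and divide by the mean, scaled to percent (assumed)."""
    mean = sum(ts) / len(ts)
    return [100.0 * (v - mean) / mean for v in ts]


raw = [990.0, 1000.0, 1010.0, 1000.0]   # made-up values in scanner units
psc = percent_signal_change(raw)
```

After this step the time series has zero mean, so deviations can be compared across voxels with different baseline intensities.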
fMRI DATA ANALYSIS.
For each scan separately, hemodynamic responses (HDRs) to each of the 12 conditions were estimated by deconvolving the time course by the pseudoinverse of the design matrix (Dale 1999). We chose not to prewhiten the time series prior to deconvolution because we found that the remaining temporal autocorrelations after low-pass filtering were minimal, and that the choice of method for prewhitening can have significant effects on the results. Since our design matrix was counterbalanced, we do not expect any remaining temporal autocorrelations to cause any systematic biases in our estimated responses across conditions.
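The deconvolution step can be illustrated with a small finite impulse response model: the design matrix has one column per condition and lag, and the estimated responses are the pseudoinverse of that matrix applied to the time course. This is a sketch under stated assumptions, written in Python with NumPy rather than the authors' MATLAB code; the function names, onset times, and the two-condition toy data are hypothetical.

```python
import numpy as np


def fir_design_matrix(onsets, n_timepoints, n_lags):
    """One column per (condition, lag); a 1 marks lag k after each onset."""
    n_cond = len(onsets)
    X = np.zeros((n_timepoints, n_cond * n_lags))
    for c, cond_onsets in enumerate(onsets):
        for t0 in cond_onsets:
            for k in range(n_lags):
                if t0 + k < n_timepoints:
                    X[t0 + k, c * n_lags + k] = 1.0
    return X


def deconvolve(y, X, n_lags):
    """Least-squares HDR estimates: pseudoinverse of X applied to the time course."""
    betas = np.linalg.pinv(X) @ y
    return betas.reshape(-1, n_lags)    # rows: conditions, columns: lags


# Toy example: two conditions with known responses and overlapping trials.
n_lags = 4
true_hdr = np.array([[0.0, 1.0, 2.0, 1.0],
                     [0.0, 0.5, 1.0, 0.5]])
onsets = [[0, 7, 13], [3, 10, 18]]      # onset volumes (hypothetical)
X = fir_design_matrix(onsets, n_timepoints=24, n_lags=n_lags)
y = X @ true_hdr.ravel()                # noise-free synthetic time course
est_hdr = deconvolve(y, X, n_lags)
```

Because the toy design matrix has full column rank and the data are noise free, the estimated responses match the generating responses even though trials overlap in time. With real data the estimates are least-squares fits, and no prewhitening is applied in this sketch, in line with the choice described above.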
The peak response of each HDR was calculated for each scan, visual area, and participant by averaging time points 5 and 6 after the onset of the attention cue. We were mainly interested in the initial part of the estimated response, because each subsequent time point would be increasingly contaminated by subsequent trials. Across conditions, the estimated HDRs were indistinguishable beyond time point 6. This procedure was carried out separately for voxels in the two hemispheres. For each participant, the peak responses were then averaged across scans to yield one summary point per ROI and condition. We averaged the summary points across participants to yield grand means. Each hemisphere was considered separately.
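The summary procedure above (peak per scan, mean across scans, then grand mean across participants) can be sketched as follows for a single ROI, hemisphere, and condition. The response values are made up, and the sketch assumes zero-based indexing from cue onset, since the text does not state whether time point 5 is the fifth sample or the sample 5 s after the cue.

```python
def peak_response(hdr, timepoints=(5, 6)):
    """Average the estimated HDR at the chosen time points (indexing assumed)."""
    return sum(hdr[t] for t in timepoints) / len(timepoints)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# hdrs[participant] -> one estimated HDR per scan (made-up values)
hdrs = {
    "P1": [[0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 1.2, 0.8],
           [0.0, 0.1, 0.2, 0.5, 0.8, 0.8, 1.0, 0.7]],
    "P2": [[0.0, 0.0, 0.2, 0.4, 0.6, 0.6, 0.8, 0.5],
           [0.0, 0.1, 0.2, 0.4, 0.7, 0.4, 0.6, 0.5]],
}

# One summary point per participant: mean of the per-scan peaks.
summary = {p: mean(peak_response(h) for h in scans) for p, scans in hdrs.items()}

# Grand mean across participants.
grand_mean = mean(summary.values())
```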
No eye tracking was performed during experiment 2. We intended to collect fixation data, but technical issues prevented us from doing so. However, the absence of significant fixation biases during experiment 1 alleviates, to some degree, any concerns about fixation biases playing a role in the outcome of experiment 2. The same observers whose fixation data were analyzed in experiment 1 participated in experiment 2, and the procedures were fairly similar across the two experiments.
Behavioral performance in experiment 1 was not significantly different between conditions and was always ∼80%. In experiment 2, however, this was not the case. Performance was in general higher during speed discrimination than during color discrimination (grand means: 72.8%, 62.5%). Five of six observers performed well below 80% on all four conditions. Four of six performed at least 10% better on their “best” condition than on their “worst” condition, and two of six had a disparity of at least 20%. Percent correct on the four conditions for all observers, collapsed across attended side, is represented in Table 2.
During scanning, responses were based on whether or not the stimuli within the two apertures were the same or different in the relevant dimension. However, during threshold estimation, responses were 2AFC, based on which aperture on the attended side contained either the fastest dots or the chromatically more red dots, depending on task (because thresholds are much easier to derive with 2AFC trials). This change, and the fact that no behavioral feedback was given during scanning, very likely explains why performance deviated from 80%, despite at least 1 h of practice on the scanner task in the lab shortly before scanning. Given that the goal was to control for task difficulty as a general confound, it was possible that the imaging results could have been in some way biased by the differential levels of performance across the conditions. To analyze whether or not this was likely, we computed the correlation between percent correct and BOLD percent signal change (24 data points: 4 conditions, 6 observers) and found that the latter could not be predicted by the former in any of our ROIs. Thus we have no strong reason to suspect that differences in behavioral performance produced spurious differences in BOLD responses.
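The control analysis above relates percent correct to BOLD percent signal change across observers and conditions. A minimal sketch of such a correlation is given below; the use of Pearson's r is an assumption (the text does not name the coefficient), and the four data points are placeholders rather than values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Placeholder values for illustration only (not data from the study).
percent_correct = [60.0, 65.0, 70.0, 75.0]
bold_change = [0.5, 0.3, 0.3, 0.5]

r = pearson_r(percent_correct, bold_change)
t = t_statistic(r, len(percent_correct))
```

In the analysis described above the same computation would run over 24 points per ROI (4 conditions, 6 observers), with the t value referred to a t distribution with 22 degrees of freedom.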
fMRI RESPONSES—EFFECTS OF TASK.
In experiment 1, we found a significant main effect of task in area MT+ in the hemisphere representing the attended visual field (“attended data”) as well as in the hemisphere representing the unattended visual field (“unattended data”). When participants performed a speed-discrimination task, responses were higher than when they performed a color-discrimination task. The results from experiment 2 for MT+, averaged across participants, are shown in Fig. 6, A and B. As in experiment 1, we found that fMRI responses in area MT+ were larger during the speed-discrimination task than during the color-discrimination task, but only for the 3- and 10-dot conditions (separate t-tests conducted for each stimulus density condition: 0 dots: P = 0.771; 3 dots: P = 0.015; 10 dots: P = 0.028). There was no overall main effect of task in area MT+, presumably because of the null result for the 0-dot condition (main effect of task: P = 0.07, 2WRM ANOVA), and no significant interaction between task and stimulus density. Somewhat surprisingly, the main effect of task in responses to the unattended stimulus was highly significant in MT+ (P < 0.001, 2WRM ANOVA), although it too depended on stimulation (0 dots: P = 0.596; 3 dots: P = 0.003; 10 dots: P = 0.032).
We found a trend toward significance in the color-selective voxels (subregion of V4) in experiment 1 as a function of task (color > speed; Fig. 6, C and D). In experiment 2, we once again defined voxels within V4 that were color selective on the basis that they responded more to a chromatic field of dots than to an isoluminant gray field of dots (Fig. 6, C and D). The main effect of task was significant in the attended data (P = 0.035, 2WRM ANOVA) but, as in MT+, depended on the presence of a stimulus (0 dots: P = 0.126; 3 dots: P = 0.009; 10 dots: P = 0.010). There was a trend toward significance from the unattended stimulus (P = 0.059, 2WRM ANOVA) and a significant effect during the high-density condition (0 dots: P = 0.684; 3 dots: P = 0.093; 10 dots: P = 0.006).
As in experiment 1, there were no effects of task in any of the other ROIs that we analyzed. V1, V2, and V3 were not modulated by task at any stimulation level, whether considered separately or as one homogeneous ROI.
fMRI RESPONSES—EFFECTS OF SPATIAL ATTENTION.
We calculated main effects of spatial attention by subtracting the mean percent signal change across all three stimulation conditions and the two task conditions in the hemisphere representing the unattended hemifield from the mean percent signal change in the attended hemifield. The effects of spatial attention were robust, showing significant modulations in all ROIs (V1: P = 0.013, V2: P = 0.05, V3: P = 0.006, V4: P = 0.01, MT+: P < 0.001). Percent signal change for the attended and unattended conditions is plotted in Fig. 7.
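The subtraction described above can be written out for a single ROI as follows. The dictionary layout, the function names, and the placeholder values (a constant 0.2 offset between hemispheres) are assumptions for illustration and are not data from the study.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def attention_effect(psc):
    """psc[(hemisphere, task, density)] -> percent signal change for one ROI."""
    cells = [(task, d) for task in ("speed", "color") for d in (0, 3, 10)]
    attended = mean(psc[("attended", t, d)] for t, d in cells)
    unattended = mean(psc[("unattended", t, d)] for t, d in cells)
    return attended - unattended


# Placeholder responses: the attended hemisphere is offset by a constant 0.2.
base = {0: 0.1, 3: 0.5, 10: 0.7}
psc = {}
for task in ("speed", "color"):
    for density, value in base.items():
        psc[("unattended", task, density)] = value
        psc[("attended", task, density)] = value + 0.2

effect = attention_effect(psc)
```

With a constant offset the effect equals that offset regardless of stimulus density, which is the pattern an additive baseline shift would produce.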
Scrutiny of Fig. 7 reveals two interesting findings. First, the effect of spatial attention appears to be well-represented by a simple baseline shift when a stimulus is present—the increase in percent signal change was the same for the 3- and 10-dot conditions. This is consistent with previous studies investigating the effect of spatial attention as a function of stimulus contrast (Buracas and Boynton 2007; Murray 2008). Second, the effect of spatial attention was considerably smaller when no stimulus was presented, indicating that spatial attention might interact with stimulation. This interpretation would not be compatible with an additive baseline shift model, but the 0-dot condition differed from the 3- and 10-dot conditions in other ways that may explain the discrepancy (see discussion).
We further subdivided the data by analyzing the effect of spatial attention for each task condition separately. Figure 8 shows the effects of spatial attention as a function of task and stimulus density for areas MT+ (Fig. 8A) and V4 (Fig. 8B). It is apparent from these plots that the effect of spatial attention is independent of what task is being performed and of stimulation level.
The results of experiment 1 replicate previous findings showing that task can modulate population-level responses in visual cortex (Beauchamp et al. 1997; Chawla et al. 1999; Corbetta et al. 1990, 1991; Huk and Heeger 2000). Specifically, the responses of populations containing a large proportion of neurons tuned to a particular feature (motion, color) increase when that feature is task relevant compared with when it is not. We found an increase in BOLD response in MT+ during speed-discrimination blocks relative to color-discrimination blocks and the complementary effect in color-selective voxels within V4. No significant modulation by task was found in areas V1, V2, V3, or V3A. Task-dependent modulation was also found in voxels representing the opposite visual hemifield (MT+), suggesting the operation of a feature-based gain mechanism that increases the responses of neurons tuned to a task-relevant feature, regardless of receptive field location (Saenz et al. 2002; Treue and Martinez-Trujillo 1999). This result corroborates that of Serences and Boynton (2007), who were able to classify the attended direction of motion presented in one visual hemifield based on responses to the other, unstimulated, hemifield. The importance of task in modulating the responses of specific neural populations was largely replicated in experiment 2, which also provided additional information about the timing of the effect. In experiment 1, tasks were performed in 20-s blocks; thus the data did not provide information about the time course of task-related modulation. It is possible that multiple trials were necessary for task-related signals to appear after switching from performing a block of one particular task to performing another. The well-documented detrimental effects of task switching on behavior support this possibility: the first trial after switching tasks almost always produces reduced performance, even with long intertrial intervals (Monsell 2003; Sohn and Carlson 2000). 
The event-related design of experiment 2, where tasks were interleaved, produced task-driven modulation regardless of frequent task switching, suggesting that task-related signals manifest more quickly than could be discerned by experiment 1. This result is consistent with those of Liu et al. (2003), who demonstrated rapid modulation differences in feature-selective sensory areas when attention was cued to “hold” on a currently attended preferred feature versus when attention was cued to “hold” on a nonpreferred feature. One previous study failed to find any effects of task on responses in visual cortex (Buracas et al. 2005). We speculate that the discrepancies between that study and our findings (and other studies showing similar effects) are due to methodological differences. First, Buracas et al. (2005) compared the effects of performing a speed-discrimination task to those of performing a contrast-discrimination task. If, as our results suggest, performing a task requiring information about a specific feature increases the response of neurons tuned to that feature, then performing a contrast-discrimination task may not modulate responses at all, since contrast might be thought of as a measure of stimulus intensity rather than a feature. Individual neurons display preferential tuning for stimulus attributes such as orientation, direction of motion, and spatial frequency. However, this is not the case for contrast; increasing contrast typically yields monotonically increasing neuronal responses. Second, Buracas et al. (2005) used a moving grating stimulus, containing only a single spatial frequency, whereas a moving field of dots contains a wide range of spatial frequencies (the Fourier spectrum of a point includes energy at all frequencies). Thus it is likely that our stimulus activated more neurons in general, leading to a larger signal-to-noise ratio in the population average response. 
Combined, these two factors could have led to the cloaking of a real effect of task in Buracas et al. (2005).
What are the possible neural mechanisms involved in instantiating the observed task-driven modulations in MT+ and V4? The most commonly proposed explanation is that a general response gain is applied to neurons selective for a task-relevant direction of motion (in the case of motion tasks). This response gain could also be accompanied by an increase in selectivity for task-relevant features. Serences and Saproo (2010) measured voxel-based tuning functions for orientation in early visual cortex as they varied the relative value of oriented gratings presented in the left and right visual fields. They found a sharpening of tuning functions in voxels tuned to one of the orientations when that stimulus became valuable (monetary reward for responding to the correct orientation). It is possible that such a mechanism contributes to the task-driven modulations observed here; as different features become “valuable,” or task relevant, neurons tuned to those features may sharpen their response profiles, allowing for increased discrimination sensitivity between directions of motion and colors (Serences et al. 2009; Shadlen et al. 1996).
Role of Stimulus Properties
O'Craven et al. (1997) found that responses in MT+ modulated as a function of surface selection. When attention was directed to a moving surface, responses were strongly increased relative to when observers attended to a static surface. However, we found no such effect in MT+ (or in any of the ROIs), even though the voxels were defined on the basis of being strongly modulated by a moving stimulus relative to a static stimulus. This discrepancy could be due to the absence of a task in the O'Craven study: observers were simply instructed to “attend” to one of the surfaces. In the absence of a controlled task, there are no grounds for ruling out the possibility that observers were simply more aroused or engaged with the stimulus when attending to the moving surface. The static stimulus in experiment 1 did contain a threshold-level amount of horizontal motion during some intervals; it could be argued that the equality of the selection conditions within a given task condition was due to motion energy being present in both surfaces. However, a pilot experiment performed in our lab (unpublished) with the same conditions and procedure as in experiment 1 (except the motion task/attend static condition) used a truly static surface (no small increments, as in experiment 1) and revealed that responses in MT+ did not depend on which surface was attended when a color task was performed; the responses were equal. These results indicate that the modulatory effect of task does not necessarily depend on the physical properties of the selected stimulus. It should be noted, however, that the stimulus used by O'Craven et al. was centered at fixation and had a lower overall density of dots than the stimuli used in our experiment 1 (0.48 dots/deg² vs. 2.65 dots/deg²). These factors may have allowed subjects in their experiment to attend more selectively to one field over the other.
However, we believe this explanation to be unlikely, as the high performance of our subjects indicates that they had no problem selecting the relevant stimulus.
On the basis of previous research it is unclear whether there are response increases (e.g., in MT) when a preferred stimulus (e.g., motion) is anticipated. Chawla et al. (1999), Giesbrecht et al. (2006), and Puri et al. (2009) demonstrated response increase during cue periods in areas tuned to the cued feature. However, neither Shulman et al. (2002) nor McMains et al. (2007) found any such modulations. Our experiment 2 suggests that the presence of a relevant stimulus might be necessary for task-driven modulation of population responses. The effect of task was only significant during conditions when a stimulus was presented on the screen. Although each trial was preceded by a cue indicating the task to be performed (except blank trials), cue-driven processing was not by itself sufficient to modulate responses. However, the 0-dot conditions also did not require a decision or a response. Therefore, we cannot rule out the possibility that task-specific decision-response processes play more or less of a role in modulating responses than stimulation. As the results stand, it is safe to say that some combination of stimulation, decision, and response is necessary for task-driven modulation, and not cue-driven signals related to preparing the neuronal circuitry for a particular task. Further studies are necessary to differentiate the relative importance of stimulation, decision, and response.
Spatial Attention, Task, and Stimulation
The design of experiment 2 allowed us to investigate whether the effect of spatial attention is dependent on task, stimulation density, or some combination of the two. Previous imaging studies have demonstrated that attending to a region of visual space increases the response of voxels selective for that region, independent of stimulus contrast (Buracas and Boynton 2007; Murray 2008). This is suggestive of a baseline shift in the responses of the underlying neurons tuned to the attended space, applied after any stimulus-related processing and multiplicative gain modulations (Boynton 2009, 2011). Our results are consistent with a baseline shift when a stimulus was present: responses to the attended side were larger than responses to the unattended side by the same amount regardless of whether a 3- or 10-dot stimulus was presented, in all ROIs. However, when no stimulus was presented the difference was much smaller. Interpretation of this result is complicated by differences between the 0-dot condition and the 3- and 10-dot conditions. First of all, there was no stimulus in the former case, and therefore, again, no decision and response were necessary. Consequently, there was less incentive for observers to maintain spatial attention at the cued location, instead of simply returning it to fixation in anticipation of the next cue (even though they were explicitly instructed to keep attention on the attended side until the next cue appeared). If observers attended inconsistently on 0-dot trials, this alone could have produced the apparent stimulus dependence. Therefore, we cannot rule out that the results are indeed consistent with previous studies showing a stimulus-independent baseline shift in responses from spatial attention.
The effect of spatial attention also appears to be independent of which task is being performed. The difference in response between the attended hemisphere and the unattended hemisphere did not vary with the two tasks implemented in experiment 2, even though performance was considerably higher during speed-discrimination than color-discrimination trials (grand means: 72.8%, 62.5%; Table 2).
In this study we have measured the effects of attentional processes using only the BOLD signal and have assumed that increases in BOLD are coupled to increases in neural responses directly involved in attentional modulation (by task and spatial attention). However, a recent study that separately measured BOLD along with cerebral blood flow (CBF) found that CBF might be a more sensitive index of top-down attentional modulation than BOLD (Moradi et al. 2012). The authors found that directing attention to a visual stimulus in a peripheral location modulated the CBF response about twice as much as the BOLD response, relative to when the same stimulus was unattended. More research is necessary to understand the full relationship between attention, CBF, and neural activity, but it may be that relatively small top-down effects such as those observed here would be more robustly detectable with CBF as an index.
We can say with confidence that performing different tasks requiring different visual information systematically modulates responses across visual cortex. Specifically, our results are consistent with previous findings that a motion-related task increases responses in MT+ and a color-related task increases responses in V4. In general, it is likely that populations containing a large proportion of motion-tuned neurons are modulated by a motion task, and vice versa for color. However, motion-tuned populations (specifically MT+) do not seem to be modulated when attention selects a stimulus containing motion that is superimposed on a stimulus that is static, contrary to previous reports (O'Craven et al. 1997). Although our results suggest the possibility that stimulation might be necessary for task-driven modulation, rather than the act of task anticipation, limitations inherent in our method prevent us from drawing any strong conclusions regarding this point. Similarly, although spatial attention increased responses by a larger amount when a stimulus was present than when it was absent, we cannot rule out that the results are consistent with a baseline shift of attention, especially since the effect was independent of stimulus density when a stimulus was presented. However, it is clear that spatial attention does not interact with what task is being performed, suggesting that the neural mechanisms involved are independent. In sum, the results of this study indicate that manipulations of stimulus density, task type, and spatial attention produce patterns of responses in MT+ and V4 that are largely independent from each other. We are not aware of any previous studies demonstrating separability of these modulations.
This work was supported by a National Science Foundation CAREER award to S. O. Murray.
No conflicts of interest, financial or otherwise, are declared by the author(s).
Author contributions: E.R., G.M.B., and S.O.M. conception and design of research; E.R. performed experiments; E.R. analyzed data; E.R., G.M.B., and S.O.M. interpreted results of experiments; E.R. prepared figures; E.R. drafted manuscript; E.R., G.M.B., and S.O.M. edited and revised manuscript; E.R., G.M.B., and S.O.M. approved final version of manuscript.
We thank the staff at the Diagnostic Imaging Science Center at the University of Washington for their help in developing and implementing fMRI acquisition protocols.
- Copyright © 2013 the American Physiological Society