How is working memory for different visual categories supported in the brain? Do the same principles of cortical specialization that govern the initial processing and encoding of visual stimuli also apply to their short-term maintenance? We investigated these questions with a delayed discrimination paradigm for faces, bodies, flowers, and scenes and applied both univariate and multivariate analyses to functional magnetic resonance imaging (fMRI) data. Activity during encoding followed the well-known specialization in posterior areas. During the delay interval, activity shifted to frontal and parietal regions but was not specialized for category. Conversely, activity in visual areas returned to baseline during that interval but showed some evidence of category specialization on multivariate pattern analysis (MVPA). We conclude that principles of cortical activation differ between encoding and maintenance of visual material. Whereas perceptual processes rely on specialized regions in occipitotemporal cortex, maintenance involves the activation of a frontoparietal network that seems to require little specialization at the category level. We also confirm previous findings that MVPA can extract information from fMRI signals in the absence of suprathreshold activation and that such signals from visual areas can reflect the material stored in memory.
- functional magnetic resonance imaging
- prefrontal cortex
- parahippocampal gyrus
Whether encoding and maintenance of visual information are subserved by functionally homologous brain systems or governed by qualitatively different principles is one of the enduring questions of working memory research. Three main scenarios, which are not mutually exclusive, are conceivable. First, the same areas that have been shown to engage in the encoding of stimuli may also support their maintenance, with category-specific occipitotemporal areas active throughout the delay period. Second, a similar specialization for visual categories may exist in prefrontal cortex (PFC), a brain region that consistently shows sustained activity during visual working memory (VWM) delays (Courtney et al. 1998a; Curtis and D'Esposito 2003; Munk et al. 2002) and modulation by VWM load (Linden et al. 2003; Mayer et al. 2007). Third, maintenance may be mainly supported by areas outside the visual cortex, for example, PFC, but these do not show a similar category-specialization to the visual areas. We used a VWM task with different categories of objects to decide which model gives the most realistic picture of the relationship between encoding and maintenance processes in the human brain.
Functional imaging studies of higher vision have produced evidence of a remarkable degree of specialization of occipitotemporal cortex for visual categories. Broad lateral and ventral occipitotemporal areas (lateral occipital complex, or LOC; Malach et al. 1995) respond significantly more to intact relative to scrambled objects. Brain areas that respond highly selectively to particular stimulus classes include the fusiform face area (FFA) for faces (Kanwisher et al. 1997; Puce et al. 1996), the parahippocampal place area (PPA) for scenes or buildings (Aguirre et al. 1998; Epstein and Kanwisher 1998), and the extrastriate (EBA; Downing et al. 2001) and fusiform body areas (FBA) for bodies (Peelen and Downing 2005; Schwarzlose et al. 2005). Activity in these specialized areas seems to be tightly related to the analysis of exemplars of the respective categories (Epstein et al. 2005; Grill-Spector et al. 2004; Urgesi et al. 2004), although activity in the remainder of occipitotemporal cortex, after exclusion of the relevant specialized area, still shows category-selective patterns (Haxby et al. 2001; but see Spiridon and Kanwisher 2002).
There is conflicting evidence as to whether activity in these occipitotemporal areas is sustained throughout VWM delays. Some functional imaging studies found sustained activity in inferior temporal areas during the delay period (Ranganath and D'Esposito 2005; Sala et al. 2003), as would have been predicted from single-unit recordings (Miller et al. 1993; Nakamura et al. 1995). Conversely, our own study of VWM load effects showed that these were present in inferior temporal cortex during encoding and early delay but faded away with a long delay period of 12 s, with overall activity receding below baseline (Linden et al. 2003).
Such sustained activity during working memory (WM) retention periods does not have to reflect maintenance of information but could be related to stimulus expectation or response preparation. Stimulus expectation may start early in the delay and is difficult to rule out by experimental design, but the effect of response preparation can be minimized by using a long delay, as in the present study. Electroencephalographic research using the contingent negative variation has suggested that response preparation, rather than occurring across the whole delay, builds up toward the presentation of the expected stimulus (Klein et al. 1996, 1998). However, WM maintenance is a complex process, involving elements of mental imagery and long-term memory retrieval in addition to storage and rehearsal (Jackson and Raymond 2008), and all of these processes can contribute to sustained activity.
For PFC, a large body of evidence from electrophysiological recording, lesion, and anatomical connectivity studies in nonhuman primates points toward a differentiation along the lines of the separation of the posterior visual areas into a dorsal (“where”) and ventral (“what”) pathway (Goldman-Rakic 1987). Human functional imaging studies have indeed revealed specialization of dorsal areas in prefrontal and premotor cortex for spatial WM (Courtney et al. 1998b; Jackson et al. 2011) and ventral areas for VWM for faces (Courtney et al. 1997), objects (Munk et al. 2002), or color (Elliott and Dolan 1998; Mohr et al. 2006). How the differentiation of posterior areas for visual object categories would map onto this dorsoventral segregation of frontal activity is not straightforward to predict. One might assume that all of the category-sensitive areas are part of the ventral stream, which is concerned with the extraction of detail from complex visual stimuli (Goodale and Westwood 2004). Yet, Sala et al. (2003) found activity for houses in dorsal frontal areas, suggesting that the spatial component might be crucial to the processing of this particular category.
For the present study, we selected three categories with a stable representation in posterior cortex (faces, bodies, and scenes) and one further object category (flowers) for a delayed discrimination task. The task paralleled classic mapping tasks in that only stimuli from a single category were presented within any given trial. We furthermore applied both univariate and multivariate analyses. Univariate analysis reveals whether activity is higher during a task phase than during baseline, or higher in one condition than in another, and has been the classic way of identifying sustained and specialized brain activity. This approach has recently been complemented with multivariate analyses of WM delay data (Harrison and Tong 2009), with the aim of identifying the correlates of the stored information with higher sensitivity. We hypothesized that some brain areas, for example, frontal cortex, would show sustained activity during the delay but no category selectivity in either univariate or multivariate analysis, which could be interpreted as reflecting general executive WM functions. Conversely, other areas, for example, posterior cortex, would show category selectivity throughout the delay, possibly reflecting the storage of visual representations.
MATERIALS AND METHODS
Eighteen healthy (9 male, 9 female, mean age 27.5 yr, range 20–41 yr) right-handed volunteers with normal or corrected-to-normal vision were recruited from staff and students of the University of Wales Bangor. All participants gave informed consent, and experimental procedures were approved by the ethical board of the School of Psychology.
In each trial of the paradigm, participants had to memorize three sequentially presented grayscale exemplars of one of the four categories (faces, bodies, scenes, or flowers). Sample stimuli were presented for 1 s each, separated by 1 s of blank screen. After a 10-s delay, one exemplar from the same category was presented as test stimulus, and participants had to decide by button press whether it matched one of the sample stimuli (50% matches). Trials were separated by an intertrial interval of 9 s (Fig. 1). Eighty trials (20 per category) were presented in four runs in pseudorandomized order. Participants continuously heard task-irrelevant spoken text from a radio news program during the entire scanning runs to minimize verbal rehearsal. No task was associated with the presentation of these stimuli. Such presentation of unattended speech has been shown to disrupt verbal rehearsal, although not completely abolish it, in WM studies but to leave other cognitive operations largely intact (Hanley 1997; Hanley and Bakopoulou 2003; Salame and Baddeley 1982). We further aimed to minimize verbal recoding by using a large array (40 in each category) of unfamiliar stimuli. Stimulus presentation was controlled by a personal computer running the Presentation 7.1 software (Neurobehavioral Systems, Albany, CA). Images were backprojected on the center of a screen, subtending 5° of visual angle, and viewed by participants through an angled mirror mounted on the head coil.
Functional Imaging and Analysis
Functional magnetic resonance imaging (fMRI) data were acquired with a Philips Gyroscan Intera 1.5-Tesla system using a gradient echo echo-planar imaging sequence [repetition time (TR) = 1,000 ms; echo time (TE) = 50 ms; flip angle (FA) = 90°; acquisition matrix = 96 × 96; in-plane resolution = 2.5 × 2.5 mm²; 10 axial slices with 7-mm slice thickness, covering the occipital and inferior temporal cortex and most of frontal and parietal cortex (except the vertex and orbitofrontal cortex)]. A high-resolution T1-weighted three-dimensional (3-D) anatomical MR data set was used for coregistration (TR/TE = 11.5/2.95 ms; FA = 8°; coronal slice thickness = 1.3 mm; acquisition matrix = 256 × 256; in-plane resolution = 1 × 1 mm²).
Data were preprocessed with BrainVoyager 4.9 and further analyzed with BrainVoyager QX (Brain Innovation, Maastricht, The Netherlands) for the univariate analysis and with AFNI (Cox 1996) and custom Matlab routines for the multivariate analyses. Data preprocessing for the univariate analysis included slice scan time correction, 3-D motion correction, spatial smoothing with an 8-mm Gaussian kernel (full width at half-maximum), temporal high-pass filtering to remove low-frequency nonlinear drifts of three or fewer cycles per time course, and linear trend removal. Talairach transformation was performed for the complete set of functional data of each participant. For the multivariate analysis, the step of spatial filtering was dropped.
The statistical analysis of the variance of the blood oxygen level-dependent (BOLD) signal was based on the application of multiple regression analysis to time series of task-related functional activation (Friston et al. 1995), modeling the phases of the experiment with shifted predictors (Zarahn et al. 1997). We designed the general linear model (GLM) of the experiment considering the four stimulus categories (faces, bodies, scenes, and flowers) and four task phases (encoding, early delay, late delay, and retrieval, with each being assigned 5 s of the trial duration) as effects of interest. The corresponding predictors, obtained by shifting an ideal box-car response of 5-s duration by 4 s to account for the hemodynamic delay, were used to build the design matrix of the experiment. Only correct trials were considered. We chose the box-car model rather than a temporally smoothed hemodynamic response function to minimize overlap of predictors between task phases (Sack et al. 2002; and see correlation analysis below). However, to exclude the possibility that any null results may have been a product of this particular choice of hemodynamic model, we confirmed the absence of significant differences across categories in prefrontal areas during late delay with a model that used the smoothed response function obtained by Boynton et al. (1996). In the first-level analysis, the GLM was used to find a least mean squares solution for the beta weight of the predictors, which were then entered into a second-level random effects analysis.
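The construction of the shifted box-car predictors can be illustrated with a minimal Python/NumPy sketch. This is not the BrainVoyager implementation used for the analysis; the function name `boxcar_design`, the trial onsets, and the run length are hypothetical and serve only to show how four 5-s phases per trial, each shifted by 4 s, map onto columns of a design matrix.

```python
import numpy as np

TR_S = 1.0          # repetition time of the functional sequence
PHASE_LEN_S = 5     # each task phase is assigned 5 s of the trial
HRF_SHIFT_S = 4     # box-car shifted by 4 s for the hemodynamic delay
PHASES = ["encoding", "delay1", "delay2", "retrieval"]
CATEGORIES = ["faces", "bodies", "scenes", "flowers"]

def boxcar_design(trial_onsets_s, trial_categories, n_volumes):
    """Return an (n_volumes x 16) matrix of shifted box-car predictors."""
    columns = [(c, p) for c in CATEGORIES for p in PHASES]
    X = np.zeros((n_volumes, len(columns)))
    for onset, cat in zip(trial_onsets_s, trial_categories):
        for k, phase in enumerate(PHASES):
            start = int(round((onset + k * PHASE_LEN_S + HRF_SHIFT_S) / TR_S))
            stop = int(round(start + PHASE_LEN_S / TR_S))
            X[start:min(stop, n_volumes), columns.index((cat, phase))] = 1.0
    return X, columns

# Hypothetical two-trial run; onsets and run length are illustrative only.
X, columns = boxcar_design([0, 30], ["faces", "scenes"], n_volumes=60)
```

Because consecutive phases occupy adjacent, non-overlapping 5-s windows, no volume is assigned to more than one predictor within a trial, which is the property that motivated the box-car model.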
We performed a 4 × 4 (phase × condition) analysis of variance (ANOVA). Multisubject statistical maps were thresholded at an initial uncorrected P < 0.001. We then applied cluster-level correction (Forman et al. 1995). In this procedure, 1,000 statistical maps are randomly generated from a normal distribution, spatially smoothed to match the smoothness of the original data set, and thresholded. After each iteration, the surviving voxel clusters are tabulated, which ultimately results in a distribution of cluster sizes under the null hypothesis of no effect for the statistical test of interest. For each cluster in the original data, its P value was set at the number of iterations that yielded a cluster at least as large, divided by the number of iterations. Statistical results were visualized through projecting 3-D maps on axial slices of the volume obtained by taking the mean anatomical volume across participants.
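The logic of the cluster-level correction can be sketched as follows. This is a one-dimensional toy version under stated assumptions, not the procedure as implemented in the analysis software: clusters are runs of adjacent supra-threshold values rather than 3-D connected components, and the map length, smoothing kernel, and threshold are hypothetical.

```python
import numpy as np

def max_run_length(mask):
    """Size of the largest cluster (longest run of True) in a 1-D mask."""
    best = run = 0
    for above in mask:
        run = run + 1 if above else 0
        best = max(best, run)
    return best

def null_cluster_sizes(n_voxels, kernel, z_thresh, n_iter, rng):
    """Largest supra-threshold cluster in each of n_iter smoothed noise maps."""
    sizes = np.empty(n_iter, dtype=int)
    for i in range(n_iter):
        noise = np.convolve(rng.standard_normal(n_voxels), kernel, mode="same")
        noise /= noise.std()                 # restore unit variance after smoothing
        sizes[i] = max_run_length(noise > z_thresh)
    return sizes

def cluster_p_value(observed_size, null_sizes):
    """Fraction of iterations that yielded a cluster at least as large."""
    return np.count_nonzero(null_sizes >= observed_size) / null_sizes.size

rng = np.random.default_rng(0)
kernel = np.ones(5) / 5                      # hypothetical smoothing kernel
null_sizes = null_cluster_sizes(500, kernel, z_thresh=3.09, n_iter=1000, rng=rng)
p_small = cluster_p_value(1, null_sizes)     # a single-voxel "cluster"
p_large = cluster_p_value(50, null_sizes)    # a large cluster
```

The threshold of 3.09 approximates a one-tailed P < 0.001 for a standard normal variate; larger observed clusters can only receive equal or smaller P values.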
To quantify the potential carryover effects from encoding-related activity onto the delay 1 predictor, we computed (using SPSS version 15; IBM, Armonk, NY) correlations between an idealized response during the encoding phase only, convolved with the standard hemodynamic reference function, and the predictors for the different phases (which were shifted by 4 s). Correlations were high, as expected, for the encoding predictor (r = 0.745, P < 0.01) and for the delay 1 predictor (r = 0.487, P < 0.01), but close to 0 for the delay 2 predictor (r = −0.008, P = 0.831). This shows that activity captured by the delay 2 predictor was no longer confounded by encoding activity.
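A minimal sketch of this carryover check is given below in Python/NumPy. It assumes a single hypothetical 30-s trial and a gamma-shaped response in the style of Boynton et al. (1996) with illustrative parameter values, so it is not expected to reproduce the correlation coefficients reported above; it only shows the computation.

```python
import numpy as np

def gamma_hrf(t, n=3, tau=1.25, delay=2.5):
    """Unnormalized gamma-shaped impulse response.
    Parameter values are illustrative assumptions, not taken from the study."""
    s = np.clip(t - delay, 0.0, None)
    return (s / tau) ** (n - 1) * np.exp(-s / tau)

n_vol = 30                                   # one hypothetical 30-s trial at TR = 1 s
t = np.arange(n_vol, dtype=float)
encoding = np.zeros(n_vol)
encoding[0:5] = 1.0                          # 5 s of sample presentation
ideal = np.convolve(encoding, gamma_hrf(t))[:n_vol]

predictors = {}
for k, phase in enumerate(["encoding", "delay1", "delay2", "retrieval"]):
    box = np.zeros(n_vol)
    box[4 + 5 * k : 9 + 5 * k] = 1.0         # 5-s box-car shifted by 4 s
    predictors[phase] = box

# Correlation of the idealized encoding response with each phase predictor
r = {phase: np.corrcoef(ideal, box)[0, 1] for phase, box in predictors.items()}
```

Under these assumptions the correlation is largest for the encoding predictor, intermediate for delay 1, and smallest for delay 2, in line with the ordering described in the text.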
Each of the 72 runs (4 runs × 18 participants) of preprocessed data (see above) was converted to percent signal change by dividing the signal by 1% of the mean signal over the run. Responses (beta estimates) to each of the four categories times four phases were estimated using predictors similar to those in the univariate analysis (5-s box-car functions delayed by 4 s; see above) but were estimated separately for the data in runs 1 and 4 and in runs 2 and 3 to conduct split-half analysis (see below). Whole brain information maps were constructed for each participant by using a spherical “searchlight” mask (Kriegeskorte et al. 2006) as follows. Within each individual subject's brain, a voxel was chosen as the center of a sphere. The 100 voxels nearest to this center voxel (including the center voxel itself) were selected for subsequent analysis, excluding voxels from outside the brain. We used a fixed number of voxels, rather than a fixed radius, because variations in the number of voxels in the searchlight (e.g., around the edges of the brain) may cause different discrimination across conditions (Cox and Savoy 2003; Oosterhof et al. 2010).
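The two steps described here, scaling to percent signal change and selecting a fixed number of nearest voxels, can be sketched in Python/NumPy as below. The analysis itself used AFNI and custom Matlab routines; the function names and the toy 10 × 10 × 10 "brain" are hypothetical, and in real data the coordinates would be in-brain voxel positions in millimeters.

```python
import numpy as np

def percent_signal_change(run_data):
    """Divide a (time x voxels) run by 1% of each voxel's mean signal."""
    return run_data / (0.01 * run_data.mean(axis=0))

def searchlight_indices(coords, center_idx, n_voxels=100):
    """Indices of the n_voxels in-mask voxels nearest to the center voxel.
    The center itself has distance zero and is therefore always included."""
    d2 = np.sum((coords - coords[center_idx]) ** 2, axis=1)
    return np.argsort(d2, kind="stable")[:n_voxels]

# Toy example: a cubic mask of 1,000 voxels (illustrative only)
grid = np.array(np.meshgrid(range(10), range(10), range(10), indexing="ij"))
coords = grid.reshape(3, -1).T
sphere_corner = searchlight_indices(coords, 0)     # center at the edge of the mask
sphere_middle = searchlight_indices(coords, 555)   # center inside the mask

run = np.array([[90.0, 200.0], [110.0, 400.0]])
psc = percent_signal_change(run)
```

Both searchlights contain exactly 100 voxels, whereas a fixed-radius sphere centered on the corner voxel would contain far fewer voxels than one centered in the middle of the mask.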
For each of the four time phases separately, we computed an information score as follows. The beta estimates from the two split halves were correlated within and across conditions, yielding a 4 × 4 similarity matrix, where the (i, j)-th element is the correlation across split halves between category i and j (in the range [1..4], corresponding to bodies, faces, scenes, and flowers). The correlations in the similarity matrix were Fisher-transformed to make the data more normally distributed. An information score was derived from the resulting values by computing a weighted mean, where on-diagonal elements were weighted by 1 and off-diagonal elements were weighted by −⅓. This information score can be interpreted as a contrast of within vs. between pattern similarity: a value of zero means that the pattern of selected voxels shows no category-specific information, whereas positive values indicate the presence of category-specific information. The information score was assigned to the center voxel. This procedure was repeated for every voxel in the brain (i.e., each voxel is taken as the center of a sphere, and an information score was computed using the voxels surrounding the center voxel) and for each of the four time phases, resulting in four information maps.
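The information score for a single searchlight can be sketched in Python/NumPy as follows. The function names and simulated patterns are hypothetical, and the normalization is an assumption: the weighted sum is divided by the number of categories, so that the score equals the mean within-category minus the mean between-category Fisher-transformed correlation.

```python
import numpy as np

def score_from_similarity(z):
    """Contrast of within- vs. between-category similarity for an (n x n) matrix."""
    n = z.shape[0]
    weights = np.full((n, n), -1.0 / (n - 1))   # off-diagonal: -1/3 for n = 4
    np.fill_diagonal(weights, 1.0)              # on-diagonal: 1
    return np.sum(weights * z) / n              # normalization is an assumption

def information_score(half_a, half_b):
    """half_a, half_b: (n_categories x n_voxels) beta patterns of the two split halves."""
    n = half_a.shape[0]
    r = np.corrcoef(half_a, half_b)[:n, n:]     # element (i, j): category i vs. j across halves
    return score_from_similarity(np.arctanh(r)) # Fisher transform, then contrast

# Simulated searchlight of 100 voxels (illustrative data only)
rng = np.random.default_rng(0)
patterns = rng.standard_normal((4, 100))
half_a = patterns + 0.5 * rng.standard_normal((4, 100))
half_b = patterns + 0.5 * rng.standard_normal((4, 100))
score_signal = information_score(half_a, half_b)
score_noise = information_score(rng.standard_normal((4, 100)),
                                rng.standard_normal((4, 100)))
```

Because the weights sum to zero, a similarity matrix with identical entries yields a score of zero, whereas patterns that replicate across split halves only within a category yield a positive score.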
For group analysis, information scores for each voxel were tested against zero across participants with a t-test. To correct for multiple comparisons, residual accuracy maps were computed for each individual subject by subtracting the group average map from the individual map. The smoothness of these residual accuracy maps in x, y, and z directions was computed using AFNI's 3dFWHMx program (Cox 1996). The resulting estimated kernel widths in x, y, and z direction were averaged between subjects and then used for a Monte Carlo simulation (10,000 iterations) using AFNI's 3dClustSim program to assess significance of cluster sizes at an uncorrected threshold of P = 0.05.
The analysis above requires that the uncorrected threshold is specified a priori and depends on accurate estimates of smoothness. To assess whether these factors might bias the Monte Carlo simulation with respect to the estimated significance of weak but spatially extended clusters, we conducted a separate correction for multiple comparisons using threshold-free cluster enhancement (TFCE; Smith and Nichols 2009) and a bootstrap procedure. Information scores for each voxel were tested against zero across participants using a t-test (as above), and for each voxel the TFCE output was computed using Eq. 1 in Smith and Nichols (2009) with the recommended values of h0 = 0, E = 0.5, H = 2, and dh = 0.1. To obtain a distribution of TFCE outputs under the null hypothesis, in a single iteration of the bootstrap, information maps were sampled from the participants with replacement (18 samples, equal to n = 18 participants), and for any sample (information map), the information scores of all voxels were multiplied with either −1 or 1 (with equal probability), which is allowed under the null hypothesis of an information score of zero. The resulting samples were subjected to a voxelwise t-test against zero, TFCE outputs were computed for each voxel, and the maximum TFCE output was computed across voxels. This procedure was repeated 1,000 times, yielding a null distribution of TFCE maximum outputs. The significance level of each voxel was computed by dividing the number of times the maximum TFCE outputs in the null distribution was greater than that voxel's TFCE output by the number of iterations (1,000). This procedure was repeated for each of the four time phases.
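The resampling logic of this correction can be sketched in Python/NumPy as below. The sketch deliberately omits the TFCE step itself: the `enhance` argument is a hypothetical placeholder that defaults to the identity, so the plain t statistic stands in for the TFCE output. The simulated data and all names are illustrative.

```python
import numpy as np

def t_against_zero(x):
    """One-sample t statistic per voxel for an (n_subjects x n_voxels) array."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))

def max_stat_p_values(scores, n_iter=1000, enhance=lambda t: t, seed=0):
    """Corrected P values from a sign-flip bootstrap of the maximum statistic."""
    rng = np.random.default_rng(seed)
    n_subj = scores.shape[0]
    observed = enhance(t_against_zero(scores))
    null_max = np.empty(n_iter)
    for i in range(n_iter):
        picked = scores[rng.integers(0, n_subj, size=n_subj)]  # sample maps with replacement
        signs = rng.choice([-1.0, 1.0], size=(n_subj, 1))      # one random sign per sampled map
        null_max[i] = enhance(t_against_zero(picked * signs)).max()
    # Fraction of null maxima that exceed each voxel's observed statistic
    return (null_max[:, None] > observed[None, :]).mean(axis=0)

# Simulated information maps: 18 participants, 50 voxels (illustrative only)
rng = np.random.default_rng(1)
scores = rng.standard_normal((18, 50))
scores[:, 0] += 3.0          # one voxel carrying a true positive information score
p = max_stat_p_values(scores)
```

Taking the maximum across voxels in every iteration is what controls the familywise error rate: a voxel is significant only if its statistic exceeds what the most extreme voxel anywhere in the map typically reaches under the null hypothesis.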
For qualitative visualization, a 30-frame unthresholded animation was created to show change in information scores over time. The analysis was similar to the four-time-phase analysis described above, but now 30 time phases were considered (instead of 4), spaced 1 s (1 TR) apart. To improve signal to noise, data for each frame were temporally smoothed by taking the average of the previous, current, and next frame. As in the four-time-phase analysis, information scores were computed for each voxel, frame, and participant, and group analysis was conducted for each frame separately. The resulting group maps for each frame were then exported as bitmaps and concatenated into a movie of 30 frames.
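The three-frame temporal smoothing can be sketched in Python/NumPy as follows. How the first and last frames were handled is not stated; repeating the edge frame is an assumption made here, and the function name is hypothetical.

```python
import numpy as np

def smooth_frames(frames):
    """Average each frame with its previous and next frame.
    frames: (n_frames x n_voxels). Edge frames are repeated (assumption)."""
    padded = np.concatenate([frames[:1], frames, frames[-1:]], axis=0)
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

# Illustrative input: 5 frames of a single voxel with a linear trend
frames = np.arange(5.0)[:, None]
smoothed = smooth_frames(frames)
```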
Behavioral data were available for 17 of the 18 participants because of one case of technical failure. The number of correct responses was high (above 85%) for all categories. Mean response accuracy differed across categories (F3,48 = 6.9, P = 0.0008; Huynh-Feldt ε = 0.95). Whereas response accuracy was similar for bodies (18.5 of 20 ± 1.2) and scenes (18.6 ± 1.4; F < 1), faces (16.9 ± 2.0; F1,16 = 9.1, P = 0.009) and flowers (17.4 ± 1.7; F1,16 = 6.6, P = 0.02) were recognized worse than scenes. Reaction times (RT) to probe stimuli differed significantly across conditions (F3,48 = 3.82, P = 0.02; Huynh-Feldt ε = 1.03). Whereas flower (998 ± 305 ms; F1,16 = 2.8, P = 0.1) and body probes (1,000 ± 316 ms; F1,16 = 3.2, P = 0.09) were responded to with similar RT as scenes (951 ± 260 ms), responses to faces (1,053 ± 352 ms; F1,16 = 8.9, P = 0.009) were significantly slower than responses to scenes. Comparison of participants with face-specific activation in inferior frontal sulcus during encoding with those who did not show this effect yielded neither a significant main effect on performance nor a significant interaction with stimulus category.
Univariate group analysis.
Several areas in posterior parts of the brain showed an interaction between task phase and category (Table 1, Fig. 2). For the right lateral extrastriate visual cortex (EVC), which comprised the LOC and the EBA, the interaction was driven by higher activation in the body condition during encoding and delay 1 (some of the effects of delay 1 can be attributed to carryover effects from the encoding phase because of the properties of the hemodynamic response measured with fMRI; see materials and methods). For the left EVC, the interaction was produced by effects for scenes and bodies during encoding and for scenes, bodies, and flowers during delay 1. For all other areas (left and right PPA, left and right cuneus, and right lingual gyrus), effects were driven by higher activation for scenes compared with all other conditions, particularly during encoding and delay 1 but also during retrieval (for details see Table 2). In some areas (left and right cuneus, left PPA, right lingual gyrus), significant differences were even observed during the delay, although activation had returned to baseline or below for all conditions (see Fig. 2). These were driven by higher or lower signal in the scene condition. Again, these may be carryover effects from previous task phases, because the earlier higher activation could produce slower return to baseline as well as undershoot.
All these higher visual areas also showed a main effect for category, driven by the strong effect for scenes (or in the case of right lateral EVC, by bodies, and for left lateral EVC, by bodies and scenes). In addition, main effects of category were found in the posterior cingulate (driven by less deactivation in the scenes condition) and along the left postcentral sulcus (higher activation for flowers than for the other conditions) (Fig. 3). We did not find significant effects of category or interactions between category and task phase in the PFC.
Because we were interested in the neural correlates of the maintenance of visual information, we furthermore specifically assessed areas that showed sustained activity during the delay phase, as evidenced by a significant difference from baseline during delay 2 (Table 2, Fig. 4). These areas were in the ventrolateral (VLPFC) and dorsolateral prefrontal cortex (DLPFC) and medial frontal gyrus bilaterally and in the left intraparietal sulcus, and none of them showed a significant difference between any of the categories.
Univariate individual analysis.
In the light of recent reports of face selectivity in right PFC (Downing et al. 2006; Ishai et al. 2005), we contrasted faces against the other categories for each individual participant (P < 0.01 FDR). Twelve participants did indeed show face-selective activation in frontal cortex during encoding, which was always in the right hemisphere, and along the inferior frontal sulcus (IFS) [Talairach coordinates (SD) x = 44 (6.2)/y = 15 (5.7)/z = 32 (5.8)]. To test whether these individually defined regions might be face selective during delay as well, we computed event-related averages of their time courses (Fig. 5). As expected from the way the individual areas were selected, activation in the encoding phase was significantly higher for faces than baseline (within-sample t-test, 1-tailed, P < 0.05 corrected for multiple comparisons). However, delay activity in these face-selective IFS clusters was not significantly different from baseline and not different between faces and the other categories. Furthermore, face selectivity was not evident during the retrieval phase.
Qualitatively, the animation of category information (Supplementary Animation 1) suggested that most information is represented in high-level visual areas during the encoding phase. (Supplemental data for this article is available online at the Journal of Neurophysiology website.) Less pattern information that discriminates among the four stimulus categories (i.e., greater within- than between-category correlations) seems present during the late delay phase in these areas, although the parahippocampal gyrus and other posterior regions continued to show information signals.
Quantitatively, the category information map for the encoding phase showed prominent clusters in high-level visual cortex and in and around the hippocampus (Fig. 6, Table 3), and these clusters were less than one-half the size during the late delay phase. The TFCE analysis, which does not require that the voxelwise threshold is specified a priori, showed very similar results (Fig. 7).
The prefrontal and parietal areas with sustained activity throughout the late delay did not show significant category selectivity. However, both the univariate and the multivariate analyses brought out category-specific signals in posterior parts of the brain, mainly in higher visual areas. Such category-specific effects were mainly seen in areas that showed category-selective activation already during encoding, for example, the PPA to scenes or the lateral EVC to bodies, and where activation levels returned below baseline during the late delay.
What are possible reasons for the lack of category selectivity of prefrontal activation? One possibility is that fMRI is just not sensitive enough to pick up the category-selective activity in frontal cortex. It has been argued that specialization exists at a finer scale than typically tested by cluster-level analyses of BOLD activity (Nieder 2004). This was the reason why we also analyzed our data with the more sensitive multivariate pattern analysis, which can detect stimulus or category selectivity that is not observable at single-voxel level (Haxby et al. 2001; Haynes and Rees 2005; Kamitani and Tong 2005; Peelen et al. 2006). Although this did not bring out category-selective signals in the frontal cortex during the late delay period either, it was at least intriguing to observe trends for such signals during the late delay in frontal areas in the unthresholded information maps (see Supplementary Animation 1 and Fig. 7A). The spatial variability across individuals, which probably affects frontal more than posterior cortex, may also have contributed to the difficulties finding category selectivity in frontal cortex. Finally, specialized frontal neurons might not be spatially clustered but integrate their activity through temporal coding (Fries 2005). In such cases this specialization may not be detectable by fMRI at all, especially when, rather than boosting a population code, the dynamic pattern itself carries the information (Crowe et al. 2010). Meyers et al. (2008) found that patterns of inferior temporal (IT) and PFC neural activity that carried task-relevant information in the visual categorization data from Freedman et al. (2001) were only stable across a few hundred milliseconds. In humans, further investigation of this issue would thus require techniques with higher temporal resolution such as electro- or magnetoencephalography.
Alternatively, frontal neurons may just not be category selective in the same way as occipitotemporal neurons. Electrophysiological recording data from nonhuman primates do suggest that neurons in ventral premotor/prefrontal cortex are selective to face stimuli (O Scalaidhe et al. 1997, 1999). In keeping with this, previous fMRI studies (Chan and Downing 2011; Downing et al. 2006; Ishai et al. 2005) and the present study found some specialization for faces along the right IFS. This may be an example of a more global mechanism by which prefrontal neurons become specialized for task-relevant categories (Freedman and Miller 2008). Demonstration of this type of category selectivity would require a categorization task rather than the present approach, which was based on the procedures used for mapping of higher visual areas. Prefrontal specialization in humans may then become apparent when participants have to judge whether a stimulus matches a particular category, as for the monkeys in Freedman et al. (2001), rather than whether it matches another stimulus. Moreover, whether PFC neurons encode individual stimuli would need to be tested with a multivariate design that discriminates the exemplars stored in WM rather than the categories, as done (for gratings) by Harrison and Tong (2009).
Although the specialization of the right IFS was most apparent during encoding, some face selectivity may have been preserved during retention, although contrasts between categories did not reach significance during delay 2. This area was similar to posterior areas in that activity dropped back to or below baseline during the retention interval (Fig. 5). One explanation for the dissociation between areas with category-selective encoding activation (in posterior cortex and in some participants in IFS) and sustained (but not selective) delay activation in PFC may be that, for WM, information is transferred into supracategory representations. These could be pictorial, symbolic, or verbal. Although the verbal suppression should have minimized the last option, some of the sustained activity during delay was observed in left inferior frontal gyrus (VLPFC), the classic area for verbal rehearsal.
It is also worth considering that some components of frontal activation would in any event be predicted to be domain general. According to Fuster (1989), the PFC establishes a contingency between the stimuli of a delay task by employing three related functions: provisional memory to maintain the encoded stimulus, interference control to protect this mnemonic representation during the delay against potentially distracting stimuli, and anticipatory set to prepare for the impending response on the basis of the stored representation. Predictions of category selectivity would only apply to the first of these components.
The absence of a significant contrast between faces and scenes during delay in the present study is at odds with the findings of experiment 3 of Sala et al. (2003). These authors reported higher delay activity in inferior frontal gyrus/insula for faces and in superior frontal gyrus/frontal eye field for houses. The higher reliance of VWM for houses on dorsal frontal activity was interpreted as reflecting the higher need for the computation of spatial relations. One reason for the absence of place-selective dorsal frontal activation in the present study might be the slightly different nature of our stimuli. We used scenes rather than houses. Scenes drive PPA at least as effectively as houses but might change the responses in frontal cortex. However, this explanation is not very likely, because scenes, if anything, should make an even higher demand on spatial processing because the scene stimuli were more visually diverse than a set of standardized houses. Another reason for our failing to replicate the Sala et al. (2003) study might reside in the different sample sizes and approaches to analysis. We performed a whole brain random effects analysis on 18 participants, whereas their study was based on a region-of-interest fixed effects analysis of 4 participants, 3 of whom showed the contrast. At the level of individual analysis, the difference between the two studies is less pronounced because we, too, found face selectivity in ventral frontal cortex in the majority of participants.
This study departs from at least part of the VWM imaging literature in another respect, as well. Activity in posterior areas peaked with a hemodynamic delay after stimulus presentation but then returned to baseline or below (Fig. 2). There was thus no sustained activity in higher visual areas (Postle et al. 2003). However, we did find category effects in the delay activity of posterior cortex, as reported by Ranganath et al. (2004b). A common problem with delay activity in areas that show a strong response to stimulus presentation is that even with relatively long delays, some of this activity might still represent the descending flank of the initial BOLD response. This problem was overcome in the study by Ranganath et al. (2004a), where sustained activity was demonstrated in FFA and PPA in a delayed paired associate task. In that study, presentation of a house cued subjects to recall a particular face that they had learned earlier and match it to a test face, and vice versa. House cues evoked less activity than face cues in FFA during stimulus presentation but more during delay. Thus delay activity was associated with the category of the cued stimulus rather than that of the cue. However, such delay activity in higher visual areas might reflect retrieval from long-term memory (Polyn et al. 2005) or mental imagery (O'Craven and Kanwisher 2000) and is thus not conclusive as to the question of WM delay activity.
It is somewhat puzzling that activity in higher visual areas is not consistently detected in delay periods of VWM fMRI studies, considering the reports of such activity in monkey electrophysiology studies. One possible reason is that the number of active neurons might not be sufficient to evoke or sustain BOLD activity throughout the delay period. For example, in the analysis of Meyers et al. (2008), information about stimulus category could be extracted from 16 neurons in PFC or IT during any time bin. The transient nature of the contribution of individual neurons to this code would make it even harder to detect these signals after the spatial and temporal filtering that is intrinsic to BOLD fMRI. We also acknowledge that the present experiment was not primarily designed to detect sustained activity in higher visual areas (but to detect category specificity at all stages of the task). Designs with intervening distractors and variable memory delays may be more sensitive to activity in sensory areas during the retention interval.
Our finding of category-selective fMRI signals in the absence of activation differences from baseline extends previous work demonstrating that fMRI activation that was indistinguishable from baseline during a memory delay could still distinguish between to-be-remembered features (color vs. orientation: Serences et al. 2009; different orientations of gratings: Ester et al. 2009; Harrison and Tong 2009). Transcranial magnetic stimulation has shown that activity in category-selective higher visual areas is functionally relevant and specific for discrimination of exemplars of that particular category (Pitcher et al. 2009), and similar protocols with transient interference during delay may help further clarify the contribution of posterior cortex to the maintenance of visual information.
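The logic of a pattern that carries category information without any change in mean activation can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the analysis pipeline used in this study: the function names, the number of trials and voxels, the signal and noise levels, and the leave-one-out nearest-centroid classifier are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_delay_patterns(n_trials=40, n_voxels=50, signal=0.5, noise=1.0, seed=0):
    """Simulate two categories whose voxel patterns differ but whose mean activity does not.

    All parameter values are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    # Category template with zero mean across voxels, so that adding it
    # leaves the summed (univariate) activity of the region unchanged.
    template = rng.standard_normal(n_voxels)
    template -= template.mean()
    template *= signal / template.std()
    labels = np.repeat([0, 1], n_trials // 2)
    # Category 0 carries +template, category 1 carries -template.
    signs = np.where(labels == 0, 1.0, -1.0)
    data = signs[:, None] * template[None, :]
    data = data + noise * rng.standard_normal((n_trials, n_voxels))
    return data, labels


def loo_nearest_centroid_accuracy(data, labels):
    """Leave-one-trial-out classification with a nearest-centroid rule."""
    n = len(labels)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # Mean training pattern of each category, excluding the held-out trial.
        centroids = [data[train & (labels == c)].mean(axis=0) for c in (0, 1)]
        distances = [np.linalg.norm(data[i] - c) for c in centroids]
        correct += int(np.argmin(distances) == labels[i])
    return correct / n


data, labels = simulate_delay_patterns()
# Univariate measure: difference in mean activity across all voxels and trials.
mean_diff = data[labels == 0].mean() - data[labels == 1].mean()
# Multivariate measure: cross-validated decoding accuracy (chance = 0.5).
accuracy = loo_nearest_centroid_accuracy(data, labels)
print(f"mean activity difference: {mean_diff:.3f}")
print(f"decoding accuracy: {accuracy:.2f}")
```

In this toy example the difference in mean activity between categories stays close to zero while decoding accuracy is well above chance, which is the dissociation between univariate and multivariate measures described above. Real fMRI data would additionally require hemodynamic modeling, run-wise cross-validation, and permutation-based significance testing.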
Dual task designs along the lines of those performed for visual vs. spatial working memory (Mohr and Linden 2005) and houses vs. locations (Sala et al. 2003) may also produce behavioral evidence for parallel processing of object categories in VWM. It might then be possible to use more refined fMRI approaches, including adaptation designs and functional connectivity, to detect the neural signatures of such specialization. At present, however, we would have to conclude that there is no convincing evidence for category-selective specialization of frontal WM activity below the spatial/visual distinction (Mohr et al. 2006).
Our findings support a neural model of WM in which category-general prefrontal activation and stored representations in higher visual areas are combined to ensure rehearsal, protection, and subsequent retrieval of the relevant information. They also corroborate previous evidence that multivariate activation patterns, which may not translate into changes from baseline of the summed activity across voxels, are relevant to the representation of information in memory (Harrison and Tong 2009; Polyn et al. 2005). We thus propose a combination of the first and third scenarios outlined in the Introduction, with contributions from both category-general prefrontal areas and category-selective higher visual areas. The different functional contributions of the prefrontal, parietal, and higher visual areas to the component processes of WM will have to be investigated in future studies of functional connectivity and local interference effects (transcranial magnetic stimulation) during the phases of WM.
This work was supported by The Wellcome Trust Grant 077185/Z/05/Z.
No conflicts of interest, financial or otherwise, are declared by the author(s).
Author contributions: D.E.J.L., C.K., and P.E.D. conception and design of research; D.E.J.L. performed experiments; D.E.J.L., N.N.O., C.K., and P.E.D. analyzed data; D.E.J.L., N.N.O., C.K., and P.E.D. interpreted results of experiments; D.E.J.L. and N.N.O. prepared figures; D.E.J.L. and P.E.D. drafted manuscript; D.E.J.L., N.N.O., C.K., and P.E.D. edited and revised manuscript; D.E.J.L., N.N.O., C.K., and P.E.D. approved final version of manuscript.
We thank Anthony Bedson for expert radiography support.
- Copyright © 2012 the American Physiological Society