The distributed model of face processing proposes an anatomical dissociation between brain regions that encode invariant aspects of faces, such as identity, and those that encode changeable aspects of faces, such as expression. We tested for a neuroanatomical dissociation for identity and expression in face perception using a functional MRI (fMRI) adaptation paradigm. Repeating identity across face pairs led to reduced fMRI signal in fusiform cortex and posterior superior temporal sulcus (STS), whereas repeating emotional expression across pairs led to reduced signal in a more anterior region of STS. These results provide neuroanatomical evidence for the distributed model of face processing and highlight a dissociation within right STS between a caudal segment coding identity and a more rostral region coding emotional expression.
Models of face perception propose a dissociation between the representation of identity and other aspects of human faces, for example, emotional expression (Bruce and Young 1986). The “distributed model” (Haxby et al. 2000) posits an anatomical basis for the stages of face perception hypothesized by Bruce and Young (1986). In this model, fusiform cortex—a region known to be activated during simple face perception (Kanwisher et al. 1997; McCarthy et al. 1999; Puce et al. 1995; Sergent et al. 1992)—represents the identity of a perceived face, whereas superior temporal sulcus (STS) represents “changeable aspects” of the face, such as eye gaze and facial expression. Single unit recordings in monkeys (Hasselmo et al. 1989) and studies of human patients with discrete brain lesions (Adolphs et al. 1996; Young et al. 1993) support this model.
Functional imaging studies comparing familiar versus unfamiliar faces also support a role for fusiform cortex in representing facial identity (George et al. 1999; Henson et al. 2000, 2003). However, differences between familiar and unfamiliar faces, such as increased attention to familiar faces (Wojciulik et al. 1998), may confound these findings. Support for dissociable roles of fusiform and STS comes from studies that directed attention to different aspects of face stimuli (Hoffman and Haxby 2000; Narumoto et al. 2001; Sergent et al. 1992, 1994; Winston et al. 2003a). Nevertheless, it is difficult to draw definitive conclusions about the nature of the representations in these regions using such task manipulations, because differences in activation might reflect processing involved in directing visual attention to the specific aspects of the face required by each task rather than representations of those face components themselves.
Functional imaging studies of emotional facial expression have reported data that seem inconsistent with the distributed model, showing enhanced fusiform activity to emotional compared with neutral faces (Breiter et al. 1996; Dolan et al. 1996; Morris et al. 1998; Pessoa et al. 2002; Surguladze et al. 2003; Vuilleumier et al. 2001; Winston et al. 2003a, b). If fusiform cortex is specialized for identity, it is unclear why it should show this enhanced response to emotional faces. This effect has been attributed to modulatory effects from amygdala, reflecting enhanced attentional processing associated with emotive stimuli (Dolan 2002), but direct evidence for this proposal is sparse (for exceptions, see Morris et al. 1998 and Pessoa et al. 2002).
Functional MRI-adaptation (fMRI-A) is a technique used to infer regional specialization with greater specificity than the subtractive methodology used in the preceding imaging studies. The logic of fMRI-A, outlined previously (Grill-Spector and Malach 2001; Henson 2003; Naccache and Dehaene 2001), can be summarized as follows: if a region contains subpopulations of neurons excited by distinct aspects of stimuli, when two stimuli are shown sequentially in which one of these aspects is repeated, the firing neurons will habituate, and decreased fMRI signal from that region will be seen compared with when that aspect is not repeated. We report here the use of an fMRI adaptation paradigm to test the hypothesis, derived from the distributed model (Haxby et al. 2000), that fusiform cortex would show adaptation when identity was repeated relative to when it changed, and STS would show adaptation when an emotional expression was repeated relative to when it changed. Given our factorial design (Fig. 1B), the prediction was that fusiform cortex should show a significant main effect of identity repetition and STS would show a significant main effect of expression repetitions, but that neither area would show a significant interaction, because the interaction in this experimental design looks for areas showing a co-dependence between identity and expression repetitions.
We applied the technique of fMRI-A using a 2 × 2 factorial design (Fig. 1) to examine the neural basis for extraction of identity and expression from faces. Faces were presented in pairs where identity and expression of the second face could independently repeat or change with respect to the first face. Such immediate repetition induces robust fMRI adaptation (Epstein et al. 2003; Kourtzi and Kanwisher 2001). We refer to trials in which identity was repeated as “SI” and use “DI” to indicate a change in identity across a face pair. Similarly, trials where expression is held constant or changed are labeled as “SE” or “DE,” respectively. Thus a trial in which identity was unchanged but expression varied is referred to as “SIDE”.
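As an illustration of how the four trial types map onto the effects of interest, the sketch below writes the two main effects and the interaction as contrast weights over the 2 × 2 design. The condition ordering and the function names are assumptions made for this example only; the direction of each main effect follows the adaptation logic (greater response when a dimension changes than when it repeats), and the sign of the interaction is arbitrary.

```python
# Condition order is an assumption made for this sketch.
CONDITIONS = ["SISE", "SIDE", "DISE", "DIDE"]

def contrast(positive, negative):
    """Weight +1 for conditions in `positive`, -1 for those in `negative`, 0 otherwise."""
    return [(c in positive) - (c in negative) for c in CONDITIONS]

def contrast_value(weights, responses):
    """Weighted sum of per-condition responses (same order as CONDITIONS)."""
    return sum(w * r for w, r in zip(weights, responses))

# Adaptation to identity: different-identity trials > same-identity trials.
identity = contrast({"DISE", "DIDE"}, {"SISE", "SIDE"})
# Adaptation to expression: different-expression trials > same-expression trials.
expression = contrast({"SIDE", "DIDE"}, {"SISE", "DISE"})
# Interaction: element-wise product of the two main-effect contrasts.
interaction = [i * e for i, e in zip(identity, expression)]

# Hypothetical responses for a region that adapts to identity only.
responses = [1.0, 1.0, 2.0, 2.0]
print(contrast_value(identity, responses),
      contrast_value(expression, responses),
      contrast_value(interaction, responses))
```

The three weight vectors are mutually orthogonal, which is why a region can show one main effect without showing the other or the interaction.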
The stimuli were a selection of five male faces from the KDEF database (Lundqvist and Litton 1998; The Karolinska Directed Emotional Faces, Department of Clinical Neuroscience, Psychology Section, Karolinska Institute). A major advantage of using this database is that it contains two exemplars of each expression for each identity. This allowed the use of different images for the first and second faces in the adaptation session for all trial types, even SISE. The five specific identities (males 13, 14, 16, 23, and 24) were selected on the basis of successful emotion recognition by nine subjects who took part in a pilot study judging the emotions expressed across the entire database. All face stimuli showed head and eye-gaze direction that were forward-facing toward the viewer. Five emotions were used in the study: anger, disgust, fear, happiness, and sadness. One exemplar of each expression (series “A”) was nominated as the prime face and the second (series “B”) as the second face for the adaptation phase. Stimuli were converted to grayscale and equated for mean luminance in Matlab (The Mathworks, Natick, MA) and cropped to a standardized outline in Photoshop (Adobe, San Jose, CA). Faces for the filler trials were neutral male faces from a variety of sources and were prepared similarly to emotional faces, as were chairs and female faces (the targets in the localizer and adaptation phases, respectively). One hundred scrambled faces for the localizer session were derived from the 50 emotional faces and 50 neutral male faces used in that session by permuting the phase of each spatial frequency in the image while maintaining a constant power density spectrum and cropping to the same outline.
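The scrambling step can be sketched as follows. This is a minimal illustration, not the procedure actually used: it replaces the phase spectrum with that of a white-noise image (phase randomization) rather than literally permuting the original phases, because this guarantees a real-valued output while leaving the amplitude (and hence power) spectrum unchanged. The function name, the seed, and the test image are assumptions, and the cropping to a standardized outline is omitted.

```python
import numpy as np

def phase_scramble(image, seed=0):
    """Return an image with the same amplitude spectrum but randomized phase."""
    rng = np.random.default_rng(seed)
    amplitude = np.abs(np.fft.fft2(image))
    # The phase of a real-valued noise image is conjugate-symmetric, so the
    # recombined spectrum inverts to a real image. Uniform noise in [0, 1)
    # has a positive sum, so the zero-frequency phase is 0 and the mean
    # luminance of the input is preserved.
    noise_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * noise_phase))
    return scrambled.real

# Hypothetical stand-in for a grayscale face image.
face = np.random.default_rng(1).random((64, 64))
scrambled_face = phase_scramble(face)
```

Comparing `np.abs(np.fft.fft2(scrambled_face))` with `np.abs(np.fft.fft2(face))` is a quick check that the power spectrum has been preserved.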
Sixteen right-handed healthy subjects gave informed consent to take part in the study, which was approved by the local ethics committee. Data were rejected from two of these subjects: one because of gross head movement during fMRI scanning and the second due to an incidental structural abnormality that made normalization of scans difficult. Age range of included subjects was 18–29 yr (mean, 23 yr), and there were seven males. Subjects had normal or corrected-to-normal vision.
There were three sessions to the fMRI component of the study. The first was a face localizer session, designed to familiarize participants with face stimuli and provide a generic map of face- and expression-responsive brain regions. The second and third sessions were adaptation sessions and were split only for subject comfort. In the first (“localizer”) session, subjects' task was chair detection—they responded on an MRI-compatible button box when they saw a chair. The stimuli seen in this session were the 50 emotional faces from critical trials of the second and third (“adaptation”) sessions, 50 neutral male faces, 100 scrambled faces, and 20 chairs. In the adaptation phase, the task was to press the button when a female face appeared. In this way, trials of interest in all sessions were uncontaminated by motor response, and repetition of faces was incidental to the subject's task. A number of different trial types occurred in the adaptation phase. The four trial types of interest consisted of a pair of emotional faces that could exhibit the same or different identities, crossed with the same or different emotional expression (Fig. 1B). In addition, between any pair of trials of interest, one or two “filler trials” occurred to reduce the predictability of repetition. Three types of filler trials were used: “repeated fillers,” in which the face pair consisted of two neutral faces of the same identity; “different fillers,” consisting of a pair of neutral male faces of different identities; and “target trials,” containing a neutral male and a neutral female face (which could be either the 1st or 2nd face of the pair). In total, 80 repeated filler trials, 60 different filler trials, and 20 target trials occurred in each of the two sessions. Twenty-five trials of interest occurred for each of the four trial types in each session. 
Because there is only one way of combining the faces to produce SISE trials, but multiple ways of combining to produce the other trial types, the face pairs used for such trials were counterbalanced across subjects, with the constraint that each first face occurred once in each condition and each second face once in each condition within each session.
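The trial counts above constrain how fillers can be interleaved within a session: 100 trials of interest leave 99 gaps, and 160 fillers fit these gaps exactly if 61 gaps hold two fillers and 38 hold one. The sketch below builds one such sequence. It assumes that all fillers fall between trials of interest (none before the first or after the last) and that ordering is otherwise random; neither assumption is stated in the text, the function name and seed are hypothetical, and the counterbalancing of specific face pairs is not modeled.

```python
import random

TRIAL_TYPES = ("SISE", "SIDE", "DISE", "DIDE")

def build_session(seed=0):
    """Return one session's trial order with 1 or 2 fillers between trials of interest."""
    rng = random.Random(seed)
    interest = [t for t in TRIAL_TYPES for _ in range(25)]           # 100 trials
    fillers = ["repeated"] * 80 + ["different"] * 60 + ["target"] * 20  # 160 trials
    rng.shuffle(interest)
    rng.shuffle(fillers)

    n_gaps = len(interest) - 1          # 99 gaps between trials of interest
    n_double = len(fillers) - n_gaps    # 61 gaps must hold two fillers
    gap_sizes = [2] * n_double + [1] * (n_gaps - n_double)
    rng.shuffle(gap_sizes)

    sequence = [interest[0]]
    for trial, size in zip(interest[1:], gap_sizes):
        sequence.extend(fillers.pop() for _ in range(size))
        sequence.append(trial)
    return sequence

session = build_session(seed=1)
print(len(session), session[:8])
```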
A Siemens 1.5T Sonata system (Siemens, Erlangen, Germany) was used to acquire blood oxygenation level dependent (BOLD) contrast-weighted echoplanar images (EPIs) for functional scans. Volumes, which consisted of 24 horizontal slices of 2 mm thickness with a 1 mm gap, were acquired continuously every 2.16 s. This sequence was sufficient to obtain coverage from above the corpus callosum to below the inferior temporal lobes, thus including all regions of interest in this study: fusiform, amygdala, STS, inferior frontal cortex, and orbitofrontal cortex (Fig. 2). In-plane resolution was 3 × 3 mm. The first six volumes were discarded to allow for T1 equilibration effects. Subsequent to functional scans, a T1-weighted structural image (1 × 1 × 1 mm resolution) was acquired for co-registration and display of the functional data. Because of the difficulty of normalizing limited field-of-view EPIs, we additionally acquired whole brain EPIs in each subject for improved normalization.
fMRI data were spatially preprocessed using SPM2 (Wellcome Department of Imaging Neuroscience, London; http://www.fil.ion.ucl.ac.uk/spm). The volumes were co-registered (Friston et al. 1995a) and normalized to an EPI template corresponding to the MNI reference brain in Talairach space. The normalization parameters for each subject's EPIs were obtained by normalizing the whole brain EPIs acquired after the experimental session. The limited field of view EPIs were co-registered with the raw whole brain images (Collignon et al. 1995) and normalized by applying the parameters calculated for normalization of the whole brain images. Normalized images were smoothed using an 8-mm Gaussian kernel to account for residual intersubject differences and to allow statistical inference using Gaussian random field theory.
Data analysis used SPM2, applying a mass univariate general linear model (GLM) (Friston et al. 1995b). First, delta functions were constructed corresponding to the onset of each event type and for button presses in the adaptation phase to accommodate false alarms. These delta functions were convolved with a synthetic hemodynamic response function to create regressors for the subsequent GLM. Also included in the model were six movement parameters estimated by the realignment stage, regressors representing session effects, and for the adaptation sessions, three regressors of no interest corresponding to the potential confound of similarity between face pairs (see Visual similarity between face pairs). Serial autocorrelations were modeled using an AR(1) process, and the data were high-pass filtered at 1/128 Hz. Linear contrasts pertaining to the main effects and interaction of the factorial design were calculated. Consistent effects across subjects were tested using the resultant contrast images in a one-sample t-test (conforming to a “random effects” model). The model for the localizer session included separate regressors for the distinct facial expressions. Because no significant (P < 0.05, uncorrected) interactions with emotion type were detected in regions of interest (see also Winston et al. 2003a), we modeled the adaptation sessions without reference to different emotion types. Statistical threshold was set at P < 0.05, corrected for multiple comparisons across a small volume of interest, using a mask derived from the localizer session (see Mask-defining regions of interest).
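The construction of an event regressor can be illustrated with a short sketch. Convolving a train of delta functions with a hemodynamic response function is equivalent to summing copies of that function time-locked to each event onset. The double-gamma shape below (peak near 5 s, later undershoot scaled by 1/6) is a generic approximation with assumed parameters and is not claimed to be the exact function implemented in SPM2; the onsets and function names are hypothetical, and only the volume repetition time of 2.16 s is taken from the text.

```python
import math

TR = 2.16  # seconds between volumes, as reported for the acquisition

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def hrf(t):
    """Generic double-gamma response (assumed parameters, not SPM2's exact HRF)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def regressor(onsets_s, n_volumes):
    """Sum of HRFs time-locked to each onset, sampled at each volume time."""
    return [sum(hrf(k * TR - onset) for onset in onsets_s)
            for k in range(n_volumes)]

# Hypothetical onsets (in seconds) for one event type.
sise_regressor = regressor([0.0, 21.6, 43.2], n_volumes=40)
```

One such column per event type, together with the movement, session, and similarity regressors, would make up the design matrix for the GLM.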
Mask-defining regions of interest
Regions of interest were defined using two statistical results from the localizer session. First, we tested for effects of faces > scrambled faces and thresholded the result at P < 0.001 (uncorrected). We next tested for emotional faces > neutral faces and thresholded this result at P < 0.001 (uncorrected). The two resulting masks were combined using a logical OR function, yielding the combined mask (Fig. 2) that was subsequently used for small volume correction (SVC) (Worsley et al. 1996).
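The mask construction amounts to thresholding each statistical map and taking the voxel-wise union. A minimal sketch follows, with maps represented as flat lists of voxel statistics; the cutoff value and the example statistics are hypothetical, since the statistic corresponding to P < 0.001 depends on the degrees of freedom of the test.

```python
def threshold(stat_map, cutoff):
    """Binary mask: True where the voxel statistic exceeds the cutoff."""
    return [value > cutoff for value in stat_map]

def logical_or(mask_a, mask_b):
    """Voxel-wise union of two binary masks of equal length."""
    return [a or b for a, b in zip(mask_a, mask_b)]

CUTOFF = 3.1  # hypothetical statistic corresponding to the chosen P threshold

# Hypothetical voxel statistics for the two localizer contrasts.
faces_vs_scrambled = [4.2, 0.5, 3.5, 1.0]
emotional_vs_neutral = [0.3, 3.9, 3.6, 0.8]

combined_mask = logical_or(threshold(faces_vs_scrambled, CUTOFF),
                           threshold(emotional_vs_neutral, CUTOFF))
```

A voxel therefore enters the search volume if it responds to faces, to facial expression, or to both.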
Visual similarity between face pairs
It could be argued that changes in identity and expression across a pair of faces do not represent only categorical changes, but also a variation in a continuous spectrum of visual similarity. Thus, for example, a pair of faces with the same identity and expression are likely to be more visually similar than a pair where identity is the same but the expression different. To prevent this potential confound from affecting the fMRI data analysis, three measures of visual similarity were collected and included in the analysis as covariates of no interest. The first two were image-based metrics, derived from normalized least squares measures of differences between face pairs. Briefly, faces were normalized for luminance, and the second face was subtracted from the first. The root mean square of the value at each pixel of this difference image was the difference score for a given image pair. In a refinement of the technique that accounted for minor differences in co-registration of salient features, faces were allowed to move over one another by ≤25 pixels in either plane, and the minimum resulting value was taken as the difference score (Vogels et al. 2001). The third measure was derived from an independent group of subjects (see Control data on explicit tasks for identity/expression detection) who each saw 200 face pairs (50 of each trial type) presented with the same parameters as the imaging component of the study and rated each pair for visual similarity. The three measures were in good agreement (Table 1). All three were included in the main statistical model as regressors of no interest by generating a design matrix in SPM whereby all events of interest were modeled as one trial type that was modulated parametrically by expansions to model the three similarity confounds. The columns pertaining to similarity confounds were extracted and used in the model described above.
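The two image-based metrics can be sketched as follows. The first function computes the root mean square of the pixel-wise difference; the second searches over relative displacements of up to 25 pixels in each plane and keeps the minimum. The sketch assumes that the inputs are already luminance-normalized and that, for a displaced pair, the score is computed over the overlapping region only; how non-overlapping borders were treated is not specified in the text, and the function names are hypothetical.

```python
import numpy as np

def rms_difference(a, b):
    """Root mean square of the pixel-wise difference between two images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def min_shifted_rms(a, b, max_shift=25):
    """Minimum RMS difference over displacements of b relative to a."""
    h, w = a.shape
    if max_shift >= min(h, w):
        raise ValueError("max_shift must be smaller than the image dimensions")
    best = rms_difference(a, b)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows of the two images for this displacement.
            a_win = a[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
            b_win = b[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            best = min(best, rms_difference(a_win, b_win))
    return best

# Hypothetical example: the second image is the first displaced by 3 pixels.
first = np.random.default_rng(0).random((40, 40))
second = np.roll(first, 3, axis=1)
plain_score = rms_difference(first, second)
shifted_score = min_shifted_rms(first, second, max_shift=5)
```

In this example the displaced comparison recovers the alignment, so `shifted_score` is far smaller than `plain_score`, which is the behavior the refinement is intended to provide for slightly misregistered faces.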
Analysis of eye-tracking data
fMRI differences in regions such as fusiform cortex could be attributable to variations in visual attention with trial type (Wojciulik et al. 1998), and differences in emotionally responsive regions such as amygdala could similarly be attributable to variations in arousal (Critchley et al. 2002). To explore whether such differences might exist, we used on-line eye-tracking. Data were acquired during scanning for a majority of subjects using an ASL504LRO eye-tracker (Applied Science Laboratories, Bedford, MA). Specifically, accurate pupillometry was achieved in nine subjects during the scanning session, and accurate eye-gaze tracking was achieved in eight. Pupillometry data were analyzed by defining a window of 1.2 s after the second face and measuring the minimum, maximum, and mean pupil diameter during averaged traces (low-pass filtered at 7.5 Hz and baselined for the onset of the 2nd face) from this window for each subject. These three measures were entered into separate 2 × 2 ANOVAs. Eye-gaze direction was also assessed using a summary statistic approach. For each of the four critical trial types, spatial maps of eye-gaze density were constructed. Each of these maps was compared with the mean map, and difference images constructed. The root mean squares of the density difference values for these latter maps were entered into a 2 × 2 ANOVA.
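The pupillometry summary measures can be sketched as below. The function takes an already averaged pupil trace for one subject and trial type, baselines it to the sample at the onset of the second face, and returns the minimum, maximum, and mean within the 1.2-s window. The sampling rate, the example trace, and the function name are assumptions for illustration, and the 7.5-Hz low-pass filtering step is omitted.

```python
def pupil_summary(trace, onset_index, sample_rate_hz, window_s=1.2):
    """Baselined min, max, and mean pupil diameter in the post-onset window."""
    n_samples = int(round(window_s * sample_rate_hz))
    baseline = trace[onset_index]
    window = [v - baseline for v in trace[onset_index:onset_index + n_samples]]
    return {
        "min": min(window),
        "max": max(window),
        "mean": sum(window) / len(window),
    }

# Hypothetical averaged trace sampled at 5 Hz (real eye-trackers sample faster).
trace = [5.0] * 10 + [5.0, 4.0, 3.0, 4.0, 5.0, 6.0]
summary = pupil_summary(trace, onset_index=10, sample_rate_hz=5)
```

Each of the three values would then be entered, per subject and trial type, into its own 2 × 2 ANOVA.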
Control data on explicit tasks for identity/expression detection
An important consideration is the possibility that subjects might not have noticed repetition of identity or expression of faces within each pair. Furthermore, if a change in one dimension (e.g., the expression of the faces) affected subjects' ability to detect repetition of the other dimension (e.g., the identity of the faces), interactions between the two dimensions detected by fMRI would be difficult to interpret (in that a decrease in the fusiform response for SIDE trials relative to SISE trials, for example, could simply reflect a reduction in the number of trials in which subjects realized it was the same identity). Control behavioral experiments were conducted to test these possibilities. An independent group of 16 subjects (age range, 22–36 yr; mean age, 28.5 yr; 11 males; 2 left-handers) completed three behavioral tasks, using identical procedural parameters to those in the imaging study. In the first task, they rated pairs of faces presented for visual similarity using a computer-based visual analogue scale (providing the subjective measures of similarity mentioned above). In the second and third tasks, they classified face pairs as exhibiting either the same or different identity or the same or different emotional expression, with the order of identity/expression task and the buttons used for same/different responses counterbalanced over subjects. In all, each subject performed 50 trials of each type for each task. A short (25 trial) practice session preceded each task.
Behavioral data during scanning
Subjects detected 99 ± 2% (SD) of targets (chairs) in the localizer (false alarm rate, 0.2 ± 0.4%) and 88 ± 7% of targets (female faces) in the adaptation phase (false alarm rate, 7 ± 6%).
The two measures derived from the eye-tracking data from the fMRI scanning sessions showed no significant differences between the four trial types of interest (for gaze direction, all F(1,7) < 0.125, all P > 0.7; for pupil diameter, all F(1,8) < 2.1, all P > 0.18, except a marginal trend for an interaction between identity and expression repetitions in the minimum pupil constriction: F(1,8) = 4.0, P = 0.08). These nonsignificant results suggest that there were no detectable differences in visual attention (indexed by eye-gaze direction) or arousal (indexed by pupil diameter changes) between the different experimental conditions.
The results of the two t-tests performed on the data from the localizer phase, faces > scrambled faces and emotional > neutral faces, are shown in Fig. 2. As expected, the activated regions included bilateral fusiform and more posterior occipital areas, as well as STS and amygdala. The two contrasts were combined to create a mask of regions that responded to faces and/or facial expression. This mask defined a search region for the subsequent comparisons in the adaptation phase, allowing a principled means for correcting for multiple comparisons over voxels.
Adaptation phase: main effect of repeated identity
As predicted, a significant main effect of repeated identity [reduced response when the 2nd face exhibited the same identity as the 1st; (DISE + DIDE) > (SISE + SIDE)] was seen in right fusiform cortex (x,y,z = 39,−60,−15; Z = 3.76; P < 0.05, 1-tailed, SVC for the localizer mask; Fig. 3). In addition, adaptation for repeated identity was seen in right posterior STS (STSp; x,y,z = 63,−51,15; Z = 3.73; P < 0.05 SVC; Fig. 4). Because an interaction or main effect of repeated expression would influence interpretation of these results, we examined for such effects at reduced threshold. In the peak right STSp voxel, a marginally significant main effect of repetition of expression was evident (Z = 1.74; P < 0.05, 1-tailed, uncorrected), whereas in fusiform, no such effect was evident (Z = 0.99, P > 0.1, 1-tailed, uncorrected). There was no evidence for an interaction in the peak fusiform voxel (Z = 1.26, P > 0.2, 2-tailed, uncorrected) or in the right STSp (Z = 0.50, P > 0.2, 2-tailed). Although simple effects are not conventionally inspected in the absence of a significant interaction, we checked whether changing expression modulated fusiform responses either in the context of identity remaining constant or changing. In neither case was there a significant effect (simple effect of DE relative to SE with a change in identity, P = 0.93; with identity constant, P = 0.14). A more posterior region of right occipital cortex, possibly corresponding to a face-responsive occipital region (FROR), showed uncorrected repetition effects but failed to withstand correction for multiple comparisons across the volume of the mask (x,y,z = 42,−75,−18; Z = 3.59; P = 0.071 SVC; see Fig. 3A).
Adaptation phase: main effect of repeated expression
A region of right STS anterior to that described above was shown to be less active when the second face exhibited the same expression as the first face [Fig. 5; (SIDE + DIDE) > (SISE + DISE)]. This activation corrected for multiple comparisons across the volume of our mask (x,y,z = 57,−18,−12; Z = 3.80; P < 0.05, 1-tailed, SVC). Despite the apparent trend toward an interaction in Fig. 5B, this was not significant when tested at a lenient statistical threshold (Z = 1.06, P > 0.2, 2-tailed, uncorrected). Similar to the fusiform region, we checked for significant simple effects (DI relative to SI) opposite to the detected main effect and found no significant differences in the context of expression changing (P = 0.14) or being held constant (P = 0.97).
To determine whether the differences in detectable main effects between the mid-STS region and fusiform were significant, we undertook a region-by-condition interaction using a 3-mm sphere centered on each peak. A significant three-way interaction was obtained (P < 0.05) in the direction predicted by the distributed model (adaptation to identity in fusiform and to expression in STS).
Adaptation phase: interaction
No areas within the mask defining our regions of interest showed an interaction between identity and expression.
One potential confound in our design is the presence of differences in the visual similarity between face pairs of different trial types. To account for this confound, we obtained mean subjective and objective similarity measures for the four different trial types (Table 1). The subjective data were obtained from an additional behavioral experiment (see methods). All three measures showed significant differences between the four trial types (all F(1,13) > 190, all P < 0.001). SISE pairs were more similar than the other three types, despite our use of different images in this condition. Unsurprisingly, and consistent with the concept of identity as an invariant feature of the face, trials with same identity had greater similarity than trials with different identity. To account for these differences, we included all three measures as covariates of no interest in the analysis of the fMRI data (see methods), which removed any linear contribution of similarity to the above fMRI findings. A random effects analysis of the contribution of these regressors to the model (using an F-contrast spanning the 3 regressors in an ANOVA model) suggested that they were explaining effects in visual regions, although not within the mask used for SVC (e.g., peaks at x,y,z = −51,−60,−6, Z = 3.76; x,y,z = 9,−78,9; Z = 3.46; x,y,z = 54,−45,−15, Z = 3.39; x,y,z = 30,−60,−15, Z = 3.38; all P < 0.001 uncorrected).
An additional concern, noted above, is that subjects might not notice repetitions of identity or expression or that the presence of repetition in one dimension would affect behavior to the other dimension. Data from a behavioral experiment on a separate subject cohort (see methods) showed that the mean accuracy in an identity discrimination task was 88% for trials when expression was held constant and 83% for trials when expression changed (paired t-test: Z = 3.78, P < 0.001). This was paralleled by slower reaction times (RTs) when judging identity in the context of expression changes (806 vs. 773 ms, Z = 3.33, P < 0.001). Mean accuracy for the emotion discrimination task was 87% when identity was unchanged across the face pair and 80% when identity changed (Z = 3.53, P < 0.001). Again, RTs were slower on trials where the task-irrelevant dimension (identity) changed compared with being held constant (888 vs. 848 ms, Z = 4.41, P < 0.001). These data show that people's ability to detect repetition of identity or expression with these stimuli was generally high. The data also suggest that changes in one dimension do affect sensitivity to repetition of the other dimension. This behavioral interaction does not, however, confound our findings of two orthogonal main effects in the fusiform/STSp and mid-STS regions.
In this study, we used event-related fMRI-A to identify the neuroanatomical basis for coding different aspects of faces, specifically identity and expression, in the human brain. By presenting pairs of faces in which the identity and emotional expression of a second face could accord or vary with respect to the first, we showed that discrete brain regions show a reduced BOLD signal when a specific dimension was repeated relative to when it changed. Specifically, posterior lateral right fusiform cortex and posterior right STS exhibited adaptation for identity, whereas right mid-STS showed adaptation for emotional expression. These differences do not relate to any obvious measure of visual similarity between faces in each pair, given that we co-varied out both objective and subjective measures of similarity. In addition, we found no evidence that the effects we observed could be attributed to differences in eye movement or arousal. Control data showed that subjects' explicit ability to detect changes in identity or in expression was generally high. Although performance was reduced when the other dimension changed, which might confound any interaction between identity and expression on the levels of adaptation, this observation cannot explain the simultaneous finding of two orthogonal main effects in the imaging data.
In the distributed model of face processing (Haxby et al. 2000), a dissociation is posited between processing of invariant and changeable aspects of faces. Specifically, it is suggested that invariant features are coded in ventral occipital and temporal cortex in the lateral fusiform region (also known as the “face area”; Kanwisher et al. 1997), whereas changeable aspects are coded by right STS. Our data broadly support this model. Within the framework of fMRI-A, our demonstration of a main effect of repeated identity in right fusiform cortex indicates this region represents identity, an invariant aspect of human faces. Although previous studies have shown repetition decreases to faces in fusiform cortex (Gauthier et al. 2000; George et al. 1999; Henson et al. 2000, 2003), to our knowledge, this study is the first to show repetition effects in fusiform cortex across dramatically different views of the same identity (i.e., with different expressions). In our view, this finding is important because it suggests that face representations in this region encode not just a specific visual image but a more abstract representation of facial identity (see also Eger et al. 2004; Vuilleumier et al. 2003a).
A consistent finding in neuroimaging studies of emotional face perception is activation of fusiform cortex in perception of emotional relative to neutral faces (Breiter et al. 1996; Dolan et al. 1996; Morris et al. 1998; Pessoa et al. 2002; Surguladze et al. 2003; Vuilleumier et al. 2001; Winston et al. 2003a,b). This has been interpreted as relating to enhanced attentional processing associated with arousing emotional faces relative to nonarousing neutral faces (Dolan 2002). However, an alternative explanation is that this region encodes the emotionality of the face, resulting in enhanced activation when expressive faces are presented. The use of an adaptation paradigm in this study enables us to potentially dissociate between these possibilities. If this region coded for specific expressions, it should have shown adaptation for expression, akin to that for identity. The lack of evidence for adaptation for repeated expressions is consistent with the former interpretation that fusiform modulation is mediated by an amygdala-associated effect (although we note that this inference is based on a null result). At the very least, it seems that any bottom-up effects of expression in right fusiform cortex are of less importance than those of identity, i.e., it exhibits relative preference for identity processing from faces.
In contrast to right fusiform, a focus in right mid-STS showed a main effect for repetition of emotional expression, with repeated expressions associated with reduced activation relative to differing expressions. This agrees with a role for this region in coding the specific emotion expressed in a face. We were surprised by the anterior locus of this activation, which fell at –18 on the anterior-posterior axis. Previous studies concerning facial expression have reported activation in right STS in a more posterior locus (around –35 to –60 mm) (Critchley et al. 2000; Iidaka et al. 2001; Narumoto et al. 2001; Winston et al. 2003a). This more anterior locus is, however, within the portion of STS reported as activated in studies of social cues (Allison et al. 2000; Martin and Weisberg 2003; Ojemann et al. 1992; Saxe and Kanwisher 2003). We have additionally checked our previous data for activation in this area and found it was activated in an explicit emotional judgment task relative to a gender judgment task (x,y,z = 52,−16,−18; Z = 3.96; see Fig. 5A in Winston et al. 2003a). Note also that this region fell within our face localizer mask, and by definition, is responsive to faces or facial expression.
Posterior STS, like the fusiform, showed adaptation to repeated identity. This contrasts with a previous study that failed to observe repetition effects in this region (Henson et al. 2003), although that study used much longer repetition lags. A role for posterior STS in processing personal identity is, however, consistent with a recent human lesion study describing a patient with an infarct in the vicinity of left STS who described novel faces as familiar (Vuilleumier et al. 2003b). Unlike fusiform, however, posterior STS showed a trend toward an additive main effect for repeated emotion, implying that its role in face processing may be multifaceted. Intriguingly, in a recent re-analysis of single neuron data from monkeys, Tiberghien et al. (2003) suggested that all facial features contribute to distinguishing identity, whereas only a subset determine facial expression. They hypothesize that, as a consequence, inferior temporal regions in monkeys may contain identity-selective neuronal populations, while STS might contain populations sensitive to identity and expression. Such a view fits with our demonstration of identity repetition in posterior STS and sensitivity to expression in posterior and mid-STS. However, with regard to the human lesion literature, the majority of reported prosopagnostic patients have inferior occipitotemporal rather than lateral temporal lesions (see e.g., Damasio et al. 1990; Wada and Yamamoto 2001), presumably corresponding to fusiform rather than STS (but see Fig. 1A in Tranel et al. 1997; see also Rossion et al. 2003). In addition, monkeys with STS lesions appear to have only minor identity discrimination deficits (Heywood and Cowey 1992), and recent evidence from multidimensional scaling analysis of single neuron data from monkey STS and inferior temporal (IT) cortex also suggests that STS is more concerned with analysis of facial view and the code in IT is more concerned with facial identity (Eifuku et al. 2004).
This apparent discrepancy with our result of posterior STS responsivity to identity might be explained in a number of ways. First, it is possible that activation in this region is epiphenomenal and of no functional consequence for identity recognition. However, it is also possible that our stimuli tax identity processing across different views of a face, and the role of STS in processing different views of face stimuli is well known (Eifuku et al. 2004; Perrett et al. 1985, 1991). In addition, it has been shown that STS neurons in the macaque monkey process identity, at least in the form of a population code (Baylis et al. 1985), and there is evidence that some single neurons in STS code for the same identity across different face views, whereas other STS cells code conjunctions of identity and view (Perrett et al. 1991). It may be the case that the aspect(s) of identity processing that occur in STS do not commonly lead to complaints of prosopagnosia or that tests designed to probe prosopagnosia are relatively insensitive to these aspect(s) of identity processing. Because this activation was unpredicted, although significant, we would like to see this effect of adaptation for repetitions of identity across different views replicated before drawing strong conclusions.
Previous work has implicated other brain regions in processing facial expressions, most notably the amygdala (Breiter et al. 1996; Morris et al. 1996; Pessoa et al. 2002; Vuilleumier et al. 2001; Whalen et al. 1998; Winston et al. 2003a). There are a number of potential reasons why we did not detect adaptation in this region. One possibility is that amygdala responses are emotion-specific, with greatest responses to fearful faces (Calder et al. 2001), and thus our collapsing across different expression subtypes may have obscured emotion-specific responses. Alternatively, the amygdala might code for facial expression in a different manner from cortical regions such as STS, with a nonspecific code, whereby responses are dependent on the arousal engendered by the emotion (H. D. Critchley, P. Rotshtein, and R. J. Dolan, unpublished observations). Another possibility is that an expression-specific amygdala response may be insensitive to adaptation, although this seems unlikely, given positive findings concerning the amygdala and stimulus repetition (Ono and Nishijo 2000; Rotshtein et al. 2001).
Although identity and emotion may be processed by partially dissociable neural pathways, the two pathways are likely to interact in the production of behavioral responses. This would appear to be the case for our explicit identity and emotion detection tasks, in which a change in one dimension (identity or emotion) impaired the ability to detect changes in the other dimension. In behavioral studies, other authors have also found evidence for the nonindependence of identity and emotion processing (Ganel and Goshen-Gottstein 2004; Schweinberger and Soukup 1998; Schweinberger et al. 1999). A further brain region, which we failed to detect in this study, may be responsible for integrating these distinct aspects of the face. A more explicit behavioral task during fMRI may help to clarify this issue in future studies.
One issue that deserves consideration is the meaning of BOLD changes in adaptation paradigms such as this. It has been shown that local field potentials (LFPs) correlate with the BOLD signal better than multi- or single-unit activity in the macaque monkey (Logothetis et al. 2001). Thus a region showing fMRI-A may not be transmitting fewer spikes but may instead be showing either reduced afferent input or reduced local processing. This highlights one possible dissociation between fMRI-A and response suppression as recorded in single unit work in monkeys (Desimone 1996). We do not think that fMRI experiments based on adaptation are uniquely problematic in this regard; rather, this is a more general interpretational issue for unifying electrophysiological and fMRI work (see Henson and Rugg 2003 for a more extensive discussion of hemodynamic decreases and response suppression).
In conclusion, we have shown that fusiform cortex shows fMRI-A when the identity of a face is repeated, and a region of STS shows adaptation when the emotional expression of a face is repeated. The response profiles of these two regions were significantly different in the directions predicted by the distributed model of face processing (Haxby et al. 2000), and we suggest that our findings are generally consistent with this model. However, an adaptation response in posterior STS to repeated identity suggests that STS may also manifest a degree of functional segregation in face perception.
This work was carried out under a Wellcome Trust Programme Grant to R. J. Dolan. R.N.A. Henson is supported by the Wellcome Trust.
We thank J. O'Doherty and the radiographers at the FIL for assistance with scanning, and P. Bentley, J. Gottfried, J. Kilner, P. Vuilleumier, and P. Rotshtein for helpful discussions. We thank J. E. Litton for the KDEF stimuli.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2004 by the American Physiological Society