Single-cell studies in the macaque have reported selective neural responses evoked by visual presentations of faces and bodies. Consistent with these findings, functional magnetic resonance imaging studies in humans and monkeys indicate that regions in temporal cortex respond preferentially to faces and bodies. However, it is not clear how these areas correspond across the two species. Here, we directly compared category-selective areas in macaques and humans using virtually identical techniques. In the macaque, several face- and body part–selective areas were found located along the superior temporal sulcus (STS) and middle temporal gyrus (MTG). In the human, similar to previous studies, face-selective areas were found in ventral occipital and temporal cortex and an additional face-selective area was found in the anterior temporal cortex. Face-selective areas were also found in lateral temporal cortex, including the previously reported posterior STS area. Body part–selective areas were identified in the human fusiform gyrus and lateral occipitotemporal cortex. In a first experiment, both monkey and human subjects were presented with pictures of faces, body parts, foods, scenes, and man-made objects, to examine the response profiles of each category-selective area to the five stimulus types. In a second experiment, face processing was examined by presenting upright and inverted faces. By comparing the responses and spatial relationships of the areas, we propose potential correspondences across species. Adjacent and overlapping areas in the macaque anterior STS/MTG responded strongly to both faces and body parts, similar to areas in the human fusiform gyrus and posterior STS. Furthermore, face-selective areas on the ventral bank of the STS/MTG discriminated both upright and inverted faces from objects, similar to areas in the human ventral temporal cortex. 
Overall, our findings demonstrate commonalities and differences in the wide-scale brain organization between the two species and provide an initial step toward establishing functionally homologous category-selective areas.
In humans, object information is represented in a large swath of lateral occipital and ventral temporal cortex. Functional magnetic resonance imaging (fMRI) studies have revealed that a few object categories activate discrete regions within object-responsive cortex. For example, faces have been shown to activate specific regions of the fusiform gyrus (the “fusiform face area” [FFA]), the inferior occipital gyrus (the “occipital face area” [OFA]), and the posterior superior temporal sulcus (pSTS) (Hoffman and Haxby 2000; Ishai et al. 1999; Kanwisher et al. 1997; McCarthy et al. 1997; Puce et al. 1996). These regions have been implicated in a variety of functions in face processing, including the processing of invariant face features, such as identity and gender in the FFA, and the processing of changeable features, such as eye gaze and emotional expression in the pSTS (Andrews and Ewbank 2004; Engell and Haxby 2007; Hoffman and Haxby 2000; Puce et al. 1998). It has been suggested that the OFA detects face features, given its posterior location to the FFA and pSTS (Haxby et al. 2000), but there is some evidence for invariant face processing in the OFA as well (Hoffman and Haxby 2000; Rossion et al. 2003). In addition to faces, bodies and body parts have been found to activate regions in the lateral occipitotemporal gyrus (the “extrastriate body area” [EBA]) and, most recently, an area on the fusiform gyrus that is adjacent to the face-selective area (the “fusiform body area” [FBA]) (Downing et al. 2001; Peelen and Downing 2005; Schwarzlose et al. 2005; Taylor et al. 2007). Other category-selective areas have been reported for scenes and spatial layouts in the parahippocampal cortex (the “parahippocampal place area” [PPA]; Epstein and Kanwisher 1998) and for words in the left fusiform gyrus (the “visual word form area” [VWFA]; Baker et al. 2007; Cohen et al. 2000). 
These findings suggest that certain categories of stimuli have a special status in the human brain and are represented in anatomically distinct locations. Consistent with such findings are reports of patients with focal lesions in occipitotemporal cortex who have selective impairments in processing faces or bodies (Damasio et al. 1982; McKenna and Warrington 1978) and patients with focal lesions in medial temporal cortex who suffer from topographical disorientation (Aguirre et al. 1998; Whiteley and Warrington 1978).
Single-cell physiology studies in monkeys have found a small proportion of neurons that respond preferentially to faces and bodies. Face-selective cells (i.e., “face cells”) have been found throughout lateral and ventral temporal cortex with the highest concentration in both banks and the floor of the superior temporal sulcus (STS), and thus in both inferior temporal (IT) cortex and the superior temporal polysensory area (STP) (Baylis et al. 1987; Bruce et al. 1981; Desimone et al. 1984; Perrett et al. 1982; Tanaka et al. 1991; Yamane et al. 1988). Even though face cells were often found grouped closely in patches (9–16 mm²) (Harries and Perrett 1991), and even to form columns (Tanaka 1996; Wang et al. 1996), their larger scale organization has been unclear from these invasive investigations. Neurons responsive to hands were found to be less common and appeared scattered throughout IT cortex (Desimone et al. 1984; Gross et al. 1969, 1972; Tanaka et al. 1991). Further, pictures of entire headless bodies and their movements have been shown to evoke selective responses from neurons along the upper bank of the STS, in STP (Oram and Perrett 1996; Wachsmuth et al. 1994). Although single-cell physiology studies have revealed category-selective neural responses, they have not suggested a large-scale organization as found in the human brain.
Using fMRI, the large-scale organization of neural representations related to faces and body parts has been recently investigated in both anesthetized and awake monkeys (Hadj-Bouziane et al. 2008; Logothetis et al. 1999; Pinsk et al. 2005a; Tsao et al. 2003a). Both Tsao and colleagues (2003a, 2006) and Pinsk and colleagues (2005a) found two to five discrete areas in STS and surrounding cortex that were more strongly activated by pictures of faces than by man-made objects in monkeys trained to maintain fixation. In addition, both groups reported a body-selective area in the middle STS region (Pinsk et al. 2005a; Tsao et al. 2003a). Furthermore, both Hoffman and colleagues (2007) and Hadj-Bouziane and colleagues (2008) reported “face-sensitive” regions in macaque temporal cortex by comparing pictures of faces to their scrambled counterparts. Most recently, Moeller and colleagues (2008) reported six face-selective areas along the STS and surrounding cortex from posterior TE/TEO to anterior TE. Thus there seems to be converging evidence of a multitude of face-selective areas and fewer body-selective areas in the macaque STS and surrounding cortex.
Although it seems established at this point that there are face- and body-selective areas in both the macaque and human visual system, it is not clear how these regions correspond across species. Using a direct, comparative approach may provide insights into the functional similarities of the category-selective areas in the two species (Denys et al. 2004a,b; Kourtzi et al. 2003; Koyama et al. 2004; Nakahara et al. 2002; Orban et al. 2003; Sawamura et al. 2005; Tsao et al. 2003b, 2008; Vanduffel et al. 2002). Here, we used fMRI to probe the neural representations of biologically relevant stimuli in both species, allowing for a direct comparison of category-selective areas across species. In a first experiment, monkey and human subjects were presented with several categories of objects (faces, body parts, foods, scenes, and objects) while performing a fixation task to compare responses in each area to preferred and nonpreferred stimuli. In a second experiment, face processing in both species was probed by examining responses in face-selective areas to upright and inverted faces. All data analysis was carried out on cortical surface models to increase sensitivity and spatial localization of the blood oxygen level–dependent (BOLD) signal source. Furthermore, standard-mesh surfaces were used to provide direct node-to-node correspondence between individual subjects' surfaces to facilitate comparison of activations across subjects.
Face- and body part–selective areas in both species were compared on the basis of similarities in their spatial arrangement on the cortical surface and mean signal responses evoked by each stimulus category. Using this direct comparison, we show that adjacent and overlapping areas in the anterior STS/middle temporal gyrus (MTG) of the macaque and the fusiform gyrus and posterior STS of the human respond strongly to both faces and body parts. Furthermore, face-selective areas on the ventral bank of the STS/MTG in the macaque and in ventral temporal cortex in the human discriminate both upright and inverted faces from objects. Taken together, these findings provide an initial step toward defining functionally homologous areas representing information for faces and body parts in humans and macaques.
Three adult male macaque monkeys (Macaca fascicularis) weighing 4–9 kg participated in the study. All procedures were approved by the Princeton University Animal Care and Use Committee and conformed with National Institutes of Health guidelines for the humane care and use of laboratory animals.
Ten human subjects (six males; age: 22–38 yr) participated in the study, which was approved by the Institutional Review Panel of Princeton University. Six subjects each participated in experiments 1 and 2. Two of the subjects participated in both experiments. All subjects reported being in good health and had no past history of psychiatric or neurological diseases. Subjects had normal or corrected-to-normal visual acuity and gave their informed written consent.
Macaque surgical and training procedures
Each animal was surgically implanted with a plastic head bolt for restraining the head by using ceramic screws and dental acrylic. All surgical procedures were performed under strictly aseptic conditions and under general anesthesia with isoflurane (induction 2–4%, maintenance 0.5–2%) following preanesthetic medication with atropine (0.08 mg/kg, administered intramuscularly [im]), ketamine (2–10 mg/kg, im) and acepromazine (1 mg/kg, im). The animals were treated postsurgically with antibiotics (e.g., Baytril, 2.5 mg/kg, im) and analgesics (e.g., buprenorphine, 0.01 mg/kg, im) and wound margins of skin surrounding the implant were cleaned regularly.
Monkeys were placed in an MR-compatible primate chair prone, in a sphinxlike position, with their heads erect and fixed in a head-holding apparatus (Pinsk et al. 2005b). The animals were acclimated to the scanner environment through the use of a mock setup. Monkeys were trained to fixate on a small dot at the center of a display screen by using an infrared eye-tracking system (Applied Science Laboratories, Bedford, MA). By providing the animals with regular juice rewards while they maintained fixation within a 4° square window, and by systematically increasing the rate of their juice reward by reducing the delivery intervals from 2.5 to 1 s in 500-ms steps, the animals were trained to fixate for several minutes. Further details regarding surgery and training procedures are given in Pinsk et al. (2005b).
Visual display, stimulation, and experimental design
Visual stimuli were projected from a PowerLite 7250 LCD projector (Epson, Long Beach, CA) outside the scanner room onto a translucent screen located at the end of the scanner bore. Monkeys viewed the projection screen directly, whereas human subjects viewed the screen through a mirror attached to the head coil. The total path length from eye to screen was about 60 cm for both species. The screen subtended 30° of visual angle in the horizontal dimension and 26° in the vertical dimension. The stimulus presentation, eye position tracking, and reward delivery were synchronized to the beginning of each scan using a trigger pulse from the scanner and were controlled via a PC computer using Presentation software (Neurobehavioral Systems, Albany, CA).
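The display geometry above can be turned into physical sizes with basic trigonometry. The following is a minimal sketch, assuming a flat screen centered on the line of sight and taking the "about 60 cm" path length as exactly 60 cm; the function name `extent_cm` is hypothetical and used only for illustration.

```python
import math


def extent_cm(angle_deg, distance_cm):
    """Linear extent on a flat screen that subtends angle_deg at distance_cm.

    Assumes the extent is centered on the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 60.0  # assumption: "about 60 cm" taken as exactly 60 cm

screen_width = extent_cm(30.0, VIEWING_DISTANCE_CM)   # 30 deg horizontal
screen_height = extent_cm(26.0, VIEWING_DISTANCE_CM)  # 26 deg vertical
stimulus_side = extent_cm(12.0, VIEWING_DISTANCE_CM)  # 12 x 12 deg stimuli

print(f"screen: {screen_width:.1f} x {screen_height:.1f} cm")
print(f"stimulus side: {stimulus_side:.1f} cm")
```

Under these assumptions the screen is roughly 32 cm wide and the 12° stimuli used in both experiments are roughly 12.6 cm on a side.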
During all scans, monkeys performed a fixation task, as described earlier. Scans during which the animal broke fixation for >20 times and for >500 ms were excluded from analysis. All human subjects were highly experienced and had performed fixation tasks in several previous studies in our laboratory.
Experiment 1: object category representations.
Monkey subjects (n = 3) viewed color pictures of five object categories: monkey faces, monkey body parts, foods, laboratory scenes, and man-made objects (see Fig. 1A for examples). Human subjects (n = 6) viewed these same five categories of stimuli and, additionally, color pictures of human faces and human body parts (Fig. 1B). For purposes of direct comparison, our analyses did not consider responses to heterospecific stimuli. All of the stimuli were pictures of things the monkeys had seen, except for the man-made objects. The pictures of the man-made objects were familiar to the monkeys because they had been presented with the same pictures during training sessions. The stimuli subtended 12 × 12° and were presented for 1 s foveally at the fixation point (0.5° diameter), followed by a 1-s blank interval during which only the fixation point was present. Blocks of stimuli from each category were presented interleaved with blank periods, each lasting for 12 s. Each category block was repeated once in a single functional time series (i.e., run), resulting in equal amounts of exposure to each stimulus category. The order of the blocks was pseudorandomized into two different sequences that were repeated in an ABBA scheme during scanning sessions.
During each run, human subjects performed either a passive viewing fixation task or a one-back repetition detection task during which they silently counted the sequential occurrences of two identical stimuli (Haxby et al. 2001). After each run in which subjects performed the one-back task, they reported the number of matches that they had detected. Average subject performance on the one-back task was 91.6 ± 3.8% accuracy, with the poorest subject performing at an average of 89.9 ± 6.7% accuracy and the best subject at an average of 97.3 ± 1.5%. Task type was alternated for each run during the course of a scanning session.
Experiment 2: face inversion.
Monkey subjects (n = 3) viewed color pictures of monkey faces, inverted monkey faces, and man-made objects (see Fig. 1C for examples). Human subjects (n = 6) viewed color pictures of upright and inverted human faces, monkey faces, and man-made objects (Fig. 1D). For purposes of direct comparison, results obtained with heterospecific stimuli were not considered in this report. The stimuli subtended 12 × 12° and were presented for 1 s foveally behind the fixation point (0.5° diameter), followed by a 1-s blank interval during which only the fixation point was present. Blocks of stimuli from each category were presented interleaved with blank periods, each lasting for 12 s. Each category block was repeated twice within a single scan.
Data were acquired in both species with a 3-T head-dedicated scanner (Magnetom Allegra; Siemens, Erlangen, Germany).
For macaque scanning, a 12-cm transmit/receive surface coil (Model NMSC-023; Nova Medical, Wakefield, MA) was used for the scanning sessions during which functional images were acquired and a 16-cm transmit/receive quadrature volume coil (Model NM-016; Nova Medical) was used for a scanning session during which high-resolution anatomical images were acquired. Monkey subjects were placed in the “sphinx” position in an MR-compatible primate chair during awake experimental scanning sessions and they were placed prone, without a chair, during the anesthetized structural scanning session. A whole-brain structural volume was acquired with the volume coil while the animals were anesthetized with Telazol (tiletamine/zolazepam, 10 mg/kg, im) in a magnetization-prepared rapid gradient echo (MPRAGE) sequence (0.5 × 0.5 × 0.5-mm resolution; field of view [FOV] = 128 mm; 256 × 256 matrix; repetition time [TR] = 2,500 ms; echo time [TE] = 4.4 ms; inversion time [TI] = 1,100 ms; flip angle = 8°; 20 acquisitions). In addition, a second whole-brain structural volume was acquired with the surface coil and the animal placed in the primate chair under anesthesia (MPRAGE sequence; 0.5 × 0.5 × 1.0-mm resolution; FOV = 128 mm; 256 × 256 matrix; TR = 2,500 ms; TE = 4.4 ms; TI = 1,100 ms; flip angle = 8°; 1 acquisition). This second structural volume was acquired with the head in the same location as during the awake experimental sessions and served as an alignment reference for the higher-quality structural volume acquired with the volume coil. All other scan sessions, each lasting about 1.5 h, were performed with awake animals.
For experiment 1, 27 coronal slices were acquired in three to seven series in two monkeys (M1 and M2) using gradient echo echo planar imaging (GE-EPI sequence; 1.25 × 1.25-mm in-plane resolution; 27 slices; 2-mm-thick slices; no interslice gap; FOV = 80 mm; 64 × 64 matrix; TR = 2,400 ms; TE = 32 ms; flip angle = 90°; bandwidth = 2,112 Hz per pixel). An optimized multiecho gradient echo sequence was used in the third monkey (M3) (ME-EPI sequence; 1.2 × 1.2-mm in-plane resolution; 30 coronal slices; 2-mm-thick slices, no interslice gap; FOV = 96 mm; 80 × 80 matrix; TR = 2,400 ms; TE = 18 ms; flip angle = 80°; bandwidth = 1,894 Hz per pixel). This sequence permits the acquisition of data at several echo times, under reversed gradient readouts, thereby allowing for simultaneous estimation of the magnetic field, resulting in reduced image distortions with partial recovery of susceptibility-induced signal loss (Pinsk et al. 2008). The slice prescription started from the posterior pole and covered the brain up to the region of the principal sulcus.
For experiment 2, 33 coronal slices were acquired in six to eight series in the three monkeys (GE-EPI sequence, 1.25 × 1.25 mm in-plane resolution; 2-mm-thick slices; no interslice gap; FOV = 80 mm; 64 × 64 matrix; TR = 2,000 ms; TE = 26 ms; flip angle = 90°; bandwidth = 2,112 Hz per pixel). Monkey M3 was scanned using the multiecho gradient echo sequence with the same parameters as those in experiment 1. In total, for experiment 1, 2,940, 4,830, and 4,410 functional volumes were acquired in monkeys M1, M2, and M3 in a total of 28, 46, and 52 scan series, and 7, 6, and 17 scan sessions per animal, respectively. For experiment 2, 3,325, 7,030, and 2,280 functional volumes were acquired in the three monkeys in a total of 35, 74, and 24 scan series, and 11, 21, and 8 scan sessions per animal, respectively.
For human scanning, a standard volume head coil was used for both functional and structural imaging. Subjects were placed supine with their heads surrounded by foam to reduce head movements. For experiment 1, six subjects were scanned in two separate scanning sessions, each lasting about 2 h. The first session was used to acquire five to six whole-brain structural volumes for high-quality anatomical underlays and cortical surface reconstructions (MPRAGE sequence; 1 × 1 × 1-mm resolution; FOV = 256 mm; 256 × 256 matrix; TR = 2,500 ms; TE = 4.38 ms; TI = 1,100 ms; flip angle = 8°). During the second session, eight functional series were acquired using a GE-EPI sequence covering the entire brain with 30 transverse slices (3 × 3-mm in-plane resolution; 3-mm-thick slices; 1-mm interslice gap; FOV = 192 mm; 64 × 64 matrix; TR = 2,000 ms; TE = 30 ms; flip angle = 90°; bandwidth = 3,126 Hz per pixel). For experiment 2, six subjects were scanned in a scan session that consisted of eight functional series using the same scan parameters as those in experiment 1.
Overview of data analysis.
Both human and monkey data were analyzed using AFNI (Cox 1996), FreeSurfer (Dale et al. 1999; Fischl et al. 1999), and SUMA (http://afni.nimh.nih.gov/afni/suma). After a three-dimensional (3D) rigid motion-correction procedure, the fMRI data were mapped onto cortical surfaces. Mapping the time series data onto the cortical surfaces allowed for surface-based spatial smoothing, which favorably restricts smoothing to data that are primarily within the gray matter, since white matter voxels do not get mapped onto the surface. Surface-based spatial smoothing has been shown to increase both sensitivity and spatial accuracy of BOLD signal sources (Jo et al. 2007, 2008). All subsequent analysis procedures (e.g., multiple regression analysis, region of interest [ROI] analysis) were performed on the surface-mapped data.
In addition, standard-mesh cortical surfaces were created to directly compare subjects using a single surface. Briefly, each individual's surface was inflated and transformed into a sphere in a manner that minimized metric distortion (Fischl et al. 1999). The individual spheres were used to create a template sphere for each hemisphere, where the curvature pattern consisted of the average pattern across all the subjects (i.e., three monkeys or six humans). The individual spheres were nonrigidly aligned to the templates so that the curvature patterns of each subject matched those of the template. To avoid interpolation of the fMRI data to match the warped spheres, the SUMA software package was used to create standard-mesh surfaces from the warped spheres using icosahedral tessellation and projection (Argall et al. 2006; Saad et al. 2004). The geometry of the resulting standard-mesh surfaces is identical to the individual subject's original surface geometry, but the topology is common across all of the subjects. The use of standard-mesh surfaces allowed for node-to-node correspondence across surfaces of different subjects, so that functional data mapped onto one subject's surface could be directly compared with data mapped on another subject's surface.
Monkey data analysis.
Monkey fMRI scans were assessed for excessive head motion using the 3D registration tool provided by AFNI. Each functional image in a time series was registered to the previously acquired image to provide a measure of head movement over time. A scan was excluded if >3% of its images contained >1 mm of translation or >1° of rotation. The original data from all of the scans that passed the motion assessment were motion-corrected by registering each image to a reference EPI volume that was acquired during the structural scanning session. The motion-corrected images were next mapped onto each monkey's standard-mesh surface using SUMA. All subsequent data analyses were performed on the data mapped onto the standard-mesh surfaces. The data were spatially filtered with a 2-mm Gaussian kernel to increase the signal-to-noise ratio (SNR) but still retain spatial specificity. Each time series was normalized to its mean to input all of the time series across scan sessions into a single multiple regression analysis.
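The motion-based exclusion rule can be illustrated with a short sketch. It assumes that volume-to-previous-volume registration estimates have already been exported as six values per image (three translations in mm, three rotations in degrees). The text does not say how the per-axis values were combined, so taking the largest absolute value per image is an assumption, and the function names are hypothetical.

```python
def exceeds_motion_limit(params, max_trans_mm=1.0, max_rot_deg=1.0):
    """params: (dx, dy, dz, roll, pitch, yaw) relative to the previous image.

    Assumption: an image counts as "moved" when the largest absolute
    translation or the largest absolute rotation exceeds its limit.
    """
    largest_translation = max(abs(v) for v in params[:3])
    largest_rotation = max(abs(v) for v in params[3:])
    return largest_translation > max_trans_mm or largest_rotation > max_rot_deg


def exclude_scan(volume_params, max_fraction=0.03):
    """Return True when more than 3% of a scan's images exceed the limits."""
    if not volume_params:
        return False
    n_bad = sum(exceeds_motion_limit(p) for p in volume_params)
    return n_bad / len(volume_params) > max_fraction


# Usage: a 100-image scan with four images showing a 1.5-mm jump is excluded.
still = (0.1, 0.0, 0.05, 0.1, 0.0, 0.2)
jump = (1.5, 0.0, 0.0, 0.0, 0.0, 0.0)
print(exclude_scan([still] * 96 + [jump] * 4))
```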
For both experiments square-wave functions matching the time course of the experimental design were convolved with a gamma-variate function (Cohen 1997) to generate idealized response functions and used as regressors of interest in a multiple regression model in the framework of the general linear model (Friston et al. 1995). In addition, regressors that accounted for variance due to baseline shifts between time series, linear drifts within time series, and head motion parameter estimates calculated by the registration were included in the regression model.
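As an illustration of how such an idealized response function can be built, the sketch below convolves a block-design square wave with a gamma-variate kernel of the form t^b · exp(−t/c). The parameter values b = 8.6 and c = 0.547 s are the ones commonly associated with Cohen (1997); they are not stated in this text and are an assumption here, as are the function names, the 20-s kernel length, and the unit-sum normalization.

```python
import math


def gamma_variate(t, b=8.6, c=0.547):
    """Gamma-variate response t**b * exp(-t/c); b and c are assumed values."""
    return (t ** b) * math.exp(-t / c) if t > 0 else 0.0


def boxcar(n_vols, onset_vols, block_len_vols):
    """Square wave sampled once per volume: 1 during blocks, 0 elsewhere."""
    box = [0.0] * n_vols
    for onset in onset_vols:
        for i in range(onset, min(onset + block_len_vols, n_vols)):
            box[i] = 1.0
    return box


def convolve_with_hrf(box, tr, kernel_len_s=20.0):
    """Causal discrete convolution of the square wave with the kernel."""
    kernel = [gamma_variate(j * tr) for j in range(int(kernel_len_s / tr) + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]  # unit sum, so the plateau is near 1
    out = []
    for n in range(len(box)):
        out.append(sum(box[n - j] * k for j, k in enumerate(kernel) if n - j >= 0))
    return out


# Usage: one 12-s block (5 volumes at TR = 2.4 s) starting at volume 5.
regressor = convolve_with_hrf(boxcar(20, [5], 5), tr=2.4)
print([round(v, 3) for v in regressor])
```

The resulting regressor rises and falls a few seconds after the block does, which is the delay the regression model needs to capture.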
For experiment 1, brain regions that responded more strongly to faces or body parts were identified by contrasting presentation blocks of faces with objects, and body parts with objects, respectively, similar to contrasts used in previous human fMRI studies to identify category-selective cortical areas (Gauthier et al. 2000; Ishai et al. 1999; Kanwisher et al. 1997; McCarthy et al. 1997). We also explored food-selective activations by contrasting food stimuli with objects, but did not find any consistent brain regions that responded selectively to views of food stimuli. Further, the results regarding regions more strongly activated by laboratory scenes than by objects were partly compromised due to susceptibility artifacts and low signal along the ventral–medial regions of the temporal lobes and will require more detailed future study. Therefore our report will focus on the consistently identified face and body part activations. For experiment 2, brain regions that responded more strongly to faces were identified by contrasting upright monkey face with object stimuli. The statistical maps were thresholded at an F-score of 6.64 (P < 0.01, uncorrected for multiple comparisons).
Regions of interest (ROIs) were defined as clusters of statistically significant nodes located in similar anatomical locations in at least two of the three monkeys. The fMRI signals were averaged across all activated nodes within a given ROI, and across scans, and normalized to the measurement immediately preceding stimulus onset. Because response profiles were similar across hemispheres, they were averaged together (Supplemental Fig. S1). For each monkey, the five measurements obtained during each condition, adjusted for hemodynamic lag, were averaged, resulting in mean signal changes. Repeated-measures ANOVAs on the five measurements were performed to test for main effects of stimulus category using all stimulus categories except for the objects category. The inclusion of the objects category in the ANOVA would likely bias the F-values toward stronger selectivity since the objects category was used in the statistical contrast to derive the ROIs. Individual comparisons between the four remaining conditions (faces, body parts, foods, scenes) were performed using matched paired t-tests.
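The normalization and averaging step for a single block can be sketched as follows. The input is assumed to be an ROI time series already averaged across nodes. The size of the hemodynamic lag adjustment is not given in the text, so the two-volume default is a placeholder assumption, and the function name is hypothetical.

```python
def block_signal_change(roi_ts, onset_vol, n_meas=5, lag_vols=2):
    """Mean percent signal change for one stimulus block.

    roi_ts    : ROI-averaged signal, one value per volume
    onset_vol : index of the first volume of the block (must be >= 1)
    n_meas    : measurements per block (five in the monkey data)
    lag_vols  : assumed hemodynamic lag in volumes (placeholder value)
    """
    if onset_vol < 1:
        raise ValueError("need one volume before stimulus onset")
    baseline = roi_ts[onset_vol - 1]  # measurement just before onset
    start = onset_vol + lag_vols
    window = roi_ts[start:start + n_meas]
    if len(window) < n_meas:
        raise ValueError("time series too short for this block")
    changes = [100.0 * (v - baseline) / baseline for v in window]
    return sum(changes) / n_meas


# Usage: a synthetic series that rises by 2% two volumes after onset.
series = [100.0] * 7 + [102.0] * 5 + [100.0] * 4
print(block_signal_change(series, onset_vol=5))
```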
The preferred category selectivity of each ROI was assessed by computing d′ (Afraz et al. 2006; Grill-Spector et al. 2006), defined by the following formula

$$d' = \frac{\mu_{\text{preferred}} - \mu_{\text{nonpreferred}}}{\sqrt{\left(\sigma^{2}_{\text{preferred}} + \sigma^{2}_{\text{nonpreferred}}\right)/2}}$$

where μpreferred and μnonpreferred are the average responses to the preferred stimulus category (i.e., faces or body parts) and the average responses to the remaining three nonpreferred stimulus categories (i.e., foods, objects, and scenes), respectively; σpreferred and σnonpreferred are the SDs of the responses to the preferred and nonpreferred stimulus categories.
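A small numerical sketch of the selectivity index may help. It assumes the standard form of d′ used in the cited studies, namely the difference between the preferred and nonpreferred means divided by the square root of the average of the two variances. Whether sample or population SDs were used is not stated, so the use of sample variance is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def d_prime(preferred, nonpreferred):
    """d' = (mean_p - mean_n) / sqrt((var_p + var_n) / 2).

    Assumption: sample variances (n - 1 denominator) are used.
    """
    mean_p = statistics.mean(preferred)
    mean_n = statistics.mean(nonpreferred)
    var_p = statistics.variance(preferred)
    var_n = statistics.variance(nonpreferred)
    return (mean_p - mean_n) / math.sqrt((var_p + var_n) / 2.0)


# Usage with made-up percent signal changes (illustrative values only).
face_responses = [2.0, 3.0, 4.0]
other_responses = [0.0, 1.0, 2.0]
print(d_prime(face_responses, other_responses))
```

Larger positive values indicate that responses to the preferred category are well separated from responses to the nonpreferred categories relative to their variability.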
Eye-tracking data acquired in each monkey during scanning sessions were analyzed to rule out eye-movement–related confounds. Because the fixation window was large enough to allow for both eye position drift and saccadic eye movements ≤4°, a measure of eye movements was calculated based on the total amount of horizontal and vertical eye movements in each of the stimulus conditions. This measure captured both large and small eye movements that may have contributed to eye-movement–related confounds. Total eye-movement differences in each monkey were compared using repeated-measures ANOVAs to test for main effects of stimulus condition. Individual comparisons between conditions were performed using matched paired t-tests. Horizontal (X) and vertical (Y) eye movements from each monkey were tested for differences between stimulus conditions. In experiment 1, the eye movements between conditions were not significantly different in monkeys M1 and M3 [M1: X: F(4,51) = 1.9, P = 0.1; Y: F(4,51) = 1.2, P = 0.3; M3: X: F(4,77) = 1.7, P = 0.1; Y: F(4,77) = 2.2, P = 0.07]. In monkey M2, horizontal eye movements showed a main effect of stimulus condition [M2: X: F(4,89) = 7.6, P < 0.05]. Pairwise comparisons revealed significantly fewer horizontal eye movements in the face condition than in the food, scene, and object conditions (fa < fd, sc, ob, P < 0.05). Furthermore, M2's vertical eye movements also showed a main effect of stimulus condition [M2: Y: F(4,89) = 6.5, P < 0.05]. Pairwise comparisons revealed significantly fewer vertical eye movements for faces compared with scenes and objects (fa < sc, ob, P < 0.05). Such differences in eye movements between the face condition and the other conditions are likely to reduce the sensitivity in detecting category-selective activations and may account for the smaller number of face ROIs in monkey M2 (see Fig. 2A). In experiment 2, monkeys M1 and M2 showed significantly more eye movements while viewing objects than while viewing faces. These differences in eye movements may account for the smaller number of face-selective areas found in both monkeys compared with monkey M3, who showed the largest number of face-selective areas [M1: X: F(2,65) = 10.2; ob > ufa, ifa, P < 0.05; M2: X: F(2,113) = 7.3; ob, ifa > ufa, P < 0.05; Y: F(2,113) = 3.9; ob, ifa > ufa, P < 0.05; M3: X: F(2,62) = 2.2, P = 0.12; Y: F(2,62) = 1.1, P = 0.33].
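One plausible way to compute a "total amount" of eye movement per condition is to sum the absolute sample-to-sample changes in eye position, separately for the horizontal and vertical traces. The text does not give the exact definition, so this reading is an assumption, as are the function names and the data layout.

```python
def total_eye_movement(positions_deg):
    """Sum of absolute sample-to-sample position changes (degrees)."""
    return sum(abs(b - a) for a, b in zip(positions_deg, positions_deg[1:]))


def movement_by_condition(samples_by_condition):
    """Map each condition to (total horizontal, total vertical) movement.

    samples_by_condition: {condition: [(x_deg, y_deg), ...]} in time order.
    """
    totals = {}
    for condition, samples in samples_by_condition.items():
        xs = [x for x, _ in samples]
        ys = [y for _, y in samples]
        totals[condition] = (total_eye_movement(xs), total_eye_movement(ys))
    return totals


# Usage with made-up eye traces (illustrative values only).
traces = {
    "faces": [(0.0, 0.0), (1.0, 0.0), (1.0, -2.0)],
    "objects": [(0.0, 0.0), (2.0, 1.0), (-1.0, 1.0)],
}
print(movement_by_condition(traces))
```

Because small drifts and large saccades both add to the sum, a measure of this kind is sensitive to both, which matches the stated goal of capturing large and small eye movements.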
Human data analysis.
Analysis of the human data followed procedures similar to the analysis of the monkey data. Standard-mesh surfaces were created for each of the six human subjects just as with the monkey subjects, using an average curvature template of the humans. Functional image volumes were aligned to a volume acquired immediately prior to the whole-brain anatomical image and then mapped onto the standard-mesh surfaces. The surface-mapped data were spatially filtered with a 4-mm Gaussian kernel and each time series was normalized to its mean. Square-wave functions matching the time course of the experimental design were convolved with a gamma-variate function and used as regressors of interest in a multiple regression model. In total, for experiment 1, there were 10 regressors of interest created for the model (i.e., each of the five stimulus categories separated by task type). In addition, regressors that accounted for variance due to baseline shifts between time series, linear drifts within time series, and head motion parameter estimates were included in the regression models. For experiment 1, brain regions responding more strongly to faces and body parts, regardless of task type, were identified by contrasting presentation blocks of faces with objects and body parts with objects, respectively. Similar to results in the monkey, we could not identify any consistent brain regions that responded more strongly to the food stimuli. For experiment 2, brain regions responding more strongly to faces were identified by contrasting presentation blocks of upright faces with objects. The statistical maps were thresholded at an F-score of 15.2 (P < 0.0001, uncorrected).
ROIs were located by identifying clusters in anatomically consistent locations in at least three of the six subjects. ROIs were restricted to lateral and ventral cortex for comparison to the monkey data, which did not cover the animals' entire brains. For both experiments 1 and 2, the spatially smoothed fMRI signals were averaged across all activated nodes within a given ROI across scans and normalized to the measurement immediately preceding stimulus onset. For each subject, the six measurements of the fMRI signal obtained during each condition, adjusted for hemodynamic lag, were averaged, resulting in mean signal changes. Statistical significance was determined by repeated-measures ANOVAs on the six measurements of the fMRI signals. For each subject, structural images were transformed into Talairach space and linked to the surface reconstructions using AFNI software to obtain the standard coordinates for each ROI (Talairach and Tournoux 1988). Preferred stimulus category selectivity was determined for each ROI by calculating d′ as described earlier in the monkey analysis section.
Experiment 1: object category representations
Face-selective activations: macaque.
Areas that responded more strongly to faces than to objects were found in temporal cortex of both monkeys and humans. In the monkeys, these areas were restricted to the middle and anterior portions of the STS, while in the humans they were distributed across several regions of the temporal lobe including both ventral and lateral regions. To reduce the probability of including false-positive activations, only activations in consistent anatomical locations in two of the three monkeys or in three of the six humans were defined as ROIs for further analyses.
To identify candidate face-selective areas in the monkey, the face and object conditions were contrasted and the resulting activations were examined. As shown in Fig. 2A (and Supplemental Fig. S2), two consistent bilateral activations were found in the mid-STS region at approximately A6/7 in the three monkeys. One activation was located in the lower bank of the STS and MTG, whereas another activation was found within the STS fundus and encroaching onto the upper bank of the STS and STG. These two activations are labeled “MLfa” (middle lateral) and “MFfa” (middle fundus), respectively. In the right hemisphere of M3, these two activations were connected by a narrow strand of nodes. In addition to these two mid-STS activations, three bilateral activations were found across the anterior STS region, between A12 and A14. One activation was located in the lower bank of the STS and MTG. Another activation was found in the STS fundus. A third anterior activation was found in the upper bank of the STS and STG. These three activations are labeled “ALfa” (anterior lateral), “AFfa” (anterior fundus), and “ADfa” (anterior dorsal), respectively. In the left hemisphere of M1, the ALfa and AFfa were connected by a narrow strand of activated nodes. Many of the anterior activations in M2 were absent, and only the right AFfa area was identified.
To examine whether the above-defined ROIs appeared in anatomically consistent locations across the three monkeys, the data were analyzed on standard-mesh surfaces created from an average template of the monkeys' brains. Because the topology of standard-mesh surfaces is similar across subjects, nodes that correspond to an anatomical region in one subject will also correspond to a similar anatomical region in the other subjects. This can be easily visualized by assigning similar colors to similarly numbered surface nodes across subjects (Supplemental Fig. S3). Using standard-mesh surfaces allows data from one subject to be mapped onto the surface of another subject while maintaining anatomical relationships. To examine the ROIs across the monkeys, the ROIs from all three monkeys were mapped onto the standard-mesh surface of a single monkey's brain, and color-coded by monkey subject to form a conjunction map (Fig. 2B). All ROIs showed remarkable spatial consistency across the three monkeys. Many of the ROIs showed a significant amount of overlap, or were located adjacent to one another.
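Because node indices on a standard-mesh surface correspond across subjects, a conjunction map reduces to bookkeeping over sets of node indices. The sketch below is a simplified illustration under that assumption; the subject labels, node indices, and function names are hypothetical, and counting subjects per node is a stricter test than the anatomical-location criterion used to define ROIs.

```python
def conjunction_map(rois_by_subject):
    """Map each node index to the sorted list of subjects whose ROI contains it."""
    node_to_subjects = {}
    for subject, nodes in rois_by_subject.items():
        for node in nodes:
            node_to_subjects.setdefault(node, []).append(subject)
    return {node: sorted(subjects) for node, subjects in node_to_subjects.items()}

def consistent_nodes(rois_by_subject, min_subjects):
    """Nodes that fall inside the ROI of at least min_subjects subjects."""
    cmap = conjunction_map(rois_by_subject)
    return {node for node, subjects in cmap.items() if len(subjects) >= min_subjects}

# Hypothetical ROI node sets for three monkeys on a shared standard mesh.
rois = {
    "M1": {100, 101, 102, 103},
    "M2": {102, 103, 104},
    "M3": {103, 104, 105},
}
shared = consistent_nodes(rois, min_subjects=2)
```

Color-coding each node by the subject list returned from `conjunction_map` gives the kind of display described for the group conjunction map.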
To examine the face selectivity of each ROI, response properties were studied by performing a time course analysis of the fMRI signals. As shown in Fig. 2C, the time course analysis revealed a main effect of category presentation condition in all five ROIs (MLfa: F(3,14) = 40.4, P < 0.05; MFfa: F(3,14) = 20.3, P < 0.05; ALfa: F(3,9) = 4.9, P < 0.05; AFfa: F(3,14) = 6.8, P < 0.05; ADfa: F(3,9) = 10.5, P < 0.05). As noted in the Methods section, the objects category was excluded from the ANOVA to reduce the likelihood of biasing the analysis. Pairwise comparisons of the remaining four categories revealed that MLfa, MFfa, and ADfa responded most strongly in the face condition (fa) as compared with the other stimulus conditions (MLfa, MFfa, ADfa: fa > bp, fd, sc, P < 0.05). The other two ROIs, ALfa and AFfa, responded equally strongly to both the face condition and the body part condition, and these responses were significantly greater than the responses to the other conditions (AFfa: fa > fd, P < 0.05; bp > fd, sc, P < 0.05; ALfa: fa > fd, sc, P < 0.05; bp > fd, sc, P < 0.05). These response profiles were confirmed by computing a d′ selectivity index for each area (Fig. 2D).
Taken together, the data from all three monkeys suggest the existence of at least three areas in the macaque temporal cortex that respond selectively to faces. Two areas were found in the mid-STS region, and one area was found in the anterior STS region. Table 1 summarizes the volume sizes of each area. No hemispheric differences in cluster size were found in these areas, thereby not confirming previous reports of laterality effects obtained with a smaller number of monkey subjects (Pinsk et al. 2005a). The response profiles and selectivity indices suggest that the MLfa, MFfa, and ADfa areas are solely face-selective, whereas the ALfa and AFfa areas showed selectivity for both faces and body parts.
Face-selective activations: human.
Candidate face-selective areas in the humans were identified in the same manner as in the monkeys, except that the face condition contained human face stimuli as opposed to monkey face stimuli. Several ventral occipitotemporal and lateral temporal ROIs were identified. Figure 3A (and Supplemental Fig. S4A) shows the ventral and lateral activations in three of the six human subjects. A posterior ROI along the lateral and inferior occipital cortex was identified in all six subjects, consistent with previous reports of an “occipital face area” (OFA) (Gauthier et al. 2000; Rossion et al. 2003). Within the fusiform gyrus, two ROIs were identified. The locations of these two ROIs were consistent with previous reports of a “fusiform face area” (FFA), and have been labeled “FFA-1” in six subjects and “FFA-2” in five subjects (Kanwisher et al. 1997; McCarthy et al. 1997; Ishai et al. 1999). A fourth ROI was identified in ventral temporal cortex of three subjects, about 30–40 mm anterior to the fusiform ROIs, and has been labeled “AT” (anterior temporal). In addition to the above four ventral occipitotemporal ROIs, three lateral ROIs were identified along the STS. A posterior STS ROI (“pos-STS”) was identified in all six subjects, consistent with previous reports of a posterior STS face-selective area (Hoffman and Haxby 2000). This activation often extended in the posterior direction past the STS, encroaching onto the MTG and onto the adjacent lateral occipitotemporal cortex. A middle STS ROI (“mid-STS”) was identified in five subjects about 30 mm anterior to the pos-STS ROI. In addition, an anterior STS ROI (“ant-STS”) was identified in five subjects about 30 mm anterior to the mid-STS ROI. Activations along the STS are consistent with reports implicating the STS in the processing of moving and static biological stimuli (for a review see Allison et al. 2000).
The spatial variability of the ROIs in the human was examined using standard-mesh surfaces created from an average template of the six subjects' brains. As shown in Fig. 3D, despite the greater anatomical variability of individual human brains, many of the ROIs were easily distinguished on a group conjunction map, thereby qualitatively demonstrating the remarkable consistency of the ROIs across subjects.
A time course analysis of the fMRI signals was performed on each of the ROIs in the group of subjects and collapsed across task type to examine their response profiles to the category stimuli. As shown in Fig. 3B, a main effect of category presentation condition was found in all ROIs [OFA: F(3,5) = 9.8, P < 0.05; FFA-1: F(3,5) = 50.4, P < 0.05; FFA-2: F(3,4) = 20.0, P < 0.05; AT: F(3,2) = 38.6, P < 0.05; pos-STS: F(3,5) = 29.9, P < 0.05; mid-STS: F(3,4) = 10.9, P < 0.05; ant-STS: F(3,4) = 16.0, P < 0.05]. The four ventral ROIs showed significantly stronger responses to faces compared with the other stimulus categories (OFA, FFA-1, FFA-2, AT: fa > bp, fd, sc, P < 0.05). In addition, in FFA-1, the response to body parts was significantly stronger than the response to the other stimulus categories (FFA-1: bp > fd, sc, P < 0.05). Of the three lateral STS ROIs, the pos-STS showed strong responses evoked by both the face and the body part stimuli, whereas the other two ROIs showed strong face-selective responses (pos-STS: fa, bp > fd, sc, P < 0.05, fa > bp, P = 0.26; mid-STS: fa > fd, sc, P < 0.05, fa > bp, P = 0.08; ant-STS: fa > bp, fd, sc, P < 0.05). Face and body part selectivity for each ROI was further quantified with a d′ selectivity index, which confirmed the observed response profiles (Fig. 3C).
Taken together, the macaque data suggest the existence of at least three areas that respond selectively to faces. In contrast, the data from the human subjects suggest the existence of six areas in the human occipitotemporal cortex that respond selectively to faces. Four areas were found in ventral occipitotemporal cortex (i.e., OFA, FFA-1, FFA-2, AT), and two areas were found along the STS region (i.e., mid-STS, ant-STS). Tables 2 and 3 provide the average Talairach coordinates and activated volume sizes in each of the human subjects, respectively. As shown in Table 2, many of the face-selective areas showed larger activations in the right hemisphere, but paired t-tests did not reach statistical significance. Response profiles and selectivity indices of each area showed strong face selectivity in all of the areas, but also equally strong body part selectivity in the pos-STS area. Body part selectivity was also stronger in the fusiform ventral areas (FFA-1 and FFA-2) compared with the other areas.
Body part–selective activations: macaque.
A single area in the macaque and four areas in the human were found that responded more strongly to pictures of body parts than to objects. In the macaque, comparing the monkey body part condition to the object condition revealed an area in the anterior temporal cortex. This area, labeled ALbp, was located along the ventral lip of the STS and MTG at approximately A12, A14, and A15 in monkeys M1, M2, and M3, respectively (Fig. 4A, Supplemental Fig. S5). The individual volume sizes for this ROI are summarized in Table 1. Monkey M1 showed a uniquely large activation that extended from the anterior lateral region of the STS into the fundus of the sulcus. An assessment of the spatial variability of this area across monkey subjects using standard-mesh surfaces showed remarkable spatial consistency along the anterior lower bank of the STS/MTG (Fig. 4B).
The response properties of this ALbp ROI were quantified by analyzing the time course of the fMRI signals, averaged across hemispheres (see Supplemental Fig. S1 for response profiles of individual hemispheres). The ROI showed a main effect of category presentation condition [ALbp: F(3,14) = 13.2, P < 0.05] with the strongest response evoked by the body part stimuli (Fig. 4C). The response evoked by body parts was significantly stronger than the responses to the other stimulus categories (bp > fa, fd, sc, P < 0.05). Similarly, the d′ selectivity index for this ROI showed body part, but not face, selectivity (Fig. 4D).
Body part–selective activations: human.
Comparing the human body part condition to the object condition in the human subjects revealed four ROIs that responded more strongly to pictures of body parts (Fig. 5A). Two ROIs were located ventrally in the fusiform gyrus at a location close to the two face-selective areas in the fusiform gyrus. These ROIs appear to match the location of the “fusiform body area” reported by Peelen and Downing (2005), and are labeled “FBA-1” and “FBA-2.” A more posterior and lateral ROI was identified in occipitotemporal cortex, with Talairach coordinates matching those of the “extrastriate body area” (“EBA”) reported by Downing and colleagues (2001). A fourth ROI was identified in the posterior STS region (“pos-STS”). Similar to the face-selective areas, the body part–selective areas appeared in similar spatial locations across subjects on a standard-mesh surface (Fig. 5D).
A time course analysis was performed to examine the response profiles of each ROI, averaged across subjects and hemispheres. As shown in Fig. 5B, a main effect of category presentation condition was found in all ROIs [EBA: F(3,5) = 18.0, P < 0.05; FBA-1: F(3,5) = 13.6, P < 0.05; FBA-2: F(3,4) = 17.4, P < 0.05; pos-STS: F(3,4) = 15.7, P < 0.05]. Pairwise comparisons of the responses in the EBA and FBA-2 revealed significantly stronger responses to body parts compared with any of the other stimulus categories (EBA, FBA-2: bp > fa, fd, sc, P < 0.05). The responses to body parts in the pos-STS and FBA-1 were greater than the responses to any of the other stimulus categories, except for faces (pos-STS: bp > fd, sc, P < 0.05; bp > fa, P = 0.76; FBA-1: bp > fd, sc, P < 0.05; bp > fa, P = 0.36). The responses to faces in FBA-2 and pos-STS were significantly stronger compared with many of the other stimulus categories (FBA-2: fa > fd, sc, P < 0.05; pos-STS: fa > fd, sc, P < 0.05). Selectivity indices for each ROI confirmed the observed response profiles (Fig. 5C).
Taken together, the macaque data suggest the existence of at least one area located in the anterior ventral bank of the STS/MTG that responds selectively to body parts. In comparison, in the human, two ventral areas along the fusiform gyrus were identified that responded strongly to body parts and, to a lesser degree, to faces. A posterior STS area was also identified that responded equally strongly to both faces and body parts. Furthermore, a lateral area in occipitotemporal cortex was identified that showed the strongest body part response and did not respond to faces.
Spatial relation of face- and body part–selective areas.
The comparison of faces to objects and that of body parts to objects revealed activations in the anterior STS region of the macaque. When comparing the locations of these activations in each monkey, they were found to be in close proximity to one another, often overlapping. In both hemispheres of M1, the ALbp and AFbp areas were almost completely overlapping with the ALfa and AFfa areas (Fig. 6A). In the right hemisphere of M2, ALbp was in close proximity to AFfa. In the left hemisphere of M3, ALfa was completely overlapping and surrounded by ALbp, which extended into the fundus and partially overlapped with AFfa. In the right hemisphere of M3, ALfa partially overlapped the ALbp. On average, 57.9 ± 13.1% of ALfa and AFfa nodes overlapped with ALbp and AFbp nodes, whereas 29.0 ± 5.5% of ALbp and AFbp nodes overlapped with ALfa and AFfa nodes.
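The two overlap percentages differ because each is expressed relative to a different ROI: the shared nodes are the same, but the denominators are not. A minimal sketch of this calculation, assuming each ROI is represented as a set of surface node indices (the node sets and the function name below are hypothetical):

```python
def overlap_percent(roi_a, roi_b):
    """Percentage of nodes in roi_a that are also in roi_b."""
    if not roi_a:
        raise ValueError("roi_a must contain at least one node")
    return 100.0 * len(roi_a & roi_b) / len(roi_a)

# Hypothetical node sets: a small face ROI and a larger body part ROI.
face_nodes = set(range(0, 10))   # 10 nodes
body_nodes = set(range(5, 25))   # 20 nodes, 5 shared with face_nodes

face_in_body = overlap_percent(face_nodes, body_nodes)
body_in_face = overlap_percent(body_nodes, face_nodes)
```

In this toy case half of the face nodes lie within the body part ROI, but only a quarter of the body part nodes lie within the face ROI, which mirrors the asymmetry expected when the body part–selective activations are the larger of the two.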
For the putative face-selective areas, the large amount of overlap (57.9 ± 13.1%) may explain the similar face and body part selectivity (see Fig. 2, C and D). To test this possibility, a time course analysis was performed using the ALfa ROI after excluding the nodes that overlapped with body part–selective nodes. As shown in Fig. 6B, the ALfa area showed a stronger response to faces compared with body parts and other stimulus categories once the overlapping nodes were excluded, thereby supporting the “overlap” hypothesis [ALfa: F(3,9) = 9.8, P < 0.05; fa > bp, fd, sc, P < 0.05].
Although the local topographical arrangement of the anterior category-selective areas may vary, depending on individual anatomical differences, the overall global topography of these areas appeared similar: the anterior areas were located adjacent to each other and portions of the areas often overlapped. Likewise, in the humans, both body parts and faces were represented in several adjacent and overlapping areas on the fusiform gyrus (Fig. 6C; Peelen and Downing 2005). On average, 49 ± 7% of fusiform face-selective areas overlapped with fusiform body part–selective areas, and 49 ± 10% of fusiform body part–selective areas overlapped with face-selective areas. Despite the fact that these areas might be dissociable when using a higher spatial resolution (Schwarzlose et al. 2005), our human subjects were scanned at a more conventional resolution so that the data were more comparable across species given the small macaque brain size. In contrast, the pos-STS area also showed a large amount of overlap in face and body part representations (58 ± 10 and 68 ± 10%, respectively). Furthermore, the pos-STS area appeared to be an anatomically variable and large area that responded equally to both stimulus categories, as opposed to the discrete and anatomically consistent areas on the fusiform gyrus (FBA-1/2, FFA-1/2) that tended to overlap. However, this observation may be a result of our spatial resolution, and higher resolution scanning may yield more isolated face- and body part–selective areas in the pos-STS region similar to what has been shown in the fusiform gyrus.
Taken together, the face- and body part–selective areas in the anterior ventral bank of STS/MTG of the macaque appeared adjacent and overlapping at the current resolution of our MR techniques. We also identified face- and body part–selective areas in the human fusiform gyrus that were adjacent and overlapping at a spatial resolution comparable to the monkey data. Furthermore, we identified a large heterogeneous region in the posterior STS vicinity that contained a large amount of overlapping face- and body part–selective nodes. If spatial topography is used as a criterion to find similar areas across species, these results suggest a potential correspondence between the macaque ALfa/ALbp areas and either the human fusiform cortex or the posterior STS region.
Effect of task type on category selectivity.
Whereas the monkey subjects were trained to maintain fixation passively during the experiment, the human subjects were instructed to either do the same task or to perform a one-back memory task. Since many of the human fMRI studies on object category representations have been carried out with subjects performing a one-back memory task, it is important to confirm that the selectivity of these areas does not change when subjects perform a less demanding task.
Using the same ROIs as in the prior human data analysis, the MR signal time courses were extracted from each category-selective area and analyzed separately for each task. Supplemental Figs. S6 and S7 show the response profiles of each area when subjects performed the fixation task and the one-back task. Repeated-measures ANOVA showed a main effect for object category within each of the face- and body part–selective areas [face-selective areas: OFA: F(3,15) = 9.8, P < 0.05; FFA-1: F(3,15) = 50.4, P < 0.05; FFA-2: F(3,12) = 20.0, P < 0.05; AT: F(3,6) = 38.6, P < 0.05; pos-STS: F(3,15) = 30.0, P < 0.05; mid-STS: F(3,12) = 10.9, P < 0.05; ant-STS: F(3,12) = 16.0, P < 0.05; body part–selective areas: EBA: F(3,15) = 18.0, P < 0.05; FBA-1: F(3,15) = 13.6, P < 0.05; FBA-2: F(3,12) = 17.4, P < 0.05; pos-STS: F(3,12) = 15.7, P < 0.05], and a main effect for task type within a subset of areas due to increased response amplitudes during the one-back task [face-selective areas: FFA-1: F(1,5) = 44.1, P < 0.05; pos-STS: F(1,5) = 10.1, P < 0.05; body part–selective areas: EBA: F(1,5) = 55.9, P < 0.05; FBA-1: F(1,5) = 26.1, P < 0.05]. Importantly, no interactions were observed between object category and task type in any of the category-selective areas. In general, regardless of the task type, the response profiles of each category-selective area remained qualitatively unchanged under active and passive task conditions with smaller response amplitudes during passive conditions.
Experiment 2: face inversion
Monkey and human subjects participated in a second experiment in which they viewed upright and inverted faces, and objects. Stimulus inversion has been widely used to test for “holistic” processing (Leehey et al. 1978; Yin et al. 1969, 1970). For example, a reduced behavioral or neural response evoked by inverted faces as compared with upright faces has been interpreted to suggest that faces are processed in their entirety, as a whole, as opposed to piecemeal or component processing (Yin et al. 1969, 1970; Yovel and Kanwisher 2004, 2005). Following this rationale, we examined whether face-selective areas showed reduced responses to inverted faces in humans and macaques, thereby gaining insights regarding the type of face processing occurring within each area.
Face inversion: macaque.
All three monkeys were presented with upright faces, inverted faces, and objects. Voxels that were activated when contrasting upright faces and objects were used as regions of interest in a time course analysis. All five of the areas identified in experiment 1 were also identified in experiment 2. Area ADfa was activated bilaterally in only one monkey subject (M3).
As shown in Fig. 7A, all five areas showed a main effect of category presentation condition (MLfa: F(2,9) = 11.4, P < 0.05; MFfa: F(2,9) = 4.0, P < 0.05; ALfa: F(2,14) = 17.5, P < 0.05; AFfa: F(2,9) = 11.0, P < 0.05; ADfa: F(2,4) = 14.7, P < 0.05). Both areas MLfa and ALfa, located along the ventral bank of the STS/MTG, discriminated upright (ufa) and inverted (ifa) faces from each other and from objects (ob) (ufa vs. ifa vs. ob: MLfa: 1.24 ± 0.24 vs. 1.03 ± 0.39 vs. 0.64 ± 0.07%; ALfa: 0.74 ± 0.03 vs. 0.56 ± 0.19 vs. 0.28 ± 0.11%). The two areas located in the STS fundus (MFfa and AFfa) discriminated upright faces from the other two stimulus categories, but processed inverted faces no differently from object stimuli (MFfa: 1.36 ± 0.11 vs. 0.81 ± 0.16 vs. 0.65 ± 0.20%; AFfa: 1.37 ± 0.16 vs. 0.86 ± 0.47 vs. 0.71 ± 0.11%). Finally, the most dorsal area, ADfa, distinguished between face and object stimuli, but did not discriminate between upright and inverted faces (ADfa: 0.89 ± 0.10 vs. 0.94 ± 0.05 vs. 0.30 ± 0.01%). Taken together, the differences in responses between the ventral STS/MTG areas and the other areas suggest differences in functional roles for face processing across areas. The observation that areas in the fundus and upper bank of the STS/STG did not discriminate the inverted faces from objects suggests that these areas may process faces in more “holistic” ways, as compared with the other areas.
Face inversion: human.
Six human subjects participated in the second experiment examining face inversion. Similar to experiment 1, four areas were identified in ventral-temporal cortex, and three areas were identified in lateral temporal cortex. All areas showed a main effect of category presentation [OFA: F(2,5) = 14.1, P < 0.05; FFA-1: F(2,5) = 12.4, P < 0.05; FFA-2: F(2,4) = 13.7, P < 0.05; AT: F(2,2) = 10.8, P < 0.05; pos-STS: F(2,5) = 21.4, P < 0.05; mid-STS: F(2,5) = 4.4, P < 0.05; ant-STS: F(2,3) = 6.3, P < 0.05]. As shown in Fig. 7B, all of the four ventral areas (OFA, FFA-1, FFA-2, AT) showed statistically significant differences or clear trends toward discriminating between upright faces (ufa) and inverted faces (ifa), and also toward discriminating between inverted faces and objects (ob) (ufa vs. ifa vs. ob: OFA: 1.73 ± 0.26 vs. 1.50 ± 0.26 vs. 0.87 ± 0.27%; FFA-1: 1.39 ± 0.14 vs. 1.05 ± 0.26 vs. 0.50 ± 0.21%; FFA-2: 0.84 ± 0.24 vs. 0.46 ± 0.24 vs. 0.36 ± 0.16%; AT: 0.53 ± 0.09 vs. 0.27 ± 0.18 vs. −0.15 ± 0.23%). In contrast, the responses in the three lateral areas (pos-STS, mid-STS, ant-STS) suggested that they discriminated only upright faces from inverted faces and objects and treated inverted faces and objects similarly (pos-STS: 0.48 ± 0.09 vs. 0.04 ± 0.14 vs. −0.15 ± 0.11%; mid-STS: 0.35 ± 0.09 vs. 0.08 ± 0.24 vs. −0.15 ± 0.22%; ant-STS: 0.23 ± 0.04 vs. −0.19 ± 0.13 vs. −0.23 ± 0.14%). The response differences evoked by inverted faces observed in the ventral-temporal areas and the lateral STS areas suggest that the lateral areas may be engaged in more “holistic” processing compared with the ventral ones. These response differences were similar to those found in the macaque, between face-selective areas in the ventral bank of the STS/MTG and face-selective areas in the fundus and dorsal bank of the STS/STG.
We directly compared the spatial topography and response profiles of areas responding selectively to biologically relevant categories of objects in macaques and humans with virtually identical experimental procedures. In the macaque, at least three face-selective areas were identified in and around the STS. Two areas were found in the middle STS: one along the ventral lip (MLfa) and the other in the fundus and dorsal lip (MFfa). One area was found in the anterior STS, along the dorsal lip (ADfa). Two additional areas in anterior temporal cortex showed both face and body part selectivity at our current spatial resolution. All areas showed remarkable anatomical consistency across the three monkeys, as was revealed by analyses based on standard-mesh surfaces. In comparison, four ventral (OFA, FFA-1, FFA-2, AT) and three lateral (pos-STS, mid-STS, ant-STS) putative face-selective areas were identified in the human subjects. In the macaque, a body part–selective area (ALbp) was found adjacent and overlapping with a putative face-selective area (ALfa). In the human, ventral body part–selective areas (FBA-1, FBA-2) were found adjacent and partially overlapping with ventral face-selective areas (FFA-1, FFA-2) and large amounts of overlapping face- and body part–selective nodes were found in the posterior STS region as well. In both species, the inversion of face stimuli revealed different response profiles across face-selective areas. Macaque face-selective areas on the ventral bank of the STS/MTG discriminated all three stimulus classes from each other (i.e., upright faces, inverted faces, and objects). However, face-selective areas in the STS fundus and upper bank discriminated only upright faces from inverted faces and objects. Human face-selective areas in ventral-temporal cortex discriminated upright faces from objects and showed a response to inverted faces that fell in between the two, suggesting discriminability across all three stimulus classes. 
In contrast, human face-selective areas in lateral STS cortex tended to discriminate only upright faces from the other two stimulus classes. Taken together, these findings suggest several potential correspondences between category-selective areas in humans and monkeys based on the response profiles obtained in experiments 1 and 2 and the spatial relationship of category-selective areas, which will be discussed in the following text.
Relation to previous monkey neuroimaging and physiology studies
Our findings of several face-selective areas confirm and extend previous monkey neuroimaging studies. Our posterior face-selective areas were found in the mid-STS at approximately A6, anterior to ventral V4, and similar in location to the “middle face-selective area” reported by Tsao and colleagues (2003a) and the “posterior face (pFace) area” reported by Pinsk and colleagues (2005a) and Hadj-Bouziane and colleagues (2008). Most recently, Moeller and colleagues (2008) and Tsao and colleagues (2008) reported the existence of six face-selective areas, many of which were in similar locations to the areas we report. The posterior areas reported in these two studies include a posterior/lateral area (PL), a middle/lateral area (ML), and a middle area in the STS fundus (MF). In contrast, we were not able to disambiguate two lateral areas and, instead, identified a single area (MLfa) along the ventral lip of the STS/MTG and a more dorsal area that was often located within the STS fundus and toward the upper bank (MFfa). In addition to the posterior areas, we identified several anterior areas that appeared in similar locations to a previously reported anterior STS face-selective area (Hadj-Bouziane et al. 2008; Pinsk et al. 2005a; Tsao et al. 2003a). Two of the anterior areas (ALfa and AFfa) were in similar locations to the AL and AF areas reported by Moeller and colleagues (2008). However, we also identified a more dorsal area (ADfa) along the dorsal bank of the STS and onto the superior temporal gyrus. A body part–selective area (ALbp) was identified on the lower bank of the anterior STS and MTG that was adjacent to and partially overlapped the location of the ALfa area (Pinsk et al. 2005a). This area is apparently different from a previously reported body-selective area that was activated by grayscale images of entire (headless) human bodies and was located more posteriorly, in close proximity to the posterior face-selective areas (Tsao et al. 2003a).
In the macaque, neurons responding selectively to faces and body parts have been found on both banks and in the fundus of the STS and, less prominently, on the lateral and ventral convexity of the MTG in IT cortex (Baylis et al. 1987; Bruce et al. 1981; Desimone et al. 1984; Gross et al. 1969, 1972; Perrett et al. 1982; Tanaka et al. 1991; Yamane et al. 1988). Face cells have been most frequently found on both the upper bank and the lower bank and lip, where they constitute 20–30% of the overall population (Baylis et al. 1987; Desimone et al. 1984; Perrett et al. 1982, 1984). In good agreement with these findings, the areas that we identified were found in locations ranging from the lateral convexity of IT to the superior temporal gyrus. Although it may be argued that some of our areas were separated artificially due to partial volume effects at our current spatial resolution (such as areas ALfa and ADfa, which are located on directly opposite banks of the STS), several reasons speak against this possibility. First, the data were analyzed on the cortical surface to improve the specificity of spatial smoothing (Jo et al. 2007, 2008). Second, the response profiles of each area to the five stimulus categories revealed differences in face selectivity, suggesting that neuronal populations with different response properties contributed to the fMRI signals measured in these different areas. Third, the effects of face inversion differentially modulated activity, again suggesting that the underlying neural populations subserved different functions. Finally, recent results by Moeller and colleagues (2008) corroborate our findings by showing specific interconnected networks between similarly located face-selective areas using electrical microstimulation and fMRI.
Neurons responding to views of entire bodies, body postures, and body actions have been found dorsally on the upper bank of the STS in the superior temporal polysensory (STP) area (Bruce et al. 1981; Oram and Perrett 1996; Wachsmuth et al. 1994). In area TE, on the ventral bank of the STS and laterally on the inferior temporal gyrus, neurons responding selectively to pictures of hands have been reported (Desimone et al. 1984; Gross et al. 1969, 1972). Our body part–selective area (ALbp) was located on the lower bank of the STS and the lateral convexity of the MTG, likely in TEa and TEm. This location is similar to the reported locations of hand cells (Desimone et al. 1984; Gross et al. 1969, 1972) and it does not seem to overlap with the reported locations of cells selective for biological motion in the upper regions of the STS/STG (Bruce et al. 1981; Oram and Perrett 1996; Wachsmuth et al. 1994). Given the location of our body part–selective area, this area may contain cells similar to those reported by Gross et al. (1972) and Desimone et al. (1984). The ALbp area is located anterior to an area in the STS reported by Tsao and colleagues (2003a) that responded most strongly to headless human bodies.
Interestingly, however, although most of these physiology studies in macaques covered the STS from 3 to 21 mm anterior to the interaural line, no evidence was found for areas consisting mainly of category-selective face or body part cells. It is not yet clear what exact percentage of category-selective cells would be necessary to evoke a BOLD response. Recently, Tsao and colleagues (2006) recorded from an fMRI-defined face-selective area, their “middle face-selective area,” that appears to correspond to our MLfa or MFfa areas, and reported that the area consisted almost entirely of face-selective cells (Tsao et al. 2006). It is possible that Tsao and colleagues (2006) recorded from a small, highly concentrated subregion of that face-selective area and other subregions may contain a lower concentration of face-selective cells. A recent study by Kiani and colleagues (2007) reported that IT cells with similar category selectivity were found in the same electrode penetration, but not in neighboring penetration sites. However, their results do not preclude the possibility of large spatial clusters that are not spherical in shape and, furthermore, one may consider the possibility that multiple, small selective clusters could form a spatially large BOLD activation. Future fMRI-guided physiology studies will be necessary to establish the link between cell concentration and BOLD responses.
Comparison of human and monkey category-selective areas
In close agreement with previous human studies, we identified ventral face-selective areas in the inferior occipital gyrus (OFA) and fusiform gyrus (FFA-1, FFA-2) (Ishai et al. 1999; Kanwisher et al. 1997; McCarthy et al. 1997; Puce et al. 1996; Rossion et al. 2003). Although typically a single area along the fusiform gyrus has been reported, we identified two spatially distinct areas in a majority of subjects. Furthermore, we identified a face-selective area in anterior ventral temporal cortex in half the subjects, located in or near the collateral sulcus. This area has rarely been reported in previous fMRI studies (see Tsao et al. 2008), but has been suggested by previous electrophysiology and positron emission tomography studies (Allison et al. 1994, 1999; Sergent et al. 1992). It is typically difficult to obtain fMRI signals from this region due to low SNR as a result of nearby susceptibility artifacts (Gorno-Tempini et al. 2002). It is possible that our use of surface-based smoothing increased the sensitivity in the anterior temporal cortex sufficiently to permit the identification of this area in a few of our subjects. In addition to these ventral areas, we identified three face-selective areas along the human STS. Consistent with previous studies, our posterior STS area included activations that were located both within the sulcus and on neighboring cortex, including the angular, superior, and middle temporal gyri (Allison et al. 2000; Hoffman and Haxby 2000; Puce et al. 1998). Two additional areas were found in at least half the subjects in the middle and anterior STS. These areas may be similar to previously reported areas along the STS that have been implicated in the processing of facial expressions and social cues (Allison et al. 2000; Martin and Weisberg 2003; Ojemann et al. 1992; Winston et al. 2004).
We also confirmed previous reports of body part selectivity in human ventral temporal cortex and lateral occipitotemporal cortex (Downing et al. 2001; Peelen and Downing 2005; Schwarzlose et al. 2005). Similar to the two face-selective areas along the fusiform gyrus, we were able to identify two distinct body part–selective areas in the same brain region where previous studies have identified only one. Furthermore, we found that the posterior STS area, traditionally associated with face processing, was equally responsive to body parts. This is consistent with reports implicating the STS in the perception of actual and implied biological motion (Allison et al. 2000; Grossman and Blake 2002; Puce and Perrett 2003; Vaina et al. 2001). Importantly, our data suggest that both face and body part processing can occur in the same pos-STS region (also see Hein and Knight 2008 for additional functions associated with the STS).
We compared the response profiles of these human face- and body part–selective areas with those found in the macaque. Both posterior monkey face-selective areas (MLfa and MFfa) and the anterior dorsal area (ADfa) responded most strongly to faces, unlike the anterior lateral (ALfa) and anterior fundus (AFfa) areas, which showed similarly strong responses to both faces and body parts. In comparison, whereas the majority of the human face-selective areas responded most strongly to faces, the pos-STS and FFA-1 areas showed different response profiles. The pos-STS area did not discriminate between faces and body parts—both categories showed an equal response significantly higher than the other object categories. The FFA-1 area discriminated both faces and body parts from each other and from the other stimulus categories. The monkey body part–selective area (ALbp) responded most strongly to body parts, similar to the human EBA and FBA-2. Thus if the response profiles alone are used as a basis for comparison and correspondence, the anterior areas in the macaque (ALfa/AFfa) may correspond to the human pos-STS or FFA-1, whereas the macaque body part–selective area (ALbp) may correspond to the human EBA or FBA-2.
In addition to response profiles, the spatial relationships between areas may be used as a further criterion that helps establish commonalities across species. An examination of the spatial topography of the category-selective areas in both species reveals that, at comparable resolutions relative to brain size, the macaque ALbp and ALfa tend to be adjacent and overlap, similar to the face- and body part–selective areas along the human fusiform gyrus, suggesting that the anterior ventral STS/MTG region in the monkey may be organized similarly to the human fusiform gyrus. It should be noted that a recent study using higher spatial resolution found that the human FFA and FBA are separate areas that do not overlap (Schwarzlose et al. 2005). If the macaque brain were scanned at a higher resolution, a similar dissociation of face- and body part–selective areas might be revealed. In fact, we performed a relatively coarse analysis that redefined ALfa to exclude body part–selective nodes, and the result from that analysis suggests that there may indeed be two separate areas that are difficult to distinguish at the current spatial resolution. Furthermore, it is worth noting that the human pos-STS area displayed a much more complete overlap of face and body part selectivity, compared with the areas along the fusiform gyrus, suggesting the pos-STS area is just one area that processes both stimulus categories. It should also be noted that the macaque AFfa area showed a strong body part–selective response, but this is likely due to the unusually large body part activations found in monkey M1, which encompassed the AFfa. In contrast, Tsao and colleagues (2003a) suggested that their middle face area (which seems to correspond to our posterior face-selective areas) is homologous to the human FFA based on its location when computationally deformed onto a human flattened cortex.
In summary, based on the topographic relations of the areas and their response profiles and stimulus selectivity in experiment 1, the anterior areas in and near the ventral bank of the STS/MTG in the macaque may correspond to areas in the human fusiform gyrus or the posterior STS. In agreement with our results, Tsao et al. (2008) recently reported six macaque face-selective areas along the STS and three to five human face-selective areas in ventral temporal cortex in a study that compared face-selective areas in the macaque and human using BOLD and MION signals. In extension of these findings, our investigation of body part–selective responses and topography allowed us to suggest more specific correspondences between the anterior macaque areas and both the lateral and ventral human areas.
We presented both human and monkey subjects with upright and inverted conspecific faces to examine face processing within each face-selective area. Behavioral studies in humans have found that face recognition is severely impaired when faces are inverted (Leehey et al. 1978; Yin 1969). Furthermore, patient studies have revealed that damage to the right hemisphere impairs recognition of upright faces, but not inverted faces, leading to the notion that the right hemisphere processes faces in terms of their feature configuration, whereas inverted faces are processed less holistically by both hemispheres (Yin 1970). However, recent human imaging studies have had difficulty identifying a neural correlate for face inversion. Often, only small decreases of activation levels during face inversion have been reported in face-selective areas (Aguirre et al. 1999; Epstein et al. 2006; Haxby et al. 1999; Kanwisher et al. 1998), but stronger signal reductions in the FFA and posterior STS have been reported when investigators used tasks that induced large behavioral inversion effects (Yovel and Kanwisher 2004, 2005). In addition, activation in the occipital face area has been shown to increase for inverted faces (Haxby et al. 1999). With human subjects performing a passive fixation task, we found that face inversion affected the responses in face-selective areas to varying degrees. The response profiles suggest a dissociation between the ventral areas and the lateral STS areas with regard to the processing of inverted faces. Although all areas showed some signal attenuation to inverted faces compared with upright faces, the ventral occipitotemporal areas, and not the lateral STS areas, could still discriminate between inverted faces and objects, suggesting the former areas may be processing faces less “holistically” compared with the latter areas.
The response profiles from the monkey areas also revealed face inversion differences between the ventral bank STS/MTG areas (MLfa and ALfa) and the other areas in the fundus and upper bank of the STS/STG (MFfa, AFfa, and ADfa). Whereas the ventral bank areas discriminated between all three stimulus conditions, the other areas discriminated only between upright and inverted faces. Our findings are consistent with single-cell physiology studies in macaques showing that the orientation of a face can reduce the response latency of face-selective cells (Perrett et al. 1988). Behavioral studies in normal and split-brain monkeys have reported reaction time advantages for recognizing and discriminating upright faces (Perrett et al. 1988; Vermeire and Hamilton 1998). However, other studies have failed to find inversion effects in monkeys (Bruce 1982; Rosenfeld and Van Hoesen 1979). Our face inversion results supplement the data from experiment 1, suggesting the lower bank STS/MTG areas in the macaque may correspond to the ventral areas in the human, whereas the upper bank STS/STG areas in the macaque may correspond to the lateral STS areas in the human. Such a correspondence has been suggested in single-cell physiology studies examining neural responses to facial identity and facial expression between the lower and upper banks of the STS (Hasselmo et al. 1989). Our results provide first evidence that fMRI-defined face-selective areas in the macaque may be functionally dissociated and their underlying roles in face perception may be disambiguated by future studies.
Using virtually identical experimental paradigms, methods, and analyses, we have used fMRI to compare face and body part representations in behaving monkeys and humans. We identified at least three face-selective areas and one body part–selective area in the monkey cortex. Two additional areas in anterior temporal cortex showed both face and body part selectivity at our current spatial resolution. We identified six human face-selective areas, one of which in anterior temporal cortex was newly found using fMRI. One additional area (i.e., posterior STS) showed both face and body part selectivity. Comparing the MR signals of the areas in the human fusiform gyrus and posterior STS and the macaque anterior STS revealed similar response profiles. Furthermore, an examination of the spatial topography of the category-selective areas across the two species revealed similarities between the human fusiform gyrus and posterior STS and the anterior STS of the macaque, all containing adjacent overlapping face- and body part–selective representations at comparable spatial resolutions with respect to differences in brain sizes across species. In a second experiment, the inversion of faces suggested a dissociation between ventral and lateral face-selective areas in the human and, similarly, between the face-selective areas along the ventral bank of the STS/MTG and those in the fundus and upper bank of the STS/STG in the macaque. Taken together, we have directly compared the organization of object category representations across species and have revealed several candidate areas that may have similar functions that need to be further characterized in future investigations.
This work was supported by a McKnight Foundation grant to S. J. Inati, National Institutes of Health Grants R01 EY-11347 to C. G. Gross and P50 MH-62196 to S. Kastner, National Science Foundation Grant BCS-0633281 to S. Kastner, and an Autism Speaks Foundation grant to S. Kastner.
We thank K. DeSimone for assistance with monkey training and scanning, K. J. Montgomery for assistance with data analysis, E. Zwemer for stimulus material creation, M. S. A. Graziano for help with monkey surgery, and C. Konen and Y. Saalmann for reading the manuscript.
Present addresses: K. S. Weiner: Department of Psychology, Jordan Hall Bldg 420, Stanford University, Stanford, CA 94301; J. F. Kalkus: 1585 Bevan Rd., Apt 22, Pittsburgh, PA 15227.
1 The online version of this article contains supplemental data.
Copyright © 2009 the American Physiological Society
- Afraz et al. 2006.
- Aguirre et al. 1999.
- Aguirre et al. 1998.
- Allison et al. 1994.
- Allison et al. 2000.
- Allison et al. 1999.
- Andrews and Ewbank 2004.
- Argall et al. 2006.
- Baker et al. 2007.
- Baylis et al. 1987.
- Bruce 1982.
- Bruce et al. 1981.
- Cohen et al. 2000.
- Cohen 1997.
- Cox 1996.
- Dale et al. 1999.
- Damasio et al. 1982.
- Denys et al. 2004a.
- Denys et al. 2004b.
- Desimone et al. 1984.
- Downing et al. 2001.
- Engell and Haxby 2007.
- Epstein and Kanwisher 1998.
- Epstein et al. 2006.
- Fischl et al. 1999.
- Friston et al. 1995.
- Gauthier et al. 2000.
- Gorno-Tempini et al. 2002.
- Grill-Spector et al. 2006.
- Gross et al. 1969.
- Gross et al. 1972.
- Grossman and Blake 2002.
- Hadj-Bouziane et al. 2008.
- Harries and Perrett 1991.
- Hasselmo et al. 1989.
- Haxby et al. 2001.
- Haxby et al. 2000.
- Haxby et al. 1999.
- Hein and Knight 2008.
- Hoffman and Haxby 2000.
- Hoffman et al. 2007.
- Ishai et al. 1999.
- Jo et al. 2008.
- Jo et al. 2007.
- Kanwisher et al. 1997.
- Kanwisher et al. 1998.
- Kiani et al. 2007.
- Kourtzi et al. 2003.
- Koyama et al. 2004.
- Leehey et al. 1978.
- Logothetis et al. 1999.
- Martin and Weisberg 2003.
- McCarthy et al. 1997.
- McKenna and Warrington 1978.
- Moeller et al. 2008.
- Nakahara et al. 2002.
- Ojemann et al. 1992.
- Oram and Perrett 1996.
- Orban et al. 2003.
- Peelen and Downing 2005.
- Perrett et al. 1988.
- Perrett et al. 1982.
- Perrett et al. 1984.
- Pinsk et al. 2008.
- Pinsk et al. 2005a.
- Pinsk et al. 2005b.
- Puce et al. 1996.
- Puce et al. 1998.
- Puce and Perrett 2003.
- Rosenfeld and Van Hoesen 1979.
- Rossion et al. 2003.
- Saad et al. 2004.
- Sawamura et al. 2005.
- Schwarzlose et al. 2005.
- Sergent et al. 1992.
- Talairach and Tournoux 1988.
- Tanaka 1996.
- Tanaka et al. 1991.
- Taylor et al. 2007.
- Tsao et al. 2003a.
- Tsao et al. 2006.
- Tsao et al. 2008.
- Tsao et al. 2003b.
- Vaina et al. 2001.
- Vanduffel et al. 2002.
- Vermeire and Hamilton 1998.
- Wachsmuth et al. 1994.
- Wang et al. 1996.
- Whiteley and Warrington 1978.
- Winston et al. 2004.
- Yamane et al. 1988.
- Yin 1969.
- Yin 1970.
- Yovel and Kanwisher 2004.
- Yovel and Kanwisher 2005.