Regional cerebral blood flow (rCBF) PET scans were used to study the physiological bases of lipreading, a natural skill of extracting language from mouth movements, which contributes to speech perception in everyday life. Viewing connected mouth movements that could not be lexically identified and that evoked perception of isolated speech sounds (nonlexical lipreading) was associated with bilateral activation of the auditory association cortex around Wernicke's area, of the left dorsal premotor cortex, and of the opercular-premotor division of the left inferior frontal gyrus (Broca's area). The supplementary motor area was active as well. These areas have all been implicated in phonological processing, speech and mouth motor planning, and execution. Nonlexical lipreading also differentially activated visual motion areas. Lexical access through lipreading was associated with a similar pattern of activation and with additional foci in ventral- and dorsolateral prefrontal cortex bilaterally and in left inferior parietal cortex. Linear regression analysis of cerebral blood flow and proficiency for lexical lipreading further clarified the role of these areas in gaining access to language through lipreading. The results suggest cortical activation circuits for lipreading from action representations that may differentiate lexical access from nonlexical processes.
Lipreading, or speechreading—the ability to extract speech information from the seen action of jaws, lips, tongue, and teeth—is a natural skill in hearing people. Lipreading can influence auditory speech perception. It improves the understanding of speech in noisy conditions (Reisberg et al. 1987). The fusion illusion is a classical demonstration of integration between lipreading and heard speech (McGurk and MacDonald 1976). For instance, if we see a speaker saying ga and hear a different syllable (e.g., ba), then we may perceive something intermediate like da (McGurk and MacDonald 1976). These findings suggest that auditory speech perception can also capitalize on this form of visual information and that this information may work interactively across phonetic, phonemic, and lexical levels (Campbell et al. 1990).
The neural mechanisms underlying lipreading and language comprehension are not fully understood. Neuropsychological studies suggest that lipreading provides an extra phonetic resource that supports the phonological speech recognition system: accordingly, patients with acquired cortical “deafness” benefit from lipreading only when some form of higher-order (phonological) acoustic and lexical representations are still available (Campbell et al. 1990). On the other hand, several studies have found that patients with phonological lexicon deficits cannot compensate through lipreading (Campbell et al. 1990). The visual system appears to contribute to lipreading at a relatively early processing stage. Campbell and colleagues (1997) described patient L.M. with bilateral lesions of area MT/V5, the visual motion perception cortex (Zeki et al. 1991), who was markedly impaired in lipreading and not prone to the fusion illusion.
Taken together, these findings support the idea that the phonological analysis of speech is multimodal, making use of visual (lip movements) and auditory information together (and see Massaro 1999, for one combinatorial scheme).
A recent functional MRI (fMRI) study by Calvert and colleagues demonstrated that silent lipreading of digits from 1 to 10 activates the temporal lobes, including some activation in primary auditory areas (BA 41/42 on the lateral surface of the planum temporale, bilaterally), area MT/V5, and temporal lobe language areas outside primary auditory cortex, including posterior parts of the superior temporal sulcus (Calvert et al. 1997). More recent imaging studies of lipreading are somewhat at variance with the early report by Calvert et al. (1997). MacSweeney and colleagues (2000) used an event-related fMRI protocol to measure lipreading in the absence of acoustic scanner noise in 3 volunteers. Under these conditions, no clear-cut activation was seen in primary auditory cortex, whereas a focus of activation was seen in Brodmann's area 44 (opercular part of Broca's area). Bernstein et al. (2002) compared the pattern of activation for lipreading and pure tone perception. They found no evidence of coactivation of primary auditory cortex by the two sets of stimuli. Taken together, the available imaging literature has provided controversial results on the cortical processing stages that are necessary for successful lipreading. In particular, because the stimuli used in previous studies always involved real words (numbers from 1 to 10 in Calvert et al. 1997 and MacSweeney et al. 2000; real monosyllabic words in Bernstein et al. 2002), no information is available on the separate components of speechreading: phonological decoding and lexical identification.
Previous studies of phonemic discrimination of auditory or visually presented stimuli (Démonet et al. 1992, 1996; Paulesu et al. 1993, 1996b; Pugh et al. 1996) systematically showed a role of frontal structures (e.g., Broca's area) in these tasks. These regions are also thought to be important for gesture recognition/imitation (Decety et al. 1997; Iacoboni et al. 2000), and might “... provide a necessary bridge from doing to communicating” suggesting a link between speech and action representation (Rizzolatti and Arbib 1998).
In this study we used PET measurements of regional cerebral blood flow (rCBF), an index of synaptic activity, to further investigate the neurophysiological bases of lipreading, as a model of multimodal integration. To correlate lexical and nonlexical lipreading with activation of specific brain regions, we compared lipreading of real words with lipreading of nonlexical utterances of similar visual complexity and duration. Activation tasks involved watching a face silently mouthing high-frequency bisyllabic words [lexical lipreading (LLR)] or a video containing the same words as in the LLR condition but played backward [nonlexical lipreading (NLLR)]. The words played backward were not recognizable as real words; instead they evoked perception of meaningless syllables/phonemes. rCBF was also measured during a baseline task (BT) when subjects observed a video showing a still face.
Eight male right-handed subjects (age range: 22–28), with no history of neurological disorders, participated in the experiment. All subjects gave informed written consent, and the local hospital ethics committee approved the study.
In designing the experimental conditions, we sought stimuli that were well balanced for the duration of the visual input. In addition, to assess nonlexical lipreading, it was important to have nonword stimuli that could not be interpreted as real words during the task. Single syllables were not suitable because their duration did not match that of real bisyllabic words. In a pilot study, we assessed whether mouthed nonwords are perceived as such. Ten subjects not involved in the PET scans viewed 24 bisyllabic high-frequency words and 24 bisyllabic legal nonwords derived from the above real words by consonant substitution. We also presented the same 24 bisyllabic high-frequency words played backward. Each set of stimuli was presented twice. Subjects were invited to name the perceived speech sound for each stimulus. The results demonstrated that 1) the vast majority of real words were named with (perceived as) real words (words named with real words: 42.5/48, SD 1.6; words named with nonwords: 4.9/48, SD 1.9); 2) legal nonwords were perceived as real words in the majority of cases (nonwords named with words: 30.2/48, SD 5.5; nonwords named with nonwords: 16.8/48, SD 6.1); and 3) words played backward elicited perceptions of syllables or bisyllabic nonwords for the vast majority of stimuli (stimuli perceived as syllables or nonwords: 40.8/48, SD 3.6; stimuli named with real words: 3.8/48, SD 2.9). On average, the hits on real words (real words correctly identified, rather than simply named with words) were 17.2, SD 4.0. This rate of lip-read word identification is compatible with recently published work for high-proficiency subjects (Ludman et al. 2000).
On the basis of the above pilot study, the experimental stimuli used for PET scanning on 8 different subjects were videos showing 1) a still face (BT); 2) a face silently mouthing 20 bisyllabic, high-frequency words (LLR task), with the same stimuli presented in a randomized order across task replications; or 3) the same 20 bisyllabic, high-frequency words played backward (NLLR task). For the PET scans, subjects were instructed to perceive the stimuli and refrain from articulation. Each task and the baseline were repeated 4 times in a counterbalanced order. Lip-reading accuracy in the covert LLR task was assessed after each LLR PET scan by presenting the stimuli again and requesting overt naming. No such test was attempted for the NLLR task, so as not to bias the subjects' approach and to ensure that a naming strategy was not favored during PET scanning.
rCBF was measured by recording the distribution of radioactivity after the intravenous injection of 15O-labeled water (H2 15O) with the GE-Advance scanner (General Electric Medical System, Milwaukee, WI), which has a field of view of 15.2 cm, allowing sampling of the entire brain and cerebellum at once. Data were acquired by scanning in the 3D mode. A 7-mCi slow bolus of H2 15O was injected as a tracer of blood flow and 90-s scans were acquired. After attenuation correction (measured by a transmission scan), the data were reconstructed as 35 transaxial planes by 3D-filtered back projection using a Hanning filter (cutoff 4 mm filter width) in the transaxial plane, and a Ramp filter (cutoff 8.5 mm) in the axial direction. The integrated counts accumulated over 90-s scans were used as an index of rCBF.
PET data were analyzed using Statistical Parametric Mapping (SPM99, Wellcome Department of Imaging Neuroscience). The original brain images were first realigned and then transformed into a standard stereotactic anatomical space. Stereotactically normalized images were also smoothed with a Gaussian filter (16 × 16 × 12 mm) (Friston et al. 1995a). Statistical analyses were performed according to the SPM implementation of the general linear model (Friston et al. 1995b). Global differences in rCBF across conditions and subjects were corrected by proportional scaling and comparisons across means were made using the t-statistic. Only regional activations significant at P < 0.001 were considered. The following comparisons are reported: 1) NLLR minus baseline, 2) LLR minus baseline, 3) NLLR minus LLR, 4) LLR minus NLLR, and 5) conjunction analysis of comparisons 1 and 2 (which identifies the neural system shared by nonlexical and lexical lipreading). The direct comparisons of the two lip-reading tasks and their conjunction analysis were masked on the relevant comparisons against the baseline.
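The proportional-scaling and voxel-wise comparison steps can be sketched as follows. This is a minimal illustration, not the actual SPM99 pipeline: the data are simulated, the array shapes are arbitrary, and the grand-mean target of 50 follows the common SPM convention.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated rCBF images for one subject: 12 scans (4 per condition) x voxels
n_scans, n_voxels = 12, 1000
scans = rng.normal(50.0, 5.0, size=(n_scans, n_voxels))
condition = np.repeat(["BT", "NLLR", "LLR"], 4)

# Proportional scaling: remove global-flow differences by rescaling each
# scan so that its global mean equals a common value (50 by convention).
global_mean = scans.mean(axis=1, keepdims=True)
scaled = scans / global_mean * 50.0

# Voxel-wise t comparison of means, e.g. NLLR minus baseline (BT)
t, p = stats.ttest_ind(scaled[condition == "NLLR"],
                       scaled[condition == "BT"], axis=0)

# One-tailed threshold P < 0.001 for activations (NLLR > BT)
activated = (t > 0) & (p / 2 < 0.001)
print(activated.sum(), "voxels above threshold")
```

A full SPM analysis would instead fit all conditions and subjects in one general linear model and compute contrasts over the fitted parameters; the scaling and thresholding logic, however, is the same.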
rCBF images collected during lexical lipreading were also correlated with the number of hits recorded after each of 4 PET scans using linear regression. Anatomical localization of the activated cerebral areas was performed according to the Talairach anatomical atlas (Talairach and Tournoux 1988) and to MR image templates.
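The regression of rCBF on behavioral performance can be sketched at a single voxel. The numbers below are hypothetical (chosen to mimic the negative correlations reported in the Results), not measured values.

```python
import numpy as np
from scipy import stats

# Hypothetical hit counts after each of the 4 LLR scans for one subject,
# and the rCBF value at one voxel for the same 4 scans.
hits = np.array([10.0, 12.0, 13.0, 16.0])
rcbf = np.array([54.1, 52.8, 52.0, 50.3])

# Linear regression of rCBF on performance; a negative slope means
# blood flow in this voxel decreases as identification improves.
res = stats.linregress(hits, rcbf)
print(f"slope={res.slope:.3f}, r={res.rvalue:.3f}, p={res.pvalue:.4f}")
```

In the actual analysis this regression is run voxel-wise across the whole brain within the SPM framework, and only voxels whose correlation survives the statistical threshold are reported.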
Visualization of the results was made using MRIcro (http://www.psychology.nottingham.ac.uk/staff/crl/mricro.html).
To assess whether the primary auditory cortex was involved in lipreading, the reported activations were compared with a recently published probabilistic map of the primary auditory cortex (Rademacher et al. 2001).
Behavioral data during PET scanning
For the NLLR task, all subjects reported perceiving isolated syllables or nonwords with no real word identification. Word identification during the LLR task showed a steady increase with each repetition: first scan: mean 10.2, SD 3.6; second scan: mean 11.6, SD 3.4; third scan: mean 12.2, SD 3.8; fourth scan: mean 15.9, SD 2.8. Repeated-measures ANOVA demonstrated that this effect was significant [F(3) = 13, P < 0.0001].
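The learning effect across the 4 scan repetitions can be tested with a one-way repeated-measures ANOVA, sketched below by hand. The per-subject data are simulated around the reported condition means (≈10.2, 11.6, 12.2, 15.9); only the computation, not the data, reflects the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated hit counts: 8 subjects x 4 scan repetitions, centered on the
# group means reported in the text.
base = np.array([10.2, 11.6, 12.2, 15.9])
data = base + rng.normal(0.0, 1.0, size=(8, 4))

n_subj, n_cond = data.shape
grand = data.mean()

# Partition the total sum of squares into condition, subject, and error terms.
ss_cond = n_subj * ((data.mean(axis=0) - grand) ** 2).sum()
ss_subj = n_cond * ((data.mean(axis=1) - grand) ** 2).sum()
ss_total = ((data - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj

df_cond = n_cond - 1                     # 3
df_error = (n_subj - 1) * (n_cond - 1)   # 21
F = (ss_cond / df_cond) / (ss_error / df_error)
print(f"F({df_cond},{df_error}) = {F:.1f}")
```

With 8 subjects and 4 repetitions the error term has 21 degrees of freedom, which is the denominator implied by the reported F(3) statistic.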
Compared with the baseline task, the NLLR task was associated with bilateral activation of the superior temporal association cortex, of the left dorsal premotor cortex, inferior frontal gyrus, and insula (Fig. 1 and Table 1). There was also activation in a region falling within the boundaries of a probabilistic map for the right visual area MT/V5 as defined by Mendola et al. (1999). An analogous left-sided mirror area within the boundary of MT/V5 was significantly active, although at a lower threshold level (Z score = 2.3).
Direct comparison of the NLLR task versus the LLR task revealed a significant difference in the middle temporal visual motion area MT/V5, bilaterally and in the left inferior temporal gyrus (Fig. 2 and Table 1).
Overall, LLR involved a similar pattern of brain areas as in NLLR, with activation in prefrontal, ventral premotor, superior temporal and dorsoparietal cortex, bilaterally. The direct comparison with the NLLR task showed significant differences in the left inferior and middle frontal gyrus, dorsal precentral gyrus, and the right inferior frontal gyrus.
The conjunction analysis confirmed that NLLR and LLR have comparable patterns of activation in the left opercular inferior frontal gyrus and insula, and in the superior and middle temporal gyri bilaterally (Fig. 1 and Table 3).
The presence of activations within Brodmann's area 41 (primary auditory cortex) was assessed by comparing the location of the local maxima of activations within the temporal lobes and the distribution of the postmortem microanatomically defined probabilistic maps of area A1 (Rademacher et al. 2001). For this comparison, we considered areas that were labeled as BA 41 in ≥5 out of 10 brains examined by the authors. On the basis of this analysis, we cannot attribute any temporal lobe activation by lipreading to BA 41.
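Checking whether an activation peak falls inside the probabilistic BA 41 region reduces to thresholding the map at ≥5 of 10 brains and indexing it at the peak voxel. The toy map below is illustrative; the real maps of Rademacher et al. (2001) are whole-brain volumes in stereotactic space.

```python
import numpy as np

# Toy probabilistic map: for each voxel, the number of brains (out of 10)
# in which that voxel was labelled as BA 41 (values here are made up).
prob_map = np.zeros((4, 4, 4), dtype=int)
prob_map[1, 2, 2] = 7   # labelled BA 41 in 7/10 brains
prob_map[2, 2, 2] = 3   # labelled BA 41 in only 3/10 brains

# Region of interest: voxels labelled BA 41 in >= 5 of 10 brains
ba41_mask = prob_map >= 5

def peak_in_ba41(peak_voxel):
    """True if an activation peak falls inside the probabilistic BA 41 ROI."""
    return bool(ba41_mask[tuple(peak_voxel)])

print(peak_in_ba41((1, 2, 2)))  # inside the >=5/10 region: True
print(peak_in_ba41((2, 2, 2)))  # below the 5/10 criterion: False
```

Applying this test to every temporal lobe local maximum is how the study concluded that none of the lipreading activations could be attributed to BA 41.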
The regression analysis between rCBF measured during the LLR task and the number of hits for the LLR task showed only negative correlations: these were in the left prefrontal and ventral premotor cortex, the superior temporal gyrus, and the inferior parietal lobule bilaterally, in the left middle temporal gyrus, and the left insula (Fig. 3 and Table 4). These were areas also identified by the comparison LLR minus baseline. Outside that pattern of activation, a negative linear regression was present between performance and rCBF in the left MT/V5 area (BA 37) and the inferior temporal gyrus (Table 4). The peaks identified by the regression analysis in the left opercular inferior frontal gyrus and superior temporal gyri bilaterally were also sites of activation for the NLLR task.
This study sheds new light on the neurophysiological correlates of lipreading. Combined subtractive and correlation analyses of the PET data offer a rich account of its neurophysiology.
From phonology to meaning
Our results indicate that lexical and nonlexical lipreading depend on a distributed partially overlapping neural system and suggest the neurocognitive circuits that support representations necessary for successful lipreading. The perisylvian language regions, which are associated with segmental phonological processing and high-level articulatory planning, were equally active for the two sets of stimuli, words and words played backward. In other words, within the time and anatomical resolution limits of functional imaging, no double dissociation was found within the perisylvian language areas for perception of isolated syllables and real words. Brain regions more active in nonlexical lipreading were in areas concerned with visual motion perception. This could be interpreted to mean that decoding of mouth gestures and phonological mediation are required for successful access to the lexicon through lipreading. On the other hand, lexical lipreading also requires the activation of further language areas, particularly in frontal regions previously implicated in access to lexical semantic knowledge (Paulesu et al. 1997; Poldrack et al. 1999).
The candidate areas for phonological mediation for lipreading are those areas activated in both nonlexical and lexical lipreading, that is, the left inferior frontal gyrus and insula, and the superior/middle temporal gyri bilaterally, areas also identified by the conjunction analysis of the two simple main effects. Importantly, opercular Broca's area is part of this cortical network, suggesting that phonological mediation in lipreading may involve matching of articulatory plans of the speaker and of the viewer. The left inferior frontal gyrus, classically conceived as a motor speech center, has now been implicated in aspects of phonetic speech discrimination (Démonet et al. 1992, 1996; Paulesu et al. 1993, 1996b; Pugh et al. 1996).
The involvement of Broca's area in lipreading was unclear from previous studies, which had variable outcomes with respect to this aspect of activation (Calvert et al. 1997; Campbell et al. 2001). Our evidence, based on both subtractive and correlational analyses, allows us to associate Broca's region with lipreading definitively, and invites us to reconsider its general role in speech, especially with respect to a recent theory of speech perception (Rizzolatti and Arbib 1998). The theory originates from neurophysiological studies in monkeys showing the existence of a class of neurons in the ventral premotor area (area F5) that are active not only when the monkey grasps or manipulates objects, but also when the monkey observes the experimenter doing a similar gesture (Gallese et al. 1996). The authors called this set of neurons “mirror neurons,” and interpreted their function as connected to the matching of observed and executed gestures, and therefore to imitation and understanding of actions. The same authors (Gallese et al. 1996; Rizzolatti and Arbib 1998) suggest that the human homolog of area F5 in monkeys may be in the ventral premotor cortex 6 and opercular component 44 of Broca's area. Experiments in humans based on magnetic transcranial stimulation (Fadiga et al. 1995) or PET and fMRI activation paradigms (Grafton et al. 1996; Iacoboni et al. 2000; Rizzolatti et al. 1996) have given support to the hypothesis that a matching system for action observation and production/imitation may exist in humans, in Broca's area.
Observation of Broca's area activation during lipreading is important for the theory of motor perception and speech. We believe that activation of Broca's region may depend, at least in part, on the activity of the mirror neurons postulated by Gallese et al. (1996). The fact that activations of inferior frontal regions (BA 6, 44) during lip-reading tasks were strictly lateralized to the left hemisphere provides a functional link between motor observation and language processing and suggests that in humans the mirror system may not be restricted to hand actions; rather it may include a repertoire of mouth actions related to language. In addition, the observation of mouth actions produces an activation of ventrally located foci in the premotor cortex (BA 6). A similar somatotopic representation for the mouth was observed by Buccino et al. (2001), in an fMRI study where subjects observed actions involving different body effectors.
Similar conclusions concerning the role of frontal regions in the observation and planning of speech can be drawn from recent transcranial magnetic stimulation findings (Watkins et al. 2003), which included viewing a talking speaker as well as listening to speech.
However, a more trivial interpretation of the finding that Broca's region is activated in lipreading should also be considered: the activation could be the mere counterpart of inner articulation of the stimuli. Two facts militate against this interpretation. First, subjects were instructed to passively perceive the stimuli, avoiding voluntary articulation. Second, the linear regression analysis showed a negative correlation between hits for the lexical lip-reading task and rCBF. Any inner articulation should have increased with increasingly successful identification of the stimuli, leading to a positive linear correlation with rCBF rather than the negative one that we observed. Indeed, the negative correlation places the physiological contribution of opercular Broca's region at the decoding stage: rCBF is higher the more effort phonological analysis requires.
Lipreading: before and after phonological mediation
Our data also give information about processing stages other than phonological decoding. Visual processing, beyond earlier visual activity concerned with faces, involves cortical regions falling within the boundaries of visual motion area MT/V5 (Mendola et al. 1999; Tootell et al. 1995; Watson et al. 1993). Because this brain region is more active for nonlexical than for lexical lipreading, extra visual processing resources appear to be required. This may in turn be driven by higher centers, to force a (phonological) representation from the "difficult" material (seen speech played backward does not have the normal dynamic properties of natural speech, although individual mouth patterns are often identical). This interpretation is also supported by the fact that, at least for the left area MT/V5, a negative correlation was found between number of hits and rCBF, indicating a less prominent contribution of this area when identification of the stimuli is consistently achieved.
Beyond the sublexical phonological processing stage, both subtractive analyses and correlation analysis indicate a number of candidate cortical regions that contribute to successful lexical access in lipreading, particularly the more anterior part of the inferior frontal gyrus (BA 45): this was significantly more active for LLR, it correlated with the hits curve for the LLR task, and was not a site of activation for the NLLR task.
Another area of likely involvement in lexical access through lipreading is the left inferior temporal cortex region, which has been implicated in word retrieval across a range of tasks including object naming and reading (meta-analysis in Paulesu et al. 2000; Table 2). Interestingly, this region emerged only in the correlation analysis, suggesting a contribution to lexical access through lipreading at a decoding stage.
Specifically connected with the lexical lip-reading task was also the activation of the dorsal part of the inferior parietal lobule around the intraparietal sulcus (IPL, BA 40) bilaterally. For the monkey, there is now a detailed characterization of the cortical areas buried within the intraparietal sulcus (review in Rizzolatti et al. 2000). One of these, called VIP (ventral intraparietal region), has polymodal tactile and visual neurons centered around the face and mouth. There are recently published fMRI studies demonstrating the location of this region in humans (Bremmer et al. 2001; Lloyd et al. 2003). However, comparison of the stereotactic coordinates for human VIP cortex with our own activation foci shows that our activations are overall more lateral. Therefore we infer that our parietal activations represent a lexical lipreading–specific brain region beyond human area VIP.
As discussed in the introduction, there were a number of discrepancies in the previous functional imaging literature on lipreading; for example, the inconsistent findings on Broca's area or on primary auditory cortex.
Our data help to resolve these discrepancies by linking different parts of Broca's region to discrete aspects of lipreading, from phonological mediation to lexical access. Why did Calvert et al. (1997) not observe activation in Broca's area and nearby frontal cortices? Calvert and colleagues used digits from 1 to 10 as stimuli. This limited and closed repertoire of stimuli may have become rapidly automatized, contributing to a failure to demonstrate activation in this region. As far as temporal auditory cortex is concerned, our results confirm that associative auditory cortex of the temporal lobe is activated by prelexical and lexical lipreading. However, no activation was detectable in primary auditory cortex (BA 41), when defined in appropriate anatomical terms. This is in line with recent negative findings by Bernstein et al. (2002). It is very unlikely that this occurred because of limited spatial resolution or sensitivity of the PET technique: when direct auditory stimulation is involved, activation of BA 41 is easily observed and can be isolated from other foci in the nearby temporal cortex (Price et al. 1996). Taken together, the available evidence does not point to primary auditory cortex as a key cortical processing stage for successful lipreading. This is also in line with psychological and imaging observations that other silent-language tasks, such as phonological awareness tasks or phonological short-term memory tasks, are independent of primary auditory codes (for discussion, see Dolan et al. 1997; Paulesu et al. 1996a). The role of the regions abutting primary auditory cortex, including lateral parts of the planum temporale, which are reliably activated by speechreading (Calvert and Campbell 2003) and even by some forms of reading (Raij et al. 2000), remains controversial.
This study was supported in part by a European Community-Biomedical and Health Research II Grant (contract BMH4-CT96-0274) and by a Cofinanziamento Progetti di Ricerca di Interesse Nazionale-2001 Ministero dell'Istruzione, dell'Università e della Ricerca Grant to E. Paulesu.
We thank Dr. J. Rademacher for providing probabilistic maps of area A1 and two anonymous referees for constructive comments. We are also grateful to our colleagues from the physics and radiochemistry sections of the Centro Ciclotrone/PET of the Scientific Institute H San Raffaele, Milan, for making this research possible.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Copyright © 2003 by the American Physiological Society