Representation of the Temporal Envelope of Sounds in the Human Brain

Anne-Lise Giraud, Christian Lorenzi, John Ashburner, Jocelyne Wable, Ingrid Johnsrude, Richard Frackowiak, Andreas Kleinschmidt


The cerebral representation of the temporal envelope of sounds was studied in five normal-hearing subjects using functional magnetic resonance imaging. The stimuli were white noises, sinusoidally amplitude-modulated at frequencies ranging from 4 to 256 Hz. This range includes low AM frequencies (up to 32 Hz), essential for the perception of the manner of articulation and syllabic rate, and high AM frequencies (above 64 Hz), essential for the perception of voicing and prosody. The right lower brainstem (superior olivary complex), the right inferior colliculus, the left medial geniculate body, Heschl's gyrus, the superior temporal gyrus, the superior temporal sulcus, and the inferior parietal lobule were specifically responsive to AM. Global tuning curves in these regions suggest that the human auditory system is organized as a hierarchical filter bank, each processing level responding preferentially to a given AM frequency: 256 Hz for the lower brainstem, 32–256 Hz for the inferior colliculus, 16 Hz for the medial geniculate body, 8 Hz for the primary auditory cortex, and 4–8 Hz for secondary regions. The time course of the hemodynamic responses showed sustained and transient components with opposite frequency-dependent patterns: the lower the AM frequency, the better the fit with a sustained response model; the higher the AM frequency, the better the fit with a transient response model. Using cortical maps of best modulation frequency, we demonstrate that the spatial representation of AM frequencies varies according to the response type. Sustained responses yield maps of low frequencies organized in large clusters. Transient responses yield maps of high frequencies represented by a mosaic of small clusters. Very few voxels were tuned to intermediate frequencies (32–64 Hz). We did not find spatial gradients of AM frequencies associated with any response type.
Our results suggest that two frequency ranges (up to 16 Hz, and 128 Hz and above) are represented in the cortex by different response types. However, the spatial segregation of these two ranges is not systematic. Most cortical regions were tuned to low frequencies and only a few to high frequencies. Yet voxels that showed a preference for low frequencies were also responsive to high frequencies. Overall, our study shows that the temporal envelope of sounds is processed by both distinct (a hierarchically organized series of filters) and shared (high and low AM frequencies eliciting different responses at the same cortical locus) neural substrates. This layout suggests that the human auditory system is organized in a parallel fashion that allows a degree of separate routing for groups of AM frequencies conveying different information, while preserving the possibility of integrating complementary features in cortical auditory regions.


Continuous speech shows pronounced low-frequency AM in its temporal envelope, with the most prominent modulation frequencies near the average syllabic rate of 3–4 Hz (Houtgast and Steeneken 1985). A number of studies conducted with normal-hearing listeners, listeners with sensorineural hearing loss, and cochlear implantees have shown that modulation frequencies below about 50 Hz are both necessary (Drullman et al. 1994a,b; Duquesnoy and Plomp 1980; Houtgast and Steeneken 1985) and almost sufficient (Hochmair and Hochmair-Desoyer 1984; Shannon et al. 1995; Van Tasell et al. 1987) for accurate speech recognition in silence and in noise. Various psychophysical methods have been developed to assess human sensitivity to AM. A common approach is to measure the auditory temporal modulation transfer function (TMTF), that is, the listener's threshold for detecting a sinusoidal AM applied to a noise carrier as a function of modulation frequency (Bacon and Viemeister 1985; Viemeister 1979). In such a task, detection is based only on temporal envelope cues, since modulating white noise does not alter its long-term magnitude spectrum. These measurements show that TMTFs of normal-hearing listeners are typically low-pass in shape, with sensitivity preserved up to about 16–50 Hz.
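The claim that sinusoidal AM leaves the long-term magnitude spectrum of white noise unchanged can be checked numerically. The Python sketch below assumes NumPy; the helper names `sam_noise` and `band_levels_db`, the 1-s duration, and the band edges are illustrative choices, not taken from the study. It compares the average power in a few frequency bands for unmodulated and modulated noise.

```python
import numpy as np

FS = 44_100  # sampling rate in Hz (the study's D/A rate)


def sam_noise(fm, m=1.0, dur=1.0, fs=FS, seed=0):
    """White noise multiplied by 1 + m*sin(2*pi*fm*t); m = 0 gives unmodulated noise.
    The 1-s duration is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    return (1.0 + m * np.sin(2 * np.pi * fm * t)) * rng.standard_normal(t.size)


def band_levels_db(x, fs=FS, edges=(100, 1_000, 5_000, 10_000)):
    """Mean long-term power (dB, arbitrary reference) in consecutive frequency bands.
    The band edges are hypothetical choices for illustration."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    levels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        levels.append(10 * np.log10(power[in_band].mean()))
    return np.array(levels)


# AM only adds sidebands at +/- fm around each component of an already flat
# spectrum, so the band-to-band spread should stay small in every case.
for fm, m in [(0, 0.0), (4, 1.0), (16, 1.0), (256, 1.0)]:
    levels = band_levels_db(sam_noise(fm, m))
    print(f"fm = {fm:>3} Hz: band spread = {levels.max() - levels.min():.2f} dB")
```

The flatness is why, in a TMTF task, a listener can only detect the modulation through the temporal envelope and not through any spectral cue.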

A classical model used to account for human sensitivity to AM assumes that the temporal envelope of stimuli is smoothed by a single low-pass filter (or a temporal integrator) operating at a postcochlear level (Strickland and Viemeister 1996; Viemeister 1979). The similarity of the TMTFs measured in normal-hearing listeners and in listeners with cochlear damage (Bacon and Viemeister 1985; Moore et al. 1992) indicates that such a low-pass filter is located at a central rather than a peripheral level. Lesion data in humans support this hypothesis and further suggest that this low-pass filter could be situated in the cortex (Albert and Bear 1974; Auerbach et al. 1982; Chocholle et al. 1975; Efron et al. 1985; Lorenzi et al. 2000; Phillips and Farmer 1990; Praamstra et al. 1991; Robin et al. 1990; Tanaka et al. 1987; Yaqub et al. 1988).
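To make the single-filter account concrete: if the envelope passes through one first-order low-pass filter with cutoff f_c, a modulation at f_m is attenuated by a factor of 1/√(1 + (f_m/f_c)²), so the modulation depth needed for detection rises by the same factor. The sketch below expresses thresholds in the conventional TMTF unit of 20 log10 m. The cutoff (50 Hz) and the low-frequency threshold (−25 dB) are hypothetical values chosen only to produce a low-pass TMTF like those described above; they are not fitted parameters.

```python
import numpy as np


def lowpass_tmtf(fm, fc=50.0, thr0_db=-25.0):
    """Predicted AM detection threshold, 20*log10(m), for a single first-order
    low-pass envelope filter. fc and thr0_db are hypothetical illustrative values."""
    fm = np.asarray(fm, dtype=float)
    gain = 1.0 / np.sqrt(1.0 + (fm / fc) ** 2)  # modulation gain of the filter at fm
    # Less envelope gain means more physical modulation depth is needed at threshold.
    return thr0_db - 20.0 * np.log10(gain)


rates = np.array([4, 8, 16, 32, 64, 128, 256])
for fm, thr in zip(rates, lowpass_tmtf(rates)):
    print(f"{fm:>3} Hz: threshold {thr:6.1f} dB (m = {10 ** (thr / 20):.2f})")
```

At f_m = f_c the predicted threshold is about 3 dB above its low-frequency value. Above f_c it rises by roughly 6 dB per doubling of modulation frequency, which is the characteristic low-pass shape.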

The results of psychophysical adaptation and masking experiments (Bacon and Grantham 1989; Houtgast 1989; Tansley and Suffield 1983; Yost et al. 1989) performed in humans with sinusoidally AM noises and tones have, however, indicated an alternative model (Dau et al. 1997; Hewitt and Meddis 1994; Langner 1992; Lorenzi et al. 1995) in which a bank of perceptual channels, each tuned to a different low AM frequency, decomposes the temporal envelope of sounds at a central level. This alternative hypothesis is supported by electrophysiological recordings performed in the cochlear nucleus (e.g., Frisina et al. 1990; Møller 1976), the inferior colliculus (e.g., Langner and Schreiner 1988; Rees and Møller 1987; Rees and Palmer 1989), and the auditory cortex (e.g., Eggermont 1994; Schreiner and Urbas 1986, 1988) of various mammals (e.g., guinea pig, gerbil, cat). These studies show that most neurons of the cochlear nucleus and inferior colliculus are selectively tuned to high AM frequencies, with best modulation frequencies ranging from about 50 to 500 Hz. In comparison, most neurons of the auditory cortical fields are selectively tuned to low AM frequencies, with best modulation frequencies ranging from about 3 to 30 Hz. Such a cascade of AM filters should allow for the decomposition of the temporal envelope at subcortical and cortical levels.
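The filter-bank alternative can be sketched in three steps: extract the temporal envelope, pass it through a set of band-pass filters tuned to different AM frequencies, and read out which channel carries the most energy. This is a toy illustration, not the model of Dau et al. (1997) or Langner (1992). The Gaussian-on-log-frequency filter shape, the 0.5-octave width parameter, and the channel center frequencies (set here to the AM rates used in this study) are all assumptions.

```python
import numpy as np

FS = 44_100
CENTERS = np.array([4, 8, 16, 32, 64, 128, 256])  # hypothetical channel centers (Hz)


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with an FFT."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def filterbank_output(env, fs=FS, centers=CENTERS, sigma_oct=0.5):
    """Energy-based output of each modulation channel (square root of the summed
    squared, weighted spectral magnitudes). Each channel weights the envelope
    spectrum with a Gaussian on a log2-frequency axis; shape and width are assumed."""
    spec = np.fft.rfft(env - env.mean())  # drop DC: only fluctuations count
    freqs = np.fft.rfftfreq(env.size, 1 / fs)
    log_f = np.log2(np.maximum(freqs, 1e-9))  # avoid log2(0) at DC
    out = []
    for fc in centers:
        w = np.exp(-0.5 * ((log_f - np.log2(fc)) / sigma_oct) ** 2)
        out.append(np.sqrt(np.sum(np.abs(spec * w) ** 2)))
    return np.array(out)


rng = np.random.default_rng(1)
t = np.arange(FS) / FS  # 1 s of signal
for fm in (4, 16, 64, 256):
    x = (1.0 + np.sin(2 * np.pi * fm * t)) * rng.standard_normal(t.size)
    out = filterbank_output(hilbert_envelope(x))
    print(f"{fm:>3} Hz SAM noise -> most active channel: {CENTERS[np.argmax(out)]} Hz")
```

In the single low-pass model, all modulation frequencies share one channel. Here, each AM frequency is routed preferentially to its own channel, which is the property the masking and adaptation data are taken to support.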

Recent data obtained with functional magnetic resonance imaging (f-MRI) in humans show that each level of the auditory system responds preferentially to a given stimulus repetition rate within a low frequency range of 3–35 Hz (Harms et al. 1998). The inferior colliculus responds best to 35 bursts/s, the medial geniculate body to 20 bursts/s, Heschl's gyrus to rates between 2 and 10 bursts/s, and the superior temporal gyrus to 2 bursts/s. Thus f-MRI data in humans and single-unit recordings in other mammals broadly agree that 1) processing of AM frequencies in humans may be subserved by different subcortical and cortical regions, each tuned to a different AM frequency, and 2) this preferred AM frequency decreases from the brainstem to the cortex.

Finally, f-MRI data also suggest that response properties depend on stimulus repetition rate. Cortical regions show transient responses at high repetition rates (35 bursts/s) and sustained responses at low rates (2 bursts/s) (Harms et al. 1998). Thus coding by temporal response properties, rather than by topographical representation, may also enable the decomposition of the temporal envelope of sounds.

In summary, the analysis of modulated sounds may depend on different parts of the brain, each tuned to a different AM frequency. It may also depend on specialized cortical regions that contain arrays of neurons tuned to different AM frequencies and/or show different response properties. In other words, each level of the auditory pathway could be considered as a filter in the AM domain with a best frequency that decreases from the periphery to the cortex. Alternatively, cortical regions could contain neuronal maps representing AM frequencies. In such a case, cortical neurons would be considered as AM filters and the array of neurons would behave collectively as a modulation filter bank (Dau et al. 1997; Langner 1992).

Despite the correspondence between psychophysical and electrophysiological data, the controversy about the location and nature of the temporal processor (a single temporal integrator, a single modulation filter bank, or a cascade of subcortical and cortical modulation filters) persists. Recently, a magnetoencephalography (MEG) study using complex tones demonstrated a topographic organization of the human auditory cortex for high AM frequencies (from 50 to 400 Hz) (Langner et al. 1997). This so-called “periodotopic” organization (Langner 1992) is consistent with the modulation filter bank model suggested by electrophysiological studies in animals and psychophysical studies in humans. However, it is not clear whether the described maps indicate an inter-regional gradient (distinct auditory regions tuned to different AM frequencies) or a topographical organization of AM frequencies within a particular auditory field. Since that study aimed to investigate the cortical representation of periodicity pitch, the AM frequencies used were higher than those known to elicit the best cortical responses (Schreiner and Urbas 1988). Moreover, these AM frequencies, in the range 50–400 Hz, are less crucial to speech processing than lower ones (4–16 Hz), as degradation of high AM frequencies does not affect speech recognition (e.g., Drullman et al. 1994a,b). Given the diversity of hypotheses, techniques, and stimuli, further investigation of temporal envelope coding in humans is warranted. Our study investigated the cortical representation of the temporal envelope of sounds using f-MRI and a set of white noises sinusoidally modulated in amplitude at frequencies crucial for speech recognition (4–32 Hz), as well as at higher frequencies (up to 256 Hz) that are known to be important for the perception of voicing and prosody (see Rosen 1992 for a review).
These stimuli were used to 1) identify the cerebral structures contributing to the human sensitivity to the temporal envelope of sounds, 2) analyze their functional organization, and 3) investigate their response properties.



All white-noise stimuli were generated using a 16-bit D/A converter at a sampling frequency of 44.1 kHz. White noises were either unmodulated or sinusoidally modulated in amplitude at 4, 8, 16, 32, 64, 128, or 256 Hz, with a modulation depth of 100% (Fig. 1A). All stimuli were shaped by rising and falling 25-ms cosine ramps, equated in energy, and presented to the right ear at 75–80 dB sound pressure level (SPL) via a plastic tube inserted in the outer ear canal. Scanner noise during functional imaging was 75 dB SPL under the headphones (attenuation by the headphones = 30 dB). We could not measure the scanner noise level in the outer ear canal after insertion of the earplug, but the earplug was estimated to provide another 30 dB of attenuation (subject to inter-individual variations in earplug positioning). With the scanner noise thus reduced to roughly 45 dB SPL in the ear canal (75 − 30 dB) and stimuli at 75–80 dB SPL, the signal-to-noise ratio was estimated to be about 30 dB. Preliminary testing ensured that all stimuli were clearly audible and presented at a comfortable level.
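The following is a minimal sketch of how such a stimulus set could be synthesized digitally, assuming NumPy. The stimulus duration, the target RMS value, and the raised-cosine shape of the 25-ms ramps are assumptions. Presentation level in dB SPL depends on the playback chain and cannot be set in code, so "equated in energy" is implemented here as equal RMS.

```python
import numpy as np

FS = 44_100                                # Hz, as in the study
RAMP_S = 0.025                             # 25-ms onset/offset ramps
AM_RATES = [4, 8, 16, 32, 64, 128, 256]    # Hz, 100% modulation depth


def cosine_ramps(n, fs=FS, ramp_s=RAMP_S):
    """Window rising from 0 to 1 over ramp_s, flat, then falling back to 0.
    The raised-cosine shape is an assumption."""
    n_r = int(round(ramp_s * fs))
    rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_r) / n_r))
    win = np.ones(n)
    win[:n_r] = rise
    win[n - n_r:] = rise[::-1]
    return win


def make_stimulus(fm, rng, dur_s=1.0, m=1.0, target_rms=0.1, fs=FS):
    """Unmodulated (fm=None) or SAM white noise, ramped and scaled to a common RMS.
    dur_s and target_rms are hypothetical values."""
    n = int(round(dur_s * fs))
    x = rng.standard_normal(n)
    if fm is not None:
        t = np.arange(n) / fs
        x = x * (1.0 + m * np.sin(2 * np.pi * fm * t))
    x = x * cosine_ramps(n, fs)
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))  # equate energy across stimuli


def to_int16(x):
    """Quantize to 16-bit integers for a 16-bit D/A converter, refusing to clip."""
    if np.max(np.abs(x)) >= 1.0:
        raise ValueError("signal would clip; lower target_rms")
    return np.round(x * 32767).astype(np.int16)


rng = np.random.default_rng(0)
stimuli = {"unmodulated": make_stimulus(None, rng)}
stimuli.update({f"{fm} Hz": make_stimulus(fm, rng) for fm in AM_RATES})
pcm = {name: to_int16(x) for name, x in stimuli.items()}
```

Normalizing after ramping, rather than before, keeps the RMS identical across all eight stimuli, including the unmodulated baseline.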

Fig. 1.

A: stimuli used: white noise and white noise sinusoidally amplitude-modulated (SAM) with a modulation depth m = 100%. B: experimental design: 42 epochs of 28 s alternating between baseline (B = unmodulated noise) and noises modulated at various AM frequencies between 4 and 256 Hz.

A series of measurements of the output sounds (at the output of the plastic tube inserted in the outer ear canal) confirmed that, under experimental conditions, all stimuli reached the ear with a flat spectrum between 20 Hz and 10 kHz and an effective modulation depth of at least 50–60% (Fig. 2). Under scanning conditions, the modulation depth was thus well above the detection thresholds, which were around 10% for the highest frequency.
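The source does not state how the effective modulation depth was derived from the recorded waveform. One common approach, sketched here with that caveat, takes the Hilbert envelope of the recording and compares the envelope's spectral component at the modulation frequency with its mean. For an envelope proportional to 1 + m·sin(2πf_m t), the ratio 2|E(f_m)|/E(0) equals m. The helper names are hypothetical, and the estimate assumes that the analysis window holds a whole number of modulation cycles.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with an FFT."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def effective_depth(recording, fm, fs):
    """Estimate the modulation depth m of a SAM recording as 2*|E(fm)| / E(0).
    Assumes an integer number of modulation cycles in the recording."""
    env = hilbert_envelope(recording)
    spec = np.fft.rfft(env)
    k = int(round(fm * env.size / fs))  # FFT bin of the modulation frequency
    return 2.0 * np.abs(spec[k]) / np.abs(spec[0])


fs = 44_100
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
for m_true in (1.0, 0.55):
    x = (1.0 + m_true * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)
    print(f"true m = {m_true:.2f}, estimated m = {effective_depth(x, 4, fs):.2f}")
```

Applied to a recording made at the tube output, an estimate of 0.5–0.6 for a nominally 100% modulated stimulus would correspond to the effective depths reported above.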

Fig. 2.

Power spectrum of the stimuli (A) and waveform of a white noise sinusoidally modulated at 4 Hz (B) measured at the output of the stimulation system.

f-MRI acquisition

Functional imaging was performed in five normal-hearing subjects (1 female and 4 males, mean age 29 yr) at 2 T (Siemens Vision MR scanner) using gradient-echo EPI (48 slices, intervolume time = 4 s). The voxel size was 3 mm isotropic. A session included 42 condition epochs of 28-s duration, alternating between unmodulated and modulated noises (Fig. 1B). We used a sinusoidal acquisition sequence whose acoustic noise peaked at 833 Hz and had a temporal periodicity of around 10 Hz due to slice selection. This periodicity of the scanner noise could have interfered with the detection of the stimulus modulated at 8 Hz. However, in a pilot study performed in one subject, we measured detection thresholds for all AM rates, and the threshold for white noise modulated at 8 Hz fell within the range of thresholds for the other modulation rates (<10%). The functional results further confirmed that the response at this particular frequency was not obscured.

Data analysis

Data were realigned, normalized, and spatially smoothed with a Gaussian filter of 6 mm using the SPM99 software. Subsequent analyses were performed on single-subject data. The statistical analysis, performed with SPM99, employed a general linear model (Worsley and Friston 1995) with a design matrix comprising event- and epoch-related regressors. These regressors were obtained by convolving series of delta functions with a set of temporal basis functions (as detailed below for each type of analysis).
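The regressor construction described here can be sketched as follows: a stimulus function is convolved with a temporal basis function and then sampled at each scan. The double-gamma shape (gamma densities with shapes 6 and 16, undershoot ratio 1/6) mimics SPM's canonical HRF but is written from scratch. The 0.1-s microtime resolution and the function names are assumptions, and this is not SPM99 code.

```python
import math

import numpy as np


def canonical_hrf(dt, length_s=32.0):
    """Double-gamma HRF sampled every dt seconds, normalized to unit sum.
    Shape parameters are modeled on SPM's canonical HRF (an assumption here)."""
    t = np.arange(0.0, length_s, dt)

    def gamma_pdf(x, shape):  # unit-scale gamma density
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def make_regressor(onsets_s, durations_s, n_scans, tr_s, dt=0.1):
    """Build a stimulus function (a delta for duration 0, a box-car otherwise),
    convolve it with the HRF at fine resolution, and sample it at scan times."""
    n_fine = int(round(n_scans * tr_s / dt))
    stim = np.zeros(n_fine)
    for onset, dur in zip(onsets_s, durations_s):
        start = int(round(onset / dt))
        if dur == 0:
            stim[start] = 1.0                                  # event: delta function
        else:
            stim[start:int(round((onset + dur) / dt))] = 1.0   # epoch: box-car
    conv = np.convolve(stim, canonical_hrf(dt))[:n_fine]
    scan_idx = np.round(np.arange(n_scans) * tr_s / dt).astype(int)
    return conv[scan_idx]


# One event at t = 0 and one 28-s epoch starting at t = 28 s, with TR = 4 s
event_reg = make_regressor([0.0], [0.0], n_scans=20, tr_s=4.0)
epoch_reg = make_regressor([28.0], [28.0], n_scans=20, tr_s=4.0)
```

Convolving at a finer time step than the 4-s intervolume time before sampling avoids the coarse scan grid distorting the hemodynamic shape.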

In a first step, we identified brain regions responding to modulated as opposed to unmodulated sounds by applying an epoch-related analysis (Friston et al. 1995), where an epoch was modeled by a hemodynamically smoothed box-car function (28-s baseline followed by 28 s of modulated noise, etc.). As single-subject data had been co-registered into a standard stereotactic space, we were able to specify the regions that were significantly activated in every subject using a conjunction analysis across subjects. We set the level of significance at P = 0.05 (corrected for multiple comparisons on a large number of volume elements).
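The epoch-related analysis and the across-subject conjunction can be illustrated on synthetic data. The sketch below uses ordinary least squares to fit a design matrix containing an HRF-convolved box-car plus a constant. The box-car alternates 28 s of baseline with 28 s of modulated noise over 42 epochs, with TR = 4 s as in the acquisition. A t statistic is then computed for each subject. The conjunction is implemented as a minimum-statistic test, in which a voxel counts only if every subject exceeds the threshold; this is one common formulation. The following are assumptions: whether this matches SPM99's exact procedure, the threshold value, and the simulated signal and noise levels. Smoothing, autocorrelation modeling, and multiple-comparison correction are omitted.

```python
import math

import numpy as np

TR, EPOCH, N_EPOCHS = 4.0, 28.0, 42            # from the acquisition description
N_SCANS = int(N_EPOCHS * EPOCH / TR)           # 294 volumes


def canonical_hrf(dt, length_s=32.0):
    """SPM-like double-gamma HRF (assumed parameters), unit sum."""
    t = np.arange(0.0, length_s, dt)

    def gamma_pdf(x, a):
        return x ** (a - 1) * np.exp(-x) / math.gamma(a)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def boxcar_regressor(dt=0.1):
    """Alternating 28-s baseline / 28-s modulated epochs, convolved with the HRF."""
    t = np.arange(0.0, N_SCANS * TR, dt)
    on = (np.floor(t / EPOCH) % 2 == 1).astype(float)   # odd epochs = modulated noise
    conv = np.convolve(on, canonical_hrf(dt))[: t.size]
    return conv[np.round(np.arange(N_SCANS) * TR / dt).astype(int)]


def glm_t(y, X, c):
    """OLS estimate of y = X b + e and the t statistic of contrast c."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    dof = y.size - np.linalg.matrix_rank(X)
    var_c = (resid @ resid / dof) * (c @ np.linalg.pinv(X.T @ X) @ c)
    return float(c @ b / np.sqrt(var_c))


def conjunction(t_values, threshold):
    """Minimum-statistic conjunction (assumed): significant only if every subject passes."""
    return min(t_values) > threshold


rng = np.random.default_rng(0)
x = boxcar_regressor()
X = np.column_stack([x, np.ones(N_SCANS)])
c = np.array([1.0, 0.0])                       # modulated > unmodulated
T_THRESH = 5.0                                 # hypothetical threshold

# Five simulated subjects: an AM-responsive voxel and a non-responsive voxel.
t_resp = [glm_t(2.0 * x + rng.standard_normal(N_SCANS), X, c) for _ in range(5)]
t_null = [glm_t(rng.standard_normal(N_SCANS), X, c) for _ in range(5)]
print(conjunction(t_resp, T_THRESH), conjunction(t_null, T_THRESH))
```

Because the minimum statistic must exceed the threshold in every subject, activations that are strong in a pooled analysis but weak in one individual fail this test.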

Further analyses were performed to determine 1) the shape of the hemodynamic response to AM sounds and its frequency dependency (coding of AM frequency by response properties rather than by topographical organization), 2) whether there was a topographic organization of AM frequencies across brain regions responsive to AM (hypothesis of a cascade of discrete filters), and 3) whether there was a topographical organization of modulation frequencies within each responsive cortical region (modulation filter bank hypothesis).

1) We investigated frequency-dependent effects in the time course of the response to AM by analyzing poststimulus-onset responses. According to Harms and Melcher (1999), the shape of the f-MRI signal varies with stimulus repetition rate, with the response decaying more strongly at higher rates. We confirmed this finding and subsequently modeled separately a sustained response (epoch-related analysis) and a transient response related to the onset of the modulated noise (event-related analysis). An event was modeled by a linear combination of the hemodynamic response function and its temporal derivative (Friston et al. 1998). To ensure that transient effects were specific to AM sounds and not related to any change in sound quality, we excluded regions that also responded to the onset of the unmodulated noise (baseline). Within regions exhibiting transient or sustained responses to any AM frequency (selected using individual inclusive masks of regions responding to any modulation rate against baseline at P = 0.0005, uncorrected for multiple comparisons), we determined preferential responsiveness to low (4, 8, 16 Hz) and high AM frequencies (64, 128, 256 Hz) and tested for high/low AM-frequency interactions. Fitted responses corresponding to the f-MRI signal convolved with the response model were used to assess frequency-dependency effects at a given location.
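The contrast between the sustained (epoch) and transient (event) models can be sketched on synthetic voxels. As described above, the event model uses the HRF and its temporal derivative at each onset of modulated noise. The following are illustrative assumptions, not the study's procedure: the HRF parameters, the 2-s neural decay used to simulate a transient voxel, the noise level, and comparing models by R².

```python
import math

import numpy as np

TR, EPOCH, N_EPOCHS, DT = 4.0, 28.0, 42, 0.1
N_SCANS = int(N_EPOCHS * EPOCH / TR)
T_FINE = np.arange(int(round(N_SCANS * TR / DT))) * DT
SCAN_IDX = np.round(np.arange(N_SCANS) * TR / DT).astype(int)
ONSETS = np.arange(1, N_EPOCHS, 2) * EPOCH          # onsets of modulated epochs (s)


def canonical_hrf(length_s=32.0):
    """SPM-like double-gamma HRF (assumed parameters), unit sum."""
    t = np.arange(0.0, length_s, DT)

    def gamma_pdf(x, a):
        return x ** (a - 1) * np.exp(-x) / math.gamma(a)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


def to_scans(neural, kernel):
    """Convolve a fine-resolution neural time course with a kernel, sample per scan."""
    return np.convolve(neural, kernel)[: T_FINE.size][SCAN_IDX]


hrf = canonical_hrf()
dhrf = np.gradient(hrf, DT)                          # temporal derivative of the HRF
boxcar = (np.floor(T_FINE / EPOCH) % 2 == 1).astype(float)
deltas = np.zeros(T_FINE.size)
deltas[np.round(ONSETS / DT).astype(int)] = 1.0

ones = np.ones(N_SCANS)
X_sustained = np.column_stack([to_scans(boxcar, hrf), ones])
X_transient = np.column_stack([to_scans(deltas, hrf), to_scans(deltas, dhrf), ones])


def r_squared(y, X):
    """Proportion of variance in y explained by an OLS fit of design X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


# Synthetic voxels: a sustained response, and one that decays after each onset.
since_onset = T_FINE - EPOCH * np.floor(T_FINE / EPOCH)   # time within current epoch
decaying = boxcar * np.exp(-since_onset / 2.0)             # 2-s decay (hypothetical)
rng = np.random.default_rng(3)
y_sust = to_scans(boxcar, hrf) + 0.05 * rng.standard_normal(N_SCANS)
y_trans = to_scans(decaying, hrf) + 0.05 * rng.standard_normal(N_SCANS)

for name, y in [("sustained voxel", y_sust), ("transient voxel", y_trans)]:
    print(name, "R2 epoch model:", round(r_squared(y, X_sustained), 2),
          "R2 event model:", round(r_squared(y, X_transient), 2))
```

The derivative regressor lets the event model absorb small shifts in response latency. A box-car, by contrast, cannot represent a response that returns to baseline well before the epoch ends.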

2) As we found activations over a large extent of the auditory cortex, we assumed that they covered different auditory regions. An inter-regional gradient was assessed by calculating best modulation frequencies (BMFs) in regions of interest. Considering the difficulty of defining regions of interest from anatomical landmarks on MRI (Jäncke et al. 1999), we used spheres (diameter = 1 cm) centered on the response peaks observed in a group analysis. The BMF of a region was obtained from the averaged tuning curves of all voxels contained in that region. We considered only left-hemisphere regions because we used right monaural stimulation. Spheres of interest were located in the lower brainstem (superior olivary complex), the inferior colliculus, the medial geniculate body, Heschl's gyrus, the superior temporal sulcus, the superior temporal gyrus, and the supramarginal gyrus/inferior parietal lobule (coordinates given in Table 3). In all these regions, we also assessed the response type (decaying or sustained). This was done with a mixed model that modeled an epoch of the stimulus duration but additionally allowed an exponential decay to occur during that epoch. More precisely, the box-car function was linearly combined with an exponential function of peri-onset time to model within-epoch adaptation. Both were convolved with a hemodynamic response function (Friston et al. 1995). Responses were classified as sustained when the response decreased by less than 25% during the epoch, and as decaying when it decreased by 25% or more.
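The mixed model and the 25% classification rule can be sketched as follows. Each modulated epoch is modeled as a box-car plus an exponential function of peri-onset time, both convolved with the HRF. The fitted within-epoch profile b1 + b2·exp(−t/τ) then gives the percentage decrease between epoch onset and offset. The decay constant τ (8 s here) is a hypothetical fixed value, since the source does not report it. Computing the decrease from the fitted pre-convolution profile is an interpretation of the criterion.

```python
import math

import numpy as np

TR, EPOCH, N_EPOCHS, DT = 4.0, 28.0, 42, 0.1
TAU = 8.0                                            # hypothetical decay constant (s)
N_SCANS = int(N_EPOCHS * EPOCH / TR)
T_FINE = np.arange(int(round(N_SCANS * TR / DT))) * DT
SCAN_IDX = np.round(np.arange(N_SCANS) * TR / DT).astype(int)


def canonical_hrf(length_s=32.0):
    """SPM-like double-gamma HRF (assumed parameters), unit sum."""
    t = np.arange(0.0, length_s, DT)

    def gamma_pdf(x, a):
        return x ** (a - 1) * np.exp(-x) / math.gamma(a)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()


HRF = canonical_hrf()
BOXCAR = (np.floor(T_FINE / EPOCH) % 2 == 1).astype(float)  # odd epochs = modulated
PERI_ONSET = T_FINE - EPOCH * np.floor(T_FINE / EPOCH)      # time since epoch onset
DECAY = BOXCAR * np.exp(-PERI_ONSET / TAU)                  # within-epoch adaptation


def to_scans(neural):
    """Convolve with the HRF and sample at scan times."""
    return np.convolve(neural, HRF)[: T_FINE.size][SCAN_IDX]


X_MIXED = np.column_stack([to_scans(BOXCAR), to_scans(DECAY), np.ones(N_SCANS)])


def classify_response(y, criterion=0.25):
    """Fit the mixed model and label the response by its within-epoch decrease.
    Assumes a positive fitted response at epoch onset."""
    coef, *_ = np.linalg.lstsq(X_MIXED, y, rcond=None)
    b_box, b_exp = coef[0], coef[1]
    start = b_box + b_exp                          # fitted level at epoch onset
    end = b_box + b_exp * math.exp(-EPOCH / TAU)   # fitted level at epoch offset
    decrease = (start - end) / start
    return ("sustained" if decrease < criterion else "decaying"), decrease


rng = np.random.default_rng(4)


def noise():
    """Small Gaussian measurement noise (hypothetical level)."""
    return 0.05 * rng.standard_normal(N_SCANS)


y_flat = to_scans(BOXCAR) + noise()
y_adapting = to_scans(0.2 * BOXCAR + DECAY) + noise()
print(classify_response(y_flat))      # flat simulated response
print(classify_response(y_adapting))  # strongly adapting simulated response
```

Fixing τ keeps the model linear, so it can be fitted by ordinary least squares like the other regressors.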

3) Intra-regional gradients were assessed by calculating the best frequency in all voxels responsive to AM. Each voxel was assigned its best frequency. Maps of BMF were visualized using a color code (Fig. 7). Since we found different frequency-dependent effects according to the model used, we compared maps of the epoch-related analysis, the event-related analysis, and the combined model. The contour of the maps was defined for each analysis by the response to all AM frequencies in all subjects at P = 0.0005, uncorrected. This mask encompassed all regions sensitive to AM including Heschl's gyrus, the superior temporal gyrus, the superior temporal sulcus, and the inferior parietal lobule. This procedure was used to optimize our chances of detecting systematic spatial patterns. We also built up BMF maps using single-subject functional data to guide boundary definition. These individually shaped maps were used to count voxels across AM frequencies (Fig. 8).
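The two ways of deriving a best modulation frequency used here differ in the order of operations. The region-of-interest analysis averages the tuning curves of all voxels within a 1-cm sphere and then takes the maximum. The maps, by contrast, assign each voxel its own maximum. The sketch below assumes NumPy, and the response estimates and coordinates are made-up toy values. It shows how the two approaches can disagree.

```python
import numpy as np

AM_RATES = np.array([4, 8, 16, 32, 64, 128, 256])  # Hz


def voxel_bmf(betas):
    """betas: (n_voxels, n_rates) response estimates -> best rate of each voxel."""
    return AM_RATES[np.argmax(betas, axis=1)]


def roi_bmf(betas, coords_mm, center_mm, diameter_mm=10.0):
    """Best rate of the averaged tuning curve of voxels inside a sphere."""
    inside = np.linalg.norm(coords_mm - center_mm, axis=1) <= diameter_mm / 2
    return AM_RATES[np.argmax(betas[inside].mean(axis=0))]


# Toy data (hypothetical): three voxels on a 3-mm grid near a peak, one outside.
coords = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0], [9, 0, 0]], dtype=float)
betas = np.array([
    [0.1, 3.0, 0.5, 0.2, 0.1, 0.1, 0.1],   # strongly tuned to 8 Hz
    [0.2, 0.8, 1.0, 0.3, 0.1, 0.1, 0.1],   # weakly tuned to 16 Hz
    [0.2, 0.8, 1.0, 0.3, 0.1, 0.1, 0.1],   # weakly tuned to 16 Hz
    [0.1, 0.1, 0.1, 0.1, 0.2, 0.5, 4.0],   # tuned to 256 Hz, outside the sphere
])
print(voxel_bmf(betas))                     # most voxels in the sphere prefer 16 Hz...
print(roi_bmf(betas, coords, np.zeros(3)))  # ...yet the regional BMF is 8 Hz
```

Because averaging weights strongly tuned voxels more heavily, a regional BMF says little about how individual voxel preferences are distributed. This is the motivation for the voxelwise maps.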


Brain regions sensitive to AM

In all subjects, AM sounds activated the posterior regions of the superior temporal sulcus (BA 22) and the superior temporal gyrus (BA 42), Heschl's gyrus (BA 41), and a region of the inferior parietal lobule (BA 40) (Table 1 and Fig. 3). Pooling data across subjects, and thus enhancing sensitivity, we also found subcortical activations in the right lower brainstem (superior olivary complex), the right inferior colliculus, and the left medial geniculate body, as shown in Fig. 6. These activations are consistent with monaural stimulation of the right ear. They did not reach P = 0.05, corrected, in every single-subject data set and were therefore not detected by the conjunction analysis.

Table 1.

Conjunction across subjects for amplitude modulated-unmodulated noise (in response to a right monaural stimulation)

Fig. 3.

Brain regions sensitive to AM (modulated noise − unmodulated noise) from the group analysis (top left), from the conjunction of the 5 subjects (top right), and in two single subjects (middle and bottom). Sections are sliced at the level of Heschl's gyrus (z = 5 mm). For display purposes, these pictures are thresholded at P = 0.001, uncorrected.

Time course and response types

Figure 4 shows the time course of the MRI signal change for four AM frequencies in Heschl's gyrus of three of the subjects. We observed an effect of AM frequency on both the amplitude and the shape of these responses. The fitted response (solid line) indicated a significant effect of AM up to 16 Hz. The raw response (dotted line) showed a transient effect following the onset of the AM stimulus that was preserved at high AM frequencies. This observation motivated the event-related analysis, which sought brain regions showing such transient responses at the onset of the modulated noise. The event-related analysis revealed the same set of regions as the epoch-related analysis but a different spatial distribution of BMFs.

Fig. 4.

In Heschl's gyrus of three subjects, single trial time courses of the adjusted data (raw data) (dotted line) and fitted response (raw data convolved with the box-car model) (solid line) for AM frequencies ranging from 4 to 32 Hz. The bar indicates stimulus (modulated noise).

The epoch-related analysis did not model the responses to AM frequencies above 32 Hz optimally and therefore failed to show a significant effect at high AM frequencies. The response to high AM frequencies was better modeled by an event related to stimulus onset. Using such a transient response model, we found cortical regions specifically responsive to 64, 128, and 256 Hz. Figure 5 shows a complementary pattern for high and low AM frequencies, and Table 2 gives the statistical results associated with the high versus low frequency comparison in these regions.

Fig. 5.

Comparison of epoch- and event-related analyses. Left: SPMs of both analyses. Red: response to low AM frequencies (4, 8, and 16 Hz). Green: response to high AM frequencies (64, 128, and 256 Hz). High AM frequencies give a response only when modeled by a transient response. With the transient response model, high and low AM frequencies can be spatially segregated. Right: in the same voxel, AM-frequency dependency effects according to the type of response modeled. Low AM frequencies are better modeled with a sustained response (epoch-related analysis), high AM frequencies with a transient response (event-related analysis). Red, 4 and 256 Hz; blue, 8 Hz; green, 16 Hz; cyan, 32 Hz; yellow, 64 Hz; magenta, 128 Hz.

Table 2.

Group results and transient response model

In individual data, we looked at the effect of AM frequency with both transient and sustained response models in voxels showing a large AM-frequency effect in the event-related analysis. We found an interaction between frequency and response type. In the same voxels, the epoch- and event-related analyses revealed opposite AM-frequency dependency patterns, i.e., responses decreasing with increasing AM frequency and responses increasing with increasing AM frequency, respectively. In other words, opposite AM-frequency gradients were found in the same volume element, depending on the response type. Such an interaction is presented in Fig. 5 (right) for one subject. In this figure, it can be noted that the response to the AM frequency of 32 Hz was fitted by both models. To achieve a good fit at these intermediate frequencies, we used an analysis combining both response types.

Global tuning and response patterns

We used the combined model to assess the global tuning of the different regions responsive to AM. For all subjects, BMFs in these regions are presented in Table 3 along with the dominant response pattern. In the lower brainstem, at a location compatible with that of the superior olivary complex, the responses were mostly tuned to the two highest AM frequencies (128 and 256 Hz) and showed a decaying response. In the inferior colliculus, BMFs were equal to 128 and 256 Hz with a decaying response in four subjects, and to 32 Hz with a sustained response in one subject. In the medial geniculate body, BMFs ranged between 16 and 32 Hz with both response patterns. Heschl's gyrus was consistently tuned to 8 Hz with a sustained response pattern. In four subjects, the activated regions of the superior temporal sulcus and superior temporal gyrus were tuned to the two lowest AM frequencies of 4 and 8 Hz with sustained response patterns. The inferior parietal lobule/supramarginal region showed no consistent tuning across subjects but consistently responded with a transient pattern to the whole range of AM frequencies. In most brain regions, we observed consistent tuning across subjects. Overall, an AM frequency of 4 Hz always produced a sustained response, and AM frequencies of 128 and 256 Hz always produced a transient response. AM frequencies of 16 and 32 Hz showed sustained and decaying responses depending on region and subject.

Table 3.

BMFs and best response types

Figure 6 shows fitted responses in voxels sampled from different brain regions in subject 3. Although regions of interest were globally tuned to a given AM frequency, individual voxels within them could be tuned to other AM frequencies, but always within the same range as the dominant BMF.

Fig. 6.

In a single subject, typical response types obtained in response to different AM frequencies in four brain regions: superior temporal sulcus (STS), Heschl's gyrus (Heschl), medial geniculate body (MGB), and inferior colliculus (IC). Although each region shows a best modulation frequency, a large range of AM frequencies is represented by different response types. Red, 4 and 256 Hz; blue, 8 Hz; green, 16 Hz; cyan, 32 Hz; yellow, 64 Hz; magenta, 128 Hz.

Intra-regional spatial gradients

The analysis of the global regional tuning was based on the average of the tuning curves of voxels contained in a region of interest. This method was therefore blind to the tuning characteristics of individual voxels. Possible intra-regional spatial gradients of AM frequencies were studied with cortical maps in which each voxel was assigned its best AM frequency. Because our previous analyses had shown an interaction between AM frequency and response type, we addressed the issue of spatial segregation at the level of the three response models applied (epoch-related, event-related, and combined). Maps from the epoch-related analysis were expected to reveal patterns in the low AM-frequency domain and maps from the event-related analysis patterns in the high AM-frequency domain. The combined model was expected to be sensitive to the whole range of AM frequencies, including the intermediate AM frequencies that were poorly modeled by a purely sustained or a purely transient response. Maps from all single subjects are shown in Fig. 7 on horizontal sections. We analyzed horizontal, coronal, and sagittal sections but found no consistent spatial gradient across subjects in any of these maps. However, a high degree of spatial segregation of high and low AM frequencies, as demonstrated by the group event-related analysis, was confirmed in every single subject.

Fig. 7.

Maps of best modulation frequencies (BMFs) in 5 subjects obtained with sustained (epoch-related analysis), transient (event-related analysis), and mixed response models (epoch + exponential decay analysis). The epoch-related analysis shows a coherent pattern in the low AM-frequency domain. The event-related analysis shows a mosaic-like pattern including high AM frequencies. The combined model, which allows for the best fit for each AM frequency, reveals a coherent pattern with a good representation of high AM frequencies. There is no consistent topographical organization of the AM frequencies across subjects.

As expected, the epoch-related analysis yielded homogeneous maps in the low AM-frequency domain, reflecting the fact that most cortical regions responded in a sustained way. In contrast, the event-related analysis revealed a mosaic-like pattern mostly comprising high AM frequencies. The combined model showed a homogeneous pattern, revealing a clustered segregation of high and low AM frequencies rather than intra-regional periodotopic gradients. Epoch-related and combined models both yielded coherent patterns in the low AM-frequency domain in four subjects. This similarity suggests that the sustained response component to low AM frequencies predominated in the cortex, although the same cortical regions remained responsive to high AM frequencies but with a different (decaying) response mode. In summary, the event-related patchy patterns can be superimposed onto the sustained response pattern. Together, they reveal co-localized response properties differing with respect to time courses and AM-frequency range.

Although the combined analysis provided a better fit to the responses at intermediate AM frequencies, we noted that these AM frequencies were still under-represented in the maps. A count of the number of voxels responsive to each AM frequency (Fig. 8), within all regions sensitive to AM as defined individually, revealed a consistent “notch” between low and high AM frequencies. This notch ranged from 16 to 64 Hz, depending on the subject. Since it was not situated close to the repetition rate of the scanner noise (10 Hz), the notch was probably not due to an interaction between the scanner noise and the stimuli. Intermediate AM frequencies thus appear to have a genuinely poorer cortical representation than higher and lower AM frequencies.
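The voxel count of Fig. 8 amounts to a histogram of per-voxel BMFs. The sketch below implements that count, together with one possible way of flagging a "notch": rates whose count falls below a fraction of the peaks on both sides. The 25% criterion and the counts in the example are hypothetical and do not reproduce the study's data.

```python
import numpy as np

AM_RATES = np.array([4, 8, 16, 32, 64, 128, 256])  # Hz


def bmf_counts(bmf_map):
    """Number of voxels whose best modulation frequency equals each AM rate."""
    return np.array([np.count_nonzero(bmf_map == f) for f in AM_RATES])


def notch_rates(counts, frac=0.25):
    """Interior rates whose count is below frac x the smaller of the largest
    counts found at lower and at higher rates (hypothetical criterion)."""
    notch = []
    for i in range(1, len(counts) - 1):
        flank = min(counts[:i].max(), counts[i + 1:].max())
        if counts[i] < frac * flank:
            notch.append(int(AM_RATES[i]))
    return notch


# Hypothetical flattened BMF map for one subject, built from toy counts.
toy_counts = [120, 90, 40, 3, 5, 30, 25]
bmf_map = np.repeat(AM_RATES, toy_counts)
counts = bmf_counts(bmf_map)
print(dict(zip(AM_RATES.tolist(), counts.tolist())))
print("notch at:", notch_rates(counts))
```

Requiring a dip relative to both flanks, rather than a fixed absolute count, lets the criterion adapt to subjects with different overall numbers of AM-responsive voxels.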

Fig. 8.

Distribution of the number of voxels tuned to each AM frequency in the combined analysis, and corresponding individual BMF maps rendered on individual structural scans. Maps are thresholded at P = 0.05, corrected.

In summary, we did not find consistent periodotopic gradients across subjects. However, the combined model revealed clustered segregation, indicating that voxels tuned to a given frequency are gathered in large clusters rather than being randomly distributed.


Brain regions subserving temporal envelope processing

We investigated the cortical response to the temporal envelope of sounds using AM noises with a flat spectrum contrasted against white noises with the same spectrum but no AM. This design ensures that the observed brain responses specifically reflect temporal processing. AM frequencies ranging from 4 to 256 Hz activated essentially identical cortical regions, although, when tested in isolation, only the lowest frequencies (4–16 Hz) yielded a response associated with a probability below P = 0.001 (corrected). In all subjects, we found bilateral activations in Heschl's gyrus, with locations consistent with that of the primary auditory cortex (Penhune et al. 1996). Stronger responses, however, were observed lateral and posterior to the primary auditory cortex, in association auditory regions situated in the superior temporal sulcus (BA 22, for the largest response) and externally on the lateral surface of the superior temporal gyrus in BA 42. The location of these regions, in terms of gross functional neuroanatomy, is consistent with the location of regions assigned to the analysis of the fine temporal structure of sounds identified by Griffiths et al. (1998), with positron emission tomography, using a parametric increase in temporal regularity of “delay-and-add” noises (i.e., iterated rippled noises).

The activation obtained in all five subjects in the supramarginal gyrus/inferior parietal lobule is more difficult to relate to other functional imaging data in the auditory perception domain. The right homologue of this region is considered a specific substrate for sound movement processing (Griffiths et al. 1998). Although activated by moving sounds, the left inferior parietal lobule is not critical for sound motion perception. It seems instead to be largely multimodal, since it is also recruited during visual tasks involving written material (Menard et al. 1996; Rumsey et al. 1997). The left inferior parietal/supramarginal region is mostly involved in visual tasks requiring phonological processing. Whatever the modality, an involvement of this region in phonological segmentation is not implausible but currently remains speculative.

Pooling our data across subjects, we also found activations in the right superior olivary complex, the right inferior colliculus, and the left medial geniculate body. This emphasizes that subcortical stages of the auditory pathway already play a specific role in the processing of periodicity.

Spatial representation of AM frequencies


The response of cortical regions sensitive to AM was mostly low-pass, with a cutoff frequency ranging from 16 to 32 Hz. We found a bottom-up inverse gradient of AM frequency: the superior olivary complex responded best to high AM frequencies (up to 256 Hz), the inferior colliculus to AM frequencies ranging from 32 to 256 Hz depending on the subject, the medial geniculate body to AM frequencies around 16 Hz, Heschl's gyrus to AM frequencies around 8 Hz, and regions lateral and posterior to Heschl's gyrus to the lowest AM frequencies (4–8 Hz). These results are consistent with the observations of Harms et al. (1998), who used different presentation rates of noise bursts, with the exception of the inferior colliculus, which these authors found to be tuned to much lower rates. Such differences in BMF might, however, be related to differences in stimulus type. In the auditory cortex of the cat, for instance, AM sounds produced higher BMFs than periodic click trains (Eggermont 1998).
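The BMF and low-pass cutoff described above can be read off a tuning curve sampled at the seven AM frequencies used here. The sketch below takes the BMF as the frequency of the largest response and the (upper) cutoff as the highest frequency above the BMF at which the response stays at or above 50 % of the peak; the 50 % criterion, the function name `bmf_and_cutoff`, and the example curves are hypothetical and not taken from our analysis.

```python
import numpy as np

AM_FREQS = np.array([4, 8, 16, 32, 64, 128, 256])  # Hz, the AM frequencies tested

def bmf_and_cutoff(responses, freqs=AM_FREQS, criterion=0.5):
    """Return (BMF, upper cutoff) of a tuning curve.

    The cutoff is the last frequency above the peak whose response is still
    >= criterion * peak (the 50 % criterion is an assumption)."""
    responses = np.asarray(responses, dtype=float)
    i_best = int(np.argmax(responses))
    above = responses >= criterion * responses[i_best]
    j = i_best
    # walk upward from the peak until the response first drops below criterion
    while j + 1 < len(freqs) and above[j + 1]:
        j += 1
    return int(freqs[i_best]), int(freqs[j])

# hypothetical low-pass curve (arbitrary units), not data from this study
curve = [1.0, 0.95, 0.7, 0.45, 0.2, 0.15, 0.1]
print(bmf_and_cutoff(curve))  # → (4, 16)
```

For a band-pass curve peaking at 16 Hz, the same function returns the BMF and its upper cutoff, which is how a low-pass and a band-pass region can be compared on one scale.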

The bottom-up inverse gradient we found is in agreement with a set of electrophysiological studies in nonhuman species showing that: 1) neurons in the superior olivary complex of rabbits respond preferentially to AM frequencies around 200 Hz (Kuwada and Batra 1999), 2) inferior colliculus neurons respond preferentially to AM frequencies between 50 and 1000 Hz (Langner and Schreiner 1988), 3) the BMFs in most cortical auditory fields range from 2 to 30 Hz (Schreiner and Urbas 1986, 1988), and 4) neurons located in auditory fields posterior to primary regions tend to have longer latencies than AI neurons and are therefore more likely to encode slower variations in signal amplitude (Heil and Irvine 1998). Such a gradient supports the hypothesis that the temporal envelope of sounds is decomposed by a hierarchically organized series of AM filters.
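A minimal way to picture such a hierarchical filter bank is to assign each processing level a band-pass modulation filter centered on its BMF and ask which level responds most strongly to a given AM frequency. The log-Gaussian filter shape, the one-octave bandwidth, and the 90-Hz and 5.7-Hz center values (arbitrary points within the inferior colliculus and secondary-cortex ranges reported here) are assumptions; the sketch treats the levels as independent filters and ignores the serial interactions between them.

```python
import numpy as np

# Hypothetical center (best) modulation frequencies per level, loosely based on
# the BMFs reported in this study.
LEVELS = {
    "superior olivary complex": 256.0,
    "inferior colliculus": 90.0,        # arbitrary point in the 32-256 Hz range
    "medial geniculate body": 16.0,
    "primary auditory cortex": 8.0,
    "secondary auditory cortex": 5.7,   # arbitrary point in the 4-8 Hz range
}

def filter_gain(fm, center_hz, bw_octaves=1.0):
    """Log-Gaussian band-pass gain on a log2 frequency axis (assumed filter shape)."""
    d = np.log2(fm / center_hz)
    return float(np.exp(-0.5 * (d / bw_octaves) ** 2))

def most_responsive_level(fm):
    """Name of the level whose modulation filter passes fm with the largest gain."""
    gains = {name: filter_gain(fm, c) for name, c in LEVELS.items()}
    return max(gains, key=gains.get)

for fm in [4, 8, 16, 32, 64, 128, 256]:
    print(f"{fm:>3} Hz -> {most_responsive_level(fm)}")
```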

Despite the good concordance between our results in humans and electrophysiological data obtained in animals, an influence of the background noise on the absolute regional BMFs we found remains possible.

Clustered segregation and selective processing of AM

We found distinct clusters of voxels responding selectively to specific AM frequencies over the whole range under study (4–256 Hz), but no consistent periodotopic gradient across subjects, either within auditory fields (AI, for example) or across the whole set of cortical regions responsive to AM. The existence of clusters of voxels globally tuned to specific AM frequencies is consistent with the notion of modulation channels (or AM band-pass filters) proposed by Tansley and Suffield (1983), Houtgast (1989), Langner (1992), and Dau et al. (1997). However, the absence of a clear periodotopic gradient suggests that the cortical organization of such modulation channels may be more complex than previously thought. We consider possible explanations for this negative finding below.
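The distinction between clustered segregation and a gradient can be made operational by grouping neighboring voxels that share the same BMF. The sketch below labels 4-connected clusters in a single hypothetical slice of a BMF map (0 marks voxels not responsive to AM); the toy map, the connectivity rule, and the function name `bmf_clusters` are assumptions, not our analysis pipeline.

```python
from collections import deque

import numpy as np

def bmf_clusters(bmf_map):
    """Group 4-connected voxels sharing the same BMF; return [(bmf, n_voxels), ...]."""
    bmf_map = np.asarray(bmf_map)
    rows, cols = bmf_map.shape
    seen = np.zeros(bmf_map.shape, dtype=bool)
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r, c] or bmf_map[r, c] == 0:
                continue
            value, size = bmf_map[r, c], 0
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:  # breadth-first flood fill over same-BMF neighbors
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny, nx] and bmf_map[ny, nx] == value):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            clusters.append((int(value), size))
    return clusters

# toy slice (hypothetical): one large low-BMF cluster and scattered high-BMF voxels
toy = [
    [4, 4, 4, 0, 128],
    [4, 4, 8, 0, 0],
    [4, 8, 8, 256, 0],
    [0, 0, 0, 0, 128],
]
print(bmf_clusters(toy))  # → [(4, 6), (128, 1), (8, 3), (256, 1), (128, 1)]
```

In this toy slice the low frequencies form large clusters while the high frequencies form a mosaic of single voxels, with the same BMF recurring at separate sites; a gradient, by contrast, would show BMF changing monotonically with position.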

Along with electrophysiological studies (Schreiner and Urbas 1988) that show differential tuning in cortical auditory fields, the predictions of a mathematical model recently proposed by May (1999) suggest the existence of periodicity mapping in the auditory cortex. This model consists of coupled excitatory and inhibitory cells. Under tonic periodic input, the system behaves as a damped harmonic oscillator whose resonance frequency increases with the gain of the feedback loop: as inhibition increases in strength, the wavelength of the periodic stimulation to which the system is maximally responsive decreases. If the gain is set in a spatial fashion, the model reproduces cortical tonotopy, with one best wavelength per location and one characteristic place per wavelength. In response to different stimulus rates, the model generates a complex map of periodicity: each rate yields resonances at multiple harmonic places with a best resonance locus, and, conversely, several AM frequencies resonate at each location. A further prediction is that spatial resolution should be much coarser for periodicity maps than for tonotopic maps, because the complete spatial segregation of rates requires a larger field. Such a difference in spatial resolution might account for the discrepancy between the scale of tonotopic maps (about 1.5 cm in AI, for example) and the scale of the periodotopic map proposed by Langner (1997), which spreads over 4–5 cm and thus over several distinct auditory fields.
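The link between feedback gain and resonance frequency can be illustrated with a generic linear excitatory-inhibitory pair, dE/dt = -leak·E - gain·I + u(t) and dI/dt = -leak·I + gain·E, whose transfer function from input u to excitatory activity E is (s + leak)/((s + leak)^2 + gain^2). This is a stand-in chosen for illustration, not May's (1999) model; when damping is light its resonance lies close to gain/2π Hz, so raising the feedback gain raises the preferred modulation frequency. The leak value and frequency grid are assumptions.

```python
import numpy as np

def resonance_hz(gain, leak=1.0, f_max=300.0, df=0.01):
    """Frequency (Hz) at which |H(f)| peaks for the linear E-I pair
        dE/dt = -leak*E - gain*I + u(t),   dI/dt = -leak*I + gain*E,
    with H(s) = (s + leak) / ((s + leak)**2 + gain**2).
    Illustrative damped oscillator only; not May's (1999) model."""
    f = np.arange(df, f_max, df)
    s = 2j * np.pi * f
    h = (s + leak) / ((s + leak) ** 2 + gain ** 2)
    return float(f[np.argmax(np.abs(h))])

# stronger feedback (larger gain) shifts the resonance to higher frequencies
for gain in [2 * np.pi * 4, 2 * np.pi * 16, 2 * np.pi * 64]:
    print(f"gain {gain:7.1f} rad/s -> resonance {resonance_hz(gain):.2f} Hz")
```

Setting the gain to a different value at each spatial location would turn this single oscillator into a crude place map of preferred rates, which is the spatial arrangement of gain the text refers to.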

Although the model proposed by May (1999) predicts a coherent map, no clear periodotopic gradient emerges from these simulations. The very notion of a gradient is questionable, since the model predicts not only variations in location but also variations in response amplitude depending on AM frequency. This is not the case for tonotopic maps, which show few amplitude differences as a function of wavelength. In the periodotopic map, low AM frequencies are represented by larger responses than high frequencies. This is in accordance with our data, which show separate clusters for different groups of AM frequencies, with larger responses for low than for high frequencies. The lack of homogeneity in the amplitude and spread of responses argues against a true gradient (spatial segregation) and instead suggests mapping by clustered segregation (spatial clustering), which agrees with our findings. To model their MEG data, Langner et al. (1997) assumed a single dipole per periodicity frequency (50, 100, 200, and 400 Hz). Although necessary and common for technical reasons, this approach inadvertently biases the analysis toward detecting a spatial segregation that may appear as a systematic gradient. Their finding therefore probably reflects an incomplete picture of periodotopy.

Representation of AM frequencies by response patterns

We have seen that, despite a degree of spatial segregation, several AM frequencies are represented at each locus, with a gradient in amplitude. Indeed, we found that most AM frequencies are represented in each volume element and that they are grouped according to the response type analyzed. In most brain regions, high and low AM frequencies gave rise to different response patterns: sustained responses for low AM frequencies and decaying responses for high AM frequencies. Intermediate frequencies (32 and 64 Hz) gave rise to a mixed pattern but failed to produce activations as large as those for higher and lower AM frequencies, even when a mixed contribution from both response types was modeled.
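The comparison between sustained and transient response models can be sketched as a least-squares fit of two hemodynamic regressors to a voxel time series: a boxcar for the sustained component and an exponentially decaying function starting at each block onset for the transient component, both convolved with a double-gamma hemodynamic response. The block timing, sampling interval, 4-s decay constant, HRF parameters, and simulated voxels are all assumptions; the sketch does not reproduce our actual design or statistical model.

```python
import math

import numpy as np

TR = 1.0          # s, hypothetical sampling interval
N_SCANS = 240     # four 60-s on/off cycles
BLOCK = 30.0      # s of AM stimulation followed by 30 s of rest (assumed design)
TAU = 4.0         # s, assumed decay constant of the transient component

def gamma_pdf(x, shape):
    return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

def hrf(t):
    """Double-gamma hemodynamic response with assumed shape parameters."""
    return gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6.0

t = np.arange(N_SCANS) * TR
time_in_cycle = t % (2 * BLOCK)
on = time_in_cycle < BLOCK

sustained_neural = on.astype(float)                                  # constant while AM is on
transient_neural = np.where(on, np.exp(-time_in_cycle / TAU), 0.0)  # decays after each onset

kernel = hrf(np.arange(0.0, 32.0, TR))

def to_bold(neural):
    """Convolve a neural time course with the HRF, truncated to the scan length."""
    return np.convolve(neural, kernel)[:N_SCANS]

def r_squared(y, regressor):
    """Variance explained by one regressor plus an intercept (ordinary least squares)."""
    X = np.column_stack([regressor, np.ones_like(regressor)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

models = {"sustained": to_bold(sustained_neural), "transient": to_bold(transient_neural)}

rng = np.random.default_rng(1)
voxels = {  # simulated time series, not data from the study
    "low-AM-like": models["sustained"] + 0.1 * rng.standard_normal(N_SCANS),
    "high-AM-like": models["transient"] + 0.1 * rng.standard_normal(N_SCANS),
}
for name, y in voxels.items():
    print(name, {m: round(float(r_squared(y, x)), 2) for m, x in models.items()})
```

Each simulated voxel is better explained by the model that generated it. Entering both regressors in one design matrix would give the kind of mixed model mentioned above for intermediate frequencies.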

Electrophysiological evidence is consistent with the idea that distinct response patterns may encode different AM-frequency ranges. For instance, in the superior olivary complex of the rabbit, Kuwada and Batra (1999) characterized two categories of neurons, showing a sustained response during the stimulus and a transient response to stimulus offset, respectively. These so-called “sustained” (excitatory) and “off” (inhibitory) neurons show clearly distinct response patterns for high AM frequencies (>50 Hz). For low AM frequencies (<50 Hz), all neurons respond synchronously to the stimulus period, since the offset neurons discharge at each fall in amplitude. In the superior olivary complex, AM frequencies are thus encoded by two populations of cells whose response patterns are similar or distinct depending on AM frequency.

In the cortex of the cat, Schreiner and Urbas (1988) showed that neurons with high BMFs (around 100 Hz) and high characteristic (audio) frequencies (>10 kHz) were mainly clustered in the anterior auditory field, whereas neurons with lower characteristic frequencies generally had BMFs below 20 Hz. Eggermont (1998) showed that most neurons exhibit synchronous responses to the AM period (up to 16 Hz in the anterior auditory field and AII, and up to 32 Hz in AI). Eggermont reported, however, variable correlations between characteristic frequency and BMF and a general predominance of low-BMF neurons in all cortical fields. This is consistent with the BMF patterns we obtained using the combined model, which showed an overall predominance of low AM frequencies across the whole set of cortical regions responsive to AM sounds. Our results thus agree with these electrophysiological studies: we demonstrate not only a partial spatial segregation between high and low BMFs but also that high and low BMFs are represented by different response patterns at the same cortical sites.

The notion of a single locus encoding two ranges of modulation rates by two distinct response types finds support in electrophysiological recordings in awake monkeys. Steinschneider et al. (1998) observed that multi-unit activity evoked by click trains changed from a phase-locked response pattern, following stimulation rates up to 100 Hz, to a transient pattern consisting of a response at stimulus onset for higher stimulation rates. The sustained activity observed with f-MRI may reflect phase-locked periodic synaptic discharges, and transient f-MRI responses may correspond to the transient response at stimulus onset.
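Phase locking of the kind described by Steinschneider et al. (1998) is commonly quantified with vector strength, the length of the mean phase vector of spikes relative to the stimulation period (1 for perfect locking, near 0 for none). The sketch below contrasts a simulated unit following a 50-Hz click train with a simulated onset unit driven at 200 Hz; the spike trains, the 50-ms onset window, and the helper `late_fraction` are hypothetical and only illustrate the two response types.

```python
import numpy as np

def vector_strength(spike_times, rate_hz):
    """Length of the mean unit phase vector: 1 = perfect phase locking, ~0 = none."""
    phases = 2 * np.pi * rate_hz * np.asarray(spike_times)
    return float(np.abs(np.mean(np.exp(1j * phases))))

def late_fraction(spike_times, onset_window=0.05):
    """Fraction of spikes occurring after an assumed 50-ms onset window."""
    return float(np.mean(np.asarray(spike_times) > onset_window))

rng = np.random.default_rng(0)
# Simulated unit following a 50-Hz click train: one spike per click, 1-ms jitter.
locked = np.arange(50) / 50.0 + 0.001 * rng.standard_normal(50)
# Simulated onset unit driven by a 200-Hz train: 40 spikes in the first 20 ms only.
onset = np.sort(rng.uniform(0.0, 0.02, 40))

print("locked:", round(vector_strength(locked, 50.0), 2), round(late_fraction(locked), 2))
print("onset: ", round(vector_strength(onset, 200.0), 2), round(late_fraction(onset), 2))
```

The phase-locked unit combines high vector strength with spikes spread over the whole stimulus, whereas the onset unit has low vector strength and no late spikes; over seconds-long hemodynamic integration, these would plausibly map onto sustained and transient components, respectively.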

It is also possible that a combination of distinct neuron populations and distinct response types within the same neuron accounts for the two categories of responses observed with f-MRI. Bieser and Müller-Preuss (1996) distinguished neurons that were able to code the AM of sounds from neurons that discharged only at the beginning and end of stimulation. The neurons able to code AM displayed two response modes, phase-locking and spike-rate coding, to encode variations in modulation rate. It remains uncertain, however, whether coding of AM by spike rate would be better captured by a sustained, transient, or mixed f-MRI response model.

Although there is much evidence for a coupling between the hemodynamic response and synaptic activity, relating the different response types we observed to the response properties of single neurons remains speculative, as the time scales of electrophysiological studies are very different from those of the present study (milliseconds vs. seconds). It is therefore unclear whether the two components of f-MRI responses arise from distinct neuronal populations (sustained versus transient neurons) or from response patterns of a single neuronal population that change as a function of the AM-frequency range.

May (1999) proposed a model for periodotopic mapping that invites speculation about the segregation and integration mechanisms involved in coding high (50–500 Hz) and low (<50 Hz) AM frequencies, referred to by Rosen (1992) as periodicity and temporal envelope, respectively. In this model, inhomogeneous properties of neurons over space (such as spatial gradients of feedback strength) are not required: a simple timing mechanism that periodically increases the responsiveness of cells enables spatial separation of AM frequencies. According to electrophysiological studies (Eggermont 1998; Schreiner and Urbas 1988), most cortical neurons discharge periodically at slow rates (2–16 Hz) in a sustained fashion. These regular oscillations could impose a rhythm on the responsiveness of the system and engage a second periodotopic mapping mechanism in another range of AM frequencies. Slow oscillations (2–10 Hz) could signal the temporal boundaries of syllables and trigger maximal responsiveness at the very beginning of the syllabic segment, when the cues permitting voicing detection (which rely on higher AM frequencies) are present. With such a mechanism, the system's ability to segregate high AM frequencies spatially, and thereby to categorize phonemes, would be maximal at the exact time when the essential features are available.

Studies relying on hemodynamic measures lack the temporal resolution to test this speculative mechanism. However, the effect of such a mechanism on temporally low-pass filtered responses, such as those recorded in our study, would indeed be a mixed contribution of sustained and transient response components to both low and high AM-frequency responses. Studies combining hemodynamic and electrophysiological recordings in humans therefore appear warranted to address this issue.


We thank K. Friston for the contribution to data analysis.

This work was supported by The European Commission and The Wellcome Trust. A.-L. Giraud is funded by Alexander von Humboldt Stiftung.


  • * A.-L. Giraud and C. Lorenzi contributed equally to the study.

  • Present address and address for reprint requests: A. L. Giraud, Physiologisches Institut III, Universitätsklinikum, Theodor-Stern-Kai 7, 60590 Frankfurt/M, Germany (E-mail:Giraud{at}

  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

