Journal of Neurophysiology

Responses to Interaural Time Delay in Human Cortex

Katharina von Kriegstein, Timothy D. Griffiths, Sarah K. Thompson, David McAlpine


Humans use differences in the timing of sounds at the two ears to determine the location of a sound source. Various models have been posited for the neural representation of these interaural time differences (ITDs). These models make opposing predictions about the lateralization of ITD processing in the human brain. The weighted-image model predicts that sounds leading in time at one ear activate maximally the opposite brain hemisphere for all values of ITD. In contrast, the π-limit model assumes that ITDs beyond half the period of the stimulus center frequency are not explicitly encoded in the brain and that such “long” ITDs activate maximally the side of the brain to which the sound is heard. A previous neuroimaging study revealed activity in the human inferior colliculus consistent with the π-limit. Here we show that cortical responses to sounds with ITDs within the π-limit are in line with the predictions of both models. However, contrary to the immediate predictions of both models, neural activation is bilateral for “long” ITDs, despite these being perceived as clearly lateralized. Furthermore, processing of long ITDs leads to higher activation in cortex than processing of short ITDs. These data show that coding of ITD in cortex is fundamentally different from coding of ITD in the brain stem. We discuss these results in the context of the two models.


Many mammals, including humans, are sensitive to small differences in the timing of sounds at the two ears (interaural time differences [ITDs]). ITDs are one of the two binaural cues (the other being interaural intensity differences [IIDs]) underpinning the location of the source of a sound in the horizontal plane. Jeffress (1948) proposed a theory for the processing of ITD and, following this theory, a dominant model has emerged of how sensitivity to ITDs is achieved. The model, assumed to hold in mammals (Goldberg and Brown 1969; Yin and Chan 1990) and birds (Carr and Konishi 1990; Overholt et al. 1992; Sullivan and Konishi 1986), consists of an array of coincidence detectors innervated by a series of delay lines—axons of differing path length from each ear. Coincidence detectors are activated when nerve action potentials from each ear arrive simultaneously, with axonal conduction delays compensating for the difference in timing of the neural signals from the two ears. Differences in the timing of a sound at each ear thus excite different neurons in the array, creating a topological map of preferred ITDs (Fig. 1). In mammals, the medial superior olive (MSO) of the brain stem appears to be the primary site of binaural coincidence detection (Goldberg and Brown 1969; Spitzer and Semple 1995; Yin and Chan 1990), although neurons at higher levels in the auditory pathway are also found to be tuned for ITD (Fitzpatrick et al. 2002; Kuwada and Yin 1983; Reale and Brugge 1990). Usually these centers are maximally excited by sounds leading at the opposite (contralateral) ear. This contralateral representation is consistent with the representation of space in other modalities.

FIG. 1.

Models for interaural time difference (ITD) processing. A: representation of the weighted-image model. Density of ITD detectors is uniform across frequency bands and is greater for short than for long ITDs (denoted by gray scale). For sounds with consistent ITD across frequency and of sufficient bandwidth (e.g., −1,500 μs centered at 500 Hz), neural activation at brain centers central to the primary binaural integrators is greater contralateral to the side at which the sound is leading in time (see curve above matrix). B: representation of ITD detectors constrained by the “π-limit” (red curves). The same broadband stimulus with the same ITD (i.e., −1,500 μs) as in A generates greater neural activation ipsilateral to the side at which the sound is leading in time.

Because sound is encoded in discrete frequency bands, models of ITD processing usually include the existence of a full representation of frequency and ITD channels in a three-dimensional matrix (the third dimension corresponding to the magnitude of the activation), the so-called cross-correlogram (Trahiotis et al. 2001). It is generally considered that the cross-correlogram represents an explicit code for ITD, with all possible values of ITD represented by neurons of various internal delay-line differences (including ITDs beyond the physiological range). Neurons with positive internal delay difference compensate for negative ITDs (sounds heard to originate on the left) and are located at the right side of the brain. Neurons with negative internal delay differences compensate for positive ITDs and are located on the left side of the brain. It is assumed that more coincidence-counting units exist that are wired for shorter ITDs and fewer for longer ITDs. This nonuniform distribution of ITD detectors is called centrality-weighting and has been implemented to explain a range of binaural psychophysical phenomena including the side and extent to which a sound is heard to be lateralized, depending on its ITD, bandwidth, and spectral content (Jeffress 1972; Stern et al. 1988; Trahiotis and Stern 1994). For example, a relatively narrow noise band (50 Hz) with a relatively long negative ITD (center frequency of 500 Hz and ITD of −1,500 μs), i.e., leading at the left ear, is heard on the right, i.e., on the wrong side. In contrast, a noise with the same ITD but a larger bandwidth (400 Hz) is perceived correctly lateralized to the left. This apparent error in perceptual lateralization is difficult to explain with the existence of a full representation of frequency and ITD channels in a two-dimensional matrix, but can be explained by centrality-weighting. The “weighted-image model” (Stern and Trahiotis 1997; Stern et al. 1988; Trahiotis and Stern 1989, 1994) assumes the existence of more coincidence-counting units wired for shorter ITDs (centrality-weighting) and progressively fewer for ITDs beyond the physiologically relevant range (±700 μs in humans) (Fig. 1A). Because of the periodicity of the waveform (the period of 500 Hz is 2,000 μs), there will also be an activation peak at +500-μs delay. In the case of the narrow noise band (50 Hz), the auditory system resolves the ambiguity by selecting the activation peak that is more central, i.e., nearest to the delay-line difference of zero. This lateralizes the percept of the sound to the “wrong” side. The correct perceptual lateralization of the wider bands of noise has been explained by a second level of coincidence detection that emphasizes across-frequency consistency of the activation peak at the true delay of −1,500 μs; this second level has been suggested to reside in the inferior colliculus (IC; Trahiotis and Stern 1994), the major projection nucleus of the MSO. Such a straightness weighting amplifies the true activation maximum in the cross-correlogram at −1,500 μs that is thought to be coded in the brain hemisphere opposite to the ear at which the sound is leading (see Fig. 1A, Supplemental Table S1).¹
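The periodicity argument can be made concrete with a small sketch. The Python below is illustrative only: the function names `peak_lags_us` and `most_central_peak_us` are hypothetical, and each frequency band is treated as a pure tone at its center frequency, which is a simplifying assumption and not part of the weighted-image model itself.

```python
import math

def peak_lags_us(itd_us, freq_hz, max_lag_us=3000.0):
    """Internal delays (microseconds) at which a band centered on freq_hz
    shows a cross-correlation peak for a stimulus ITD of itd_us.
    Assumption: the band behaves like a tone, so peaks repeat every period."""
    period_us = 1e6 / freq_hz
    k_min = math.ceil((-max_lag_us - itd_us) / period_us)
    k_max = math.floor((max_lag_us - itd_us) / period_us)
    return [itd_us + k * period_us for k in range(k_min, k_max + 1)]

def most_central_peak_us(itd_us, freq_hz):
    """Peak nearest to zero internal delay (the centrality-weighted choice)."""
    return min(peak_lags_us(itd_us, freq_hz), key=abs)

if __name__ == "__main__":
    # Narrow band at 500 Hz, ITD of -1,500 us: peaks lie one 2,000-us period apart.
    print(peak_lags_us(-1500.0, 500.0))
    print(most_central_peak_us(-1500.0, 500.0))
    # Across bands, only the peak at the true ITD stays at the same delay
    # ("straight"); the neighboring peaks shift with frequency ("curved").
    for f in (300.0, 400.0, 500.0, 600.0, 700.0):
        print(f, [round(p) for p in peak_lags_us(-1500.0, f, 2000.0)])
```

For the narrowband case, the most central peak falls at +500 μs, on the side opposite to the leading ear. Across bands, the entry at −1,500 μs is the only one common to every frequency, which is the consistency that straightness weighting is said to emphasize.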

Recent in vivo recordings made in multiple brain centers in a variety of mammalian species indicate a restricted range of ITD detectors for each low-frequency sound channel; there are no neurons showing tuning for ITDs beyond approximately ½ cycle of the center frequency of each auditory filter (i.e., frequency band) (Brand et al. 2002; Hancock and Delgutte 2004; Joris and Yin 2007; McAlpine et al. 2001). This π-limit could be viewed as an extreme form of centrality-weighting; instead of progressively fewer detectors tuned to larger values of ITD, there is thought to be a complete absence of detectors beyond π radians interaural-phase difference in each frequency band. Because of this lack of ITD detectors for delays beyond this limit (denoted by the red curves in Fig. 1B), the π-limit model assumes that there is no neural substrate on which a weighting for straightness might act. Accordingly, a model that incorporates the π-limit predicts a brain response to broadband sounds with ITDs beyond the π-limit, opposite to that of the weighted-image model (see Fig. 1 and Supplemental Table S1 for an overview of the predictions of the two models).
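The opposing predictions can be summarized in a few lines of Python. This is a sketch under stated assumptions, not either model's actual implementation: the function names are hypothetical, and the π-limit is evaluated only at the 500-Hz center frequency, whereas a broadband stimulus spans bands with different limits.

```python
def pi_limit_us(freq_hz):
    """Half the period of freq_hz, in microseconds."""
    return 0.5e6 / freq_hz

def wrapped_itd_us(itd_us, freq_hz):
    """Equivalent ITD within +/- half a period (interaural phase wrapped)."""
    period_us = 1e6 / freq_hz
    return itd_us - round(itd_us / period_us) * period_us

def predicted_hemisphere(model, itd_us, freq_hz=500.0):
    """Hemisphere with the greater predicted activation.
    Convention from the text: negative ITD means the left ear leads."""
    if model == "weighted-image":
        effective = itd_us                       # true ITD is represented
    elif model == "pi-limit":
        effective = wrapped_itd_us(itd_us, freq_hz)  # only wrapped ITD exists
    else:
        raise ValueError("unknown model: " + model)
    if effective == 0:
        return "balanced"
    # A detector population driven as if the left ear leads sits on the right.
    return "right" if effective < 0 else "left"

if __name__ == "__main__":
    print(pi_limit_us(500.0))  # half of the 2,000-us period
    for itd in (-500.0, 500.0, -1500.0, 1500.0):
        print(itd,
              predicted_hemisphere("weighted-image", itd),
              predicted_hemisphere("pi-limit", itd))
```

With these assumptions the two models agree for ±500 μs and disagree for ±1,500 μs, where the π-limit version predicts greater activation ipsilateral to the leading ear.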

This notion is supported by a recent functional magnetic resonance imaging (fMRI) study showing that activation to interaurally delayed broadband noise in the inferior colliculus (IC) of the human midbrain is greater ipsilateral to the perceived location of the source (i.e., to the ear at which the sound leads in time) for ITDs beyond the π-limit (±1,500 μs) (Thompson et al. 2006). The data suggest that ITDs beyond the π-limit are represented by activity of neurons tuned to ITDs within the π-limit and that human IC has no, or fewer, neurons tuned to ITDs beyond the π-limit. No evidence of straightness weighting was found at the level of the IC, at least with respect to elements contributing to the blood oxygenation level–dependent (BOLD) response. Nevertheless, it remains possible that some form of neural computation takes place at a later point in the auditory neuronal hierarchy, which changes the side of activation at the cortical level in such a way that the higher activation is in the brain hemisphere opposite to the perceived location of the sound.

Here, we examine cortical fMRI activation patterns to 400-Hz-wide band-pass noises, centered at 500 Hz, of the same subjects for which IC activation was assessed previously (Thompson et al. 2006). Whereas ITDs within the π-limit (i.e., ±500 μs) evoked a greater BOLD activation in the hemisphere contralateral to the side at which the sound was leading, and from which the sound was heard to originate, ITDs beyond the π-limit (±1,500 μs) generated a pattern of BOLD activity that was balanced between the brain hemispheres. Furthermore, BOLD activity in primary auditory cortex (PAC) was stronger for sounds with ±1,500 μs than for sounds with ±500 μs. The data suggest that the side to which a sound is heard to be lateralized is not necessarily reflected in the lateralization of the cortical response.


All of the cortical data described in the current study were recorded in the same subjects and during the same recording sessions as data previously described for the inferior colliculus (Thompson et al. 2006).

Sound stimuli

As described previously (Thompson et al. 2006), stimuli were noise exemplars (fixed-amplitude, random-phase) with a bandwidth of 400 Hz and a center frequency of 500 Hz. Each noise had a duration of 1,000 ms, including 40-ms raised-cosine gates, and was generated at a sample rate of 44,100 Hz (and 16-bit resolution) using the Binaural Toolbox (Akeroyd 2001). Eighty independent noise exemplars were used in each of five conditions, which differed only in ongoing ITD (i.e., there was no difference in onset time at the two ears). In all other respects the waveforms presented to the left and right ears were identical. The lateralized conditions were 1) left 500 (−500-μs ITD, left ear leading), 2) right 500 (+500-μs ITD, right ear leading), 3) left 1,500 (−1,500-μs ITD, left ear leading), and 4) right 1,500 (+1,500-μs ITD, right ear leading). We chose these conditions because they permit one to test 1) a common prediction for the weighted-image model and the π-limit model and 2) an opposing prediction of the two models (see Supplemental Table S1). Perceptually, left 500 and left 1,500 are lateralized to the left, and right 500 and right 1,500 to the right, consistent with the side on which the sounds lead in time. A fifth condition with no ITD (Center, 0 ITD) was included as a control, and silent trials were included in the paradigm to provide a measure of baseline level of neural activity. Stimuli were presented using Cogent 2000 software at maximum system volume (∼80 dB SPL) through in-house modified electrostatic headphones (Koss, Milwaukee, WI).
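A stimulus of this kind can be sketched as a sum of equal-amplitude cosines with random phases, with the ITD applied to the fine structure only and the same gate applied to both ears. The Python below is an illustration under assumptions, not the Binaural Toolbox code: the function names, the component spacing of 1/duration, and the amplitude scaling are all hypothetical choices.

```python
import math
import random

def raised_cosine_gate(n_samples, n_ramp):
    """Amplitude envelope with raised-cosine onset and offset ramps."""
    gate = [1.0] * n_samples
    for i in range(min(n_ramp, n_samples // 2)):
        w = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        gate[i] = w
        gate[n_samples - 1 - i] = w
    return gate

def noise_pair(itd_us, fs=44100, dur_s=1.0, f_lo=300.0, f_hi=700.0,
               ramp_s=0.040, seed=0):
    """Return (left, right) sample lists for one noise exemplar.
    Negative itd_us means the left ear leads, as in the text."""
    rng = random.Random(seed)
    n = int(round(fs * dur_s))
    df = 1.0 / dur_s                      # component spacing (assumption)
    n_comp = int(round((f_hi - f_lo) / df)) + 1
    freqs = [f_lo + k * df for k in range(n_comp)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    scale = 1.0 / n_comp                  # keeps samples within +/- 1
    itd_s = itd_us * 1e-6

    def carrier(t):
        # Fixed-amplitude, random-phase components.
        return scale * sum(math.cos(2.0 * math.pi * f * t + p)
                           for f, p in zip(freqs, phases))

    # The gate is identical at both ears, so there is no onset ITD.
    gate = raised_cosine_gate(n, int(round(fs * ramp_s)))
    left = [gate[i] * carrier(i / fs) for i in range(n)]
    # Ongoing ITD: the right-ear fine structure is shifted by itd_s.
    right = [gate[i] * carrier(i / fs + itd_s) for i in range(n)]
    return left, right
```

In pure Python the default 1-s, 44.1-kHz exemplar takes a while to compute; a call such as `noise_pair(-1500.0, fs=8000, dur_s=0.1)` is enough to inspect the structure.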

Data collection

For fMRI, BOLD contrast images were acquired using T2*-weighted echo-planar imaging (EPI) on a Siemens 3-Tesla scanner at the Functional Imaging Laboratory (London). A sparse imaging protocol was used to counteract the effects of scanner noise on the BOLD signal (Hall et al. 1999). Our stimuli consisted of eight consecutive noise bursts from the same condition (8 s) or 8 s of silence. During the stimulus presentation, the scanner was silent. We then acquired a single volume of the whole brain, consisting of 48 slices (slice thickness, 2 mm; interslice distance, 1 mm) in an ascending axial sequence (echo time [TE] = 65 ms; acquisition time [TA] = 3.12 s). Cardiac triggering was applied to lessen artifacts caused by pulsatile motion of the brain stem, by ensuring that the volume acquisition always began at the same point in a subject's cardiac cycle. Because of this, there was a variable scan repetition time (TR ≈ 11.12 s ± ½ cardiac cycle).
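The nominal repetition time quoted above appears to be the sum of the stimulus period and the acquisition time. The short Python sketch below reproduces that arithmetic; the heart rate passed to `tr_range_s` is a hypothetical example value, since subjects' heart rates are not reported here.

```python
STIMULUS_S = 8.0       # eight consecutive 1-s noise bursts, or 8 s of silence
ACQUISITION_S = 3.12   # acquisition time (TA) for one whole-brain volume

def nominal_tr_s():
    """Stimulus period plus acquisition time."""
    return STIMULUS_S + ACQUISITION_S

def tr_range_s(heart_rate_bpm):
    """TR range given a jitter of +/- half a cardiac cycle."""
    half_cycle_s = 0.5 * 60.0 / heart_rate_bpm
    tr = nominal_tr_s()
    return tr - half_cycle_s, tr + half_cycle_s

if __name__ == "__main__":
    print(round(nominal_tr_s(), 2))
    # Hypothetical heart rate of 60 beats/min: jitter of +/- 0.5 s.
    print(tuple(round(x, 2) for x in tr_range_s(60.0)))
```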

Ten volumes were collected per condition in each of four functional runs for each subject. Trial orders were randomized on-line in each run. We therefore collected 40 volumes for each of the six conditions: 240 in total from each subject. Two dummy scans at the beginning of each run were discarded. A T1-weighted structural scan for each subject was also collected during the same session as the functional images (apart from two subjects, who had preexisting structural scans acquired on the same machine). Fourteen subjects (six male, eight female), between 23 and 57 yr of age (mean = 31 yr), took part. All subjects gave informed consent and the experiment was carried out with the approval of the Institute of Neurology Ethics Committee (London). All were right-handed and none had any hearing impairment or history of neurological disorder. During scanning, they were asked to pay attention to the noises and to press a button on their keypad at the end of each trial to maintain alertness. They were also asked to keep their eyes open and fixate on a central cross, to counteract any confound caused by correlated eye movements. Subjects' eye movements were monitored on-camera and subjects did not report any difficulty in maintaining fixation.
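The volume counts follow directly from the design, as the sketch below shows. The shuffling scheme is an assumption made for illustration: the text states only that trial orders were randomized on-line, and the condition labels and function names here are hypothetical.

```python
import random

CONDITIONS = ["left 500", "right 500", "left 1,500", "right 1,500",
              "center", "silence"]

def randomized_run(volumes_per_condition=10, seed=None):
    """One functional run: every condition repeated, then shuffled.
    A plain shuffle is an assumption, not the documented procedure."""
    rng = random.Random(seed)
    trials = [c for c in CONDITIONS for _ in range(volumes_per_condition)]
    rng.shuffle(trials)
    return trials

def session(n_runs=4, seed=0):
    """All runs for one subject, each with its own random order."""
    return [randomized_run(seed=seed + r) for r in range(n_runs)]

if __name__ == "__main__":
    runs = session()
    per_condition = sum(run.count("left 500") for run in runs)
    total = sum(len(run) for run in runs)
    print(per_condition, total)  # volumes per condition, volumes per subject
```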

Data analysis

Imaging data were analyzed using the statistical parametric mapping algorithm (Friston et al. 1995) implemented in SPM2. Functional scans for each individual subject were first realigned and unwarped, to correct for movement-related artifacts and geometric distortion (Andersson et al. 2001). Structural and EPI images were coregistered and normalized to a standard gray-matter template (Montreal Neurological Institute [MNI]). They were thereby transformed into a standard stereotaxic space and subsampled with a voxel resolution of 2 × 2 × 2 mm (original voxel resolution was ∼3 × 3 × 3 mm). Data were spatially smoothed with a Gaussian smoothing kernel of 10 mm. SPM2 was used to compute individual subject analyses according to the general linear model by fitting the data time series with the canonical hemodynamic response function at the onset of each trial. Each condition was modeled as an individual regressor in the design matrix and statistical parameter estimates were computed individually for each brain voxel. Individual statistical maps were generated for the following contrasts of interest: −500-μs ITD > +500-μs ITD and vice versa; +1,500-μs ITD > −1,500-μs ITD and vice versa; −500-μs ITD > 0-μs ITD; −1,500-μs ITD > 0-μs ITD; +500-μs ITD > 0-μs ITD; +1,500-μs ITD > 0-μs ITD; and ±1,500-μs ITD > ±500-μs ITD. The resulting contrast maps for individual subjects were entered into a second-level analysis to allow population-level inferences to be drawn (random effects). Activity was considered significant at P < 0.001, uncorrected, if the location was in accordance with prior hypothesis or previous literature. To locate the PAC we used the probability maps provided in the Anatomy toolbox for SPM (Eickhoff et al. 2007; Morosan et al. 2001).
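The contrasts listed above can be written as weight vectors over the condition regressors. The Python sketch below is not the SPM2 interface: the regressor order, the names, and the choice to treat silence as an implicit baseline are assumptions. The one-sample t statistic is shown as a common form of second-level random-effects test and is likewise an assumption about the exact procedure.

```python
import math
import statistics

# Hypothetical regressor order: -500, +500, -1,500, +1,500, 0-us ITD.
CONDITIONS = ["L500", "R500", "L1500", "R1500", "C0"]

CONTRASTS = {
    "-500 > +500":      [1, -1, 0, 0, 0],
    "+500 > -500":      [-1, 1, 0, 0, 0],
    "+1500 > -1500":    [0, 0, -1, 1, 0],
    "-1500 > +1500":    [0, 0, 1, -1, 0],
    "-500 > 0":         [1, 0, 0, 0, -1],
    "-1500 > 0":        [0, 0, 1, 0, -1],
    "+500 > 0":         [0, 1, 0, 0, -1],
    "+1500 > 0":        [0, 0, 0, 1, -1],
    "+/-1500 > +/-500": [-1, -1, 1, 1, 0],
}

def contrast_value(betas, weights):
    """Weighted sum of one voxel's parameter estimates for one subject."""
    return sum(w * b for w, b in zip(weights, betas))

def one_sample_t(values):
    """t statistic testing whether the mean across subjects differs from 0."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

if __name__ == "__main__":
    # Made-up parameter estimates for a single voxel in three subjects.
    subjects = [[2.0, 1.0, 1.5, 1.4, 1.0],
                [1.8, 1.1, 1.6, 1.7, 0.9],
                [2.2, 0.9, 1.2, 1.3, 1.1]]
    w = CONTRASTS["-500 > +500"]
    print(one_sample_t([contrast_value(b, w) for b in subjects]))
```

Each weight vector sums to zero, so a contrast is sensitive only to differences between conditions and not to their common level.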


Cortical BOLD activation for contrasts between +500- and −500-μs ITDs

Implicit in many models of auditory spatial processing is that brain activation for ITDs of ±500 μs in the 500-Hz band (i.e., within the π-limit) will be lateralized according to the side from which the sound is heard to originate. Thus BOLD activation should also be greater in the hemisphere contralateral to the perceived (lateralized) location. We first contrasted conditions of −500-μs ITD versus +500-μs ITD. In accordance with the expectations, the contrast −500-μs ITD > +500-μs ITD activates right planum temporale (PT) (Fig. 2, red) and the reverse contrast activates left PT (Fig. 2, green). This pattern of activity is consistent with the BOLD activation in the midbrain obtained for the same stimulus configurations in the same subjects (Thompson et al. 2006); i.e., activations of both the inferior colliculus and the auditory cortex are located in the same brain hemisphere. There were no further consistently lateralized activations in other cortical areas even at a low statistical threshold (P = 0.05, uncorrected) (Supplemental Fig. S1).

FIG. 2.

Representation of short ITDs in the cortex. Group statistical parametric maps for the contrasts between −500-μs ITD > +500-μs ITD (red) and +500-μs ITD > −500-μs ITD (green) are overlaid on a structural image of one of the participants (n = 14, P < 0.001).

Cortical BOLD activation for contrasts between +1,500- and −1,500-μs ITDs

Our previous analysis showed that ±1,500-μs ITDs generate greater activity in the IC ipsilateral to the perceived location of the sound source. However, contrary to the activity suggested by the π-limit, the observations in the IC, and to the immediate prediction of the weighted-image model, the contrasts between −1,500- and +1,500-μs ITDs show that, at the cortical level, neither hemisphere is activated to a greater extent than the other; no significant clusters exist in PT or Heschl's gyrus (HG) even at a low statistical threshold (P = 0.05, uncorrected) for the contrast −1,500-μs ITD > +1,500-μs ITD, and for the reverse contrast. Also in other cortical areas there were no consistently lateralized activations even at a low statistical threshold (P = 0.05, uncorrected) (Supplemental Fig. S2).

Cortical BOLD activation for contrasts between conditions with ITDs of ±1,500 and ±500 μs and 0-μs ITD

The lack of differential activation for the contrasts between the conditions with long ITDs can be explained by the bilateral activity in PT/HG when contrasting 1,500-μs ITD > 0-μs ITD (Fig. 3, Table 1). As expected, the same contrast for small ITDs (500-μs ITD > 0-μs ITD) reveals lateralized activity in PT (Fig. 3, Table 1).

FIG. 3.

Representation of short and long ITDs in the cortex. A: group statistical parametric maps for the contrasts between −500-μs ITD > 0-μs ITD (top) and −1,500-μs ITD > 0-μs ITD (bottom) overlaid on a structural image of one of the participants (n = 14, P < 0.001). B: group statistical parametric maps for the contrasts between +500 μs > 0 μs (top) and +1,500 μs > 0 μs (bottom) are overlaid on a structural image of one of the participants (n = 14, P < 0.01). The responses to positive ITDs are weaker than the responses to negative ITDs. This asymmetry in responsiveness is in accord with a previous report (Krumbholz et al. 2005).

TABLE 1.

Local activation maxima for short and long ITDs in planum temporale (PT) and Heschl's gyrus

Cortical BOLD activation for contrasts between conditions with ±1,500- and ±500-μs ITDs

Long ITDs in contrast to small ITDs (±1,500-μs ITD > ±500-μs ITD) activate bilateral PAC (TE1.0) (Fig. 4, Table 1). Thus long ITDs generate greater PAC activation than short ITDs, regardless of perceptual lateralization.

FIG. 4.

Activation of primary auditory cortex by long ITDs. Group statistical parametric map for the contrast between (−1,500-μs ITD and +1,500-μs ITD) > (−500-μs ITD and +500-μs ITD) overlaid on a structural image of one of the participants (n = 14, P < 0.001).


Consistent with existing models of ITD processing, the cortical BOLD response for short ITDs (500 μs) is greater in the hemisphere opposite to the perceived location of the source. In contrast, sounds with long ITDs (1,500 μs) lead to bilateral activation in the cortex. There was no difference in activation in auditory cortex of the two hemispheres (nor in other areas of the cortex) for these long ITDs. This bilateral activation is a surprising result because it does not accord with either the explicit predictions of the π-limit or with those of the weighted-image model. It questions the principle that the lateralization of cortical activation is required for perceptual lateralization. Also, the greater activation to long ITDs compared with that to short ITDs in both hemispheres was not predicted by either model.

A potential explanation for the lack of hemispheric lateralization in BOLD response could be a difference in the amount of perceptual lateralization for the stimuli with 500- and 1,500-μs ITDs used in the current experiment. However, experiments explicitly testing the extent of perceptual laterality for these sounds support the conclusion that—although the sounds with 1,500-μs ITD have a more diffuse intracranial percept—they are experienced as being strongly and equally lateralized (Buell et al. 1991, 1994; Schiano et al. 1986; Trahiotis et al. 2001; Yost et al. 2007). The extent of laterality has been quantified by investigating the amount of perceptual laterality induced by an interaural intensity difference (IID) that is required to match the lateral position of a sound image produced by different values of ITDs. Importantly, the matching IIDs for bands of noise with ITDs of 500 and 1,500 μs are reported to be of similar magnitude. The equivalence is similar to the maximum extent of perceptual lateralization for IIDs per se in human listeners, i.e., 12- to 15-dB IIDs (Buell et al. 1994). Thus it appears unlikely that the failure to observe any hemispheric lateralization in the BOLD response in cortex is attributable to a lesser, or absent, perceived lateralization of the sounds with ITDs of 1,500 μs. On the contrary, the data rather imply that the amount of hemispheric lateralization of cortical activity is not associated with the amount of perceptual lateralization for ITDs of 1,500 μs.

Bilateral activation to long ITDs could occur if one assumes a summing of activity within the cross-correlogram (Shackleton and Meddis 1992) instead of (or as first step toward) the weighting function proposed in the weighted-image model (Trahiotis and Stern 1994). Such summing could explain bilateral activation: for a broadband sound with −1,500-μs ITD, there would be a summing across the straight activation at the −1,500-μs ITD detectors and a summing of the curved activation one cycle of phase away (and within the π-limit). At the −1,500-μs channels, activity would be straight, but within a lower density of ITD detectors. At the +500-μs channels, the activity would be curved but at a site with a high density of ITD detectors (see Fig. 1A). However, we see two drawbacks to this explanation. First, it assumes the existence of “Jeffress-type” ITD detectors beyond the π-limit, which, to date, have not been found in single-neuron recordings (Joris and Yin 2007; McAlpine et al. 2001). Second, a straightness summation cannot explain the greater overall activation for sounds with an ITD of 1,500 μs compared with the activations to ITDs of 500 μs (Fig. 4). Long delays (1,500 μs) show a “straight” activity pattern in the cross-correlogram at sites with no (or fewer) ITD detectors, but a “curved” activity pattern at sites with (or with more) ITD detectors (Fig. 1). In contrast, short delays (500 μs) show a “straight” activation pattern at a site with (or with more) ITD detectors, but a “curved” activity pattern at a site with no (or fewer) ITD detectors. In theory, one would therefore expect that sounds with short ITDs lead to higher overall activity at the level of auditory cortex. However, the results show that the reverse is the case, i.e., the BOLD response in HG is higher for long ITDs in contrast to short ITDs. This suggests that the influence of the long ITDs on generators of the BOLD response outweighs even the central (and straight) activation of the shorter delays.

How can one explain greater overall activation in HG (PAC) for long ITDs in contrast to short ITDs (Fig. 4)? One potential explanation is based on the assumption that there might be a π-limit for ITD detectors. Under this assumption, 400-Hz-wide sounds with ITDs beyond the π-limit produce a reduced interaural correlation (IAC) within the π-limit of the cross-correlogram, i.e., only the damped side peak is present (Fig. 1). For the 400-Hz-wide noises used in our study, the IAC of the side peak at +500 μs would fall to about 0.89 when an ITD of ±1,500 μs is used. If a reduced IAC within the π-limit elicits the increased BOLD response to long ITDs, it is expected that it is located in an area known to be responsive to reduced IAC, which seems to be the case. Hall et al. (2005) showed that the BOLD response in a region in HG is higher for sounds with low IAC in contrast to sounds with high IAC (MNI coordinate, which is most consistent across subjects: −50, −26, 4). This activation is located in a very similar location to the one we find with the contrast 1,500 μs > 500 μs (MNI coordinates: −50, −22, 10; 52, −16, 8).
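The proximity of the two left-hemisphere peaks can be checked with a Euclidean distance in MNI space (coordinates in millimeters). The snippet below is a simple illustration using the coordinates quoted above; the constant and function names are hypothetical.

```python
import math

HALL_2005_PEAK = (-50, -26, 4)      # low IAC > high IAC (Hall et al. 2005)
PRESENT_LEFT_PEAK = (-50, -22, 10)  # 1,500 us > 500 us, left hemisphere

def distance_mm(a, b):
    """Euclidean distance between two MNI coordinates, in millimeters."""
    return math.dist(a, b)

if __name__ == "__main__":
    print(round(distance_mm(HALL_2005_PEAK, PRESENT_LEFT_PEAK), 1))
```

The separation is about 7 mm, which is less than the 10-mm smoothing kernel applied to the present data.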

A reduction in IAC might explain the increase of responses in PAC for ±1,500-μs ITD stimuli relative to ±500-μs ITD stimuli, but it does not explain how ±1,500-μs ITD stimuli give rise to a lateralized percept. One possibility is that the pattern in the cross-correlogram is analyzed. Such a pattern analysis could be based on the tilt of the activation pattern. A positive ITD results in a left-to-right tilt with increasing frequency and a negative ITD results in a right-to-left tilt. Zero ITD produces a pattern symmetric around zero. Thus perceptual lateralization would rely on the pattern of activation within the ITD-detector map rather than on the cortical activation balance. Such pattern analysis could be performed by cortical areas in both hemispheres, which operate on the activation pattern of the ITD-detector maps as represented in the IC.
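The tilt described here can be illustrated for long ITDs by locating, in each frequency band, the activation peak that falls within the π-limit and fitting a line to peak position against frequency. This Python sketch is a toy calculation under assumptions, not a model from the literature: the function names are hypothetical, each band is treated as a tone at its center frequency, and the default bands (350 to 650 Hz) are chosen so that a 1,500-μs ITD lies beyond the π-limit in every band (for bands below about 333 Hz it does not).

```python
DEFAULT_BANDS_HZ = (350.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0)

def wrapped_peak_us(itd_us, freq_hz):
    """Position of the activation peak within +/- half a period."""
    period_us = 1e6 / freq_hz
    return itd_us - round(itd_us / period_us) * period_us

def tilt_us_per_hz(itd_us, bands_hz=DEFAULT_BANDS_HZ):
    """Least-squares slope of within-limit peak position against frequency."""
    peaks = [wrapped_peak_us(itd_us, f) for f in bands_hz]
    n = len(bands_hz)
    mean_f = sum(bands_hz) / n
    mean_p = sum(peaks) / n
    num = sum((f - mean_f) * (p - mean_p) for f, p in zip(bands_hz, peaks))
    den = sum((f - mean_f) ** 2 for f in bands_hz)
    return num / den

if __name__ == "__main__":
    for itd in (-1500.0, 0.0, 1500.0):
        peaks = [round(wrapped_peak_us(itd, f)) for f in DEFAULT_BANDS_HZ]
        print(itd, peaks, tilt_us_per_hz(itd))
```

Under these assumptions the slope is positive for +1,500 μs, negative for −1,500 μs, and zero for 0 μs. For ITDs within the π-limit in every band, such as ±500 μs, the ridge is straight, so its offset from zero rather than its tilt would carry the side.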

In summary, perception of sound lateralization is not necessarily associated with a lateralization of cortical activity. Although there is lateralization of activity for short ITDs there is bilateral activation for long ITDs, despite a clear perceptual lateralization for both magnitudes of ITD. The pattern of activity in response to long ITDs in cortex is fundamentally different from the pattern in the brain stem. These cortical responses are not compatible with either the immediate predictions of the weighted-image model or the observed π-limit. Furthermore, these data challenge the traditional view that space is necessarily encoded by the side of the brain that is opposite to the perceived location.


This work was supported by a Medical Research Council Programme Grant (UK) to D. McAlpine and S. K. Thompson, a Volkswagen Foundation (Germany) grant to K. von Kriegstein, and a Wellcome Trust (UK) grant to T. D. Griffiths.


We thank T. Marquardt for helpful discussions and comments on the manuscript.


  • 1 The online version of this article contains supplemental data.

  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

