Microsecond Precision of Phase Delay in the Auditory System of the Barn Owl

Hermann Wagner, Sandra Brill, Richard Kempter, Catherine E. Carr


The auditory system encodes time with sub-millisecond accuracy. To shed new light on the basic mechanism underlying this precise temporal neuronal coding, we analyzed the neurophonic potential, a characteristic multiunit response, in the barn owl's nucleus laminaris. We report here that the relative time measure of phase delay is robust against changes in sound level, with a precision sharper than 20 μs. Absolute measures of delay, such as group delay or signal-front delay, had much greater temporal jitter, for example due to their strong dependence on sound level. Our findings support the hypothesis that phase delay underlies the sub-millisecond precision of the representation of interaural time difference needed for sound localization.


The barn owl is well known for its superb sound-localization capabilities (Bala et al. 2003; Carr and Konishi 1990; Gerstner et al. 1996; Keller et al. 1998; Kempter et al. 2001; Koppl 1997; Moiseff and Konishi 1981; Pena et al. 1996; Sullivan and Konishi 1984, 1986; Viete et al. 1997). The owl uses interaural time difference (ITD) to encode sound azimuth with a behavioral accuracy of <10 μs (Bala et al. 2003) and a neuronal sensitivity of 25–100 μs (Bala et al. 2003; Moiseff and Konishi 1981). The highest overall monaural temporal sensitivity has been measured in the third-order nucleus laminaris (NL) where binaural convergence creates tuning to ITD (Carr and Konishi 1990; Reyes et al. 1996; Schwarz 1992; Sullivan and Konishi 1986). The NL exhibits a characteristic frequency-following multiunit response termed the neurophonic (Schwarz 1992; Snyder and Schreiner 1984; Sullivan and Konishi 1986), which we used to study timing. The neurophonic represents both monaural and binaural temporal sensitivity well (Sullivan and Konishi 1986).

Earlier measurements of the conduction time from the ear to NL have found values between 2 and 3 ms (Carr and Konishi 1990). Sullivan and Konishi (1984) already mentioned the importance of phase but did not quantify temporal precision. Koppl (1997) quantified temporal precision in the second-order nucleus magnocellularis that provides input to the nucleus laminaris but found that the response delay depended on sound level. The rate of change amounted to ∼5 μs/dB around the characteristic frequency. Taking into account that the range of interaural level differences experienced by barn owls is ∼20 dB (Keller et al. 1998; Viete et al. 1997), how is it possible that the ITD can be represented in NL of owls with a neuronal precision much sharper than 100 μs?
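The arithmetic behind this question can be made explicit. The estimate below assumes that the ∼5 μs/dB slope applies linearly across the full ∼20-dB range of interaural level differences; this linear extrapolation is an assumption for illustration, since the slope is reported only around the characteristic frequency.

```latex
\Delta t \;\approx\; 5\ \mu\mathrm{s/dB} \times 20\ \mathrm{dB} \;=\; 100\ \mu\mathrm{s}
```

A level-dependent delay of this size would be as large as the coarsest neuronal ITD sensitivity quoted above (25–100 μs).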

The auditory system encodes delay both as conduction time, an absolute time measure (Fitzgerald et al. 2001; Goldstein et al. 1971; Ruggero 1980) and, by phase locking, a relative time cue (Anderson et al. 1971; Carr and Konishi 1990; Koppl 1997; Reyes et al. 1996; Sullivan and Konishi 1984, 1986). Absolute and relative time codes are also known in physics, where a distinction is made between group velocity and phase velocity, thus resulting in group and phase delays (Anderson et al. 1971; Fitzgerald et al. 2001; Goldstein et al. 1971; Koppl 1997; Ruggero 1980). While group delay describes the latency of the envelope of a band-pass-filtered signal, phase delay refers to the times of occurrence of its peaks and troughs. The high-frequency limit of group delay, the signal-front delay, has also been used as a measure of delay (Fitzgerald et al. 2001; Ruggero 1980). We analyzed data obtained from the NL of the barn owl to determine which measure of delay was suited for precise and level-independent representation of ITD.
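For reference, the standard signal-processing definitions of the two delays can be written for a linear filter with phase response φ(ω). The symbol φ(ω) and the sign convention are assumptions introduced here; τph and τgr follow the notation of Fig. 1.

```latex
\tau_{\mathrm{ph}}(\omega) = -\frac{\varphi(\omega)}{\omega},
\qquad
\tau_{\mathrm{gr}}(\omega) = -\frac{\mathrm{d}\varphi(\omega)}{\mathrm{d}\omega}
```

Because φ(ω) is determined only up to integer multiples of 2π, the phase delay is determined only up to integer multiples of the period 2π/ω.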

Nine barn owls (Tyto alba pratincola) were used in this study. The procedures conformed to National Institutes of Health guidelines for animal research and were approved by the animal care and use committee of the University of Maryland. In contrast to earlier studies, the analogue waveform of the neurophonic potential in or close to NL was recorded at a sampling period of 20.8 μs with commercial, Epoxylite-coated tungsten electrodes (Frederick Haer, Brunswick, ME) with impedances of 2–8 MΩ. Neurophonic recordings had the advantage of being stable for ≥1 h and allowed measurements of local multiunit activity. Specific recording sites were defined by combining stereotaxic techniques, physiological characterization, and histologically verified lesions.

Acoustic stimuli (clicks and noises) were digitally generated by custom-written software (“Xdphys” written in Dr. M. Konishi's lab at the California Institute of Technology, Pasadena, CA) driving a signal-processing system (Tucker-Davis Technologies, Gainesville, FL). Clicks had a rectangular form of varying intensity [0 dB (corresponding to 65 dB SPL) to 40-dB attenuation] and a duration of two samples (equivalent to 41.6 μs). Only condensation clicks were used. The standard click had 0 dB attenuation.

Neurophonic responses to clicks were recorded in the 3.5- to 7-kHz region of the tonotopically organized NL. The spontaneous activity (10 ms before click presentation) as well as the driven activity (10 ms after click presentation) were stored. Clicks were repeated 128 times (Fig. 1 A). The driven activity contained an oscillatory response (Fig. 1, A and B). Its envelope increased smoothly within ∼1 ms and fell off almost symmetrically. The oscillation under the envelope typically exhibited a complex waveform containing several spectral components. Fourier analysis showed that one or two components were <2 kHz (Fig. 1C). Another component was close to the best frequency as obtained from iso-level frequency response curves. Because we wanted to study processes related to frequency tuning, only the high-frequency component was analyzed. Therefore the neurophonic potential was high-pass filtered to reveal the oscillation of the high-frequency component alone (Fig. 1D). Auditory filtering is well described by gammatone functions and their derivatives (Irino and Patterson 2001; Tan and Carney 2003). Thus the high-pass filtered click-evoked response was fitted with a Gammatone function of order 3 (Fig. 1D).
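The form of the fitted function can be sketched as follows. This is a minimal illustration, not the fitting procedure used in the study: the parameterization (`amp`, `t0`, `tau`, `freq`, `phase`) and all numerical values other than the 20.8-μs sampling period and the order of 3 are assumptions. It shows that, for an envelope proportional to s^(order−1)·exp(−s/tau), the envelope maximum that defines the group delay lies at t0 + (order − 1)·tau.

```python
import numpy as np

DT = 20.8e-6  # sampling period in seconds, as stated in the text


def gammatone(t, amp, t0, tau, freq, phase, order=3):
    """Return (waveform, envelope) of a gammatone function.

    Assumed parameterization: with s = t - t0 (clipped at 0),
    envelope = amp * s**(order - 1) * exp(-s / tau) and
    waveform = envelope * cos(2*pi*freq*s + phase).
    """
    s = np.clip(t - t0, 0.0, None)
    envelope = amp * s ** (order - 1) * np.exp(-s / tau)
    return envelope * np.cos(2 * np.pi * freq * s + phase), envelope


def envelope_peak_time(t0, tau, order=3):
    """Analytic maximum of s**(order-1) * exp(-s/tau), at s = (order-1)*tau."""
    return t0 + (order - 1) * tau


# Hypothetical parameters chosen only for illustration.
t = np.arange(480) * DT  # roughly 10 ms of samples
wave, env = gammatone(t, amp=1.0, t0=2.0e-3, tau=0.4e-3, freq=5000.0, phase=0.0)

numeric_peak = t[np.argmax(env)]                 # envelope peak on the sample grid
analytic_peak = envelope_peak_time(2.0e-3, 0.4e-3)  # same peak in closed form
```

In an actual fit, the five parameters would be adjusted by a least-squares routine until the waveform matches the high-pass filtered response; the group delay is then read from the fitted envelope rather than from the raw samples.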

FIG. 1.

Analysis of neurophonic potentials evoked with standard clicks. Data from monaural stimulation. A: time course of neurophonic potentials. The x axis shows the time after recording onset. The click was presented at 10 ms after recording onset. The 128 individual traces are stacked on top of each other. Scale bar: 25 mV. B: mean of the 128 traces shown in A. Scale bar: 2.5 mV. C: normalized amplitude of the Fourier transform (512 points) of the mean response from 9.984 ms (data point 480) to the end of the recording. Note the different spectral components. The dotted line indicates the high-pass filter used for the separation of the components. D: high-pass filtered click-evoked mean response (—) and fit of this response with a Gammatone function of order 3 ( · · · ). The envelope of the Gammatone function is shown by the · - · line. The group delay (τgr) corresponds to the peak of the envelope. The signal-front delay (τfr) and the phase delay (τph) were determined from the filtered, but unfitted data (see also text). Scale bar: 2.5 mV. E: histogram of the variation of the 3 delay measures obtained from 96 recording sites (176 data sets, 89 left and 87 right stimulation, 128 repetitions each). · · · : phase delay, - - -: signal-front delay, —: group delay. The bin width was 10 μs for the phase delay, 20 μs for the group delay, and 30 μs for the signal-front delay. Note that SDs are plotted, allowing for an analysis with a bin width that is smaller than the sampling interval.

In the click-evoked response, typically several peaks and troughs could be distinguished. We considered only local extrema occurring after stimulus presentation and having an amplitude greater than the mean plus 2 SDs of the background noise for at least three consecutive data points. The latency of the first extremum detected in this way was the signal-front delay (Fig. 1D). The group delay was assigned to the maximum of the envelope of the Gammatone function (Fig. 1D). The extremum closest to the group delay, determined in the first of the 128 trials that was within 1 SD of the average group delay, was chosen for the phase delay (Fig. 1D). To estimate the variability or “jitter” of each type of delay, we used the respective extrema in each of the 128 traces (Fig. 1A). The SD of these delays determined over the 128 repetitions was taken as a measure for the variability and was plotted as a data point in a histogram (Fig. 1E). In our sample of 176 data sets, the signal-front delay had the largest jitter (median: 500 μs), whereas the jitter in the phase delay was smallest (median: 10.4 μs). The jitter in the group delay (median: 64 μs) was between these two extremes. Thus the temporal precision necessary for ITD coding (Bala et al. 2003; Moiseff and Konishi 1981) could be achieved with the phase delay and the group delay but not with the signal-front delay. Therefore we did not consider the signal-front delay further.
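The detection criterion and the jitter measure described above can be sketched as follows. This is one possible reading of the criterion, not the authors' code: the function names, the use of the absolute deviation from the pre-click mean, and the sample SD (`ddof=1`) are assumptions, and the traces are synthetic.

```python
import numpy as np

DT = 20.8e-6  # sampling period in seconds, as stated in the text


def signal_front_index(trace, click_idx, n_sd=2.0, n_consecutive=3):
    """Index of the first post-click extremum that stays beyond
    mean +/- n_sd * SD of the pre-click background for at least
    n_consecutive samples (assumed reading of the criterion)."""
    trace = np.asarray(trace, dtype=float)
    baseline = trace[:click_idx]
    dev = trace - baseline.mean()
    # +1 or -1 where the deviation exceeds the threshold, 0 elsewhere
    sign = np.sign(dev) * (np.abs(dev) > n_sd * baseline.std())
    i = click_idx
    while i < len(trace):
        if sign[i] == 0:
            i += 1
            continue
        j = i
        while j < len(trace) and sign[j] == sign[i]:
            j += 1  # extend the run of same-sign supra-threshold samples
        if j - i >= n_consecutive:
            return i + int(np.argmax(np.abs(dev[i:j])))
        i = j
    return None


def jitter_us(latencies_s):
    """Jitter as the sample SD of latencies across trials, in microseconds."""
    return 1e6 * float(np.std(latencies_s, ddof=1))


# Synthetic trace: alternating +/-1 background, response starting 50 samples
# after the click with a period of 10 samples (all values hypothetical).
click_idx = 480
trace = np.tile([1.0, -1.0], 480)
k = np.arange(50)
trace[click_idx + 50:click_idx + 100] = 10.0 * np.sin(2 * np.pi * (k + 0.5) / 10)
front_idx = signal_front_index(trace, click_idx)
front_latency_s = (front_idx - click_idx) * DT

# Latencies quantized to the sample grid can still have an SD below one sample.
quantized = np.array([100] * 96 + [101] * 32) * DT
sub_sample_jitter = jitter_us(quantized)
```

The last two lines illustrate why SDs taken over 128 repetitions can be histogrammed with bins narrower than the 20.8-μs sampling interval: when latencies fall on only two adjacent samples, their SD is a fraction of one sample.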

In a second experiment, we decreased click amplitudes by up to 40 dB. As is typically observed in audition, the group delay increased as the stimulus level decreased (Fig. 2). In the example shown in Fig. 2A, however, peak number 4 at 0 dB attenuation coincided with peak number 3 at 20-dB attenuation and with the barely visible peak number 1 at 40-dB attenuation (vertical line in Fig. 2A). Note that the definition of phase delay allows for jumps between successive peaks (Fitzgerald et al. 2001). In 72 data sets obtained from 43 recording sites, phase delay remained essentially constant with a narrow distribution around one sample point (mean: −3 μs, Fig. 2B). In contrast, group delay increased by up to 0.6 ms when click level was reduced by 20 dB (Fig. 2B), in agreement with the result in Carr and Konishi (1990).
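The difference between the two level dependencies can be illustrated with synthetic responses. The sketch below is not the analysis code of the study: the response model (an order-3 gammatone-like envelope times a carrier with fixed phase), the function names, and all parameter values are assumptions. Lowering the level is modeled as a later envelope with an unchanged carrier, and the phase-delay change is taken to the nearest peak, which permits the jumps between peaks mentioned above.

```python
import numpy as np

DT = 20.8e-6  # sampling period in seconds, as stated in the text


def synthetic_response(n, onset, tau, period, gain=1.0):
    """Envelope s**2 * exp(-s/tau) starting at sample `onset`, multiplied by a
    carrier whose phase is fixed relative to the stimulus (units: samples)."""
    k = np.arange(n)
    s = np.clip(k - onset, 0, None).astype(float)
    return gain * s ** 2 * np.exp(-s / tau) * np.cos(2 * np.pi * k / period)


def local_maxima_times(trace, dt):
    """Times (s) of strict local maxima of a 1-D trace."""
    x = np.asarray(trace, dtype=float)
    idx = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])) + 1
    return idx * dt


def phase_delay_shift(ref_peak_time, peak_times_other):
    """Time difference to the nearest peak at the other level."""
    other = np.asarray(peak_times_other, dtype=float)
    nearest = other[np.argmin(np.abs(other - ref_peak_time))]
    return nearest - ref_peak_time


# Hypothetical parameters: envelope onset 14 samples later at the lower level.
loud = synthetic_response(480, onset=100, tau=15.0, period=10)
soft = synthetic_response(480, onset=114, tau=15.0, period=10, gain=0.1)

env_peak_loud = (100 + 2 * 15.0) * DT  # envelope maximum at onset + 2 * tau
env_peak_soft = (114 + 2 * 15.0) * DT
group_shift = env_peak_soft - env_peak_loud

peaks_loud = local_maxima_times(loud, DT)
ref_peak = peaks_loud[np.argmin(np.abs(peaks_loud - env_peak_loud))]
phase_shift = phase_delay_shift(ref_peak, local_maxima_times(soft, DT))
```

In this toy model the envelope moves by 14 samples while the peak nearest to the reference stays on the same sample, which mirrors the qualitative pattern in Fig. 2A.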

FIG. 2.

Level dependence of click-evoked responses. Data from monaural stimulation. A: when the stimulus level decreased (numbers on left side refer to attenuation in dB), the shape of the waveform remained oscillatory, the amplitude decreased and the response occurred at longer latencies. However, as indicated by the vertical line, the position of the response maxima remained stable. Scale bar: 10 mV. B: level dependence of delays in the pooled data of 72 data sets. Stimulus amplitudes were decreased from 0- to 20-dB attenuation. To determine the changes in phase or group delay induced by changes in click level, the times of arrival of the respective response extrema evoked by the different click intensities were picked from the mean curves and subtracted from each other. For phase delay values, this sometimes included jumps between peaks as explained in the text. Each difference provided a data point for the histograms. —, group delay; · · · , phase delay. Bin widths: 20 μs for the phase delay and 40 μs for the group delay.

The low variability and the level invariance of phase delay implied that this delay measure would be the most reliable code for representing the behaviorally relevant interaural time difference (ITD). The “best ITD” at a given recording site may be computed from two phase delays obtained through monaural stimulation (Sullivan and Konishi 1986): we subtracted the phase delay for stimulation of the left ear from the phase delay for stimulation of the right ear. This subtraction led to a “best ITD” that was independent of interaural level difference (ILD) (compare Fig. 3, A for 0 dB ILD with B for 10 dB ILD). On the other hand, ITDs between two group delays depended on the ILD. Even though an ITD computed from the group delay may be zero for 0 dB ILD (Fig. 3A), it changed significantly for 10 dB ILD (Fig. 3B).
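The subtraction can be written out in a few lines. All delay values below are hypothetical and serve only to show the bookkeeping for the two conditions of Fig. 3, A and B (left click at 5- or 15-dB attenuation, right click at 5-dB attenuation); the function name `itd_us` is likewise an assumption.

```python
def itd_us(delay_right_s, delay_left_s):
    """ITD estimate in microseconds: right-ear delay minus left-ear delay."""
    return 1e6 * (delay_right_s - delay_left_s)


# Hypothetical monaural delays in seconds, keyed by click attenuation in dB.
phase_delay = {"left": {5: 2.500e-3, 15: 2.500e-3}, "right": {5: 2.500e-3}}
group_delay = {"left": {5: 2.600e-3, 15: 2.750e-3}, "right": {5: 2.600e-3}}

for left_att in (5, 15):  # L5/R5 (0 dB ILD) and L15/R5 (10 dB ILD)
    d_ph = itd_us(phase_delay["right"][5], phase_delay["left"][left_att])
    d_gr = itd_us(group_delay["right"][5], group_delay["left"][left_att])
    print(f"L{left_att}/R5: phase-delay ITD = {d_ph:+.0f} us, "
          f"group-delay ITD = {d_gr:+.0f} us")
```

With a level-invariant phase delay the first difference is the same in both conditions, whereas any level dependence of the group delay appears directly in the second.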

FIG. 3.

Influence of interaural level difference (ILD) on interaural time difference (ITD). Data from monaural stimulation (A and B) and binaural stimulation (C). A: at equal levels of the clicks presented to the left and right ears (5 dB attenuation: L5, R5), the differences between right and left group delays (Δτgr) and phase delays (Δτph) were near 0 μs. B: when the level on the left side was decreased by 10 dB (L15), group delay increased, while the phase delay stayed constant. C: in the binaural situation with broadband noise as the stimulus, the positions of the extrema of the ITD curve for an ILD of 0 dB (—) were almost identical to the extrema of the ITD curve measured with an ILD of 10 dB ( · · · ). Events were obtained from an arbitrary setting of a TTL level trigger. Because Δτgr had changed but the ITD curve did not change, the phase delay, but not the group delay, is important for representing ITD. Scale bars in A and B: 2.5 mV.

Because the delay differences obtained from monaural responses in Fig. 3, A and B, only indirectly represented the binaural situation, we also tested the level dependence of ITD tuning with binaural stimuli in a few (n = 6) cases. In Fig. 3C, we plotted the response at a given recording site as a function of the ITD of binaural noise, in contrast to our previous results, which were obtained from neurophonic potentials in response to monaurally presented clicks. The ITD tuning curves for ILDs of 0 and 10 dB were virtually identical, in agreement with previous findings that ITD tuning for binaural stimulation is stable under varying conditions of stimulus level (Pena et al. 1996; Viete et al. 1997). Thus the level tolerance of ITD tuning as shown in Fig. 3C independently demonstrates the importance of phase delay.

Thus phase delay, and not group delay or signal-front delay, appears to underlie ITD tuning. Single-unit recordings from auditory nerve and cochlear nucleus support this conclusion (Koppl 1997; Sullivan and Konishi 1984). Likewise, auditory nerve fibers of squirrel monkeys show very low phase-delay jitter when stimulated with sinusoids near the characteristic frequency (Anderson et al. 1971).

The level independence of phase delay is consistent with a variety of filters proposed for peripheral auditory processing (Irino and Patterson 2001; Tan and Carney 2003). The remarkable stability of phase delay is consistent with the model of Gerstner et al. (1996) and Kempter et al. (2001), who predicted that during development synapses from NM to NL and axonal arbors from NM to NL are selected in such a way that phase delays are similar. These authors also argued that only such a selection allows for using phase delay to code temporal information in the NL and to represent ITD. Note that this conclusion is in line with the existence of a neurophonic potential in adult animals when it is assumed that the neurophonic potential is typically the summed response of an ensemble of magnocellular axons. A coherent summation of responses of different axons is only feasible when we have a coincident arrival of volleys of phase-locked spikes at the borders of NL and a coherent transmission of spikes through the nucleus. In other words, theory predicts that phase delays in different magnocellular axons must be similar.

Timing is important in many neuronal systems. It plays a role in models of learning (spike-timing-dependent plasticity), in feature binding, and in precise reactions to dynamic stimuli such as approaching targets. To compare the precision of the different systems, a temporal quality factor is helpful. The coefficient of variation (CV), defined as the quotient of SD and mean, may be a good criterion. In our example, the CV for the phase delay is ∼0.01. In systems such as the visual cortex, temporal jitter is much larger (Bair et al. 2002; Bisley et al. 2004) (CV ∼ 0.1), whereas values similar to the CV observed in the owl's NL are found in the electrosensory system (Carr et al. 1986) and the auditory system of bats (Covey and Casseday 1991) and may be computed from synfire chains (Abeles et al. 1993).
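The CV defined above is straightforward to compute. The text does not state which mean delay enters the quoted value of ∼0.01, so the numbers below are purely illustrative: a hypothetical mean delay of 2.5 ms (within the 2–3 ms conduction times cited earlier) and deviations of ±20 μs. The use of the sample SD (`ddof=1`) is also an assumption.

```python
import numpy as np


def coefficient_of_variation(samples):
    """CV = SD / mean, using the sample SD (ddof=1)."""
    samples = np.asarray(samples, dtype=float)
    return float(np.std(samples, ddof=1) / np.mean(samples))


# Hypothetical delays in seconds: mean 2.5 ms, deviations of +/-20 us.
delays = 2.5e-3 + 20e-6 * np.array([-1.0, 1.0, -1.0, 1.0])
cv = coefficient_of_variation(delays)
```

With these illustrative values the CV comes out on the order of 0.01, the magnitude quoted for the phase delay.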


This research was sponsored by the German Research Foundation (DFG, Wa-606/12, Ke-788/1–3) and by National Institutes of Health Grants DC-000636 (to C. E. Carr) and P30 04664.


M. Knepper and E. Smith helped with measuring and analyzing click responses. R. Schätte made helpful comments on the manuscript.



