Journal of Neurophysiology

Frequency-dependent interaural delays in the medial superior olive: implications for interaural cochlear delays

Mitchell L. Day, Malcolm N. Semple


Neurons in the medial superior olive (MSO) are tuned to the interaural time difference (ITD) of sound arriving at the two ears. MSO neurons respond most strongly at their best delay (BD), at which the internal delay between bilateral inputs to MSO matches the external ITD. We performed extracellular recordings in the superior olivary complex of the anesthetized gerbil and found that a majority of single units localized to the MSO exhibited BDs that shifted with tone frequency. The relation of best interaural phase difference to tone frequency revealed nonlinearities in some MSO units and, in others, linear relations with characteristic phase between 0.4 and 0.6 cycles. The latter is usually associated with the interaction of ipsilateral excitation and contralateral inhibition, as in the lateral superior olive, yet all MSO units exhibited evidence of bilateral excitation. Interaural cochlear delays and phase-locked contralateral inhibition are two mechanisms of internal delay that have been suggested to create frequency-dependent delays. Best interaural phase-frequency relations were compared with a cross-correlation model of MSO that incorporated interaural cochlear delays and an additional frequency-independent delay component. The model with interaural cochlear delay fit phase-frequency relations exhibiting frequency-dependent delays with precision. Another model of MSO incorporating inhibition based on realistic biophysical parameters could not reproduce observed frequency-dependent delays.

  • interaural time delay
  • interaural time difference
  • sound localization
  • gerbil
  • superior olivary complex

Humans and many other animals localize sound in the horizontal plane using interaural sound cues. In humans, the interaural time difference (ITD), the difference in arrival time of sound at the two ears, is the dominant cue to localize sound azimuth whenever low-frequency information is available (Wightman and Kistler 1992). The medial superior olive (MSO) is considered the primary site along the auditory pathway to encode ITDs in its neurons' firing patterns (Yin 2002), as it receives input bilaterally from monaural brain stem nuclei [the anteroventral cochlear nuclei (AVCN)] and yields spike rate responses that are tuned to the ITD of sound (Goldberg and Brown 1969; Yin and Chan 1990).

As initially proposed by Jeffress (1948), the excitatory inputs onto the ITD processor phase-lock to pure tones (e.g., Joris et al. 1994a) and elicit a tuned rate response to ITD in the MSO via coincidence detection (Goldberg and Brown 1969). While the basic construction of ITD sensitivity is clear, there has been much debate over the mechanisms of internal delay that determine which ITD elicits the greatest rate response, i.e., the best delay (BD) (Joris and Yin 2007; McAlpine and Grothe 2003). The dependence of BDs across the range of frequencies used for ITD processing has implications for the neural coding of ITD (e.g., Harper and McAlpine 2004; McAlpine et al. 2001).

The traditional candidate mechanism of internal delay is a difference in axonal conduction time between the bilateral inputs onto MSO. Jeffress (1948) originally proposed systematic delay lines of axonal conduction differences onto a binaural nucleus (now known to be the MSO); anatomical reconstruction of these axonal projections found that the pattern of delays was inconsistent with Jeffress' proposal and, further, could not fully account for the distribution of BDs observed physiologically (Karino et al. 2011). Another mechanism of internal delay is the contralateral glycinergic inhibition onto MSO from the medial nucleus of the trapezoid body, whose pharmacological manipulation was found to shift the BDs of gerbil MSO neurons (Brand et al. 2002; Pecka et al. 2008). Differences in the synaptic dynamics of ipsilateral and contralateral excitation onto MSO have also been proposed as a mechanism of internal delay (Jercog et al. 2010). Finally, interaural cochlear delays, the topic of the present study, may be a mechanism of internal delay, as first proposed by Schroeder (1977) and later elaborated on in a computational model by Shamma et al. (1989).

Interaural cochlear delays arise from the mismatching of bilateral axons onto a target MSO neuron (Fig. 1). Central auditory nuclei are cochleotopically arranged, with a given neuron receiving inputs that originate from a common location along the cochlear basilar membrane. If the convergence of inputs from the ipsilateral and contralateral AVCN onto a given MSO neuron is not cochleotopically precise, an interaural internal delay will be created because of the extra time it takes the cochlear traveling wave to propagate along the basilar membrane on one side (Fig. 1A). The frequency tuning of the ipsilateral and contralateral inputs onto an individual MSO neuron will largely overlap but may differ slightly in characteristic frequency (CF)—the frequency with the lowest threshold (Fig. 1B). Modeling studies have shown that small CF mismatches can qualitatively account for large delays in experimental data (Bonham and Lewis 1999; Shamma et al. 1989). Furthermore, a simulation of MSO response via a cross-correlation of physiologically recorded spike trains of auditory nerve (AN) fibers with slightly different CFs demonstrated that interaural cochlear delays could account for the observed dependence of BDs on CF (Joris et al. 2006). CF mismatches have been measured in the barn owl nucleus laminaris (the avian analog of MSO) and found to be clustered near zero but could be as large as 500 Hz for CFs in the barn owl range of 3–8 kHz (Fischer and Pena 2009; Pena et al. 2001). These barn owl studies found that CF mismatches did not correlate with the characteristic delay of laminaris neurons. However, this does not rule out the influence of interaural cochlear delays in conjunction with other mechanisms of internal delay. In the mammalian MSO, there are currently no data on monaural threshold frequency tuning or other compelling evidence for interaural cochlear delays.

Fig. 1.

Interaural cochlear delay. A: the eventual input to a medial superior olive (MSO) neuron from each side originates in the cochlea from spiral ganglion neurons that are excited by slightly different locations along the basilar membrane (gray dots). An interaural delay arises from the difference in propagation time of the cochlear traveling wave to each location. B: the frequency tuning of the ipsilateral (dashed) and contralateral (solid) input onto the MSO neuron will largely overlap but have slightly different characteristic frequencies. C: the interaural cochlear delay shifts the best delay of the interaural time difference (ITD) tuning function. Here, sound with a positive (contralateral leading) ITD compensates for the longer propagation time of the cochlear traveling wave on the contralateral side.

In this study, we report data from the gerbil MSO that implicate the influence of interaural cochlear delays in addition to other mechanisms of internal delay. We observed a tendency in many MSO neurons of the BD to shift systematically with tone frequency, i.e., a frequency-dependent internal delay. Frequency-dependent delays have been reported previously in data from the superior olivary complex (SOC) (e.g., Batra et al. 1997), although in some cases the anatomical localization of these responses to the MSO has been unclear. It has been suggested that frequency-dependent delays could arise from interaural cochlear delays (Yin and Kuwada 1983) or phase-locked inhibition (Batra et al. 1997; Leibold 2010). Bonham and Lewis (1999) found in a MSO model consisting of the cross-correlation of model AN output that CF mismatches created frequency-dependent delays. We used a similar model and show that interaural cochlear delay can both qualitatively and quantitatively account for the types of frequency-dependent delays occurring in our data. We also found that a model of phase-locked inhibition using realistic biophysical parameters could not account for frequency-dependent delays.


All experimental procedures were approved by the Institutional Animal Care and Use Committee at New York University.


Adult (P58–P105) Mongolian gerbils (Meriones unguiculatus) of both sexes, weighing between 52 and 90 g, were used. Gerbils were initially anesthetized with pentobarbital sodium (60 mg/kg ip). Anesthetic state was monitored every 15 min during surgery and every 30 min during recording, with additional doses of pentobarbital sodium (∼12 mg/kg ip) and ketamine hydrochloride (∼12 mg/kg im) as needed to maintain slow respiration rate and no withdrawal to foot pinch. A heating pad was used to maintain a constant rectal temperature of 38°C. The trachea was cannulated. The skull was exposed, and two small screws were placed laterally into the parietal bones, followed by the fixing of a metal headpost rostral to the lambdoid suture with bone cement (Biomet Orthopedics). A craniotomy was performed immediately caudal to the transverse sinus and centered on the midline. The headpost was inserted into a stereotaxic apparatus with the rostrocaudal axis of the head tilted to make the eyes level with the ear canals. The pinnae were removed, and sound delivery earpieces were sealed around the openings of the ear canals. The electrode was lowered vertically into the cerebellum through a hole in the dura.

Electrophysiological recording.

Extracellular recordings were made with parylene-coated tungsten microelectrodes (Microprobe) with tips plated with gold and then platinum. Electrical signals recorded in the brain stem were amplified and band-pass filtered at 0.8–10 kHz. Filtered neural signals were fed to an event processor with associated data acquisition/analysis software (MALab, Kaiser Instruments).

Single unit isolation in the MSO is notoriously difficult, owing to the small, graded amplitude of the action potential (Scott et al. 2005, 2007) in the midst of a large, coherent local field potential (the neurophonic) driven by highly phase-locked inputs. Spontaneous fluctuations of the neurophonic in silence and at low sound levels were not readily distinguishable from spikes, nor were the sharp fluctuations of the neurophonic in response to noise stimuli. Unlike in other areas of the brain, action potential amplitudes recorded with in vitro whole cell recordings at the soma of MSO neurons have been shown to be highly variable within the same unit (Scott et al. 2007). Furthermore, the action potential amplitude decreases with the rise time of the underlying excitatory postsynaptic potential (EPSP) (Scott et al. 2007), i.e., multiple, coincident synaptic events summate to produce a short rise time of the EPSP and also a large-amplitude action potential. We were able to isolate single units by limiting our stimulus set to moderate- to high-level pure tones, levels that would increase the rate of coincident EPSPs and likely produce more high-amplitude action potentials that rise above the neurophonic. Binaural beats (dichotic pure tones; see Acoustic stimulation) at 70 dB SPL were used to search for units and produced highly stereotyped oscillations in the neurophonic, unlike the erratic neurophonic fluctuations in response to noise. Spikes could then be identified visually as small, sharp deflections riding on top of the oscillating neurophonic (see Figs. 5A and 6F, insets). Spike events were recorded with a 1-μs resolution after crossing an adjustable voltage threshold chosen to capture the small spikes but avoid the neurophonic.

To gain confidence that threshold crossings were spike events and not simply transient fluctuations in the neurophonic, we performed analysis of the interspike intervals (ISIs) off-line. Units reported here all had gaps in the ISI histogram below 1 ms (see Fig. 5A, inset), characteristic of the refractory period of a single unit but not of multiunit or neurophonic recordings.

Electrolytic lesions were placed in all electrode tracks containing recording sites by passing 5 μA of anodal DC current through the electrode for 15 s. Lesions were placed directly at most recording sites. In those cases in which two or more sites were relatively near each other in the same track (≤300 μm), a lesion was placed at one site and the locations of the other sites were determined relative to it.


After each experiment a lethal dose of pentobarbital sodium (180 mg/kg ip) was administered. The gerbil was perfused intracardially with 0.12 M phosphate-buffered saline (PBS) followed by buffered 3.7% formaldehyde solution. After perfusion, the head was immersed in fixative for at least 24 h. The brain was then extracted and immersed in a solution of 30% sucrose in fixative until sinking (∼2 days). Coronal sections (50 μm) of the brain stem were sliced on a freezing microtome and transferred to slides. Because electrolytic lesions were much more visible before staining, unstained sections with lesions were traced onto transparent paper. Sections were then stained with cresyl violet and coverslipped. The MSO column was highly visible after staining; however, the lesions were difficult to identify unaided and usually looked like a lighter-stained patch in the Nissl background (Fig. 2A). Tracings from the unstained sections were superimposed over the stained sections to aid localization of recording sites.

Fig. 2.

Location of recording sites. A: coronal section of the ventral brain stem showing 2 electrolytic lesions at recording sites. Arrows point to tissue damaged by electrolytic lesions at the recording sites: one at the dorsal end of the MSO column and one in the medial dendritic field. Triangles mark tissue damage from the corresponding electrode tracks. Scale bar, 200 μm. B: composite map of all recording sites. Left and right recording sites were grouped into right-hand coronal sections along the rostrocaudal extent of the MSO as indicated in the parasagittal section at top right (arrow indicates trajectory of electrode). Sites within 200 μm of the center of the MSO column were classified as MSO (★, n = 30) and the rest as non-MSO superior olivary complex (SOC) (○, n = 28). LNTB, lateral nucleus of the trapezoid body; LSO, lateral superior olive; MNTB, medial nucleus of the trapezoid body; SPN, superior paraolivary nucleus; VNLL, ventral nucleus of the lateral lemniscus; VNTB, ventral nucleus of the trapezoid body.

Acoustic stimulation.

Acoustic stimuli were digitally generated and converted to analog signals by a synthesizer (Kaiser Instruments) controlled by stimulation software (MALab, Kaiser Instruments). The analog signal was attenuated (STAX) and transduced through electrostatic earspeakers (STAX Lambda), whose sound output was delivered through earpieces sealed to the ear canals. Before each experiment, the sound delivery system was calibrated under computer control for level from 100 Hz to 40 kHz and for phase from 100 Hz to 3 kHz with a previously calibrated probe tube and condenser microphone (Brüel and Kjaer, 4134). Specifically, a sequence of pure tones at different frequencies was presented at a fixed amplitude and phase and compared with the amplitude and phase of the waveform measured in the probe tube to determine correction values.

All aspects of sound stimuli were controlled dichotically. The search stimulus was a 1-Hz binaural beat, created by presenting a pure tone at fc − 0.5 Hz to the left ear and a pure tone at fc + 0.5 Hz to the right ear, where fc is the center frequency varied under user control from 0.05 to 2.5 kHz. The binaural beat is equivalent to a binaural pure tone of frequency fc undergoing continuous 1 cyc/s interaural phase shift. After a unit was isolated, the response to a binaural beat was collected at several frequencies at 60 or 70 dB SPL (10-s stimulus, 2-s silence, 4 repetitions, 10-ms on/off cosine ramp). The best frequency (BF) was determined as the frequency that produced the greatest spike count in response to a binaural beat at a sound level of 60 or 70 dB SPL. Responses to a binaural beat at BF were then collected for sound levels varied above and below 70 dB SPL. At low sound levels, the coherent, stereotyped oscillations of the neurophonic (described above) were reduced and became obscured by spontaneous, erratic fluctuations, preventing the choice of an event voltage threshold that could adequately segregate neurophonic events from spike events. Therefore, the CF (frequency at response threshold) could not be measured.
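The construction of the binaural beat can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus software: the function names, the 50-kHz sampling rate, and the omission of the on/off cosine ramp are assumptions made here for brevity, and the interaural phase is expressed as right ear minus left ear.

```python
import math

def binaural_beat(fc_hz, duration_s=1.0, fs_hz=50_000, beat_hz=1.0):
    """Return (left, right) tone samples at fc - beat/2 and fc + beat/2.

    Hypothetical helper: sampling rate and unit amplitude are assumptions,
    and no on/off ramp is applied.
    """
    n = int(round(duration_s * fs_hz))
    f_left = fc_hz - beat_hz / 2.0
    f_right = fc_hz + beat_hz / 2.0
    left = [math.sin(2 * math.pi * f_left * i / fs_hz) for i in range(n)]
    right = [math.sin(2 * math.pi * f_right * i / fs_hz) for i in range(n)]
    return left, right

def ipd_cycles(t_s, beat_hz=1.0):
    """Interaural phase difference (right minus left) at time t, in cycles.

    The phase difference grows as beat_hz * t, so a 1-Hz beat sweeps
    one full cycle of IPD per second.
    """
    return (beat_hz * t_s) % 1.0
```

Because the two tones differ by exactly the beat frequency, every IPD is visited once per beat period, which is why a single 10-s presentation samples the whole IPD tuning function several times.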

For each unit, responses were also collected to monaural pure tones at BF presented ipsilaterally and contralaterally at 70, 80, 90, and sometimes 100 dB SPL (100-ms stimulus, 200-ms silence, 100 repetitions, 3-ms on/off cosine ramp). Other stimuli were also presented to each unit, unrelated to the present study.

Data analysis.

The 1-Hz interaural phase difference (IPD) modulation of the binaural beat is sufficiently slow such that the beat period histogram of spikes (the dynamic IPD tuning function) was equivalent to the static IPD tuning function. Best phases (BPs, the mean phase) derived from responses to binaural beats were highly correlated with BPs derived from responses to stimuli with static IPDs (ρCC = 1.84, P < 0.001, n = 48, circular-circular correlation, fit line slope = 0.98). The first second of response to binaural beat stimulation was excluded from analysis to prevent transient rate adaptation from distorting the steady-state IPD function, although subsequent inclusion of these data produced no significant change in BP. The strength of IPD tuning was measured by the vector strength (Zar 1999). The first 10 ms of response to monaural stimulation was excluded from analysis because of the likely false triggering on large, transient upswings in the neurophonic at onset.
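As a rough sketch of the quantities named above, the vector strength and best phase can be computed from spike times as the length and angle of the mean resultant vector. The function names and the default arguments are illustrative assumptions; only the 1-Hz beat and the exclusion of the first second come from the text.

```python
import math

def spike_phases(spike_times_s, beat_hz=1.0, skip_s=1.0):
    """Phase of each spike within the beat period, in cycles.

    Spikes in the first `skip_s` seconds are dropped, mirroring the
    exclusion of the first second of the response.
    """
    return [(t * beat_hz) % 1.0 for t in spike_times_s if t >= skip_s]

def vector_strength_and_best_phase(phases_cyc):
    """Return (vector strength, best phase in cycles within (-0.5, 0.5])."""
    n = len(phases_cyc)
    c = sum(math.cos(2 * math.pi * p) for p in phases_cyc) / n
    s = sum(math.sin(2 * math.pi * p) for p in phases_cyc) / n
    r = math.hypot(c, s)                    # length of the mean vector
    bp = math.atan2(s, c) / (2 * math.pi)   # angle of the mean vector
    return r, bp
```

A vector strength of 1 means every spike falls at the same IPD, and a value near 0 means spikes are spread evenly over the beat period.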

A binaural beat run was included in the analysis if the total spike count in the analyzed time window over all repetitions was >100 and the IPD tuning function was unimodal and reasonably symmetric and had a vector strength ≥ 0.2. Those runs that met these criteria had IPD tuning functions that failed the Rayleigh test of uniformity (P < 0.001) (Zar 1999). IPD tuning functions that were bimodal or asymmetric indicated multiunit contamination. Similarly, a monaural pure tone run was included if the spike rate was ≥3 spikes/s and the period histogram was unimodal and reasonably symmetric and failed the Rayleigh test of uniformity (P < 0.001). Peak binaural spike rate was determined as the highest spike rate across bins when the beat period histogram was computed with 18 bins.
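The inclusion rule for binaural beat runs might be expressed as below. This is a sketch under stated assumptions: the unimodality and symmetry judgment is passed in as a boolean because the text does not specify how it was made, and the Rayleigh P value uses the common large-sample approximation exp(-n R^2), which is not necessarily the formula the authors used.

```python
import math

def rayleigh_p(n_spikes, vector_strength):
    """Approximate Rayleigh-test P value, exp(-Z) with Z = n * R^2.

    First-order large-sample approximation (an assumption here).
    """
    z = n_spikes * vector_strength ** 2
    return math.exp(-z)

def include_beat_run(n_spikes, vector_strength, unimodal_symmetric):
    """Apply the three stated criteria for a binaural beat run."""
    return (n_spikes > 100
            and vector_strength >= 0.2
            and bool(unimodal_symmetric))
```

The Rayleigh P value is reported separately rather than used as a criterion, matching the text, which states it as a property of the runs that passed.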

The relation between BP and tone frequency of each unit was fit with a straight line by the least-squares method after weighting each data point by its spike count. Composite ITD tuning functions were calculated by averaging no fewer than four ITD tuning functions collected at equally spaced frequencies. The composite peak was chosen as the ITD of the highest peak of the composite ITD function. If a side peak was within 15% of the amplitude of the highest peak, both peaks were plotted as the composite peaks, because the choice of which peak was most “central” was ambiguous.
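A spike-count-weighted line fit of this kind can be written in closed form. The sketch below is illustrative only: the function name is hypothetical, BPs are assumed to be already unwrapped so that they vary smoothly with frequency, and frequencies are taken in kHz so that the slope comes out in ms.

```python
def weighted_line_fit(freqs_khz, bps_cyc, spike_counts):
    """Weighted least-squares line through (frequency, best phase) points.

    Minimizes sum(w * (bp - intercept - slope * f)**2) with w = spike count.
    Returns (slope in ms, intercept in cycles).
    """
    w_sum = sum(spike_counts)
    f_mean = sum(w * f for w, f in zip(spike_counts, freqs_khz)) / w_sum
    bp_mean = sum(w * b for w, b in zip(spike_counts, bps_cyc)) / w_sum
    s_fb = sum(w * (f - f_mean) * (b - bp_mean)
               for w, f, b in zip(spike_counts, freqs_khz, bps_cyc))
    s_ff = sum(w * (f - f_mean) ** 2
               for w, f in zip(spike_counts, freqs_khz))
    slope = s_fb / s_ff
    intercept = bp_mean - slope * f_mean
    return slope, intercept
```

Weighting by spike count lets frequencies with robust responses dominate the fit, so that sparsely driven frequencies near the edge of the tuning range have less influence on the slope and intercept.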

Correlation of linear variables was assessed by using Kendall's rank correlation τ, a nonparametric statistical test with τ taking values between −1 and 1. Correlation of circular variables was assessed with circular-circular correlation ρCC, a parametric statistical test with ρCC taking values between 0 and 2 (Batschelet 1981). Confidence intervals of BP were calculated assuming a von Mises distribution of circular values (Zar 1999). A test comparing the dispersions of two samples of circular values was performed with the statistic listed by Mardia and Jupp (2000).
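For readers unfamiliar with rank correlation, a bare-bones version of Kendall's τ is sketched here. It is the tie-uncorrected form (often called tau-a); the text does not say which variant or software was used, so this is only an illustration of the statistic's range and meaning.

```python
def kendall_tau_a(x, y):
    """Kendall's rank correlation without tie correction.

    Counts concordant minus discordant pairs and divides by the number
    of pairs, giving a value between -1 and 1.
    """
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1      # pair ordered the same way in x and y
            elif prod < 0:
                score -= 1      # pair ordered oppositely
    return score / (n * (n - 1) / 2)
```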

Interaural cochlear delay model.

The phase-frequency relations of neurons were compared with the output of an interaural cochlear delay model similar to that of Bonham and Lewis (1999) (see Fig. 6A). First, a pure tone sound pressure waveform was synthesized at a given tone frequency (500-ms stimulus, 10-ms on cosine ramp, 70 dB SPL) and discretized at a sampling rate of 50 kHz. The sound pressure waveform was input into an AN model whose parameters had been fit to experimentally derived reverse correlation functions of low-frequency cat AN fibers (Tan and Carney 2003) (available online). The only free parameter of the AN model was its CF. The sound pressure waveform was input into ipsilateral and contralateral AN models with separately chosen CFs; the output of each AN model was a time-varying spike rate, r. The output of the MSO model was approximated by a cross-correlation of ipsilateral and contralateral rates:

$$X(\tau) = \int_{t_{\mathrm{begin}}}^{t_{\mathrm{end}}} r_I(t + \tau + \tau_{\mathrm{fixed}})\, r_C(t)\, dt$$

where τfixed is a frequency-independent interaural delay and τ is the ITD. The cross-correlation was performed over the ITD values, τ = [−2, 2 ms], discretized at the sampling rate, with positive τ indicating contralateral-leading sound. The integration over time-varying rates was confined to a 50-ms segment near the end of the response in order to calculate the correlation at steady state. The ITD that maximized X(τ) was selected as the BD, and this value was transformed into the BP by dividing by the tone period. Altogether, there were three parameters in the interaural cochlear delay model: the ipsilateral and contralateral CFs of the AN modules and the frequency-independent interaural delay, τfixed. The parameters were adjusted to fit the modeled phase-frequency relation to measurements from individual MSO units. Either the ipsilateral or the contralateral CF was initially given a value near the BF of the neuron, τfixed was then adjusted to approximate the slope of the phase-frequency relation, and then the CF mismatch was adjusted to more precisely fit the relation. All parameters were then varied in small increments until the mean absolute residual between modeled and measured phase-frequency relations was minimized.
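The cross-correlation step of the model can be illustrated with a small sketch. Note the substitution: the AN model of Tan and Carney (2003) is replaced here by a half-wave rectified sinusoid with an arbitrary lag, purely as a stand-in so the example is self-contained. All function names, and the wrapping of the BP into [−0.5, 0.5) cycles, are assumptions of this sketch.

```python
import math

def rectified_rate(f_hz, lag_s):
    """Stand-in rate function: half-wave rectified sinusoid delayed by lag_s.

    NOT the AN model used in the study; the lag loosely plays the role
    of a cochlear propagation delay.
    """
    def rate(t_s):
        return max(0.0, math.cos(2 * math.pi * f_hz * (t_s - lag_s)))
    return rate

def best_delay(r_ipsi, r_contra, tau_fixed_s, t_begin_s, t_end_s,
               fs_hz=50_000, max_itd_s=0.002):
    """ITD (s) maximizing X(tau) = integral of r_I(t+tau+tau_fixed) r_C(t) dt."""
    dt = 1.0 / fs_hz
    n_t = int(round((t_end_s - t_begin_s) * fs_hz))
    n_tau = int(round(max_itd_s * fs_hz))
    times = [t_begin_s + i * dt for i in range(n_t)]
    contra = [r_contra(t) for t in times]       # does not depend on tau
    best_tau, best_x = 0.0, -math.inf
    for k in range(-n_tau, n_tau + 1):
        tau = k * dt
        x = dt * sum(r_ipsi(t + tau + tau_fixed_s) * c
                     for t, c in zip(times, contra))
        if x > best_x:
            best_tau, best_x = tau, x
    return best_tau

def best_phase_cycles(bd_s, f_hz):
    """BD divided by the tone period, wrapped into [-0.5, 0.5) cycles."""
    return ((bd_s * f_hz + 0.5) % 1.0) - 0.5
```

With a pure tone the correlation is periodic in τ, so several ITDs within ±2 ms give equally large values of X; converting the BD to a wrapped BP removes that ambiguity. For example, with a 1-kHz tone, a contralateral lag of 100 μs, and τfixed = 200 μs, the sketch yields a BP of −0.3 cycles.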

Biophysical MSO model with inhibition.

The effect of synaptic inhibition on interaural delay was predicted with a Hodgkin-Huxley-type, point-neuron MSO model as implemented previously (Day et al. 2008) and similar to the model in Brand et al. (2002). The ion channel composition and dynamics were based on a type II VCN model (Rothman and Manis 2003) with a fast membrane time constant and a low-threshold potassium conductance, similar to MSO (Scott et al. 2005). The following excitatory and inhibitory synaptic currents were added to the current-balance equation:

$$I_E(t,V) = g_E(t)\,(V - E_E), \qquad I_I(t,V) = g_I(t)\,(V - E_I)$$

where gE(t) and gI(t) were the time-varying excitatory and inhibitory conductances, V was the membrane potential, and EE = 0 mV and EI = −90 mV were the excitatory and inhibitory reversal potentials, respectively. Miniature postsynaptic conductance events were modeled by an α-function:

$$g_{\mathrm{mini}}(t) = \hat{g}_{\mathrm{exc,inh}}\, \frac{t}{\tau_{\mathrm{exc,inh}}} \exp\!\left(1 - \frac{t}{\tau_{\mathrm{exc,inh}}}\right)$$

where ĝexc and ĝinh were the excitatory and inhibitory maximum unitary conductances and τexc = 0.1 ms and τinh = 0.1 ms (or 0.4 ms) were the excitatory and inhibitory time constants, respectively. Synaptic input was modeled as the convergence of 32 independent excitatory fibers from each side and 32 contralateral inhibitory fibers, with each fiber firing in a time-varying Poisson-like manner at ∼160 spikes/s for 2 s, and phase-locked to an input frequency f, with a vector strength R = 0.9. The number of excitatory and inhibitory fibers was larger than the estimated minimum number necessary to evoke an action potential in MSO (Couchman et al. 2010); however, a larger number may be necessary to maintain sustained activity given the substantial short-term synaptic depression in MSO (Couchman et al. 2010). Phase-locking was achieved by choosing a synaptic event time in a given tone period as a Gaussian random number centered on zero with SD = sqrt(−2logR)/(2πf) (Mardia and Jupp 2000). Ipsilateral excitatory synaptic event times were shifted by the appropriate ITD (spaced every 20 μs spanning 1/f), then combined with the contralateral event times, and convolved with gmini(t) to produce gE(t). Similarly, inhibitory synaptic event times were shifted by a delay, ΔtI, relative to the timing of contralateral excitation, then convolved with gmini(t) to produce gI(t). The current-balance equation was numerically integrated by the forward Euler method at a time step of 10 μs. Halving the time step produced no noticeable difference in the voltage trace. Output spike times were chosen when the voltage trace crossed upward through a threshold of −30 mV.
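Two ingredients of the synaptic input, the α-function conductance and the Gaussian timing jitter that sets the vector strength, can be checked numerically with a short sketch. It is deliberately simplified and is not the model code: it places exactly one event in every tone cycle instead of drawing Poisson-like events at about 160 spikes/s, and all function names are hypothetical.

```python
import math
import random

def alpha_conductance(t_ms, g_max, tau_ms):
    """Alpha-function conductance; peaks at g_max when t equals tau."""
    if t_ms < 0:
        return 0.0
    return g_max * (t_ms / tau_ms) * math.exp(1.0 - t_ms / tau_ms)

def jitter_sd_s(vector_strength, f_hz):
    """Gaussian SD (s) giving the requested vector strength at frequency f."""
    return math.sqrt(-2.0 * math.log(vector_strength)) / (2.0 * math.pi * f_hz)

def phase_locked_event_times(f_hz, vector_strength, n_cycles, rng):
    """One jittered event per tone cycle (simplification of the input model)."""
    sd = jitter_sd_s(vector_strength, f_hz)
    return [k / f_hz + rng.gauss(0.0, sd) for k in range(n_cycles)]

def measured_vector_strength(event_times_s, f_hz):
    """Vector strength of event times relative to the tone period."""
    n = len(event_times_s)
    c = sum(math.cos(2 * math.pi * f_hz * t) for t in event_times_s) / n
    s = sum(math.sin(2 * math.pi * f_hz * t) for t in event_times_s) / n
    return math.hypot(c, s)
```

For R = 0.9 at 1 kHz the jitter SD works out to roughly 73 μs, and the vector strength measured from a long train of simulated events should come back close to the requested 0.9.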


We recorded from 91 ITD-sensitive units in the left and right brain stems of 15 anesthetized gerbils. ITD sensitivity was measured in response to 1-Hz binaural beats: dichotic pure tones presented to each ear with a 1-Hz frequency difference, equivalent to a binaural pure tone undergoing continuous 1 cyc/s interaural phase shift. We report data from units that displayed unimodal period histograms with respect to the period of the beat and with respect to the periods of the ipsilateral and contralateral tones, i.e., units that were both ITD sensitive and phase-locked (n = 68). Non-phase-locking units were infrequently encountered (n = 5).

The anatomical locations of all recording sites were identified with respect to electrolytic lesions placed at or near the recording site (Fig. 2A). Units localized to the dorsal or ventral nuclei of the lateral lemniscus (n = 10) were excluded from further analysis. Units within 200 μm of the center of the MSO column were classified as MSO (n = 30) and the rest as non-MSO SOC (n = 28) (Fig. 2B).

ISIs from units localized to the SOC were collected into histograms (see e.g., Fig. 5A, inset) to exclude multiunit spiking or triggering off the neurophonic, identifiable as a substantial amount of submillisecond ISIs, normally absent because of the refractory period of a single unit. The half-width of the depolarization associated with spike events (from the filtered neural signal) was always <0.5 ms (see, e.g., Fig. 5A, inset; Fig. 6F, inset); therefore any submillisecond ISIs associated with multiunit spiking or neurophonic triggering would be readily observable in an ISI histogram. Three MSO units were excluded for possible multiunit contamination because of their having >1% of their ISIs <1 ms. Altogether, the data reported here come from single units in the MSO (n = 27) and non-MSO SOC (n = 28).
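The ISI-based exclusion rule amounts to a simple fraction test, sketched below. The function names are hypothetical, and the sketch assumes spike times from a single continuous recording segment; in practice ISIs would be computed within each stimulus repetition so that gaps between repetitions are not counted.

```python
def fraction_short_isis(spike_times_s, limit_s=0.001):
    """Fraction of interspike intervals shorter than limit_s (default 1 ms)."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0
    return sum(1 for d in isis if d < limit_s) / len(isis)

def possible_multiunit(spike_times_s, max_fraction=0.01):
    """Flag a unit when more than 1% of its ISIs are below 1 ms."""
    return fraction_short_isis(spike_times_s) > max_fraction
```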

Figure 3A shows the distribution of BFs across the population. As mentioned in methods, single unit isolation could not be maintained near threshold, and frequency tuning was measured at 60 or 70 dB SPL. This raises concern that some of our responses may have come from higher-CF units excited at high levels in their low-frequency tails (Joris et al. 1994b). For example, Yin and Chan (1990) reported one 2.3-kHz CF MSO neuron in the cat whose BF shifted to 800 Hz at higher levels (their Fig. 15c). We measured responses to binaural beats at BF at different sound levels; Fig. 3B shows that for about half (26/55) of the SOC population an ITD-sensitive response could be elicited below 60 dB SPL while still maintaining single unit isolation. The threshold of response in the low-frequency tails of high-CF MSO neurons is not known; however, an inspection of individual threshold frequency tuning functions of gerbil AN fibers (Schmiedt 1982, 1989) suggests that the 1-kHz thresholds of high-CF AN fibers (CF > 3 kHz) are similar to that of cat (Kiang and Moxon 1974), which in the extreme can be as low as 45 dB SPL. Therefore, we cannot rule out the possibility of high-CF units based on the lowest sound levels eliciting an uncontaminated single unit response. We will come back to the issue of high-CF units below with our results related to interaural cochlear delays.

Fig. 3.

A: best frequencies (BF) measured at 60 or 70 dB SPL (MSO: n = 27; non-MSO SOC: n = 28). B: lowest level at which an ITD-sensitive response could be elicited to a binaural beat at BF and still maintain single unit isolation.

In the cat, the CFs of MSO units are tonotopically arranged along the dorsoventral axis of MSO, with lowest CFs dorsal and highest CFs ventral (Guinan et al. 1972). In the gerbil, the sheet of MSO neurons viewed in the sagittal plane is tilted and extends from dorsorostral to ventrocaudal (see Fig. 2B, inset). All units localized to the MSO were concentrated in the dorsorostral half of the MSO (Fig. 2B) and had BFs ≤2 kHz. The absence of units localized to the ventrocaudal end of the MSO sheet may suggest that neurons located there are tuned to higher frequencies. There was a weak correlation between a unit's dorsoventral location and BF (Fig. 4; τ = −0.28, P = 0.06, n = 27, Kendall's rank correlation). Units with BFs below 700 Hz were all located at the dorsal end, while those above were scattered dorsally and ventrally. In our localization procedure, dorsoventral location was measured in histological sections with reference to a zero location at the dorsal tip of the MSO column—a reference that systematically shifts ventrally for progressively caudal sections (see Fig. 2B, inset). The ventral shifting of the zero reference may have biased the location of caudal units to values more dorsal than their absolute dorsoventral location, which would weaken the significance of correlation between BF and dorsoventral location. Despite this, the scatter in location of units above 700 Hz BF (Fig. 4) is consistent with a recent plot of the tonotopic map in cat MSO derived from axonal terminations (Karino et al. 2011; their Fig. 9).

Fig. 4.

Best frequency vs. dorsoventral location in the MSO column. Location 0 indicates the dorsal end of MSO (n = 27).

Frequency-dependent interaural delays.

The BD of an ITD tuning function is determined by an internal interaural delay between the paths of phase-locked excitatory inputs leading to a coincidence detector. Jeffress (1948) proposed that the internal interaural delay was formed by a difference between input ipsilateral and contralateral axonal conduction times. Such an internal delay is thought to be independent of frequency, i.e., the delay is the same regardless of the frequency content of the stimulus. For a coincidence detector neuron with a frequency-independent interaural delay, ITD tuning functions measured at different frequencies will have a common peak corresponding to the ITD that compensates for the internal delay by placing the excitatory inputs in phase. Figure 5, C and D, show data from a MSO neuron with ITD tuning functions at different frequencies having a common peak near −600 μs (negative ITD indicates an ipsilateral-leading sound). The ITD tuning function in response to a binaural beat is periodic; therefore ITD may be transformed into an equivalent IPD and BD to an equivalent BP. The information contained in multiple ITD tuning functions can be summarized by the BP-frequency relation (Yin and Kuwada 1983) (Fig. 5E), which for frequency-independent interaural delays between excitatory inputs will have the following three characteristics: the relation is linear; the y-axis intercept of the fit line, termed the characteristic phase (CP), is zero cycles, and the slope of the fit line, termed the characteristic delay (CD), is equal to the internal interaural delay. Neurons with these characteristics, such as in Fig. 5, are termed “peak type” because of their having a common ITD peak across frequency. Importantly, any phase-frequency relation that has a nonzero CP or is nonlinear indicates a frequency-dependent interaural delay (assuming bilateral excitatory inputs).
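The link between a nonzero CP and a frequency-dependent delay follows directly from the linear fit. A short worked derivation, using the symbols defined above with f for tone frequency, and treating BP as unwrapped (BD is otherwise defined only up to multiples of the tone period 1/f):

```latex
\begin{align}
\mathrm{BP}(f) &= \mathrm{CP} + \mathrm{CD}\cdot f \\
\mathrm{BD}(f) &= \frac{\mathrm{BP}(f)}{f} = \mathrm{CD} + \frac{\mathrm{CP}}{f}
\end{align}
% CP = 0:      BD(f) = CD at every frequency (peak type).
% CP \neq 0:   BD(f) changes with f through the CP/f term.
% Example: CP = 0.5 cyc, CD = 0 gives BD = 500 \mu s at 1 kHz
%          and BD = 250 \mu s at 2 kHz.
```

Only when CP is zero does the 1/f term vanish, which is why a common ITD peak across frequency is the signature of a frequency-independent delay.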

Fig. 5.

ITD tuning functions from a “peak-type” MSO neuron. A: raster plot showing spike responses to a binaural beat with 1.1-kHz center frequency (f). Duration of stimulus indicated by black bar. Inset, top: interspike interval (ISI) histogram showing no ISIs < 1 ms. Inset, bottom: voltage traces of 5 triggered spike events. Dotted line indicates the chosen voltage trigger, in this case a downsweep trigger. Bar, 1 ms. B: period histogram of spike responses wrapped into the 1-s beat period. The period histogram is the interaural phase difference (IPD) tuning function and may be transformed into the ITD tuning function by dividing IPD by the binaural beat center frequency. Arrow indicates best phase/best delay. Positive IPDs/ITDs indicate contralateral-leading sound. C: ITD tuning functions were obtained for frequencies between 0.9 and 1.7 kHz and overlaid (BF = 1 kHz). Tuning functions derived from binaural beats are periodic on the center frequency tone period. D: normalized ITD tuning functions show a common peak characteristic of a “peak-type” neuron. E: information from multiple ITD tuning functions is reduced to a best phase-frequency relation. For a peak-type neuron, the relation is linear with slope [characteristic delay (CD)] equal to the common peak and y-axis intercept [characteristic phase (CP)] of ∼0 cyc.

Based on an assumption of frequency-independent interaural delay, we expected all MSO neurons to be peak type. Surprisingly, we found many MSO neurons with decidedly non-peak-type phase-frequency relations. Figure 6, A and B, show data from a MSO neuron with a CP near 0.5 cyc. For a 0.5-cyc CP (or equivalently, −0.5 cyc), the overlaid ITD tuning functions at multiple frequencies align at a common trough instead of a common peak, termed a “trough-type” neuron. “Intermediate-type” neurons were also present in our sample (Fig. 6, C and D), with a CP between 0 and 0.5 cyc or between −0.5 and 0 cyc and a common ITD at a point along the tuning function slope. Several neurons displayed strongly nonlinear phase-frequency relations with a bulge in one direction, as in Fig. 6F. These nonlinear neurons were observed in three different animals, including two MSO neurons (BFs 1.6 and 2 kHz) and three non-MSO SOC neurons (BFs 0.65, 1.25, and 1.25 kHz). There was no clear division between neurons with linear and nonlinear phase-frequency relations; while some neurons had straight relations (Fig. 5E, Fig. 6, B and D) or dramatically nonlinear relations (Fig. 6F), many had small, reproducible nonlinearities, such as the relation in Fig. 6H from a peak-type MSO neuron (note the magnified y-axis range). We calculated a CP only for those neurons that passed a linear goodness-of-fit threshold (root-mean-squared residual < 0.044 cyc) chosen to exclude the most nonlinear relations, as in Fig. 6F. The distribution of CP for MSO units depended on BF (Fig. 7), with mainly peak-type neurons for BFs below 800 Hz and the full range of CPs above. The dispersion of MSO CPs below 800 Hz differed significantly from the dispersion of the remaining MSO CPs (P < 0.005). The CPs of non-MSO SOC units also took on a wide range of values at high BFs; however, the dependence of CP on BF was only clear in the MSO sample that contained a greater number of units at low BFs. The data in Fig. 7 do not lend themselves to a clear division between peak-type and trough-type neurons; if we arbitrarily choose CPs within ±0.1 cyc as indicative of peak-type behavior, then only 37% (10/27) of MSO neurons and 29% (8/28) of non-MSO SOC neurons fall within the range. Our evidence for non-peak-type neurons is consistent with previous SOC studies (Batra et al. 1997; Spitzer and Semple 1995) yet is demonstrated here for neurons specifically anatomically localized to the MSO.
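The labeling scheme used in this paragraph can be sketched as a small classifier. The 0.044-cyc residual threshold and the ±0.1-cyc peak-type band come from the text; the 0.4 to 0.6 cyc trough band follows the range quoted in the abstract, and treating everything else as intermediate is an assumption, since the text stresses that the divisions are arbitrary.

```python
def classify_cp(cp_cyc, rms_resid_cyc, rms_max=0.044, peak_tol=0.1, trough_tol=0.1):
    """Label a phase-frequency relation from its CP and linear-fit residual.

    Thresholds are illustrative defaults, not a definitive taxonomy.
    """
    if rms_resid_cyc >= rms_max:
        return "nonlinear"                  # CP is not computed for these units
    cp = cp_cyc % 1.0                       # wrap CP to [0, 1) cycles
    dist_to_peak = min(cp, 1.0 - cp)        # circular distance to 0 cyc
    dist_to_trough = abs(cp - 0.5)          # distance to 0.5 cyc
    if dist_to_peak <= peak_tol:
        return "peak"
    if dist_to_trough <= trough_tol:
        return "trough"
    return "intermediate"
```

Because CP is circular, −0.5 and 0.5 cyc receive the same label, consistent with the equivalence noted in the text.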

Fig. 6.

Prevalence of neurons with non-peak-type phase-frequency relations. A, top: ITD tuning functions at multiple frequencies surrounding BF (1.6 kHz) of a trough-type MSO neuron. Bottom: same functions normalized by maximum rate. B: phase-frequency relation of the same neuron. y-Axis intercept of the straight-line fit to data (gray) is the CP. C and D: data from an intermediate-type MSO neuron (BF = 1.25 kHz). E and F: data from a MSO neuron (BF = 2 kHz) with an extremely nonlinear phase-frequency relation. Inset, spike waveforms. Dotted line indicates the chosen voltage trigger. Bar, 1 ms. G and H: data from a peak-type MSO neuron with a small, reproducible nonlinearity (BF = 1.6 kHz). Error bars included in all relations represent 95% confidence intervals.

Fig. 7.

Distribution of CP across best frequency for MSO neurons (○, n = 25) and non-MSO SOC neurons (▵, n = 21).

Trough-type phase-frequency relations have been shown to arise in the lateral superior olive (LSO) from the interaction of phase-locked, ipsilateral excitation and contralateral inhibition (Finlayson and Caspary 1991; Tollin and Yin 2005). For trough-type neurons, ITD tuning functions at multiple frequencies have a common trough corresponding to the ITD that minimizes firing rate by placing the excitatory and inhibitory inputs in phase. We considered the possibility that trough-type phase-frequency relations in our sample arose from a purely excitatory-inhibitory (EI) interaction. We recorded responses to 100-ms monaural tone pips at the BF of each neuron. Only two neurons indicated inhibition in their monaural responses by a suppression in spiking during stimulation and/or a postinhibitory rebound at stimulus offset (1 MSO, CP = −0.15 and 1 localized to the superior paraolivary nucleus, CP = 0.09). An excitatory monaural spiking response was evoked from at least one ear for all but one neuron (1 MSO). Monaural unresponsiveness or weak responses were common, likely because MSO functions as a binaural coincidence detector. For an EI mechanism, the binaural spike rate should always be less than the higher monaural spike rate since the addition of contralateral inhibition can only reduce the rate evoked by ipsilateral excitation. In our population, the maximal binaural spike rate (the rate at the best ITD) was greater than the monaural rate for all but three neurons (1 MSO neuron that had bilateral, excitatory monaural responses and 2 non-MSO SOC; Fig. 8). Note that the binaural data in Fig. 8 came from long-duration binaural beats while the monaural data came from short tone bursts over many repetitions. It is likely that monaural rates had not undergone as much rate adaptation as binaural rates; nevertheless, binaural rates still exceeded monaural rates in almost all cases. 
This implies that neurons in our sample received bilateral excitation, and that subthreshold excitation underlies those cases in which a neuron did not respond to sound presentation on one side. Interestingly, this included the 12 units localized to the LSO (see Fig. 2B), of which 7 had evoked spiking responses to both ipsilateral and contralateral sound and 4 had CPs within ±0.1 cyc (peak type).

Fig. 8.

Maximal binaural rate is greater than monaural rate. Monaural rate was derived from response to a BF pure tone. Binaural rate at the peak of the ITD tuning function was derived from a BF binaural beat at the same sound level as monaural. A: ipsilateral (MSO, ○: n = 12; non-MSO SOC, ▵: n = 25). B: contralateral (MSO: n = 23; non-MSO SOC: n = 17). Dashed line, identity line.

The absence of inhibitory indications in monaural responses does not exclude the presence of inhibitory inputs among bilateral excitation, and the presence of such inhibition is likely (Grothe and Sanes 1993). However, the evidence for bilateral excitation indicates that frequency-dependent interaural delays in our sample (including trough-type, intermediate-type, and nonlinear phase-frequency relations) do not arise from a purely EI interaction. We address the possible influence of inhibition concurrent with bilateral excitation below.

Interaural cochlear delays are consistent with frequency-dependent delays.

Bonham and Lewis (1999) demonstrated the ability of interaural cochlear delays to produce nonlinear phase-frequency relations through a simple MSO model based on cross-correlation of AN responses. We used a similar model to fit our phase-frequency data in the SOC. In our model, a pure tone sound pressure waveform was simultaneously input into ipsilateral and contralateral AN modules that were allowed to have different CFs corresponding to different distances along the basilar membrane (Fig. 9A). The output of the ipsilateral AN module was time-shifted by a frequency-independent interaural delay (e.g., an axonal conduction time difference, or perhaps the delay associated with inhibition). This frequency-independent delay allowed the inclusion of other mechanisms of internal delay, whereas the frequency-dependent behavior arose solely from the interaural cochlear delay. The time-varying ipsilateral and contralateral AN spike rate outputs (poststimulus time histograms) were cross-correlated at different ITDs to simulate coincidence detection at the MSO. The ITD that maximized the cross-correlation was multiplied by the tone frequency to yield the BP, and BPs were computed for each frequency.
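The cross-correlation step can be sketched as follows. This is a minimal stand-in, not the model used in the study: the two inputs are assumed to be rate functions sampled over exactly one tone period, the correlation is circular, and the half-wave-rectified sinusoids replace the AN module outputs. The sign convention (a positive result means the contralateral rate lags the ipsilateral rate) is also an assumption.

```python
import numpy as np

def best_phase_from_rates(rate_ipsi, rate_contra):
    """Circular cross-correlation of two one-period rate functions.

    Returns the lag, in cycles within [-0.5, 0.5), at which rate_contra
    best matches rate_ipsi (positive = rate_contra lags rate_ipsi).
    """
    a = np.asarray(rate_ipsi, dtype=float)
    b = np.asarray(rate_contra, dtype=float)
    n = a.size
    # xcorr[k] = sum_t a[t] * b[(t + k) % n], computed with FFTs
    xcorr = np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real
    phase = int(np.argmax(xcorr)) / n       # lag in samples -> cycles
    return phase - 1.0 if phase >= 0.5 else phase

# Hypothetical phase-locked rates: contralateral input lags by 0.2 cycle.
t = np.arange(1000) / 1000.0                # one tone period, in cycles
ipsi = np.maximum(0.0, np.sin(2 * np.pi * t))
contra = np.maximum(0.0, np.sin(2 * np.pi * (t - 0.2)))
bp_est = best_phase_from_rates(ipsi, contra)
```

Repeating this at each tone frequency, with rates taken from an AN model with mismatched CFs, would trace out a model phase-frequency relation.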

Fig. 9.

Interaural cochlear delay model produces non-peak-type phase-frequency relations. A: interaural cochlear delay model. A pure tone was input into ipsilateral and contralateral auditory nerve (AN) models that may be mismatched in characteristic frequency (CF). The time-varying output rate on the ipsilateral side was optionally time-shifted by a frequency-independent delay. The 2 rates were cross-correlated at different IPDs. The IPD that maximized the cross-correlation (BP) was the final output. B, top: mean phase of the phase-locked output of the AN model for the case of 0.1-oct CF mismatch: ipsilateral CF = 1.5 kHz (solid line) and contralateral CF = 1.61 kHz (dashed line). Bottom: difference of AN mean phases converted into time shows that the delay produced by the CF mismatch is dependent on frequency. C: model phase-frequency relations for the case of no frequency-independent delay (pure cochlear delay). Ipsilateral CF was held constant in each panel, while contralateral CF was varied to produce CF mismatches of 0.15, 0.1, 0.05, 0, −0.05, −0.1, and −0.15 oct (top to bottom). Zero CF mismatch in each panel is in gray. Bold line indicates the same parameters as in B. D: model phase-frequency relations with a 200-μs frequency-independent delay. Line fits show the varied CP at the intersection with the y-axis. Ipsilateral CF was held at 1.6 kHz, while contralateral CF was varied to produce CF mismatches of 0.15, 0.1, … , −0.1 oct (top to bottom).

The basilar membrane is a dispersive medium, and different frequency components of the cochlear traveling wave have different latencies at a given location along the membrane (Robles and Ruggero 2001). In particular, recordings from the AN show that latency becomes longer for stimulus frequencies near the CF (Pfeiffer and Molnar 1970; van der Heijden and Joris 2003, 2006; Versteegh et al. 2011). Figure 9B demonstrates that the AN module used in our model exhibits a longer latency for stimulus frequencies near CF. Shown is the phase of the AN module response to various tone frequencies for two locations along the basilar membrane (two CFs near 1.5 kHz separated by 0.1 oct). The slope of this phase plot (the latency or group delay) increases near the CF. When the phase difference between the two locations is converted into a time delay (Fig. 9B, bottom), the frequency dependence of the interaural delay is evident. The frequency dependence arises because the latency associated with the AN input with the lower CF begins increasing at a lower stimulus frequency than the input with the higher CF. Therefore, a frequency-dependent interaural delay would result if ipsilateral and contralateral inputs originate from slightly different locations on the basilar membrane. Furthermore, the interaural cochlear delays possible from a small mismatch in CF may be large (Fig. 9B; Bonham and Lewis 1999; Joris et al. 2006) compared with the azimuth-relevant range of ITDs of the gerbil (±130 μs) (Maki and Furukawa 2005).
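To make the argument above concrete, the sketch below uses the phase response of a gammatone-like filter as a stand-in for the AN module. This is an assumption: the filter order, the bandwidth (10% of CF), and the omission of any traveling-wave delay are illustrative choices, so the resulting delays are not expected to match Fig. 9B quantitatively. The point is only that a fixed CF mismatch yields a delay that varies with stimulus frequency.

```python
import numpy as np

def gammatone_phase_cyc(f_hz, cf_hz, order=4, bw_frac=0.1):
    """Phase (cycles) of an assumed gammatone-like filter centered on cf_hz."""
    bandwidth = bw_frac * cf_hz
    return -order * np.arctan((f_hz - cf_hz) / bandwidth) / (2.0 * np.pi)

def cochlear_delay_s(f_hz, cf_ipsi_hz, mismatch_oct, order=4, bw_frac=0.1):
    """Phase of the contralateral filter minus the ipsilateral one, as a time."""
    cf_contra_hz = cf_ipsi_hz * 2.0 ** mismatch_oct
    dphi = (gammatone_phase_cyc(f_hz, cf_contra_hz, order, bw_frac)
            - gammatone_phase_cyc(f_hz, cf_ipsi_hz, order, bw_frac))
    return dphi / f_hz

# 0.1-oct mismatch around 1.5 kHz, evaluated near and far from the CFs.
delay_near = cochlear_delay_s(1550.0, 1500.0, 0.1)
delay_far = cochlear_delay_s(1000.0, 1500.0, 0.1)
```

With these assumed parameters the delay is larger between the two CFs than well below them, which is the frequency dependence described in the text.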

On the basis of interaural cochlear delays alone, model phase-frequency relations could be highly nonlinear (Fig. 9C), similar to the most extreme observed nonlinear relations (Fig. 6F). The phase-frequency relation became more nonlinear when the mismatch between CFs was increased. While modeled phase-frequency relations with pure interaural cochlear delay were highly nonlinear, adding an additional frequency-independent interaural delay and selecting a subset of frequencies surrounding the CFs created relations that were approximately linear but with a CP shifted away from zero to that of an intermediate- or trough-type neuron (Fig. 9D).

We fit the interaural cochlear delay model to the phase-frequency relations of 12 neurons (10 MSO and 2 LSO) that exhibited deviation from peak-type behavior. The three parameters of the model (ipsilateral and contralateral CFs, frequency-independent interaural delay) were adjusted to minimize the mean absolute residual of the fit. Fits to data were remarkably accurate, with most mean absolute residuals <0.015 cyc (Table 1). Figure 10, A and B, show fits to data from a trough-type MSO neuron (same as in Fig. 6, A and B) and an intermediate-type MSO neuron, respectively, which both had observable excitatory responses to ipsilateral and contralateral stimulation. To visually demonstrate the advantage of fitting data with a model that incorporates the frequency-dependent interaural cochlear delay, we also plotted in all panels the best fit assuming only frequency-independent delay (i.e., a best-fit line with zero CP). Figure 10, C and D, show fits to two MSO phase-frequency relations exhibiting large nonlinearities (data in Fig. 10D same as in Fig. 6, E and F). Figure 10, E and F, show fits to two phase-frequency relations with smaller nonlinearities (note the magnified y-axis range). Both of these MSO neurons (CP = −0.29 and 0.06 cyc, respectively) had relations that were reasonably linear; however, the interaural cochlear delay model captured the subtle nonlinearities. This suggests that interaural cochlear delays may underlie small nonlinearities often apparent in peak-type phase-frequency relations, such as Fig. 10F.

Table 1.

Fit parameters of interaural cochlear delay model

Fig. 10.

Interaural cochlear delay model fits to data. A: phase-frequency relation of a trough-type MSO neuron (gray line; same unit as Fig. 6, A and B; BF = 1.6 kHz) and its fit with the interaural cochlear delay model (●). Included in all panels is the best-fit line assuming a CP of zero (dotted line). B: fit to data from an intermediate-type MSO neuron (BF = 1.55 kHz). C and D: fits to data from 2 MSO neurons with strongly nonlinear phase-frequency relations (BFs = 1.6 and 2 kHz). E and F: fits to data from 2 MSO neurons with small, reproducible nonlinearities in the phase-frequency relations (note the magnified y-axis range; BFs = 1.6 and 1.6 kHz). Error bars in all panels indicate 95% confidence intervals of the BP data.

A previous experimental study showed that the interaural delay created by a CF mismatch of cross-correlated AN fiber responses decreased with CF (Joris et al. 2006), consistent with the dependence of latency on log CF being steeper for low-CF AN fibers than for high-CF fibers [e.g., Versteegh et al. (2011) for gerbil]. Our model reproduced this dependence of interaural cochlear delay on CF. For an ipsilateral CF of 200 Hz, a 0.05-oct CF mismatch (CFcontra − CFipsi) created a modeled maximal interaural delay of 250 μs. The same CF mismatch (in octaves, approximately the same mismatch in cochlear distance) at an ipsilateral CF of 1.5 kHz created a maximal interaural delay of 91 μs. The interaural time delay caused by a fixed CF mismatch decreases with CF; however, the delay in terms of phase of the tone period increases with CF, as noted by Joris et al. (2006). This can be seen in Fig. 9C, where the same CF mismatch (in octaves) produces slightly greater phase delays at higher CFs. The decreased ability of interaural cochlear delay to cause large phase shifts at low CFs may explain the predominance of peak-type CPs at low BFs in our sample (Fig. 7). Interaural cochlear delays may be present (and even larger) at low CFs or BFs but not produce significant deviations in the phase-frequency relation from peak-type behavior.
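The contrast between delay in time and delay in phase can be checked with simple arithmetic. The sketch below converts the two modeled maximal delays quoted above into cycles, assuming for illustration that each delay is evaluated at a stimulus frequency equal to the ipsilateral CF (the text does not state the frequency at which the maximum occurs). The helper names are hypothetical.

```python
def mismatch_to_cf_hz(cf_ipsi_hz, mismatch_oct):
    """Contralateral CF implied by an octave mismatch (CFcontra relative to CFipsi)."""
    return cf_ipsi_hz * 2.0 ** mismatch_oct

def delay_to_phase_cyc(delay_s, freq_hz):
    """Express a time delay as a fraction of the tone period."""
    return delay_s * freq_hz

phase_low = delay_to_phase_cyc(250e-6, 200.0)     # 0.05 cyc at 200 Hz
phase_high = delay_to_phase_cyc(91e-6, 1500.0)    # about 0.14 cyc at 1.5 kHz
cf_contra = mismatch_to_cf_hz(1500.0, 0.1)        # about 1.61 kHz, as in Fig. 9B
```

Under this assumption the smaller delay at 1.5 kHz corresponds to the larger phase shift, in line with the trend described in the text.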

Model-estimated monaural CFs could not be compared with measured monaural CFs because single unit isolation could not be maintained at threshold sound levels (see methods), nor was monaural spiking prevalent enough at high sound levels off the BF to construct adequate isolevel frequency tuning functions while maintaining unit isolation. Most CF mismatches estimated in our model (as small as a 30-Hz mismatch at 1.4 kHz) would be difficult to observe in a comparison of noisy, experimentally derived monaural frequency tuning functions. The fits in Fig. 10 demonstrate that the interaural cochlear delay model can quantitatively account for some observed phase-frequency relations with high precision by using model CFs near the measured BF (see Table 1). However, an evaluation of recent data from the gerbil AN suggests that interaural cochlear delays may also occur far from the CF. Versteegh et al. (2011) reported phase plots for gerbil AN fibers with high CFs (≥2 kHz) which had increased latencies at the CF and separately at frequencies below 1 kHz. Furthermore, two example fibers (their Fig. 8) with CFs near 2 kHz had BFs that shifted below 1 kHz as level was increased to ∼60 dB SPL, i.e., into the off-CF range where there was another increase in latency. Given that BFs in our sample were measured at 60–70 dB SPL, it is possible that some of the frequency-dependent delays observed in our sample below 1 kHz came from CF mismatches of neurons with CFs > 2 kHz. The AN module used in our interaural cochlear model only exhibits an increase in latency near CF and would not accurately model an interaural cochlear delay occurring far from CF. However, the majority of frequency-dependent delays in our sample occurred for stimulus frequencies above 1 kHz (Fig. 7 and Fig. 10), where there is no evidence of an additional region of increased latency for high-CF AN fibers.

Inhibition and frequency-dependent interaural delay.

Besides interaural cochlear delay, the other mechanism that has been proposed to account for non-peak-type phase-frequency relations in MSO neurons that exhibit bilateral excitation is the interaction of fast, phasic inhibition with bilateral excitation (Batra et al. 1997; Leibold 2010). Glycinergic inhibition has been demonstrated as a mechanism of interaural delay (Brand et al. 2002; Pecka et al. 2008); however, the manner in which inhibition has been modeled to account for delay (i.e., short duration synaptic inhibition) has been contested (Zhou et al. 2005). We examined the ability of a biophysical MSO model incorporating fast synaptic contralateral inhibition [and similar to the model in Brand et al. (2002)] to produce frequency-dependent interaural delays. Specifically, both excitatory and inhibitory miniature postsynaptic conductances were modeled as α-functions with fast 0.1-ms time constants.

As shown in Fig. 11A, increasing the strength of synaptic inhibition shifted the ITD tuning function in the contralateral-leading direction for a 1-kHz input frequency when inhibition led the contralateral excitation by 200 μs [compare to Fig. 4A in Brand et al. (2002)]. The model also predicted that fast inhibition could shift the ITD tuning function in different directions for different values of delay between contralateral inhibition and contralateral excitation (Fig. 11B). Interestingly, at some inhibitory delays the firing rate exceeded the excitatory-only case, possibly because of postinhibitory facilitation (Fig. 11B). We computed phase-frequency relations of the model from 1.2 to 1.8 kHz, where non-peak-type behavior was often observed (e.g., Fig. 10), while varying the strength and relative timing of synaptic inhibition. When inhibition led the contralateral excitation by 200 μs, increasing the strength of inhibition systematically increased the CP while maintaining a linear phase-frequency relation (Fig. 11C). When the relative timing of inhibition was varied, the phase-frequency relation also remained linear and the CP could take on the full range of possible values (Fig. 11D). The addition of a frequency-independent interaural delay, such as from axonal conduction time difference, would change the CD of the phase-frequency relation while maintaining the same CP. Therefore, it is likely that with three parameters (inhibitory strength and timing, frequency-independent delay) the fast inhibition model might reasonably fit the linear, non-peak-type phase-frequency relations in our data set. This would include CPs as large as ∼0.5, as shown in Fig. 11, while a previous linear model including inhibition only fit relations with CPs ≤ 0.25 (Leibold 2010). Our fast inhibition model did not predict any nonlinearities in the phase-frequency relation, such as in Fig. 10, C–F.

Fig. 11.

Modeled synaptic inhibition and interaural delay. A: synaptic inhibition shifts the ITD tuning function in the contralateral-leading direction when inhibition leads contralateral excitation by 200 μs. Same result as in Brand et al. (2002). ĝE and ĝI are the unitary excitatory and inhibitory peak synaptic conductances (in nS), with ĝE = 5 nS. B: synaptic inhibition shifts the ITD tuning function with the peak ITD dependent on the timing of inhibition relative to contralateral excitation, ΔtI (in μs, positive indicates inhibition leading excitation). ĝE = 8 nS and ĝI = 24 nS. C: with a fast inhibitory synaptic time constant τinh, inhibition creates frequency-dependent interaural delay, shifting the phase-frequency relation to greater CPs (triangles) with increased synaptic strength. ĝE = 8 nS and ΔtI = +200 μs. D: altering the relative timing of inhibition shifts the phase-frequency relation to span the range of CPs. ĝE = 8 nS and ĝI = 24 nS. E: with a slower inhibitory synaptic time constant, the ability of inhibition to shift the phase-frequency relation is eliminated. ĝE = 8 nS and ĝI = 8 nS. F: comparison of the time courses of the synaptic inhibitory conductances used in our model and the time course of synaptic inhibition derived experimentally (Magnusson et al. 2005). The decay time of the experimentally reported inhibition is much longer than the decay times used in our model.

The above results were based on an extremely fast inhibitory synaptic time constant. When we increased the inhibitory synaptic time constant to 0.4 ms, the ability of inhibition to create nonzero CPs was eliminated. CPs remained within ±0.1 cyc regardless of the relative timing of inhibition (Fig. 11E) or the strength of inhibition (ĝI could not be increased much beyond ĝE without reducing the model output to minimum because of the greater effective inhibition at τinh = 0.4 ms). Since the publication of the fast inhibition model in Brand et al. (2002), the time course of synaptic inhibition onto gerbil MSO has been reported. Magnusson et al. (2005) measured miniature inhibitory postsynaptic currents under voltage clamp and reported an average rise time of 0.34 ms and exponential decay time constant of 2.5 ms. We plot this synaptic time course along with the 0.1- and 0.4-ms α-function synaptic conductances used in our model in Fig. 11F. The 0.4-ms α-function has a faster rise time and a shorter decay time (0.7 ms) than the experimentally derived synaptic input yet was still not able to create frequency-dependent interaural delays. Therefore, a biophysical model of MSO incorporating inhibition with realistic synaptic dynamics would not explain the presence of frequency-dependent interaural delays in our data.
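For reference, an α-function conductance can be sketched as below. The standard form g(t) = ĝ (t/τ) exp(1 − t/τ) is assumed, normalized so that the peak equals ĝ and occurs at t = τ; whether the model in the text used exactly this normalization is not stated, and the function name is hypothetical.

```python
import numpy as np

def alpha_conductance(t_ms, g_peak_ns, tau_ms):
    """Alpha-function synaptic conductance (nS), zero before t = 0.

    Assumed form: g(t) = g_peak * (t / tau) * exp(1 - t / tau).
    """
    t = np.asarray(t_ms, dtype=float)
    g = g_peak_ns * (t / tau_ms) * np.exp(1.0 - t / tau_ms)
    return np.where(t >= 0.0, g, 0.0)

t = np.linspace(0.0, 5.0, 5001)                 # ms, 1-us steps
g_fast = alpha_conductance(t, 24.0, 0.1)        # tau = 0.1 ms
g_slow = alpha_conductance(t, 24.0, 0.4)        # tau = 0.4 ms
```

Plotting `g_fast` and `g_slow` against `t` gives a sense of how much longer the 0.4-ms conductance persists relative to a tone period near 1.5 kHz (about 0.67 ms).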

Distribution of interaural delays.

Mechanisms of internal interaural delay such as axonal conduction differences, inhibition, and cochlear delays ultimately shape the distribution of BDs. An interesting observation in the IC (the main recipient of output from the MSO) is that the distribution of BDs in response to noise across CF is overwhelmingly contralateral leading and scattered below the “π-limit”—a boundary equal to half the period of the CF (Hancock and Delgutte 2004; Joris et al. 2006; McAlpine et al. 2001).

Figure 12A shows a scatterplot of BDs in response to pure tones at BF against BF for units in our MSO sample. For binaural beat stimuli, as used here, the ITD function is periodic and the BD is chosen as the peak closest to zero ITD, which is necessarily inside the “π-limit” (in this case half the period of the BF). A comparison to the distribution in IC is limited because of our use of tone stimuli at BF instead of noise and the possibility that the BF measured here is not the same as the CF. Nonetheless, it is interesting to note the absence of negative (ipsilateral leading) BDs below 1.4 kHz in our data. Random interaural cochlear delays alone would predict BDs scattered across negative and positive delays. However, our model fit data well by incorporating a frequency-independent delay in addition to interaural cochlear delay; a contralateral bias of BDs could exist with random interaural cochlear delays if the mechanisms underlying the frequency-independent delays were biased toward contralateral delays.
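The π-limit and the choice of the peak closest to zero ITD can be written down directly. In this minimal sketch the helper names are hypothetical, and wrapping the best phase to [−0.5, 0.5) cycles is assumed to be equivalent to picking the peak nearest zero ITD.

```python
def pi_limit_s(bf_hz):
    """Half the period of the best frequency, in seconds."""
    return 0.5 / bf_hz

def best_delay_s(best_phase_cyc, freq_hz):
    """Best delay from best phase, taking the peak closest to zero ITD."""
    wrapped = (best_phase_cyc + 0.5) % 1.0 - 0.5    # wrap to [-0.5, 0.5) cycles
    return wrapped / freq_hz

bd = best_delay_s(0.7, 1000.0)      # 0.7 cyc wraps to -0.3 cyc -> about -300 us
limit = pi_limit_s(1000.0)          # 500 us at 1 kHz
```

By construction, a BD obtained this way cannot exceed the π-limit in magnitude, which is why tone-derived BDs fall inside the black curves of Fig. 12A.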

Fig. 12.

Distribution of best delays (BD). A: BD (measured at the BF) plotted against BF for all MSO units (n = 27). BDs are derived from periodic ITD tuning functions and are necessarily contained within the π-limit (half a tone period from zero ITD; black curves). Shaded area is azimuth-relevant ITD range. Symbols indicate type of phase-frequency relation: peak-type (circles), intermediate-type (squares), trough-type (triangles), and nonlinear or indeterminate (asterisks). B: same distribution of MSO BDs as in A plotted along with the BD-BF distribution from Pecka et al. (2008; their Supplemental Fig. 2A).

Surprisingly, BDs from units with BF above 1.4 kHz were scattered across positive and negative delays, with a majority of negative delays. Joris (2003) found that cat IC units with CFs between 1 and 3 kHz could exhibit ITD tuning to both the fine structure and envelope of noise. It is possible that our units with BF above 1.4 kHz exhibit this kind of ITD tuning, i.e., the BDs derived from tones would not be the same as from noise, which would include envelope ITD tuning. In Fig. 12B, we combine our data with data extracted from Pecka et al. (2008; their Supplemental Fig. 2A), who also measured BD for tones at BF in the gerbil MSO (their reported CFs ranging from 165 to 4,800 Hz). The combined plot, for BFs below 1.4 kHz, shows BDs scattered between the upper π-limit and zero ITD, similar to the IC plots of BD versus CF.

As mentioned in methods, spike isolation could not be maintained with noise stimulation to collect traditional noise-delay functions. As an alternative, we constructed composite ITD tuning functions: an average of pure tone ITD tuning functions collected at several equally spaced frequencies (Fig. 13, A and C). Composite functions have been shown to be similar to noise-delay functions (Yin and Chan 1990; Yin et al. 1986). Plotted in Fig. 13E is the distribution of composite peak ITDs across BF for units in our MSO sample. All trough-type and many intermediate-type units had composite functions in which the central peak was ambiguous (Fig. 13C). In such cases, we plotted both peaks (Fig. 13E)—all were associated with trough-type, intermediate-type, or nonlinear units. All but one of the units with unambiguous peaks had peaks within the π-limits. The exception was a peak-type unit with a large, negative BD whose ITD tuning function is presented in Fig. 5. A comparison of Fig. 13E with Fig. 12A reveals that composite peaks were generally located at the same delay as BDs measured at BF. This is unsurprising given the ITD function at BF elicits the greatest spike rate and therefore contributes the most to the composite. Similar to Fig. 12A, there was an absence of peaks with negative delays within the π-limit for BFs below 1.4 kHz. Regardless of whether the BFs of units in our sample matched their actual CFs, it is clear that units that elicit strong responses below 1.4 kHz prefer contralateral-leading (positive) delays at these frequencies. Composite peaks of units with BF above 1.4 kHz were scattered across positive and negative delays. Composite tuning functions only capture fine-structure ITD tuning; therefore if these units (or others in our sample) were also tuned to envelope ITDs, the envelope component would not be represented in their composite functions.
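The composite function described above is an average of tone ITD tuning functions on a common ITD axis. The sketch below shows that operation on synthetic data; the cosine-shaped tuning curves, the 200-μs delay, and the frequency set are assumptions chosen only to show that a frequency-independent delay produces a single dominant composite peak.

```python
import numpy as np

def composite_itd_function(tuning_functions):
    """Average tone ITD tuning functions (rows = frequencies, columns = ITDs)."""
    return np.mean(np.asarray(tuning_functions, dtype=float), axis=0)

# Hypothetical peak-type unit: every tone function peaks at a 200-us ITD.
itd_s = np.arange(-2000, 2001, 10) * 1e-6
freqs_hz = [500.0, 600.0, 700.0, 800.0]
curves = [1.0 + np.cos(2 * np.pi * f * (itd_s - 200e-6)) for f in freqs_hz]
composite = composite_itd_function(curves)
composite_peak_s = itd_s[np.argmax(composite)]
```

For a trough-type unit the tone functions would instead align at a common minimum, leaving two side peaks of similar height, which is the ambiguity noted for Fig. 13C.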

Fig. 13.

Composite ITD tuning functions. A: ITD tuning functions at different pure tone frequencies (top) and composite function (bottom) for a MSO unit (BF = 0.65 kHz). Central peak of composite is unambiguous. B: phase-frequency relation of the unit in A; CP in cycles and CD in microseconds. C and D: same layout as in A and B for a different MSO unit (BF = 0.8 kHz). Central composite peak is ambiguous. E: composite peak plotted against BF. ○, Unambiguous peaks; X, ambiguous peaks (both peaks plotted).


We showed that a majority of neurons anatomically localized to the gerbil MSO exhibited frequency-dependent interaural delays (i.e., non-peak-type phase-frequency relations arising from neurons with bilateral excitation). We further demonstrated that an excitatory coincidence detection model incorporating interaural cochlear delay (in addition to frequency-independent delay) was sufficient to reproduce observed frequency-dependent delays, while a biophysically realistic model of inhibition was not. Several previous studies of the SOC have shown the existence of frequency-dependent interaural delays in the form of non-peak-type CPs (i.e., 0.1 < CP < 0.9) (Batra et al. 1997; Pecka et al. 2008; Spitzer and Semple 1995; Yin and Chan 1990), yet data have been interpreted as indicative of peak-type behavior since CP population histograms generally have a mode near zero. Previous studies of neurons anatomically localized to the MSO have shown a larger concentration of peak-type neurons than the present study (Pecka et al. 2008; Yin and Chan 1990), likely because most neurons in these studies had BFs below 1 kHz, frequencies at which interaural cochlear delays may be large but create smaller shifts in phase. Interaural cochlear delay effects fed forward from the MSO may also explain some of the non-peak-type behavior prevalent in the IC (Yin and Kuwada 1983), although some of this behavior is likely due to convergence of ITD-sensitive inputs with different BDs (McAlpine et al. 1998).

Pentobarbital, one of the anesthetics used in the present study, has been shown to potentiate the current of the glycine receptor (Daniels and Roberts 1998). Blockade of inhibitory glycinergic transmission at MSO neurons has been demonstrated to alter the BD (Brand et al. 2002; Pecka et al. 2008); therefore it is possible that pentobarbital altered some BDs in the present study. However, in a study of the unanesthetized rabbit SOC, Batra, Kuwada, and Fitzpatrick (1997) found many clear examples of intermediate-type, trough-type, and nonlinear phase-frequency relations; i.e., these non-peak-type relations occur regardless of any possible alteration of inhibition.

Batra et al. (1997) concluded that trough-type neurons in the rabbit came from a low-frequency area of the LSO; however, the precise location of their neurons within the SOC was indeterminate because of the use of an awake animal preparation. Similar to our study, they found almost no evidence of an exclusive EI interaction in ITD-sensitive units. It is likely that some trough-type units in the Batra et al. study assumed to be from the LSO were actually from the MSO. The assumed dichotomy of peak-type neurons in the MSO and trough-type neurons in the LSO has endured, especially since low-frequency, trough-type phase-frequency relations derived from an EI interaction have been reported in the LSO (Tollin and Yin 2005). One consequence of the assumed dichotomy is that neurons that do not exhibit peak-type phase-frequency relations may be excluded from a sample because their response is not considered to come from the MSO (Seidl and Grothe 2005).

Studies in the barn owl have provided some evidence against the existence of interaural cochlear delays (Fischer and Pena 2009; Pena et al. 2001), but the specific evidence has been against an exclusive model of interaural cochlear delay, e.g., measured CF mismatches in barn owl nucleus laminaris neurons were not predictive of the direction of interaural delay. In contrast, our model assumes both cochlear and frequency-independent delays that may or may not be in the same direction. For example, one of our units whose data were fit to the interaural cochlear delay model had a modeled CF mismatch (−0.055 oct) opposite in direction to the modeled frequency-independent delay (525 μs) and opposite to the observed composite peak delay (439 μs). Interaural cochlear delays shifting in a direction opposite to other mechanisms of internal delay would be expected if CF mismatches resulted from random imprecision in the tonotopic matching of afferents onto a target MSO neuron. Nonetheless, interaural cochlear delays may not be significant in barn owls, which are unique in that they extract ITDs at comparatively high frequencies (3–8 kHz) (Pena et al. 2001). In the barn owl AN (and also in mammals), the dependence of latency on CF is steeper for low CFs than for high CFs (Koppl 1997), and at high CFs in the barn owl range the time needed to traverse a fixed cochlear distance may be small compared with the azimuth-relevant ITD range (Pena et al. 2001), while the time needed to traverse the same cochlear distance at lower CFs would be substantial. CF mismatches in birds that use low-frequency ITDs, such as the chicken (and perhaps the small, low-CF range of barn owl), may produce sizeable interaural cochlear delays compared with their azimuth-relevant ITD range. Indeed, a study in chicken (Koppl and Carr 2008) showed that nearly half of nucleus laminaris recordings had CF mismatches greater than 50 Hz and that a majority of recordings had non-peak-type CPs (0.1 < CP < 0.9), indicative of interaural cochlear delay.

Our interaural cochlear delay model used an AN module that exhibited an increase in latency (group delay) near the CF. A recent study on responses of low-CF AN fibers in the gerbil showed that phase plots could be complex, with an additional increase in latency at frequencies below CF (Versteegh et al. 2011). It would be straightforward to look at the difference of phase plots between AN fibers or spherical bushy cells of the AVCN as a simulation of interaural delay, similar to our Fig. 9B, to search for indications of frequency-dependent interaural delays at frequencies further from CF as well as near CF. One interesting possibility is that interaural cochlear delays could occur without CF mismatches: differences in the shape of the phase plots at a similar location along the cochlea in each ear could create frequency-dependent interaural delays, even though the CFs would be the same.
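
The phase-plot comparison proposed above can be sketched as follows. This is a toy illustration only: `toy_phase_cycles` is a hypothetical single-resonance phase curve whose group delay peaks at CF, chosen for simplicity, and it is not the AN module used in our model. The sign convention (ipsilateral minus contralateral phase, so that a positive value means the contralateral input lags) and all parameter values are likewise assumptions.

```python
import numpy as np

def toy_phase_cycles(freq_hz, cf_hz, bw_hz, t0_s=0.0):
    """Hypothetical phase plot (cycles) of one monaural input.

    Combines a pure delay t0_s with an arctangent phase transition centered on
    cf_hz, so the group delay (-dphase/df) is t0_s plus a bump that peaks at CF
    and has width set by bw_hz.
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    return -t0_s * freq_hz - np.arctan((freq_hz - cf_hz) / bw_hz) / (2.0 * np.pi)

def interaural_delay_s(freq_hz, phase_ipsi, phase_contra):
    """Per-frequency interaural delay (s) from the difference of two phase plots.

    Positive values mean the contralateral input lags the ipsilateral input.
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    return (np.asarray(phase_ipsi) - np.asarray(phase_contra)) / freq_hz

freqs = np.linspace(500.0, 1100.0, 13)

# Case 1: matched CF and shape, pure delay difference -> constant delay.
pure = interaural_delay_s(freqs,
                          toy_phase_cycles(freqs, 800.0, 100.0),
                          toy_phase_cycles(freqs, 800.0, 100.0, t0_s=300e-6))

# Case 2: CF mismatch -> delay varies with frequency.
mismatch = interaural_delay_s(freqs,
                              toy_phase_cycles(freqs, 800.0, 100.0),
                              toy_phase_cycles(freqs, 850.0, 100.0))

# Case 3: same CF, different phase-plot shape -> delay still varies.
shape = interaural_delay_s(freqs,
                           toy_phase_cycles(freqs, 800.0, 100.0),
                           toy_phase_cycles(freqs, 800.0, 200.0))
```

In this sketch, only the pure-delay case gives an interaural delay that is constant across frequency. Both the CF-mismatch case and the matched-CF, different-shape case give delays that vary with frequency, which illustrates the possibility raised above. With measured phase plots, the two arrays passed to `interaural_delay_s` would simply be replaced by the recorded phases sampled at common frequencies.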

In this study, we show how certain aspects of data from the MSO can be qualitatively and quantitatively accounted for by interaural cochlear delay. However, direct evidence of interaural cochlear delays in the form of mismatches between the ipsilateral and contralateral CFs of MSO neurons has yet to be reported. The general paucity of data on the MSO compared with other auditory brain areas is due to the difficulty in isolating MSO units by extracellular recording techniques. Recent reports of successful MSO recordings using in vivo juxtacellular techniques (Van der Heijden et al. 2011) may provide the opportunity to record responses to difficult stimuli, e.g., monaural pure tone frequency thresholds and noise-delay functions. Small CF mismatches would likely be difficult to discern at low frequencies where threshold frequency tuning is broad and the exact position of the CF is questionable. However, an important test of the interaural cochlear delay model would be in the case of a relatively large CF mismatch; the measured CF mismatch could be compared with the model CF mismatch after fitting the model to the measured phase-frequency relation.


No conflicts of interest, financial or otherwise, are declared by the author(s).


We thank B. Delgutte and K. Hancock for a critical reading of the manuscript and D. Polley for a critical reading of an earlier version.

