Sound localization in echoic conditions depends on a precedence effect (PE), in which the first arriving sound dominates the perceived location of later reflections. Previous studies have demonstrated neurophysiological correlates of the PE in several species, but the underlying mechanisms remain unknown. The present study documents responses of space-specific neurons in the barn owl's inferior colliculus (IC) to stimuli simulating direct sounds and reflections that overlap in time at the listener's ears. Responses to 100-ms noises with lead-lag delays from 1 to 100 ms were recorded from neurons in the space-mapped subdivisions of IC in anesthetized owls (N2O/isoflurane). Responses to a target located at a unit's best location were usually suppressed by a masker located outside the excitatory portion of the spatial receptive field. The least spatially selective units exhibited temporally symmetric effects, in that the amount of suppression was the same whether the masker led or lagged. Such effects mirror the alteration of localization cues caused by acoustic superposition of leading and lagging sounds. In more spatially selective units, the suppression was often temporally asymmetric, being more pronounced when the masker led. The masker often evoked small changes in spatial tuning that were not related to the magnitude of suppressive effects. The association of temporally asymmetric suppression with spatial selectivity suggests that this property emerges within IC, and not at earlier stages of auditory processing. Asymmetric suppression reduces the ability of highly spatially selective neurons to encode the location of lagging sounds, providing a possible basis for the PE.
Localizing sounds in a natural environment requires an ability to deal with spurious directional information conveyed by acoustic reflections, or echoes. Our ability to do so is thought to depend on a precedence effect (PE), whereby the first arriving sound dominates perception of later arriving reflections (Haas 1951; Hartmann 1983; Wallach et al. 1949). Recent human psychophysical findings (Litovsky and Shinn-Cunningham 2001) suggest that the PE encompasses at least 2 distinct phenomena. At delays of a few ms, subjects experience fusion of leading and lagging sounds into a single perceptual event (reviewed in Blauert 1997). Over a wider range of delays, subjects experience localization dominance, in which leading and lagging sounds are localized to a position near the leading source (Litovsky and Shinn-Cunningham 2001; Wallach et al. 1949). Concurrently, subjects experience an impaired ability to detect changes in the location of the lagging source, and a smaller effect for the leading source (Litovsky and Macmillan 1994; Litovsky and Shinn-Cunningham 2001; Perrott et al. 1989; Zurek 1980), which have been termed discrimination suppression (Litovsky et al. 1999). The difference in time courses suggests that perceptual fusion results from different mechanisms than those responsible for localization dominance and discrimination suppression.
Behavioral studies have provided evidence of localization-dependent precedence phenomena in several animal species, including barn owls. Several animal studies have measured lateralization of sources placed symmetrically about the midline (Cranford 1982; Keller and Takahashi 1996b; Kelly 1974; Wyttenbach and Hoy 1993). At short delays, lateralization judgments correspond to the side of the leading sound. As the delay is increased, judgments become evenly distributed on the 2 sides, suggesting that the lagging sound becomes separately localizable. This conclusion is supported by results of a recent study measuring localization of paired sources using eye movements in cats (Tollin and Yin 2003). At lead-lag delays from 400 μs to 10 ms, subjects oriented toward the leading sound. At longer delays, subjects localized the lagging sound on some trials. Finally, a recent study from our laboratory demonstrated spatial discrimination suppression in barn owls (Spitzer et al. 2003). As in humans, leading sounds had a large effect on the ability to detect changes in the location of lagging sounds, and lagging sounds had a smaller effect on spatial acuity for leading sounds.
The neuronal basis of the PE is not well understood. Studies of central auditory structures in a variety of species have demonstrated that leading sounds suppress responses of spatially sensitive neurons to lagging sounds, providing a neuronal correlate of the PE (e.g., Fitzpatrick et al. 1995; Keller and Takahashi 1996b; Mickey and Middlebrooks 2001; Yin 1994). Although the PE occurs for a wide range of signals, including speech and continuous noise, most previous physiological studies, including that in the barn owl (Keller and Takahashi 1996b), have focused primarily on neuronal responses to clicks or sounds with durations of a few ms. The use of transient stimuli potentially offers 2 major advantages: 1) the ability to separate neuronal responses to leading and lagging sounds, and 2) the ability to clearly visualize neuronal interactions in the absence of confounding effects of acoustic superposition of leading and lagging sounds. (In practice, these advantages may be compromised by the response times of the acoustic transducers and peripheral auditory filters.) In natural listening situations, however, the delay between the primary signal and its reflection will often be shorter than the signal duration, resulting in substantial temporal overlap of leading and lagging waveforms. The resulting acoustic superposition of leading and lagging sounds at the subject's ear causes degradation of the directional cues to each source. Consequently, it is expected that effects of the leading sound on responses of spatially sensitive neurons to the lagging sound will be confounded by additional masking effects when the sounds overlap. Such effects would not have been apparent in previous studies using shorter stimuli. 
To be generally applicable to the variety of sounds and reverberant conditions encountered in natural environments, an understanding of the neuronal mechanisms of the PE must therefore extend to situations in which the sound duration is longer than the lag delay. As a step in this direction, the present study documents neuronal responses to pairs of leading and lagging sounds with durations of 50 and 100 ms, and lead-lag delays ranging from 1 to 200 ms.
The physiological mechanisms of lead-evoked suppression remain controversial. Several authors have proposed inhibitory mechanisms to explain both behavioral (Harris et al. 1963; Lindemann 1986; Zurek 1987) and neuronal (Fitzpatrick et al. 1995; Yin 1994) effects. In mammals, the spatial dependence of lead effects suggests that a substantial component of these interactions may be mediated by inhibitory processes in binaural brain stem nuclei (Litovsky and Delgutte 2002). On the other hand, recent modeling studies have demonstrated that, particularly at low frequencies, both behavioral and neurophysiological effects occurring at delays of a few milliseconds could result from interactions of leading and lagging sounds at early stages of auditory processing that do not involve neuronal inhibition (Hartung and Trahiotis 2001; Tollin 1998; Trahiotis and Hartung 2002). If such mechanisms were responsible for the suppression of neuronal responses to lagging sounds, this effect should be evident at the initial site of binaural interaction and at all subsequent processing stages. To elucidate the neuronal mechanisms of lead-evoked suppression it will be necessary to determine the site at which such interactions first become apparent within the ascending auditory pathways.
The lateral subdivisions of the barn owl's IC contain a neuronal map of auditory space. Within these structures, single-peaked auditory spatial receptive fields (SRFs) are generated through the combination of inputs from binaural neurons with highly ambiguous spatial tuning (Konishi 2003). This process seems to involve a gradual series of processing stages (Mazer 1995), resulting in a continuous distribution of spatial selectivity within the lateral shell subdivision of the IC core (ICc-ls) and the external nucleus of IC (ICx). In the present study, we examined the distribution of lead-dependent spatial masking effects, which are functionally analogous to the lead-evoked suppression observed in previous studies, among a population of IC neurons at varying levels of spatial processing. The results demonstrated an association between lead-dependent effects and spatial selectivity, suggesting that a neuronal correlate of the PE is generated in parallel with refinement of spatial selectivity within the barn owl's IC.
All procedures conformed to National Institutes of Health guidelines for care and use of laboratory animals and were approved by the Institutional Animal Care and Use Committee of the University of Oregon. The subjects were 3 adult barn owls (Tyto alba) from a captive breeding colony at the University of Oregon. Experiments were conducted using a chronic preparation for recording in anesthetized owls that was described previously (Euston and Takahashi 2002). Before use in experiments, each owl had a head plate and 2 recording wells attached to its skull under isoflurane anesthesia. After recovery from surgery, the 3 subjects were returned to a flight cage in an owl colony where they were housed together for the duration of their experimental use. For neurophysiological recording, anesthesia was induced by intramuscular injection of ketamine (22 mg/kg) and valium (5.6 mg/kg), and maintained with N2O/O2 (25 to 40%) supplemented by isoflurane (0.125 to 1%), as needed. Recording sessions had a maximum duration of 12 h. Each subject was used in several recording sessions (Owl 719: 21 sessions; Owl 883: 6 sessions; Owl 916: 8 sessions), with a minimum recovery period of 10 days between sessions.
Neurophysiological recordings were conducted in a sound-attenuating chamber (IAC). The owl's head was stabilized by a holder attached to the chronically implanted headplate and its body supported by a heating pad. Before each recording session, the lid of one recording well was removed and the interior of the well was cleaned with a 0.25% mixture of chlorhexidine in sterile saline. In the first session, the portion of the skull underlying each recording well was excised to permit introduction of recording electrodes. Single-unit recordings were obtained using glass-coated tungsten microelectrodes with impedances at 1 kHz from 1 to 12 MΩ and exposed tip lengths of 5 to 20 μm. Electrodes were introduced through the forebrain and advanced ventrally toward the stereotaxic coordinates of the auditory midbrain using a stepping motor microdrive (μD-500, Power Technologies). The signal recorded by the electrode was fed to an oscilloscope and audio monitor to permit detection of stimulus-evoked activity. An interactive graphical user interface (BCLab running in Matlab v. 5.3, The Mathworks) allowed the experimenters to select virtual stimulus locations while searching for responsive units. Single-unit action potentials were isolated either by level triggering or through the use of a template matching spike sorter (Alpha-Omega MSD). Typical recording sessions involved 1 to 3 electrode penetrations. At the end of the recording session the well was rinsed with sterile saline and the lid was replaced. After recording sessions the owl was kept in an isolated recovery chamber until it recovered from anesthesia, at which point it was returned to its flight cage.
Virtual auditory space (VAS) stimuli were generated as described previously (Keller et al. 1998) using each subject's own head related transfer functions (HRTFs). HRTFs were band-pass filtered between 2 and 12 kHz, converted from frequency to time domain representations by inverse Fourier transformation (30-kHz sampling rate), and stored digitally as 255-point (8.5-ms) finite impulse response filters. Two sets of binaural HRTF measurements were obtained for each subject. The first set sampled 617 locations spanning the frontal hemifield, at a spacing of 5° in azimuth and elevation in double polar coordinates. The second set sampled the following regions of space in 1° increments: −20 to 20° azimuth at elevations −20, −10, 10, and 20° at 0° elevation; −40 to 40° elevation at 0° azimuth.
During physiological recording, sounds were presented using a dichotic delivery system with foam insert earphones (model ER-1, Etymotic Research, Elk Grove Village, IL). Stimulus waveforms were generated digitally, and typically consisted of broadband noises with flat (±1 dB) spectra from 2 to 12 kHz, durations of 50 or 100 ms, and 2.5-ms cosine on and off ramps. VAS stimuli were generated by real-time convolution of the stimulus waveform with the HRTFs for the appropriate ears and location (PD1 Power DAC, Tucker Davis Technologies, Gainesville, FL). To generate combinations of leading and lagging sounds, a sequence of zeros, with length corresponding to the lag delay, was concatenated to the end (leading sound) or beginning (lagging sound) of a single-noise waveform. The leading and lagging waveforms were then convolved with the HRTFs for the appropriate locations, and the filtered waveforms were added. Digitally processed waveforms were converted to analog voltage at 30-kHz sampling rate (PD1, Tucker Davis), attenuated (PA4, Tucker Davis) and amplified (HB6, Tucker Davis) before earphone presentation. All stimuli were presented at 52 dB SPLA, which was typically 25 to 35 dB above the response threshold of space-specific neurons, measured at their best locations.
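The construction of a lead-lag pair described above can be sketched in a few lines of Python. This is only an illustration for a single ear, not the software used in the experiments (which relied on real-time convolution hardware). The function names `convolve` and `lead_lag_pair` and the toy impulse responses in the usage note are hypothetical.

```python
def convolve(signal, kernel):
    """Full-length direct convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def lead_lag_pair(noise, hrir_lead, hrir_lag, delay_ms, fs_hz=30000):
    """Summed single-ear waveform for a leading and a lagging copy of `noise`.

    Assumes both impulse responses have the same length, as with the
    255-point filters described in the text.
    """
    n_delay = round(delay_ms * fs_hz / 1000)  # lag delay in samples
    zeros = [0.0] * n_delay
    leading = list(noise) + zeros   # zeros appended to the leading copy
    lagging = zeros + list(noise)   # zeros prepended to the lagging copy
    lead_filtered = convolve(leading, hrir_lead)
    lag_filtered = convolve(lagging, hrir_lag)
    # Acoustic superposition: the two filtered waveforms are added.
    return [a + b for a, b in zip(lead_filtered, lag_filtered)]
```

For example, with a one-sample "noise" and trivial one-tap filters, `lead_lag_pair([1.0, 0.0, 0.0], [1.0], [0.5], delay_ms=3, fs_hz=1000)` places the lagging copy three samples after the leading copy. The same function would be called once per ear with the corresponding left and right impulse responses.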
An initial test was performed to characterize the auditory spatial tuning of each isolated unit. Sounds were presented from a set of 292 virtual locations, arranged in a checkerboard pattern to sample the entire frontal hemifield at a spacing of 10° in azimuth and elevation. Noise pips of 50 ms were presented with an interstimulus interval of 250 ms. The stimulus set was presented in 2 to 5 repetitions. In this, and all subsequent tests, the order of stimulus presentation was randomized for each repetition.
“Spatial response profiles” (SRPs) were generated by plotting the response (spikes per stimulus recorded in a time window equal to the stimulus duration, delayed by the unit's response latency), as a function of stimulus azimuth and elevation. Stimulus onset was delayed relative to the start of data collection by an amount equal to the stimulus duration to allow measurement of spontaneous discharge before stimulation. In addition, spike data were recorded during a silent interval equal to the total sound duration plus interstimulus interval at the start of each stimulus set repetition. The set of locations that evoked increases in discharge rate, relative to background firing, will be referred to as the “spatial receptive field” (SRF). The area within the SRF from which ≥75% of the maximal rate was obtained is termed the “best area.” For units with SRFs containing a single dominant peak, the location within the best area that elicited the maximal average firing rate is termed the “best location.” In off-line analysis (see spatial tuning index), the best location was determined by calculating the weighted average of locations within the best area, using response magnitude as the weighting factor. (If the best area contained more than one region, the one that contained the most total spikes was used.) For on-line determination of target locations (see interaction index) the best location was estimated as the center of the dominant peak of the SRF.
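The off-line estimate of the best location can be illustrated with a short Python sketch. It assumes a spatial response profile stored as a dictionary keyed by (azimuth, elevation) in degrees, averages the two coordinates independently, and ignores the case of a best area that splits into several regions. These simplifications, and the names `best_area` and `best_location`, are assumptions made for illustration.

```python
def best_area(srp, criterion=0.75):
    """Locations whose response is at least `criterion` times the maximum.

    `srp` maps (azimuth_deg, elevation_deg) -> response (spikes/stimulus).
    """
    peak = max(srp.values())
    return {loc: r for loc, r in srp.items() if r >= criterion * peak}


def best_location(srp, criterion=0.75):
    """Response-weighted average of the locations within the best area."""
    area = best_area(srp, criterion)
    total = sum(area.values())
    azimuth = sum(loc[0] * r for loc, r in area.items()) / total
    elevation = sum(loc[1] * r for loc, r in area.items()) / total
    return azimuth, elevation
```

With responses of 4, 4, and 1 spikes/stimulus at azimuths of 0, 10, and 20° (0° elevation), the best area contains the first two locations and the weighted best location falls midway between them, at 5° azimuth.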
SPATIAL TUNING INDEX.
Neuronal spatial selectivity was quantified by a spatial tuning index (STI) that measures the spatial dispersion of responses across the frontal hemifield. STI is calculated from the spatial response profile by computing a weighted sum of angles between the sampled locations and the unit's best location, with response magnitude as the weighting factor, normalized to the sum of all angles

$$\mathrm{STI} = \frac{\sum_{i=1}^{n} \alpha_i R_i}{\sum_{i=1}^{n} \alpha_i} \tag{1}$$

where αi is the absolute value of the angle between location i and the unit's best location, Ri is the magnitude of the response to location i, and n is the number of locations tested. STI has a potential range from 1, if the spatial distribution of responses is uniform, to 0, if a unit responds to only a single location. The range of observed values was 0.003 to 0.471.
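The verbal definition of STI can be sketched in Python as follows. The sketch assumes that responses are expressed relative to the maximal response, which is what the stated range of 0 to 1 appears to require; this scaling is an assumption and is not spelled out in the text. The function name `spatial_tuning_index` is hypothetical.

```python
def spatial_tuning_index(angles_deg, responses):
    """Response-weighted sum of angles from the best location,
    normalized to the sum of all angles.

    Assumption: responses are scaled by the maximal response so that a
    spatially uniform response profile yields an index of 1.
    """
    peak = max(responses)
    total_angle = sum(angles_deg)
    if peak <= 0 or total_angle <= 0:
        raise ValueError("need a positive response and a nonzero angle")
    scaled = [r / peak for r in responses]
    return sum(a * r for a, r in zip(angles_deg, scaled)) / total_angle
```

Under this assumption, a unit that responds equally at every location returns 1, and a unit that responds only at its best location (angle 0) returns 0.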
The effect of a leading or lagging sound on the response to a sound at a unit's best location was quantified using the interaction index (I; Eq. 2). The sound at best location is termed the target (t) and the other sound the masker (m). I is calculated as follows

$$I = \frac{R_{t+m} - R_t}{R_{t+m} + R_t} \tag{2}$$

where Rt is the response (spikes/stimulus) to the target alone and Rt+m is the response to the target plus masker. The values of I have a potential range from −1, indicating complete suppression of the response in the target plus masker condition, to numbers approaching +1, indicating strong enhancement of the response. Values of 0 indicate that the masker had no effect.
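A minimal Python sketch of the interaction index follows. It assumes the contrast-ratio form (difference of the two responses divided by their sum), which matches the stated limits of −1 for complete suppression and values approaching +1 for strong enhancement. The function name and the choice to return `None` when both responses are zero are illustrative assumptions.

```python
def interaction_index(r_target, r_target_plus_masker):
    """Contrast between the masked and unmasked target responses.

    Returns None when both responses are zero, where the ratio is undefined.
    """
    denominator = r_target_plus_masker + r_target
    if denominator == 0:
        return None
    return (r_target_plus_masker - r_target) / denominator
```

For instance, a target-alone response of 4 spikes/stimulus that is abolished by the masker gives −1, and an unchanged response gives 0.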
The time windows used to measure Rt+m and Rt are illustrated in Fig. 1. To quantify the effect of the masker on total spike output, the Rt+m window was set to include all spikes evoked by either the target or the masker. Thus the window began at the onset of the leading sound, delayed by the response latency, and had a duration equal to the lag delay plus either the duration of the response to the target alone (measured from visual inspection of spike raster displays) or the stimulus duration plus 10 ms (Fig. 1, vertical lines), whichever was longer. The use of the alternative minimum duration ensured inclusion of responses that occurred after the offset of leading maskers at short delays in some units and that might otherwise have been excluded (e.g., Fig. 1A, Masker Leads). The Rt measurement window began at stimulus onset, delayed by the response latency, and had the same duration as the Rt+m window.
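The window rule described above can be written out as a short Python sketch. All times are in milliseconds relative to the start of data collection. The function names are hypothetical, and the half-open window convention used for counting spikes is an assumption.

```python
def masked_response_window(lead_onset_ms, latency_ms, lag_delay_ms,
                           target_response_dur_ms, stim_dur_ms):
    """Start and end of the window used to measure the masked response.

    The window begins at the onset of the leading sound plus the response
    latency. Its duration is the lag delay plus the longer of the
    target-alone response duration and the stimulus duration plus 10 ms.
    """
    start = lead_onset_ms + latency_ms
    duration = abs(lag_delay_ms) + max(target_response_dur_ms,
                                       stim_dur_ms + 10)
    return start, start + duration


def spikes_in_window(spike_times_ms, window):
    """Count spikes with start <= t < end (half-open window, an assumption)."""
    start, end = window
    return sum(start <= t < end for t in spike_times_ms)
```

With a 100-ms stimulus, a 12-ms latency, a 5-ms lag delay, and a 90-ms target-alone response, the minimum duration of 110 ms applies and the window runs from 12 to 127 ms after the onset of the leading sound.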
RESPONSES TO INDIVIDUAL STIMULUS SEGMENTS.
The combination of temporally overlapping leading and lagging sounds generates a stimulus with 3 distinct segments. The leading segment conveys directional cues identical to those of the leading sound, presented in isolation. This is followed by an overlap segment, during which acoustic superposition of leading and lagging sounds degrades the binaural cues for each source location. Finally, during the trailing segment the binaural cues are essentially identical to those of the lagging sound in isolation. To gain further insight into the mechanisms underlying the masking effects, responses to the different stimulus segments were analyzed separately, as illustrated in Fig. 2.
The masker's effect during the overlap stimulus segment was quantified by calculating I, as before (Eq. 2), but with Rt+m measured in a window starting 10 ms after the onset of the lagging sound (delayed by the response latency), and ending at the offset of the leading sound (Fig. 2, dashed lines). The start of the overlap Rt+m window was delayed by 10 ms to prevent inclusion of responses to the leading segment in the target-leading condition, which often appeared to continue for a few milliseconds past masker onset. Rt was measured in an equivalent time window, relative to target onset. Thus the Rt window starts 10 ms after target onset, in the target-lagging condition, and 10 ms after masker onset, in the target-leading condition. The value of I was considered to be undefined if neither Rt+m nor Rt was significantly >0 (P < 0.05).
To quantify masking effects on responses to trailing segments, I was used to compare the response at masker offset in the target-lagging condition to the onset of the response in the target-alone condition (Fig. 2). Thus Rt+m was measured in a window starting at masker offset (Fig. 2, Masker Leads, shaded area). The duration of the Rt+m window was equal to the lag delay at delays ≥5 ms. At shorter delays, a duration of 5 ms was used because the responses to leading and lagging segments often continued for a few milliseconds beyond the lag delay, and because shorter windows yielded less reliable results. Rt was measured in a window with duration equal to that of the Rt+m window, starting at target onset. The reasons for choosing the onset segment of the target-alone response as a reference for comparison, in preference to the final segment, are detailed in the results section. For comparison, the response to leading segments of the target were analyzed in a similar manner. In this case, both Rt+m and Rt were measured in windows starting at target onset, with duration equal to the lag delay (Fig. 2, Target Leads).
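The segment boundaries and analysis windows described above can be summarized in a short Python sketch for the case in which the lag delay is shorter than the stimulus duration. Times are in milliseconds relative to the onset of the leading sound. The function names are hypothetical. Applying the response latency to both ends of the overlap window, and to the start of the trailing window, is an assumption where the text does not state it explicitly.

```python
def stimulus_segments(stim_dur_ms, lag_delay_ms):
    """Leading, overlap, and trailing segments of an overlapping pair."""
    if not 0 < lag_delay_ms < stim_dur_ms:
        raise ValueError("segments are defined only for overlapping sounds")
    return {
        "leading": (0, lag_delay_ms),
        "overlap": (lag_delay_ms, stim_dur_ms),
        "trailing": (stim_dur_ms, stim_dur_ms + lag_delay_ms),
    }


def overlap_window(stim_dur_ms, lag_delay_ms, latency_ms, guard_ms=10):
    """Window from 10 ms after lag onset to the offset of the leading sound."""
    start = lag_delay_ms + guard_ms + latency_ms
    end = stim_dur_ms + latency_ms
    return start, end


def trailing_window(stim_dur_ms, lag_delay_ms, latency_ms, min_dur_ms=5):
    """Window starting at the offset of the leading sound.

    Its duration equals the lag delay, but is never shorter than 5 ms.
    """
    start = stim_dur_ms + latency_ms
    return start, start + max(lag_delay_ms, min_dur_ms)
```

For a 100-ms stimulus, a 20-ms lag delay, and a 12-ms latency, the overlap window spans 42 to 112 ms and the trailing window spans 112 to 132 ms. At a 3-ms delay the trailing window is held at the 5-ms minimum.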
Neuronal detection of a target at best location in the presence of a masker was quantified by receiver operating characteristic (ROC) analysis, following methods applied in previous neurophysiological studies (e.g., Bradley et al. 1987; Britten et al. 1992; Mountcastle et al. 1969). Responses were recorded during 20 repetitions of several target-plus-masker combinations with lead-lag delay varied from −200 to 200 ms. By convention, lag delay is measured between the onset of leading and lagging sounds, and is positive when the masker leads the target. The response to the target plus masker (Rt+m) was measured using the same procedures as in the initial I calculation, except that responses were not averaged across repetitions. The masker-alone response (Rm) was measured using the initial 200 ms of the +200 ms delay stimulus, in a window starting at stimulus onset, delayed by the unit's response latency, and with duration equal to the lag delay plus the longer of 110 ms or the duration of the response to the target alone. The minimum effective response duration of 110 ms was again used to prevent exclusion of spike bursts after the offset of leading maskers in units with short (<110 ms) target-alone responses. ROC curves were constructed using a set of response criterion values spanning the range of single-trial Rt+m and Rm values. The ROC curve was generated by plotting the proportion of “hits” (Rt+m > criterion), against the proportion of “false alarms” (Rm > criterion), for each criterion value. The criterion values included the maximum and minimum response values, as well as any value that resulted in a change in the proportion of both hits and false alarms. A criterion value greater than the maximum response and a value of one less than the minimum response were also included to define the endpoints of the curve. 
The area under the resulting curve, termed proportion correct [p(c)], provides an unbiased measure of target detection (Green and Swets 1966), representing the performance of an ideal observer, using the neuron's responses as the decision variable. Values of 0.5 and 1 correspond to chance and perfect detection performance, respectively. Values <0.5 may occur if the average response to the masker-alone condition is greater than the response to the target-plus-masker condition.
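The ROC procedure can be illustrated with the following Python sketch. It uses every distinct single-trial response as a criterion, which is a superset of the criterion values described in the text and yields the same curve, and it integrates the curve by the trapezoidal rule. The function name `roc_area` and the integration method are assumptions made for illustration.

```python
def roc_area(target_plus_masker, masker_alone):
    """Area under the ROC curve built from single-trial spike counts.

    A hit is a target-plus-masker response above criterion; a false alarm
    is a masker-alone response above criterion.
    """
    values = sorted(set(target_plus_masker) | set(masker_alone))
    # One criterion below the minimum and one above the maximum define
    # the (1, 1) and (0, 0) endpoints of the curve.
    criteria = [values[0] - 1] + values + [values[-1] + 1]
    points = []
    for c in criteria:
        hits = sum(r > c for r in target_plus_masker) / len(target_plus_masker)
        false_alarms = sum(r > c for r in masker_alone) / len(masker_alone)
        points.append((false_alarms, hits))
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoidal rule
    return area
```

Completely separated response distributions give an area of 1, identical distributions give 0.5, and an area below 0.5 results when the masker-alone responses exceed the target-plus-masker responses.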
HIGH-RESOLUTION AZIMUTH TUNING.
High-resolution single-source azimuth tuning curves were obtained by recording neuronal responses to a set of virtual locations spanning the azimuthal extent (or a portion thereof) of the peak of the SRF in 1° increments at an elevation of −20, −10, 0, 10, or 20°. Because most space-specific units cannot reliably detect changes in elevation about their SRF peaks of <5° (unpublished observations), this sampling was sufficient to characterize the azimuth tuning at the SRF peak of units with best elevations between −25 and 25°. Azimuth tuning curves for leading and lagging targets were obtained in the same manner, but with the addition of a masker at a fixed location outside the SRF, at the same elevation as the loci sampled at high resolution. The average azimuthal separation between the masker and the units' best azimuths was 28.0 ± 6.8°. The lag delay was always 3 ms, and sound durations were 100 ms. The peaks of azimuth tuning curves for single, leading, and lagging targets were determined by fitting the tuning-curve data with either a Gaussian curve or, if the tuning-curve was clearly skewed, with a lognormal curve. To ensure that fitted peaks accurately reflected neuronal azimuth tuning, a curve fit was excluded from further analysis if it explained <75% of the response variance, or if the calculated peak was located at an endpoint of the range of sampled azimuths. Best azimuths for single, leading, and lagging targets were determined from the corresponding curve fits. Tuning-curve shifts were calculated by determining the change in best azimuths between the single source and either leading or lagging target conditions relative to the location of the masker. By convention, a positive shift value indicates that the best azimuth in the target+masker condition is further away from the masker than the best azimuth in the single source condition.
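The acceptance criteria for curve fits and the sign convention for tuning-curve shifts can be expressed compactly in Python. The sketch takes the fitted peak and the fraction of variance explained as given and does not perform the Gaussian or lognormal fit itself. The function names are hypothetical, and treating a peak beyond the sampled range the same as a peak at an endpoint is an assumption.

```python
def fit_is_acceptable(variance_explained, peak_az_deg, sampled_az_deg):
    """Keep a fit only if it explains at least 75% of the response variance
    and its peak lies strictly inside the range of sampled azimuths."""
    interior = min(sampled_az_deg) < peak_az_deg < max(sampled_az_deg)
    return variance_explained >= 0.75 and interior


def tuning_shift(best_az_single_deg, best_az_masked_deg, masker_az_deg):
    """Change in best azimuth relative to the masker location.

    Positive values mean the best azimuth in the target-plus-masker
    condition lies farther from the masker than in the single-source
    condition.
    """
    return (abs(best_az_masked_deg - masker_az_deg)
            - abs(best_az_single_deg - masker_az_deg))
```

For a masker at −28° and a single-source best azimuth of 0°, a masked best azimuth of 2° gives a shift of +2° (away from the masker), whereas a masked best azimuth of −1° gives a shift of −1° (toward the masker).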
Auditory responses were recorded from 241 spatially sensitive single units in the midbrains of 3 barn owls. The sampled SRFs exhibited varying degrees of spatial selectivity, ranging from those containing distinct lateral side peaks and/or vertically elongated central peaks to those with a single, spatially restricted peak. The joint distributions of spatial selectivity and response latency for the total unit sample are illustrated in Fig. 3. There was a significant negative correlation (ρ = −0.23; P = 0.0004) between spatial selectivity, quantified by STI, and unit response latency, indicating that the most spatially selective units had the longest latencies. There was no clear indication, however, that either distribution contained multiple modes. These findings are in agreement with those of a previous study (Mazer 1995) and are consistent with the view that spatial selectivity develops through a gradual process within ICc-ls and ICx. Recordings were also obtained from 8 optic tectum (OT) units that responded to light as well as sound. Tectal units had long latencies and spatial selectivity comparable to that of the most selective IC neurons (Fig. 3, stars). The results below were obtained from a subset of 98 units from which detailed data sets were collected. All units chosen for further analysis had SRFs with a single, dominant peak, although smaller side peaks may have been present. The distribution of STI values for these units is shown on the right in Fig. 3 (shaded). The mean STI value for this subpopulation (0.10) was significantly lower than that for the total unit sample (0.13; P = 0.009, Mann–Whitney U test). The mean response latencies of the 2 populations (12.3 and 12.8 ms) were not significantly different (P = 0.25, t-test). The distributions of spatial selectivity and response latency suggest that the latter unit sample contains a mixture of units from lateral ICc-ls and ICx.
Spatial distribution of lead-source effects
A previous study demonstrated that the responses of space-specific neurons to sounds at their best locations were reduced in the presence of leading sounds displaced by 40° in azimuth (Keller and Takahashi 1996b). Thus the neuronal representation of the direction of the lagging sound within the space map is suppressed, providing a potential neuronal correlate of the behavioral PE. We now consider the spatial distribution of such effects in both azimuth and elevation. The spatial dependence of lead source effects was studied, quantitatively, in 26 units by recording responses to lagging sounds at the units’ best locations, combined with leading sounds at locations spanning the frontal hemifield. For simplicity, the lagging sound at best location will be referred to as the target (t), and leading sound as the masker (m). The delay between onset of the masker and the target was either 3 (22 units) or 5 (4 units) ms.
The usual patterns of lead-source effects are illustrated for 3 units with varying degrees of spatial selectivity in Fig. 4. The lead effect was quantified by the interaction index (I, see methods, Data analysis). For this analysis, the response to the target plus masker (Rt+m) was measured in a window set to capture all spikes evoked by either the target or masker, and compared with the response to the target-alone (Rt), measured in a window with equivalent duration (Fig. 1). The SRPs are shown in the left column (Fig. 4A) and the spatial distributions of I are shown in the middle column (Fig. 4B). The target locations (crosses) and best areas (dotted lines) are indicated in Fig. 4B to facilitate comparison of the spatial topographies of lead effects with the SRPs. In all 3 units, the lead-source effects ranged from suppression (I < 0) to values close to zero, indicating that the masker had little effect on the net response relative to the target-alone condition. Lead-evoked facilitation (Rt+m > Rt + response to masker-alone) was never observed in any of the units studied. For all 3 units in Fig. 4B, the masking effect was minimal at locations near the best area, or at far peripheral locations. In the most spatially selective unit (unit 883DC, top row), the masker was most suppressive when it was located lateral to or above the SRF. In the other 2 units, the SRF extends vertically, following the contour of locations with the same ITD as that of the best location. In such units, lead effects were usually minimal along the same iso-ITD contour as that of the best location, and maximal at laterally adjacent locations. The masking effect (I) is plotted against the normalized single-source response at each location in Fig. 4C. In all 3 cases, the strongest lead-evoked suppression occurred at locations that produced the weakest responses in the single-source condition. 
By contrast, leading sounds at locations that produced responses >40% of maximum in the single-source condition had minimal suppressive effects.
Differential effects of leading and lagging maskers
Because the sounds used in the preceding test were much longer than the lag delay, resulting in considerable temporal overlap of leading and lagging waveforms, the suppression of neuronal responses to the target caused by leading maskers located outside the SRF could reflect either acoustic or neuronal interactions, or a combination of the two. To better understand the cause of response suppression, we next compare the effects of leading and lagging maskers on responses to targets at the best location in a larger sample of IC units exhibiting varying levels of spatial selectivity.
When leading and lagging sounds are presented from different locations at approximately equal levels, some reduction of the response to the best location sound is expected to result from degradation of the binaural cues caused by the acoustic superposition of waveforms from the 2 sources (Keller and Takahashi 1996a; Takahashi and Keller 1994). Specifically, the spectrum of interaural level difference cues will be altered, and the level of interaural correlation diminished, the latter resulting in a reduction of the effectiveness of ITD cues (Albeck and Konishi 1995; Saberi et al. 1998). Such effects are approximately equal, regardless of whether the masker or target leads. Therefore any difference in suppressive effects caused by leading and lagging maskers cannot be attributed to these acoustic interactions alone. Furthermore, any difference in effectiveness of suppression between leading and lagging maskers resulting from interactions within peripheral filters (Hartung and Trahiotis 2001; Trahiotis and Hartung 2002) or asymmetric temporal weighting at the initial site of binaural interaction (Tollin 1998) should be exhibited by all IC neurons.
Effects of leading and lagging maskers on responses to targets at best location were compared in 59 units, including 6 presumptive OT units. For this test the masker was positioned outside the SRF, at a location that was found, on-line, to suppress responses to the target. In 58/59 units, the masker was positioned at the same elevation as the estimated best location. In one unit, the masker was positioned 30° above the best location. The azimuthal separations between targets and maskers ranged from 14 to 55° (median = 26°). Sounds were 100-ms broadband (2–12 kHz) noise bursts presented with lead-lag onset delays of ±1, 2, 5, 10, 20, 50, 100, and 200 ms, expressed relative to target onset. The target and masker waveforms were identical, before convolution with the head-related impulse responses (HRIRs).
Comparing effects of leading and lagging maskers revealed 2 types of suppressive effects. In many units, the amount of suppression was similar, regardless of whether the masker led or lagged. This type of effect, termed temporally symmetric, is illustrated by responses of unit 883CJ in Fig. 5. The response to a target leading the masker by 200 ms is the same as that evoked by the target alone, and consists of a robust, moderately adapting discharge, sustained throughout the stimulus duration. By contrast, when the masker leads by 200 ms, it elicits a single spike at onset on some trials, followed by suppression of spontaneous firing, suggesting lateral inhibition. As the delay is decreased, causing the sounds to overlap in time, the response to the target is clearly suppressed throughout the duration of the masker. At delays from 1 to 20 ms, the magnitude of suppression is approximately equal, whether the masker leads or lags. This symmetric suppression is consistent with the effects of acoustic superposition on the available binaural cues. It is also possible that the masker exerts an additional inhibitory effect. However, any contribution of lateral inhibition to suppression of the target response appears not to depend on the temporal order of the masker and target. Note that, at delays from 5 to 20 ms, there is a strong, transient burst of spikes at the offset of the masker. At 50 ms, suppression is stronger when the target leads. This effect may reflect an interaction of the degradation of binaural cues with the temporal dynamics of the response to the target. When the target leads, the masker coincides with the weaker, later portion of the response and is thus more effective than in the opposite configuration, when it coincides with the stronger initial portion of the response. At 100 and 200 ms the masker has little to no effect. 
As in this example, symmetric response suppression was most often observed in units with SRFs containing prominent lateral side-peaks and vertically elongated main peaks.
In more spatially selective units, suppression was often greater when the masker led than when it lagged. Such temporally asymmetric suppression is illustrated by responses of unit 883CG in Fig. 6. In this case, the unit responded to a target at −10° azimuth and failed to respond to a masker at 15° azimuth. At delays of 5 ms and below, the response to the target was completely suppressed when the masker led, and only partially suppressed when it lagged. In contrast to the previous example, the response at masker offset was weak or absent at delays below 20 ms in the masker-leading condition. Although a response to the target became apparent at longer delays, an asymmetric suppressive effect was evident at delays ≤50 ms. This type of asymmetric interaction is analogous to the behavioral PE in that the neuronal representation of the location of a lagging sound is more effectively suppressed than is that of a leading sound.
The effects of leading and lagging maskers at all delays are compared for the entire sample of 59 units in Fig. 7. In each plot the interaction index values (I, see methods, Data analysis) for lagging maskers are plotted against those for leading maskers. The unity line indicates temporally symmetric effects. At delays from 1 to 50 ms, nearly all points are contained in the lower left quadrant (interaction index values −1 to 0), indicating that the response to the target was suppressed to some extent when it overlapped in time with the masker. At delays from 1 to 10 ms, most points fell on or below the unity line, indicating that, for most units, the extent of suppression was equal or greater when the masker led than when it lagged.
The different unit subpopulations, designated by plot symbols, exhibited different effects. The scatter of values from the least spatially selective units (STI > 0.13; squares) fell just below the unity line at delays from 1 to 2 ms, and became centered on it at longer delays. Thus responses of these units exhibited mildly asymmetric suppression at the shortest delays, and temporally symmetric suppression at longer delays. By contrast, the scatters of values from the more spatially selective units (STI < 0.13; circles) and OT units included substantial numbers of points falling well below the unity line at delays from 1 to 50 ms. Thus profound temporal asymmetry was apparent only in the most spatially selective units.
The preceding conclusions were confirmed by statistical analysis (Table 1). At each delay value, IC units were classified as either symmetric or asymmetric by comparing responses to the target when the masker led or lagged. A unit was classified as symmetric if its responses with both leading and lagging maskers were significantly lower than the response to the target alone, but not different from one another. A unit was classified as asymmetric if its response in the masker-leading condition was significantly lower than half the magnitude of the response in the masker-lagging condition (all comparisons: t-test, α = 0.01). At delay values from 1 to 20 ms there was a substantial proportion of symmetric units. At each delay value in this range the mean STI value of asymmetric units was significantly lower than that for symmetric units, indicating the former units were more spatially selective.
The relation between response asymmetry and both spatial selectivity and response latency is examined further in Fig. 8. Here, asymmetry is quantified as the difference between I values calculated in the masker-leading and masker-lagging conditions (IM_leads − IM_lags). Thus large negative values indicate greater suppression when the masker leads than when it lags, and values of zero indicate equal masking effects in the 2 conditions. Response asymmetry, thus calculated, is plotted against spatial selectivity (STI, left column) and response latency (right column), for each IC unit for delays from 1 to 50 ms. As expected from the preceding analysis, at delays from 1 to 20 ms, units with low spatial selectivity (STI >0.13) had asymmetry values near zero. Among the more selective units response asymmetry increased systematically as a function of spatial selectivity. Because low STI values indicate high spatial selectivity, this relation is manifest as a significant (P < 0.01) positive correlation between STI and asymmetry at delays below 50 ms. Because spatial selectivity is correlated with latency, it is not unexpected that response asymmetry is also correlated with latency. In this case, response asymmetry tended to increase in a negative direction with increasing latency, resulting in negative correlations. Unlike spatial selectivity, however, latency was significantly correlated with asymmetry at a delay of 50 ms, but not at delays of 1 and 2 ms. Neither latency nor spatial selectivity was significantly correlated with asymmetry at 100-ms delay (P values of 0.55 and 0.31, respectively).
Masking of responses to individual stimulus segments
The temporal overlap of leading and lagging sounds results in segments of stimuli during which only the leading or lagging waveform is present, flanking a segment in which both are present. The responses to such stimuli were often complex, with distinct components appearing to reflect differences in masking effects within different stimulus segments. Analysis of the segment-specific responses helped to pinpoint the differences between temporally symmetric and asymmetric suppression.
During the overlap segment, the mixing of the leading and lagging waveforms in each ear degrades the binaural cues. The degradation is the same, however, regardless of whether the target leads or lags and would not, by itself, contribute to a temporally asymmetric effect. Thus if suppression resulted entirely from these acoustic interactions, masking of responses to the overlap segment would be expected to be the same, regardless of which sound led. This prediction was evaluated by comparing masking during the overlap segment in the masker-leading and masker-lagging conditions (see methods, Data analysis).
The I values calculated from responses to the overlap segments (Fig. 2, between dashed lines) are plotted as a function of spatial selectivity for delays from 1 to 50 ms in the left column of Fig. 9. At all delays, responses of most units were moderately to heavily suppressed (I < 0) during the overlap segment, both when the masker led (circles) and when it lagged (triangles). This result is consistent with the expectation that acoustic interactions will have a major effect on responses of all units, regardless of the temporal order of masker and target. To compare effects of leading and lagging maskers, the difference between I values in the masker-leading and masker-lagging conditions (asymmetry = IM_leads − IM_lags) is plotted against spatial selectivity in Fig. 9 (right column). At delays between 1 and 10 ms, most values were close to zero, indicating approximately equal suppression in the 2 conditions. Thus in most units temporally symmetric influences were sufficient to explain the masking effects within the overlap segment at these delays. This result is consistent with the expected acoustic masking effects, as well as with lateral inhibition, provided that the inhibitory mechanism is insensitive to the temporal order of masker and target. The major exceptions, at delays from 1 to 5 ms, were highly spatially selective units that exhibited greater suppression in the masker-leading condition. This form of temporally asymmetric suppression was also evident at 10- to 50-ms delays in responses of several highly selective units, and 2 less-selective units. Such asymmetric suppression is inconsistent with the predictions of acoustic masking, and indicates the contribution of an additional mechanism that is sensitive to the temporal order of masker and target.
At delays from 10 to 50 ms, the responses of most units differed from the predictions of a simple acoustic masking effect because suppression was greater in the masker-lagging condition. This effect is probably attributable to adaptation of responses. In the target-leading condition, the overlap segment follows a leading segment, in which only the target is present, which always evokes a strong response (e.g., Figs. 1 and 2, target leads). By contrast, in the masker-leading condition, the overlap segment follows a leading segment that evokes little or no response (e.g., Figs. 1 and 2, masker leads). Consequently, response adaptation would be expected to suppress responses to the overlap segment in the masker-lagging condition, but not in the masker-leading condition. This explanation is supported by the fact that suppression in the masker-lagging condition actually increases in many units, particularly less-selective ones, as the delay is increased from 10 to 50 ms. Thus the masking observed during the overlap segment in many units is consistent with the expectations of acoustic masking at short delays, with an additional contribution of adaptation at longer delays. Neither mechanism, however, can account for the temporally asymmetric masking effects during the overlap segment exhibited by many of the more spatially selective units.
A second prediction of simultaneous acoustical masking is that the suppressive effects should occur only during the overlap segment. Thus in the masker-leading condition, we would expect to see a strong response to the trailing stimulus segment, in which only the target is present. This prediction is consistent with the responses of the symmetric unit shown in Fig. 5, at delays >2 ms, but not with those of the asymmetric unit shown in Fig. 6. At delays from 5 to 50 ms, the symmetric unit fired a burst of spikes after the offset of leading maskers, which resembled the response to stimulus onset in the target-alone condition (Fig. 5, Fig. 2A, Masker Leads, shaded region). Such responses demonstrate a recovery from the masking effect almost immediately after termination of masker-target overlap. In the asymmetric unit (Fig. 6, Fig. 2B), by contrast, an onset-like response to the trailing segment does not emerge until the target delay is increased to 20 ms. In this case, the masker appears to exert a suppressive influence on the response to the trailing segment that persists well beyond masker offset. Such effects are consistent with a long-lasting inhibitory mechanism, but not with the degradation of binaural cues resulting from acoustic superposition.
Suppression beyond masker offset was evaluated in the entire IC unit sample by comparing the responses to the trailing stimulus segment, in the masker-leading condition, to the onset segments of the responses to the target alone (see methods, Data analysis and Fig. 2). The resulting I values are plotted as a function of spatial selectivity in Fig. 10 (left column, circles). To demonstrate how the suppression of responses to trailing segments contributes to overall response asymmetry, the responses to leading segments in the masker-lagging condition were analyzed in similar fashion (Fig. 10, left column, triangles). At delays of 1 and 2 ms, most units had little or no response to the trailing target-alone segment, resulting in negative values. As delay was increased, responses to the trailing segment emerged more quickly in less spatially selective units than in the more selective ones. Thus at a delay of 50 ms, I values for responses to trailing segments are close to 0 in the less-selective units (STI > 0.13), but well below 0 for many of the more-selective units. The recovery of responses at short delays in the less spatially selective units indicates that masking effects are primarily limited to the overlap segment. This effect is consistent with the expectations for acoustic masking, as well as lateral inhibition with a short time constant. By contrast, the suppression of responses to the trailing segment at delays up to tens of milliseconds in the highly selective units is suggestive of long-acting lead-evoked inhibition. This effect is unlikely to have resulted from response adaptation because the strongest suppression of trailing segment responses usually occurred after overlap segments that evoked little or no response. A peripheral mechanism is equally unlikely because interactions within peripheral filters are limited to a few milliseconds.
The suppression of responses to the trailing target-alone stimulus segments makes a major contribution to the temporal asymmetry of masking effects in the most spatially selective units. Trailing segment masking is, by its very nature, a form of temporal asymmetry because the response to the leading portion of the target in the masker-lagging condition (Fig. 2, Target Leads, shaded areas) is not subject to a similar influence. This is illustrated in Fig. 10 by comparison of I values calculated from responses to the trailing (circles) and leading (triangles) segments. At delays >1 ms, the values for leading segment responses are all near 0, indicating that these responses are similar to the onset portion of the target-alone response. (The negative I values for leading-segment responses in a few units at 1 and 2 ms could indicate either that the Rt+m window was long enough to include a portion of the response to the overlap segment, or that the integration time for neuronal responses was greater than the leading segment duration.) The temporal asymmetry of suppression is illustrated in the right column of Fig. 10 by plotting the difference in I values for trailing and leading segments (asymmetry = Itrailing − Ileading) as a function of spatial selectivity. At delays of 1 and 2 ms nearly all units exhibit strong temporally asymmetric suppression. This effect is not expected based on acoustic masking, but is consistent with either peripheral filter interactions, or short-acting inhibition. As the delay is increased, the asymmetry of responses in the less spatially selective units approaches 0 more quickly than that of the more selective units, as expected from the difference in suppression of responses to the trailing segments. By 50 ms, the values for poorly selective units are all close to 0, whereas many highly selective units still exhibit considerable asymmetry, resulting from the long-lasting suppression of trailing segment responses.
In summary, separate analysis of responses to the overlap and trailing stimulus segments suggests that the observed masking effects reflect a combination of several factors. In the least spatially selective neurons, at delays >2 ms, a combination of temporally symmetric suppression of responses to the overlapping stimulus segment and robust responses to trailing target-alone segments resulted in approximately equal masking effects in the masker-leading and masker-lagging conditions. This type of masking is consistent with the expected effects of acoustic superposition on binaural cues, with possible additional contributions of temporally symmetric lateral inhibition. Such acoustic effects also appear to make a major contribution to the suppression of responses to the overlapping stimulus segment in the more selective units. However, this mechanism cannot account for the temporally asymmetric suppression in the more selective units that resulted from a combination of long-lasting suppression of responses to trailing segments of the target and, in some units, greater suppression during the overlap segment in the masker-leading condition. Such effects are consistent with a lead-evoked inhibitory mechanism. Finally, at delays of 1 and 2 ms nearly all units exhibited some level of temporally asymmetric suppression. In most units, this effect was primarily attributable to the masking of responses to trailing target-alone segments. This effect is consistent with a peripheral mechanism or with short-acting lateral inhibition.
Neuronal detection of leading and lagging sounds
The temporal asymmetry of suppressive effects results in a difference in the ability of the most spatially selective IC neurons to detect leading and lagging targets at their best locations. This effect was quantified by ROC analysis measuring the ability of neurons to signal the presence of the target through changes in discharge rate relative to the masker-alone condition (see methods, Data analysis for details). The methods used to generate ROC curves from neuronal spike data are based on those used in previous studies (e.g., Bradley et al. 1987; Britten et al. 1992; Mountcastle et al. 1969) and are illustrated in Fig. 11. In this example, there was a small response to the masker, located on the edge of the SRF (Fig. 11B, masker alone), and a much larger response to the target at the best location (Fig. 11B, target alone). When the masker led the target by 1 ms (Fig. 11B, +1 ms), the response was slightly greater than that to the masker alone. When the target led by 1 ms (Fig. 11B, −1 ms), the response was much greater than that to the masker alone, but still less than that to the target alone. To construct ROC curves, spikes were counted on each stimulus repetition in a time window set to capture all spikes evoked by either sound (Fig. 11B, dashed vertical lines, same windowing as in Figs. 1 and 7). A set of response criteria was adopted that spanned the range of recorded spike counts. ROC curves were generated by plotting the proportion of trials on which Rt+m exceeded the criterion (“hits”) against the proportion of trials on which Rm exceeded the criterion (“false alarms”) for each criterion value. The areas under the resulting curves [proportion correct, p(c)] are equivalent to the performance of an ideal observer using the neuron's spike counts as a decision variable in a 2-alternative forced-choice task (Green and Swets 1966). ROC curves obtained from the responses in Fig. 11B are shown in Fig. 11C. 
In this example, when the target led the masker by 1 ms (−1 ms, triangles), the response on nearly every trial was greater than the maximum response in the masker-alone condition, resulting in nearly perfect detection performance [0.99 p(c)]. The smaller response when the masker led by the same amount (+1 ms) was reflected in lower target detectability [0.74 p(c)]. Increasing the target delay to +5 ms caused the response to increase (response not shown), improving detection performance to 0.92 p(c).
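The ROC construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name `roc_area`, the use of integer per-trial spike counts, the placement of one criterion at every integer spanning the observed counts, and the trapezoidal integration of the curve are all choices made for the example.

```python
def roc_area(target_plus_masker, masker_alone):
    """Return p(c), the area under the ROC curve, from per-trial spike counts.

    target_plus_masker: integer spike counts, one per trial, with target and masker (Rt+m).
    masker_alone: integer spike counts, one per trial, with the masker alone (Rm).
    Both argument names are hypothetical labels for this sketch.
    """
    lowest = min(min(target_plus_masker), min(masker_alone))
    highest = max(max(target_plus_masker), max(masker_alone))

    points = []
    # Criteria span the recorded counts: the lowest criterion (lowest - 1) yields
    # the ROC point (1, 1) and the highest criterion yields the point (0, 0).
    for criterion in range(lowest - 1, highest + 1):
        hits = sum(c > criterion for c in target_plus_masker) / len(target_plus_masker)
        false_alarms = sum(c > criterion for c in masker_alone) / len(masker_alone)
        points.append((false_alarms, hits))

    # Order the points from (0, 0) to (1, 1) and integrate with the trapezoid rule.
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With this sketch, spike-count distributions that do not overlap (every Rt+m count above every Rm count) give p(c) = 1.0, and identical distributions give p(c) = 0.5, the chance level for a 2-alternative forced-choice task.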
The relation between masker-evoked suppression and target detectability is illustrated in Fig. 12 for the 3 units used as examples in Figs. 5, 6, and 11. Detection functions, obtained by plotting p(c) as a function of lag delay, are shown in the left column. For comparison, the effect of the masker on the magnitude of responses to the target is quantified by “recovery” functions, plotted in the right column. Recovery was calculated according to Eq. 3, with Rt+m and Rt measured in the same windows used in the initial interaction index calculation (Fig. 1). This measure was chosen because it is similar to that used previously to compare neuronal and behavioral echo thresholds (e.g., Fitzpatrick 1999; Yin 1994). For all 3 units, the magnitudes of responses to leading and lagging targets are reduced by 50% or more at the shortest delays. Nevertheless, the responses of each unit to leading targets were sufficient to enable perfect, or near perfect, detection performance. Units 883AJ (A, also Fig. 11) and 883CG (C, also Fig. 6) exhibited temporally asymmetric masker effects. In these units, the greater suppression of responses to lagging targets resulted in an asymmetry of detection performance at the shorter delays. Detection performance for lagging targets improved as the response recovered at longer delays. Unit 883CJ (B, also Fig. 5) exhibited symmetric masker effects. In this case, the suppressed responses to leading and lagging targets were sufficient to support perfect detection performance at all delays. The data from all 3 units demonstrate that even very heavily suppressed responses may be sufficient to reliably indicate the presence of a target at the unit's best location.
Thresholds for target detection and response recovery for the entire unit sample are compared in Fig. 13. Thresholds were calculated by finding the lag delay at which performance exceeded an arbitrary criterion level of 0.75 p(c) or 50% recovery. Threshold values between sampled delays were estimated by linear interpolation. In a few cases, where the detection or recovery function crossed the 0.75 level more than once, the lag delay at which the function crossed and stayed above threshold was used. Thresholds were determined for leading and lagging targets for 59 units, and for lagging targets only, for 21 additional units. In most units for which at least one threshold fell within the range of lag delays tested (1 to 100 ms), thresholds for target detection were lower than those for response recovery. In the target-leading condition, responses of 20/59 units exceeded both detection and recovery criteria at the shortest delay. Of the 39 units with at least one threshold in the sampled range, 30 had lower thresholds for detection than for response recovery. The differences between thresholds obtained by the 2 measures were often large. Many units with suprathreshold detection performance at 1 ms delay had recovery thresholds on the order of tens of milliseconds. In the target-lagging condition, a higher proportion of units (70/80) had at least one threshold within the sampled range. The great majority of these units (56/70) also had lower thresholds for detection than for recovery. Again, recovery thresholds often exceeded detection thresholds by more than an order of magnitude. Thus thresholds derived from measures that simply reflect the magnitude of neuronal responses may greatly underestimate the ability of neurons to signal the presence of a target through reliable changes in firing.
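The threshold rule described above (criterion crossing, linear interpolation between sampled delays, and use of the final upward crossing when a function crosses the criterion more than once) can be sketched as follows. This is an illustrative sketch only: the function name `threshold_delay`, interpolation on a linear rather than logarithmic delay axis, returning the shortest tested delay for units already above criterion at every delay, and returning `None` for units that never stay above criterion are assumptions of the example, not details given in the text.

```python
def threshold_delay(delays, values, criterion=0.75):
    """Estimate the lag delay at which a function crosses and stays above criterion.

    delays: sampled lag delays in ms, in increasing order.
    values: detection p(c) or recovery values, one per delay.
    Returns None if the function is below criterion at the longest delay.
    """
    # Index of the last sampled delay at which the function is below criterion.
    last_below = None
    for i, v in enumerate(values):
        if v < criterion:
            last_below = i

    if last_below is None:
        # Above criterion at every delay: threshold is at or below the shortest
        # tested delay (assumption: report the shortest delay itself).
        return delays[0]
    if last_below == len(values) - 1:
        # Never crosses and stays above criterion within the sampled range.
        return None

    # Linear interpolation between the last subthreshold point and the next point.
    d0, d1 = delays[last_below], delays[last_below + 1]
    v0, v1 = values[last_below], values[last_below + 1]
    return d0 + (criterion - v0) * (d1 - d0) / (v1 - v0)
```

For a recovery function, the same sketch would be called with the recovery values and the corresponding criterion (for example `criterion=50` if recovery is expressed in percent).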
Analysis of detection thresholds revealed an asymmetry of performance for leading and lagging sounds among the more spatially selective units that mirrored the asymmetry of response suppression. The distributions of detection thresholds for all units, grouped according to spatial selectivity (STI cutoff = 0.13), are shown in Fig. 14A. Most units, in both categories, could detect leading targets at all delays. The proportion of broadly tuned units with lead-detection thresholds below 1 ms (14/16) was slightly higher than that for sharply tuned units (27/37). There was a much greater difference in the abilities of the 2 populations to detect lagging targets. A high proportion of broadly tuned units (18/25) could also detect lagging targets at all delays. By contrast, most sharply tuned units were unable to detect targets at short delays. Only 17/53 sharply tuned units had thresholds below 1 ms.
The detection performance of the 53 units tested with both leading and lagging targets is summarized in Fig. 14B. Here, cumulative proportions of suprathreshold units are plotted as a function of lag delay. At a delay of 1 ms, most of the broadly tuned units could detect both leading and lagging targets. As the delay was increased to 5 ms the proportion of these units that were above threshold increased to 1. Among the more sharply tuned units, there were large disparities in detection performance for leading and lagging targets at short delays that diminished as delay was increased. Thus at delays from 1 to 10 ms, there is a substantial difference in the abilities of neurons with differing levels of spatial selectivity to detect a lagging source at best location, and hence to contribute to the neuronal representation of its location. The neuronal representation of the location of a lagging source by responses of the less spatially selective units is robust. By contrast, the representation of the location of a lagging source conveyed by responses of the most spatially selective units is substantially attenuated relative to that of a leading source.
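The cumulative proportions of suprathreshold units referred to above can be computed from per-unit detection thresholds as in the following sketch. The function name `cumulative_suprathreshold` and the conventions that a unit counts as suprathreshold at a delay when its threshold is at or below that delay, and that `None` marks a unit with no threshold in the sampled range, are assumptions made for the illustration.

```python
def cumulative_suprathreshold(thresholds, delays):
    """Proportion of units whose detection threshold is at or below each delay.

    thresholds: one threshold (ms) per unit, or None if the unit never
                reached criterion within the sampled range (assumed convention).
    delays: lag delays (ms) at which to evaluate the cumulative proportion.
    """
    n_units = len(thresholds)
    proportions = []
    for delay in delays:
        # Count units that have a threshold and have reached it by this delay.
        above = sum(t is not None and t <= delay for t in thresholds)
        proportions.append(above / n_units)
    return proportions
```

Computing this separately for leading and lagging targets, and separately for broadly and sharply tuned units, would yield the kind of curves compared in the paragraph above.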
Azimuth tuning of responses to leading and lagging targets
Results of recent modeling studies suggest that suppression of neuronal responses to lagging sounds may result from interactions within peripheral filters that give rise to new effective internal interaural time differences (ITDs) and interaural level differences (ILDs) that differ from those present in the stimuli (Hartung and Trahiotis 2001; Trahiotis and Hartung 2002). Consequently, a lead-lag sound pair, with the lagging sound at a unit's best binaural configuration and the leading sound at the worst configuration, may produce internal ITDs and/or ILDs, at the level of the auditory nerve, that are well outside the range of the unit's binaural tuning curve(s). If such mechanisms were responsible for the asymmetric suppression of responses to targets at best location observed in the present study, azimuth tuning of responses to leading and lagging targets should be substantially shifted relative to that for single sounds, and tuning of responses to lagging targets should be more affected than that for leading targets. These predictions were evaluated by examining high-resolution azimuth tuning curves for single, leading, and lagging targets obtained from a sample of 52 units. High-resolution tuning curves were obtained by recording responses to targets at a set of virtual locations spanning the peak of the SRF in 1° increments, either alone, or in the presence of a masker at a fixed location outside the SRF peak, and a lead-lag delay of 3 ms (see methods, Data analysis).
Although the peaks of the azimuth tuning-curves for leading and lagging targets were often shifted relative to the peak of the single source tuning-curve, the observed shifts were small. Examples are shown in Fig. 15, A and B. Units 719XQ (Fig. 15A) and 916CC (Fig. 15B) exhibited temporally symmetric and asymmetric masker effects, respectively. In both cases, the azimuth tuning curves for leading and lagging targets are shifted by 3 or 4 degrees, relative to the azimuth tuning curves for single sources, and in a direction opposite to that of the masker (+20° azimuth in both cases). Unit 719QF (Fig. 15C) exhibited pronounced temporally asymmetric suppression. Nevertheless, the weak responses to lagging targets had similar azimuth tuning to responses to single and leading targets, with the best azimuths differing by only 1°. Responses of this unit and unit 916CC suggest that asymmetric suppression does not result from a shift of the effective ITD, at the level of the auditory nerve, to values outside the range that drive neuronal responses to single sources.
These effects were quantified for the entire neuronal sample by calculating the change in best azimuths for single sources and either leading or lagging targets (see methods). Figure 16A shows the distributions of tuning curve shifts for all units with azimuth-tuned responses to leading (top) and lagging (bottom) targets that met the selection criteria. The directions of tuning-curve shifts are expressed relative to the azimuth of the masker, with positive values indicating a shift away from its location. As illustrated in the preceding examples, the shifts in azimuth tuning observed across the entire unit sample were generally small, with absolute values averaging 2.9° (range: 0–10°) for responses to leading targets and 3.0° (range: 0–11°) for responses to lagging targets, and were most often directed away from the masker. Forty-one units had adequately characterized, azimuth-tuned responses to lagging and leading targets. Among these units, any observed shifts in azimuth tuning were generally of equivalent magnitude and direction in the 2 conditions. In Fig. 16B the magnitude of the tuning-curve shift for lagging targets is plotted against that for leading targets for each unit. The scatter of points falls close to the unity line (dashed). The data scatter was well fit (r² = 0.75) by a regression line with slope of 0.91 and y-intercept of −0.22°. Neither the slope nor the y-intercept of the calculated regression was significantly different from that of the unity line (slope: P = 0.205; intercept: P = 0.454, t-test). Because many units that exhibited temporally asymmetric suppression had azimuth-tuned responses to leading targets, but not lagging ones, units exhibiting temporally symmetric responses are overrepresented in the sample used for this comparison. Nevertheless, in 10/41 units with azimuth-tuned responses to lagging targets (Fig. 16B, diamonds), the maximum response to lagging targets was less than half that for leading targets, indicating temporally asymmetric suppression. There was no obvious difference in effects between these units and the rest of the sample. Thus there is no evidence that temporally asymmetric suppression is associated with a corresponding asymmetric change in azimuth tuning, as would be expected if asymmetric suppression resulted from interactions within peripheral filters. To the contrary, the symmetry of tuning curve shifts for leading and lagging targets suggests that this effect is caused by acoustic summation of leading and lagging sounds that produces approximately equal changes in binaural cues, regardless of which sound leads.
The results demonstrate that the responses of space-specific neurons to targets at their best locations are reduced by the presence of leading or lagging maskers located outside their SRFs. In the most spatially selective neurons, the suppression was temporally asymmetric, being stronger when the masker led than when it lagged. This phenomenon is a potential neuronal correlate of the PE. The suppression of responses to lagging targets has also been documented in previous studies using transient stimuli in several central auditory structures in mammals (Fitzpatrick et al. 1995, 1999; Litovsky and Delgutte 2002; Litovsky and Yin 1998a, b; Mickey and Middlebrooks 2001; Reale and Brugge 2000; Yin 1994) and in the barn owl's ICx (Keller and Takahashi 1996b). As a result of temporally asymmetric masking, a substantial proportion of the more spatially selective units could signal the presence of a leading target, but not a lagging target at short lead-lag delays. The least spatially selective units exhibited temporally symmetric effects, in which the reduction of responses was equivalent when the masker led or lagged.
Spatial dependence of lead-source effects
Several previous studies have addressed the dependence of lead-source effects on spatial cues, using transient stimuli. In the IC and auditory cortex of awake rabbits, suppression of responses to lagging sounds was commonly observed in both ITD-sensitive and ITD-insensitive neurons (Fitzpatrick et al. 1995, 1999). In most ITD-sensitive neurons, leading sounds presented at either the best or worst ITD suppressed responses to a lagging sound at the best ITD. Approximately equal proportions of neurons in both structures exhibited stronger suppression when the leading sound was at the best ITD (“best/best” configuration) or at the worst ITD (“worst/best” configuration). Other studies have used either free-field presentation (Litovsky and Yin 1998b) or VAS stimuli (Litovsky and Delgutte 2002; Reale and Brugge 2000), generated using a standard set of HRTFs, to examine the spatial dependence of lead-source effects in the same structures in anesthetized cats (barbiturates: Reale and Brugge 2000; urethane: Litovsky and Delgutte 2002). As in the rabbit, responses of spatially sensitive neurons to lagging sounds were usually suppressed by leading sounds at either the best or worst location. However, in cat IC and auditory cortex, large majorities of such units exhibited stronger lag-response suppression in the best/best configuration than when the leading and lagging sounds were at different locations. It is not clear to what extent the apparent differences in proportions of different types of spatial interactions observed in cats and rabbits reflect true species differences, or differences in anesthetic state or stimulus presentation methods.
All of the units tested for the spatial dependence of lead effects in the present study exhibited unequivocal suppression of responses when the leading sound was located outside of the SRF. This effect is analogous to worst/best suppression reported in previous studies using transient stimuli. Comparison of effects of leading and lagging maskers, located outside the SRF, on responses to targets at the best location revealed that the worst/best suppression observed in the least spatially selective neurons was temporally symmetric, and thus likely to result from the effects of acoustic superposition of leading and lagging waveforms on binaural cues. Because ITD and ILD in barn owls vary predominantly along the azimuthal and elevational dimensions, respectively, such interactions would be expected to have the greatest effects on ITD cues when masker and target differ in azimuth, and on ILD cues when masker and target differ in elevation. In the more spatially selective neurons, worst/best suppression was often temporally asymmetric. This type of interaction requires an additional mechanism that is sensitive to the temporal order of masker and target. Regardless of the mechanism, temporally asymmetric suppression, occurring when a direct sound and reflection arrive from different locations, may be useful to reduce ambiguity about the location of the direct (leading) source.
In anesthetized cats, the strongest lead-evoked suppression is typically observed when the masker and target are both at a unit's best location (Litovsky and Delgutte 2002; Litovsky and Yin 1998a, b; Yin 1994). It is not clear how such best/best suppression would be expected to affect responses to the stimuli used in the present study, given that the temporal overlap of leading and lagging noise bursts at the same location generates what is essentially a single sound with the same binaural cues as the target alone. The combination sound is slightly louder (1.5 dB) and longer than the single target and has a rippled spectrum throughout the overlap period. Our data demonstrate that these alterations have little effect on the response, relative to that in the target-alone condition.
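The spectral ripple mentioned above follows from adding a sound to a delayed copy of itself. The sketch below assumes an idealized case in which the lagging sound is an identical, equal-amplitude copy of the leading sound, so that the power gain at frequency f for delay τ is 2 + 2cos(2πfτ). The function names are hypothetical and the calculation is illustrative only; it is not intended to reproduce the level difference reported for the actual stimuli.

```python
import math

def comb_power_gain_db(freq_hz, delay_s):
    """Power gain in dB, relative to a single sound, of x(t) + x(t - delay)
    at one frequency, assuming identical equal-amplitude copies."""
    gain = 2.0 + 2.0 * math.cos(2.0 * math.pi * freq_hz * delay_s)
    if gain <= 0.0:
        # Complete cancellation at a spectral notch.
        return float("-inf")
    return 10.0 * math.log10(gain)

def ripple_spacing_hz(delay_s):
    """Frequency spacing between adjacent spectral peaks (or notches)."""
    return 1.0 / delay_s

# With a 1-ms delay, peaks fall at multiples of 1 kHz and notches midway.
peak_db = comb_power_gain_db(1000.0, 0.001)
notch_db = comb_power_gain_db(500.0, 0.001)
```

Under these assumptions, longer delays produce more closely spaced ripples (for example, 200-Hz spacing at a 5-ms delay).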
Site of origin of temporally asymmetric suppression
The association with spatial selectivity suggests that temporally asymmetric masking results from processing within the lateral subdivisions of IC. The single-peaked SRFs of ICx neurons are generated by the combination of binaural inputs in the lateral subdivisions of IC. In the barn owl, neuronal and behavioral sensitivity to the location of sounds in azimuth and elevation depend on sensitivity to ITD and ILD, respectively (Euston and Takahashi 2002; Knudsen and Konishi 1979; Moiseff 1989; Moiseff and Konishi 1983, 1981; Spezio and Takahashi 2003). Inputs from neurons sensitive to ITD and ILD first converge in ICc-ls (Adolphs 1993; Takahashi et al. 1989), where nearly all units are sensitive to both cues (Mazer 1995). The ITD-sensitive input originates exclusively from neurons in the ICc-core that are sensitive to ITD in narrow-frequency bands (Takahashi et al. 1989; Wagner et al. 1987). Because ITD tuning results from detection of ongoing interaural phase relationships (Carr and Konishi 1990; Moiseff and Konishi 1981), the ITD-sensitivity functions of ICc-core neurons are characterized by equal-amplitude peaks occurring at integer multiples of the period of the neurons' best frequencies (Wagner et al. 1987). This form of ITD sensitivity gives rise to SRFs in which maximal firing is distributed along multiple vertical stripes, spaced approximately equally in azimuth (Euston and Takahashi 2002). The generation of single-peaked SRFs thus requires the selective elimination of extraneous peaks from ITD-tuning functions. This process, termed “side-peak suppression,” involves cross-frequency convergence of ITD-sensitive inputs (Takahashi and Konishi 1986) and depends on GABAergic inhibitory mechanisms within ICc-ls and ICx (Fujita and Konishi 1991). The transformation of ILD sensitivity required to generate SRFs with single peaks, restricted in elevation, is not as well understood. 
The major ILD-sensitive input to ICc-ls originates from neurons in nucleus ventralis lemnisci lateralis, pars posterior (VLVp) that have sharp frequency tuning and sigmoid ILD-sensitivity functions (Manley et al. 1988). Unlike VLVp, many neurons in ICc-ls and ICx have peaked ILD-tuning functions (Moiseff and Konishi 1983). However, recent findings suggest that the generation of selectivity for elevation within ICc-ls and ICx involves the selective combination of peaked and sigmoid ILD-sensitive inputs from different frequency bands (Euston and Takahashi 2002; Spezio and Takahashi 2003).
Available evidence suggests that spatial selectivity emerges gradually within ICc-ls and ICx, and not in discrete steps associated with each structure. A comprehensive mapping study demonstrated that several measures of binaural sensitivity, associated with refinement of spatial selectivity, vary systematically as functions of both lateral position within ICc-ls and ICx, and response latency (Mazer 1995). Neurons in medial ICc-ls tended to have ITD-tuning functions with prominent side peaks, broad ILD tuning, sharp frequency tuning, and short latencies. As recordings were made more laterally, into the ICx, there was a progressive increase in ITD side-peak suppression, sharpness of ILD tuning, breadth of frequency tuning, and response latency. The spatial tuning characteristics and latency distributions of units from which detailed data sets were obtained in the present study are consistent with the properties of units in lateral ICc-ls and ICx reported by Mazer (1995). All of our units had SRFs containing a single dominant peak, although there was variation among units in the relative prominence of spatial side peaks and in the elevational extent of the main peak. Our observation that the least spatially selective units invariably exhibited symmetric masking, and that strong asymmetric suppression was evident only in more spatially selective units, suggests that the latter property emerges, in parallel with the refinement of spatial selectivity, through processing within ICc-ls and ICx. This view is also consistent with the finding that, at delays >2 ms, the extent of response asymmetry is significantly correlated with response latency, and with the observation that, among the more selective units, the extent of response asymmetry appears to increase systematically as a function of spatial selectivity.
Because recordings were obtained only from lateral subdivisions of IC, these data do not rule out the alternative possibility that asymmetric suppression is generated before ICc-ls and ICx and passed on, selectively, to the more spatially selective neurons. However, this possibility seems less likely because, to bypass neurons with SRFs containing partially suppressed side peaks, there would need to be direct projections from ICc-core and/or VLVp to the ICx, for which there is currently no evidence (Adolphs 1993; Takahashi and Keller 1992; Takahashi et al. 1989). In fact, ICx appears to receive input exclusively from ICc-ls (Knudsen 1983). Nevertheless, to establish conclusively whether asymmetric suppression originates within these structures, it will be necessary to record from ICc-core and VLVp.
Physiological basis of temporally asymmetric masking
Previous studies have demonstrated lead-evoked suppression of responses to lagging sounds in binaural or spatially sensitive neurons within a variety of central auditory structures (Fitzpatrick et al. 1995, 1999; Keller and Takahashi 1996b; Yin 1994). Several authors have proposed neuronal inhibitory mechanisms to account for these results (e.g., Fitzpatrick et al. 1995; Yin 1994), and for psychophysical PE phenomena (Harris et al. 1963; Lindemann 1986; Zurek 1987). Recent modeling studies, however, have demonstrated that many of the neuronal and psychophysical effects observed in previous studies could result from interactions of leading and lagging sounds within peripheral filters (Hartung and Trahiotis 2001; Trahiotis and Hartung 2002), or through asymmetric temporal weighting of inputs by neurons at the initial site of binaural interaction (Tollin 1998). The use of sounds with durations longer than the lag delay in the present study introduced an additional source of masking effects: the degradation of binaural cues resulting from acoustic superposition of leading and lagging sounds. Such acoustic effects are likely to cause temporally symmetric masking in all units. However, additional mechanisms are required to explain the temporally asymmetric masking observed at long delays in the most spatially selective units, and at the shortest delays in nearly all units.
Interactions within peripheral filters are unlikely to have made a major contribution to the suppression of responses to lagging targets at delays >2 ms in the present study. At frequencies above 4 kHz, which provide the basis for sound localization (Knudsen and Konishi 1979; Konishi 1973) and neuronal spatial selectivity (Knudsen and Konishi 1978b; Moiseff and Konishi 1983) in barn owls, the ringing response is largely attenuated within a few milliseconds after its peak. Mechanisms based on either peripheral interactions or temporal weighting at the initial site of binaural interaction are also inconsistent with the observed relation between asymmetric suppression and spatial selectivity. Such interactions may, however, explain the temporally asymmetric suppression exhibited by all neurons at delays of 1 and 2 ms. Although response adaptation may have contributed to the asymmetric suppression observed in the most spatially selective neurons, this mechanism cannot account for the suppression of the responses to the trailing segments of lagging targets at delays from 5 to 50 ms. Finally, the fact that the small shifts of azimuth tuning were equal for leading and lagging targets (Figs. 15 and 16) provides further evidence that peripheral interactions do not play a major role in asymmetric suppression at delays ≥3 ms.
The asymmetric masking observed at delays >2 ms is consistent with long-acting inhibition evoked by the leading sound. GABAergic inhibitory neurons and terminals are abundant throughout the brain stem pathways involved in coding ITD and ILD, including the ICc-ls and ICx (Carr et al. 1989). ITD side-peak suppression in space-specific neurons can be selectively counteracted by local application of GABA antagonists (Fujita and Konishi 1991), providing direct evidence that spatial selectivity is shaped by lateral inhibition within the ICx. Furthermore, side-peak suppression develops over the first few milliseconds of neuronal responses in ICx (Wagner 1990), suggesting that lateral inhibitory inputs act with a delay relative to excitatory inputs. Delayed lateral inhibition would be expected to cause greater suppression of responses to lagging targets, if the inhibitory neurons are located within ICx, and are themselves subject to lateral inhibition. In this case, the first arriving sound would excite neurons at the appropriate space map location and inhibit responses to later-arriving sounds at other locations, including inhibitory neurons that would otherwise inhibit responses to the first sound.
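The qualitative argument for delayed, reciprocal lateral inhibition can be illustrated with a toy simulation. This is a deliberately simplified sketch and not a model fitted to, or proposed in, the present study: the two threshold-linear units, the 1-ms time step, the inhibitory delay, and the inhibitory weight are all hypothetical choices made only to show how onset order can break the symmetry between two otherwise identical units.

```python
def simulate_pair(lag_delay, duration=100, inhib_delay=2, weight=1.5):
    """Summed responses (lead_unit, lag_unit) of two mutually inhibiting
    threshold-linear units. Time advances in hypothetical 1-ms steps; each
    unit is driven by its own source and inhibited by the other unit's
    activity from inhib_delay steps earlier."""
    n_steps = lag_delay + duration + inhib_delay + 1
    lead = [0.0] * n_steps
    lag = [0.0] * n_steps
    for t in range(n_steps):
        drive_lead = 1.0 if t < duration else 0.0
        drive_lag = 1.0 if lag_delay <= t < lag_delay + duration else 0.0
        past = t - inhib_delay
        # Inhibition reflects the other unit's earlier output.
        inh_on_lead = weight * lag[past] if past >= 0 else 0.0
        inh_on_lag = weight * lead[past] if past >= 0 else 0.0
        lead[t] = max(0.0, drive_lead - inh_on_lead)
        lag[t] = max(0.0, drive_lag - inh_on_lag)
    return sum(lead), sum(lag)

# Lead source precedes the lag source by 5 steps.
lead_total, lag_total = simulate_pair(5)
# Simultaneous onsets: identical drive to both units.
sim_a, sim_b = simulate_pair(0)
```

In this sketch the unit driven first silences the other before the other's inhibition can take effect, so the lag-driven unit responds only after the leading sound ends. With simultaneous onsets the two units receive identical drive and the toy model is symmetric by construction.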
Relation to behavioral precedence phenomena
Human psychophysical studies have typically used one of 3 measures to quantify the PE: perceptual fusion, lateralization/localization dominance, and discrimination suppression (reviewed in Blauert 1997; Litovsky et al. 1999). A recent study measured all 3 phenomena as a function of lag delay using the same stimuli and subjects (Litovsky and Shinn-Cunningham 2001). The results demonstrated that, for transient stimuli, a single source is perceived at delays from 1 to 5 ms. Between 5 and 10 ms, subjects perceived the lagging sound as a separate event, but mislocalized it to a lateral position nearest to the leading source. The impairment of spatial discrimination ability for lagging sounds followed the same time course as the effects on perceived lateral position. These findings suggest that different mechanisms underlie perceptual fusion and the localization-dependent precedence phenomena (localization dominance and discrimination suppression).
Neuronal precedence phenomena, such as those documented in the present study, may contribute to both categories of behavioral phenomena. Previous neurophysiological studies have addressed fusion by attempting to relate human thresholds for resolving lagging sounds, referred to as “echo threshold,” to neuronal recovery thresholds determined using a 50% recovery criterion. Measured in this way, the echo thresholds of most neurons in the IC of anesthetized cats (Yin 1994) and auditory cortex of awake rabbits (Fitzpatrick et al. 1999) exceeded human echo thresholds. Putting aside, for the moment, the issue of species differences, such findings could be interpreted to suggest that echo thresholds are determined by the small proportion of neurons with the shortest thresholds. Results of the signal-detection analysis in the present study suggest an alternative interpretation. In this case, neuronal detection was quantified by measuring the ability to signal the presence of the target through a reliable increase in firing rate relative to the response to the masker alone. In most neurons, partially recovered responses were sufficient to support high levels of detection performance. Thus neuronal detection thresholds calculated using signal detection metrics in this study were shorter than those obtained using a 50% recovery criterion. If the same is true in mammals, the level of correspondence between neuronal and behavioral echo thresholds may be greater than was previously suspected. In the barn owl, there was a substantial difference in the proportion of highly selective units exhibiting suprathreshold detection performance [0.75 p(c) criterion] for leading and lagging targets at the shortest delays. As the delay was increased, the proportion of units that could detect lagging targets increased. At delays of 10 to 20 ms, the proportion of suprathreshold units was equal to that for leading sounds at a delay of 1 ms. 
This finding suggests that the lagging sound can be resolved as a separate event when a substantial proportion of units are capable of signaling its presence through reliable increases in firing rate. Of course, because the time course of lead-evoked suppression in anesthetized preparations (e.g., Reale and Brugge 2000; Yin 1994) appears to be longer than that in awake animals (Fitzpatrick et al. 1995, 1999), it may not be possible to relate the delay dependence of suppression observed in the present study directly to that of behavioral PE measures. Establishing a more conclusive link between neuronal responses and echo thresholds will require comparable neuronal and behavioral measures obtained using the same stimuli in the same species. Such a comparison will require development of a behavioral measure of fusion, suitable for animal studies, that is not dependent on localization/lateralization judgments.
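The contrast between a recovery criterion and a detection criterion can be made concrete with a short sketch of a signal-detection estimate. The procedure actually used is described in the methods; the version below assumes a common approach in which percent correct, p(c), is estimated as the area under the ROC curve formed from spike counts on target-plus-masker trials and masker-alone trials. The function names and spike counts are hypothetical.

```python
def percent_correct(target_counts, masker_counts):
    """Estimate p(c) as the area under the ROC curve: the probability that
    a target-plus-masker trial yields more spikes than a masker-alone
    trial, counting ties as one-half."""
    wins = 0.0
    for t in target_counts:
        for m in masker_counts:
            if t > m:
                wins += 1.0
            elif t == m:
                wins += 0.5
    return wins / (len(target_counts) * len(masker_counts))

def is_suprathreshold(pc, criterion=0.75):
    """Apply the 0.75 p(c) criterion for reliable detection."""
    return pc >= criterion

# Hypothetical spike counts: the target adds only a few spikes per trial,
# yet the increase is reliable across trials.
with_target = [5, 6, 7, 6]
masker_alone = [3, 4, 5, 4]
pc = percent_correct(with_target, masker_alone)
```

Because this measure depends on the reliability of the increase in firing rather than on its size, a response that has recovered to well below 50% of its unmasked magnitude can still exceed the detection criterion.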
The effect of leading sounds on the perceived location of lagging sounds may be related to the reduction in the magnitude of neuronal responses, which extends to longer delays than do the effects on neuronal detection ability. Explanations of localization-based PE phenomena have been proposed within the context of binaural models in which perceived location is based on a “position” variable derived from the pattern of activity across an array of binaural coincidence detectors (Colburn 1973, 1977; Stern and Colburn 1978). Although recent physiological studies (McAlpine and Grothe 2003; McAlpine et al. 2001) have questioned the general applicability of this model in mammals, there is abundant evidence for the proposed topographic representations of ITD (Carr and Konishi 1990; Wagner et al. 1987) and auditory space (Knudsen and Konishi 1978a) in the owl's central auditory pathways. If the position variable is computed as the centroid of activity along the ITD/azimuth dimension (Hartung and Trahiotis 2001; Stern and Colburn 1978; Trahiotis and Stern 1994), localization dominance will result from any mechanism producing greater effective activity on the array near the ITD of the leading source. Potential mechanisms include preferential weighting of earlier evoked activity by the decision process (Shinn-Cunningham et al. 1993), lateral inhibition (Lindemann 1986), or interactions of responses to leading and lagging sounds at early stages of peripheral (Hartung and Trahiotis 2001) or central (Tollin 1998) auditory processing. Whatever the physiological mechanism, localization dominance results from an effective weighted averaging of ITDs from the 2 sources, favoring the leading source. Discrimination suppression reflects the fact that a greater change in the location of a lagging source is required to change the perceived location by a just-detectable amount than is the case for leading or single sources. 
This model is supported by the finding that relative weighting factors, calculated from performance of human listeners in lateralization experiments with precedence stimuli, predict performance in ITD discrimination tasks using the same stimuli (Shinn-Cunningham et al. 1993).
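The position-variable account can be sketched numerically. The code below is an illustration under simple assumptions, not an implementation of any of the cited models: the position variable is taken to be the activity-weighted centroid of an array of units labeled by best ITD, and localization dominance is expressed as a weighted average of the two source ITDs with a hypothetical weight `c` for the leading source (c = 0.5 would indicate no precedence, c = 1 complete dominance by the lead).

```python
def centroid(best_itds_us, activity):
    """Activity-weighted centroid along the ITD axis (microseconds)."""
    total = sum(activity)
    if total <= 0:
        raise ValueError("total activity must be positive")
    return sum(b * a for b, a in zip(best_itds_us, activity)) / total

def weighted_itd(itd_lead_us, itd_lag_us, c):
    """Effective ITD as a weighted average favoring the lead when c > 0.5."""
    return c * itd_lead_us + (1.0 - c) * itd_lag_us

# Hypothetical example: lead at -50 us, lag at +50 us, with the lead
# evoking three times as much activity as the lag.
position = centroid([-50.0, 50.0], [3.0, 1.0])
equivalent = weighted_itd(-50.0, 50.0, 0.75)
```

In this example a 3:1 ratio of lead-evoked to lag-evoked activity is equivalent to a weight of 0.75 on the leading source. The same weight implies that a given displacement of the lagging source moves the position variable only one-third as far as the same displacement of the leading source, which is the sense in which discrimination of the lag is suppressed.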
The same conceptual framework can be applied to the barn owl by substituting the auditory space map for the binaural array (Saberi et al. 1998). The present results demonstrate that, among the more spatially selective units in IC, activity evoked by a leading source is either equal to or greater than that evoked by a lagging source. The centroid of space-map activity will thus correspond to a location near the leading source, but displaced in the direction of the lagging source, resulting in localization dominance. Reductions in both the magnitude and the reliability of lag-evoked responses may contribute to lag-discrimination suppression. In addition, the smaller impairment of azimuth discrimination for leading sounds, observed with 25- and 100-ms stimuli (Spitzer et al. 2003), may result from the reduction of lead-evoked activity caused by acoustic superposition and the corresponding reduction of response reliability at short delays.
Recent findings demonstrating close agreement between behavioral and average neuronal performance suggest an alternative model of spatial discrimination in barn owls, based on point-by-point comparison of the neurophysiological images on the space map (Bala et al. 2003). In this case, the relative reductions in reliability of lead- and lag-evoked activity could produce the observed pattern of discrimination impairments by reducing the reliability of differences in the map images of test stimuli that differ in location of either leading or lagging sources. Because the resolution of both the image-comparison and position-variable models ultimately depends on the spatial resolution of space-specific neurons, both models might be expected to generate similar discrimination results. Whichever model is used, neither precedence phenomenon would be predicted from the symmetric responses of the less spatially selective IC neurons. Thus the asymmetric suppression of responses to lagging sounds, which appears to develop in parallel with the refinement of spatial selectivity within ICx, may provide the basis for localization-based precedence phenomena in barn owls. Determining whether such neuronal effects are sufficient to account for localization-based precedence phenomena will require further experiments that enable comparison of behavioral measures of the PE with neuronal responses to the same stimuli recorded from awake subjects.
This work was supported by National Institute on Deafness and Other Communication Disorders Grants DC-03925 and DC-00448 and the Medical Research Foundation of Oregon.
K. Keller provided assistance with HRTF measurement and programming the stimulus/data acquisition system.
Copyright © 2004 by the American Physiological Society