¹Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston; ²Speech and Hearing Bioscience and Technology Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Cambridge; and ³Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts
Submitted 26 October 2004; accepted in final form 24 April 2005
ABSTRACT
INTRODUCTION
Psychophysical studies have shown that spatial release from masking (SRM) occurs at all stimulus frequencies, although the mechanisms appear to differ at low and high frequencies. At high frequencies (>1.5 kHz), SRM appears to be achieved primarily through the monaural changes in signal-to-noise ratio (SNR) resulting from directionally dependent filtering by the head and pinna. In contrast, signal detection at low frequencies appears to be based on binaural hearing, because masked thresholds in binaural listening are substantially better than those obtained by listening monaurally through either ear (Gilkey and Good 1995; Good et al. 1997). Here, we focus on low frequencies, which are important for speech recognition and are often spared in hearing-impaired listeners.
The ability of the binaural system to improve signal detection at low frequencies has been studied extensively by psychophysicists using the binaural masking level difference (BMLD) paradigm (for a thorough review see Durlach and Colburn 1978). In the most common BMLD experiments, identical noise is presented to both ears (N0), and the masked threshold for a signal (usually a pure tone) presented in phase at the two ears (S0) is compared with the threshold for a signal presented out of phase at the two ears (Sπ). These are the N0S0 and N0Sπ conditions, respectively, and the N0Sπ condition yields better thresholds because of the additional interaural phase cue. The difference in masked thresholds between these two conditions (the BMLD) is about 12–15 dB for a low-frequency (≈500 Hz) pure tone in broadband noise. When the noise is antiphasic instead of the signal (i.e., NπS0 compared with N0S0), there is also a substantial, although slightly smaller, BMLD of around 9–10 dB. Because the signal-to-noise ratio at each ear is the same in all BMLD conditions, the improvements in detectability must be attributed to the listener's ability to use the binaural system to exploit interaural timing cues.
A number of mathematical models are able to predict most psychophysical results on BMLDs (Colburn 1996; Colburn and Durlach 1978). Here, we focus on the cross-correlator model originally envisioned by Jeffress (1948) and developed by Colburn (1973, 1977a,b), because processing similar to that described by the model appears to occur in the medial superior olive (MSO) and because predictions of this model for the BMLD paradigm have been extensively studied. In this model, the stimulus waveform to each ear is processed by an auditory nerve fiber (ANF) model. Pairs of ANFs with the same characteristic frequency (CF), one from each ear, provide inputs to an array of delay lines and coincidence detectors, which fire when the neural inputs from the two sides coincide. The delay lines allow each coincidence detector to respond maximally to a different interaural time delay (ITD), its characteristic delay. Effectively, this model performs an instantaneous cross-correlation of the ANF responses, with each coincidence detector evaluating the correlation at its characteristic delay.
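The cross-correlation idea can be illustrated with a toy coincidence array. The sketch below is not Colburn's model: it assumes half-wave rectification as a stand-in for the ANF stage (no cochlear filtering, no spike generation) and uses the time-averaged product of the two rectified inputs as each unit's "coincidence count." The function name `coincidence_array` and all parameter values are hypothetical.

```python
import numpy as np


def coincidence_array(left, right, max_lag):
    """Time-averaged product of rectified inputs for each characteristic delay.

    A unit with characteristic delay `lag` (in samples) compares left[t] with
    right[t + lag], so its output peaks when the right ear lags by `lag`.
    """
    l = np.maximum(np.asarray(left, dtype=float), 0.0)   # crude left "ANF" drive
    r = np.maximum(np.asarray(right, dtype=float), 0.0)  # crude right "ANF" drive
    n = len(l)
    lags = np.arange(-max_lag, max_lag + 1)
    out = np.empty(lags.size)
    for k, lag in enumerate(lags):
        if lag >= 0:
            out[k] = np.mean(l[: n - lag] * r[lag:])
        else:
            out[k] = np.mean(l[-lag:] * r[: n + lag])
    return lags, out


# Usage: broadband noise with the right ear lagging by 20 samples (200 µs at 100 kHz).
rng = np.random.default_rng(0)
noise = rng.standard_normal(20_000)
lags, response = coincidence_array(noise, np.roll(noise, 20), max_lag=60)
best_lag = int(lags[np.argmax(response)])   # unit whose characteristic delay matches the ITD
```

The unit whose characteristic delay equals the imposed ITD responds most strongly, which is the sense in which the array "evaluates the correlation at its characteristic delay."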
Colburn (1973, 1977a,b) studied how a cross-correlator network responds to BMLD stimuli. Briefly, coincidence detector units with a characteristic delay of 0 (0-ITD units) respond maximally to the N0 stimuli, whereas the remaining units produce a weaker response. When a signal is added in phase (N0S0), the firing rate of 0-ITD units tuned to the tone frequency increases, allowing the signal to be detected. If the signal is instead added out of phase (N0Sπ), the response of the 0-ITD units tuned to the tone frequency decreases, because the signal decorrelates the inputs to the 0-ITD units at that frequency. This decrease in firing rate is more easily detected than the rate increase in the N0S0 case, thereby giving rise to the BMLD. In contrast, for the Nπ stimulus, units with characteristic delays equal to one-half the period of their CF ("π-units") respond maximally. When the signal is added in phase (NπS0), the response of the π-units tuned to the signal frequency decreases, because the signal reduces the correlation seen by the π-units. By looking at the response changes across the neural population, Colburn's model successfully predicts most psychophysical results for BMLDs, including the threshold hierarchy in which the N0Sπ threshold is best, followed by NπS0 and then N0S0.
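Why an antiphasic signal lowers the response of 0-ITD units can be seen from the normalized interaural correlation of the ear signals. The sketch below ignores peripheral filtering, so it computes the broadband correlation rather than the correlation within the channel tuned to the tone. With a noise of power P_n and an uncorrelated tone of power P_s, N0S0 leaves the correlation at 1, whereas N0Sπ reduces it to approximately (P_n − P_s)/(P_n + P_s). The sample rate, duration, and 0-dB SNR are hypothetical choices.

```python
import numpy as np


def interaural_corr(left, right):
    """Normalized zero-lag correlation, the quantity a 0-ITD unit tracks in this idealization."""
    return float(np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right)))


fs = 20_000                                          # sample rate (Hz), hypothetical
t = np.arange(fs) / fs                               # 1 s
rng = np.random.default_rng(1)
noise = rng.standard_normal(t.size)                  # N0: identical noise at both ears
tone = np.sqrt(2.0) * np.sin(2 * np.pi * 500 * t)    # 500-Hz tone with unit power

rho_n0s0 = interaural_corr(noise + tone, noise + tone)    # S0: tone in phase
rho_n0spi = interaural_corr(noise + tone, noise - tone)   # Spi: tone inverted in one ear

p_n, p_s = np.mean(noise ** 2), np.mean(tone ** 2)
predicted = (p_n - p_s) / (p_n + p_s)                # close to 0 here, since p_n is close to p_s
```

At 0-dB SNR the antiphasic tone drives the broadband correlation from 1 to about 0, whereas the in-phase tone leaves it unchanged; in the model, that loss of correlation is what reduces the 0-ITD units' firing.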
Although controversy exists over the precise neural mechanisms, the neurons in the MSO appear to perform a cross-correlation operation similar to the one hypothesized by Jeffress and Colburn (Batra and Yin 2004; Batra et al. 1997; Yin and Chan 1990). As expected for a coincidence detector, units in the MSO receive binaural excitatory inputs and are sensitive to the ITD of tones and noise, and their best interaural phase can be predicted from the phases of the monaural responses (Batra et al. 1997; Goldberg and Brown 1969; Yin and Chan 1990). ITD-sensitive units in the MSO form a major projection to the inferior colliculus (IC; Loftus et al. 2004), where units with similar ITD sensitivity are also found (see Spitzer and Semple 1998; Yin and Chan 1990 for a comparison between MSO and IC unit responses). Because essentially all of the ascending auditory pathways synapse in the tonotopically organized central nucleus of the IC, the IC is the first nucleus after the cochlear nucleus to contain nearly all of the information available to the auditory system, and it is the first to receive inputs from binaural nuclei. This convergence of information makes the IC an interesting and convenient nucleus for studying the neural mechanisms of SRM. Furthermore, the responses of units in the IC almost certainly reflect additional processing beyond that of the MSO, and this processing, whose nature is still poorly understood, is likely to be behaviorally relevant.
An extensive series of neurophysiological studies of the BMLD (e.g., Jiang et al. 1997a,b; McAlpine et al. 1996; Palmer et al. 1999, 2000) tested some of the predictions of the Colburn model for low-frequency units in the IC of anesthetized guinea pigs. Using 500-Hz pure tones in broadband noise, the authors showed that the masked thresholds of individual units vary with the interaural phases and interaural delays of the signal and masker (McAlpine et al. 1996). Furthermore, they measured neural thresholds for the N0S0 and N0Sπ conditions for both individual units and populations of units (Jiang et al. 1997a,b). They showed that individual units could show positive or negative BMLDs, but that when averaged across their sample, the thresholds were better for the N0Sπ condition than for the N0S0 condition, similar to the psychophysical results. As expected from the Colburn model, the unit population with the best average thresholds had an excitatory response in the N0 condition and showed a decrease in firing rate when the antiphasic signal Sπ was added. Additionally, for a majority of neurons, the decreases in rate caused by the addition of Sπ were similar to the changes in response seen for noise when the interaural correlation is reduced (Palmer et al. 1999), validating the general concept of the cross-correlator model. Finally, for a different unit sample, Palmer et al. (2000) showed that the NπS0 thresholds averaged across all units were better than those seen for N0S0; however, the BMLDs were smaller than those seen with N0Sπ. Overall, the average unit N0S0 neural thresholds are worse than the NπS0 thresholds, which are worse than the N0Sπ thresholds, consistent with the psychophysical threshold hierarchy. Additionally, the responses of low-frequency IC units are generally consistent with the cross-correlator model. However, these authors reported that some units seemed to reflect the effects of additional inhibition; the responses of these units could not be entirely predicted by a simple cross-correlator model. Also, in some cases, the individual unit thresholds did not match the trends seen in the average unit thresholds.
The physiological studies of the BMLD provide a useful link between the cross-correlator model and the behavioral results in BMLD experiments, but the BMLD paradigm is somewhat unnatural. First, the signal in BMLD experiments is a pure tone, which contains only one frequency and has a flat temporal envelope, whereas most natural sounds are broadband and have large amplitude modulations. Moreover, in the antiphasic conditions used in BMLD experiments, the stimulus is out of phase at all frequencies, essentially giving a different ITD for each frequency. This stimulus condition differs from the case in which a broadband sound source is placed at a location off the median vertical plane, which gives a constant ITD across all frequencies. In this case, the constant ITD gives rise to a different interaural phase difference (IPD) for each frequency component. Figure 1F shows the sloping phase responses in the left and right ears for a stimulus placed at 90° for a spherical model of the cat head. Clearly, the different slopes resulting from the fixed ITD yield different IPDs at each frequency. Consequently, it is not entirely clear how the physiological BMLD results generalize to the more natural situation in which broadband signals and maskers are placed at different spatial locations. Caird et al. (1991) previously addressed some of these issues by looking at the effect of masker ITD on masked thresholds using either a pure tone or a vowel. The signal was placed at the best delay for each neuron, and the masked threshold was measured as a function of noise ITD. The masked thresholds were shown to reflect the sensitivity of the neuron to the noise ITD, but the effects of placing the signal at different delays were not explored. Consequently, the effect of signal and masker separation was not directly addressed, nor was the effect of placing the signal at the worst delay, the condition predicted by the cross-correlator model to give the best thresholds.
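The point that a fixed ITD produces a frequency-dependent IPD can be made concrete. The sketch assumes the standard low-frequency approximation for a rigid sphere, ITD ≈ 3(a/c) sin θ, with a radius a of 3.4 cm (half the 6.8-cm sphere diameter given in METHODS) and an assumed speed of sound c of 343 m/s; this approximation is not the HRTF model itself. At 90° it gives roughly 300 µs, and the resulting IPD grows in proportion to frequency, wrapping once it exceeds half a cycle.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at room temperature)
HEAD_RADIUS = 0.034      # m, half of a 6.8-cm sphere diameter


def low_freq_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Low-frequency rigid-sphere approximation, ITD = 3 (a / c) sin(theta), in seconds."""
    return 3.0 * radius / c * np.sin(np.radians(azimuth_deg))


def ipd_cycles(freq_hz, itd_s):
    """IPD in cycles produced by a pure delay, wrapped to [-0.5, 0.5)."""
    ipd = np.asarray(freq_hz, dtype=float) * itd_s
    return (ipd + 0.5) % 1.0 - 0.5


itd_90 = low_freq_itd(90.0)
freqs = np.array([250.0, 500.0, 1000.0, 2000.0])
ipds = ipd_cycles(freqs, itd_90)   # one IPD per frequency component, same ITD for all
```

The same ITD yields a small IPD at 250 Hz, about 0.15 cycle at 500 Hz, and a wrapped, near-antiphasic IPD at 2 kHz, in contrast to the BMLD Sπ signal, which has an IPD of exactly half a cycle at every frequency.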
METHODS
Responses of single units in the inferior colliculus of anesthetized cats were recorded using methods similar to those of Litovsky and Delgutte (2002). Healthy adult cats were initially anesthetized with an intraperitoneal injection of Dial in urethane (75 mg/kg), and additional doses were given throughout the experiment to maintain deep anesthesia. Dexamethasone was injected intramuscularly to prevent swelling of neural tissue. The animal's temperature was monitored with a rectal thermometer and maintained at 37–38°C. A tracheal cannula was inserted, both pinnae were partially dissected away, and the ear canals were cut to allow insertion of acoustic assemblies. A small hole was drilled in each bulla, and a 30-cm plastic tube was inserted and glued in place to prevent static pressure from building up in the middle ear. The animal was placed in a double-walled, electrically shielded, soundproof chamber. The posterior surface of the IC was exposed through a posterior fossa craniotomy and aspiration of the overlying cerebellum. Parylene-insulated tungsten stereo microelectrodes (Micro Probe, Potomac, MD) were mounted on a remote-controlled hydraulic microdrive and inserted into the IC. The electrodes were oriented nearly horizontally in a parasagittal plane, approximately parallel to the isofrequency planes (Merzenich and Reid 1974). To improve single-unit isolation, the difference between the signals recorded from the two electrodes, which were separated by 125 µm, was often used as the input to the amplifier and spike timer. Spikes from single units were amplified, and spike times measured with 1-µs resolution were stored in a computer file for analysis and display.
Histological processing for reconstruction of an electrode track was performed for one cat with a particularly large data yield. Only one track was made in this experiment. Every third 40-µm parasagittal section of the IC was immunostained for calretinin to visualize putative projections from the MSO (Adams 1995), and the remaining sections were Nissl-stained. Staining for calretinin is thought to reveal terminals of MSO axons because the MSO is the only auditory structure projecting to the IC in which calretinin labeling is extensive. The electrode track was evident in a Nissl section and in one of the calretinin sections, and it traversed the calretinin-labeled region. The microelectrode depths at which we found units indicate that we were recording from the calretinin region, suggesting that these units received inputs from the MSO. Electrode placements and single-unit responses, including ITD sensitivity, were similar in the other experiments. Therefore most of the units in our sample are likely to receive MSO inputs, as expected from the anatomical results of Loftus et al. (2004) showing large projections of the MSO to the low-frequency IC.
Stimuli
The signal was a 200-ms-long train of broadband chirps with a 40-Hz repetition rate, presented in continuous broadband noise (see Fig. 1, A and C). The frequency of each chirp was swept logarithmically from 300 Hz to 30 kHz, and each chirp had an exponentially increasing envelope designed to produce a flat power spectrum. Consequently, both signal and noise had a relatively flat spectrum from 300 Hz to 30 kHz (Fig. 1, B and D) before they were shaped by the frequency response of the head-related transfer functions (see following text). In some cases, we also used 100-Hz click trains as signals, similar to the stimuli used in the psychophysical literature on SRM (e.g., Gilkey and Good 1995; Saberi et al. 1991); however, units in the IC often responded with higher, more sustained rates to the chirp trains, presumably because of the lower repetition rate of the chirp train. Only results obtained with the 40-Hz chirp trains are presented here.
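A chirp train of this kind can be sketched as follows, assuming, as the flat-spectrum requirement implies, that the amplitude grows as the square root of the instantaneous frequency: a logarithmic sweep spends time in proportion to 1/f per unit bandwidth, so an amplitude proportional to √f, which increases exponentially in time, equalizes the energy per hertz (to the accuracy of the stationary-phase approximation). The 10-ms chirp duration and 100-kHz sample rate are hypothetical; the text specifies only the 300 Hz–30 kHz range, the 40-Hz rate, and the 200-ms train length.

```python
import numpy as np


def log_chirp(f0, f1, dur, fs):
    """Logarithmic sweep from f0 to f1 with amplitude proportional to sqrt(f(t))."""
    t = np.arange(int(round(dur * fs))) / fs
    k = f1 / f0
    inst_freq = f0 * k ** (t / dur)                                   # exponential in time
    phase = 2 * np.pi * f0 * dur / np.log(k) * (k ** (t / dur) - 1.0)  # integral of 2*pi*f(t)
    envelope = np.sqrt(inst_freq / f1)                                # rises exponentially to 1
    return envelope * np.sin(phase)


def chirp_train(rate_hz=40.0, train_dur=0.2, chirp_dur=0.010, fs=100_000,
                f0=300.0, f1=30_000.0):
    """Identical chirps, one at the start of each period, with silence in between."""
    period = int(round(fs / rate_hz))
    chirp = log_chirp(f0, f1, chirp_dur, fs)
    out = np.zeros(int(round(train_dur * fs)))
    for start in range(0, out.size - chirp.size + 1, period):
        out[start:start + chirp.size] = chirp
    return out


train = chirp_train()   # 8 chirps in 200 ms at 40 Hz
```

The silent gap left in each 25-ms period is the kind of interval during which, as discussed in RESULTS, a unit may recover from suppression.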
Because SRM occurs for stimuli in all frequency ranges (Gilkey and Good 1995), we used head-related transfer functions (HRTFs) to simulate sounds at different azimuths. HRTFs allow us to simulate sounds in the free field while retaining complete control over the inputs to the two ears, so that more traditional stimuli, such as monaural stimuli or binaural beats, can also be presented easily. The HRTFs represent the directionally dependent transformations of sound pressure from a specific location in the free field to the ear canal (see Fig. 1, E and F). Virtual-space stimuli were synthesized by filtering the stimuli with the same HRTFs used by Litovsky and Delgutte (2002). These nonindividualized cat HRTFs were measured by Musicant et al. (1990) for frequencies >2 kHz and were simulated by a spherical-head model for frequencies <2 kHz. (The HRTF measurements were valid only for frequencies >2 kHz because of the limitations of the sound system and anechoic room.) The low-frequency HRTFs were the product of two components: 1) a directional component representing acoustic scattering by the cat head, provided by a rigid-sphere model with a diameter of 6.8 cm (Morse and Ingard 1968); and 2) a nondirectional, frequency-dependent gain representing the sound pressure amplification by the external ear, derived from measurements of acoustic impedance in the cat ear canal (Rosowski et al. 1988). Using a frequency-dependent weighting function, the model HRTF for frequencies <2 kHz was joined with the measured HRTF for frequencies >2 kHz to obtain an HRTF covering the 0–40 kHz range.
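The weighting function used to join the model and measured HRTFs is not specified here, so the sketch below assumes a raised-cosine crossfade over a hypothetical 1.5–2.5 kHz transition band centered on the 2-kHz junction. It blends complex HRTF values sampled on a common frequency grid; blending magnitude and unwrapped phase separately would be an alternative design that avoids partial cancellation where the two responses differ in phase.

```python
import numpy as np


def crossfade_weight(freqs_hz, f_lo=1500.0, f_hi=2500.0):
    """Raised-cosine weight: 1 at or below f_lo (model), 0 at or above f_hi (measured)."""
    x = np.clip((np.asarray(freqs_hz, dtype=float) - f_lo) / (f_hi - f_lo), 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * x))


def join_hrtf(freqs_hz, model_hrtf, measured_hrtf, f_lo=1500.0, f_hi=2500.0):
    """Blend two complex HRTFs sampled on the same frequency grid."""
    w = crossfade_weight(freqs_hz, f_lo, f_hi)
    return w * np.asarray(model_hrtf) + (1.0 - w) * np.asarray(measured_hrtf)


# Toy check with constant "HRTFs": model below the band, average at 2 kHz, measured above.
freqs = np.array([1000.0, 2000.0, 5000.0])
joined = join_hrtf(freqs, np.full(3, 2.0 + 0j), np.full(3, 4.0 + 0j))
```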
This paper focuses on low-frequency neurons that are sensitive to ITD, the primary sound localization cue at low frequencies. Consequently, the spherical-head model provides most of the information in the HRTF for this work. At these low frequencies, the phase response of the HRTF was nearly a straight line for all azimuths, as expected for a pure delay, and the magnitude of the HRTF was relatively constant across azimuths (see example in Fig. 1, E and F). We therefore expect our results to be similar to those that would be obtained if only ITD were varied, provided the stimuli were appropriately shaped by the nondirectional HRTF magnitude (see RESULTS). However, the use of HRTFs would allow the present study of SRM to be extended easily to high frequencies in the future.
Experimental procedure
Search stimuli were either 200-ms chirp trains or broadband noise bursts. Both the azimuth and the mode of stimulation (binaural or monaural) of the search stimulus were varied to find a larger number of units and a more varied sample. Once a single unit was isolated, a frequency-tuning curve was measured by an automatic tracking procedure (Kiang and Moxon 1974) to determine the characteristic frequency (CFTC).
A noise-delay function was also measured: the unit response was measured as a function of the ITD of 200-ms bursts of "frozen" noise (Fig. 2A, solid line with error bars). The ITD was usually varied from −2,000 to 2,000 µs in 400-µs steps, although ITDs inside the physiological range (−290 to 290 µs, as determined from our HRTFs) were often sampled more finely.
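The noise-delay measurement requires presenting the same frozen noise with different ITDs. One way to impose an ITD, including fractional-sample values, is a linear phase shift in the frequency domain, sketched below. The sign convention (a positive ITD delays the right-ear waveform) and the 100-kHz sample rate are assumptions, and the delay wraps circularly at the buffer edges, which a real stimulus would hide with gating or a longer buffer.

```python
import numpy as np


def apply_itd(x, itd_s, fs):
    """Return (left, right) copies of x with the right ear delayed by itd_s seconds."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    shifted = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * itd_s)  # pure (circular) delay
    return x, np.fft.irfft(shifted, n=x.size)


# Coarse grid described in the text: -2,000 to 2,000 µs in 400-µs steps.
coarse_itds_us = np.arange(-2000, 2001, 400)

fs = 100_000
frozen = np.random.default_rng(2).standard_normal(int(0.2 * fs))   # 200-ms frozen noise
left, right = apply_itd(frozen, 290e-6, fs)                        # 29 samples at 100 kHz
```

Because the same noise token is reused at every ITD, differences in the unit's rate across the noise-delay function reflect the ITD rather than the particular noise waveform.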
A unit was included in this study if it had a low CFTC (≤2.5 kHz), gave a sustained response to chirp trains at some signal azimuth, and was sensitive to ITD. We considered a unit ITD sensitive if its noise-delay function was modulated by ≥50% (i.e., if the minimum discharge rate was less than half of the maximum rate). Rates were measured in a window that began 5 ms after the onset of the 200-ms noise burst and lasted 190 ms.
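The inclusion rule can be written out directly. In the sketch, spike times are assumed to be in seconds relative to stimulus onset and pooled over repetitions, the 2.5-kHz CF limit is treated as inclusive, and the judgment of a "sustained response to chirp trains" is left to the caller as a boolean. All function names are hypothetical.

```python
import numpy as np


def window_rate(spike_times_s, n_reps, start=0.005, dur=0.190):
    """Discharge rate (spikes/s) in [start, start + dur), pooled over n_reps repetitions."""
    t = np.asarray(spike_times_s, dtype=float)
    count = np.count_nonzero((t >= start) & (t < start + dur))
    return count / (n_reps * dur)


def is_itd_sensitive(noise_delay_rates):
    """True if the minimum rate is less than half the maximum rate (>=50% modulation)."""
    rates = np.asarray(noise_delay_rates, dtype=float)
    return bool(rates.min() < 0.5 * rates.max())


def include_unit(cf_hz, noise_delay_rates, sustained_chirp_response, max_cf_hz=2500.0):
    """Inclusion rule: low CF, sustained chirp-train response, and ITD sensitivity."""
    return (cf_hz <= max_cf_hz and sustained_chirp_response
            and is_itd_sensitive(noise_delay_rates))
```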
To determine the best ITD and best frequency (BFITD) of each unit, we fit the noise-delay function with a Gabor function (McAlpine and Palmer 2002b), which is a sinusoid with a Gaussian envelope.
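The exact Gabor parameterization used for the fits is not reproduced here, so the sketch below assumes a common six-parameter form: a baseline plus a Gaussian envelope (center and width) multiplying a cosine whose frequency is BF_ITD and whose peak lies at the best ITD. It uses `scipy.optimize.curve_fit`, with ITDs in milliseconds and frequencies in kilohertz to keep the parameters on similar scales. Because the cosine repeats every 1/BF_ITD, the fitted best ITD depends on the initial guess, so a reasonable starting point (for example, taken from the peak of the measured function) matters.

```python
import numpy as np
from scipy.optimize import curve_fit


def gabor(itd_ms, offset, amp, center_ms, width_ms, bf_khz, best_ms):
    """Baseline + Gaussian envelope x cosine (frequency bf_khz, peak at best_ms)."""
    envelope = np.exp(-(((itd_ms - center_ms) / width_ms) ** 2))
    return offset + amp * envelope * np.cos(2 * np.pi * bf_khz * (itd_ms - best_ms))


def fit_noise_delay(itd_ms, rates, p0):
    """Least-squares Gabor fit; returns (BF_ITD in Hz, best ITD in µs, all parameters)."""
    params, _ = curve_fit(gabor, itd_ms, rates, p0=p0, maxfev=20_000)
    return params[4] * 1e3, params[5] * 1e3, params


# Usage on synthetic, noise-free data resembling the unit of Fig. 3.
itds = np.arange(-2.0, 2.01, 0.1)                        # ms
rates = gabor(itds, 20.0, 15.0, 0.2, 1.0, 0.74, 0.29)    # hypothetical "true" parameters
bf_hz, best_us, params = fit_noise_delay(itds, rates, p0=(18, 12, 0.0, 1.2, 0.7, 0.25))
```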
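To facilitate comparisons between the responses of units having different CFs and best ITDs, we define a relative IPD (φ) in cycles by the equation

$$\varphi = \mathrm{BF}_{\mathrm{ITD}} \times (\mathrm{ITD} - \mathrm{ITD}_{\mathrm{best}})$$

where ITD is the ITD of the stimulus and φ is wrapped into a single cycle (−0.5 ≤ φ < 0.5). Thus φ = 0 at the unit's best ITD and |φ| = 0.5 at its worst ITD.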
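Assuming that the relative IPD is BF_ITD times the difference between the stimulus ITD and the best ITD, wrapped into one cycle so that 0 marks the best ITD and ±0.5 the worst ITD, the normalization can be sketched as follows. The function names are hypothetical.

```python
import numpy as np


def relative_ipd(itd_s, best_itd_s, bf_itd_hz):
    """Relative IPD in cycles, wrapped to [-0.5, 0.5): 0 = best ITD, +/-0.5 = worst ITD."""
    phi = bf_itd_hz * (np.asarray(itd_s, dtype=float) - best_itd_s)
    return (phi + 0.5) % 1.0 - 0.5


def worst_itd(best_itd_s, bf_itd_hz):
    """ITD half a period of BF_ITD below the best ITD."""
    return best_itd_s - 0.5 / bf_itd_hz


# Unit of Fig. 3: BF_ITD = 740 Hz, best ITD = 290 µs.
w = worst_itd(290e-6, 740.0)          # about -386 µs, outside the physiological range
favorable = relative_ipd(290e-6, 290e-6, 740.0)
unfavorable = relative_ipd(w, 290e-6, 740.0)
```

Expressing azimuth through φ lets units with different CFs and best ITDs be pooled on one axis, which is how the relative IPD is used later in RESULTS.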
In a few cases (four out of 31) for which the noise-delay functions were not sampled finely enough, the Gabor fit predicted best and worst azimuths at obviously incorrect locations. In these cases, the best ITD and BFITD were adjusted manually to give appropriate best and worst ITDs to match the best and worst azimuths.
Masked thresholds
To obtain neural thresholds that can be directly compared with psychophysical thresholds, which are based on a percent-correct criterion near 75%, we defined the masked threshold as the lowest signal-to-noise ratio (SNR) at which the signal can be detected on 75% of the stimulus repetitions. Two response metrics, mean rate and synchronized rate, were used to define detection thresholds. Mean rate is simply the number of spikes in the measurement window (from 5 to 195 ms after stimulus onset), and the synchronized rate (Kim and Molnar 1979) is the Fourier component of the peristimulus time histogram at the signal repetition rate, 40 Hz. The synchronized rate, which is also the mean rate multiplied by the synchronization index or vector strength (Goldberg and Brown 1969), contains information about spike timing as well as the number of spikes. Figure 3 shows both the mean rate (third row) and the synchronized rate (fourth row) as a function of noise level for one unit. The 200-ms chirp-train signal was held at 43 dB SPL, and the locations of the signal and the masker differ for each column of panels. To determine the masked threshold, we calculated the percentage of stimulus presentations for which the detection metric (mean rate or synchronized rate) was greater in the signal-plus-noise window than in the noise-alone window (Fig. 3, bottom row). To improve the reliability of threshold estimates, the percent-correct values were converted to z-scores by a Gaussian transform (Green and Swets 1974), smoothed with a three-point triangular filter, and then converted back to percentages. Because a signal can be detected through either an increase or a decrease in rate (Jiang et al. 1997b), thresholds (circles in Fig. 3, bottom row) can occur where the percentage curve crosses either 75% (an increase in rate; see dashed lines) or 25% (a decrease in rate). This criterion gives the highest noise level or, equivalently, the lowest SNR at which the signal can still be detected 75% of the time.
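The detection procedure above can be sketched as follows. Several details are assumptions rather than statements from the text: each signal-plus-noise presentation is paired with the noise-alone presentation of the same repetition, ties count as half correct, percentages are clipped to [0.5/n, 1 − 0.5/n] before the z-transform so that 0% and 100% stay finite, the triangular filter uses weights (1/4, 1/2, 1/4) with replicated end points, and the threshold is linearly interpolated between tested noise levels. Thresholds are returned as noise levels; with the signal level fixed (43 dB SPL in Fig. 3), the threshold SNR is the signal level minus this value.

```python
import numpy as np
from statistics import NormalDist


def sync_rate(spike_times_s, n_reps, freq_hz=40.0, start=0.005, dur=0.190):
    """PSTH Fourier magnitude at freq_hz (spikes/s) = mean rate x vector strength."""
    t = np.asarray(spike_times_s, dtype=float)
    t = t[(t >= start) & (t < start + dur)]
    return float(np.abs(np.sum(np.exp(2j * np.pi * freq_hz * t))) / (n_reps * dur))


def percent_correct(metric_sn, metric_n):
    """Per noise level, fraction of paired presentations with S+N > N (ties = 1/2).

    Inputs have shape (n_levels, n_reps).
    """
    sn = np.asarray(metric_sn, dtype=float)
    n = np.asarray(metric_n, dtype=float)
    return np.mean((sn > n) + 0.5 * (sn == n), axis=1)


def smooth_percent(p, n_reps):
    """z-transform, 3-point triangular smoothing, inverse transform."""
    nd = NormalDist()
    eps = 0.5 / n_reps                                   # keeps z finite at 0 and 1
    z = np.array([nd.inv_cdf(v) for v in np.clip(p, eps, 1.0 - eps)])
    padded = np.concatenate(([z[0]], z, [z[-1]]))        # replicate the end points
    z_smooth = np.convolve(padded, [0.25, 0.5, 0.25], mode="valid")
    return np.array([nd.cdf(v) for v in z_smooth])


def masked_threshold(noise_db, p, hi=0.75, lo=0.25):
    """Highest noise level at which p >= hi (rate increase) or p <= lo (decrease)."""
    noise_db = np.asarray(noise_db, dtype=float)
    p = np.asarray(p, dtype=float)
    detected = np.flatnonzero((p >= hi) | (p <= lo))
    if detected.size == 0:
        return None                                      # never detected
    i = detected[-1]
    if i == p.size - 1:
        return float(noise_db[i])                        # detected at the highest level tested
    crit = hi if p[i] >= hi else lo
    frac = (p[i] - crit) / (p[i] - p[i + 1])             # interpolate to the crossing
    return float(noise_db[i] + frac * (noise_db[i + 1] - noise_db[i]))
```

Allowing the 25% crossing lets a decrease in rate count as detection, which matters for units whose noise response is suppressed by the signal.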
We determined confidence intervals for the masked thresholds using bootstrapping methods (Efron and Tibshirani 1993). For each noise level, we sampled the responses to the stimulus presentations with replacement, obtaining a new, "bootstrapped" set of spike trains. We then recomputed the percentage curves for the new set of spike trains and recalculated the thresholds. The threshold was recomputed in this way 100 times, and the error bars for the masked thresholds span the range between the 10th and the 90th percentiles. Reliably estimating the thresholds was difficult because the percentage curves could be nonmonotonic, especially when the signal suppressed the noise response. Consequently, for a 25% threshold to be accepted, we required that the signal drive the percentage below 25% in 80 out of 100 of the bootstrapped percentage curves. If this criterion was not met, the 75% threshold was used. This requirement eliminated very low threshold SNRs that arose from spurious estimates of percent-correct points. For the actual threshold estimate, we took the median of all the bootstrapped percentage curves and determined the threshold for this median curve.²
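The bootstrap can be sketched generically, with the threshold rule passed in as a function so that any rule (for instance, a 75%/25% crossing rule) can be plugged in. The pairing of S+N and N responses during resampling, the reading of the 80-of-100 rule as "the curve falls below 25% at some noise level," and the toy Poisson spike counts in the usage example are assumptions; `last_level_above` is a hypothetical, deliberately minimal threshold rule.

```python
import numpy as np


def bootstrap_curves(metric_sn, metric_n, n_boot=100, seed=0):
    """Resample presentations with replacement at each noise level.

    Returns an (n_boot, n_levels) array of percent-correct curves. The S+N and
    N responses of a presentation are resampled together (an assumption).
    """
    rng = np.random.default_rng(seed)
    sn = np.asarray(metric_sn, dtype=float)
    n = np.asarray(metric_n, dtype=float)
    n_levels, n_reps = sn.shape
    curves = np.empty((n_boot, n_levels))
    for b in range(n_boot):
        for lev in range(n_levels):
            idx = rng.integers(0, n_reps, size=n_reps)
            s, m = sn[lev, idx], n[lev, idx]
            curves[b, lev] = np.mean((s > m) + 0.5 * (s == m))
    return curves


def summarize(curves, noise_db, threshold_fn, lo=0.25, accept_frac=0.8):
    """Median-curve threshold, 10th-90th percentile range, and 25%-acceptance flag."""
    thresholds = np.array([threshold_fn(noise_db, c) for c in curves], dtype=float)
    interval = np.nanpercentile(thresholds, [10, 90])
    # A decrease-based (25%) threshold is accepted only if at least accept_frac of
    # the bootstrap curves fall below lo somewhere; otherwise fall back to 75%.
    accept_25 = bool(np.mean(np.any(curves < lo, axis=1)) >= accept_frac)
    return threshold_fn(noise_db, np.median(curves, axis=0)), interval, accept_25


# Toy usage: spike counts for 4 noise levels x 20 presentations.
rng = np.random.default_rng(1)
noise_db = np.array([30.0, 40.0, 50.0, 60.0])
sn_counts = rng.poisson([[20], [15], [11], [10]], size=(4, 20))
n_counts = rng.poisson(10, size=(4, 20))
curves = bootstrap_curves(sn_counts, n_counts)


def last_level_above(levels, p, hi=0.75):
    """Hypothetical minimal threshold rule: highest level with p >= hi."""
    idx = np.flatnonzero(p >= hi)
    return float(levels[idx[-1]]) if idx.size else None


thr, ci, accept_25 = summarize(curves, noise_db, last_level_above)
```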
RESULTS
When azimuth is varied, the other localization cues present in the HRTFs (interaural level differences and spectral cues) vary along with ITD. For the majority of our units (n = 19), we compared the responses to changes in noise azimuth and to changes in noise ITD. Figure 2B shows one unit's rate response as a function of both noise azimuth (solid line) and noise ITD (dash-dot line). The two responses were similar, although the rate was higher in the ITD-only condition. For the 19 units in our sample for which this measurement was taken, Fig. 2C compares the rate when only ITD was varied with the rate when the noise azimuth was varied. Except at very low discharge rates, the two responses are similar for all of these units, indicating that ITD largely determines these units' azimuth sensitivity.³
Dependency of single-unit masked thresholds on signal and masker azimuths
Based on previous physiological results (Caird et al. 1991; McAlpine et al. 1996), we expected masked thresholds to change with signal and masker azimuth. Figure 3 shows a typical unit's responses to the signal in noise and to the noise alone as a function of noise level for three signal and masker configurations. The unit had a BFITD of 740 Hz and a best ITD of 290 µs, which corresponds to +90°; because the unit's worst ITD (about −380 µs) was outside the physiological range, its worst azimuth was −90°.
The first row sketches the three signal and masker configurations: signal and masker co-located at +90° (column A, S+90, N+90); signal at +90° and noise at −90° (column B, S+90, N−90); and signal at −90° and noise at +90° (column C, S−90, N+90). The second row in Fig. 3 shows the temporal discharge patterns for the signal-plus-noise interval (S + N) and the noise-alone interval (N) as a function of noise level. In these dot rasters, each dot represents a spike, and the solid lines separate the blocks of stimulus presentations for each noise level. As the noise level is raised, the signal response can be either overwhelmed by the noise response (excitatory or "line-busy" masking, columns A and C) or suppressed by the noise (suppressive masking, column B).
These rasters show a wide variety of potential cues for detecting the signal in noise, and the types of cues available depend on the signal and masker configuration. For the signal at +90° in low-level noise (Fig. 3, columns A and B), the unit shows a highly synchronized response to the 40-Hz repetition rate of the chirp train. For this signal azimuth, the response to the signal plus noise is always greater than the noise-alone response. In contrast, the response to the signal at −90° (column C) is much weaker, consisting of only an onset response at the lowest noise level. At moderate noise levels for S−90, the signal suppresses the noise response, and a weak response at the signal repetition rate can be discerned, possibly reflecting a recovery from suppression during the silent periods between individual chirps in the train (see Fig. 1A). The signal can also alter the distribution of spike arrival times without changing the mean firing rate (column A). Any or all of these cues could be used to detect the signal, and an optimal central processor would use the best combination of cues, perhaps through a signal template. Because we had only a few stimulus presentations for each stimulus condition, developing a reliable signal template was not feasible; instead, we chose to detect the signal through more traditional methods based on changes in mean rate and spike synchrony.
In the following, we first present all the results for thresholds based on mean rate and then, at the end of the RESULTS section, discuss how synchronized-rate thresholds differ. As described in METHODS, the rate-based masked threshold is the highest noise level (or lowest signal-to-noise ratio) at which the signal can still be detected 75% of the time, based on either an increase (75% mark in Fig. 3, row 5) or a decrease (25% mark in Fig. 3, row 5) in mean rate. The rate thresholds for the unit of Fig. 3 (shown as circles) differ substantially across the three signal and masker configurations. Specifically, the threshold for the co-located condition S+90, N+90 (column A) is about 18 dB poorer than the threshold for S+90, N−90 (column B), and the threshold for S−90, N+90 (column C) falls between the two despite the weak signal response in this case. In column C, the signal causes an increase in rate at the lowest noise levels; then, once the noise level is raised a few decibels, the signal can be detected through a decrease in rate (see dot raster; the signal's presence is shown by the suppression of the noise response). The signal can still clearly be detected for noise levels >49 dB (as shown in the dot raster), making the masked threshold about 52 dB. This example shows that if the signal could be detected only through increases in rate (the 75% mark), the threshold signal-to-noise ratios of individual neurons would be systematically overestimated (Jiang et al. 1997).
Figure 4 (top) shows rate-based masked thresholds as a function of noise azimuth for the unit in Fig. 3 for four different signal azimuths. When the signal is at either +45 or +90°, moving the noise away from the signal to the ipsilateral side (negative azimuths) improves thresholds by about 20 dB. When the signal is at 0°, thresholds also improve as the noise is moved away from the midline to the ipsilateral side, but they become slightly worse as the noise moves to the contralateral side (positive azimuths). For these three signal locations (S+90, S+45, and S0), the worst thresholds occur when the noise is placed near +90°, regardless of the signal location. However, when the signal is placed at −90°, the pattern is different: the thresholds increase slightly and then decrease as the noise is moved away from the signal.
To test the effect of signal and masker separation on the thresholds for all of our units, we determined the worst threshold for each unit and examined how this worst threshold relates to the signal and masker locations. Figure 5 shows the noise azimuth that gives rise to the worst threshold, the "worst-threshold noise azimuth," as a function of both signal azimuth (top) and the unit's best azimuth (bottom; defined as the azimuth with the relative IPD nearest to 0 within the physiological range). If separation improved thresholds, then the worst threshold should occur when the signal and noise are at the same azimuth; that is, the worst-threshold noise azimuth and the signal azimuth should be the same. Contrary to this prediction, the correlation between a unit's worst-threshold noise azimuth and the signal azimuth is very low (0.15) and not significant (P = 0.2, two-sided t-test, n = 68). Thus the worst thresholds do not necessarily occur when the signal and the masker are co-located. In contrast, the correlation between the worst-threshold noise azimuth and the best azimuth is much higher (0.57) and highly significant (P < 0.001), indicating that strong excitation by the masker tends to produce poor masked thresholds. Consequently, the individual unit responses do not show a correlate of spatial release from masking, consistent with previous BMLD studies. However, as suggested by previous results (e.g., Caird et al. 1991; Colburn 1973, 1977a,b; Jiang et al. 1997a,b), a neural correlate of spatial release from masking may still exist in the response of a population of ITD-sensitive neurons.
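The correlation analysis can be sketched as follows, assuming thresholds are expressed as SNRs in dB so that the worst threshold is the largest value. `scipy.stats.pearsonr` returns Pearson's r together with a two-sided P value that is equivalent to the t-test on r. The threshold curves in the usage example are synthetic, constructed so that each unit's worst threshold falls at its best azimuth; they are not data from this study.

```python
import numpy as np
from scipy.stats import pearsonr


def worst_threshold_azimuth(noise_az_deg, thresholds_db):
    """Noise azimuth with the highest (worst) threshold SNR, ignoring NaNs."""
    return float(np.asarray(noise_az_deg)[np.nanargmax(thresholds_db)])


noise_az = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])
best_az = np.array([90.0, 45.0, 90.0, 0.0, 45.0, 90.0])       # synthetic units
signal_az = np.array([0.0, 90.0, -45.0, 45.0, -90.0, 45.0])   # synthetic signal azimuths
# Synthetic threshold curves that are worst when the noise sits at the unit's best azimuth.
curves = [-20.0 + 15.0 * np.cos(np.radians(noise_az - b)) for b in best_az]
worst = np.array([worst_threshold_azimuth(noise_az, c) for c in curves])

r_best, p_best = pearsonr(best_az, worst)
r_signal, p_signal = pearsonr(signal_az, worst)
```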
To test the hypothesis that the population of low-frequency, ITD-sensitive units is sufficient to explain spatial release from masking at low frequencies, we defined a population threshold based on the "lower-envelope principle" (Parker and Newsome 1998). Specifically, for each signal and noise configuration, the population threshold is the best single-unit threshold in our sample of ITD-sensitive units. The top row of Fig. 6 shows both the individual mean-rate thresholds for all units in our sample (dot-dash lines) and the population thresholds (thick solid lines) as a function of noise azimuth for three signal azimuths (arrows). The bottom row shows the synchronized-rate thresholds, which are discussed later. Unlike the single-unit thresholds, the population thresholds do show a correlate of spatial release from masking in that they generally improve when the signal and noise are separated. The curves clearly do not show perfect spatial release from masking: for example, the thresholds do not improve for the signal at 45° when the noise azimuth is >45°, and the improvement for the signal at 0° is not symmetric with respect to the midline. It is not obvious whether a larger sample of neurons would improve the correlate for the S45° condition, but because units in the opposite IC are expected to have mirror-image threshold curves, incorporating units from both ICs would almost certainly eliminate the asymmetry in population thresholds for S0° (see following text). Overall, the combination of all the unit responses, each with a different azimuth preference, appears to allow a correlate of spatial release from masking to emerge in the population response.
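The lower-envelope rule amounts to taking, for each signal and noise configuration, the minimum threshold across units. The sketch assumes thresholds are stored as dB SNR (lower is better) in a units × noise-azimuths array for one signal azimuth, with NaN where a unit has no threshold. The release computed relative to an assumed co-located column is an illustrative addition, and all numbers are hypothetical.

```python
import numpy as np


def population_threshold(unit_thresholds_db):
    """Lower envelope: lowest (best) threshold SNR across units, per configuration.

    Shape (n_units, n_noise_azimuths); NaN marks a configuration without a threshold.
    """
    return np.nanmin(np.asarray(unit_thresholds_db, dtype=float), axis=0)


nan = np.nan
units = [[-5.0, -10.0, -15.0, nan],      # hypothetical single-unit thresholds (dB SNR)
         [-12.0, -8.0, nan, -20.0],
         [0.0, -3.0, -6.0, -9.0]]
pop = population_threshold(units)
# Release relative to the co-located configuration (assumed to be column 0).
release_db = pop[0] - pop
```

Because a different unit can supply the minimum in each column, the population curve can improve with separation even when no single unit's curve does.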
Having identified a neural correlate of SRM in the population response of ITD-sensitive units, we now ask whether the responses of these units to SRM stimuli can be predicted by a cross-correlator model similar to the one described by Colburn (1973, 1977a,b). The example unit in Fig. 3 shows that, depending on the stimulus configuration, the signal can be detected through either an increase (A, B) or a decrease (C) in rate relative to the noise-alone response. Furthermore, masking can arise from the noise either suppressing (B) or overwhelming (A, C) the signal response. To test whether this diversity of responses is qualitatively consistent with the cross-correlator model, we implemented a simple cross-correlator model with parameters that matched the CF and best ITD of the unit in Fig. 3 (see the caption of Fig. 8 for implementation details). The model response is shown in Fig. 8 for comparison with the unit's rate response in row 4 of Fig. 3.
Effect of noise on signal response depends on noise azimuth
To test whether the units' behavior is quantitatively consistent with the cross-correlator model, we define two metrics: one that characterizes how the noise masks the signal response and one that characterizes the effect of the signal on the noise response. The first metric, the "masking type index" (MTI), quantifies whether the noise masker overwhelms or suppresses the signal response at threshold. The MTI is the difference between the signal-in-noise rate at threshold, R(S + N_Th), and the approximate signal-alone rate, R(S), taken as the signal response at the lowest noise level. This difference is then normalized by whichever of the two rates is larger:
$$\mathrm{MTI} = \frac{R(S + N_{Th}) - R(S)}{\max\left[R(S + N_{Th}),\ R(S)\right]}$$
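The MTI formula translates directly into code. This Python sketch assumes the two rates are nonnegative spike rates already measured at threshold and at the lowest noise level; the function names and the "neutral" label for MTI = 0 are illustrative, not from the original analysis.

```python
def masking_type_index(rate_sn_at_threshold, rate_signal_alone):
    """MTI = [R(S + N_Th) - R(S)] / max[R(S + N_Th), R(S)].

    For nonnegative rates the result lies in [-1, 1]. Positive values mean
    excitatory masking (the noise overwhelms the signal response); negative
    values mean suppressive masking (the noise suppresses it).
    """
    larger = max(rate_sn_at_threshold, rate_signal_alone)
    if larger <= 0:
        raise ValueError("MTI is undefined when both rates are zero")
    return (rate_sn_at_threshold - rate_signal_alone) / larger

def masking_type(mti):
    """Classify masking by the sign of the MTI."""
    if mti > 0:
        return "excitatory"
    if mti < 0:
        return "suppressive"
    return "neutral"

excitatory = masking_type_index(60.0, 20.0)   # (60 - 20) / 60
suppressive = masking_type_index(5.0, 50.0)   # (5 - 50) / 50 = -0.9
```

Normalizing by the larger rate bounds the index, so units with very different overall firing rates can be compared on the same axis, as in Fig. 9.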
To test the predictions of the cross-correlator model for our entire sample of units, we examine how the MTI depends on noise azimuth. The model predicts that, for the masker at a favorable azimuth, the number of coincidences increases with noise level to produce excitatory masking (MTI > 0). For the noise at unfavorable azimuths, we expect the noise to decorrelate the signal response and produce suppressive masking (MTI < 0), provided the signal response is sufficiently strong. We plot the MTI as a function of both the noise azimuth (Fig. 9A) and the noise relative IPD (Fig. 9B) for all the units in our sample. We show results only for favorable signal azimuths (absolute signal relative IPD < 0.1) to see the effect of the masker on a strong signal response. When the noise is in the ipsilateral hemifield (negative azimuths), the masking is usually suppressive (MTI near −1). However, noise in the contralateral hemifield (positive azimuths) can mask through either excitation or suppression. This dependence of MTI on noise azimuth may arise because most of the units have their best azimuths on the contralateral side. To test this possibility, Fig. 9B replots the MTI as a function of the noise relative IPD, thereby normalizing for differences in best azimuths across units (see METHODS). By definition, favorable azimuths have relative IPDs near 0, whereas unfavorable azimuths have relative IPDs near 0.5. Figure 9B shows that the MTI across the population changes abruptly around a noise relative IPD of 0.25: when the noise is at an unfavorable azimuth (noise relative IPD < 0.25), the masking is always suppressive, as expected; however, when the relative IPD of the noise is favorable (noise relative IPD > 0.25), the masking can be either excitatory or suppressive, even though the signal and masker are both at favorable azimuths. It seems that the masker can reduce the overall rate even when it is placed at a "favorable" azimuth, contrary to the predictions of a simple cross-correlator model. Because the signal and the masker have similar spectra, effects such as lateral (cross-frequency) inhibition or cochlear suppression are unlikely to explain this result. Instead, this result suggests that additional processing beyond cross-correlation, probably some type of temporal processing, affects the relative responses to the signal and the noise in some units. In the DISCUSSION, we propose a likely candidate for such additional processing.
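The exact definition of relative IPD is given in METHODS, which is not reproduced here. As a hedged illustration of how such a normalization removes cross-unit differences in best azimuth, the sketch below assumes relative IPD is the difference between the stimulus ITD and the unit's best ITD, expressed in cycles of the CF and wrapped to [−0.5, 0.5). Under this assumed form, values near 0 correspond to favorable azimuths and values near ±0.5 to unfavorable ones, matching the convention stated above. The 0.1 tolerance mirrors the criterion used to select favorable azimuths for Fig. 9.

```python
def relative_ipd(cf_hz, stimulus_itd_s, best_itd_s):
    """Assumed relative IPD, in cycles of CF, wrapped to [-0.5, 0.5)."""
    cycles = cf_hz * (stimulus_itd_s - best_itd_s)
    return (cycles + 0.5) % 1.0 - 0.5

def is_favorable(rel_ipd, tol=0.1):
    """True when the stimulus lands near the unit's best interaural phase."""
    return abs(rel_ipd) < tol

# Hypothetical 500-Hz unit with a best ITD of 0.3 ms.
rel_best = relative_ipd(500.0, 0.0003, 0.0003)    # 0: at the best ITD
rel_worst = relative_ipd(500.0, -0.0007, 0.0003)  # magnitude 0.5: half a cycle away
```

Expressing the ITD difference in cycles of CF is what lets units with different CFs and best ITDs share one axis: the same ITD offset is a larger phase offset for a higher-CF unit.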
|
The second metric used to compare the neural responses with the predictions of the cross-correlator model is the "signal effect index" (SEI), which characterizes the effect of the signal on the noise response. The SEI is again a normalized difference, this time between the signal-plus-noise rate, R(S + N_Max), and the noise-alone rate, R(N_Max), at the noise level N_Max where the signal causes the largest change in rate:
$$\mathrm{SEI} = \frac{R(S + N_{Max}) - R(N_{Max})}{\max\left[R(S + N_{Max}),\ R(N_{Max})\right]}$$
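Computing the SEI involves two steps: locating N_Max, the noise level where adding the signal changes the rate the most, and then forming the normalized difference there. This Python sketch assumes rate-level functions for S + N and N alone sampled at the same noise levels, and assumes normalization by the larger rate by analogy with the MTI; the function name is illustrative.

```python
def signal_effect_index(noise_levels, rate_sn, rate_n):
    """Return (N_Max, SEI) from rate-level functions measured at the same noise levels.

    N_Max is the level with the largest |R(S + N) - R(N)|; the SEI is that
    difference, signed, divided by the larger of the two rates there.
    """
    if not noise_levels or not (len(noise_levels) == len(rate_sn) == len(rate_n)):
        raise ValueError("need equal-length, nonempty inputs")
    k = max(range(len(noise_levels)), key=lambda i: abs(rate_sn[i] - rate_n[i]))
    larger = max(rate_sn[k], rate_n[k])
    if larger <= 0:
        raise ValueError("SEI is undefined when both rates are zero")
    return noise_levels[k], (rate_sn[k] - rate_n[k]) / larger

# Hypothetical unit where the signal increases the rate (positive SEI).
n_max, sei = signal_effect_index([30, 40, 50], [20.0, 30.0, 35.0], [18.0, 10.0, 30.0])
```

A positive SEI means the signal is detected through a rate increase and a negative SEI through a rate decrease; a value of −1 would require the signal to silence the noise response completely.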
Figure 9 shows the SEI as a function of both signal azimuth (Fig. 9C) and the signal relative IPD (Fig. 9D). Only results for favorable noise azimuths (absolute noise relative IPD < 0.1) are shown so that the effect of adding the signal can be reliably evaluated. When the signal is near the midline or in the contralateral hemifield (positive azimuths), it is detected through an increase in rate in most cases (108 of 122 thresholds in Fig. 9C; many of the points overlap, especially near 1). The median SEI (solid line) is near 1 in these cases. In contrast, when the signal is placed at −90°, it is usually detected through a decrease in rate (11 of 14 cases). The SEIs never reach −1 but are usually near −0.5, indicating that the signal does not completely suppress the noise response. In Fig. 9D, the SEI is replotted against the signal relative IPD to normalize for cross-unit differences in best ITD and CF. Placing the signal at a favorable azimuth (signal relative IPD > 0.25) almost always increases the overall rate (106 of 122 thresholds), as expected, but occasionally decreases it. Signals at unfavorable azimuths (signal relative IPD < 0.25) decrease the overall rate, as expected, in a majority of cases (9 of 14), but can also increase it. Combined with the MTI results, these findings suggest that, although the cross-correlator model gives useful predictions for many units, some additional processing affects the relative rates to the signal and the masker in a substantial fraction of the units.
Best thresholds occur for signal at best azimuth
For a majority of the units in our sample, the ITD of a source at +90° is near the unit's best ITD, and the ITD of a source at −90° is near its worst ITD. For such units, placing the stimuli at these azimuths makes the neural inputs from the two ears arrive as nearly in phase or as nearly out of phase as possible within the physiological range. Therefore, placing the signal and noise at +90° and −90° in various combinations is analogous to the well-studied N0S0, N0Sπ, and NπS0 conditions for units having their best ITD near 0 (0-ITD units). In the modeling studies by Colburn (1973, 1977a,b), the 0-ITD units were the most sensitive to changes in interaural correlation in the traditional BMLD conditions (N0S0 compared with N0Sπ). Specifically, the in-phase conditions (N0, S0) for the 0-ITD units are similar to placing the stimulus at the best azimuth in this study because the inputs from the two ears would arrive in phase at the coincidence detector. The out-of-phase conditions (Nπ, Sπ) for a 0-ITD unit are similar to placing the stimulus at the worst azimuth for our units because the inputs would arrive nearly out of phase. The psychophysical thresholds for the N0Sπ condition are better than those for the NπS0 condition for a wide variety of signals, and both are better than those for the N0S0 condition (Durlach and Colburn 1978). Using a 500-Hz pure-tone signal, Jiang et al. (1997) found a correlate of this threshold hierarchy in the average thresholds of IC units. Furthermore, as predicted by the Colburn model, they showed that for the majority of units, the N0Sπ neural thresholds were better when adding the signal decreased the overall response, indicating that the best thresholds occur when the signal decorrelates the noise response. If these results could be extended to our experiments, one would expect the best thresholds to occur when the signal is placed at the worst azimuth, usually near −90°, and the noise at the best azimuth, usually near +90°, so that the signal is detected by decorrelating the noise response.
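The analogy between azimuth configurations and the classic BMLD conditions can be made explicit by labeling the interaural phase of each stimulus at the coincidence detector as "0" (near the best azimuth, relative IPD near 0) or "π" (near the worst azimuth, relative IPD near 0.5). The Python sketch below is illustrative only: the tolerance is a hypothetical choice, relative IPDs are assumed to be in cycles, and configurations that are neither clearly in phase nor clearly out of phase get no label.

```python
def analogous_bmld_condition(noise_rel_ipd, signal_rel_ipd, tol=0.1):
    """Label a configuration with the classic BMLD condition it resembles.

    Relative IPDs are in cycles. |IPD| <= tol counts as in phase ('0');
    |IPD| >= 0.5 - tol counts as out of phase ('π'); otherwise None.
    """
    def phase_label(rel):
        a = abs(rel)
        if a <= tol:
            return "0"
        if a >= 0.5 - tol:
            return "π"
        return None

    n, s = phase_label(noise_rel_ipd), phase_label(signal_rel_ipd)
    if n is None or s is None:
        return None
    return f"N{n}S{s}"

# Noise at the best azimuth, signal at the worst: the N0Sπ analog.
cond = analogous_bmld_condition(0.0, 0.5)
```

For a unit whose best ITD corresponds to +90°, the configuration with noise at +90° and signal at −90° would map to "N0Sπ", which is the configuration the text expects to give the best thresholds.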
Figure 10 shows the thresholds plotted against CF for 11 units in three animals for which we measured responses with the signal and the noise on opposite sides of the head. Each unit was tested with the signal near the best azimuth and the noise near the worst (S90, N−90; white squares) as well as with the signal near the worst azimuth and the noise near the best (S−90, N90; black circles). The thresholds for the signal and masker co-located near the best azimuth at 90° (S90, N90; x's) are also shown for the same units. For all the units, the thresholds for the signal placed near the best azimuth, the condition most like NπS0, are always at least as good as the thresholds for the signal placed near the worst azimuth, the condition most analogous to N0Sπ (white squares are always lower than black circles). This relationship is the reverse of the one expected from previous psychophysical and physiological studies of BMLD with pure-tone signals. The S90, N−90 thresholds are always better than the co-located thresholds, as expected from the BMLD psychophysical and physiological results (white squares are always lower than x's); however, the S−90, N90 thresholds, which might be expected to be the best overall, are not necessarily even as good as the co-located thresholds (x's are sometimes lower than black circles). It should be noted that we biased our results somewhat by selecting only neurons that gave a sustained response to the chirp at some azimuth; searching for units that responded to the signal by suppressing the noise response would be difficult at best. Nevertheless, in contrast to previous findings and model predictions, the best thresholds for these stimuli do not seem to occur when the signal decorrelates the noise response, but rather when the signal correlates the anticorrelated noise response.⁴