1Department of Biomedical and Chemical Engineering and Institute for Sensory Research and 2Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, New York
Submitted 27 July 2006; accepted in final form 25 October 2006
INTRODUCTION
Basic psychophysical envelope-processing tasks have received renewed attention recently, caused in large part by the success of a model of the "effective" signal processing of the auditory system (Dau et al. 1997) in predicting behavioral data that are difficult to interpret unless one assumes the existence of a bank of filters tuned in the amplitude modulation (AM) frequency domain (Bacon and Grantham 1989; Dau et al. 1997; Ewert and Dau 2000; Houtgast 1989; Kay 1982). Such a conceptual framework is fundamentally different from the assumptions of earlier models of AM perception, which describe the putative central processor as a low-pass filter (Viemeister 1979).
Physiological studies of responses to AM provided some of the motivation for psychophysical studies of frequency selectivity in the AM domain (Creutzfeldt et al. 1980; Langner and Schreiner 1988), but specific hypotheses concerning the relationships between physiological responses and perceptual signal-processing models have not been adequately examined. One possibility is that single neurons in the auditory midbrain function as modulation filters. A qualitative scan of the relevant literature suggests this may be a reasonable hypothesis, because many neurons in the inferior colliculus (IC) systematically change their responses with variations in modulation frequency (fm) and modulation depth (m) (Krishna and Semple 2000; Langner and Schreiner 1988; Mueller-Preuss et al. 1994; Rees and Moller 1983). The next portion of this introduction provides a more detailed description of the published physiological responses to AM that are relevant to this hypothesis, with a focus on the gaps in evidence that the current experiments were designed to fill.
Neural representations of sounds with dynamic temporal envelopes change dramatically as the auditory neuraxis is ascended. Much of our understanding about this transformation comes from studies of physiological responses to stimuli with systematically varied AM frequencies. From this body of work, a reasonably consistent picture has emerged: peripheral neurons seem to carry envelope-frequency information in a temporal (phase-locked) code, with average rates that do not change with stimulus fm (Joris and Yin 1992). More central neurons, in contrast, often exhibit average firing rates that are strongly dependent on fm and a reduced ability to synchronously follow faster fluctuations (for a review, see Joris et al. 2004). This frequency-focused description of neural responses to AM leaves unresolved a fundamental issue concerning the relationship between physiology and psychophysics. Specifically, modulation transfer functions (MTFs) only provide information about responses to a single (usually high) modulation depth; as a result, direct comparisons to behavioral data are difficult because the goal of much of the relevant psychoacoustics is to determine the smallest detectable or discriminable m. A major objective of this study was to obtain neural responses to stimuli with a wide range of modulation depths, including depths near psychophysical AM detection thresholds.
A few studies have reported physiological responses to variations in stimulus modulation depth. In auditory-nerve fibers (ANFs) and most ventral cochlear nucleus (VCN) units, synchronization to the envelope increases monotonically with depth, with average rates that are largely depth-independent (AN: Joris and Yin 1992; VCN: Rhode 1994). Such generalizations cannot be made about AM-depth processing at higher levels of the central auditory system because of the striking response diversity. For instance, in the superior olivary complex (SOC), changes in the response with m are strongly correlated with the units' pure-tone response properties. Sustained pure-tone responders in the periolivary nuclei of the SOC tend to be similar to ANFs in terms of the shape of rate- and synchrony-modulation depth functions (rMDFs and sMDFs), whereas offset responders exhibit monotonically increasing rMDFs and saturating sMDFs with narrow dynamic ranges (Kuwada and Batra 1999).
Krishna and Semple (2000) provided the most complete account of neural responses at the level of the IC across a range of modulation depths. They found that average firing rates in many cells varied monotonically with m, especially near the cell's preferred modulation frequency. The change in rate could be an increase or a decrease with m, depending on the presence of regions of excitation and suppression in the cell's rate modulation transfer function (rMTF) and their relationship to the chosen stimulus modulation frequency. Temporal response patterns also changed with m in their study. A minimum depth was required to elicit significant synchrony in individual cells; this value ranged from as low as 10% (the lowest depth tested) in some neurons to 70% in others (Krishna and Semple 2000). Changes in vector strength above the minimum m were less stereotypical: synchrony in some neurons varied over a wide range of depths, but it remained constant in most cells. Results from other (less systematic) studies are in qualitative agreement with Krishna and Semple's description of the modulation-depth dependence of single IC units (Mueller-Preuss et al. 1994; Nelson et al. 1966; Rees and Moller 1983). Relatively little is known about cortical responses to variations in m; Eggermont (1994) and Liang et al. (2002) measured tone-carrier MTFs in primary auditory cortex neurons at several depths from 25 to 100% and concluded that 1) MDFs were monotonic and 2) neural best modulation frequencies were essentially independent of m.
A survey of this previous work allows for a qualitative description of neural responses to variations in modulation depth, but a direct and quantitative comparison of physiological responses at any level of the pathway to basic psychophysical AM detection and discrimination performance is still lacking. Two requirements for such a comparison to be made are met in this study. First, the stimulus parameter space used in the physiology was designed to match that of the psychophysics. Specifically, m was varied from below detection threshold to 100%, in some cases using step sizes smaller than the behavioral just-noticeable difference (jnd). Second, a description of the statistical variability of the neural responses was included to quantify the significance of small changes in a given response metric (e.g., average rate and synchrony). In addition, because most naturally occurring sounds have complex modulation spectra, and recent models of envelope processing were developed based on masked AM detection paradigms, a similar rate- and synchrony-based analysis was applied to responses elicited by a sinusoidal signal modulation embedded in a competing masker modulation.
The IC is an inherently interesting nucleus in which to study AM processing. Structurally, it occupies a critical position in the subcortical processing pathway, as an almost obligatory ascending synapse (Aitkin and Phillips 1984; Malmierca et al. 2002; Ramon y Cajal 1904) and a receiving station for both inhibitory and excitatory inputs converging from afferent (Warr 1982; Winer et al. 1995), efferent (Winer 2004), and intrinsic and commissural connections (Saldana and Merchan 2004). In contrast to the established anatomical description of the IC and its connections, the functional representation of modulated sounds in the IC is still a matter of debate (Joris et al. 2004), but it is clear that both magnitude (rate) and phase (synchrony) information are present in the responses of single neurons (Krishna and Semple 2000; Langner and Schreiner 1988; Rees and Moller 1983). Thus the IC apparently plays a transitional role between the temporal representation of AM in the periphery (Joris and Yin 1992) and a more rate-based code in the cortex (Liang et al. 2002).
Here, we show that changes in the average firing rates of single IC neurons in the awake Dutch-belted rabbit are generally poor predictors of human behavioral performance in psychophysical AM detection tasks. Synchronization to the envelope, on the other hand, can emerge and change at modulation depths much closer to psychoacoustical thresholds. At suprathreshold depths, the situation is different; changes in average rates can, in some neurons, account for psychophysical sensitivity in masked AM detection and AM depth discrimination.
METHODS
AM responses in the IC were obtained from 198 cells in three unanesthetized female Dutch-Belted rabbits (Oryctolagus cuniculus). All procedures were approved by the Syracuse University Institutional Animal Care and Use Committee and conformed to National Institutes of Health guidelines and protocols. Our preparation was based on techniques used in several previous studies of the awake rabbit IC and superior olivary complex (Batra et al. 1989; Kuwada et al. 1987). Before recordings began, two separate aseptic surgeries were performed to allow chronic access to the midbrain in daily 2-h recording sessions. In both procedures, the animals were anesthetized with ketamine (66 mg/kg) and xylazine (2 mg/kg) delivered intramuscularly, and supplemental doses were administered to maintain areflexia.
In the initial surgery, a 15-mm inner diameter stainless steel cylinder and brass headbar (aligned parallel to the sagittal suture) were centered on the midline and affixed to the exposed skull with dental acrylic and screws. The rostral edge of the cylinder was aligned with bregma and a wall of dental acrylic was built up under the posterior side of the cylinder to compensate for the slope of the skull.
Each animal was given several weeks to recover from the first surgery before it was gradually adapted to sitting in the recording chamber and exposed to auditory stimuli. The rabbit was restrained with a snug blanket around the body and placed in a plexiglass chair positioned in front of a clamp used to fix the headbar. Daily sessions were increased in duration over the course of 2–3 wk until the animal was acclimated to sitting quietly for 2 h.
A small (approximately 3–4 mm diameter) craniotomy was made in the skull in the second surgery. The medial edge of the hole was approximately 2 mm lateral to the midline, and the rostral edge was slightly forward of the middle of the cylinder. The exposed dura was rinsed with sterile saline and treated with a topical antibiotic (Bacitracin), and the cylinder was filled with a sterile silastic elastopolymer cap between sessions. A 1- to 2-day recovery period was allowed before removing the silastic cap and attempting electrode penetrations. Additional craniotomy surgeries were occasionally performed to extend the existing hole or to provide access to the opposite IC. After each surgery under anesthesia, Banamine (flunixin meglumine, 1 mg/kg) was administered as an analgesic.
After every session, new dural scar tissue was removed with forceps before Bacitracin was reapplied and the cylinder was filled with the polymer plug. In one rabbit, the dura was also treated with an antimitotic compound (5-fluorouracil; 25 mg/ml saline) before sealing the cylinder to discourage scar tissue from forming between sessions (Spinks et al. 2003). With these daily cleaning techniques, recording sessions yielded reasonable success rates in a single IC for 3–6 mo.
Acoustic stimuli
Sound stimuli were generated digitally and converted to analog signals using a Tucker-Davis System II D-A converter (TDT DA3-4). The stimuli were filtered at 20 kHz (TDT FT6) and attenuated (TDT PA4) before being passed to a headphone buffer (TDT HB6) and finally to a pair of Beyer-Dynamic speakers (DT-48). The speaker outputs were delivered through custom-made soft plastic (Hal-Hen Per-form) earmolds, and a probe tube allowed calibration of the closed acoustic system before each session with an Etymotic ER-7C probe microphone system. Calibration tables based on the frequency shaping introduced by the system to wideband (100 Hz to 20 kHz) noises were used to determine the attenuation values required to specify sound levels in dB SPL (dB re: 20 µPa). Monaural (usually contralateral) or diotic stimuli were presented, depending on the properties of each individual unit (see response classification below).
Recording methods
Single-unit extracellular responses were recorded using glass-insulated tungsten microelectrodes (Bullock et al. 1988). Electrode impedances between 10 and 30 MΩ measured at 135 Hz were usually required for the successful isolation and holding of neurons, but impedance was only a marginally reliable predictor of electrode performance. The electrode signal was amplified (Grass Instruments), filtered (700 Hz to 3 kHz), and AC-coupled with a TDT PC1 spike conditioner before being passed to a spike discriminator (TDT SD1) and event timer (TDT ET1). Isolated spike times were recorded with respect to a stimulus-onset-triggered reference with a resolution of ±10 µs.
Before lowering the electrode, a topical anesthetic (Lidocaine) was applied to desensitize the dura. The position of the electrode was set with a stereotaxic system (Edmund), which was mounted on the cylinder affixed to the rabbit's skull. A sharply beveled, sterile, stainless-steel guide tube (23xx gauge) was used to pierce the dura and protect the electrode tip. The guide tube was lowered by hand until its sharpened end was approximately 2–3 mm from the proximal (dorsal) surface of the IC. From there, the electrode was lowered independently of the guide tube until a unit was isolated. Because of the tonotopic organization of the structure (low frequencies were encountered at shallower depths) and the limited recording time, the distribution of best frequencies (BFs) of the neurons described here was biased toward lower frequencies (94% of the population had a BF <10 kHz). Electrodes were advanced from outside the double-walled soundproof booth with a hydraulic microdrive (Kopf Instruments, Tujunga, CA). Stimulus presentation, on-line data analyses, and video monitoring of the animal were also controlled from outside the booth.
At the conclusion of the recordings in each rabbit, electrolytic lesions were made in the approximate center of the three-dimensional coordinates that described the spatial distribution of the population of well-studied neurons from that specific IC. Standard histological techniques were used to confirm that the recording sites were likely within the central nucleus of the IC (ICC). However, the prolonged duration of recording from each IC made it impossible to definitively state that every unit was positioned within the ICC.
Response classification and analysis
Parameters of AM stimuli were designed for each neuron based on its responses to a battery of simpler sounds. Specifically, to study a cell's sensitivity to changes in modulation depth, it was necessary to determine the appropriate binaural configuration, tone carrier frequency, sound-pressure level (SPL), and modulation frequency. This section describes the stimuli and response quantifications used to make those decisions.
SEARCH STIMULI AND INITIAL CHARACTERIZATION. To search for driven activity, a 500-ms Gaussian wideband (100–10,000 Hz) noise with 10-ms cos² ramps was presented binaurally every 1.5 s. The interaural time difference (ITD) for each presentation was randomly chosen from a uniformly distributed range from −300 µs (contralateral ear leading) to +300 µs (ipsilateral ear leading) in steps of 100 µs, and the spectrum level of the noise was typically fixed between 5 and 20 dB SPL.
Once a unit was encountered and isolated using the search stimulus, its binaural configuration preference (contralateral, ipsilateral, diotic, or silence) was quantified by counting the number of spikes elicited by each configuration in response to five repetitions of a 500-ms noise (or silent interval) with a spectrum level of 10 dB SPL (50 dB SPL rms), presented once per second. The bandwidth of the noise was the same as that of the search stimulus. Next, the unit's BF and threshold were estimated by manually controlling the frequency and level of 100-ms pure tones (10-ms cos² ramps) separated by 500-ms interstimulus intervals (ISIs).
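The quoted equivalence between a 10-dB SPL spectrum level and a 50-dB SPL rms level follows from the bandwidth of the 100–10,000 Hz noise. A minimal Python check, assuming a perfectly flat spectrum across the band (the function name is illustrative, not from the study):

```python
import math


def overall_level_db(spectrum_level_db: float, low_hz: float, high_hz: float) -> float:
    """Overall (rms) level of flat-spectrum noise given its spectrum level.

    For a flat spectrum, total power = power per Hz x bandwidth, so in dB the
    overall level is the spectrum level plus 10*log10(bandwidth in Hz).
    """
    bandwidth_hz = high_hz - low_hz
    return spectrum_level_db + 10.0 * math.log10(bandwidth_hz)


# 10-dB SPL spectrum level across the 100-10,000 Hz search-noise band
level = overall_level_db(10.0, 100.0, 10_000.0)
print(f"{level:.1f} dB SPL")  # prints "50.0 dB SPL"
```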
PURE TONE RESPONSES. Based on the audio-visually determined estimates of BF and threshold, information about the response area was obtained at two SPLs, 10 and 40 dB above threshold (Ramachandran et al. 1999), and at 15 log-spaced frequencies from an octave below to an octave above BF. Average rates were measured over the entire duration of the 200-ms tones, which were presented once per second and windowed with 10-ms cos² onset and offset ramps. Usually just one repetition was sufficient to determine the frequency that elicited an excitatory response at the lowest tested SPL (defined as BF), but more repetitions were presented if necessary.
All stimuli presented after the response area had BF tone carriers. First, a rate-level function (RLF) was obtained, usually over a 70-dB range starting about 10 dB below the threshold estimate that was determined audio-visually. Ten repetitions per level of each 100-ms tone burst (including 10-ms cos² ramps) were presented with 400-ms ISIs. Rates were measured over the entire 100-ms stimulus presentation window, and a peristimulus time histogram (PSTH) was constructed using a bin size of 0.5 ms. From these responses, cells were classified based on their PSTH type (onset, sustained, on+sustained, or other; similar to Krishna and Semple 2000 and Le Beau et al. 1996), RLF shape (monotonic, saturating, nonmonotonic, or other), and mean first-spike latency (FSL) across the 10 repetitions. PSTH type and FSLs were often level dependent; the SPL used to classify responses was the level used for AM stimulation (below).
FULLY MODULATED AM TONE RESPONSES. Next, responses to 100%-modulated sinusoidally amplitude-modulated (SAM) tones were recorded, usually at 15 modulation frequencies log spaced from 2 to 311 Hz. The overall SPL was fixed (i.e., there was no level increment caused by modulation) and chosen to correspond to a level on the ascending portion of the RLF. This convention was followed unless the strongest response to tones was suppression of firing rate below spontaneous activity, in which case an SPL was chosen that clearly elicited such suppression (this occurred in 9 neurons). To accommodate several cycles of low-fm stimuli, a 2-s BF tone including 50-ms cos² ramps (a common ramp duration in AM psychophysics) was used as the carrier. Modulation was applied for the entire duration of the carrier (including the ramps). Three repetitions of each stimulus were presented with an ISI of approximately 1 s. Time permitting, additional (usually higher) SPLs and fms were presented.
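A sketch of the SAM stimulus described above, in Python with NumPy. The sampling rate, the sine starting phase, and the function names are assumptions for illustration; the only constraints taken from the text are the 2-s carrier, 50-ms cos² ramps, modulation applied throughout the ramps, and no level increment with modulation, which this sketch implements by scaling the waveform by 1/sqrt(1 + m²/2):

```python
import numpy as np


def sam_tone(fc, fm, m, dur=2.0, ramp=0.05, fs=100_000):
    """SAM tone whose rms matches that of the unmodulated carrier (sketch).

    fc, fm in Hz; m is modulation depth (0..1); dur and ramp in seconds.
    """
    t = np.arange(int(round(dur * fs))) / fs
    envelope = 1.0 + m * np.sin(2 * np.pi * fm * t)
    # Mean power of [1 + m sin()] * carrier is (1 + m**2 / 2) times that of the
    # carrier alone, so dividing by sqrt(1 + m**2 / 2) removes the level increment.
    x = envelope * np.sin(2 * np.pi * fc * t) / np.sqrt(1.0 + m**2 / 2.0)
    # cos^2 (raised-cosine) onset/offset ramps; modulation continues through them.
    n_ramp = int(round(ramp * fs))
    rise = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    window = np.ones_like(t)
    window[:n_ramp] = rise
    window[-n_ramp:] = rise[::-1]
    return t, x * window


def rms(x):
    return float(np.sqrt(np.mean(x**2)))


fs = 100_000
_, unmod = sam_tone(4000.0, 64.0, 0.0, fs=fs)
_, full = sam_tone(4000.0, 64.0, 1.0, fs=fs)
steady = slice(int(0.1 * fs), int(1.9 * fs))  # skip the ramps
print(rms(unmod[steady]), rms(full[steady]))  # both close to 1/sqrt(2)
```

Normalizing by the long-term power is one simple way to hold overall SPL constant; it is exact only over whole modulation cycles, which is adequate for multi-second stimuli.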
Four metrics were used to quantify the responses to fully modulated SAM tones. The average firing rate was computed excluding the first 100 ms to avoid onset effects, although there was usually negligible temporal adaptation to AM stimulation for modulation rates near the cell's preferred values. Synchronization, or vector strength (VS; Goldberg and Brown 1969), to the modulation period was calculated from period histograms, which were constructed with a fixed number of bins per AM cycle (64 bins). Synchronized rate (Sachs et al. 1983) was defined as the product of vector strength and average rate. Synchrony and phase values were plotted only if the vector strength was significant (Rayleigh statistic > 13.8, equivalent to P < 0.001; Mardia and Jupp 2000). In addition, at least five spikes across all three stimulus repetitions were required before a response was designated as significantly synchronized. Envelope-locked response descriptions were computed only for the component synchronized to the stimulus fm [see footnote 1 in Krishna and Semple (2000) for a brief discussion of this issue and Khanna and Teich (1989) for ANF responses examined at other stimulus-related frequencies]. Quantifications based on average rate (rMTF), synchrony (sMTF), synchronized rate (srMTF), and response phase (pMTF) provided a modulation-frequency-focused description of AM responses.
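The four metrics can be computed from spike times roughly as follows. This is a sketch under stated assumptions, not the study's analysis code: the function name and arguments are hypothetical, spike times are assumed already referenced to stimulus onset and restricted to the analysis window, vector strength is taken from a 64-bin period histogram using bin centers, and the Rayleigh statistic is assumed to be 2nR², which under no phase locking is approximately chi-square with 2 df, so 13.8 corresponds to P ≈ 0.001.

```python
import numpy as np


def sync_stats(spike_times, fm, window_s, n_reps, n_bins=64, crit=13.8, min_spikes=5):
    """Envelope-locking metrics for spikes pooled across repetitions.

    spike_times: spike times (s) re: stimulus onset, already restricted to the
        analysis window and pooled over the n_reps repetitions.
    window_s: duration (s) of the analysis window in each repetition.
    """
    spikes = np.asarray(spike_times, dtype=float)
    n = spikes.size
    avg_rate = n / (window_s * n_reps)  # spikes/s
    if n == 0:
        return dict(vs=0.0, rayleigh=0.0, phase=np.nan, avg_rate=0.0,
                    sync_rate=0.0, significant=False)
    # Period histogram: phase of each spike within the modulation cycle.
    cycle_phase = np.mod(spikes * fm, 1.0)
    counts, edges = np.histogram(cycle_phase, bins=n_bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Vector strength = length of the mean resultant of the binned phases.
    resultant = np.sum(counts * np.exp(2j * np.pi * centers)) / n
    vs = float(np.abs(resultant))
    rayleigh = 2.0 * n * vs**2  # ~ chi-square (2 df) when there is no locking
    return dict(
        vs=vs,
        rayleigh=rayleigh,
        phase=float(np.mod(np.angle(resultant) / (2 * np.pi), 1.0)),  # in cycles
        avg_rate=avg_rate,
        sync_rate=vs * avg_rate,  # synchronized rate
        significant=bool(rayleigh > crit and n >= min_spikes),
    )


# Usage: perfectly locked spikes at fm = 50 Hz vs. spikes spread evenly over the cycle
locked = sync_stats(np.arange(100) / 50.0 + 0.001, fm=50.0, window_s=2.0, n_reps=1)
spread = sync_stats((np.arange(640) + 0.5) / (64 * 50.0), fm=50.0, window_s=2.0, n_reps=1)
print(locked["vs"], locked["significant"], spread["vs"], spread["significant"])
```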
Several aspects of the MTFs were extracted for comparisons across the population and for choosing the stimulus parameters to be studied at lower modulation depths. rMTFs, sMTFs, and srMTFs were classified as all-pass, low-pass, band-pass, band-reject, or high-pass (over the range of fm tested), based on a 70% change criterion in the response above or below the cell's best modulation frequency (BMF, the fm resulting in an excitatory peak in the MTF) or worst modulation frequency (WMF, the fm eliciting the strongest response suppression flanked by excitatory regions). sMTFs and srMTFs were almost exclusively band-pass or low-pass, whereas rMTFs could take on any of the five shapes (see RESULTS).
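One way to implement the shape classification is sketched below. The text does not spell out exactly how the 70% criterion is applied, so this hypothetical classify_mtf reads it as the response falling below 70% of the peak (or, for band-reject, below 70% of the flanking maxima); treat the threshold and the order of the checks as assumptions rather than the study's rule.

```python
import numpy as np


def classify_mtf(values, criterion=0.7):
    """Label an MTF (responses ordered by increasing fm); hypothetical rule.

    Assumption: a side of the MTF counts as a "drop" when some response on
    that side falls below criterion * reference (0.7 by default).
    """
    v = np.asarray(values, dtype=float)
    i_peak = int(np.argmax(v))
    peak = v[i_peak]
    low_drop = i_peak > 0 and v[:i_peak].min() < criterion * peak
    high_drop = i_peak < v.size - 1 and v[i_peak + 1:].min() < criterion * peak
    if low_drop and high_drop:
        return "band-pass"
    # Band-reject: an interior trough (WMF) well below the excitation on both sides.
    i_min = int(np.argmin(v))
    if 0 < i_min < v.size - 1:
        trough = v[i_min]
        if trough < criterion * v[:i_min].max() and trough < criterion * v[i_min + 1:].max():
            return "band-reject"
    if high_drop:
        return "low-pass"
    if low_drop:
        return "high-pass"
    return "all-pass"


print(classify_mtf([1, 2, 5, 10, 5, 2, 1]))   # band-pass
print(classify_mtf([10, 8, 3, 8, 10]))        # band-reject
print(classify_mtf([10, 10, 9, 8, 4, 2, 1]))  # low-pass
```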
RESPONSE MODULATION-DEPTH DEPENDENCE. Because of time limitations in the unanesthetized preparation and our goal to study a wide range of modulation depths, a single fm was used to study the response dependence on m (as opposed to obtaining complete MTFs at several depths). Modulation depth functions (MDFs) based on rate, synchrony, synchronized rate, and phase were measured at a stimulus fm set equal to the frequency at the peak of the srMTF, regardless of whether the srMTF was strictly defined as band-pass or low-pass based on the 70% drop criterion. The srMTF peak was chosen as a compromise between pure rate and pure timing analyses; time permitting, additional MDFs were obtained at other interesting fms (e.g., a rate-based WMF).
Modulation depths from −35 to 0 dB in 20 log m (0.018 ≤ m ≤ 1) were tested in 5- or 1-dB steps. Other than m, the stimulus parameters were identical to those used in recording the MTF. Rate and synchronization analyses were also broadly similar, except that the initial 500 ms of the response was discarded, and the remaining 4.5 s (1.5 s × 3 repetitions) was separated into nine 500-ms segments when determining the mean and variance of the rate estimate. The 500-ms window was used because it matched the window used in much of the AM psychophysical literature. Ignoring the onset at low m was more crucial for avoiding artifacts than with fully modulated stimuli, because a pure-tone onset response could result in artificially high values of vector strength if the duration of the onset response interacted with the period of the modulating waveform.
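For reference, the dB and linear depth scales and the 500-ms segmentation can be sketched in Python as below. The helper names are ours, and the segment edges assume the 2-s stimulus with its first 500 ms discarded, as described above.

```python
import numpy as np


def db_to_m(depth_db):
    """Modulation depth in dB (20 log10 m) -> linear m."""
    return 10.0 ** (np.asarray(depth_db, dtype=float) / 20.0)


def m_to_db(m):
    """Linear m -> modulation depth in dB (20 log10 m)."""
    return 20.0 * np.log10(np.asarray(m, dtype=float))


def segment_rates(spikes_per_rep, skip_s=0.5, stim_s=2.0, seg_s=0.5):
    """Rates (spikes/s) in consecutive seg_s windows after discarding the onset."""
    edges = np.arange(skip_s, stim_s + 1e-9, seg_s)  # 0.5, 1.0, 1.5, 2.0 s
    rates = []
    for spikes in spikes_per_rep:
        counts, _ = np.histogram(spikes, bins=edges)
        rates.extend(counts / seg_s)
    return np.array(rates)


depths_db = np.arange(-35, 1, 5)  # -35, -30, ..., 0 dB
print(np.round(db_to_m(depths_db), 3))  # smallest m is about 0.018

# Three repetitions -> 3 x 3 = 9 segments
rep = [0.1, 0.6, 0.7, 1.2, 1.9]  # spike times (s); 0.1 s falls in the discarded onset
r = segment_rates([rep, rep, rep])
print(r.size, r.mean(), r.var(ddof=1))
```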
To determine neural detection and discrimination thresholds, responses to different stimulus depths were tested for significant differences from one another. The neural rate-based detection threshold was defined as the lowest m that elicited a rate different from the rate in response to the lowest tested depth (paired t-test, P < 0.05). An additional condition was imposed: responses to all depths higher than the rate-based threshold were also required to elicit rates significantly different from those in response to the lowest tested depth. This requirement rarely changed the resulting thresholds in practice, but it did eliminate the effect of spurious changes in rate resulting from movement or chewing by the rabbit. When calculating rate-based neural discrimination (as opposed to the special case of detection) thresholds, the responses to each depth were treated as responses to a standard; the lowest comparison depth resulting in a significantly different rate response determined the predicted just-noticeable difference in depth. The synchrony-based detection threshold was defined as the lowest depth that resulted in a significant value of vector strength (Rayleigh statistic > 13.8). This criterion is commonly used in physiological studies (Liang et al. 2002) and almost always resulted in thresholds matching those determined qualitatively by visually inspecting the period histograms at each depth. An alternate method for comparing metrics based on physiological responses to the psychophysical results will be presented in the DISCUSSION.
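The threshold rules can be written out compactly. The Python sketch below uses SciPy's paired t-test on the per-segment rates; the function names, the pairing of segments by index, and the made-up example rates are assumptions for illustration, not the study's data or code. The discrimination helper returns the comparison depth and leaves the choice of jnd metric open.

```python
import numpy as np
from scipy.stats import ttest_rel


def rates_differ(a, b, alpha=0.05):
    """Paired t-test on per-segment rates (segments paired by index)."""
    return bool(ttest_rel(a, b).pvalue < alpha)


def rate_detection_threshold(depths_db, seg_rates, alpha=0.05):
    """Lowest depth whose rate differs from that at the lowest depth, with
    every higher depth also differing (guards against spurious rate changes)."""
    ref = seg_rates[0]
    sig = [False] + [rates_differ(r, ref, alpha) for r in seg_rates[1:]]
    for i in range(1, len(depths_db)):
        if all(sig[i:]):
            return depths_db[i]
    return None


def rate_discrimination_depth(depths_db, seg_rates, standard_idx, alpha=0.05):
    """Lowest comparison depth above the standard with a significantly different rate."""
    standard = seg_rates[standard_idx]
    for c in range(standard_idx + 1, len(depths_db)):
        if rates_differ(seg_rates[c], standard, alpha):
            return depths_db[c]
    return None


def sync_detection_threshold(depths_db, rayleigh_stats, crit=13.8):
    """Lowest depth with a significant Rayleigh statistic."""
    for depth, z in zip(depths_db, rayleigh_stats):
        if z > crit:
            return depth
    return None


# Illustrative (made-up) rates for nine 500-ms segments at each depth (dB)
depths = [-30, -25, -20, -15, -10, -5, 0]
base = np.array([10, 12, 9, 11, 10, 13, 8, 11, 10], dtype=float)
jitter = np.array([0.5, -0.5, 0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1])
n1 = np.array([1, -1, 2, 0, 1, -2, 1, 0, 1], dtype=float)
n2 = np.array([0, 1, -1, 2, -1, 1, 0, -2, 1], dtype=float)
n3 = np.array([2, 0, -1, 1, 0, -1, 2, 1, -2], dtype=float)
seg_rates = [base, base + 20 + n1, base + jitter, base + 20 + n1,
             base + 25 + n2, base + 30 + n3, base + 30 + n1]
# The -25 dB "blip" is ignored because -20 dB does not differ from -30 dB.
print(rate_detection_threshold(depths, seg_rates))
print(rate_discrimination_depth(depths, seg_rates, standard_idx=3))
print(sync_detection_threshold(depths, [2.1, 5.0, 9.7, 16.2, 40.0, 80.0, 120.0]))
```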
RESPONSES TO MASKED SAM TONES. In addition to the tests of sensitivity to pure-SAM stimuli, neurons were also tested for their ability to represent deterministic (SAM) envelope fluctuations in the presence of a competing stochastic (Gaussian) masker modulation. The equation for the stimuli in the masked-detection task is
[Equation not recovered]
Implementation of the computational model
Implementation details of the phenomenological model tested here were recently described, and its responses were quantitatively compared with previously published AM physiological responses (Nelson and Carney 2004). The first stage of the overall processing cascade was an auditory-nerve model. Rate functions from this peripheral model were low-pass filtered and additively combined as same-frequency inhibitory and excitatory (SFIE) inputs that interacted with one another in two successive stages to give rise to model neurons with response properties comparable with those of cells in the ventral cochlear nucleus and IC. Three key parameters at the level of the model IC cells significantly change the model's overall AM response properties. By choosing appropriate values of the time constants associated with the successive low-pass filtering properties of inhibition (τinh) and excitation (τexc), the model cell's rBMF can be adjusted to match single-unit recordings (Nelson and Carney 2004). The relative strength of inhibition with respect to excitation (SINH,IC) determines the degree of suppression observed in the SFIE model cell at low and high fms (away from BMF). This parameter was not systematically studied in the initial modeling study, but it was crucial for accounting for the different AM response types that we observed in this study in groups of neurons with different pure-tone response properties.
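To make the roles of the excitatory and inhibitory time constants and of the inhibitory strength concrete, here is a toy Python sketch of a two-stage SFIE cascade. It is not the published implementation: the alpha-function filters, the inhibitory delay, the half-wave rectification, every parameter value, and the sinusoidal stand-in for the auditory-nerve model are assumptions for illustration only.

```python
import numpy as np


def alpha_kernel(tau, fs):
    """Unit-area alpha function t*exp(-t/tau), truncated at 10*tau."""
    t = np.arange(int(10 * tau * fs)) / fs
    k = t * np.exp(-t / tau)
    return k / k.sum()


def sfie_stage(rate, fs, tau_exc, tau_inh, s_inh, delay):
    """One same-frequency inhibition-excitation stage (hypothetical form).

    Excitation and inhibition are low-pass (alpha-function) filtered copies of
    the same input; inhibition is delayed, scaled by s_inh, subtracted, and the
    result is half-wave rectified to keep rates non-negative.
    """
    n = rate.size
    exc = np.convolve(rate, alpha_kernel(tau_exc, fs))[:n]
    inh = np.convolve(rate, alpha_kernel(tau_inh, fs))[:n]
    d = int(round(delay * fs))
    inh = np.concatenate([np.zeros(d), inh[: n - d]])
    return np.maximum(exc - s_inh * inh, 0.0)


def sfie_model(an_rate, fs, cn=(0.5e-3, 1e-3, 0.6, 1e-3), ic=(2e-3, 3e-3, 0.9, 2e-3)):
    """Two cascaded stages (CN-like, then IC-like); parameter tuples are
    (tau_exc, tau_inh, s_inh, delay) and the defaults are illustrative only."""
    cn_rate = sfie_stage(an_rate, fs, *cn)
    return sfie_stage(cn_rate, fs, *ic)


fs = 20_000
t = np.arange(int(1.0 * fs)) / fs
for fm in (8, 32, 128, 512):
    # Stand-in for the auditory-nerve model: rate following a 100%-modulated envelope
    an = 100.0 * (1.0 + np.sin(2 * np.pi * fm * t))
    ic_out = sfie_model(an, fs)
    print(fm, round(float(ic_out[fs // 2:].mean()), 2))  # mean IC-stage rate
```

Sweeping tau_exc, tau_inh, and s_inh in the ic tuple is a quick way to explore how these parameters shape the rate MTF of a model of this kind.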
RESULTS
Population pure tone responses and correlations with 100% SAM responses
The heterogeneity of BF pure-tone responses in the IC is impressive compared with the responses observed in lower brain stem structures such as the cochlear nucleus (Blackburn and Sachs 1989) and remarkable compared with the highly stereotypical nature of ANF responses (Kiang et al. 1965). Because of this diversity, any classification scheme is somewhat arbitrary: the number of potential categories is essentially unlimited and at least partially subjective. We have chosen to use a small number of classifications (4) for both PSTH type and RLF shape, including one all-encompassing "other" category.
Distributions of PSTH type across the population are shown in Fig. 1A. Sustained pure-tone responses (without a clear onset component) were the most common PSTH type (43%), whereas only 13% of the neurons were pure onset responders. Nearly a third of the population (30%) exhibited some combination of an onset and sustained response. The PSTHs of the remaining 14% of the cells did not fall neatly into any of the three other categories. This PSTH group included offset responses (n = 2), pauser-buildups (n = 6), combined onset and offset responses (n = 3), responses with regularly spaced peaks of discharge not related to the stimulus periodicity (choppers, n = 2), suppression below spontaneous rate without an excitatory region (n = 9), and unusual histograms (n = 5). Because of the paucity of pauser response types in our population (n = 6), they were not identified as a separate class (as they were in Le Beau et al. 1996 and Krishna and Semple 2000).
A characterization of the population based on the shapes of single-unit BF RLFs is shown in Fig. 1B. A 50% rate drop at high SPLs relative to the peak response was required for an RLF to be classified as nonmonotonic (as in Aitkin 1991); 24% of the neurons had a single peak and such a rate drop at high SPLs. Some units that met the 50%-drop criterion were placed in the "other" RLF shape category because of multiple peaks in the RLF or a rebound from an initial rate drop at the highest levels tested (20% of the units). The remaining cells exhibited monotonic (11%) or saturating (44%) RLF shapes over the range of levels tested (almost always 70 dB).
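The 50%-drop rule for nonmonotonic RLFs reduces to a one-line test. In this hypothetical helper, "at high SPLs" is taken to mean the highest level tested, and rates are compared without subtracting spontaneous activity; both are assumptions. The single-peak requirement is not checked here.

```python
import numpy as np


def meets_nonmonotonic_drop(rates, drop=0.5):
    """True if the rate at the highest tested level is at least `drop` (50%)
    below the peak rate, with the peak occurring below the highest level."""
    r = np.asarray(rates, dtype=float)
    i_peak = int(np.argmax(r))
    return bool(i_peak < r.size - 1 and r[-1] <= (1.0 - drop) * r[i_peak])


print(meets_nonmonotonic_drop([5, 20, 40, 30, 15]))  # True: 15 <= 0.5 * 40
print(meets_nonmonotonic_drop([5, 20, 40, 38, 36]))  # False: saturating-like
```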
The pure-tone response properties examined in Fig. 1 (PSTH type and RLF shape) were broken down further with respect to the corresponding neurons' fully modulated SAM response properties. Neurons with band-pass (BP) rMTFs (which made up 47% of the population) are shown with the dark portions of the bars in Fig. 1; cells with non-BP rMTFs are represented by the light upper segments of each bar. rMTFs of onset pure-tone responders were consistently band-pass (25/26 onset cells were classified as BP over the 2- to 312-Hz fm range; the remaining onset neuron had an rBMF of 312 Hz, at the upper edge of that range), whereas most rMTFs of sustained pure-tone responders (63/86) were not BP. Classifications of on + sustained or "other" pure-tone responses were not predictive of rMTF shape: approximately one half of each category was BP tuned. Similarly, RLF shape was not predictive of the presence of a single region of excitation in the rMTF (i.e., a BP shape).
One aspect of BF pure-tone responses that has been shown to be correlated with AM responses in previous studies is the mean FSL: IC neurons with longer FSLs tend to have lower BMFs (Heil et al. 1995; Krishna and Semple 2000; Langner et al. 1987). This data set corroborates the finding of a weak inverse FSL-BMF correlation (FSL-rBMF Kendall's τ = −0.25, P < 0.001; FSL-sBMF τ = −0.26, P < 0.001; FSL-srBMF τ = −0.23, P < 0.001). Despite the significant correlations, Fig. 2 makes it clear that FSL is in general an unreliable predictor of rBMF (the same is true for srBMF and sBMF). The different symbols in Fig. 2 denote the various PSTH types; the three major groups (sustained, onset, and on + sustained) contain neurons with a similar range of rBMFs, but the longest FSLs were found in sustained pure-tone responders. Below the scatter plot in Fig. 2 is a histogram of the FSL values across the entire population (only neurons with BP rMTFs were included in the scatter plot). There was no significant correlation between BMF and BF (Kendall's τ < 0.1, P > 0.1 for rBMF, sBMF, and srBMF correlations with BF), consistent with Krishna and Semple (2000). It is worth reiterating that the FSLs reported here were based on responses at a relatively low SPL, as opposed to the minimum mean FSL across the entire range of levels tested with pure tones. One might expect that the use of a single low SPL to derive FSLs would result in higher estimates of latency, but this was not strictly true. Some neurons in the IC exhibited an increase in FSL with level, an effect that has been termed the "paradoxical latency shift" (Sullivan 1982). In our population, 16% of the cells revealed such a level dependence of latency, in the form of a mean FSL at a higher SPL that was >1 SD longer than the mean latency at the (lower) SPL used for the population analysis.
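Both analyses in this paragraph are straightforward to reproduce. The Python sketch below uses SciPy's kendalltau for the rank correlation and a hypothetical helper for the latency-shift criterion, in which the 1-SD yardstick is assumed to be the SD of first-spike latencies across repetitions at the lower SPL; all numbers are made up.

```python
import numpy as np
from scipy.stats import kendalltau


def paradoxical_latency_shift(fsl_low, fsl_high):
    """True if mean FSL at the higher SPL exceeds the mean FSL at the lower SPL
    by more than 1 SD (SD of the lower-SPL latencies; an assumption)."""
    low = np.asarray(fsl_low, dtype=float)
    high = np.asarray(fsl_high, dtype=float)
    return bool(high.mean() > low.mean() + low.std(ddof=1))


# Made-up values: latencies (ms) across repetitions at two SPLs
print(paradoxical_latency_shift([10, 11, 9, 10, 10], [12, 12, 13, 11, 12]))  # True

# Rank correlation between FSL (ms) and rBMF (Hz) across a few made-up cells
fsl = [5, 6, 7, 8, 9, 10]
rbmf = [200, 150, 160, 80, 60, 40]
tau, p = kendalltau(fsl, rbmf)
print(round(float(tau), 3), p)  # tau < 0: longer latency, lower BMF
```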
Example MTFs and MDFs
Detailed AM responses of four example neurons are highlighted in this section; they were chosen as representatives of each of the four categories of pure-tone responses (sustained, onset + sustained, onset, and other).
REPRESENTATIVE ONSET PURE-TONE RESPONSE. Without exception, onset units exhibited BP-tuned rMTFs; Fig. 6 characterizes such a neuron in more detail. Fully modulated AM responses are shown in Fig. 6A, revealing rate tuning to stimulus fm between 30 and 100 Hz. The mismatch between the peaks in the rMTF and the sMTF resulted in an srBMF (58 Hz) between the rBMF (81 Hz) and the sBMF (41 Hz). The sharpness of rMTF tuning was relatively high in this example (Q = 1.5). Raw period histograms and PSTHs are also included in Fig. 6, B and C. Two features of the histograms in Fig. 6 were consistently observed across neurons. First, the response phase changed near BMF (as expected with period histograms constructed using a fixed starting point in time; Fig. 6B). Second, there was negligible temporal adaptation over the 2-s stimulation period in the PSTHs in response to AM tones near BMF (Fig. 6, C and F).
rate threshold = −10 dB).
Plotted alongside the rMDF in Fig. 6D is the srMDF, which only includes values that were computed with a significant synchrony coefficient. A consistent offset between the rMDF and srMDF indicates a constant value of vector strength across depth. This is confirmed by the sMDF (bottom panel), in which all vector strength values were between 0.6 and 0.8. Another way to interpret depth-independent response synchrony is in terms of a modulation gain that decreases with increasing m. Single-cell synchrony-based neural thresholds were defined as the lowest modulation depth that evoked a significantly envelope-locked response (synchrony threshold = −10 dB for the neuron shown in Fig. 6). Visual inspection of the period histograms in Fig. 6E suggests that the statistical criteria used to define synchrony threshold were reasonable: response modulation clearly emerges at 20 log m = −10 dB. The period histograms also indicate that the phase of the response did not change appreciably as m was varied. As with the PSTHs shown for different modulation frequencies (Fig. 6C), there was no evidence of gross, slow temporal adaptation in the PSTHs at different modulation depths (Fig. 6F).
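The synchrony-threshold definition can be sketched as follows. Vector strength (VS) is computed from spike phases relative to the modulation period. Significance is assessed here with a Rayleigh test at P < 0.001, i.e., 2N·VS² > 13.8. That test and criterion are assumptions for illustration; the statistic used in this study is not restated here. All helper names are hypothetical.

```python
import math


def vector_strength(spike_times_s, fm_hz):
    """Vector strength of spike times relative to the modulation period (0 to 1)."""
    n = len(spike_times_s)
    if n == 0:
        return 0.0
    phases = [2.0 * math.pi * fm_hz * t for t in spike_times_s]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / n


def is_synchronized(spike_times_s, fm_hz, critical=13.8):
    """Rayleigh test: 2*N*VS**2 > 13.8 corresponds to P < 0.001 (assumed criterion)."""
    n = len(spike_times_s)
    return 2.0 * n * vector_strength(spike_times_s, fm_hz) ** 2 > critical


def synchrony_threshold(responses_by_depth_db, fm_hz):
    """Lowest depth (20 log m, dB) with significant synchrony, or None if none.

    responses_by_depth_db maps depth in dB -> list of spike times (s).
    """
    significant = [depth for depth, spikes in responses_by_depth_db.items()
                   if is_synchronized(spikes, fm_hz)]
    return min(significant) if significant else None


locked = [k / 100.0 for k in range(50)]   # every spike at the same envelope phase
spread = [k / 800.0 for k in range(80)]   # spikes spread evenly over each 10-ms period
print(synchrony_threshold({-30: spread, -20: locked, -10: locked, 0: locked}, 100.0))  # -20
```

The sketch takes the lowest significant depth regardless of the depths above it. A stricter rule might also require significance at every larger depth.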
REPRESENTATIVE ON + SUSTAINED PURE-TONE RESPONSE. Several aspects of the AM responses of the on + sustained pure-tone responder (Fig. 7) are fundamentally different from those of the pure onset responders. Perhaps the most salient difference was that the entire range of 100%-AM stimuli elicited synchronized firing (from 2 to 311 Hz). The resulting rMTF reveals weaker tuning (Q = 0.81), with a peak at 81 Hz (the same rBMF as the representative onset neuron). Vector strength reached a maximum of 0.56 at 113 Hz, and synchronized rate peaked at 58 Hz. Period histograms at low fms show that the probability of firing remained relatively constant for a longer portion of the corresponding stimulus waveform than for the onset neuron, with a weak cycle-by-cycle onset adaptation component. The shape of the period histogram at the srBMF (58 Hz) was somewhat more complex, with two peaks near the onset of adaptation of the response during each cycle. This multimodal period histogram shape emerged only at higher modulation depths (Fig. 7E). PSTHs plotted in Fig. 7, C and F, again suggest minimal adaptation over a time scale on the order of hundreds of milliseconds.
rate threshold = 0 dB). In contrast, synchronization to the period of the modulating waveform was significant at −20 dB, and synchrony increased monotonically with increasing m. Correspondingly, the period histograms were modulated, and spikes became more tightly locked to a particular phase of the envelope as m was increased from −20 to 0 dB (Fig. 7E).
REPRESENTATIVE SUSTAINED PURE-TONE RESPONSE.
Pure sustained responders to short tone bursts were usually associated with rMTFs that were not BP (Fig. 1A). An example of such a cell is described in terms of its MTFs and MDFs in Fig. 8. The main feature of the rMTF of Fig. 8A was a broadly tuned suppressive region (i.e., a BR rMTF); a complementary (BP) sMTF had a peak within the region of rate suppression. Taking the product of synchrony and rate resulted in an srMTF with a region of suppression at lower fms, an excitatory region at higher fms, and an srBMF that matched the sBMF of 113 Hz (Fig. 8A). The period histograms in Fig. 8B indicate that the drop in rate was largely mediated by a suppression of firing after the onset response elicited in each cycle, and that the rate recovery at higher fms was not strongly synchronized (although VS remained significant up to 311 Hz).
rate threshold of −10 dB based on a drop in rate was still much higher than human AM detection thresholds. As with the first two example neurons, a timing-based metric such as synchrony was more sensitive to low-depth AM stimulation: the cell's synchrony threshold was −25 dB. Also, the value of VS increased monotonically with depth; this trend was mainly observed in neurons that responded to pure tones with a substantial sustained rate.
65 Hz, whereas synchrony was
70% of its peak value (VSmax = 0.8)
107 Hz.
The characteristic phase [the low-fm y-intercept of the phase-MTF (data not shown)] was approximately 180° out of phase with respect to those observed in the neurons shown in Figs. 6–8. This can be verified qualitatively by comparing the period histograms at the lowest tested fms across the four example neurons. It suggests that the neuron was released from inhibition at times corresponding to the valleys of the modulating waveform. Also, in contrast to the first three examples, a weak form of slow adaptation was observed in the PSTHs: onset inhibition was slowly released over a time course of approximately 1 s (this is most clearly shown by the 113-Hz PSTH; Fig. 9C).
Because there was no clear peak in the srMTF of Fig. 9A, a modulation frequency of 40 Hz was chosen for the MDF simply because it lay within the passband of all three MTFs. Interestingly, based on the MDFs alone, the neurons characterized in Figs. 6D and 9D were remarkably similar (despite their obvious differences in pure-tone responses). The rMDF revealed an increase in firing rate with increasing depth (and longer effective times of release from inhibition), with a rate threshold of −10 dB (Fig. 9D). The synchrony-based threshold was, once again, lower than the rate threshold (synchrony threshold = −15 dB). Although VS values were similar for depths between −15 and −5 dB, the corresponding period histograms were quite different. This illustrates one of the limitations of the synchronization coefficient alone as a general description of temporal response characteristics.
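The limitation noted above can be illustrated with two hypothetical period histograms that differ in shape but have the same vector strength. The first has two symmetric peaks at ±60°. The second has a single peak at 0° plus spikes spread evenly around the cycle, whose vectors cancel. Both give VS = 0.5. These phases are made up for illustration and do not come from the neuron in Fig. 9.

```python
import math


def vector_strength_from_phases(phases_rad):
    """Vector strength (length of the mean resultant vector) of spike phases."""
    n = len(phases_rad)
    if n == 0:
        return 0.0
    c = sum(math.cos(p) for p in phases_rad)
    s = sum(math.sin(p) for p in phases_rad)
    return math.hypot(c, s) / n


# Hypothetical period histogram 1: two symmetric peaks at +60 and -60 degrees
bimodal = [math.radians(60.0)] * 4 + [math.radians(-60.0)] * 4
# Hypothetical period histogram 2: one peak at 0 degrees plus four spikes spread
# evenly around the cycle, whose vectors cancel
peak_plus_flat = [0.0] * 4 + [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]

print(round(vector_strength_from_phases(bimodal), 3))         # 0.5
print(round(vector_strength_from_phases(peak_plus_flat), 3))  # 0.5
```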
Single-unit rate- and synchrony-based AM thresholds
Performance in three basic psychoacoustic AM tasks was predicted based on changes in neural responses at different modulation depths that were quantified in terms of average rate and synchrony. The three psychophysical paradigms are 1) pure SAM detection, 2) masked SAM detection, and 3) SAM depth discrimination.
PURE SAM DETECTION.
Human listeners can discriminate between a pure tone and a SAM tone at modulation depths lower than −30 dB (Zwicker 1952). At a given SPL, performance does not systematically depend on fm or fc for modulation frequencies between 10 and 150 Hz and carrier frequencies >1,000 Hz (Kohlrausch et al. 2000). Sensitivity to SAM is best at higher SPLs, but thresholds can remain lower than −20 dB at low sensation levels (Kohlrausch et al. 2000). There is indirect psychophysical evidence that, at high SPLs, listeners probably use audio-frequency channels other than that of the carrier to perform the task. In those channels the effective level is lower, and peripheral saturation and compression are less likely to have a strong influence on performance (Kohlrausch et al. 2000; Ruggero et al. 1997).
Neural rate and synchrony SAM detection thresholds across the population of 164 neurons are shown in Fig. 10A as a function of the stimulus modulation rate. Consistent with the example neurons described in the previous section, the vast majority of rate thresholds were −10 dB or higher (139/164 neurons). If the rate did not change across the entire range of m, thresholds were deemed immeasurable; this group of neurons is identified with the X on the axes in Fig. 10. Five neurons had a rate threshold of −20 dB, and 20 responded with a significant change in rate at −15 dB. The histogram of rate thresholds to the left of Fig. 10A reinforces the fact that rate changes in single IC neurons were, in general, poor predictors of human SAM sensitivity. There was no strong relationship between rate thresholds and the fm of stimulation, which was set equal to the most prominent peak in the cell's srMTF.
Synchrony-based thresholds (shown in the histogram to the right of Fig. 10A) were more evenly distributed across the perceptually relevant dynamic range than the rate-based thresholds. Twenty-eight percent (46/164) of the neurons had synchrony thresholds of −20 dB or lower; three units were significantly phase-locked at a modulation depth of −30 dB. Examination of the individual neural thresholds (Fig. 10A, X symbols) reveals no obvious trends either in maximum sensitivity or in threshold distribution as a function of the stimulus fm.
Figure 10B shows a feature of the data that is suggested by, but not explicitly contained in, Fig. 10A: on a neuron-by-neuron basis, the synchrony threshold was almost always lower than the rate threshold. In the scatter plot of Fig. 10B, this takes the form of almost all of the points lying above the diagonal. The three neurons with the most sensitive synchrony thresholds (−30 dB; the 3 leftmost points in Fig. 10B) had corresponding rate thresholds of −15 and −5 dB and one immeasurable rate threshold.
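The neuron-by-neuron comparison needs a convention for immeasurable thresholds. One simple choice is to treat an immeasurable threshold as worse than any tested depth (+∞ in dB). This convention is assumed here rather than taken from the study. Under it, a neuron with a measurable synchrony threshold and an immeasurable rate threshold counts as synchrony-favoring. The function name is hypothetical. The example pairs reuse the three most sensitive neurons described above plus one hypothetical tie.

```python
import math


def fraction_sync_more_sensitive(threshold_pairs):
    """Fraction of neurons whose synchrony threshold is lower than their rate threshold.

    threshold_pairs: iterable of (rate_threshold_db, sync_threshold_db); None marks
    an immeasurable threshold and is treated as +inf dB (an assumed convention).
    """
    def as_db(value):
        return math.inf if value is None else float(value)

    pairs = list(threshold_pairs)
    if not pairs:
        raise ValueError("no neurons supplied")
    n_better = sum(1 for rate_db, sync_db in pairs if as_db(sync_db) < as_db(rate_db))
    return n_better / len(pairs)


example = [(-15, -30), (-5, -30), (None, -30), (-10, -10)]
print(fraction_sync_more_sensitive(example))  # 0.75
```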
As mentioned above, overall SPL can affect behavioral SAM detection thresholds. The neural responses summarized in Figs. 2–10 were obtained at an SPL chosen from individual RLFs. The SPL used for AM stimulation was set to a level on the ascending portion of the RLF, or at its peak if the RLF was sharply nonmonotonic. One obvious question is whether the relatively poor neural rate-based thresholds might improve if a higher SPL were chosen. In 33 neurons, this question was addressed by remeasuring MTFs and MDFs at a level typically 20–40 dB higher than that used for the low-SPL responses. The stimulus fm for each MDF was determined from the peak in the srMTF, which could vary across SPL (for a more detailed discussion of the level dependence of MTF shapes, see Krishna and Semple 2000).
The resulting rate- and synchrony-based thresholds for this subset of the population are shown for both tested SPLs in Fig. 11. Thresholds that increased (i.e., sensitivity got worse) or remained the same with increasing SPL are plotted with solid lines; the other neurons, which exhibited a decrease in threshold at the higher SPL, are represented by dashed lines. The majority of rate thresholds (22/33) did not improve at high SPLs (Fig. 11A), and 11 of the comparisons revealed an increase in rate threshold at the higher SPL. The 11 cells that did exhibit an improvement in rate-based sensitivity with level still did not approach human detection thresholds at comparable SPLs (the lowest rate threshold at levels >40 dB SPL was 20 log m = −15 dB). Synchrony-based thresholds (Fig. 11B) were also more likely to show a decrease in sensitivity at high SPLs (16/33) than an improvement (11/33). The average rate threshold increased by 0.9 dB at the higher SPL, and the average synchrony threshold increased by 1.1 dB. Overall, the trends shown in Fig. 11 suggest that the use of a relatively low SPL in the population analysis (e.g., Fig. 10) probably did not bias the results toward higher thresholds.
We conclude that, for pure SAM detection, some temporal aspect of the envelope-locked response (i.e., synchrony) at the level of the IC must be taken into consideration to account for psychophysical detection thresholds based on the responses of single neurons.
MASKED SAM DETECTION.
Psychophysical experiments studying the effect of a competing masker modulation on the detectability of a SAM signal modulation have shown effects of the frequency relationship between masker and signal (Bacon and Grantham 1989; Ewert and Dau 2000; Ewert et al. 2002; Houtgast 1989; Strickland and Viemeister 1996) and of the "level" or modulation depth of the masker (Bacon and Grantham 1989; Nelson and Carney 2006; Strickland and Viemeister 1996). Because a main focus of this set of experiments was to establish the modulation-depth dependence of responses in the IC, neural masked thresholds were determined with a narrowband Gaussian masker (centered on the sinusoidal signal frequency) at several masker depths.
Over a 10-dB range of masker depths from −23 to −13 dB rms, psychophysical SAM detection thresholds in a similar task (with fm = 64 Hz, masker BW = 32 Hz, fc = 5,500 Hz, and SPL = 65 dB) increased monotonically as the masker fluctuations became stronger (more details and thresholds in a wider range of stimulus conditions can be found in Nelson and Carney 2006). The behavioral thresholds are shown in Fig. 12, along with neural thresholds obtained in 28 units, again based on rate and synchrony to the signal envelope frequency. The signal fm was set to the cell's srBMF, and the masker bandwidth was fixed at one half of the SAM signal frequency. To avoid overmodulation (i.e., a modulation depth > 1), the signal depth was restricted to values ≤ −5 dB for masker depths of −23 and −18 dB rms and to values ≤ −10 dB for the −13 dB rms masker. As a result, there were no predicted neural thresholds at 0 dB for any masker level and no thresholds at −5 or 0 dB (20 log m) for the −13 dB rms condition.
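The overmodulation constraint can be checked numerically. The sketch sums a sinusoidal signal modulator and a narrowband masker modulator, then tests whether the combined modulator ever exceeds 1 in magnitude. It makes three assumptions. First, a masker depth in dB rms is taken to be 20 log10 of the rms value of the masker modulator. Second, Gaussian noise is approximated by a sum of sinusoids with Rayleigh-distributed amplitudes and random phases; this is a substitute technique, since the noise-generation method used in the study is not described here. Third, the bandwidth is set to one half of the signal fm, as in the neural experiments. All helper names are hypothetical.

```python
import math
import random


def narrowband_masker(center_hz, bw_hz, rms_db, dur_s, fs_hz, n_comp=64, seed=0):
    """Hypothetical helper: approximately Gaussian narrowband modulator.

    Sum of n_comp sinusoids spaced evenly across [center - bw/2, center + bw/2],
    with Rayleigh-distributed amplitudes and uniform random phases, scaled so its
    rms equals 10**(rms_db / 20).
    """
    rng = random.Random(seed)
    n = int(round(dur_s * fs_hz))
    lo = center_hz - bw_hz / 2.0
    freqs = [lo + bw_hz * (i + 0.5) / n_comp for i in range(n_comp)]
    amps = [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in freqs]  # Rayleigh
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    x = [sum(a * math.sin(2.0 * math.pi * f * k / fs_hz + p)
             for a, f, p in zip(amps, freqs, phases))
         for k in range(n)]
    rms = math.sqrt(sum(v * v for v in x) / n)
    if rms == 0.0:
        raise ValueError("degenerate masker")
    scale = 10.0 ** (rms_db / 20.0) / rms
    return [v * scale for v in x]


def signal_plus_masker(signal_db, fm_hz, masker, fs_hz):
    """Add a sinusoidal signal modulator of depth 10**(signal_db/20) to the masker."""
    m_s = 10.0 ** (signal_db / 20.0)
    return [m_s * math.sin(2.0 * math.pi * fm_hz * k / fs_hz) + v
            for k, v in enumerate(masker)]


def is_overmodulated(modulator):
    """Conservative, symmetric check: flags any |x| > 1, which includes every 1 + x < 0."""
    return max(abs(v) for v in modulator) > 1.0


fs, fm = 2000.0, 64.0
masker = narrowband_masker(fm, fm / 2.0, -23.0, 1.0, fs)  # BW = fm / 2
print(is_overmodulated(signal_plus_masker(-5.0, fm, masker, fs)))
```

At −23 dB rms (rms ≈ 0.07), the masker leaves a wide margin below 1 when added to a −5 dB signal (m ≈ 0.56). At −13 dB rms (rms ≈ 0.22), occasional masker peaks of several times the rms value eat into that margin. This suggests why a lower signal limit was needed in that condition.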
Figure 12A shows that, in general, average rates were more successful at predicting masked thresholds than at predicting pure SAM detection (cf. Fig. 10). A small group of cells (8) exhibited rate thresholds within 5 dB of the listeners' thresholds at one or more masker depths. This weakly suggests that the functional contributions of rate and synchrony may depend on the range of relevant modulation depths in a given task. We will return to this idea in the following section (see SAM DEPTH DISCRIMINATION), which considers a general discrimination task across the entire perceptual modulation-depth dynamic range.
Synchronization to the signal SAM was significant in some neurons at depths even lower than the behavioral thresholds, most notably at the highest tested masker modulation depth, where 7/24 neural thresholds were below the psychophysical data (Fig. 12B). As with the pure SAM detection population analysis, the distribution of synchrony thresholds was more uniform than the rate-based distribution (which was skewed toward higher or immeasurable predictions of threshold). Trends in threshold across masker depth for individual neurons are not shown in Fig. 12 for clarity, because there was not a consistently observed increase or decrease in predicted sensitivity with increasing masker level (as there was in the psychophysical data).
SAM DEPTH DISCRIMINATION. Another fundamental measure of envelope processing in psychoacoustics is SAM depth discrimination, which describes the ability of the system to resolve small changes in m. Pure SAM detection is a special case of this more general paradigm: the standard depth (ms) for detection is set to 0, and the comparison depth (mc) is adjusted until it is just noticeably different from the standard interval. The same procedure can be repeated for any value of ms. Psychophysical measurements of pure-tone carrier SAM-depth discrimination reveal thresholds that are approxi