J Neurophysiol 97: 522-539, 2007. First published November 1, 2006; doi:10.1152/jn.00776.2006
0022-3077/07 $8.00

Neural Rate and Timing Cues for Detection and Discrimination of Amplitude-Modulated Tones in the Awake Rabbit Inferior Colliculus

Paul C. Nelson¹ and Laurel H. Carney¹,²

¹Department of Biomedical and Chemical Engineering and Institute for Sensory Research and ²Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, New York

Submitted 27 July 2006; accepted in final form 25 October 2006


    ABSTRACT
Neural responses to amplitude-modulated (AM) tones in the unanesthetized rabbit inferior colliculus (IC) were studied in an effort to establish explicit relationships between physiological and psychophysical measures of temporal envelope processing. Specifically, responses to variations in modulation depth (m) at the cell’s best modulation frequency, with and without modulation maskers, were quantified in terms of average rate and synchronization to the envelope over the entire perceptual dynamic range of depths. Statistically significant variations in the metrics were used to define neural AM detection and discrimination thresholds. Synchrony emerged at modulation depths comparable with psychophysical AM detection sensitivities in some neurons, whereas the lowest rate-based neural thresholds could not account for psychoacoustical thresholds. The majority of rate thresholds (85%) were –10 dB or higher (in 20 log m), and 16% of the population exhibited no systematic dependence of average rate on m. Neural thresholds for AM detection did not decrease systematically at higher sound-pressure levels (SPLs), in contrast to psychophysical thresholds: neural thresholds remained constant or increased with level for most cells tested at multiple SPLs. At depths higher than the rate-based detection threshold, some rate modulation-depth functions were sufficiently steep with respect to the across-trial variability of the rate to predict depth discrimination thresholds as low as 1 dB (comparable with the psychophysics). Synchrony, on the other hand, did not vary systematically with m in many cells at high modulation depths. A simple computational model was extended to reproduce several features of the modulation frequency and depth dependence of both transient and sustained pure-tone responders.


    INTRODUCTION
Temporal envelope fluctuations abound in natural acoustic landscapes, and the preservation of psychophysically relevant amplitude modulations is essential for robust vocalization perception (Drullman 1995; Shannon et al. 1995), auditory grouping and stream segregation (Bregman 1990), and signal detection in the presence of competing maskers (Hall et al. 1984). Although the auditory system is often described in terms of its audio-, or spectral-frequency, tuning properties, many central neurons are actually more sensitive to degradations of the stimulus temporal envelope than to smearing of spectral content (Nagarajan et al. 2002; Theunissen and Doupe 1998).

Basic psychophysical envelope-processing tasks have received renewed attention recently, caused in large part by the success of a model of the "effective" signal processing of the auditory system (Dau et al. 1997) in predicting behavioral data that are difficult to interpret unless one assumes the existence of a bank of filters tuned in the amplitude modulation (AM) frequency domain (Bacon and Grantham 1989; Dau et al. 1997; Ewert and Dau 2000; Houtgast 1989; Kay 1982). Such a conceptual framework is fundamentally different from the assumptions of earlier models of AM perception, which describe the putative central processor as a low-pass filter (Viemeister 1979).

Physiological studies of responses to AM provided some of the motivation for psychophysical studies of frequency selectivity in the AM domain (Creutzfeldt et al. 1980; Langner and Schreiner 1988), but specific hypotheses concerning the relationships between physiological responses and perceptual signal-processing models have not been adequately examined. One possibility is that single neurons in the auditory midbrain function as modulation filters. A qualitative scan of the relevant literature suggests this may be a reasonable hypothesis, because many neurons in the inferior colliculus (IC) systematically change their responses with variations in modulation frequency (fm) and modulation depth (m) (Krishna and Semple 2000; Langner and Schreiner 1988; Mueller-Preuss et al. 1994; Rees and Moller 1983). The next portion of this introduction provides a more detailed description of the published physiological responses to AM that are relevant to this hypothesis, with a focus on the gaps in evidence that the current experiments were designed to fill.

Neural representations of sounds with dynamic temporal envelopes change dramatically as the auditory neuraxis is ascended. Much of our understanding about this transformation comes from studies of physiological responses to stimuli with systematically varied AM frequencies. From this body of work, a reasonably consistent picture has emerged: peripheral neurons seem to carry envelope-frequency information in a temporal (phase-locked) code, with average rates that do not change with stimulus fm (Joris and Yin 1992). More central neurons, in contrast, often exhibit average firing rates that are strongly dependent on fm and a reduced ability to synchronously follow faster fluctuations (for a review, see Joris et al. 2004). This frequency-focused description of neural responses to AM leaves a fundamental issue pertaining to the relationships between physiology and psychophysics unclear. Specifically, modulation transfer functions (MTFs) only provide information about responses to a single (usually high) modulation depth; as a result, direct comparisons to behavioral data are difficult because the goal of much of the relevant psychoacoustics is to determine the smallest detectable or discriminable m. A major objective of this study was to obtain neural responses to stimuli with a wide range of modulation depths, including depths near psychophysical AM detection thresholds.

A few studies have reported physiological responses to variations in stimulus modulation depth. In auditory-nerve fibers (ANFs) and most ventral cochlear nucleus (VCN) units, synchronization to the envelope increases monotonically with depth, with average rates that are largely depth-independent (AN: Joris and Yin 1992; VCN: Rhode 1994). Such generalizations cannot be made about AM-depth processing at higher levels of the central auditory system because of the striking response diversity. For instance, in the superior olivary complex (SOC), changes in the response with m are strongly correlated with the units’ pure-tone response properties. Sustained pure-tone responders in the periolivary nuclei of the SOC tend to be similar to ANFs in terms of the shape of rate- and synchrony-modulation depth functions (rMDFs and sMDFs), whereas offset responders exhibit monotonically increasing rMDFs and saturating sMDFs with narrow dynamic ranges (Kuwada and Batra 1999).

Krishna and Semple (2000) provided the most complete account of neural responses at the level of the IC across a range of modulation depths. They found that average firing rates in many cells varied monotonically with m, especially near the cell’s preferred modulation frequency. The change in rate could be an increase or a decrease with m, depending on the presence of regions of excitation and suppression in the cell’s rate modulation transfer function (rMTF) and their relationship to the chosen stimulus modulation frequency. Temporal response patterns also changed with m in their study. A minimum depth was required to elicit significant synchrony in individual cells; this value ranged from as low as 10% (the lowest depth tested) in some of the neurons to 70% in others (Krishna and Semple 2000). Changes in vector strength above the minimum m were less stereotypical: synchrony in some neurons varied over a wide range of depths, but it remained constant in most cells. Results from other (less systematic) studies are in qualitative agreement with the single-unit IC modulation-depth dependence description of Krishna and Semple (Mueller-Preuss et al. 1994; Nelson et al. 1966; Rees and Moller 1983). Relatively little is known about cortical responses to variation in m; Eggermont (1994) and Liang et al. (2002) measured tone-carrier modulation transfer functions (MTFs) in primary auditory cortex neurons at several depths from 25 to 100% and concluded that 1) MDFs were monotonic and 2) neural best modulation frequencies were essentially independent of m.

A survey of this previous work allows for a qualitative description of neural responses to variations in modulation depth, but a direct and quantitative comparison of physiological responses at any level of the pathway to basic psychophysical AM detection and discrimination performance is still lacking. Two requirements for such a comparison to be made are met in this study. First, the stimulus parameter space used in the physiology was designed to match that of the psychophysics. Specifically, m was varied from below detection threshold to 100%, in some cases using step sizes smaller than the behavioral just-noticeable difference (jnd). Second, a description of the statistical variability of the neural responses was included to quantify the significance of small changes in a given response metric (e.g., average rate and synchrony). In addition, because most naturally occurring sounds have complex modulation spectra, and recent models of envelope processing were developed based on masked AM detection paradigms, a similar rate- and synchrony-based analysis was applied to responses elicited by a sinusoidal signal modulation embedded in a competing masker modulation.

The IC is an inherently interesting nucleus in which to study AM processing. Structurally, it occupies a critical position in the subcortical processing pathway, as an almost obligatory ascending synapse (Aitkin and Phillips 1984; Malmierca et al. 2002; Ramon y Cajal 1904) and a receiving station for both inhibitory and excitatory inputs converging from afferent (Warr 1982; Winer et al. 1995), efferent (Winer 2004), and intrinsic and commissural connections (Saldana and Merchan 2004). In contrast to the established anatomical description of the IC and its connections, the functional representation of modulated sounds in the IC is still a matter of debate (Joris et al. 2004), but it is clear that both magnitude (rate) and phase (synchrony) information is present in the responses of single neurons (Krishna and Semple 2000; Langner and Schreiner 1988; Rees and Moller 1983). Thus the IC apparently plays a transitional role between the temporal representation of AM in the periphery (Joris and Yin 1992) and a more rate-based code in the cortex (Liang et al. 2002).

Here, we show that changes in the average firing rates of single IC neurons in the awake Dutch-belted rabbit are generally poor predictors of human behavioral performance in psychophysical AM detection tasks. Synchronization to the envelope, on the other hand, can emerge and change at modulation depths much closer to psychoacoustical thresholds. At suprathreshold depths, the situation is different; changes in average rates can, in some neurons, account for psychophysical sensitivity in masked AM detection and AM depth discrimination.


    METHODS
Animal preparation

AM responses in the IC were obtained from 198 cells in three unanesthetized female Dutch-Belted rabbits (Oryctolagus cuniculus). All procedures were approved by the Syracuse University Institutional Animal Care and Use Committee and conformed to National Institutes of Health guidelines and protocols. Our preparation was developed based on techniques used in several previous studies of the awake rabbit IC and superior olivary complex (Batra et al. 1989; Kuwada et al. 1987). Before recordings began, two separate aseptic surgeries were performed to allow for chronic access to the midbrain in daily 2-hour recording sessions. In both procedures, the animals were anesthetized with ketamine (66 mg/kg) and xylazine (2 mg/kg) delivered intramuscularly, and supplemental doses were administered to maintain areflexia.

In the initial surgery, a 15-mm inner diameter stainless steel cylinder and brass headbar (aligned parallel to the sagittal suture) were centered on the midline and affixed to the exposed skull with dental acrylic and screws. The rostral edge of the cylinder was aligned with bregma and a wall of dental acrylic was built up under the posterior side of the cylinder to compensate for the slope of the skull.

Each animal was given several weeks to recover from the first surgery before it was gradually adapted to sitting in the recording chamber and exposed to auditory stimuli. The rabbit was restrained with a snug blanket around the body and placed in a plexiglass chair positioned in front of a clamp used to fix the headbar. Daily sessions were increased in duration over the course of 2–3 wk until the animal was acclimated to sitting quietly for 2 h.

A small (~3–4 mm diameter) craniotomy was made in the skull in the second surgery. The medial edge of the hole was ~2 mm lateral of the midline, and the rostral edge was slightly forward of the middle of the cylinder. The exposed dura was rinsed with sterile saline and treated with a topical antibiotic (Bacitracin), and the cylinder was filled with a sterile silastic elastopolymer cap between sessions. A 1- to 2-day recovery period was allowed before removing the silastic cap and attempting electrode penetrations. Additional craniotomy surgeries were occasionally performed to extend the existing hole or to provide access to the opposite IC. After each of the anesthetized surgeries, Banamine (flunixin meglumine, 1 mg/kg) was administered as an analgesic.

After every session, new dural scarring was removed with forceps before reapplying Bacitracin and filling the cylinder with the polymer plug. In one rabbit, the dura was also treated with an anti-mitotic compound (5-fluorouracil; 25 mg/ml saline) before sealing the cylinder to discourage scar tissue from forming between sessions (Spinks et al. 2003). Using these daily cleaning techniques, recording sessions yielded reasonable success rates in a single IC for 3–6 mo.

Acoustic stimuli

Sound stimuli were generated digitally and converted to analog signals using a Tucker-Davis System II D-A converter (TDT DA 3–4). The stimuli were filtered at 20 kHz (TDT FT6) and attenuated (TDT PA4), before being passed to a headphone buffer (TDT HB6) and finally to a pair of Beyer-Dynamic speakers (DT-48). The speaker outputs were delivered through custom-made soft plastic (Hal-Hen Per-form) earmolds, and a probe tube allowed daily calibration of the closed acoustic system before each session with an Etymotic ER-7C probe microphone system. Calibration tables based on the frequency shaping introduced by the system to wideband (100 Hz to 20 kHz) noises were used to determine the attenuation values required to describe the sound levels in dB SPL (dB re: 20 µPa). Monaural (usually contralateral) or diotic stimuli were presented, depending on the properties of each individual unit (see response classification below).

Recording methods

Single-unit extracellular responses were recorded using glass-insulated tungsten microelectrodes (Bullock et al. 1988). Electrode impedances between 10 and 30 MΩ measured at 135 Hz were usually required for the successful isolation and holding of neurons, but measures of impedance were only marginally reliable as predictors of electrode performance. The electrode signal was amplified (Grass Instruments), filtered (700 Hz to 3 kHz), and AC-coupled with a TDT PC1 spike conditioner before being passed to a spike discriminator (TDT SD1) and event timer (TDT ET1). Isolated spike times were recorded with respect to a stimulus onset–triggered reference with a resolution of ±10 µs.

Before lowering the electrode, a topical anesthetic (Lidocaine) was applied to desensitize the dura. The position of the electrode was set with a stereotaxic system (Edmund), which was mounted on the cylinder affixed to the rabbit’s skull. A sharply beveled, sterile, stainless-steel guide tube (23xx gauge) was used to pierce the dura and protect the electrode tip. The guide tube was lowered by hand until its sharpened end was ~2–3 mm from the proximal (dorsal) surface of the IC. From there, the electrode was lowered independent of the guide tube until a unit was isolated. Because of the tonotopic organization of the structure (low frequencies were encountered at shallower depths) and the limited recording time, the distribution of best frequencies (BFs) of the neurons described here was biased toward lower frequencies (94% of the population had a BF <10 kHz). Electrodes were advanced from outside the double-walled soundproof booth with a hydraulic microdrive (Kopf Instruments, Tujunga, CA). Stimulus presentation, on-line data analyses, and video monitoring of the animal were also controlled from outside the booth.

At the conclusion of the recordings in each rabbit, electrolytic lesions were made in the approximate center of the three-dimensional coordinates that described the spatial distribution of the population of well-studied neurons from that specific IC. Standard histological techniques were used to confirm that the recording sites were likely within the central nucleus of the IC (ICC). However, the prolonged duration of recording from each IC made it impossible to definitively state that every unit was positioned within the ICC.

Response classification and analysis

Parameters of AM stimuli were designed for each neuron based on its responses to a battery of simpler sounds. Specifically, to study a cell’s sensitivity to changes in modulation depth, it was necessary to determine the appropriate binaural configuration, tone carrier frequency, sound-pressure level (SPL), and modulation frequency. This section describes the stimuli and response quantifications used to make those decisions.

SEARCH STIMULI AND INITIAL CHARACTERIZATION. To search for driven activity, a 500-ms Gaussian wideband (100–10,000 Hz) noise with 10-ms cos² ramps was presented binaurally every 1.5 s. The interaural time difference (ITD) for each presentation was randomly chosen from a uniformly distributed range from –300 (contralateral ear leading) to +300 µs (ipsilateral ear leading) in steps of 100 µs, and the spectrum level of the noise was typically fixed at a level between 5 and 20 dB SPL.

Once a unit was encountered and isolated using the search stimulus, its binaural configuration preference (contralateral, ipsilateral, diotic, or silence) was quantified by counting the number of spikes elicited by each configuration in response to five repetitions of a 10-dB SPL spectrum level (50 dB SPL rms), 500-ms noise (or silent interval) presented once per second. The bandwidth of the noise was the same as that of the search stimulus. Next, the unit’s BF and threshold were estimated by manually controlling the frequency and level of 100-ms pure tones (10-ms cos² ramps) separated by a 500-ms interstimulus interval (ISI).
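The correspondence between the 10-dB SPL spectrum level and the 50-dB SPL rms level quoted above can be checked with a short calculation. This is a minimal sketch that assumes a flat noise spectrum across the 100–10,000 Hz band; the function name is hypothetical.

```python
import math

def overall_level_db(spectrum_level_db, f_low_hz, f_high_hz):
    """Overall (rms) level of a flat-spectrum noise, given its spectrum level.

    Assumes equal power per Hz across the band (an idealization).
    """
    bandwidth_hz = f_high_hz - f_low_hz
    return spectrum_level_db + 10.0 * math.log10(bandwidth_hz)

# Search-noise band from the text: 100 Hz to 10 kHz at a 10-dB SPL spectrum level.
level = overall_level_db(10.0, 100.0, 10_000.0)
print(round(level, 1))  # 50.0
```

A 9.9-kHz bandwidth adds about 40 dB to the spectrum level, which gives the stated 50 dB SPL rms.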

PURE TONE RESPONSES. Based on the audio-visually determined estimates of BF and threshold, information about the response area was obtained at two SPLs, 10 and 40 dB above threshold (Ramachandran et al. 1999), and at 15 log-spaced frequencies from an octave below to an octave above BF. Average rates were measured over the entire duration of the 200-ms tones, which were presented once per second and windowed with 10-ms cos² onset and offset ramps. Usually just one repetition was sufficient to determine the frequency that elicited an excitatory response at the lowest tested SPL (defined as BF), but more repetitions were presented if necessary.

All stimuli presented after the response area had BF tone carriers. First, a rate-level function (RLF) was obtained, usually over a 70-dB range starting about 10 dB below the threshold estimate that was determined audio-visually. Ten repetitions per level of each 100-ms tone burst (including 10-ms cos² ramps) were presented with 400-ms ISIs. Rates were measured over the entire 100-ms stimulus presentation window, and a peristimulus time histogram (PSTH) was constructed using a bin size of 0.5 ms. From these responses, cells were classified based on their PSTH type (onset, sustained, on+sustained, or other; similar to Krishna and Semple 2000 and Le Beau et al. 1996), RLF shape (monotonic, saturating, nonmonotonic, or other), and mean first-spike latency (FSL) across the 10 repetitions. PSTH type and FSLs were often level dependent; the SPL used to classify responses was the level used for AM stimulation (below).

FULLY MODULATED AM TONE RESPONSES. Next, 100%-modulated sinusoidally AM (SAM) tone responses were recorded, usually at 15 modulation frequencies log spaced from 2 to 311 Hz. The overall SPL was fixed (i.e., there was no level increment caused by modulation) and chosen to correspond to a level on the ascending portion of the RLF. This convention was followed unless the strongest response to tones was suppression of firing rate below spontaneous activity, in which case an SPL was chosen that clearly elicited such suppression (this occurred in 9 neurons). To accommodate several cycles of low-fm stimuli, a 2-s BF tone including 50-ms cos² ramps (a common ramp duration in AM psychophysics) was used as the carrier. Modulation was applied for the entire duration of the carrier (including the ramps). Three repetitions of each stimulus were presented with an ISI of ≥1 s. Time permitting, additional (usually higher) SPLs and fms were presented.
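The stimulus described above can be sketched in code as an illustration, not as the authors' implementation. The sample rate, the function names, and the sine phase of the modulator are assumptions; the scaling factor 1/sqrt(1 + m²/2) is the standard way to hold the mean power of a SAM tone equal to that of the unmodulated carrier, which is one reading of "no level increment caused by modulation".

```python
import math

def log_spaced(f_min, f_max, n):
    """Return n frequencies spaced by a constant ratio from f_min to f_max."""
    ratio = (f_max / f_min) ** (1.0 / (n - 1))
    return [f_min * ratio ** k for k in range(n)]

def sam_tone(fc, fm, m, dur_s=2.0, ramp_s=0.05, fs=48000):
    """Sinusoidally amplitude-modulated tone with mean power held fixed across m.

    fs is a hypothetical sample rate; the source does not state one.
    """
    # Compensate for the power added by the modulation sidebands.
    c = 1.0 / math.sqrt(1.0 + m * m / 2.0)
    n = int(round(dur_s * fs))
    n_ramp = int(round(ramp_s * fs))
    samples = []
    for i in range(n):
        t = i / fs
        envelope = 1.0 + m * math.sin(2.0 * math.pi * fm * t)
        # Squared-cosine (raised-cosine) onset and offset gates.
        if i < n_ramp:
            gate = math.sin(0.5 * math.pi * i / n_ramp) ** 2
        elif i >= n - n_ramp:
            gate = math.sin(0.5 * math.pi * (n - 1 - i) / n_ramp) ** 2
        else:
            gate = 1.0
        samples.append(c * gate * envelope * math.sin(2.0 * math.pi * fc * t))
    return samples

# Fifteen modulation frequencies from 2 to 311 Hz, as in the text.
modulation_freqs = log_spaced(2.0, 311.0, 15)
```

With these endpoints the spacing works out to a constant ratio of about 1.43 between adjacent modulation frequencies, or roughly half an octave per step.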

Four metrics were used to quantify the responses to fully modulated SAM tones. The average firing rate was computed excluding the first 100 ms to avoid onset effects, although there was usually negligible temporal adaptation to AM stimulation for stimulus modulation rates near the cell’s preferred values. Synchronization or vector strength (VS, Goldberg and Brown 1969) to the modulation period was calculated from period histograms, which were constructed with a fixed number of bins per AM cycle (64 bins). Synchronized rate (Sachs et al. 1983) was defined as the product of vector strength and average rate. Synchrony and phase values were plotted only if the vector strength was significant (Rayleigh statistic > 13.8, or an equivalent P < 0.001; Mardia and Jupp 2000). In addition, at least five spikes across all three stimulus repetitions were required before a response was designated as being significantly synchronized. Envelope-locked response descriptions were computed only for the component synchronized to the stimulus fm [see footnote 1 in Krishna and Semple (2000) for a brief discussion of this issue and Khanna and Teich (1989) for ANF responses examined at other stimulus-related frequencies]. Quantifications based on average rate (rMTF), synchrony (sMTF), synchronized rate (srMTF), and response phase (pMTF) provided a modulation-frequency focused description of AM responses.
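The synchrony metrics above can be illustrated with a minimal sketch. It computes vector strength directly from pooled spike times rather than from 64-bin period histograms as in the study, so values may differ slightly from a binned calculation; the function names are hypothetical. The Rayleigh statistic is taken as 2nR², for which a value of 13.8 corresponds to P ≈ exp(−6.9) ≈ 0.001.

```python
import math

def vector_strength(spike_times_s, fm_hz):
    """Return (vector strength, Rayleigh statistic) for spikes re: the AM period."""
    n = len(spike_times_s)
    if n == 0:
        return 0.0, 0.0
    # Map each spike to a phase on the modulation cycle and sum unit vectors.
    x = sum(math.cos(2.0 * math.pi * fm_hz * t) for t in spike_times_s)
    y = sum(math.sin(2.0 * math.pi * fm_hz * t) for t in spike_times_s)
    vs = math.hypot(x, y) / n
    rayleigh = 2.0 * n * vs * vs
    return vs, rayleigh

def synchronized_rate(vs, average_rate):
    """Synchronized rate: the product of vector strength and average rate."""
    return vs * average_rate

def is_significantly_synchronized(spike_times_s, fm_hz,
                                  criterion=13.8, min_spikes=5):
    """Apply the Rayleigh criterion and the minimum pooled spike count."""
    if len(spike_times_s) < min_spikes:
        return False
    _, rayleigh = vector_strength(spike_times_s, fm_hz)
    return rayleigh > criterion
```

Because the statistic scales with spike count, ten perfectly phase-locked spikes give a Rayleigh value of 20 and pass the criterion, whereas six such spikes (a value of 12) would not.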

Several aspects of the MTFs were extracted for comparisons across the population and for making decisions concerning the stimulus parameters to be studied at lower modulation depths. rMTFs, sMTFs, and srMTFs were classified as all-pass, low-pass, band-pass, band-reject, or high-pass (over the range of fm tested), based on a 70% change criterion in the response above or below the cell’s best modulation frequency (BMF, fm resulting in an excitatory peak in the MTF) or worst modulation frequency (WMF, fm eliciting the strongest response suppression flanked by excitatory regions). sMTFs and srMTFs were almost exclusively band-pass or low-pass, whereas rMTFs could take on any of the five shapes (see RESULTS).

RESPONSE MODULATION-DEPTH DEPENDENCE. Because of time limitations in the unanesthetized preparation and our goal to study a wide range of modulation depths, a single fm was used to study the response dependence on m (as opposed to obtaining complete MTFs at several depths). Modulation depth functions (MDFs) based on rate, synchrony, synchronized rate, and phase were measured at a stimulus fm set equal to the frequency at the peak of the srMTF, regardless of whether the srMTF was strictly defined as band-pass or low-pass based on the 70% drop criterion. The srMTF peak was chosen as a compromise between pure rate and pure timing analyses; time permitting, additional MDFs were obtained at other interesting fms (e.g., a rate-based WMF).

Modulation depths from –35 to 0 dB in 20 log m (0.018 < m < 1) were tested in 5- or 1-dB steps. Other than m, the stimulus parameters were identical to those used in recording the MTF. Rate and synchronization analyses were also broadly similar, except the initial 500 ms of the response was discarded, and the remaining 4.5 s (1.5 s × 3 reps) was separated into nine 500-ms segments when determining a mean and variance of the rate estimate. The 500-ms window was used because it matched that used in much of the AM psychophysical literature. Ignoring the onset at low m was more crucial for avoiding artifacts than with fully modulated stimuli because a pure-tone onset response could result in artificially high values of vector strength if the duration of the onset response interacted with the period of the modulating waveform.
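The depth conversion and the nine-segment rate estimate described above can be sketched as follows. This is an illustration under stated assumptions: spike times are taken to be in seconds relative to stimulus onset, one list per repetition, and the function names are hypothetical.

```python
from statistics import mean, variance

def depth_from_db(depth_db):
    """Convert a depth in dB (20 log m) to the linear modulation index m."""
    return 10.0 ** (depth_db / 20.0)

def segment_rates(trials, stim_dur_s=2.0, discard_s=0.5, seg_s=0.5):
    """Firing rate (spikes/s) in each analysis segment of each repetition.

    The first discard_s of each response is dropped; the rest is split into
    consecutive seg_s windows (three per 2-s repetition, nine for three reps).
    """
    n_seg = int(round((stim_dur_s - discard_s) / seg_s))
    rates = []
    for spikes in trials:
        for k in range(n_seg):
            start = discard_s + k * seg_s
            stop = start + seg_s
            count = sum(1 for t in spikes if start <= t < stop)
            rates.append(count / seg_s)
    return rates

# Hypothetical spike times for three repetitions of one stimulus depth.
example_trials = [[0.1, 0.6, 0.7, 1.2, 1.9], [0.55, 1.05, 1.55], [0.8, 1.7]]
rates = segment_rates(example_trials)
print(round(depth_from_db(-35.0), 3), len(rates), mean(rates), variance(rates))
```

The lowest tested depth of –35 dB corresponds to m ≈ 0.018, and 0 dB corresponds to full modulation (m = 1).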

To determine neural detection and discrimination thresholds, responses to different stimulus depths were tested for significant differences between one another. Neural rate-based detection threshold (θ_rate) was defined as the lowest m that elicited a rate different from the rate in response to the lowest tested depth (paired t-test P < 0.05). An additional condition was imposed: responses to depths higher than θ_rate were also required to elicit significantly different rates compared with those in response to the lowest tested depth. This requirement rarely changed the resulting thresholds in practice, but it did eliminate the effect of spurious changes in rate resulting from movement or chewing by the rabbit. When calculating rate-based neural discrimination (as opposed to the special case of detection) thresholds, the responses to each depth were treated as responses to a standard; the lowest comparison depth resulting in a significantly different rate response determined the predicted just-noticeable difference in depth. Synchrony-based detection threshold (θ_sync) was defined as the lowest depth that resulted in a significant value of vector strength (Rayleigh statistic > 13.8). This criterion is commonly used in physiological studies (Liang et al. 2002) and almost always resulted in thresholds matching those determined qualitatively by visually inspecting the period histograms at each depth. An alternate method for comparing metrics based on physiological responses to the psychophysical results will be presented in the DISCUSSION.
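The rate-based detection rule can be sketched as a short procedure. Several details here are assumptions rather than statements from the text: segments are paired by their position in the recording, each depth contributes nine segment rates (so the test has 8 degrees of freedom), and significance is judged against the tabulated two-tailed critical value of t for P = 0.05 at 8 degrees of freedom (2.306). Function and variable names are hypothetical.

```python
import math
from statistics import mean, stdev

T_CRIT_DF8 = 2.306  # two-tailed critical t, alpha = 0.05, df = 8 (nine pairs)

def paired_t(a, b):
    """Paired t statistic for two equal-length lists of segment rates."""
    d = [x - y for x, y in zip(a, b)]
    sd = stdev(d)
    if sd == 0.0:
        # Identical differences: no variance to scale by.
        return 0.0 if mean(d) == 0.0 else math.copysign(math.inf, mean(d))
    return mean(d) / (sd / math.sqrt(len(d)))

def rate_threshold(rates_by_depth, t_crit=T_CRIT_DF8):
    """Lowest depth (dB) whose rate, and the rates at all higher depths,
    differ significantly from the rate at the lowest tested depth.

    rates_by_depth maps depth in dB (20 log m) to a list of segment rates.
    Returns None if no depth satisfies the rule.
    """
    depths = sorted(rates_by_depth)
    reference = rates_by_depth[depths[0]]
    tested = depths[1:]
    significant = [abs(paired_t(rates_by_depth[d], reference)) > t_crit
                   for d in tested]
    for i, d in enumerate(tested):
        if all(significant[i:]):
            return d
    return None
```

Requiring every higher depth to be significant as well is what keeps a single spurious rate change at a low depth from setting the threshold.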

RESPONSES TO MASKED SAM TONES. In addition to the tests of sensitivity to pure-SAM stimuli, neurons were also tested for their ability to represent deterministic (SAM) envelope fluctuations in the presence of a competing stochastic (Gaussian) masker modulation. The equation for the stimuli in the masked-detection task is

Formula
where fc is the carrier frequency (set equal to the neuron’s BF) and M(t) is the masker waveform (0 when measuring absolute thresholds). Masker level was defined in terms of the root-mean-square (rms) of M(t). The compensation factor c was included so the overall power in both intervals was fixed as m and M(t) varied. The response to a range of signal m was measured for three values of the overall depth of the masker: –23, –18, and –13 dB rms. The frequency composition of M(t) was defined in terms of its bandwidth, which was set to 1/2 of the neuron’s srBMF, and center frequency, which was equal to the cell’s srBMF. This was chosen to mirror a psychophysical paradigm, which used a single AM frequency (64 Hz) and a masker bandwidth of 32 Hz (Nelson and Carney 2006). The signal AM was varied as it was in the "unmasked" MDFs, but was restricted to depths below –5 dB (20 log m) for masker levels of –23 and –18 dB rms and below –10 dB for a masker level of –13 dB rms to avoid overmodulation (i.e., overall modulation indices > 1 were not presented). Data analyses and other stimulus parameters were identical to those used for the unmasked MDFs and allowed for a description of masked neural θ_rate and θ_sync.
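One expression consistent with the definitions in this paragraph can be sketched as follows. The sine phase of the signal modulation, the symbol σ_M for the rms of M(t), and the closed form for c are assumptions made for illustration; c is chosen so that the mean power matches that of the unmodulated carrier when M(t) is zero-mean and uncorrelated with the signal modulation.

```latex
s(t) = c\,\bigl[\,1 + m\sin(2\pi f_m t) + M(t)\,\bigr]\sin(2\pi f_c t),
\qquad
c = \frac{1}{\sqrt{1 + m^{2}/2 + \sigma_M^{2}}}
```

Under this form, setting M(t) = 0 reduces the stimulus to a level-compensated SAM tone, and a masker level of –23 dB rms corresponds to σ_M = 10^(–23/20) ≈ 0.071.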

Implementation of the computational model

Implementation details of the phenomenological model tested here were recently described, and its responses were quantitatively compared with previously published AM physiological responses (Nelson and Carney 2004). The first stage of the overall processing cascade was an auditory-nerve model. Rate functions from this peripheral model were low-pass filtered and additively combined as same-frequency inhibitory and excitatory (SFIE) inputs that interacted with one another in two successive stages to give rise to model neurons with response properties comparable with those of cells in the ventral cochlear nucleus and IC. There are three key parameters at the level of the model IC cells that significantly change the model’s overall AM response properties. By choosing appropriate values of the time constants associated with the successive low-pass filtering properties of inhibition (τ_inh) and excitation (τ_exc), the model cell’s rBMF can be adjusted to match single-unit recordings (Nelson and Carney 2004). The relative strength of inhibition with respect to excitation (S_INH,IC) determines the degree of suppression observed in the SFIE model cell at low and high fms (away from BMF). This parameter was not systematically studied in the initial modeling study, but it was crucial to account for the different AM response types that we observed in this study in groups of neurons with different pure-tone response properties.


    RESULTS
Our population consisted of 198 single units characterized through the MTF and 164 cells held for at least one MDF. These cells were isolated in four ICs of three animals. All of the neurons in the sample were responsive to pure-tone carrier SAM stimuli, although a small number of isolated cells were not characterized with SAM tones because of poor responses to the short tone bursts used in the search mode or response area measurements. Furthermore, all of the neurons exhibited significant synchrony to at least one modulation rate, in contrast with cortical responses, which are often not phase-locked to the envelope over a range of fms similar to the range used here (Liang et al. 2002).

Population pure tone responses and correlations with 100% SAM responses

The heterogeneity of BF pure-tone responses in the IC is impressive compared with the responses observed in lower brain stem structures such as the cochlear nucleus (Blackburn and Sachs 1989) and remarkable compared with the highly stereotypical nature of ANF responses (Kiang et al. 1965). Because of this diversity, any classification scheme is somewhat arbitrary, because the number of potential categories is essentially unlimited and at least partially subjective. We have chosen to use a small number of classifications (4) for both PSTH type and RLF shape, including one all-encompassing "other" category.

Distributions of PSTH type across the population are shown in Fig. 1A. Sustained pure-tone responses (without a clear onset component) were the most common PSTH type (43%), whereas only 13% of the neurons were pure onset responders. Nearly a third of the population (30%) exhibited some combination of an onset and sustained response. The PSTHs of the remaining 14% of the cells did not fall neatly into one of the three other categories. This PSTH group included offset responses (n = 2), pauser-buildups (n = 6), combined onset and offset responses (n = 3), responses with regularly spaced peaks of discharge not related to the stimulus periodicity (choppers, n = 2), suppression below spontaneous rate without an excitatory region (n = 9), and unusual histograms (n = 5). Because of the paucity of pauser response types in our population (n = 6), they were not identified as a separate class (as in Le Beau et al. 1996 and Krishna and Semple 2000).



 
FIG. 1. Pure-tone response characteristics across the population of 198 neurons. A: distribution of peristimulus time histogram (PSTH) types. B: distribution of rate-level function (RLF) shapes. Example PSTHs and RLFs are shown above bars in each category. Dark bottom area of each bar represents neurons that were band-pass rate-tuned to modulation frequency; rate modulation transfer functions (rMTFs) of remaining units (gray regions) did not have a single or prominent peak (see Fig. 5 for a further breakdown of MTF shapes).

 
Many PSTH type classifications were strongly dependent on the tone SPL; the high proportion of pure sustained responses is in part due to the fact that the neurons were grouped based on their responses at a relatively low SPL (usually on the initial ascending portion of the RLF). Typically, pauser-buildup and offset PSTHs emerged only at higher SPLs. The proportion of chopper units was probably underestimated, because the number of repetitions of the short pure tones was insufficient to perform reliable regularity analyses. Responses phase-locked to the carrier frequency were not observed, even in the seven neurons with BFs <600 Hz, although more repetitions may have been required to observe significant vector strength to the fine structure.

A characterization of the population based on the shapes of single-unit BF RLFs is shown in Fig. 1B. A 50% rate drop at high SPLs relative to the peak response was required for a RLF to be classified as nonmonotonic (as in Aitkin 1991); 24% of the neurons had a single peak and such a rate drop at high SPLs. Some units that met the 50%-drop criterion were placed in the "other" RLF shape category because of multiple peaks in the RLF or a rebound from an initial rate drop at the highest levels tested (20% of the units). The remaining cells exhibited monotonic (11%) or saturating (44%) RLF shapes over the range of levels tested (almost always 70 dB).

The pure-tone response properties examined in Fig. 1 (PSTH type and RLF shape) were broken down further with respect to the corresponding neurons’ fully modulated SAM response properties. Neurons with band-pass (BP) rMTFs (which made up 47% of the population) are shown with the dark portions of the bars in Fig. 1; cells with non-BP rMTFs are represented by the light upper segments of each bar. rMTFs in onset pure-tone responders always revealed a band-pass shape (25/26 onset cells were classified as BP over the 2- to 312-Hz fm range; one onset neuron had a rBMF of 312 Hz), whereas most rMTFs in sustained pure-tone responders (63/86) were not BP. Classifications of on + sustained or "other" pure-tone responses were not predictive of the rMTF shape: approximately one half of each category was BP tuned. Similarly, RLF shape was not predictive of the presence of a single region of excitation in the rMTF (i.e., a BP shape).

One aspect of BF pure-tone responses that has been shown to be correlated with AM responses in previous studies is the mean FSL: IC neurons with longer FSLs tend to have lower BMFs (Heil et al. 1995; Krishna and Semple 2000; Langner et al. 1987). This data set corroborates the finding of a weak inverse FSL-BMF correlation (FSL-rBMF Kendall’s τ = –0.25, P < 0.001; FSL-sBMF τ = –0.26, P < 0.001; FSL-srBMF τ = –0.23, P < 0.001). Despite the significant correlations, Fig. 2 makes it clear that FSL is in general an unreliable predictor of rBMF (the same is true for srBMF and sBMF). The different symbols in Fig. 2 denote the various PSTH types; the three major groups (sustained, onset, and on + sustained) contain neurons with a similar range of rBMFs, but the longest FSLs were found in sustained pure-tone responders. Below the scatter plot in Fig. 2 is a histogram of the FSL values across the entire population (only neurons with BP rMTFs were included in the scatter plot). There was not a significant correlation between BMF and BF (Kendall’s τ <0.1, P > 0.1 for rBMF, sBMF, and srBMF correlations with BF) (consistent with Krishna and Semple 2000). It is worth reiterating the fact that the FSLs reported here were based on responses at a relatively low SPL as opposed to the minimum mean FSL across the entire range of levels tested with pure tones. One might expect that the use of a single low SPL to derive FSLs would result in higher estimates of latency, but this was not strictly true. Some neurons in the IC exhibited an increase in FSL with level, an effect that has been termed the "paradoxical latency shift" (Sullivan 1982). In our population, 16% of the cells revealed such a latency level dependence, in the form of a mean FSL at a higher SPL >1 SD higher than the mean latency at the (lower) SPL used for the population analysis.



 
FIG. 2. Among the neurons with a well-defined rate best modulation frequency (rBMF), cells with a longer first-spike latency (FSL) tended to have a lower rBMF. PSTH types are indicated by the symbols (categories are the same as in Fig. 1A). Histogram of FSLs shown below the axis is a description of the entire population with a measurable FSL (n = 182) not just neurons with rBMFs. Peak histogram value corresponds to a count of 27 neurons.

 
The range of observed rBMFs was qualitatively similar to the range of sBMFs and srBMFs, but the exact value of BMF could vary considerably across the different response quantifications in a single neuron. Figure 3 shows these points, with a comparison of BMF histograms for the three metrics (Fig. 3A) and scatter plots of the three combinations of rBMF, sBMF, and srBMF (Fig. 3B), which are summarized with a correlation coefficient. BMFs were typically between 10 and 100 Hz, although the upper bound of computed BMFs was partially biased by the range of fms used (2–311 Hz). The correlation between rBMF and sBMF (R = 0.38) was significant (P < 0.01) but considerably lower than the correlations between the srBMFs and the other two metrics (rBMF x srBMF: R = 0.88, P < 0.0001; sBMF x srBMF: R = 0.58, P < 0.0001). This is not surprising, because synchrony and rate are, by definition, correlated with synchronized rate. In any case, the even spread of points above and below the diagonal (rBMF = sBMF) in the top panel of Fig. 3B suggests that the relatively low correlation between sBMF and rBMF was not caused by a consistent bias in either direction (i.e., sBMFs were not reliably lower than rBMFs or vice versa).



 
FIG. 3. Most values of rBMF, synchrony BMF (sBMF), and synchronized rate BMF (srBMF) were between 10 and 128 Hz, but BMFs based on the 3 quantifications were not always the same in single neurons. A: distributions of rBMF, sBMF, and srBMF. B: scatter plots showing the 3 combinations of BMF based on one quantification vs. BMF based on another quantification. Correlation coefficients are shown within each scatter plot. Only neurons with well-defined BMFs based on both quantifications are included in scatter plots.

 
Sharpness of tuning was quantified across the population of BP-tuned rMTFs with a quality factor Q = BMF/BW, where the bandwidth (BW) was defined as the width of the rMTF at 70% of the peak rate evoked at BMF. The resulting values are plotted as a function of the neuron’s rBMF in Fig. 4. Most (76%) of the Q-values were <1, and only three were >2. Onset responders to pure tones (+ symbols in Fig. 4) tended to be more sharply tuned (mean Q = 1.15) than the other unit types (mean Q = 0.61). There was no correlation between the quality factor and rBMF, suggesting a constant relative BW across the range of relevant fm. Both the overall average value (Q = 0.76) and the independence of Q from BMF are qualitatively consistent with psychophysical measures of AM-frequency tuning curves and masking patterns (Ewert and Dau 2000; Ewert et al. 2002; Houtgast 1989; Wojtczak and Viemeister 2005).
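The quality-factor definition above can be illustrated with a short Python sketch. It is not the authors' analysis code: the function name `q_factor`, the linear interpolation between sampled modulation frequencies, and the example rates are all assumptions made for illustration. Only the definitions Q = BMF/BW and BW measured at 70% of the peak rate come from the text.

```python
def q_factor(fms_hz, rates, criterion=0.7):
    """Return (Q, BMF, BW) for a band-pass rMTF.

    Returns None when the rate never falls below criterion * peak on both
    sides of the peak, i.e. when no bandwidth can be defined.
    Linear interpolation between sampled fms is an assumption of this sketch.
    """
    peak_idx = max(range(len(rates)), key=lambda i: rates[i])
    cutoff = criterion * rates[peak_idx]

    def crossing(indices):
        # Walk away from the peak until the rate first drops below the
        # cutoff, then interpolate the crossing frequency linearly.
        prev = peak_idx
        for i in indices:
            if rates[i] < cutoff:
                frac = (rates[prev] - cutoff) / (rates[prev] - rates[i])
                return fms_hz[prev] + frac * (fms_hz[i] - fms_hz[prev])
            prev = i
        return None

    low = crossing(range(peak_idx - 1, -1, -1))
    high = crossing(range(peak_idx + 1, len(rates)))
    if low is None or high is None:
        return None
    bmf = fms_hz[peak_idx]
    bw = high - low
    return bmf / bw, bmf, bw


# Hypothetical rMTF (spikes/s) sampled at five modulation frequencies.
example = q_factor([20, 40, 60, 80, 100], [10, 30, 50, 30, 10])
print(example)
```

With these made-up values the 70% crossings fall at 45 and 75 Hz, so BW = 30 Hz and Q = 2 for a BMF of 60 Hz. An rMTF whose peak sits at the edge of the tested range yields None, which mirrors the requirement of a rate drop both below and above BMF.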



 
FIG. 4. Q-values based on rMTFs with clearly defined peaks. Symbols indicate PSTH type as in Fig. 2. Inset: calculation of bandwidth at 70% of peak rate; open cross shows corresponding Q and rBMF for the neuron from inset.

 
The diversity of MTFs in the IC cannot be fully appreciated by limiting analyses to a determination of the cell’s BMF, because many neurons do not have a well-defined BMF (according to the 70% drop criterion both below and above BMF over a range of stimulus fm from 2 to 312 Hz). Example rMTFs representing the five shapes encountered in the rabbit IC are shown in Fig. 5, along with the distribution of the different shapes across the population. Most common were BP-tuned rMTFs, followed by low-pass (LP), high-pass (HP), band-reject (BR), and all-pass (AP) shapes. The scarcity of AP rMTFs in the IC lies in sharp contrast to peripheral (ANF) rMTFs, which are almost exclusively AP (Cooper et al. 1993; Joris and Yin 1992).



 
FIG. 5. Example rMTFs and relative proportions of neurons exhibiting each rMTF shape. Unit identification numbers are included with each rMTF; representative band-pass (BP), low-pass (LP), high-pass (HP), band-reject (BR), and all-pass (AP) neurons are shown in each row.

 
Because IC neurons are limited in their capacity to follow envelope fluctuations faster than several hundred Hertz (Burger and Pollak 1998; Krishna and Semple 2000; Langner and Schreiner 1988; Rees and Moller 1983), MTFs based on synchrony and synchronized rate were never HP or AP; most sMTFs were either BP (49%) or LP (42%), and 70% of srMTFs were BP. It is worth noting that, although synchrony always decreased at high modulation rates, 35% (69/198) of the neurons maintained significant synchrony at the highest fm tested (usually 311 Hz; limited to 222 Hz in 30/198), and 80% (159/198) were phase-locked to the envelope at fm >100 Hz. These proportions are comparable with those reported by Krishna and Semple (2000) in the anesthetized gerbil.

Example MTFs and MDFs

Detailed AM responses of four example neurons are highlighted in this section; they were chosen as representatives of each of the four categories of pure-tone responses (sustained, onset + sustained, onset, and other).

REPRESENTATIVE ONSET PURE-TONE RESPONSE. Without exception, onset units exhibited BP-tuned rMTFs; Fig. 6 characterizes such a neuron in more detail. Fully modulated AM responses are shown in Fig. 6A, revealing rate tuning to stimulus fm between 30 and 100 Hz. The mismatch between the peaks in the rMTF and the sMTF resulted in a srBMF (58 Hz) between the rBMF (81 Hz) and the sBMF (41 Hz). The sharpness of rMTF tuning was relatively high in this example (Q = 1.5). Raw period and PSTHs are also included in Fig. 6, B and C. Two features of the histograms in Fig. 6 were consistently observed across neurons. First, the response phase changed near BMF (as expected with period histograms constructed using a fixed starting point in time; Fig. 6B). Second, there was negligible temporal adaptation observed over the 2-s stimulation period in the PSTHs in response to AM tones near BMF (Fig. 6, C and F).



 
FIG. 6. AM response properties of a neuron with a pure onset response to best frequency (BF) tones as a function of modulation frequency (fm, A–C) and modulation depth (m, D–F). A: MTFs based on rate (rMTF), synchrony (sMTF), and synchronized rate (srMTF). Asterisks correspond to the srBMF (and fm used in D–F). Quantifications including synchrony are only plotted for significant values of vector strength. B: period histograms for each fm represented in MTF. Spikes were binned into 1 period of modulation, and resulting histogram was plotted twice for clarity. C: PSTHs for each tested fm. D: modulation depth functions (MDFs), again based on rate, synchrony, and synchronized rate. Rate responses plotted with open squares were significantly different (P < 0.05) from rate response to lowest tested m (–35 dB). Bold portion of axes indicates an approximate range of behavioral AM detection thresholds. E and F: period histograms and PSTHs, respectively, for each tested m. SPL = 20 dB, fc = 2,000 Hz.

 
Stimulus AM depth was varied over a 35-dB range in 5-dB steps to obtain the responses shown in Fig. 6D. This corresponded to linear modulation depths ranging from 1.8 to 100%. For reference, human tone-carrier AM detection thresholds at medium SPLs and modulation rates <150 Hz are typically –25 dB (m = 5.6%) or lower (Kohlrausch et al. 2000). This range is highlighted in the panels of Fig. 6D by the thicker portions of the abscissa. First, consider the rMDF (the curve with error bars in Fig. 6D), which was typical of many neurons in the population in terms of its flat (m-independent) characteristic for depths from –35 to –15 dB. Rates that were significantly different from the rate response to the –35 dB AM tone are indicated with open squares. The lowest depth that elicited such a significant change (see METHODS) was defined as the neuron’s rate threshold (θrate = –10 dB).
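The correspondence between depth in dB (20 log m) and linear modulation depth quoted above can be checked with a few lines of Python. The function names are illustrative and not taken from the study; the only relation used is depth in dB = 20 log10(m).

```python
import math


def depth_db_to_linear(depth_db):
    """Convert a modulation depth in dB (20 log10 m) to linear depth m."""
    return 10.0 ** (depth_db / 20.0)


def depth_linear_to_db(m):
    """Convert a linear modulation depth m (0 < m <= 1) to dB."""
    if m <= 0:
        raise ValueError("modulation depth must be positive")
    return 20.0 * math.log10(m)


# The eight depths of the MDF protocol described in the text: -35 to 0 dB in 5-dB steps.
depths_db = list(range(-35, 1, 5))
for d in depths_db:
    print(f"{d:4d} dB -> m = {100 * depth_db_to_linear(d):5.1f}%")
```

The lowest depth, –35 dB, maps to m of about 1.8%, and –25 dB maps to about 5.6%, matching the values given in the text.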

Plotted alongside the rMDF in Fig. 6D is the srMDF (•), which only includes values that were computed with a significant synchrony coefficient. A consistent offset between the rMDF and srMDF indicates a constant value of vector strength across depth. This is confirmed with the sMDF (○, bottom panel), in which vector strength values were all between 0.6 and 0.8. Another way to interpret depth-independent response synchrony is in terms of a modulation gain that decreases with increasing m. Single-cell synchrony-based neural thresholds were defined as the lowest modulation depth that evoked a significantly envelope-locked response (θsync = –10 dB for the neuron shown in Fig. 6). Visual inspection of the period histograms in Fig. 6E suggests that the statistical criteria used to define synchrony threshold were reasonable: response modulation clearly emerges at m = –10 dB. Also, the period histograms indicate that phase of the response did not change appreciably as m was varied. As with the PSTHs shown for different modulation frequencies (Fig. 6C), there was no evidence for gross, slow temporal adaptation in the PSTHs at different modulation depths (Fig. 6F).
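For readers unfamiliar with the synchrony metrics, the following Python sketch shows the standard vector-strength calculation and the synchronized rate as the product of average rate and vector strength. It is a minimal illustration under stated assumptions: the function names are hypothetical, and the Rayleigh statistic with a 13.8 cutoff (P < 0.001) is a commonly used significance criterion that is assumed here, since the criterion actually applied is specified in METHODS rather than in this section.

```python
import math


def vector_strength(spike_times_s, fm_hz):
    """Return (VS, mean phase in radians) of spikes relative to fm_hz."""
    n = len(spike_times_s)
    if n == 0:
        raise ValueError("no spikes")
    # Map each spike time onto a phase of the modulation cycle.
    phases = [2.0 * math.pi * fm_hz * t for t in spike_times_s]
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    return math.hypot(c, s), math.atan2(s, c)


def rayleigh_statistic(vs, n_spikes):
    """2 * n * VS^2; values above 13.8 correspond to P < 0.001.

    Using this cutoff is an assumption of the sketch, not a statement
    of the criterion used in the study.
    """
    return 2.0 * n_spikes * vs ** 2


def synchronized_rate(rate_sp_per_s, vs):
    """Synchronized rate: average rate scaled by vector strength."""
    return rate_sp_per_s * vs


# Hypothetical spike trains at fm = 50 Hz.
locked = [k / 50.0 for k in range(20)]          # one spike per cycle, same phase
spread = [k / (50.0 * 8) for k in range(8)]     # eight evenly spaced phases
vs_locked, _ = vector_strength(locked, 50.0)
vs_spread, _ = vector_strength(spread, 50.0)
print(vs_locked, vs_spread, rayleigh_statistic(vs_locked, len(locked)))
```

Perfectly phase-locked spikes give a vector strength of 1, and spikes spread evenly over the cycle give a value near 0. Because synchronized rate is rate multiplied by vector strength, a constant vector strength across depth appears as a constant offset between the rMDF and srMDF on a logarithmic axis.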

REPRESENTATIVE ON + SUSTAINED PURE-TONE RESPONSE. Several aspects of the AM responses of the on + sustained pure-tone responder (Fig. 7) are fundamentally different from those of the pure onset responders. Perhaps the most salient difference was that the entire range of 100%-AM stimuli elicited synchronized firing (from 2 to 311 Hz). The resulting rMTF reveals weaker tuning (Q = 0.81), with a peak at 81 Hz (the same rBMF as the representative onset neuron). Vector strength reached a maximum of 0.56 at 113 Hz, and synchronized rate peaked at 58 Hz. Period histograms at low fms show that the probability of firing remained relatively constant for a longer portion of the corresponding stimulus waveform than for the onset neuron, with a weak cycle-by-cycle onset adaptation component. The shape of the period histogram at the srBMF (58 Hz) was somewhat more complex, with two peaks near the onset of adaptation of the response during each cycle. This multimodal period histogram shape emerged only at higher modulation depths (Fig. 7E). PSTHs plotted in Fig. 7, C and F, again suggest minimal adaptation over a time scale on the order of hundreds of milliseconds.



 
FIG. 7. AM responses of a representative neuron with an onset + sustained BF tone PSTH. Format is the same as Fig. 6. SPL = 40 dB, fc = 3,900 Hz.

 
The rMDF measured at 58 Hz (Fig. 7D) shows that the peak in the rMTF became apparent only at high modulation depths (θrate = 0 dB). In contrast, synchronization to the period of the modulating waveform was significant at –20 dB, and synchrony increased monotonically with increasing m. Correspondingly, the period histograms were modulated, and the timing of spikes became more phase-locked to a particular phase of the envelope as m was varied between –20 and 0 dB (Fig. 7E).

REPRESENTATIVE SUSTAINED PURE-TONE RESPONSE. Pure sustained responders to short tone bursts were usually associated with rMTFs that were not BP (Fig. 1A). An example of such a cell is described in terms of its MTFs and MDFs in Fig. 8. The main feature of the rMTF of Fig. 8A was the presence of a broadly tuned suppressive region (i.e., a BR rMTF); a complementary (BP) sMTF had a peak within the region of rate suppression. Taking the product of synchrony and rate resulted in a srMTF with a region of suppression at lower fms, an excitatory region at higher fms, and a srBMF that matched the sBMF of 113 Hz (Fig. 8A). The period histograms in Fig. 8B indicate that the drop in rate was largely mediated by a suppression of firing after the onset response elicited in each cycle and that the rate recovery at higher fms was not strongly synchronized (although VS remained significant up to 311 Hz).



 
FIG. 8. AM responses of a representative neuron with a sustained BF tone PSTH. Format matched to Figs. 6 and 7. SPL = 20 dB, fc = 3,900 Hz.

 
Because the location of the srBMF was within the region of rate suppression, increasing the stimulus m in the MDF protocol resulted in a drop in average rate. This rate drop is quantified in the rMDF of Fig. 8D. In terms of information-carrying capacity, a rate code that is based on a decrease in firing rate is just as effective as one that requires an increase in rate as the stimulus depth is increased. However, the cell’s θrate of –10 dB based on a drop in rate was still much higher than human AM detection abilities. As with the first two example neurons, a timing-based metric such as synchrony was more sensitive to low-depth AM stimulation: the cell’s θsync was –25 dB. Also, the value of VS increased monotonically with depth; this trend was mainly observed in neurons that responded to pure tones with a substantial sustained rate (see Fig. 13B).



 
FIG. 9. AM responses of a representative neuron that responded with a decrease in rate when stimulated using pure tones. Format is the same as Figs. 6–8. SPL = 10 dB, fc = 12,885 Hz.

 


 
FIG. 10. Neural AM detection thresholds based on rate and synchrony. Axis labels marked with an X indicate that thresholds were immeasurable and that average rate was depth independent over entire dynamic range (all neurons responded with significant synchrony at m = 0 dB). A: lowest AM depth that elicited a significant change in rate (open circles) or a significant value of vector strength (crosses) for each of the 164 tested neurons. Histograms framing the y-axes in A show the distribution of rate thresholds (left axis; peak histogram value = 47 neurons) and synchrony thresholds (right axis; peak histogram value = 39 neurons). Bold portions of ordinate indicate approximate range of perceptual AM detection thresholds. B: rate threshold vs. synchrony threshold scatter plot. Points located above diagonal of equality in scatter plot indicate that the cell’s rate threshold was higher than its synchrony threshold. In both A and B, a uniformly distributed random number between –2 and 2 was added to value of threshold to facilitate visualization of responses of all of neurons. For reference, human tone-carrier AM detection thresholds at suprathreshold SPLs and modulation frequencies below ~120 Hz are between –20 and –30 dB (Kohlrausch et al. 2000).

 


 
FIG. 11. Increasing SPL does not systematically improve AM sensitivity in inferior colliculus (IC) neurons. A: rate-based thresholds in 33 neurons as a function of the SPL of stimulation. B: synchrony-based thresholds in the same population. In both A and B, neurons with thresholds that were level independent or increasing with SPL are depicted with solid black lines (22/33 for both rate and synchrony), and sensitivities that improved with level are plotted with the lighter gray lines.

 


 
FIG. 12. Neural AM detection thresholds in the presence of a competing masker modulation. Thresholds are plotted as in Fig. 10A, but some neurons are represented by >1 point in these panels, if they were tested at several masker depths. Human psychophysical thresholds are plotted for reference in both panels. A: rate thresholds. B: synchrony thresholds. See text for details concerning the lack of thresholds at 0 dB (and –5 dB for the –13-dB rms masker modulation depth).

 


 
FIG. 13. Changes in average rate can account for psychophysical amplitude-modulation depth discrimination, but only at high standard depths. A: comparison of neural predictions of depth discrimination performance in 20 neurons (black lines) and human psychophysical thresholds for the same task (gray connected circles, from Ewert and Dau 2004). B: raw rate MDFs (gray lines with error bars) and synchrony MDFs (black lines) for the same neurons examined in A.

 
REPRESENTATIVE INHIBITED PURE-TONE RESPONSE. Nine neurons in the population did not exhibit a clear excitatory response at any tested frequency or SPL using pure tones (a small island of excitation may have potentially been revealed had we used finer SPL-resolution; in that case, these neurons might belong to the type O response area category as defined in Ramachandran et al. 1999). The BFs of the units belonging to this subset were defined based on the tone frequency that evoked an inhibitory response (below spontaneous rate) at the lowest effective SPL. Figure 9 shows that the responses of such neurons can be highly dependent on the modulation properties of the stimulus. All three of the MTFs shown in Fig. 9A were LP in shape; rate and synchronized rate dropped to 70% of their peak values at ~65 Hz, whereas synchrony was ≥70% of its peak value (VSmax = 0.8) up to 107 Hz.

The characteristic phase [the low-fm y-intercept of the phase-MTF (data not shown)] was ~180° out of phase with respect to those observed in the neurons shown in Figs. 6–8. This can be qualitatively verified by comparing the period histograms at the lowest tested fms across the four example neurons and suggests that the neuron was released from inhibition at times corresponding to the valleys of the modulating waveform. Also, in contrast to the first three examples, a weak form of slow adaptation was observed in the PSTHs: onset inhibition was slowly released over a time-course of ~1 s (this is most clearly shown by the 113-Hz PSTH; Fig. 9C).

Because there was not a clear peak in the srMTF of Fig. 9A, a modulation of 40 Hz was chosen for the MDF simply as an AM frequency within the pass-band of all three MTFs. Interestingly, based on the MDFs alone, the neurons characterized in Figs. 6D and 9D were remarkably similar (despite their obvious differences in pure-tone responses). The rMDF revealed an increase in firing rate with increasing depth (and longer effective times of release from inhibition), with a θrate of –10 dB (Fig. 9D). The synchrony-based threshold was, once again, lower than the rate threshold (θsync = –15 dB). Although VS values were similar for –15 dB ≤ m ≤ –5 dB, the corresponding period histograms were quite different. This shows one of the limitations of the synchronization coefficient alone as a general description of temporal response characteristics.

Single-unit rate- and synchrony-based AM thresholds

Performance in three basic psychoacoustic AM tasks was predicted based on changes in neural responses at different modulation depths that were quantified in terms of average rate and synchrony. The three psychophysical paradigms are 1) pure SAM detection, 2) masked SAM detection, and 3) SAM depth discrimination.

PURE SAM DETECTION. Human listeners can discriminate the difference between a pure tone and a SAM tone at modulation depths lower than –30 dB (Zwicker 1952). At a given SPL, performance does not systematically depend on fm or fc for modulation frequencies between 10 and 150 Hz and carrier frequencies >1,000 Hz (Kohlrausch et al. 2000). Sensitivity to SAM is best at higher SPLs, but thresholds can remain lower than –20 dB at low sensation levels (Kohlrausch et al. 2000). There is indirect psychophysical evidence suggesting that listeners probably use audio-frequency channels other than that of the carrier to perform the task at high SPLs, where the effective level is lower and peripheral saturation and compression are less likely to have a strong influence on performance (Kohlrausch et al. 2000; Ruggero et al. 1997).

Neural rate and synchrony SAM detection thresholds across the population of 164 neurons are shown in Fig. 10A as a function of the stimulus modulation rate. Consistent with the example neurons described in the previous section, the vast majority of rate thresholds (○) were –10 dB or higher (139/164 neurons). If the rate did not change across the entire range of m, thresholds were deemed immeasurable; this group of neurons is identified with the X on the axes in Fig. 10. Five neurons had a rate threshold of –20 dB, and 20 responded with a significant change in rate at –15 dB. The histogram of rate thresholds to the left of Fig. 10A reinforces the fact that rate changes in single IC neurons were, in general, poor predictors of human SAM sensitivity. There was not a strong relationship between rate thresholds and fm of stimulation, which was set equal to the most prominent peak in the cell’s srMTF.
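The logic of a rate-based threshold (the lowest depth whose rate differs significantly from the rate at the lowest tested depth, with an immeasurable threshold when no depth qualifies) can be sketched in Python as follows. The statistical test used in the study is described in METHODS and is not reproduced here; the Welch t statistic and the cutoff `t_crit = 2.0` below are stand-in assumptions chosen only to make the sketch runnable, and all names and per-trial rates are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two samples (a stand-in test, not the study's)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rate_threshold(rates_by_depth, t_crit=2.0):
    """Return the lowest depth (dB) whose rate differs from the reference.

    rates_by_depth maps depth in dB to a list of per-trial rates; the
    reference is the lowest tested depth. The comparison is two-sided, so
    a significant rate decrease counts as much as an increase. Returns
    None when no depth qualifies (an immeasurable threshold).
    t_crit is an arbitrary illustrative cutoff.
    """
    depths = sorted(rates_by_depth)
    reference = rates_by_depth[depths[0]]
    for depth in depths[1:]:
        if abs(welch_t(rates_by_depth[depth], reference)) > t_crit:
            return depth
    return None


# Hypothetical per-trial rates (spikes/s) at four depths.
tuned_unit = {
    -35: [20, 22, 21, 19],
    -20: [20, 21, 23, 19],
    -10: [28, 30, 29, 31],
    0: [40, 42, 41, 43],
}
flat_unit = {
    -35: [20, 22, 21, 19],
    -10: [21, 20, 22, 20],
    0: [20, 21, 23, 19],
}
print(rate_threshold(tuned_unit), rate_threshold(flat_unit))
```

In this made-up example the first unit has a threshold of –10 dB, and the second has no measurable threshold because its rate is depth independent.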

Synchrony-based thresholds (shown in the histogram to the right of Fig. 10A) were more evenly distributed across the perceptually relevant dynamic range than the values of θrate. Twenty-eight percent (46/164) of the neurons had synchrony thresholds of –20 dB or lower; three units were significantly phase-locked at a modulation depth of –30 dB. Examination of the individual neural thresholds (Fig. 10A, X symbols) reveals no obvious trends either in maximum sensitivity or threshold distribution as a function of the stimulus fm.

Figure 10B shows a feature of the data that is suggested by but not explicitly contained in Fig. 10A: θsync was almost always lower than θrate on a neuron-by-neuron basis. In the scatter plot of Fig. 10B, this aspect of the data takes the form of almost all of the points lying above the diagonal. The three neurons that had the most sensitive synchrony thresholds (θsync = –30 dB, the 3 left points in Fig. 10B) had corresponding rate thresholds of –15 and –5 dB and one immeasurable θrate.

As mentioned above, overall SPL can affect behavioral SAM detection thresholds. The neural responses summarized in Figs. 2–10 were obtained at an SPL chosen based on individual RLFs: the SPL used for AM stimulation was set to a level on the ascending portion of the RLF, or at its peak if it was sharply nonmonotonic. One obvious question is whether the relatively poor neural rate-based thresholds might improve if a higher SPL were chosen. In 33 neurons, this question was addressed by remeasuring MTFs and MDFs at a level typically 20–40 dB higher than that used for the low-SPL responses. The stimulus fm for each MDF was determined based on the peak in the srMTF, which could vary across SPL (for a more detailed discussion of the level-dependence of MTF shapes, see Krishna and Semple 2000).

The resulting rate- and synchrony-based thresholds for this subset of the population are shown for both tested SPLs in Fig. 11. Thresholds that increased (i.e., sensitivity got worse) or remained the same with increasing SPL are plotted with solid lines; the other neurons, which exhibited a decrease in threshold at the higher SPL, are represented by dashed lines. The majority of rate thresholds (22/33) did not improve at high SPLs (Fig. 11A), and 11 of the comparisons revealed an increase in θrate at higher SPLs. The 11 cells that did exhibit an improvement in rate-based sensitivity with level still did not approach human detection thresholds at comparable SPLs (the lowest θrate above 40 dB SPL was 20 log m = –15 dB). Synchrony-based thresholds (Fig. 11B) were also more likely to show a decrease in sensitivity at high SPLs (16/33) than an improvement (11/33). The average θrate increased by 0.9 dB at the higher SPL, and the average θsync increased by 1.1 dB at the higher level. Overall, the trends shown in Fig. 11 suggest that the use of a relatively low SPL in the population analysis (e.g., Fig. 10) probably did not bias the results toward higher thresholds.

We conclude that, for pure SAM detection, some temporal aspect of the envelope-locked response (i.e., synchrony) at the level of the IC must be taken into consideration to account for psychophysical detection thresholds based on the responses of single neurons.

MASKED SAM DETECTION. Psychophysical experiments studying the effect of a competing masker modulation on the detectability of a SAM signal modulation have shown effects of the frequency relationship between masker and signal (Bacon and Grantham 1989; Ewert and Dau 2000; Ewert et al. 2002; Houtgast 1989; Strickland and Viemeister 1996) and of the "level" or modulation depth of the masker (Bacon and Grantham 1989; Nelson and Carney 2006; Strickland and Viemeister 1996). Because a main focus of this set of experiments was to establish the modulation-depth dependence of responses in the IC, neural masked thresholds were determined with a narrowband Gaussian masker (centered on the sinusoidal signal frequency) at several masker depths.

Over a 10-dB range of masker depths from –23 to –13 dB rms, psychophysical SAM detection thresholds in a similar task (with fm = 64 Hz, masker BW = 32 Hz, fc = 5,500 Hz, and SPL = 65 dB) increased monotonically as the masker fluctuations became stronger (more details and thresholds in a wider range of stimulus conditions can be found in Nelson and Carney 2006). The behavioral thresholds are shown in Fig. 12, along with neural thresholds obtained in 28 units, again based on rate and synchrony to the signal envelope frequency. The signal fm was set to the peak in the cell’s srMTF, and the masker bandwidth was fixed at one half of the SAM signal frequency. To avoid overmodulation (i.e., a modulation depth > 1), the signal depth was restricted to values ≤ –5 dB for masker depths of –23 and –18 dB rms and to values ≤ –10 dB for the –13 dB rms masker. As a result, there were no predicted neural thresholds at 0 dB for any masker level and no thresholds at –5 or 0 dB (20 log m) for the –13 dB rms condition.

Figure 12A shows that, in general, average rates were more successful in predicting masked thresholds than they were at predicting pure SAM detection (i.e., Fig. 10). A small group of cells (8) exhibited rate thresholds within ~5 dB of the listeners’ thresholds at one or more masker depths. This is weakly suggestive that the functional contributions of rate and synchrony may depend on the range of relevant modulation depths in a given task. We will come back to this idea in the following section (see SAM DEPTH DISCRIMINATION), which considers a general discrimination task across the entire perceptual modulation-depth dynamic range.

Synchronization to the signal SAM was significant in some neurons at depths even lower than the behavioral thresholds, most notably at the highest tested masker modulation depth, where 7/24 neural thresholds were below the psychophysical data (Fig. 12B). As with the pure SAM detection population analysis, the distribution of synchrony thresholds was more uniform than the rate-based distribution (which was skewed toward higher or immeasurable predictions of threshold). Trends in threshold across masker depth for individual neurons are not shown in Fig. 12 for clarity, because there was not a consistently observed increase or decrease in predicted sensitivity with increasing masker level (as there was in the psychophysical data).

SAM DEPTH DISCRIMINATION. Another fundamental measure of envelope processing in psychoacoustics is SAM depth discrimination, which describes the ability of the system to resolve small changes in m. Pure SAM detection is a special case of this more general paradigm: the standard depth (ms) for detection is set to 0, and the comparison depth (mc) is adjusted until it is just noticeably different from the standard interval. The same procedure can be repeated for any value of ms. Psychophysical measurements of pure-tone carrier SAM-depth discrimination reveal thresholds that are approxi