J Neurophysiol 90: 456-476, 2003. First published March 26, 2003; doi:10.1152/jn.00851.2002
0022-3077/03 $5.00

Gabor Analysis of Auditory Midbrain Receptive Fields: Spectro-Temporal and Binaural Composition

Anqi Qiu1, Christoph E. Schreiner3 and Monty A. Escabí1,2

1Biomedical Engineering Program and 2Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut 06269-2157; and 3W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, California 94143

Submitted 25 September 2002; accepted in final form 3 March 2003


    ABSTRACT
The spectro-temporal receptive field (STRF) is a model representation of the excitatory and inhibitory integration area of auditory neurons. Recently it has been used to study spectral and temporal aspects of monaural integration in auditory centers. Here we report the properties of monaural STRFs and the relationship between ipsi- and contralateral inputs to neurons of the central nucleus of the cat inferior colliculus (ICC). First, we use an optimal singular-value decomposition method to approximate auditory STRFs as a sum of time-frequency separable Gabor functions. This procedure extracts nine physiologically meaningful parameters. The STRFs of ~60% of collicular neurons are well described by a time-frequency separable Gabor STRF model, whereas the remaining neurons exhibit obliquely oriented or multiple excitatory/inhibitory subfields that require a nonseparable Gabor fitting procedure. Parametric analysis reveals distinct spectro-temporal tradeoffs in receptive field size and modulation filtering resolution. Comparison with an identical model used to study spatio-temporal integration areas of visual neurons further shows that auditory and visual STRFs share numerous structural properties. We then use the Gabor STRF model to compare quantitatively receptive field properties of contra- and ipsilateral inputs to the ICC. We show that most interaural STRF parameters are highly correlated bilaterally. However, the spectral and temporal phases of ipsi- and contralateral STRFs often differ significantly. This suggests that activity originating from each ear shares various spectro-temporal response properties, such as temporal delay, bandwidth, and center frequency, but has shifted or interleaved patterns of excitation and inhibition. These differences in converging monaural receptive fields expand binaural processing capacity beyond interaural time and intensity aspects and may enable colliculus neurons to detect disparities in the spectro-temporal composition of the binaural input.


    INTRODUCTION
Auditory neurons are unique for their ability to process rapidly varying stimuli and track changes in the stimulus spectrum. Neurons in central auditory stations are highly sensitive to dynamic variations in the temporal, spectral, intensity, and aural composition of the sensory stimulus (Goldberg and Brown 1969; Irvine and Gago 1990; Krishna and Semple 2000; Kuwada et al. 1997; Langner and Schreiner 1988; Ramachandran et al. 1999; Rees and Møller 1983). Although numerous studies have evaluated the response characteristics to structurally simple stimuli, only a handful of studies have analyzed the joint spectral, temporal, and/or binaural receptive field arrangements responsible for this response diversity (Depireux et al. 2001; Miller et al. 2002; Sen et al. 2001).

Auditory receptive fields are typically derived with isolated pure tones that are presented at varying frequencies and intensities or by measuring neural sensitivity to narrowband time-varying stimuli (e.g., Krishna and Semple 2000; Langner and Schreiner 1988; Ramachandran et al. 1999; Rees and Møller 1983). Recently, the auditory spectro-temporal receptive field (STRF), a linear model representation of the integration area of a neuron, has expanded these classical methods. The auditory STRF has the advantage that it simultaneously describes spectral and temporal stimulus attributes that preferentially activate a neuron and can be used to identify the spectral arrangement and temporal dynamics of neural excitation and inhibition of a neuron during dynamic broadband stimulation (Aersten et al. 1980; deCharms et al. 1998; Depireux 2001; Escabí and Schreiner 2002; Klein et al. 2000; Miller et al. 2002; Nelken et al. 1997; Sen et al. 2001; Theunissen et al. 2000). In particular, the STRF technique is useful for predicting neuronal response patterns to complex auditory stimuli, including natural sounds (Aersten et al. 1980; Klein et al. 2000; Sen et al. 2001; Theunissen et al. 2000), and can accurately account for spatial selectivity profiles that contribute to sound localization (Schnupp et al. 2001).

In the visual system, the direct counterpart of the auditory STRF is the spatio-temporal receptive field. Here the spectral dimension (which extends along the primary sensory epithelium receptor surface of the cochlea) is replaced by spatial dimensions along the retinal sensory epithelium (Cai et al. 1997; DeAngelis et al. 1995; De Valois and Cottaris 1998; Shamma 2001). Visual neurophysiologists have used Gabor and Gamma functions as quantitative descriptors of visual STRFs (Cai et al. 1997; DeAngelis et al. 1993a, 1999; Jones and Palmer 1987a,b). Advantages of fitting visual STRFs with quantitative functions include improved estimates of the spatio-temporal structure of visual response areas and the removal of estimation noise. Furthermore, these model STRFs can be used to study the arrangements of excitatory and inhibitory neural inputs and to extract physiologically meaningful parameters from neural data (DeAngelis et al. 1993a, 1999). Although it has been suggested that auditory and visual STRFs have remarkably similar time-varying structure (deCharms et al. 1998; Shamma 2001), only a few studies have quantitatively evaluated the spectro-temporal structure of auditory STRFs (Depireux et al. 2001; Escabí and Schreiner 2002; Miller et al. 2002; Sen et al. 2001). However, these studies did not quantitatively compare the structure of the auditory STRF directly with its visual counterpart.

In this study, we present a time-frequency Gabor STRF model to fit auditory STRFs in the central nucleus of the cat inferior colliculus (ICC). Spectral and temporal Gabor functions are used to model spectral receptive field (SRF) and temporal receptive field (TRF) profiles of ICC neurons, respectively. Each STRF is then fitted by a weighted sum of products of time-frequency separable Gabor functions. From the definition of a Gabor function, nine physiologically meaningful parameters are extracted: the center frequency, the best ripple density, the best temporal modulation frequency, the peak latency, the bandwidth of the SRF profile, the response duration, the response strength, and the spectral and temporal phases. These parameters are used to quantify spectral, temporal, and time-frequency response characteristics to dynamic moving ripple stimuli (Escabí and Schreiner 2002; Miller et al. 2002). This Gabor STRF model is a direct extension of receptive field models used to study the structure of visual receptive fields in the primary visual cortex (DeAngelis et al. 1993a,b, 1999) and provides a basis for comparing the structure of auditory and visual STRFs. In particular, we apply this methodology to compare STRF properties of contra- and ipsilateral inputs to ICC neurons. We demonstrate specific aural STRF differences that suggest binaural filtering mechanisms beyond interaural time and level sensitivity.


    MATERIALS AND METHODS
Electrophysiology

Physiological recording methods have been presented in detail elsewhere (Escabí and Schreiner 2002). Briefly, cats (n = 4) were initially anesthetized with a mixture of ketamine HCl (10 mg/kg) and acepromazine (0.28 mg/kg im). A surgical state of anesthesia was induced with ~30 mg/kg pentobarbital sodium (Nembutal) and maintained throughout the surgery with supplements via an intravenous infusion line. Body temperature was measured and maintained at ~37.5°C. The overlying cerebrum and part of the bony tentorium were removed to expose the ICC via a dorsal approach. During the unit recordings, animals were maintained in an areflexive state via continuous infusion of ketamine (2–4 mg · kg⁻¹ · h⁻¹) and diazepam (0.4–1 mg · kg⁻¹ · h⁻¹) in lactated Ringer solution (1–4 mg · kg⁻¹ · h⁻¹). The infusion rate was adjusted according to physiologic criteria (heart rate, breathing rate, temperature, and peripheral reflexes). All surgical methods and experimental procedures followed National Institutes of Health and U.S. Department of Agriculture guidelines.

Neural data were acquired from n = 99 single units in the ICC with parylene-coated tungsten microelectrodes (Microprobe, Potomac, MD; 1–3 MΩ at 1 kHz) that were advanced into the central nucleus with a hydraulic microdrive (David Kopf Instruments, Tujunga, CA). Action potential traces were recorded onto a digital audio tape (Cygnus Technologies CDAT16; Delaware Water Gap, PA) at a sampling rate of 24.0 kHz (41.7-µs resolution) and spike sorted off-line with a Bayesian spike sorting algorithm (Lewicki 1994).

Acoustic stimuli

Dynamic moving ripple (DMR) stimuli (Escabí and Schreiner 2002) were presented with the animal in a sound-shielded chamber (IAC, Bronx, NY) with stimuli delivered via a closed, binaural speaker system (electrostatic diaphragms from Stax). The DMR sound is specifically designed to dynamically activate the primary sensory epithelium and to probe the physiologically relevant range of spectral and temporal stimulus modulations of neurons in an unbiased fashion. Sounds were presented binaurally with an independent sound sequence to each ear, from which independent contra- and ipsilateral STRFs were computed via spike-triggered averaging (Escabí and Schreiner 2002).

In three experiments, the DMR stimulus was presented for a period of 10–20 min (Escabí and Schreiner 2002). In one experiment, a two-repeat 4-min sequence of the DMR (8 min total) was presented. In all experiments, stimuli covered the same range of spectral and temporal parameters and were presented at ~30–70 dB above the neuron's response threshold.

Gabor STRF model

STRFs were decomposed into a superposition of time-frequency separable functions from which we could model and fit each component by a spectro-temporal Gabor function (product of Gaussian and cosine; Fig. 3). Measured STRFs were first decomposed using a singular value decomposition (SVD) (Depireux et al. 2001; Press et al. 1995; Theunissen et al. 2000) into a sum of separable STRF components (STRFi)

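$$ \mathrm{STRF}(t, x) = U \cdot S \cdot V^{*} = \sum_{i} \sigma_i \, u_i \, v_i^{*} = \sum_{i} \mathrm{STRF}_i(t, x) \tag{1} $$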
where U and V are unitary (orthogonal) matrices containing the temporal and spectral receptive field profiles of each STRF component (Fig. 3, B and C; top and right); S is a diagonal matrix with real, non-negative elements, σi, in descending rank order according to energy; and * denotes the Hermitian transpose. Each STRF component, STRFi, is obtained by the vector product

$$ \mathrm{STRF}_i(t, x) = \sigma_i \, u_i \, v_i^{*} \tag{2} $$
where σi is the ith singular value of STRF(t, x) and determines the energy of the ith STRF component. ui and vi are the ith orthonormal column vectors of U and V, respectively. Conceptually, these correspond to the spectral and temporal receptive field profiles of each component STRF (e.g., shown on the top and right of Fig. 3, B and C). The dominant spectral and temporal receptive field profiles, u1 and v1, account for ~80% of the total STRF energy, and we therefore use these to quantify spectral and temporal response characteristics throughout.
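
As a minimal sketch of the decomposition in Eqs. 1 and 2, assuming the STRF is stored as a two-dimensional NumPy array, the separable components and their energy fractions could be obtained as follows. The function name `separable_components` and the toy data are illustrative and are not part of the original analysis.

```python
import numpy as np

def separable_components(strf):
    """Split a 2-D STRF matrix into rank-1 (time-frequency separable) parts."""
    # Compact SVD: strf == U @ diag(s) @ Vh, with s sorted in descending order.
    U, s, Vh = np.linalg.svd(strf, full_matrices=False)
    # Component i is sigma_i * outer(u_i, v_i*); the rows of Vh are already v_i*.
    components = [s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s))]
    # The energy of component i is sigma_i**2, reported as a fraction of the total.
    energy_fraction = s**2 / np.sum(s**2)
    return components, s, energy_fraction

# Toy example (made-up data): one strong separable pattern plus a weaker one.
rng = np.random.default_rng(0)
strong = np.outer(rng.standard_normal(8), rng.standard_normal(12))
weak = 0.2 * np.outer(rng.standard_normal(8), rng.standard_normal(12))
strf = strong + weak
components, s, energy_fraction = separable_components(strf)
```

Summing all components recovers the original matrix, and the first entry of `energy_fraction` plays the role of the ~80% figure quoted for the dominant component.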



FIG. 3. Schematic diagram of the Gabor STRF model. A singular value decomposition procedure (SVD; see METHODS) is used to decompose the measured STRF into a weighted sum of separable STRF components (STRF1, STRF2,...; B and C; shown for the 1st 2 components only). The SRF profile at the peak latency and the TRF profile at the center frequency of each separable STRF component are illustrated on the right and top of B and C, respectively. SRF and TRF profiles are then individually fitted by a Gabor function (D and E; top and right waveforms). Each separable STRF component is described by the product of 2 Gabor functions [Gi(x) and Hi(t)] in D and E. Finally, the fitted STRF (STRFm, F) is modeled as the weighted sum of the statistically significant separable STRF components (from D and E).

 

According to the SVD procedure, every STRFi component is time-frequency separable (although the entire STRF may be nonseparable). Therefore each component can be modeled by the product of a spectral and a temporal waveform, which we approximate by a Gabor function. Thus the fitted STRF model is expressed as a weighted sum of a finite set of N statistically significant separable Gabor components (typically, N = 1 or 2)

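$$ \mathrm{STRF}_m(t, x) = \sum_{i=1}^{N} \mathrm{STRF}_{im}(t, x) = \mathrm{sign} \cdot \sum_{i=1}^{N} K_i \, G_i(x) \, H_i(t) \tag{3} $$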
where STRFm(t, x) (e.g., in Fig. 3F) is the fitted STRF model. STRFim(t, x) (e.g., in Fig. 3, D and E) is the fitted STRFi component. Ki, Gi(x), and Hi(t) correspond to the response strength, the fitted and normalized SRF profile, and the fitted and normalized TRF profile of the ith STRF component, STRFi. The modeled spectral and temporal profiles, Gi(x) and Hi(t), assume the form of a Gabor function (see Eqs. 11 and 13, respectively) each with an independent set of spectral and temporal parameters. Finally, the variable sign assumes a value of 1 or –1 and is included in the model to designate the type of STRF, which can be dominantly excitatory (+) or inhibitory (–), respectively. The optimal parameters of the Gabor-STRF model are determined iteratively by minimizing the mean square error between the model and the real data (Press et al. 1995).

Level of noise

Auditory STRFs are estimated from real neural data by a spike-triggered average method (Escabí and Schreiner 2002) that is inherently noisy. Measurement noise corresponds to random deviations from the expected STRF that would result from an infinite amount of averaging. These variations result from unexpected variations in the neural response and from finite data averaging due to the finite experiment recording periods (Klein et al. 2000; Theunissen 2000). Therefore to minimize the effects of noise, it is necessary to consider only those independent time-frequency components of the Gabor STRF model that significantly contribute to the STRF's energy and structure.

To determine the maximum number of independent dimensions of the STRF that contribute to its structure (N in Eq. 3), it is essential to quantify the STRF noise level. Singular values that exceed the measured noise level typically contribute significantly to the neural response and should therefore be incorporated into the Gabor STRF model; conversely, singular values that fall below the noise level contribute largely to the noise and can therefore be ignored. A significant noise level (P < 0.01) was determined empirically via a bootstrap STRF re-estimation procedure for a random Poisson firing neuron of identical spike rate as the neuron under investigation. Twenty-five randomly constructed STRFs, STRFr (e.g., Fig. 4A), were simulated by correlating a random Poisson spike train of firing rate, λ, with the dynamic moving ripple noise stimulus. The first singular value (σr1) of each random STRF, STRFr, was obtained directly by performing an SVD. For each of the 25 trials (shown by vertical red circles in Fig. 4B), the measured level of noise was randomly distributed. Therefore the desired threshold noise level for a specific spike rate (solid line in Fig. 4B) was determined as the sum of the mean of σr1 and 2.57 times its SD (P < 0.01). The mean ± SD of σr1 were calculated from the 25 simulated samples by a bootstrap resampling technique (Efron and Tibshirani 1993). All first-order STRFs considered here were above the estimated noise level.
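
The thresholding step can be illustrated with a short sketch. It assumes that the 25 first singular values of the random STRFs are already available as a list, and that the bootstrap is used simply to estimate the mean and SD by resampling with replacement. The text does not specify the exact resampling scheme, so the function `noise_threshold`, its defaults, and the example values are hypothetical.

```python
import random
import statistics

def noise_threshold(sigma_r1, n_boot=1000, z=2.57, seed=0):
    """Return mean + z * SD of the noise singular values (bootstrap estimates)."""
    rng = random.Random(seed)
    n = len(sigma_r1)
    boot_means, boot_sds = [], []
    for _ in range(n_boot):
        # Resample the simulated noise levels with replacement.
        resample = [rng.choice(sigma_r1) for _ in range(n)]
        boot_means.append(statistics.mean(resample))
        boot_sds.append(statistics.stdev(resample))
    # Average the bootstrap replicates to obtain the mean and SD estimates.
    mean_est = statistics.mean(boot_means)
    sd_est = statistics.mean(boot_sds)
    return mean_est + z * sd_est

# Made-up noise singular values for 25 simulated random STRFs.
sigma_r1 = [0.40 + 0.002 * i for i in range(25)]
threshold = noise_threshold(sigma_r1)

# A measured singular value is kept only if it exceeds the threshold.
measured = [1.8, 0.52, 0.31]
significant = [s for s in measured if s > threshold]
```

In this toy case the first two measured singular values exceed the threshold, so N = 2 components would enter the model.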



FIG. 4. Significance analysis of the Gabor STRF model. A random noise STRF, STRFr (A), is generated by reverse-correlating the dynamic moving ripple sound and a random, Poisson-distributed spike train at a specific spike rate (λ = 3.93 spikes/s for this example). The noise level is obtained by measuring the first singular value (σr1 = 0.42 for the shown example) of the STRFr with the SVD method used to break up the STRF into separable components (Fig. 3). For each spike rate this procedure is resimulated 25 times to estimate the distribution of noise levels (vertical red circles in B). A resampling bootstrap technique is used to estimate the threshold level required to achieve a significance of P < 0.01 at each spike rate (continuous line, B and C). The relationship between the noise-threshold level, the measured spike rate, and the 1st, 2nd, and 3rd singular values obtained from the STRFs of all neurons is depicted in C. All of the 1st singular values (100%) exceed the noise threshold (red *), whereas only 39.7% of the 2nd (blue ◊) and 7.5% of the 3rd (green ○) singular values exceed significance (P < 0.01). Energy contribution of the separable STRF components (D). The 1st STRF component (STRF1) accounts for 78.9 ± 15.7% (mean ± SD) of the STRF's energy. The contributions of the 2nd (6.2 ± 5.0%) and 3rd (2.3 ± 1.8%) STRF components are significantly smaller.

 

Similarity index

The Gabor STRF model can potentially account for much of the structure of collicular receptive fields; however, the utility of the model needs to be quantitatively evaluated. We devised three metrics to validate the goodness of fit of the model, evaluating the SRF and TRF profiles independently as well as the entire STRF.

To compare the receptive field structure of the model and data, we devised the spectral similarity index (SIs), temporal similarity index (SIt) and spectro-temporal similarity index (SI). The spectral SI, SIs, accounts for differences in shape between original and model SRF profiles; SIt is used to compare the original and model TRF profiles; the spectro-temporal SI, SI, measures shape differences between original and model STRFs. Individually these metrics correspond to a correlation analysis performed between the model and original data (DeAngelis et al. 1999; Escabí and Schreiner 2002; Miller et al. 2002) and can be expressed as

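$$ SI_s = \frac{\langle \mathrm{SRF}, \mathrm{SRF}_m \rangle}{\|\mathrm{SRF}\| \cdot \|\mathrm{SRF}_m\|} \tag{4} $$

$$ SI_t = \frac{\langle \mathrm{TRF}, \mathrm{TRF}_m \rangle}{\|\mathrm{TRF}\| \cdot \|\mathrm{TRF}_m\|} \tag{5} $$

$$ SI = \frac{\langle \mathrm{STRF}, \mathrm{STRF}_m \rangle}{\|\mathrm{STRF}\| \cdot \|\mathrm{STRF}_m\|} \tag{6} $$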
where ⟨·,·⟩ corresponds to the vector correlation (inner product), and || · || designates the vector norm operator. Because the STRF is formally defined by a two-dimensional matrix of spectral and temporal samples, Eq. 6 could not be evaluated directly since it requires vector inputs. Therefore the statistically significant samples of the STRF that exceeded a significance criterion of P < 0.002 were converted into a unidimensional vector, from which the SI was determined using Eq. 6 (Escabí and Schreiner 2002).

Because all three similarity indices are effectively correlation coefficients between the real data and model waveforms, they assume a value of one whenever the waveforms inside their arguments are identical in shape, zero if the waveforms have nothing in common and negative one if the waveforms have identical shapes but differ by a negative sign.
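
The limiting values described above can be checked with a small sketch. It assumes the index is the normalized inner product of two equal-length vectors, without mean subtraction; the function name `similarity_index` and the example waveform are illustrative only.

```python
import math

def similarity_index(a, b):
    """Normalized inner product <a, b> / (||a|| * ||b||) of two vectors."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same number of samples")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up profile with an excitatory peak flanked by inhibition.
profile = [0.0, -0.3, 1.0, -0.4, 0.1]
scaled = [3.0 * v for v in profile]      # same shape, different amplitude
inverted = [-v for v in profile]         # same shape, opposite sign

si_same = similarity_index(profile, scaled)
si_inverted = similarity_index(profile, inverted)
si_orthogonal = similarity_index([1.0, 0.0], [0.0, 1.0])
```

Because the index is normalized by both vector norms, it is insensitive to overall response strength and reflects waveform shape only.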

Normalized mean square error

A fourth metric was defined that quantifies the relative difference in energy between the fitted (STRFm) and the measured STRF (STRF). The normalized mean square error (MSE) is defined as the energy of the difference STRF normalized by the energy of the measured STRF (DeAngelis et al. 1999)

$$ \mathrm{MSE} = \frac{\|\mathrm{STRF} - \mathrm{STRF}_m\|^2}{\|\mathrm{STRF}\|^2} \tag{7} $$
The MSE assumes values between zero and one, where lower MSE values are indicative of a properly fitted STRF.
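
A minimal sketch of this metric, assuming both STRFs are sampled on the same time-frequency grid and held as NumPy arrays (the function name `normalized_mse` and the toy matrices are illustrative):

```python
import numpy as np

def normalized_mse(strf, strf_model):
    """Energy of (measured - model) divided by the energy of the measured STRF."""
    strf = np.asarray(strf, dtype=float)
    strf_model = np.asarray(strf_model, dtype=float)
    return np.sum((strf - strf_model) ** 2) / np.sum(strf ** 2)

# Made-up measured STRF and a model that underestimates it by 10% everywhere.
measured = np.array([[1.0, 2.0], [3.0, 4.0]])
model = 0.9 * measured
mse = normalized_mse(measured, model)                         # (0.1)**2
mse_perfect = normalized_mse(measured, measured)              # identical shapes
mse_empty = normalized_mse(measured, np.zeros_like(measured)) # model explains nothing
```

A uniform 10% amplitude error therefore yields an MSE of 0.01, whereas an all-zero model yields an MSE of one.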

Temporal asymmetry index

Initial evaluation of the temporal receptive field envelope revealed that timing profiles of ICC neurons are characterized by a sharp transient onset. We therefore quantitatively evaluated the structure of the temporal response envelope. To evaluate the degree of temporal asymmetry in the TRF profile, we define an asymmetry index (αt) as the skewness of the temporal envelope (Bliss 1967)

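$$ \alpha_t = \frac{\int (t - \mu_t)^3 \, E_t(t) \, dt}{\left[ \int (t - \mu_t)^2 \, E_t(t) \, dt \right]^{3/2}} \tag{8} $$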
where µt is the mean or centroid of the temporal envelope, Et(t), measured at the center frequency (x0) of the neuron and normalized for unit area. A temporal asymmetry index of zero is observed only for TRF envelopes that are perfectly symmetric about the mean point, µt. An αt significantly less than 0 indicates that the TRF profile is skewed to the right, and an αt significantly greater than 0 indicates that the TRF profile is skewed to the left.
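
As a rough numerical illustration, the skewness of a sampled envelope can be computed by treating the unit-area envelope as a weighting function over time. The sketch assumes uniformly spaced samples; the function `asymmetry_index` and both test envelopes are made up for illustration.

```python
import numpy as np

def asymmetry_index(t, envelope):
    """Skewness of a temporal envelope treated as a unit-area weighting function."""
    t = np.asarray(t, dtype=float)
    w = np.asarray(envelope, dtype=float)
    w = w / np.sum(w)                      # normalize for unit area (uniform grid)
    mu = np.sum(t * w)                     # centroid of the envelope
    second = np.sum((t - mu) ** 2 * w)     # second central moment
    third = np.sum((t - mu) ** 3 * w)      # third central moment
    return third / second ** 1.5

t = np.arange(0.0, 100.0, 0.1)             # time axis in ms (illustrative)
symmetric = np.exp(-((t - 50.0) / 5.0) ** 2)
sharp_onset = t * np.exp(-t / 5.0)         # fast rise, slow decay

alpha_symmetric = asymmetry_index(t, symmetric)
alpha_sharp_onset = asymmetry_index(t, sharp_onset)
```

The symmetric envelope gives an index near zero, whereas the envelope with a fast rise and slow decay gives a positive index (about 1.41 for this particular shape).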

Separability index

An inherent aspect of the Gabor model is that it is composed of multiple receptive field components, each of which is a time-frequency separable function. If the receptive field contains only one singular value, the receptive field is time-frequency separable; that is, it can be described by a multiplicative product of a temporal and spectral receptive field profile as in Eq. 2. Hypothetically, such a neuron would encode spectral and temporal information independently. If, alternatively, the receptive field has multiple significant singular values, the receptive field will exhibit time-frequency inseparable structure. This can manifest as obliquely oriented STRF features or multiple asymmetrically aligned excitatory and inhibitory receptive field subregions. Neurons with such receptive field arrangements most likely prefer sound stimuli with dynamically changing frequency components, and, consequently, the spectral and temporal dimensions for such neurons cannot be treated independently of each other. This effect becomes more pronounced if the higher-order singular values account for a large proportion of the receptive field energy. Thus we can define a separability index by considering the proportion of energy provided by the first singular value in relation to the cumulative energy of the higher-order singular values. We define the separability index (αd) as

$$ \alpha_d = \frac{\sigma_1^2 - \sum_{i=2}^{N} \sigma_i^2}{\sum_{i=1}^{N} \sigma_i^2} \tag{9} $$
where σ1 and σi are the first- and higher-order singular values of the STRF (Eq. 1), and N is the number of statistically significant singular values used in the Gabor STRF model. Conceptually, αd is defined as the normalized energy of the first singular value (relative to the total energy of the model STRF) minus the normalized energy of the higher-order singular values. Separability index values range from 0 to 1, where 1 corresponds to a perfectly separable STRF and values close to zero designate a highly inseparable receptive field arrangement.
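
The verbal definition above translates directly into a few lines of code. This sketch assumes the energy of each component is the square of its singular value and that only the N significant singular values are passed in; the function name and example values are illustrative.

```python
def separability_index(singular_values):
    """Normalized energy of the first singular value minus that of the rest."""
    energies = [s * s for s in singular_values]   # energy is sigma squared
    total = sum(energies)
    return (energies[0] - sum(energies[1:])) / total

alpha_separable = separability_index([1.0])        # single component
alpha_balanced = separability_index([1.0, 1.0])    # two equal components
alpha_mixed = separability_index([2.0, 1.0])       # (4 - 1) / (4 + 1)
```

A single significant component gives an index of 1, two components of equal energy give 0, and singular values of 2 and 1 give 0.6.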


    RESULTS
We studied in 99 single neurons how dynamic stimuli are encoded in the ICC by identifying structural characteristics of the auditory STRF. Our dynamic moving ripple stimulus (DMR) is a broadband sound that efficiently probes spectro-temporal attributes of the acoustic space (Escabí and Schreiner 2002). It is characterized by a dynamically changing spectrum with widespread spectral fluctuations over a broad range of resolutions (0–4 cycles/octave). Superimposed on this spectral variability, the DMR exhibits temporal energy fluctuations over a wide range of modulation frequencies: 0–350 Hz. Its statistically unbiased properties make the stimulus directly applicable for the study of auditory receptive fields during dynamic stimulation. We combined STRF measurement techniques with a spectro-temporal Gabor model to study the structural properties and binaural arrangements of inferior colliculus STRFs. This model allows us to extract nine physiologically meaningful STRF parameters. To determine whether the Gabor model is well suited for describing auditory STRFs, we first fitted each contralateral STRF to the Gabor model and found the optimal parameters of each receptive field. Next, we independently characterized spectral and temporal receptive field profiles as well as the arrangement of excitation and inhibition of each neuron in order to determine how these dimensions contribute to the STRF. Finally, we use the Gabor STRF model to characterize and compare ipsi- and contralateral receptive field arrangements. By studying the spectral and temporal parameters of the contralateral and ipsilateral STRFs, we identify how the spectro-temporal arrangement of excitation and inhibition contributes to the formation of binaural response properties seen in the inferior colliculus.

Structure of the spectral receptive field

The spectral receptive field (SRF) profile is a model representation of the frequency integration area of auditory neurons (Calhoun and Schreiner 1998; Kowalski et al. 1996; Miller et al. 2002; Schreiner and Calhoun 1994; Versnell and Shamma 1998). This descriptor can be used to quantify neuronal responses to sounds with complex spectra (such as for formant transitions in speech and spectral resonances in animal vocalizations) and to study the receptive field arrangement of excitation and inhibition along the cochleotopic dimension of the stimulus. Most studies using this descriptor largely focused on qualitatively identifying general integration properties (such as the arrangement of spectral excitation and inhibition) and only for stimuli with static temporal characteristics. By slicing the STRF at a fixed latency (solid lines in Fig. 1, B and C) we can study the dynamic behavior of the SRF profile for complex stimuli with time-varying structure. Specifically, we would like to identify a model representation of the STRF that quantitatively captures the general characteristics of the SRF profile and its associated dynamics. When the latency is >40 ms, there is no discernible SRF structure for the STRF shown in Fig. 1A. At shorter latencies, however, SRF profiles can exhibit pure excitation, inhibition, or an alternating arrangement of excitation and inhibition. The phase of SRF profiles changes continuously so that the excitatory bandwidths and center frequencies change with increasing latency. Consequently, there is no direct analytic equation to model the SRF profile at all latencies.



FIG. 1. Spectral receptive field (SRF) profile analysis. A typical inferior colliculus spectro-temporal receptive field (STRF) showing obliquely oriented excitatory and inhibitory subregions (A). Two SRF profiles taken along the excitatory (T = 12.7 ms) and inhibitory (T = 26.7 ms) spectral cross-sections (solid lines in B and C, respectively). Their Hilbert transforms (H[SRF(x)]) are represented by dotted lines and their spectral envelopes, Es(x), by dashed lines (B and C). The neuron's center frequency (CF) is determined from the peak of the SRF envelope. Typically, the CF is close to the peak of the SRF profile (as in B), although these may differ depending on the arrangement of spectral excitation and inhibition (as in C). The bandwidth of the SRF profile, BW, is measured directly from the spectral envelope. The range of frequencies covered by the BW accounts for ~85% of the energy of the SRF envelope. Measured SRF profile (red line) and Gabor fitted SRF profile (black line) are typically in close agreement (D and E).

 

One step toward solving this problem is to break up the SRF profile into an envelope and a carrier component via the Hilbert transform (Cai et al. 1997; Daugman 1985; DeAngelis et al. 1993a, 1999; Jones and Palmer 1987a,b; Marcelja 1980). The envelope, Es(x), is computed by the vector sum of the SRF profile, SRF(x), and its Hilbert transform, H[SRF(x)]

$$ E_s(x) = \sqrt{\mathrm{SRF}(x)^2 + H[\mathrm{SRF}(x)]^2} \tag{10} $$
Example spectral envelopes of a single neuron are shown as dashed lines at two latencies in Fig. 1, B and C. The Hilbert transforms of the SRF profiles, H[SRF(x)] (Fig. 1, B and C), are represented by the dotted lines and are obtained by shifting the phase of all frequency components of SRF(x) (solid lines in Fig. 1, B and C) by 90°. Conceptually, the Hilbert transform allows the fine carrier structure to be isolated from the coarse envelope structure of the STRF.
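
The envelope extraction can be sketched numerically. The example below builds the analytic signal with an FFT (the standard discrete construction, used here in place of whatever routine the original analysis employed) and takes its magnitude, which equals the vector sum of the profile and its Hilbert transform. The function `hilbert_envelope` and the synthetic profile are illustrative.

```python
import numpy as np

def hilbert_envelope(profile):
    """Envelope sqrt(p**2 + H[p]**2) obtained from the FFT-based analytic signal."""
    p = np.asarray(profile, dtype=float)
    n = p.size
    spectrum = np.fft.fft(p)
    # Keep DC (and Nyquist for even n), double positive frequencies, zero the rest.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)   # real part: p, imaginary part: H[p]
    return np.abs(analytic)

# Synthetic profile: Gaussian envelope times a 4 cycles/octave carrier.
x = np.linspace(-4.0, 4.0, 1024, endpoint=False)   # octaves re: center frequency
true_envelope = np.exp(-x ** 2)
profile = true_envelope * np.cos(2.0 * np.pi * 4.0 * x)
envelope = hilbert_envelope(profile)
```

For this narrowband test profile the recovered envelope matches the Gaussian used to construct it, regardless of the carrier phase.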

Although the SRF profile depends strongly on the latency of the STRF, the spectral envelope assumes a nearly invariant structure at all latencies. The envelopes of the SRF profiles (dashed lines in Fig. 1, B and C) are approximately Gaussian functions and can be conveniently defined by their bandwidth and center frequency. The bandwidth of the SRF profile is defined as the width of the envelope at a response level that is 1/e relative to the absolute maximum of the envelope, capturing ~85% of the energy of a Gaussian SRF envelope. The center frequency is defined as the position of the peak of the spectral envelope. As expected for the SRF profiles of Fig. 1, B and C, the measured bandwidths and center frequencies along the excitatory and inhibitory cross-sections are in close agreement: bandwidth = 1.00 and 0.89 octaves, respectively (octave is defined as log2(f/fr), where fr = 500 Hz is a reference frequency); center frequency = 4.37 and 4.42 octaves.
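
For readers more used to frequencies in hertz, the octave scale defined above can be converted with two small helper functions (the names are illustrative):

```python
import math

F_REF_HZ = 500.0   # reference frequency fr given in the text

def hz_to_octaves(f_hz):
    """Frequency in octaves relative to the 500-Hz reference: log2(f / fr)."""
    return math.log2(f_hz / F_REF_HZ)

def octaves_to_hz(x_octaves):
    """Inverse mapping: f = fr * 2**x."""
    return F_REF_HZ * 2.0 ** x_octaves

cf_excitatory_hz = octaves_to_hz(4.37)
cf_inhibitory_hz = octaves_to_hz(4.42)
```

On this scale the two center frequencies quoted above, 4.37 and 4.42 octaves, correspond to roughly 10.3 and 10.7 kHz.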

The spectral receptive field structure was modeled at each time point as the product of a Gaussian envelope and a sinusoidal carrier. Qualitatively, the Gaussian function defines the center and extent over which the neuron integrates spectral information, whereas the sinusoid carrier component is necessary to account for the interleaved patterns of excitation and inhibition. This functional form of the SRF profile, a Gabor function, is a direct extension of the receptive field models used to study spatio-temporal integration in the visual system (Cai et al. 1997; Daugman 1985; DeAngelis et al. 1993a; Jones and Palmer 1987a,b; Marcelja 1980). The Gabor function can capture numerous receptive field aspects and can be used to extract physiologically meaningful parameters directly from the neuron's receptive field.

At each time point, the SRF profile was fitted by a Gabor function taking the general form

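$$ G(x) = K \cdot e^{-\left[ 2 (x - x_0) / BW \right]^2} \cdot \cos\left[ 2 \pi \Omega_0 (x - x_0) + P \right] \tag{11} $$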
where K, x0, BW, Ω0, and P are free parameters. The parameter K models the strength of the spectral response in units of spikes · s⁻¹ · dB⁻¹. x0 is the center frequency or the central position of the SRF envelope in units of octaves; BW is the bandwidth of the SRF, which accounts for the spectral extent of the receptive field; Ω0 is the best ripple density (units of cycles/octave) that models the distance between the excitatory and inhibitory lobes; P is the spectral phase of the SRF profile with respect to the center frequency of the Gaussian envelope. This parameter accounts for the alignment of excitation and inhibition relative to the peak of the SRF envelope. The optimal parameters in Eq. 11 can be obtained by minimizing the mean square error between the Gabor function and the measured SRF profile (Press et al. 1995). Example SRF profiles and their optimal fits are shown in Fig. 1, D and E, at two latencies of the STRF. Fitted profiles (continuous red lines) and the measured SRF profiles (continuous black lines) are in close agreement.
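
To make the role of each parameter concrete, the sketch below evaluates a spectral Gabor function. The exact functional form is an assumption inferred from the description (a Gaussian envelope whose full width at 1/e of the peak equals BW, multiplied by a cosine carrier of ripple density Ω0 and phase P); the function name `gabor_srf` and the parameter values are illustrative.

```python
import math

def gabor_srf(x, K, x0, BW, omega0, P):
    """Gaussian envelope (full width BW at 1/e of the peak) times a cosine carrier."""
    # Assumed envelope: equals 1 at x0 and 1/e at x0 +/- BW/2.
    envelope = math.exp(-((2.0 * (x - x0) / BW) ** 2))
    # Carrier: omega0 in cycles/octave, P is the phase at the envelope peak.
    carrier = math.cos(2.0 * math.pi * omega0 * (x - x0) + P)
    return K * envelope * carrier

# Illustrative parameters: CF 4.37 octaves, BW 1 octave, 0.5 cycles/octave.
peak_excitatory = gabor_srf(4.37, K=1.0, x0=4.37, BW=1.0, omega0=0.5, P=0.0)
peak_inhibitory = gabor_srf(4.37, K=1.0, x0=4.37, BW=1.0, omega0=0.5, P=math.pi)
edge_envelope = gabor_srf(4.87, K=1.0, x0=4.37, BW=1.0, omega0=0.0, P=0.0)
```

With P = 0 the profile is excitatory at the center frequency, with P = π it is inhibitory there, and half a bandwidth away from x0 the envelope has fallen to 1/e of its peak.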

Structure of the temporal receptive field

The structure of the temporal receptive field (TRF) profile was analyzed using a similar functional descriptor as for the SRF profile. The TRF profile obtained by slicing through the STRF at a particular frequency has an alternating arrangement of excitation and inhibition. The TRF profiles of collicular neurons typically have short excitation (or inhibition) followed by long inhibition (or excitation) (e.g., solid line in Fig. 2B), and their envelopes are, therefore, not symmetric about the peak point. For example, the envelope of the TRF profile shown by the dashed line in Fig. 2B is not symmetric about the peak of the temporal envelope (vertical line) because it has a sharp onset and slower off-response. Because of this temporal asymmetry, the TRF profile is not well described by a symmetric Gabor function.



FIG. 2. Asymmetry analysis of the temporal receptive field (TRF) profiles. A: typical STRF showing a short excitatory onset response and a long inhibitory offset response. The TRF profile is obtained by taking a temporal cross-section about the center frequency (x0) (solid line in B) and its envelope is extracted with the Hilbert transform (dashed line in B). The envelope shows a strong asymmetry about its peak point, which is designated by the vertical line. C: the distribution of the asymmetry index (αt) for our sample of neurons is displaced toward positive values (blue histogram). After performing a time-warping transformation, temporal envelopes are nearly symmetric and the asymmetry indices are tightly distributed about 0 (red histogram). D: the TRF profile of A (black line) was fitted with a skewed Gabor function (red line) which takes into account the temporal asymmetry of the TRF profile.

 

The degree of temporal asymmetry was measured for all contralateral responsive neurons in our ICC sample (n = 93 of 99) with an asymmetry index, αt (see METHODS). The TRF profile in Fig. 2B is skewed to the left and therefore has a positive asymmetry index (0.935). Figure 2C (blue histogram) illustrates the distribution of asymmetry indices obtained for the dynamic moving ripple sound. The population distribution shows a bias toward positive values (mean ± SD: 1.93 ± 1.64; observed range: 0.30–9.7; t-test, P < 0.001), indicating that the temporal envelopes and TRF profiles are skewed toward zero delay. Accordingly, the temporal response profiles of most ICC neurons exhibit a short primary response (excitatory or inhibitory) followed by a long secondary response of opposite sign (inhibitory or excitatory, respectively). Such timing differences between the onset and offset of the receptive field are consistent with asymmetric preferences for ramped auditory stimuli observed both physiologically (Lu et al. 2001) and psychoacoustically (Neuhoff 1998; Patterson 1994).

Considering the observed temporal asymmetry, we modified the Gabor model so that it accounts for the observed timing profiles by incorporating a time-warping factor that skews the time axis and allows us to model the TRF with a symmetric Gabor function (DeAngelis et al. 1999). The time-skewing function was defined as

(12)
where β is the skewing factor (observed range: 0.45–0.68), t is the uncompressed time axis, and T is the corrected temporal axis. The TRF profile is then fitted by a Gabor function of the form

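$$\mathrm{TRF}(T) = K \cdot e^{-\left[\frac{2(T - T_0)}{D}\right]^2} \cdot \cos\left[2\pi\,F_{m0}\,(T - T_0) + Q\right] \tag{13}$$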
where K, T0, D, Fm0, and Q are free parameters. K corresponds to the strength of the temporal response; T0 is the peak latency of the TRF profile; D reflects the time-skewed duration of the response; the best temporal modulation frequency is described by Fm0; and Q is the phase of the sinusoidal component about T0. During the fitting procedure, each parameter was adjusted iteratively until the optimal parameters in Eqs. 12 and 13 were found by minimizing the mean square error between the model and the measured TRF profile (Press et al. 1995). An example fitted TRF profile is illustrated in Fig. 2D. The fitted TRF profile (solid red line) captures the structure of the measured TRF profile (solid black line). Further analysis of the entire population confirms the temporal receptive field asymmetry and the appropriateness of the time-skewing parameter. We recomputed the asymmetry index of all neurons using the time-warped TRF profiles (Fig. 2C; red histogram), which resemble symmetric Gaussian functions (not shown). The time-warped asymmetry indices were near zero (time-warped mean ± SE = 0.083 ± 0.014) and were significantly smaller than for the unwarped TRF (time-unwarped, 1.93 ± 0.17; paired t-test, P < 0.001). Thus the time-warping factor accurately accounts for the temporal receptive field asymmetry observed for all ICC neurons.

Gabor-STRF model

The analysis of the TRF and SRF profiles shows that the temporal and spectral receptive field dimensions of auditory neurons can in principle be independently approximated by temporal and spectral Gabor functions. Does this approach generalize for the STRF? Can we model the auditory STRF by a product of Gabor TRF and SRF profiles? If so, what conditions must be satisfied?

In terms of time and frequency response interactions, auditory STRFs can be divided into two fundamental types: separable and inseparable (Adelson and Bergen 1985; DeAngelis et al. 1995; Depireux et al. 2001; Miller et al. 2002; Reid et al. 1991; Sen et al. 2001). Time-frequency separability of the STRF occurs whenever the STRF can be described as the product of an SRF profile and a TRF profile, in which case the SRF and TRF profiles are independent of each other. If a separable STRF is taken into the Fourier domain, the ripple transfer function (RTF) is symmetric about the zero temporal modulation frequency axis (Depireux et al. 2001; Escabí and Schreiner 2002; Miller et al. 2002; Sen et al. 2001). However, inseparable STRFs cannot be broken down into two independent time and frequency functions. The representations of these STRFs in the Fourier domain can therefore show conspicuous asymmetries (Depireux et al. 2001; Escabí and Schreiner 2002; Miller et al. 2002; Sen et al. 2001).

Many auditory STRFs have some inseparable features, including time-frequency oriented subregions or multiple asymmetrically aligned excitatory and inhibitory receptive field components. Such structural features may be necessary to encode specific structural components in natural signals, such as consonant-vowel transitions in speech, and to dynamically track changes in the frequency spectrum of complex signals, such as frequency-modulated sweeps.

In the preceding sections, we showed that it is relatively easy to model auditory receptive fields by independent Gabor profiles (spectral and temporal) if they are time-frequency separable; however, this procedure is not directly applicable to inseparable STRFs. One way to overcome this difficulty is to first decompose an inseparable STRF (Fig. 3A) into several separable STRF components (Fig. 3, B and C). Each of the separable STRF components can then be fitted by a time-frequency separable Gabor function (Fig. 3, D and E). Finally, the fitted STRF is approximated by the sum of the separable fitted STRF components (see METHODS, Eq. 3; Fig. 3). This procedure is realized using a singular value decomposition (SVD) to determine numerically the smallest number of independent time-frequency dimensions of the STRF (Depireux 2001; Press et al. 1995; Theunissen 2000).
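The decomposition step can be sketched as follows. This is a minimal, dependency-free illustration that extracts separable (rank-1) components one at a time by power iteration and deflation; it is not the routine used in the study, which relied on a standard SVD. The function names, the toy STRF, and the starting vector are hypothetical.

```python
import math

def first_singular_triplet(M, iters=200):
    """Leading singular value and vectors of M by power iteration."""
    n_rows, n_cols = len(M), len(M[0])
    # Deterministic, non-uniform start; it must not be orthogonal to the
    # leading right singular vector, which holds for the example below.
    v = [1.0 + 0.1 * j for j in range(n_cols)]
    norm_v = math.sqrt(sum(b * b for b in v))
    v = [b / norm_v for b in v]
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(a * a for a in u))
        if norm_u == 0.0:
            return 0.0, u, v
        u = [a / norm_u for a in u]
        v = [sum(M[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(b * b for b in v))
        v = [b / sigma for b in v]
    return sigma, u, v

def separable_components(strf, n_components):
    """Split an STRF matrix (rows: frequency, columns: time) into rank-1 terms.

    Returns tuples (sigma, spectral profile, temporal profile, energy fraction).
    """
    residual = [row[:] for row in strf]
    total_energy = sum(a * a for row in strf for a in row)
    components = []
    for _ in range(n_components):
        sigma, u, v = first_singular_triplet(residual)
        components.append((sigma, u, v, sigma ** 2 / total_energy))
        # Deflate: subtract the separable term sigma * u * v^T.
        for i in range(len(residual)):
            for j in range(len(residual[0])):
                residual[i][j] -= sigma * u[i] * v[j]
    return components

# Toy inseparable STRF built from two separable terms with orthogonal profiles.
srf1, trf1 = [0.0, 1.0, 2.0, 1.0, 0.0], [0.0, 3.0, -1.0, 0.0]
srf2, trf2 = [1.0, 0.0, 0.0, 0.0, -1.0], [0.5, 0.0, 0.0, 0.5]
strf = [[srf1[i] * trf1[j] + srf2[i] * trf2[j] for j in range(4)]
        for i in range(5)]

comps = separable_components(strf, 2)
energy_fractions = [c[3] for c in comps]  # first term dominates (60/61)
```

Each recovered pair of profiles could then be fitted with a spectral and a temporal Gabor function, and the fitted separable terms summed to approximate the full STRF.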

We determined the number of independent STRF components required for the Gabor STRF model numerically by finding those components that exceed a significance criterion of P < 0.01 (Fig. 4C). Figure 4C describes the relationship between the measured spike rate and the level of the noise for dynamic moving ripples. The level of the noise increases as a function of the spike rate. The magnitudes of the first (red *), second (blue ◇), and third (green ○) STRF singular values are plotted against the noise-threshold level; 100% of the first STRF components exceeded the noise level. By comparison, only 39.7% of the second and 7.5% of the third STRF components exceeded the significance criterion (solid black line in Fig. 4, B and C). The first and second singular value components account for 78.9 ± 15.7 and 6.2 ± 5.0% of the STRF energy, respectively. The third component, however, contributes only 2.3 ± 1.8% of the total STRF energy. Therefore the first and second singular values are typically sufficient for describing the spectro-temporal structure of ICC receptive fields.

Validating the Gabor STRF model

As with any model, its overall utility ultimately depends on its ability to account for observed empirical results. Specifically, we are interested in determining how well the separable Gabor STRF model accounts for the receptive field structure of inferior colliculus neurons. Does the model adequately account for spectral and/or temporal receptive field structures? If so, how well does it account for joint spectro-temporal receptive field characteristics? We devised four metrics to independently quantify the spectral, temporal, and spectro-temporal goodness of fit of the model. Differences in receptive field shape between the model and neural data were quantified individually for the SRF and TRF profiles as well as for the STRF. The spectral similarity index (SIs), temporal similarity index (SIt), and spectro-temporal similarity index (SI) each independently measure how well the model accounts for the structure of the SRF, TRF, and STRF, respectively. Each SI is equivalent to a correlation coefficient between the data and model and therefore assumes numerical values between negative and positive one (DeAngelis et al. 1999; Escabí and Schreiner 2002; Miller et al. 2002). Errors due to energy differences between the model and data were characterized with an energy error metric, which we computed as a normalized mean square error (MSE; see METHODS) from the residual errors (difference between the Gabor STRF model and the original STRF; Fig. 5, third column). This metric assumes values between zero and one, where zero indicates that the model provides a perfect fit and a value of one is indicative of a poor fit.
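The distinction between the shape metrics and the energy metric can be illustrated with a short Python sketch. Two assumptions are made here because the exact definitions are given in METHODS rather than in this section: the similarity index is computed as a normalized inner product (an uncentered correlation coefficient), and the MSE is normalized by the energy of the measured profile. The function names and the toy profile are hypothetical.

```python
import math

def similarity_index(measured, model):
    """Normalized inner product of two flattened profiles (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(measured, model))
    norm = math.sqrt(sum(a * a for a in measured) * sum(b * b for b in model))
    return dot / norm

def normalized_mse(measured, model):
    """Residual energy divided by the energy of the measured profile.

    Assumed normalization; with this form a badly scaled model can exceed 1.
    """
    residual = sum((a - b) ** 2 for a, b in zip(measured, model))
    return residual / sum(a * a for a in measured)

measured = [0.0, 1.0, 3.0, 1.0, -1.0, 0.0]
half_gain = [0.5 * a for a in measured]   # right shape, wrong strength
inverted = [-a for a in measured]         # excitation and inhibition swapped

si_half = similarity_index(measured, half_gain)    # shape is identical
mse_half = normalized_mse(measured, half_gain)     # energy error remains
si_inverted = similarity_index(measured, inverted)
```

A model with the correct shape but half the response strength keeps a similarity index of one while still incurring an energy error, which is why both kinds of metric are reported.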



FIG. 5. Representative fits of the Gabor STRF model for 5 inferior colliculus neurons. Measured STRFs (A–E, left), fitted STRFs (STRFm, middle), and error STRFs (right) are shown. The SRF and TRF profiles are shown on the right and top of measured and fitted STRFs. The measured, fitted, and error STRFs in each row are plotted using identical color scale. A and B: typical inseparable STRFs. C: typical separable STRF. D: typical inhibitory/separable STRF. E: poorly fitted STRF. Action potential traces are shown for reference at the far right.

 

Figure 5 illustrates example fits of the STRF Gabor model for five ICC neurons and the residual errors between the model and data (third column). In most instances, the model accounts for the spectral, temporal, and spectro-temporal receptive field structure exceptionally well. For instance, the measured SI values (spectral SI = 0.992; temporal SI = 0.992; spectro-temporal SI = 0.967) and MSE (0.043) show that a strongly nonseparable STRF (Fig. 5A; separability index = 0.692) can be adequately fit by the model. Not surprisingly, the structure of separable STRFs (Fig. 5C) is easily captured by the model (spectral SI = 0.993; temporal SI = 0.966; spectro-temporal SI = 0.976; MSE = 0.022); moreover, the number of STRF components required to fit a separable STRF is typically lower than for a nonseparable STRF (correlation between number of components and separability index: r = –0.679 ± 0.077, P < 0.001).

The example STRFs of Fig. 5, A–C, were exceptionally clean with little additive noise. Other neurons had higher levels of noise (Fig. 5D), and yet, the model was able to account for their STRF structure (spectral SI = 0.955; temporal SI = 0.975; spectro-temporal SI = 0.941; MSE = 0.079).

Although the model was able to account for the structure of many neurons, it could not fit all receptive field structures. The neuron of Fig. 5E, for example, has multiple excitatory peaks that are displaced along the spectral axis. The measured SI values and MSE (spectral SI = 0.857; temporal SI = 0.970; spectro-temporal SI = 0.762; MSE = 0.434) indicate that the model accounts reasonably well for the temporal RF structure, which has a simple on-off TRF profile; however, the model cannot fully account for the multiple excitatory spectral peaks observed in the original SRF. This happens because the spectral oscillations of the STRF are strictly positive valued, whereas the Gabor model requires oscillatory components with negative and positive values. Accordingly, the model fails to account for the STRF structure because of its inability to model the SRF profile of the neuron.

The distributions of the three similarity indices and the normalized MSE for all neurons are illustrated in Fig. 6. Overall, the Gabor STRF model accounts for much of the spectral, temporal, and spectro-temporal structure of inferior colliculus neurons. The mean spectral and temporal SIs (Fig. 6, A and B) are both close to unity (0.938 ± 0.088 and 0.933 ± 0.075, respectively), suggesting that the shapes of the TRF and SRF profiles are readily accounted for by the Gabor model. Furthermore, the spectral and temporal SIs are not significantly different (paired t-test, P > 0.57), indicating that the Gabor TRF and SRF models are equally well suited for describing the temporal and spectral receptive field profiles. The mean value of the spectro-temporal SI (0.846 ± 0.125; Fig. 6C) is lower than the spectral and temporal SIs (paired t-test; P < 0.001 and P < 0.001, respectively). This reduction in SI is accounted for by the fact that independent multiplicative errors are propagated from the SRF and TRF profiles to the STRF in the model (using the spectral and temporal SI, the expected spectro-temporal SI assuming independent profiles is 0.938 × 0.933 = 0.875). Finally, the residual errors of the model (Fig. 6D) are typically small, as suggested by the MSE energy error metric (mean ± SD = 0.185 ± 0.126), and were typically not significantly different from random noise (χ² test; P < 0.01 for 58 of 93 neurons; critical value = 36.2).



FIG. 6. Gabor STRF error analysis. Distribution of spectral similarity index (SIs; A), temporal similarity index (SIt; B), STRF similarity index (SI; C), and the energy error metric (MSE; D). The spectral and temporal SI quantify shape similarity between the measured and modeled SRF and TRF profiles, respectively. Both means are near unity, suggesting that the Gabor model can adequately account for the shape of the SRF and TRF profiles. The STRF similarity index assumes values that are slightly lower than for the SRF and TRF (C) because shape errors from the Gabor TRF and SRF models are propagated to the Gabor STRF model. The overall goodness of fit was measured with the energy error metric (lower values correspond to better fits), which typically assumed small values (D).

 

Spectral response preferences

Spectral response preferences of auditory neurons are typically determined with isolated pure tones of varying frequency. The SRF is an extension of the methods used to study frequency response preferences using sound stimuli with spectral structure (Kowalski et al. 1996; Schreiner and Calhoun 1994; Versnel and Shamma 1998). This descriptor allows us to study spectral integration properties of single neurons to dynamic broadband sounds with a rich spectral structure. Spectral selectivity is captured by four parameters of the Gabor function SRF (Eq. 11): center frequency (x0), SRF bandwidth (BW), best ripple density (Ω0), and spectral phase (P). The center frequency and bandwidth determine the central location and width of the SRF profile; the best ripple density determines the number of excitatory or inhibitory peaks in the SRF, and the spectral phase determines their alignment relative to the center frequency. Individually, each of these parameters reflects structural properties of the neuronal response area. The center frequency determines the central position of the SRF, whereas the bandwidth determines its spectral extent or selectivity. The ripple density accounts for the interleaving pattern of excitation and inhibition observed in many neurons, whereas the spectral phase determines the exact position of the excitatory and inhibitory SRF subregions.

Due to some frequency bias in the sampling of the ICC, the contralateral receptive fields of the studied neurons covered a range of center frequencies from 1.47 to 5.3 octaves (between 1.393 and 20 kHz), of which 64.5% were located in the range from 4 to 5 octaves (between 8 and 16 kHz; Fig. 7A). While the center frequency of the neuron determines the position along the primary sensory epithelium that preferentially activates the neuron, the spectral bandwidth accounts for the range of frequencies over which the neuron integrates spectral information, including both excitatory and inhibitory features. SRF bandwidths ranged from 0.14 to 4.8 octaves, although most neurons (93%) had bandwidths below ~2.0 octaves. The SRF bandwidth follows a unimodal distribution with mean 0.988 octaves and median 0.654 octaves (Fig. 7C).
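The correspondence between octaves and kilohertz quoted above can be reproduced with a short conversion. The 0.5-kHz reference is an inference from the statement that 4 to 5 octaves corresponds to 8 to 16 kHz; it is not stated explicitly in this section, and the function names are hypothetical.

```python
import math

def octaves_to_khz(x_oct, ref_khz=0.5):
    """Frequency in kHz for a position in octaves above the reference.

    ref_khz = 0.5 is inferred from "4 to 5 octaves (between 8 and 16 kHz)".
    """
    return ref_khz * 2.0 ** x_oct

def khz_to_octaves(f_khz, ref_khz=0.5):
    """Position in octaves above the reference for a frequency in kHz."""
    return math.log2(f_khz / ref_khz)

low_khz = octaves_to_khz(1.47)    # roughly 1.39 kHz
high_oct = khz_to_octaves(20.0)   # roughly 5.3 octaves
```

Under this assumption the lower and upper limits of the sampled range come out close to the quoted 1.393 and 20 kHz, within rounding of the octave values.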



FIG. 7. Distributions of spectral STRF parameters. A: center frequency (x0); B: the best ripple density (Ω0); C: bandwidth of the SRF profile (BW), and D: the spectral phase (P) all assume unimodal distributions.

 

Auditory neurons can also respond selectively to oscillatory patterns of the stimulus spectrum (Kowalski et al. 1996; Schreiner and Calhoun 1994). Such selectivity arises via alternating excitatory and inhibitory subfields of the SRF profile. These excitatory and inhibitory RF features must overlap on and off features of the stimulus spectrum for the neuron to respond. Therefore such spectral selectivity is reflected in the SRF profile by alternating on and off subfields of the SRF profile, analogous to spatial grating selectivity in the visual system (Cai et al. 1997; DeAngelis et al. 1995, 1999). This form of spectral selectivity is captured by the Gabor model in the best ripple density parameter. The ripple density (units of cycles/octave) represents the number of spectral peaks in the stimulus spectrum existing over an octave range of frequencies. The best ripple density is defined as the number of stimulus spectral peaks that produces a maximal neural response. Alternately, it can also be thought of as the number of interleaved excitatory and inhibitory subunits of the SRF existing over a single octave (Escabí and Schreiner 2002; Klein et al. 2000; Miller et al. 2002; Schreiner and Calhoun 1994). Most neurons in our sample preferred low ripple densities (Fig. 7B; mean = 0.609 cycles/octave; median = 0.406 cycles/octave), indicating that they preferred broad spectral features of the dynamic moving ripple sound. The range of best ripple densities extended from nearly 0 (0.022 cycles/octave) to 2.113 cycles/octave, although all neurons were tested up to 4 cycles/octave.

Finally, the spectral phase of the SRF profile determines the alignment of excitatory and inhibitory features relative to the center frequency of the neuron. Conceptually, a spectral phase shift corresponds to a frequency shift of the actual SRF maximum (not the envelope peak or center frequency). A positive phase value shifts the maximum of the spectral profile to lower frequencies; a negative phase shifts the SRF maximum to higher frequencies. Most of the STRFs (78.5%) have positive spectral phases, indicating that neurons favor lower frequencies than the center frequency (Fig. 7D).

The SRF profile allows us to study its arrangement in terms of spectral excitation and inhibition. The behavior of each neuron can also be interpreted directly in the ripple density or frequency domain (Kowalski et al. 1996; Miller et al. 2002; Schreiner and Calhoun 1994). To do this, the SRF is converted into a spectral modulation transfer function (sMTF). The sMTF measures the neuron's response (spikes · s⁻¹ · dB⁻¹) as a function of the applied ripple density. Using the Gabor model representation of the SRF profile (Eq. 11), the corresponding sMTF is obtained by applying a Fourier transform magnitude (FTM) to the SRF profile

$$\mathrm{sMTF}(\Omega) = A \cdot e^{-\left[\frac{\pi\,BW\,(\Omega - \Omega_0)}{2}\right]^2} \tag{14}$$
where all symbols are defined as in Eq. 11. The parameter A determines the peak magnitude of the sMTF or, equivalently, the gain of the neuron from stimulus to response (units of spikes/s/dB). It is related to the magnitude of the SRF through the relationship: . The sMTF acquires the structure of a Gaussian function with center Ω0 and standard deviation √2/(π·BW). The bandwidth of the sMTF is defined as the width of the sMTF that accounts for 85% of the total energy under the Gaussian curve. This parameter determines the range of spectral oscillations (cycles/octave) in a stimulus that can potentially activate the neuron. According to this criterion, the tail points at the level of 1/e of the Gaussian sMTF peak value delineate the bandwidth of the sMTF. The bandwidth of the sMTF, 4/(π·BW), is therefore inversely proportional to the bandwidth of the SRF profile (BW).
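The bandwidth relation can be checked numerically with a brief Python sketch. The Gaussian form of the sMTF used below is an assumption chosen so that its 1/e points are separated by 4/(π·BW), as stated above; the function names and the example best ripple density of 0.4 cycles/octave are hypothetical, while the bandwidth of 0.654 octaves is the reported median.

```python
import math

def smtf(omega, A, omega0, BW):
    """Assumed Gaussian sMTF whose 1/e points lie at omega0 +/- 2/(pi*BW)."""
    return A * math.exp(-((math.pi * BW * (omega - omega0) / 2.0) ** 2))

def smtf_bandwidth(BW):
    """Full width between the 1/e points, in cycles/octave."""
    return 4.0 / (math.pi * BW)

BW = 0.654                     # median SRF bandwidth (octaves)
bw = smtf_bandwidth(BW)        # about 1.95 cycles/octave
edge = smtf(0.4 + bw / 2.0, 1.0, 0.4, BW)   # response at the upper edge

# A drop to 1/e of the peak corresponds to 20*log10(e) dB, i.e. 8.68 dB.
cutoff_db = 20.0 * math.log10(math.e)

# Fraction of the area under the Gaussian between the 1/e points.
area_fraction = math.erf(1.0)  # about 0.84
```

The last quantity, erf(1), is close to the 85% figure quoted for the bandwidth criterion, and the decibel value matches the 8.68-dB cutoff used in Fig. 8.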

Figure 8, A–C, shows representative sMTFs of three single neurons in the ICC. To facilitate comparisons, each sMTF was normalized so that its total energy is equal to one. Solid lines show the normalized sMTFs from Eq. 14; dashed lines show the normalized sMTFs obtained directly from the measured SRF profiles. The Gabor sMTF model (Eq. 14) accounts for the structure and energy of the actual sMTFs quite well, as shown by the agreement between the solid and dashed lines in Fig. 8.



FIG. 8. Representative spectral modulation transfer functions (sMTF). Solid and dashed lines: the fitted and measured sMTFs, respectively. All sMTFs are normalized for unit energy. A: a typical lowpass sMTF with the best ripple density (Ω0 = 0 cycles/octave) and bandwidth (1.30 cycles/octave at upper 8.68-dB cutoff or 1.14 cycles/octave at upper 6-dB cutoff). B (best ripple density: 1.30 cycles/octave; bandwidth: 2.44 cycles/octave at upper 8.68-dB cutoff; 1.87 cycles/octave at upper 6-dB cutoff) and C (best ripple density: 1.30 cycles/octave; bandwidth: 1.27 cycles/octave at upper 8.68-dB cutoff; 1.07 cycles/octave at upper 6-dB cutoff) show typical sMTFs with bandpass filter characteristics. D: the composite population sMTF for the inferior colliculus (ICC) assumes a lowpass filter characteristic with a best ripple density of zero and bandwidth 0.995 cycles/octave (at upper 8.68-dB cutoff) or 0.662 cycles/octave (at upper 6-dB cutoff). Dotted and dash-dot lines: the upper 6- and 8.68-dB cutoffs, respectively.

 

Neurons were individually classified according to their spectral filtering characteristics. These can, in theory, take the form of lowpass, bandpass, or highpass filtering response patterns. Neurons in our sample only exhibited lowpass (Fig. 8A) and bandpass (Fig. 8, B and C) spectral selectivity. The criterion for classifying each neuron from the sMTF consisted of comparing the sMTF bandwidth of each neuron with its best ripple density. Specifically, we required that the measured best ripple density (Ω0) be greater than half the sMTF bandwidth for bandpass neurons. This requirement guarantees that bandpass neurons have a residual DC level response of less than half the sMTF peak magnitude, whereas lowpass neurons will have a significant DC response with >50% of the peak response magnitude. Figure 8A illustrates this procedure for a typical sMTF with lowpass selectivity (same as Fig. 5A), which shows a nonoscillatory on-spectral response pattern. Its sMTF indicates that the structure of the STRF along the spectral dimension is dominantly excitatory or inhibitory. A neuron with bandpass filter characteristics is illustrated by the example of Fig. 8C (same as Fig. 5B). This neuron has an SRF with strong alternating excitatory and inhibitory subfields. An intermediate scenario occurs for the neuron of Fig. 8B (same as Fig. 2A), which shows a significant DC level response in the sMTF; however, the neuron exhibits weak inhibitory sidebands and, consequently, a best ripple density that is offset from zero. In the STRF domain, this neuron shows a strong pattern of excitation and a significant, but subtle, inhibitory subregion. According to our criterion, we found that 80 of 93 neurons exhibited lowpass response preferences; 83 neurons (13 bandpass and 70 lowpass) had best ripple densities offset from zero (as for Fig. 8B) and 69 had best ripple densities <1 cycle/octave. Thirteen neurons exhibited bandpass selectivity, and no neurons had highpass response preferences.
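The classification rule lends itself to a compact sketch. The rule itself (bandpass when the best ripple density exceeds half the sMTF bandwidth) is taken from the text; the Gaussian expression for the residual DC response assumes a Gaussian sMTF whose bandwidth is measured between its 1/e points, and the function names are hypothetical.

```python
import math

def classify_smtf(best_ripple_density, bandwidth):
    """Return 'bandpass' if the best ripple density exceeds half the bandwidth."""
    if abs(best_ripple_density) > bandwidth / 2.0:
        return "bandpass"
    return "lowpass"

def dc_fraction(best_ripple_density, bandwidth):
    """DC response relative to the peak for an assumed Gaussian sMTF.

    The bandwidth is taken as the full width between the 1/e points.
    """
    half_width = bandwidth / 2.0
    return math.exp(-((best_ripple_density / half_width) ** 2))

# Values (cycles/octave) quoted for the neurons of Fig. 8, A and C.
label_a = classify_smtf(0.0, 1.30)    # best ripple density of zero
label_c = classify_smtf(1.30, 1.27)   # best ripple density well above half-width

# At the classification threshold the Gaussian model leaves 1/e of the peak.
dc_at_threshold = dc_fraction(0.5, 1.0)
```

Under the Gaussian assumption, a neuron sitting exactly at the threshold retains about 37% of its peak response at zero ripple density, which is below the one-half level mentioned in the text.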

Each individual sMTF tells us about the spectral selectivity of an individual neuron but tells us little about the overall spectral filtering capabilities of the inferior colliculus. Therefore, we determined the overall spectral selectivity of the inferior colliculus by computing a population sMTF. The population sMTF of the inferior colliculus (Fig. 8D) was obtained by averaging the amplitude-normalized sMTFs of all single neurons. Using the criterion defined for single-unit sMTFs, we find that the spectral selectivity of the ICC (in the sampled frequency range) is lowpass with a bandwidth of 0.995 cycles/octave (at upper 8.68-dB cutoff; according to the 1/e bandwidth criterion) or 0.662 cycles/octave (at upper 6-dB cutoff) and centered about a best ripple density of zero cycles/octave. Thus the ICC as a whole has a significant preference for broadband stimuli.

Temporal response preferences

Neurons in the ICC show a diverse range of response preferences to temporally modulated stimuli (e.g., Krishna and Semple 2000; Langner and Schreiner 1988; Ramachandran et al. 1999; Rees and Møller 1983). While numerous studies have identified the output-response characteristics of ICC neurons to simple time-varying stimuli, the receptive field structure leading to these response preferences has not previously been studied. Temporal response characteristics of ICC neurons can be interpreted by four parameters of the temporal Gabor model (Eq. 13): the best temporal modulation frequency (Fm0), the peak latency (T0), the response duration (D), and the temporal phase (Q). Together, the peak latency and response duration determine the location and width of the TRF profile, respectively; the best temporal modulation frequency and temporal phase determine the rate and alignment of the temporal oscillation of the TRF profile.

Figure 9 illustrates distributions for these parameters for the contralateral receptive field. The absolute value of the best temporal modulation frequency ranged from 0 to 255.5 Hz and the distribution peaks at 30 Hz (Fig. 9A). Thus although numerous neurons can respond selectively to exceedingly fast temporal modulations of the dynamic moving ripple, most neurons preferred low modulation rates.



FIG. 9. Distributions for temporal STRF parameters. A–D: the best temporal modulation frequency (Fm0), the peak latency (T0), the response duration (D), and the temporal phase (Q), respectively.

 

The peak latency is defined as the time of maximal neural response (excitation or inhibition) following the onset of stimulation, whereas the response duration determines the time period over which the neurons integrate acoustic information. From the distributions in Fig. 9B, the peak latency was usually <20 ms (range: 3.5–27.4 ms; mean: 10.1 ms; median: 8.5 ms) and is consistent with previous observations using pure tone and noise stimuli (Krishna and Semple 2000; Langner and Schreiner 1988). The response durations extended over a broad range (observed range: 1.8–82.6 ms), although most neurons typically had short response durations (mean: 12.1 ms; median: 6.2 ms).

Finally, the temporal phase determines the arrangement of excitation and inhibition of the TRF profile, relative to the peak latency or centroid position—which is determined from the TRF envelope. Positive temporal phases shift the TRF profile to the left of the peak latency; negative values shift the TRF profile to longer latencies. The temporal phase distribution (Fig. 9D) shows that 78.5% of temporal phases are positive, thus indicating that the peaks of the TRF profiles are typically shifted to the left of the peak derived from the temporal envelope. Therefore excitation typically precedes inhibition.

The TRF profile allows us to study the timing of the neural response and the temporal arrangement of excitation and inhibition. The behavior of each neuron can also be interpreted and studied directly in the frequency domain. By converting the TRF profile (measured at the center frequency) into the Fourier domain, we can obtain the temporal modulation transfer function (tMTF) of each neuron. The tMTF characterizes the time-locked response of the neuron as a function of the temporal modulation frequency. Using the Gabor function TRF profile (Eq. 13), the tMTF can be represented by a Gaussian function of the form

$$\mathrm{tMTF}(F_m) = A \cdot e^{-\left[\frac{\pi\,D\,(F_m - F_{m0})}{2}\right]^2} \tag{15}$$
where Fm0 and D are as in Eq. 13 and the tMTF is expressed in units of spikes/s/dB. The parameter A corresponds to response strength. To facilitate comparisons, each tMTF was normalized for unit energy. The criterion for choosing the bandwidth of the tMTF and for classifying neurons according to lowpass and bandpass selectivity follows the same procedure as for the sMTF (see previous section). Thus the duration of the TRF profile (D) is inversely proportional to the bandwidth of the tMTF, 4/(π·D).
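The inverse relation between response duration and tMTF bandwidth can be made concrete with a few lines of Python. This sketch only applies the stated expression 4/(π·D); whether D should be taken on the time-warped or the unwarped axis is not specified in this section, so the numerical value below is indicative only. The function names are hypothetical.

```python
import math

def tmtf_bandwidth_hz(duration_s):
    """tMTF bandwidth (Hz) from the TRF duration D (s), using 4/(pi*D)."""
    return 4.0 / (math.pi * duration_s)

def duration_from_bandwidth_s(bandwidth_hz):
    """Inverse relation: TRF duration (s) implied by a tMTF bandwidth (Hz)."""
    return 4.0 / (math.pi * bandwidth_hz)

# Median response duration reported for the population (6.2 ms).
bw_median = tmtf_bandwidth_hz(0.0062)          # about 205 Hz

# Round trip: converting back recovers the duration.
d_back = duration_from_bandwidth_s(bw_median)
```

Shorter response durations therefore imply proportionally broader temporal modulation tuning, mirroring the spectral case.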

Figure 10 shows three representative inferior colliculus tMTFs. The examples of Fig. 10, A and B, have a significant DC level response and are therefore classified as having lowpass sensitivity to the temporal modulation frequency. While the first neuron has its strongest response at zero frequency, the second neuron has a best temporal modulation frequency of 130.3 Hz. Both neurons responded over a large range of modulation frequencies, as suggested by their response bandwidths. The bandwidths of the tMTF for Fig. 10, A and B, are 350.0 Hz (at upper 8.68-dB cutoff or 324.7 Hz at upper 6-dB cutoff) and 245.4 Hz (at upper 8.68-dB cutoff or 223.8 Hz at upper 6-dB cutoff), respectively.