1Biomedical Engineering Program and 2Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut 06269-2157; and 3W. M. Keck Center for Integrative Neuroscience, University of California, San Francisco, California 94143
Submitted 25 September 2002; accepted in final form 3 March 2003
| ABSTRACT |
|---|
60% of
collicular neurons are well described by a time-frequency separable Gabor STRF
model, whereas the remaining neurons exhibited obliquely oriented or multiple
excitatory/inhibitory subfields that require a nonseparable Gabor fitting
procedure. Parametric analysis reveals distinct spectro-temporal tradeoffs in
receptive field size and modulation filtering resolution. Comparison with
an identical model used to study spatio-temporal integration areas of visual
neurons further shows that auditory and visual STRFs share numerous structural
properties. We then use the Gabor STRF model to compare quantitatively
receptive field properties of contra- and ipsilateral inputs to the ICC. We
show that most interaural STRF parameters are highly correlated bilaterally.
However, the spectral and temporal phases of ipsi- and contralateral STRFs
often differ significantly. This suggests that activity originating from each
ear shares various spectro-temporal response properties such as their temporal
delay, bandwidth, and center frequency but have shifted or interleaved
patterns of excitation and inhibition. These differences in converging
monaural receptive fields expand binaural processing capacity beyond
interaural time and intensity aspects and may enable colliculus neurons to
detect disparities in the spectro-temporal composition of the binaural
input.
| INTRODUCTION |
|---|
Auditory receptive fields are typically derived with isolated pure tones
that are presented at varying frequencies and intensities or by measuring
neural sensitivity to narrowband time-varying stimuli (e.g., Krishna and Semple 2000; Langner and Schreiner 1988; Ramachandran et al. 1999; Rees and Møller 1983).
Recently, the auditory spectro-temporal receptive field (STRF), a linear model
representation of the integration area of a neuron, has expanded these
classical methods. The auditory STRF has the advantage that it simultaneously
describes spectral and temporal stimulus attributes that preferentially
activate a neuron and can be used to identify the spectral arrangement and
temporal dynamics of neural excitation and inhibition of a neuron during
dynamic broadband stimulation (Aertsen et al. 1980; deCharms et al. 1998; Depireux et al. 2001; Escabí and Schreiner 2002; Klein et al. 2000; Miller et al. 2002; Nelken et al. 1997; Sen et al. 2001; Theunissen et al. 2000). In particular, the STRF technique is useful for predicting
neuronal response patterns to complex auditory stimuli, including natural
sounds (Aertsen et al. 1980; Klein et al. 2000; Sen et al. 2001; Theunissen et al. 2000), and
can accurately account for spatial selectivity profiles that contribute to
sound localization (Schnupp et al. 2001).
In the visual system, the direct counterpart of the auditory STRF is the
spatio-temporal receptive field. Here the spectral dimension (which extends
along the primary sensory epithelium receptor surface of the cochlea) is
replaced by spatial dimensions along the retinal sensory epithelium
(Cai et al. 1997; DeAngelis et al. 1995; De Valois and Cottaris 1998; Shamma 2001). Visual
neurophysiologists have used Gabor and Gamma functions as quantitative
descriptors of visual STRFs (Cai et al. 1997; DeAngelis et al. 1993a, 1999; Jones and Palmer 1987a,b).
Advantages of fitting visual STRFs with quantitative functions include
improved estimates of the spatio-temporal structure of visual response areas
and the removal of estimation noise. Furthermore, these model STRFs can be
used to study the arrangements of excitatory and inhibitory neural inputs and
to extract physiologically meaningful parameters from neural data (DeAngelis et al. 1993a, 1999). Although it has been
suggested that auditory and visual STRFs have remarkably similar time-varying
structure (deCharms et al. 1998; Shamma 2001), only a few studies have quantitatively evaluated the
spectro-temporal structure of auditory STRFs (Depireux et al. 2001; Escabí and Schreiner 2002; Miller et al. 2002; Sen et al. 2001). However, these studies did not quantitatively compare the
structure of the auditory STRF directly with that of its visual counterpart.
In this study, we present a time-frequency Gabor STRF model to fit auditory
STRFs in the central nucleus of cat's inferior colliculus (ICC). Spectral and
temporal Gabor functions are used to model spectral receptive field (SRF) and
temporal receptive field (TRF) profiles of ICC neurons, respectively. Each
STRF is then fitted by a weighted sum of products of time-frequency separable
Gabor functions. From the definition of a Gabor function, nine physiologically
meaningful parameters are extracted: the center frequency, the best ripple
density, the best temporal modulation frequency, the peak latency, the
bandwidth of the SRF profile, the response duration, the response strength,
and the spectral and temporal phases. These parameters are used to quantify
spectral, temporal, and time-frequency response characteristics to dynamic
moving ripple stimuli (Escabí and Schreiner 2002; Miller et al. 2002). This Gabor STRF model is a direct extension of receptive
field models used to study the structure of visual receptive fields in the
primary visual cortex (DeAngelis et al. 1993a,b, 1999) and provides a basis for
comparing the structure of auditory and visual STRFs. In particular, we apply
this methodology to compare STRF properties of contra- and ipsilateral inputs
to ICC neurons. We demonstrate specific aural STRF differences that suggest
binaural filtering mechanisms beyond interaural time and level
sensitivity.
| MATERIALS AND METHODS |
|---|
Physiological recording methods have been presented in detail elsewhere
(Escabí and Schreiner 2002). Briefly, cats (n = 4) were initially anesthetized
with a mixture of ketamine HCl (10 mg/kg) and acepromazine (0.28 mg/kg im). A
surgical state of anesthesia was induced with 30 mg/kg pentobarbital
sodium (Nembutal) and maintained throughout the surgery with supplements via
an intravenous infusion line. Body temperature was measured and maintained at
37.5°C. The overlying cerebrum and part of the bony tentorium were
removed to expose the ICC via a dorsal approach. During the unit recordings,
animals were maintained in an areflexive state via continuous infusion of
ketamine (2–4 mg · kg⁻¹ · h⁻¹) and diazepam (0.4–1 mg · kg⁻¹ · h⁻¹) in
lactated Ringer solution (1–4 mg · kg⁻¹ · h⁻¹).
The infusion rate was adjusted according to physiologic criteria (heart rate,
breathing rate, temperature, and peripheral reflexes). All surgical methods
and experiment procedures follow National Institutes of Health and U.S.
Department of Agriculture guidelines.
Neural data were acquired from n = 99 single units in the ICC with
parylene-coated tungsten microelectrodes (Microprobe, Potomac, MD; 1–3 MΩ at 1 kHz) that were advanced into the central nucleus with a
hydraulic microdrive (David Kopf Instruments, Tujunga, CA). Action potential
traces were recorded onto a digital audio tape (Cygnus Technologies CDAT16;
Delaware Water Gap, PA) at a sampling rate of 24.0 kHz (41.7-µs resolution)
and spike sorted off-line with a Bayesian spike sorting algorithm
(Lewicki 1994).
Acoustic stimuli
Dynamic moving ripple (DMR) stimuli (Escabí and Schreiner 2002) were presented with the animal in a sound-shielded chamber
(IAC, Bronx, NY) with stimuli delivered via a closed, binaural speaker system
(electrostatic diaphragms from Stax). The DMR sound is
specifically designed to dynamically activate the primary sensory epithelium
and to probe the physiologically relevant range of spectral and temporal
stimulus modulations of neurons in an unbiased fashion. Sounds were presented
binaurally with an independent sound sequence to each ear, from which
independent contra- and ipsilateral STRFs were computed via spike-triggered
averaging (Escabí and Schreiner 2002).
In three experiments, the DMR stimulus was presented for a period of
10–20 min (Escabí and Schreiner 2002). In one experiment, a two-repeat 4-min sequence
of the DMR (8 min total) was presented. In all experiments, stimuli covered
the same range of spectral and temporal parameters and were presented at
30–70 dB above the neuron's response threshold.
Gabor STRF model
STRFs were decomposed into a superposition of time-frequency separable
functions from which we could model and fit each component by a
spectro-temporal Gabor function (product of Gaussian and cosine;
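Fig. 3). Measured STRFs were
first decomposed using a singular value decomposition (SVD)
(Depireux et al. 2001; Press et al. 1995; Theunissen et al. 2000) into a
sum of separable STRF components ($\mathrm{STRF}_i$)
$$\mathrm{STRF}(t, x) = U \Sigma V^{*} = \sum_i \mathrm{STRF}_i(t, x) \qquad (1)$$
where U and V are unitary matrices; $\Sigma$ is a diagonal matrix containing the singular values, $\sigma_i$, in descending rank order according to energy;
and * denotes the Hermitian transpose. Each STRF component,
$\mathrm{STRF}_i$, is obtained by the vector product
$$\mathrm{STRF}_i(t, x) = \sigma_i \, u_i \, v_i^{*} \qquad (2)$$
where $\sigma_i$ is the ith singular value of
STRF(t, x) and determines the energy of the ith STRF
component. $u_i$ and $v_i$
are the ith unitary orthogonal vectors of U and V,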
respectively. Conceptually, these correspond to the spectral and temporal
receptive field profiles of each component STRF (e.g., shown on the
top and right of Fig. 3, B and C). The dominant spectral and temporal
receptive field profiles, $u_1$ and $v_1$,
account for approximately 80% of the total STRF energy, and we therefore use these to
quantify spectral and temporal response characteristics throughout.
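The decomposition of Eqs. 1 and 2 can be illustrated with a short numerical sketch. This is not the analysis code used in the study; it assumes NumPy, an STRF stored as a frequency-by-time matrix, and energy measured as the squared singular value. The function name `separable_components` and the toy two-component STRF are hypothetical.

```python
import numpy as np

def separable_components(strf):
    """Split an STRF matrix (frequency x time, an assumed layout) into
    rank-1 separable components via the SVD."""
    # Columns of u: spectral profiles; rows of vh: temporal profiles.
    u, s, vh = np.linalg.svd(strf, full_matrices=False)
    components = [s[i] * np.outer(u[:, i], vh[i, :]) for i in range(len(s))]
    # Assumption: component energy is proportional to the squared singular value.
    energy_fraction = s ** 2 / np.sum(s ** 2)
    return components, s, energy_fraction

# Toy inseparable STRF (hypothetical): the sum of two separable terms.
x = np.linspace(-1.0, 1.0, 32)     # octaves relative to center frequency
t = np.linspace(0.0, 0.05, 40)     # seconds
srf_a = np.exp(-(x / 0.4) ** 2)
trf_a = np.exp(-((t - 0.010) / 0.004) ** 2)
srf_b = np.exp(-((x - 0.3) / 0.4) ** 2)
trf_b = -np.exp(-((t - 0.020) / 0.008) ** 2)
strf = np.outer(srf_a, trf_a) + 0.5 * np.outer(srf_b, trf_b)

components, singular_values, energy_fraction = separable_components(strf)
```

Summing all entries of `components` recovers `strf`, and for this toy example only the first two singular values are non-negligible, mirroring the typical N = 1 or 2 reported here.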
According to the SVD procedure, every STRFi component
is time-frequency separable (although the entire STRF may be nonseparable).
Therefore each component can be modeled by the product of a spectral and a
temporal waveform, which we approximate by a Gabor function. Thus the fitted
STRF model is expressed as a weighted sum of a finite set of N
statistically significant separable Gabor components (typically, N =
1 or 2)
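$$\mathrm{STRF}_m(t, x) = \sum_{i=1}^{N} w_i \, G_{s,i}(x) \, G_{t,i}(t) \qquad (3)$$
where $G_{s,i}(x)$ and $G_{t,i}(t)$ are the spectral and temporal Gabor functions fitted to the ith separable component and $w_i$ is its weight.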
Level of noise
Auditory STRFs are estimated from real neural data by a spike-triggered
average method (Escabí and Schreiner 2002) that is inherently noisy. Measurement noise
corresponds to random deviations from the expected STRF that would result from
an infinite amount of averaging. These variations result from unexpected
variations in the neural response and from finite data averaging due to the
finite experiment recording periods (Klein et al. 2000; Theunissen et al. 2000). Therefore to minimize the effects
of noise, it is necessary to consider only those independent time-frequency
components of the Gabor STRF model that significantly contribute to the STRF's
energy and structure.
To determine the maximum number of independent dimensions of the STRF that
contribute to its structure (N in Eq. 3), it is essential to
quantify the STRF noise level. Singular values that exceed the measured noise
level typically contribute significantly to the neural response and should
therefore be incorporated into the Gabor STRF model; alternately, singular
values that fall below the noise level contribute largely to the noise and can
therefore be ignored. A significant noise level (P < 0.01) was
determined empirically via a bootstrap STRF re-estimation procedure for a
random Poisson firing neuron of identical spike rate as the neuron under
investigation. Twenty-five randomly constructed STRFs,
$\mathrm{STRF}_r$ (e.g., Fig. 4A), were simulated by correlating a random Poisson spike
train of matched firing rate with the dynamic moving ripple noise stimulus.
The first singular value ($\sigma_{r1}$) of each
random STRF, $\mathrm{STRF}_r$, was obtained directly by performing an
SVD. For each of the 25 trials (shown by vertical red circles in
Fig. 4B), the measured
level of noise was randomly distributed. Therefore the desired threshold noise
level for a specific spike rate (solid line in Fig. 4B) was
determined as the sum of the mean of $\sigma_{r1}$
and 2.57 times its SD (P < 0.01). The mean ± SD of
$\sigma_{r1}$ were calculated from the 25 simulated
samples by a bootstrap resampling technique
(Efron and Tibshirani 1993).
All first-order STRFs considered here were above the estimated noise
level.
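The thresholding step can be sketched as follows. The exact bootstrap scheme is not specified above, so this sketch assumes one plausible reading: resample the 25 random-STRF first singular values with replacement, average the resampled means and SDs, and set the threshold to mean + 2.57 SD. The function names and the number of bootstrap draws are hypothetical.

```python
import random
import statistics

def noise_threshold(random_first_sv, n_boot=1000, z=2.57, seed=0):
    """Threshold = bootstrap mean + z * bootstrap SD of the first singular
    values obtained from random (Poisson) STRFs. Assumed resampling scheme."""
    rng = random.Random(seed)
    n = len(random_first_sv)
    boot_means, boot_sds = [], []
    for _ in range(n_boot):
        # Resample with replacement from the simulated singular values.
        sample = [rng.choice(random_first_sv) for _ in range(n)]
        boot_means.append(statistics.mean(sample))
        boot_sds.append(statistics.stdev(sample))
    return statistics.mean(boot_means) + z * statistics.mean(boot_sds)

def count_significant(singular_values, threshold):
    """Number of STRF singular values exceeding the noise threshold (N in Eq. 3)."""
    return sum(1 for sv in singular_values if sv > threshold)
```

Under this scheme, `count_significant` applied to the singular values of a measured STRF gives the number of components retained in the Gabor STRF model.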
Similarity index
The Gabor STRF model can potentially account for much of the structure of collicular receptive fields; however, the utility of the model needs to be quantitatively evaluated. We devised three metrics to validate the goodness of fit of the model. We evaluated the goodness of fit of SRF and TRF profiles independently and for the entire STRF.
To compare the receptive field structure of the model and data, we devised
the spectral similarity index (SIs), temporal similarity
index (SIt) and spectro-temporal similarity index (SI).
The spectral SI, SIs, accounts for differences in shape
between original and model SRF profiles; SIt is used to
compare the original and model TRF profiles; the spectro-temporal SI, SI,
measures shape differences between original and model STRFs. Individually
these metrics correspond to a correlation analysis performed between the model
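and original data (DeAngelis et al. 1999; Escabí and Schreiner 2002; Miller et al. 2002) and can be expressed as
$$\mathrm{SI}_s = \frac{\langle \mathrm{SRF}, \mathrm{SRF}_m \rangle}{\| \mathrm{SRF} \| \cdot \| \mathrm{SRF}_m \|} \qquad (4)$$
$$\mathrm{SI}_t = \frac{\langle \mathrm{TRF}, \mathrm{TRF}_m \rangle}{\| \mathrm{TRF} \| \cdot \| \mathrm{TRF}_m \|} \qquad (5)$$
$$\mathrm{SI} = \frac{\langle \mathrm{STRF}, \mathrm{STRF}_m \rangle}{\| \mathrm{STRF} \| \cdot \| \mathrm{STRF}_m \|} \qquad (6)$$
where the subscript m denotes the model, $\langle \cdot , \cdot \rangle$
corresponds to the vector correlation, and $\| \cdot \|$ designates the vector norm operator. Because the STRF is formally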
defined by a two-dimensional matrix of spectral and temporal samples, Eq.
6 could not be evaluated directly since it requires vector inputs.
Therefore the statistically significant samples of the STRF that exceeded a
significance criterion of P < 0.002, were converted into a
unidimensional vector, from which the SI was determined using Eq. 6
(Escabí and Schreiner 2002).
Because all three similarity indices are effectively correlation coefficients between the real data and model waveforms, they assume a value of one whenever the waveforms inside their arguments are identical in shape, zero if the waveforms have nothing in common, and negative one if the waveforms have identical shapes but differ by a negative sign.
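As a minimal sketch of these indices, assuming each waveform is supplied as a plain sequence of samples (for the STRF, the significant samples flattened into one vector), the similarity index reduces to a normalized inner product. The function name is hypothetical.

```python
import math

def similarity_index(data, model):
    """Normalized inner product between a measured waveform and its model fit."""
    dot = sum(d * m for d, m in zip(data, model))
    norm_data = math.sqrt(sum(d * d for d in data))
    norm_model = math.sqrt(sum(m * m for m in model))
    return dot / (norm_data * norm_model)
```

Because both norms appear in the denominator, the index is insensitive to overall scaling and reflects only differences in shape.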
Normalized mean square error
A fourth metric was defined that quantifies the relative difference in
energy between the fitted (STRFm) and the measured STRF
(STRF). The normalized mean square error (MSE) is defined as the energy of the
difference STRF normalized by the energy of a measured STRF
(DeAngelis et al. 1999)
$$\mathrm{MSE} = \frac{\| \mathrm{STRF} - \mathrm{STRF}_m \|^2}{\| \mathrm{STRF} \|^2} \qquad (7)$$
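A short sketch of this error metric, assuming the measured and fitted STRFs are given as equally sized nested lists (rows of samples); the function name is hypothetical.

```python
def normalized_mse(strf, model):
    """Energy of the difference STRF divided by the energy of the measured STRF."""
    diff_energy = sum(
        (s - m) ** 2
        for row_s, row_m in zip(strf, model)
        for s, m in zip(row_s, row_m)
    )
    data_energy = sum(s ** 2 for row in strf for s in row)
    return diff_energy / data_energy
```

A perfect fit gives 0, and a model that is zero everywhere gives 1, which matches the interpretation of the metric given in RESULTS.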
Temporal asymmetry index
Initial evaluation of the temporal receptive field envelope revealed that
timing profiles of ICC neurons are characterized by a sharp transient onset. We
therefore quantitatively evaluated the structure of the temporal response
envelope. To evaluate the degree of temporal asymmetry in the TRF profile, we
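define an asymmetry index ($\alpha_t$) as the skewness of
the temporal envelope (Bliss 1967)
$$\alpha_t = \frac{m_3}{m_2^{3/2}}, \qquad m_n = \frac{\sum_k (t_k - \mu_t)^n \, E_t(t_k)}{\sum_k E_t(t_k)} \qquad (8)$$
where $E_t(t)$ is the temporal envelope and $\mu_t$ is its mean along the time axis. An $\alpha_t$ significantly less than 0 indicates that the
TRF profile is skewed to the right; and an $\alpha_t$
significantly greater than 0 indicates the TRF profile is skewed to the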
left.
Separability index
An inherent aspect of the Gabor model is that it is composed of multiple
receptive field components, each of which is a time-frequency separable
function. If the receptive field contains only one singular value, the
receptive field is time-frequency separable; that is, it can be described by a
multiplicative product of a temporal and spectral receptive field profile as
in Eq. 2. Hypothetically, such a neuron would encode spectral and
temporal information independently. If, alternately, the receptive field has
multiple significant singular values, the receptive field will exhibit
time-frequency inseparable structure. This can manifest as obliquely oriented
STRF features or multiple asymmetrically aligned excitatory and inhibitory
receptive field subregions. Neurons with such receptive field arrangements
most likely prefer sound stimuli with dynamically changing frequency
components, and, consequently, the spectral and temporal dimensions for such
neurons cannot be treated independently of each other. This effect becomes
more pronounced if the higher-order singular values account for a large
proportion of the receptive field energy. Thus we can define a separability
index by considering the proportion of energy provided by the first singular value
in relationship to the cumulative energy of the higher-order singular values.
We define the separability index ($\alpha_d$) as
$$\alpha_d = \frac{\sigma_1^2 - \sum_{i=2}^{N} \sigma_i^2}{\sum_{i=1}^{N} \sigma_i^2} \qquad (9)$$
where $\sigma_1$ and $\sigma_i$ are the first- and
higher-order singular values of the STRF (Eq. 1), and N is
the number of statistically significant singular values used in the Gabor STRF
model. Conceptually, $\alpha_d$ is defined as the normalized
energy of the first singular value (relative to the total energy of the model
STRF) minus the normalized energy of the higher-order singular values.
Separability index values range from 0 to 1; where 1 corresponds to a
perfectly separable STRF and values close to zero designate a highly
inseparable receptive field arrangement.
| RESULTS |
|---|
Structure of the spectral receptive field
The spectral receptive field (SRF) profile is a model representation of the
frequency integration area of auditory neurons
(Calhoun and Schreiner 1998; Kowalski et al. 1996; Miller et al. 2002; Schreiner and Calhoun 1994; Versnel and Shamma 1998). This descriptor can be used to quantify neuronal
responses to sounds with complex spectra (such as for formant transitions in
speech and spectral resonances in animal vocalizations) and to study the
receptive field arrangement of excitation and inhibition along the
cochleotopic dimension of the stimulus. Most studies using this descriptor
largely focused on qualitatively identifying general integration properties
(such as the arrangement of spectral excitation and inhibition) and only for
stimuli with static temporal characteristics. By slicing the STRF at a fixed
latency (solid lines in Fig. 1, B and C), we can study the dynamic behavior of the SRF
profile for complex stimuli with time-varying structure. Specifically, we
would like to identify a model representation of the STRF that quantitatively
captures the general characteristics of the SRF profile and its associated
dynamics. When the latency is >40 ms, there is no discernible SRF structure
for the STRF shown in Fig. 1A. At shorter latencies, however, SRF profiles can
exhibit pure excitation, inhibition, or an alternating arrangement of
excitation and inhibition. The phase of SRF profiles changes continuously so
that the excitatory bandwidths and center frequencies change with increasing
latency. Consequently, there is no direct analytic equation to model the SRF
profile at all latencies.
One step toward solving this problem is to break up the SRF profile into an
envelope and a carrier component via the Hilbert transform
(Cai et al. 1997; Daugman 1985; DeAngelis et al. 1993a, 1999; Jones and Palmer 1987a,b; Marcelja 1980). The envelope,
$E_s(x)$, is computed by the vector sum of
the SRF profile, SRF(x), and its Hilbert transform,
H[SRF(x)]
$$E_s(x) = \sqrt{\mathrm{SRF}(x)^2 + H[\mathrm{SRF}(x)]^2} \qquad (10)$$
Although the SRF profile depends strongly on the latency of the STRF, the
spectral envelope assumes a nearly invariant structure at all latencies. The
envelopes of the SRF profiles (dashed lines in Fig. 1, B and C) are approximately Gaussian functions and can be
conveniently defined by their bandwidth and center frequency. The bandwidth of
the SRF profile is defined as the width of the envelope at a response level
that is 1/e relative to the absolute maximum of the envelope,
capturing approximately 85% of the energy of a Gaussian SRF envelope. The center
frequency is defined as the frequency at which the spectral envelope peaks. As expected
for the SRF profiles of Fig. 1, B and C, the measured bandwidths and center frequencies
along the excitatory and inhibitory cross-sections are in close agreement:
bandwidth = 1.00 and 0.89 octaves, respectively (octaves are defined as log2(f/fr), where fr
= 500 Hz is a reference frequency); center frequency = 4.37 and
4.42 octaves.
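The envelope extraction and the bandwidth and center frequency definitions above can be sketched numerically. This sketch assumes NumPy and computes the Hilbert envelope as the magnitude of the FFT-based analytic signal; the function names and the synthetic test profile (center 4.4 octaves, bandwidth 1.0 octave) are hypothetical, not data from this study.

```python
import numpy as np

def envelope(profile):
    """Magnitude of the analytic signal, i.e. sqrt(SRF^2 + H[SRF]^2)."""
    n = len(profile)
    spectrum = np.fft.fft(profile)
    # Keep DC (and Nyquist), double positive frequencies, zero negative ones.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

def center_and_bandwidth(x, env):
    """Center = position of the envelope peak; bandwidth = width at 1/e of the peak."""
    peak = np.argmax(env)
    above = np.where(env >= env[peak] / np.e)[0]
    return x[peak], x[above[-1]] - x[above[0]]

# Synthetic SRF profile: Gaussian envelope times a sinusoidal carrier.
x = np.linspace(0.0, 8.0, 801)                    # octaves re. 500 Hz
srf = np.exp(-(2 * (x - 4.4) / 1.0) ** 2) * np.cos(2 * np.pi * 1.5 * (x - 4.4) + 0.7)
env = envelope(srf)
cf, bw = center_and_bandwidth(x, env)
```

For this synthetic profile the recovered center and bandwidth lie close to the values used to build it, regardless of the carrier phase, which illustrates why the envelope is nearly invariant across latencies.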
The spectral receptive field structure was modeled at each time point as
the product of a Gaussian envelope and a sinusoidal carrier. Qualitatively,
the Gaussian function defines the center and extent over which the neuron
integrates spectral information, whereas the sinusoid carrier component is
necessary to account for the interleaved patterns of excitation and
inhibition. This functional form of the SRF profile, a Gabor function, is a
direct extension of the receptive field models used to study spatio-temporal
integration in the visual system (Cai et al. 1997; Daugman 1985; DeAngelis et al. 1993a; Jones and Palmer 1987a,b; Marcelja 1980). The Gabor
function can capture numerous receptive field aspects and can be used to
extract physiologically meaningful parameters directly from the neuron's
receptive field.
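The product of a Gaussian envelope and a sinusoidal carrier described above can be written as a small function. The parameterization here is an assumption chosen to agree with the bandwidth definition given earlier (full width of the envelope at 1/e of its peak); the function and argument names are hypothetical.

```python
import math

def gabor_srf(x, k, x0, bw, ripple_density, phase):
    """Gabor profile: Gaussian envelope (1/e full width = bw, centered at x0)
    multiplied by a cosine carrier (cycles/octave = ripple_density)."""
    gaussian = math.exp(-((2.0 * (x - x0) / bw) ** 2))
    carrier = math.cos(2.0 * math.pi * ripple_density * (x - x0) + phase)
    return k * gaussian * carrier
```

With zero ripple density and zero phase the function reduces to the envelope alone, so its value at `x0 + bw / 2` equals `k / e`, consistent with the 1/e bandwidth criterion.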
At each time point, the SRF profile was fitted by a Gabor function taking
the general form
$$\mathrm{SRF}_m(x) = K \cdot e^{-\left[ 2 (x - x_0) / \mathrm{BW} \right]^2} \cdot \cos\left[ 2 \pi \Omega_0 (x - x_0) + P \right] \qquad (11)$$
where K, $x_0$, BW, $\Omega_0$, and P are
free parameters. The parameter K models the strength of the spectral
response in units of spikes · s⁻¹ · dB⁻¹. $x_0$ is the center
frequency or the central position of the SRF envelope in units of octaves; BW
is the bandwidth of the SRF, which accounts for the spectral extent of the
receptive field; $\Omega_0$ is the best ripple density (units of
cycles/octave) that models the distance between the excitatory and inhibitory
lobes; P is the spectral phase of the SRF profile with respect to the
center frequency of the Gaussian envelope. This parameter accounts for the
alignment of excitation and inhibition relative to the peak of the SRF
envelope. The optimal parameters in Eq. 11 can be obtained by
minimizing the mean square error between the Gabor function and the measured
SRF profile (Press et al. 1995).
Structure of the temporal receptive field
The structure of the temporal receptive field (TRF) profile was analyzed using a similar functional descriptor as for the SRF profile. The TRF profile obtained by slicing through the STRF at a particular frequency has an alternating arrangement of excitation and inhibition. The TRF profiles of collicular neurons typically have short excitation (or inhibition) followed by long inhibition (or excitation) (e.g., solid line in Fig. 2B), and their envelopes are, therefore, not symmetric about the peak point. For example, the envelope of the TRF profile shown by the dashed line in Fig. 2B is not symmetric about the peak of the temporal envelope (vertical line) because it has a sharp onset and slower off-response. Because of this temporal asymmetry, the TRF profile is not well described by a symmetric Gabor function.
The degree of temporal asymmetry was measured for all contralateral
responsive neurons in our ICC sample (n = 93 of 99) with the temporal asymmetry
index (see METHODS). The TRF profile
in Fig. 2B is skewed
to the left and it therefore has a positive asymmetry index (0.935).
Figure 2C (blue
histogram) illustrates the distribution of asymmetry indices obtained for the
dynamic moving ripple sound. The population distribution shows a bias toward
positive values (mean ± SD: 1.93 ± 1.64; observed range:
0.30–9.7; t-test, P < 0.001), indicating that the temporal
envelopes and TRF profiles are skewed toward zero delay. Accordingly, the
temporal response profiles of most ICC neurons exhibit a short primary
response (excitatory or inhibitory) followed by a long secondary response of
opposite sign (inhibitory or excitatory, respectively). Such timing
differences between the onset and offset of the receptive field are consistent
with asymmetric preferences to ramped auditory stimuli observed both
physiologically (Lu et al. 2001) and psychoacoustically
(Neuhoff 1998; Patterson 1994).
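The asymmetry index can be sketched as the skewness of the temporal envelope, treating the envelope as a weighting function over time. This moment-based form is an assumption about how the skewness was computed, and the function name and example envelope are hypothetical.

```python
import math

def asymmetry_index(times, env):
    """Skewness of a temporal envelope, using the envelope samples as weights."""
    total = sum(env)
    mean = sum(t * e for t, e in zip(times, env)) / total
    m2 = sum((t - mean) ** 2 * e for t, e in zip(times, env)) / total
    m3 = sum((t - mean) ** 3 * e for t, e in zip(times, env)) / total
    return m3 / m2 ** 1.5

# Hypothetical envelope with a sharp onset and a slower decay.
times = [0.001 * k for k in range(100)]             # seconds
sharp_onset = [t * math.exp(-t / 0.005) for t in times]
index = asymmetry_index(times, sharp_onset)
```

An envelope with a sharp onset and slow decay yields a positive index, whereas a symmetric envelope yields zero, in line with the sign convention used for Fig. 2.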
Considering the observed temporal asymmetry, we modified the Gabor model so
that it accounts for the observed timing profiles by incorporating a
time-warping factor that skews the time axis and allows us to model the TRF
with a symmetric Gabor function (DeAngelis et al. 1999). The time-skewing function was defined as
![]() | (12) |
where the skewing factor took values of 0.45–0.68 (observed range),
t is the uncompressed time axis, and T is the corrected
temporal axis. The TRF profile is then fitted by a Gabor function of the form
![]() | (13) |
Gabor-STRF model
The analysis of the TRF and SRF profiles shows that the temporal and spectral receptive field dimensions of auditory neurons can in principle be independently approximated by temporal and spectral Gabor functions. Does this approach generalize for the STRF? Can we model the auditory STRF by a product of Gabor TRF and SRF profiles? If so, what conditions must be satisfied?
In terms of time and frequency response interactions, auditory STRFs can be
divided into two fundamental types: separable and inseparable
(Adelson and Bergen 1985; DeAngelis et al. 1995; Depireux et al. 2001; Miller et al. 2002; Reid et al. 1991; Sen et al. 2001).
Time-frequency separability of the STRF occurs whenever the STRF can be
described as the product of a SRF profile and a TRF profile, in which case the
SRF and TRF profiles are independent of each other. If a separable STRF is
taken into the Fourier domain, the ripple transfer function (RTF) is symmetric
about the zero temporal modulation frequency axis
(Depireux et al. 2001; Escabí and Schreiner 2002; Miller et al. 2002; Sen et al. 2001). However, inseparable STRFs cannot be broken down into two
independent time and frequency functions. The representations of these STRFs
in the Fourier domain can therefore show conspicuous asymmetries
(Depireux et al. 2001; Escabí and Schreiner 2002; Miller et al. 2002; Sen et al. 2001).
Many auditory STRFs have some inseparable features, including time-frequency oriented subregions or multiple asymmetrically aligned excitatory and inhibitory receptive field components. Such structural features may be necessary to encode specific structural components in natural signals, such as consonant-vowel transitions in speech, and to dynamically track changes in the frequency spectrum of complex signals, such as frequency-modulated sweeps.
In the previous discussions, we showed that it is relatively easy to model
auditory receptive fields by independent Gabor profiles (spectral and
temporal) if they are time-frequency separable; however, this procedure is not
directly applicable for inseparable STRFs. One way to overcome this difficulty
is to first decompose an inseparable STRF
(Fig. 3A) into several
separable STRF components (Fig. 3, B and C). Each of the separable STRF components
can then be fitted by a time-frequency separable Gabor
(Fig. 3, D and E). Finally, the fitted resultant STRF is approximated by
the sum of each separable fitted STRF component (see METHODS,
Eq. 3; Fig. 3). This
procedure is realized using a singular value decomposition (SVD) to determine
numerically the smallest number of independent time-frequency dimensions of
the STRF (Depireux et al. 2001; Press et al. 1995; Theunissen et al. 2000).
We determined the number of independent STRF components required for the
Gabor STRF model numerically by finding those components that exceed a
significance criterion of P < 0.01
(Fig. 4C).
Figure 4C describes
the relationship between the measured spike rate and the level of the noise
for dynamic moving ripples. The level of the noise increases as a function of
the spike rate. The magnitudes of the first (red), second (blue), and
third (green) STRF singular values are plotted against the
noise-threshold level; 100% of the first STRF components exceeded the
noise level. By comparison, only 39.7% of the second and 7.5% of the third STRF
components exceeded the significance criterion (solid black line in
Fig. 4, B and C). The total energy contributions of the first and second
singular value components account for 78.9 ± 15.7 and 6.2 ±
5.0% of the STRF energy, respectively. The third component, however, only
contributes 2.3 ± 1.8% of the total STRF energy. Therefore the first
and second singular values are typically sufficient for describing the
spectro-temporal structure of ICC receptive fields.
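The energy bookkeeping above, together with the separability index defined in METHODS (normalized energy of the first singular value minus that of the higher-order ones), can be sketched as follows. The sketch assumes that the energy of a component is its squared singular value and that only the statistically significant singular values are passed in; the function names are hypothetical.

```python
def energy_fractions(singular_values):
    """Fraction of total energy carried by each singular value (energy = value squared)."""
    energies = [s ** 2 for s in singular_values]
    total = sum(energies)
    return [e / total for e in energies]

def separability_index(significant_singular_values):
    """Normalized energy of the first component minus that of the higher-order ones."""
    fractions = energy_fractions(significant_singular_values)
    return fractions[0] - sum(fractions[1:])
```

A single significant singular value gives an index of 1 (perfectly separable), while two components of equal energy give 0.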
Validating the Gabor STRF model
As with any model, its overall utility ultimately depends on its ability to
account for observed empirical results. Specifically, we are interested in
determining how well the separable Gabor STRF model accounts for receptive
field structure of inferior colliculus neurons. Does the model adequately
account for spectral and/or temporal receptive field structures? If so, how
well does it account for joint spectro-temporal receptive field
characteristics? We devised four metrics to independently quantify the
spectral, temporal, and spectro-temporal goodness of fit of the model.
Differences in receptive field shape between the model and neural data were
quantified individually for the SRF and TRF profiles as well as for the STRF.
The spectral similarity index (SIs), temporal similarity
index (SIt), and spectro-temporal similarity index (SI)
each independently measure how well the model accounts for the structure of
the SRF, TRF, and STRF, respectively. Each SI is equivalent to a correlation
coefficient between the data and model, and, therefore, they assume numerical
values between negative and positive one
(DeAngelis et al. 1999; Escabí and Schreiner 2002; Miller et al. 2002). Errors due to energy differences between the model and data
were characterized with an energy error metric, which we computed as a
normalized mean square error (MSE; see METHODS) from the residual
errors (difference between Gabor STRF model and the original STRF;
Fig. 5, third column). This
metric assumes values between zero and one, where zero indicates that the
model provides a perfect fit and a value of one is indicative of a poor
fit.
Figure 5 illustrates example fits of the STRF Gabor model of five ICC neurons and the residual errors between the model and data (third column). In most instances, the model accounts for the spectral, temporal, and spectro-temporal receptive field structure exceptionally well. For instance, the measured SI values (spectral SI = 0.992; temporal SI = 0.992; spectro-temporal SI = 0.967) and MSE (0.043) show that a strongly nonseparable STRF (Fig. 5A; separability index = 0.692) can be adequately fit by the model. Not surprisingly, the structure of separable STRFs (Fig. 5C) is easily captured by the model (spectral SI = 0.993; temporal SI = 0.966; spectro-temporal SI = 0.976; MSE = 0.022); however, the number of STRF components required to fit a separable STRF is typically lower than for a nonseparable STRF (correlation between number of components and separability index: r = 0.679 ± 0.077, P < 0.001).
The example STRFs of Fig. 5, A–C, were exceptionally clean with little additive noise. Other neurons had higher levels of noise (Fig. 5D), and yet, the model was able to account for their STRF structure (spectral SI = 0.955; temporal SI = 0.975; spectro-temporal SI = 0.941; MSE = 0.079).
Although the model was able to account for the structure of many neurons, it could not fit all receptive field structures. The neuron of Fig. 5E, for example, has multiple excitatory peaks that are displaced along the spectral axis. The measured SI values and MSE (spectral SI = 0.857; temporal SI = 0.970; spectro-temporal SI = 0.762; MSE = 0.434) indicate that the model accounts reasonably well for the temporal RF structure, which has a simple on-off TRF profile; however, the model cannot fully account for the multiple excitatory spectral peaks observed in the original SRF. This happens because the spectral oscillations of the STRF are strictly positive valued, whereas the Gabor model requires oscillatory components with negative and positive values. Accordingly, the model fails to account for the STRF structure because of its inability to model the SRF profile of the neuron.
The distributions of the three similarity indices and the normalized MSE of
all neurons are illustrated in Fig.
6. Overall, the Gabor STRF model accounts for much of the
spectral, temporal, and spectro-temporal structure of inferior colliculus
neurons. The mean spectral and temporal SIs
(Fig. 6, A and
B) are both close to unity (0.938 ± 0.088 and 0.933
± 0.075, respectively), suggesting that the shapes of the TRF and SRF
profiles are readily accounted for by the Gabor model. Furthermore, the
spectral and temporal SIs are not significantly different (paired
t-test, P > 0.57), indicating that Gabor TRF and SRF
models are equally well suited for describing the temporal and spectral
receptive field profiles. The mean value of the spectro-temporal SI (0.846
± 0.125; Fig.
6C) is lower than the spectral and temporal SIs (paired
t-test; P < 0.001 and P < 0.001,
respectively). This reduction in SI is accounted for by the fact that
independent multiplicative errors are propagated from the SRF and TRF profiles
to the STRF in the model
(using the spectral and temporal SI, the expected spectro-temporal SI assuming
independent profiles is 0.938 × 0.933 = 0.875). Finally, the residual
errors of the model (Fig.
6D) are typically small, as suggested by the MSE energy
error metric (mean ± SD = 0.185 ± 0.126), and were typically not
significantly different from random noise (χ² test; P
< 0.01 for 58 of 93 neurons; critical value, χ²
= 36.2).
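The expected spectro-temporal SI quoted in this paragraph is a simple product of the two marginal indices. The following is a minimal sketch of that arithmetic, assuming (as the text does) that the spectral and temporal errors are independent; the variable names are illustrative and do not come from the original analysis.

```python
# Mean similarity indices reported in the text.
spectral_si = 0.938
temporal_si = 0.933
measured_st_si = 0.846  # mean spectro-temporal SI reported in the text

# Assumption: independent multiplicative errors, so the two indices multiply.
expected_st_si = spectral_si * temporal_si

print(f"expected spectro-temporal SI: {expected_st_si:.3f}")
print(f"measured minus expected:      {measured_st_si - expected_st_si:+.3f}")
```

The small remaining gap between the measured and expected values is what the independence assumption leaves unexplained.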
Spectral response preferences
Spectral response preferences of auditory neurons are typically determined
with isolated pure-tones of varying frequency. The SRF is an extension of the
methods used to study frequency response preferences using sound stimuli with
spectral structure (Kowalski et al.
1996
; Schreiner and Calhoun
1994
; Versnel and Shamma
1998
). This descriptor allows us to study spectral integration
properties of single neurons to dynamic broadband sounds with a rich spectral
structure. Spectral selectivity is captured by four parameters of the Gabor
function SRF (Eq. 11): center frequency
(x0), SRF bandwidth (BW), best ripple density
(Ω0), and spectral phase (P). The center frequency
and bandwidth determine the central location and width of the SRF profile; the
best ripple density determines the number of excitatory or inhibitory peaks in
the SRF, and the spectral phase determines their alignment relative to the
center frequency. Individually, each of these parameters reflects structural
properties of the neuronal response area. The center frequency determines the
central position of the SRF, whereas the bandwidth determines its spectral
extent or selectivity. The ripple density accounts for the interleaving
pattern of excitation and inhibition observed in many neurons, whereas the
spectral phase determines the exact position of the excitatory and inhibitory
SRF subregions.
Due to some frequency bias in the sampling of the ICC, the contralateral
receptive fields of the studied neurons covered a range of center frequencies
from 1.47 to 5.3 octaves (between 1.393 and 20 kHz), of which 64.5% were
located in the range from 4 to 5 octaves (between 8 and 16 kHz;
Fig. 7A). While the
center frequency of the neuron determines the position along the primary
sensory epithelium that preferentially activates the neuron, the spectral
bandwidth accounts for the range of frequencies over which the neuron
integrates spectral information, including both excitatory and inhibitory
features. SRF bandwidths ranged from 0.14 to 4.8 octaves, although most
neurons (93%) had bandwidths below
2.0 octaves. The SRF bandwidth follows
a unimodal distribution with mean 0.988 octaves and median 0.654 octaves
(Fig. 7C).
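The octave and kilohertz values above can be related by a simple power-of-two conversion. The sketch below assumes a 0.5 kHz reference frequency; that value is not stated in this section and is inferred only from the pairing of 4 to 5 octaves with 8 to 16 kHz. The function names are hypothetical.

```python
import math

# Assumption: reference frequency inferred from "4 to 5 octaves (8 to 16 kHz)".
REF_KHZ = 0.5

def octaves_to_khz(octaves, ref_khz=REF_KHZ):
    """Convert a frequency in octaves above the reference to kHz."""
    return ref_khz * 2.0 ** octaves

def khz_to_octaves(khz, ref_khz=REF_KHZ):
    """Convert a frequency in kHz to octaves above the reference."""
    return math.log2(khz / ref_khz)

for octv in (1.47, 4.0, 5.0, 5.3):
    print(f"{octv:4.2f} octaves -> {octaves_to_khz(octv):6.2f} kHz")
```

Under this assumed reference, the reported range limits (1.393 and 20 kHz) map back to roughly 1.48 and 5.32 octaves, which agrees with the rounded octave values in the text.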
Auditory neurons can also respond selectively to oscillatory patterns of
the stimulus spectrum (Kowalski et al.
1996
; Schreiner and Calhoun
1994
). Such selectivity arises via alternating excitatory and
inhibitory subfields of the SRF profile. These excitatory and inhibitory RF
features must overlap on and off features of the stimulus spectrum for the
neuron to respond. Therefore such spectral selectivity is reflected in the SRF
profile by alternating on and off subfields of the SRF profile, analogous to
spatial grating selectivity in the visual system
(Cai et al. 1997
; DeAngelis et
al. 1995
,
1999
). This form of spectral
selectivity is captured by the Gabor model in the best ripple density
parameter. The ripple density (units of cycles/octave) represents the number
of spectral peaks in the stimulus spectrum existing over an octave range of
frequencies. The best ripple density is defined as the number of stimulus
spectral peaks that produces a maximal neural response. Alternatively, it can
also be thought of as the number of interleaved excitatory and inhibitory
subunits of the SRF existing over a single octave
(Escabí and Schreiner
2002
; Klein et al.
2000
; Miller et al.
2002
; Schreiner and Calhoun
1994
). Most neurons in our sample preferred low ripple densities
(Fig. 7B; mean = 0.609
cycles/octave; median = 0.406 cycles/octave), indicating that they preferred
broad spectral features of the dynamic moving ripple sound. The range of best
ripple densities extended from nearly 0 (0.022 cycles/octave) to 2.113
cycles/octave, although all neurons were tested up to 4 cycles/octave.
Finally, the spectral phase of the SRF profile determines the alignment of excitatory and inhibitory features relative to the center frequency of the neuron. Conceptually, a spectral phase shift corresponds to a frequency shift of the actual SRF maximum (not the envelope peak or center frequency). A positive phase value shifts the maximum of the spectral profile to lower frequencies; a negative phase shifts the SRF maximum to higher frequencies. Most of the STRFs (78.5%) have positive spectral phases, indicating that neurons favor lower frequencies than the center frequency (Fig. 7D).
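The effect of the spectral phase on the position of the SRF maximum can be illustrated numerically. The sketch below assumes a Gabor profile made of a Gaussian envelope exp(-(2(x - x0)/BW)^2) multiplied by a cosine carrier cos(2πΩ0(x - x0) + P); this functional form and its sign convention are assumptions for illustration and are not quoted from Eq. 11. Parameter values are arbitrary.

```python
import numpy as np

def gabor_srf(x, x0, bw, omega0, phase):
    """Assumed Gabor SRF: Gaussian envelope times a cosine carrier."""
    envelope = np.exp(-(2.0 * (x - x0) / bw) ** 2)
    carrier = np.cos(2.0 * np.pi * omega0 * (x - x0) + phase)
    return envelope * carrier

x = np.linspace(0.0, 6.0, 6001)          # frequency axis in octaves
x0, bw, omega0 = 4.0, 1.0, 0.6           # illustrative parameter values

# Location of the SRF maximum for a positive and a negative spectral phase.
peak_pos = x[np.argmax(gabor_srf(x, x0, bw, omega0, +np.pi / 4))]
peak_neg = x[np.argmax(gabor_srf(x, x0, bw, omega0, -np.pi / 4))]

print(f"P = +pi/4: maximum at {peak_pos:.3f} octaves (center {x0})")
print(f"P = -pi/4: maximum at {peak_neg:.3f} octaves (center {x0})")
```

With this convention, a positive phase places the maximum below the center frequency and a negative phase places it above, matching the description in the text.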
The SRF profile allows us to study its arrangement in terms of spectral
excitation and inhibition. The behavior of each neuron can also be interpreted
directly in the ripple density or frequency domain
(Kowalski et al. 1996
;
Miller et al. 2002
;
Schreiner and Calhoun 1994
).
To do this, the SRF is converted into a spectral modulation transfer function
(sMTF). The sMTF measures the neuron's response (spikes ·
s⁻¹ · dB⁻¹) as
a function of the applied ripple density. Using the Gabor model representation
of the SRF profile (Eq. 11), the corresponding sMTF is obtained by
applying a Fourier transform magnitude (FTM) to the SRF profile
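$$\mathrm{sMTF}(\Omega) \propto \exp\left\{-\left[\frac{\pi \cdot \mathrm{BW} \cdot (\Omega - \Omega_0)}{2}\right]^2\right\} \qquad (14)$$
The
sMTF acquires the structure of a Gaussian function with the center
Ω0 and standard deviation
√2/(π · BW). The bandwidth of the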
sMTF is defined as the width of the sMTF that accounts for 85% of the total
energy under the Gaussian curve. This parameter determines the range of
spectral oscillations (cycles/octave) in a stimulus that can potentially
activate the neuron. According to this criterion, the tail points at the level
of 1/e of the Gaussian sMTF peak value delineate the bandwidth of the
sMTF. The bandwidth of the sMTF
(4/π/BW)
is therefore inversely proportional to the bandwidth of the SRF profile
(BW). Figure 8, A–C, shows representative sMTFs of three single neurons in the ICC. To facilitate comparisons, each sMTF was normalized so that its total energy is equal to one; solid lines show the normalized sMTFs from Eq. 14, and dashed lines (- - -) correspond to the normalized sMTFs obtained directly from measured SRF profiles. The Gabor sMTF model (Eq. 14) accounts for the structure and energy of the actual sMTFs quite well, as depicted by the solid and dashed lines in Fig. 8.
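The relationship between the SRF bandwidth and the sMTF bandwidth can be checked numerically. The sketch below assumes a Gabor SRF with Gaussian envelope exp(-(2x/BW)^2) and a cosine carrier (an assumed form, not quoted from Eq. 11), takes its Fourier transform magnitude, and measures the width between the 1/e points. It also evaluates the 1/e level in decibels and the fraction of the area under a Gaussian that lies between its 1/e points. All parameter values are illustrative.

```python
import math
import numpy as np

bw, omega0, phase = 1.0, 2.0, 0.3   # octaves, cycles/octave, radians (illustrative)

# 200-octave grid gives a ripple-density resolution of 0.005 cycles/octave.
n, dx = 10000, 0.02
x = (np.arange(n) - n // 2) * dx
srf = np.exp(-(2.0 * x / bw) ** 2) * np.cos(2.0 * np.pi * omega0 * x + phase)

# Fourier transform magnitude of the SRF, on the non-negative ripple axis.
smtf = np.abs(np.fft.rfft(srf))
ripple = np.fft.rfftfreq(n, d=dx)

# Width of the region where the sMTF exceeds 1/e of its peak.
above = ripple[smtf >= smtf.max() / math.e]
measured_bw = above.max() - above.min()
predicted_bw = 4.0 / (math.pi * bw)

cutoff_db = 20.0 * math.log10(math.e)   # 1/e expressed as a dB drop
area_fraction = math.erf(1.0)           # Gaussian area between the 1/e points

print(f"measured 1/e bandwidth : {measured_bw:.3f} cycles/octave")
print(f"predicted 4/(pi*BW)    : {predicted_bw:.3f} cycles/octave")
print(f"1/e level              : {cutoff_db:.2f} dB below peak")
print(f"area between 1/e points: {area_fraction:.1%}")
```

Under these assumptions the 1/e criterion corresponds to a drop of about 8.68 dB and encloses roughly 84% of the area under the Gaussian, which is in line with the approximately 85% figure given in the text.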
Neurons were individually classified according to their spectral filtering
characteristics. These can, in theory, take the form of a lowpass, bandpass, or
highpass filtering response pattern. Neurons in our sample exhibited only
lowpass (Fig. 8A) and
bandpass (Fig. 8, B and
C) spectral selectivity. Each neuron was classified by comparing
its sMTF bandwidth with its best ripple density. Specifically, we required that
the measured best ripple density (Ω0) be greater than half
the sMTF bandwidth for bandpass neurons. This requirement guarantees that
bandpass neurons have a residual DC level response of less than half the sMTF
peak magnitude, whereas lowpass neurons will have a significant DC response
of >50% of the peak response magnitude.
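The classification rule described above reduces to a single comparison. The following is a minimal sketch of that rule as stated in the text; the function name and the example values are hypothetical, and the treatment of the boundary case (best ripple density exactly equal to half the bandwidth) is an assumption.

```python
def classify_smtf(best_ripple_density, smtf_bandwidth):
    """Label a neuron as bandpass or lowpass from its sMTF parameters.

    Both arguments are in cycles/octave. A neuron is called bandpass when
    its best ripple density exceeds half of its sMTF bandwidth.
    """
    half_bandwidth = 0.5 * smtf_bandwidth
    # Assumption: the boundary case is treated as lowpass.
    return "bandpass" if best_ripple_density > half_bandwidth else "lowpass"

# Illustrative values only, not taken from the recorded neurons.
print(classify_smtf(0.2, 1.0))   # best ripple density below half-bandwidth
print(classify_smtf(0.8, 1.0))   # best ripple density above half-bandwidth
```

Note that a neuron can be labeled lowpass under this rule even when its best ripple density is offset from zero, which is the intermediate case the text goes on to describe.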
Figure 8A illustrates
this procedure for a typical sMTF with lowpass selectivity (same as
Fig. 5A), which shows
a nonoscillatory on-spectral response pattern. Its sMTF indicates that the
structure of the STRF along the spectral dimension is dominantly excitatory or
inhibitory. A neuron with bandpass filter characteristics is illustrated by
the example of Fig.
8C (same as Fig.
5B). This neuron has an SRF with strong alternating
excitatory and inhibitory subfields. An intermediate scenario occurs for the
neuron of Fig. 8B
(same as Fig. 2A),
which shows a significant DC level response in the sMTF; however, the neuron
exhibits weak inhibitory sidebands and, consequently, a best ripple density
that is offset from zero. In the STRF domain, this neuron shows a strong
pattern of excitation and a significant, but subtle, inhibitory subregion.
According to our criterion, we found that 80 of 93 neurons exhibited lowpass
response preferences; 83 neurons (13 bandpass and 70 lowpass) had best ripple
densities offset from zero (as for Fig.
8B) and 69 had best ripple densities <1 cycle/octave.
Thirteen neurons exhibited bandpass selectivity, and no neurons had highpass
response preferences.
Each individual sMTF describes the spectral selectivity of a single neuron but tells us little about the overall spectral filtering capabilities of the inferior colliculus. Therefore, we determined the overall spectral selectivity of the inferior colliculus by computing a population sMTF. The population sMTF of the inferior colliculus (Fig. 8D) was obtained by averaging the amplitude-normalized sMTFs of all single neurons. Using the criterion defined for single unit sMTFs, we find that the spectral selectivity of the ICC (in the sampled frequency range) is lowpass with a bandwidth of 0.995 cycles/octave (at upper 8.68 dB cutoff; according to the 1/e bandwidth criterion) or 0.662 cycles/octave (at upper 6 dB cutoff) and centered about a best ripple density of zero cycles/octave. Thus the ICC as a whole has a significant preference for broadband stimuli.
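The population averaging and cutoff measurements can be sketched as follows. This is an illustration on synthetic Gaussian sMTFs, not the recorded data. It assumes that "amplitude-normalized" means each sMTF is divided by its peak value, and that the upper cutoff is the highest ripple density at which the response is still within the stated number of decibels of the peak. The function names are hypothetical.

```python
import numpy as np

def population_mtf(mtfs):
    """Average single-neuron MTFs after normalizing each to unit peak."""
    mtfs = np.asarray(mtfs, dtype=float)
    normalized = mtfs / mtfs.max(axis=1, keepdims=True)  # assumed normalization
    return normalized.mean(axis=0)

def upper_cutoff(axis, mtf, drop_db):
    """Highest axis value at which the MTF is within drop_db of its peak."""
    threshold = mtf.max() * 10.0 ** (-drop_db / 20.0)
    return axis[np.nonzero(mtf >= threshold)[0][-1]]

ripple = np.linspace(0.0, 4.0, 4001)   # cycles/octave

# Synthetic neurons: (best ripple density, 1/e half-width), arbitrary gain.
params = [(0.0, 0.6), (0.2, 0.8), (0.4, 1.0), (1.0, 0.5)]
mtfs = [3.0 * np.exp(-((ripple - c) / a) ** 2) for c, a in params]

pop = population_mtf(mtfs)
cut_1e = upper_cutoff(ripple, pop, 8.686)   # 1/e criterion
cut_6db = upper_cutoff(ripple, pop, 6.0)

print(f"upper 8.68 dB cutoff: {cut_1e:.3f} cycles/octave")
print(f"upper 6 dB cutoff   : {cut_6db:.3f} cycles/octave")
```

Because the population curve is an average of differently shaped Gaussians, the ratio of its two cutoffs need not match the ratio expected for a single Gaussian.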
Temporal response preferences
Neurons in the ICC show a diverse range of response preferences to
temporally modulated stimuli (e.g.,
Krishna and Semple 2000
;
Langner and Schreiner 1988
;
Ramachandran et al. 1999
;
Rees and Møller 1983
).
While numerous studies have identified the output-response characteristics of
ICC neurons to simple time-varying stimuli, the receptive field structure
leading to these response preferences has not previously been studied.
Temporal response characteristics of ICC neurons can be interpreted by four
parameters of the temporal Gabor model (Eq. 13): the best
temporal modulation frequency (Fm0),
the peak latency (T0), the response duration (D),
and the temporal phase (Q). Together, the peak latency and response
duration determine the locality and width of the TRF profile, respectively;
the best temporal modulation frequency and temporal phase determine the rate
and alignment of the temporal oscillation of the TRF profile.
Figure 9 illustrates distributions for these parameters for the contralateral receptive field. The absolute value of the best temporal modulation frequency ranged from 0 to 255.5 Hz and the distribution peaks at 30 Hz (Fig. 9A). Thus although numerous neurons can respond selectively to exceedingly fast temporal modulations of the dynamic moving ripple, most neurons preferred low modulation rates.
The peak latency is defined as the time of maximal neural response
(excitation or inhibition) following the onset of stimulation, whereas the
response duration determines the time period over which the neurons integrate
acoustic information. From the distributions in
Fig. 9B, the peak
latency was usually <20 ms (range: 3.5–27.4 ms; mean: 10.1 ms;
median: 8.5 ms) and is consistent with previous observations using pure tone
and noise stimuli (Krishna and Semple
2000
; Langner and Schreiner
1988
). The response durations extended over a broad range
(observed range: 1.8–82.6 ms), although most neurons typically had short
response durations (mean: 12.1 ms; median: 6.2 ms).
Finally, the temporal phase determines the arrangement of excitation and inhibition of the TRF profile relative to the peak latency or centroid position, which is determined from the TRF envelope. Positive temporal phases shift the TRF profile to the left of the peak latency (shorter latencies); negative values shift the TRF profile to longer latencies. The temporal phase distribution (Fig. 9D) shows that 78.5% of temporal phases are positive, indicating that the peaks of the TRF profiles are typically shifted to the left of the peak derived from the temporal envelope. Therefore excitation typically precedes inhibition.
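The link between a positive temporal phase and excitation preceding inhibition can be illustrated with a synthetic profile. The sketch below assumes a temporal Gabor function with Gaussian envelope exp(-(2(t - T0)/D)^2) and carrier cos(2π·Fm0·(t - T0) + Q); this form is an assumption for illustration and is not quoted from Eq. 13. The parameter values are arbitrary but lie within the ranges reported above.

```python
import numpy as np

def gabor_trf(t_ms, t0_ms, d_ms, fm0_hz, phase):
    """Assumed temporal Gabor profile: Gaussian envelope times cosine carrier."""
    envelope = np.exp(-(2.0 * (t_ms - t0_ms) / d_ms) ** 2)
    carrier = np.cos(2.0 * np.pi * fm0_hz * (t_ms - t0_ms) / 1000.0 + phase)
    return envelope * carrier

t = np.linspace(0.0, 40.0, 4001)              # time after stimulus, in ms
t0, d, fm0, q = 10.0, 12.0, 50.0, np.pi / 4   # illustrative parameters

trf = gabor_trf(t, t0, d, fm0, q)
t_excitation = t[np.argmax(trf)]   # strongest excitatory deflection
t_inhibition = t[np.argmin(trf)]   # strongest inhibitory deflection

print(f"envelope peak (T0): {t0:.1f} ms")
print(f"excitatory peak   : {t_excitation:.2f} ms")
print(f"inhibitory trough : {t_inhibition:.2f} ms")
```

With a positive phase under this convention, the excitatory peak falls before the envelope peak and the strongest inhibition follows it, giving the on-off ordering described in the text.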
The TRF profile allows us to study the timing of the neural response and
the temporal arrangement of excitation and inhibition. The behavior of each
neuron can also be interpreted and studied directly in the frequency domain.
By converting the TRF profile (measured at the center frequency) into the
Fourier domain, we can obtain the temporal modulation transfer function (tMTF)
of each neuron. The tMTF characterizes the time-locked response of the neuron
as a function of the temporal modulation frequency. Using the Gabor function
TRF profile (Eq. 13), the tMTF can be represented by a Gaussian
function of the form
$$\mathrm{tMTF}(F_m) \propto \exp\left\{-\left[\frac{\pi \cdot D \cdot (F_m - F_{m0})}{2}\right]^2\right\} \qquad (15)$$
with a bandwidth of (4/π/D). Figure 10 shows three representative inferior colliculus tMTFs. The examples of Fig. 10, A and B, have a significant DC level response and are therefore classified as having lowpass sensitivity to the temporal modulation frequency. While the first neuron has its strongest response at zero frequency, the latter neuron has a best temporal modulation frequency of 130.3 Hz. Both neurons responded over a large range of modulation frequencies as suggested by their response bandwidths. The bandwidths of the tMTF for Fig. 10, A and B, are 350.0 Hz (at upper 8.68 dB cutoff or 324.7 Hz at upper 6 dB cutoff) and 245.4 Hz (at upper 8.68 dB cutoff or 223.8 Hz at upper 6 dB cutoff), respectively.