## Abstract

Neurons in the dorsal cochlear nucleus (DCN) exhibit nonlinearities in spectral processing, which make it difficult to predict the neurons’ responses to stimuli. Here, we consider two possible sources of nonlinearity: nonmonotonic responses as sound level increases due to inhibition and interactions between frequency components. A spectral weighting function model of rate responses is used; the model approximates the neuron's rate response as a weighted sum of the frequency components of the stimulus plus a second-order sum that captures interactions between frequencies. Such models approximate DCN neurons well at low spectral contrast, i.e., when the SD (contrast) of the stimulus spectrum is limited to 3 dB. This model is compared with a first-order sum with weights that are explicit functions of sound level, so that the low-contrast model is extended to spectral contrasts of 12 dB, the range of natural stimuli. The sound-level–dependent weights improve prediction performance at large spectral contrast. However, the interactions between frequencies, represented as second-order terms, are more important at low spectral contrast. The level-dependent model is shown to predict previously described patterns of responses to spectral edges, showing that small changes in the inhibitory components of the receptive field can produce large changes in the responses of the neuron to features of natural stimuli. These results provide an effective way of characterizing nonlinear auditory neurons incorporating stimulus-dependent sensitivity changes. Such models could be used for neurons in other sensory systems that show similar effects.

## INTRODUCTION

The goals of receptive-field modeling are to provide a summary of the stimuli to which a sensory neuron responds and to predict the neuron's responses to new stimuli. Receptive fields of auditory neurons have been based on both tone response maps (tuning curves; Fig. 1*A*) and reverse-correlation analysis using broadband stimuli (de Boer and de Jongh 1978). The latter results in a spectrotemporal receptive field (STRF), interpreted as a first-order weighting of stimulus energy across time and frequency that predicts the neuron's response (spiking) probability (Aertsen and Johannesma 1981; Eggermont et al. 1983b; Escabi and Read 2003).

First-order models similar to the STRF provide a convenient and robust summary of the stimulus selectivity of auditory neurons in many situations. For example, they have been applied to tracking receptive-field changes during behavioral tasks (Fritz et al. 2003), to comparing effective receptive fields for different kinds of stimuli (Theunissen et al. 2000), and to defining the optimal stimulus for a neuron (deCharms et al. 1998). The strongest test of a receptive field model is whether it can predict responses to stimuli, preferably to stimuli not used in construction of the model. In this sense, STRFs work well for peripheral auditory neurons, such as auditory nerve fibers and ventral cochlear nucleus (VCN) neurons (de Boer and de Jongh 1978; Temchin et al. 2005; Yu and Young 2000). However, prediction by these models is generally poor at higher levels of the auditory system (Eggermont et al. 1983a; Escabi and Schreiner 2002; Machens et al. 2004; Nelken et al. 1997; Sen et al. 2001; Theunissen et al. 2000; Versnel and Shamma 1998; Yeshurun et al. 1989). A further drawback of STRF models is that the model that emerges from the analysis is different when derived from different stimuli, such as broadband noise versus birdsong (Escabi and Schreiner 2002; Theunissen et al. 2000), thus creating an ambiguity about the true receptive field of a neuron. The goal of this work is to investigate possible reasons for the above-cited failures and to provide an improved way of characterizing receptive fields of auditory neurons.

The STRF has a frequency and a time axis, variations along which show the neuron's frequency tuning and its modulation filtering properties, respectively. The frequency selectivity of neurons in the cochlear nucleus and auditory nerve can be characterized by a weighting-function model (Young and Calhoun 2005; Yu and Young 2000), which attempts to predict only the average discharge rate of the neuron to a spectrally stationary stimulus, i.e., only its frequency integration properties. Such a frequency-weighting function model can be related to the STRF assuming separability of STRFs in time and frequency (Young et al. 2005). It should be noted that often the STRF is separable into a function of frequency and a function of time (Depireux et al. 2001; Qiu et al. 2003), at least at levels of the auditory system up to the inferior colliculus.

A weighting-function model is advantageous because it is feasible to incorporate second-order components into the model, providing insights into nonlinear properties. Such second-order terms capture the interactions between energy at different frequencies to model one form of nonlinearity in neurons’ responses. In the cochlear nucleus, second-order terms improve the performance of the weighting-function model (Yu 2003; Yu and Young 2000), but in the dorsal cochlear nucleus (DCN), significant nonlinearities remain even when second-order terms are incorporated.

A second source of nonlinearity in auditory receptive fields is that the neurons’ responses change with sound level (Nagel and Doupe 2006; Nelken et al. 1997). This is apparent in Fig. 1, where the tone response map and the weighting functions change with the overall sound level of the stimulus. The change in weighting functions with level at 3-dB contrast for the example neuron in Fig. 1 is more subtle than the changes observed in another neuron shown later in the paper in Fig. 7. The importance of sound level is also suggested by changes in both the gain and prediction performance of weighting functions, depending on stimulus contrast (Reiss et al. 2007).

It is well known that auditory receptive fields vary significantly with sound level as seen in studies with tones (as in Fig. 1), where response maps or tuning curves cover a wider frequency range and show stronger inhibition at higher sound levels (Spirou and Young 1991; Sutter et al. 1999). This strong inhibition represents a nonlinearity, which implies that an STRF model or a spectral receptive field will change depending on the sound level of the stimuli. Such behavior is seen in quadratic models of neurons in the cochlear nucleus (Nelken et al. 1997; Yu 2003) but has not been explicitly studied in STRFs. Evidence of the importance of sound level is provided by the fact that weighting-function models predict responses of DCN principal cells more accurately for stimulus sets with low spectral contrast, i.e., stimuli whose spectral shapes deviate from a reference level by a limited amount. One of the motivations for the work described here is to develop better models of spectral processing using stimuli with a variety of spectral contrasts and at a variety of levels. Better systems models should lead to a better understanding of stimulus feature selectivity or what a neuron “encodes.”

Herein, we develop a receptive-field model for DCN principal neurons that incorporates sound level as an explicit parameter in the weighting function. We show that such models predict responses to stimuli with large contrast, typical of natural sounds. However, for stimuli with low spectral contrast, the effects of interactions between different frequencies are more important and a level-dependent model is unnecessary.

## METHODS

### Surgical procedures

Experiments were conducted on a total of 14 adult cats (3–4 kg) with infection-free ears and clear tympanic membranes. Animal-use protocols were approved by the Johns Hopkins Animal Care and Use Committee. Cats were tranquilized with xylazine [2 mg, administered intramuscularly (im)] and anesthetized with ketamine (40 mg/kg im). Atropine (0.1 mg im) was given to control mucous secretion. A tracheal tube was inserted. Cats were decerebrated by aspirating through the brain stem between the superior colliculus and thalamus, after which anesthesia was discontinued. Core body temperature was maintained at about 38°C using a regulated heating blanket and lactated Ringer solution was given intravenously to maintain fluid volume.

The DCN was exposed by opening the skull and dura above the cerebellum and aspirating the part of the cerebellum overlying the DCN. Platinum–iridium microelectrodes were advanced into the DCN under visual control and single neurons were isolated and recorded extracellularly. Action potentials were detected with a Schmitt trigger and spike times recorded with a precision of 10 μs.

### Experimental protocol

Recordings were made in a sound-attenuating chamber. Acoustic stimuli were delivered to the ipsilateral ear by an electrostatic speaker coupled to a hollow ear bar. The bulla was vented through a length of PE-90 tubing. The speaker was calibrated in situ using a probe tube placed about 2 mm from the eardrum. The calibration was essentially flat with fluctuations of <10 dB from 0.5 to 30 kHz. Within the bandwidth used for the analysis of each neuron (1.25 octaves, subsequently discussed), the SD of the calibrations around their values at the BFs of the neurons was about 2 dB. No correction for the calibration was applied during the analysis. The effects of fluctuations in the calibration on the analysis are small and have been ignored.

All data are from well-isolated single neurons, judged from the separation of the action-potential amplitude from noise and other action potentials and the presence of a refractory period. Isolated neurons were characterized using a combination of tones and broadband noise. Rate versus level functions were collected for best-frequency (BF) tones and noise by presenting 200-ms stimulus bursts (10-ms rise/fall times) once per second over an 80- to 100-dB range of sound levels. Type IV neurons were classified as having moderate spontaneous rates and BF-tone rate–level functions with excitation at low sound levels and inhibition at high sound levels (Shofner and Young 1985). Only neurons located along the electrode track before a BF gradient shift, which indicates a transition from DCN to VCN, were classified as DCN neurons. This paper describes data from DCN type IV neurons only.

The acoustic stimuli described in the next section were presented at a rate of one stimulus per 1.1 s. The stimulus duration was 399 ms. Each set of stimuli was presented over a range of sound levels, spaced at 5–10 dB, beginning near threshold. Response rate was computed as the number of spikes during the stimulus divided by the duration.

### Stimuli

The random spectral shape (RSS) stimuli used here are similar to those used before (Young and Calhoun 2005; Yu and Young 2000). Each stimulus consists of a sum of tones spaced logarithmically at 1/64th octave. The tones are grouped into frequency bins of 1/8th octave and all 8 tones within a bin have the same amplitude; the sound level *S*(*f*) in each frequency bin is the sum of the energies of these tones. The starting phases of the tones were randomized to avoid a click at stimulus onset. The phases were randomized for each stimulus. Linear 10-ms onset and offset ramps were added to the time-domain signal. The stimuli were not corrected for spectral irregularities in the speaker calibration.

The RSS stimuli had a bandwidth of 6.125 octaves, centered on 5.75 kHz. Each RSS set consisted of 410 stimuli. In 400 of these, the dB amplitudes of the bins *S*(*f*) were selected pseudorandomly from an approximately Gaussian distribution with 0 mean and SD of 12, 6, or 3 dB; the SD is subsequently called “spectral contrast.” *S*(*f*) is the dB level, relative to a reference sound level, of the sound in the bin centered on frequency *f*. The remaining 10 stimuli had the reference sound level in all bins, i.e., *S*(*f*) = 0 dB for all *f*. Stimuli were organized into successive plus–minus pairs, so that the dB levels of the first stimulus of the pair *S*_{i}(*f*) were inverted in the second stimulus *S*_{i}_{+1}(*f*) = −*S*_{i}(*f*), for *i* odd. These plus–minus pairs were used to separate the estimation of even- and odd-order terms, as described previously (Reiss et al. 2007). The 10 all-0-dB stimuli were used to estimate the reference rate *R*_{0} in the quadratic model described in the next section. Note that the all-0-dB stimuli are not “flat,” in the usual sense. Because of the logarithmic spacing of tones, this spectrum actually has a 1/*f* shape.
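The construction of the bin levels in one RSS set can be sketched as follows. This is a minimal illustration, not the stimulus-generation code used in the experiments: the function name `make_rss_levels`, the exact Gaussian draws, the seed, and the placement of the 10 all-0-dB stimuli at the end of the set are assumptions. The bin count of 49 follows from the 6.125-octave bandwidth divided into 1/8-octave bins.

```python
import numpy as np

def make_rss_levels(n_pairs=200, n_bins=49, contrast_db=12.0, n_flat=10, seed=0):
    """Return an (n_stimuli, n_bins) array of bin levels S(f) in dB re reference.

    Hypothetical helper: draws are exactly Gaussian here, whereas the
    source describes the distribution only as approximately Gaussian.
    """
    rng = np.random.default_rng(seed)
    plus = rng.normal(0.0, contrast_db, size=(n_pairs, n_bins))
    levels = np.empty((2 * n_pairs, n_bins))
    levels[0::2] = plus    # first member of each plus-minus pair
    levels[1::2] = -plus   # second member: dB levels inverted
    flat = np.zeros((n_flat, n_bins))  # all-0-dB stimuli (used to estimate R_0)
    # Assumption: flat stimuli are appended at the end of the set.
    return np.vstack([levels, flat])

S = make_rss_levels()
```

Because each pair sums to zero in dB, odd-order terms of a polynomial model change sign between the two members of a pair whereas even-order terms do not, which is what allows the two sets of terms to be estimated separately.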

Data from four type IV neurons were included for which the 6- and 12-dB contrast stimulus sets were different from the earlier description. They were 3.75 octaves wide with a periodic structure that repeated every 1.25 octaves (9 frequency bins), and had 210 stimuli (including 10 all-0-dB stimuli) in each set. These stimuli were resampled during presentation to be centered at the BF of the neuron. Additionally, these sets did not have plus–minus pairs and thus their corresponding quadratic model was computed as in Young and Calhoun (2005). The weighting functions computed with the two types of stimuli behaved in the same fashion.

### Quadratic weighting function model

The average discharge rate *r* is modeled using a quadratic weighting function as follows

$$r = R_0 + \sum_{j} w_j\, S(f_j) + \sum_{j}\sum_{k} m_{jk}\, S(f_j)\, S(f_k) \tag{1}$$

where *R*_{0} is the reference rate, i.e., the rate in response to the all-0-dB stimulus. The first-order weights *w*_{j} are the “gains” of the neuron in spikes/(s dB) for sound energy in the corresponding frequency bin (*f*_{j}). The second-order weights *m*_{jk} are the gains of the neuron in spikes/(s dB^{2}) for joint sound energy in the *j*th and *k*th bins. The parameters of this model were estimated by minimizing the chi-square error between the rates predicted by the model and the actual rates, using the method of normal equations. Even- and odd-order terms in *Eq. 1* were handled separately, for stimuli constructed with plus–minus pairs (Reiss et al. 2007). The assumptions and motivations for the quadratic model have been discussed previously (Young and Calhoun 2005).
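A minimal sketch of fitting a quadratic weighting model by normal equations is given below. It makes several simplifying assumptions that are not from the original analysis: all rates are weighted equally (so the chi-square error reduces to ordinary least squares), *R*_{0} is treated as already known, each cross-frequency product appears once (pairs with *j* ≤ *k*), and the separate handling of even- and odd-order terms is omitted. The function names and the synthetic data are illustrative only.

```python
import numpy as np

def quadratic_design(S):
    """Columns: S_j for each bin, then S_j * S_k for every pair j <= k."""
    n_bins = S.shape[1]
    j_idx, k_idx = np.triu_indices(n_bins)
    second = S[:, j_idx] * S[:, k_idx]
    return np.hstack([S, second]), (j_idx, k_idx)

def fit_quadratic(S, rates, r0):
    """Least-squares weights from the normal equations (X^T X) p = X^T y."""
    X, pairs = quadratic_design(S)
    y = rates - r0                      # assumption: R_0 estimated beforehand
    params = np.linalg.solve(X.T @ X, X.T @ y)
    n_bins = S.shape[1]
    return params[:n_bins], params[n_bins:], pairs

# Synthetic, noise-free check with 4 bins and 3-dB contrast (made-up values).
rng = np.random.default_rng(1)
S_demo = rng.normal(0.0, 3.0, size=(400, 4))
w_true = np.array([0.5, 2.0, -1.0, 0.2])
m_true = rng.normal(0.0, 0.05, size=10)   # 10 = number of pairs j <= k
X_demo, _ = quadratic_design(S_demo)
rates_demo = 40.0 + X_demo @ np.concatenate([w_true, m_true])
w_hat, m_hat, _ = fit_quadratic(S_demo, rates_demo, r0=40.0)
```

With each pair counted once, the fitted coefficient for *j* ≠ *k* corresponds to the combined contribution of the *jk* and *kj* terms of a full double sum. The number of second-order parameters grows as the square of the number of bins, which is one reason to restrict the fit to a limited frequency range.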

### Testing validity and generality of the model

To test the quality of the fit, the model was estimated from 75% of the data points and then tested by using it to predict the remaining 25% of the data points. Confidence intervals were calculated using ≥200 bootstraps (Efron and Tibshirani 1993), each time estimating the model from 75% of the data and predicting responses to the remaining 25% of stimuli. The measure of prediction performance was the fraction of variance (*fv*), defined as

$$fv = 1 - \frac{\sum_{j} \left(r_j - \hat{r}_j\right)^2}{\sum_{j} \left(r_j - \bar{r}\right)^2} \tag{2}$$

where *r*_{j} is the actual rate for the *j*th stimulus, r̂_{j} is the rate computed by the model, and r̄ is the mean rate. *fv* has a maximum value of 1, when the model fits the data perfectly, and decreases as the error increases. It is zero when the mean rate fits as well as the model and can go negative for poor models. Here, *fv* was not limited at zero.
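The fraction-of-variance measure and the 75%/25% split can be sketched as follows, assuming *fv* has the usual form of one minus the ratio of the squared prediction error to the squared deviation from the mean rate, which matches the properties stated above. Taking the mean over the rates being predicted and the helper names are assumptions.

```python
import numpy as np

def fraction_of_variance(actual, predicted):
    """fv = 1 - sum((r - r_hat)^2) / sum((r - r_mean)^2); not clipped at 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    resid = np.sum((actual - predicted) ** 2)
    total = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - resid / total

def split_75_25(n_stimuli, rng):
    """Random split of stimulus indices into 75% fit and 25% test sets."""
    order = rng.permutation(n_stimuli)
    n_fit = int(round(0.75 * n_stimuli))
    return order[:n_fit], order[n_fit:]

rates = np.array([10.0, 20.0, 30.0, 40.0])
fv_perfect = fraction_of_variance(rates, rates)                       # 1.0
fv_mean = fraction_of_variance(rates, np.full(4, rates.mean()))       # 0.0
fv_bad = fraction_of_variance(rates, rates[::-1])                     # -3.0
fit_idx, test_idx = split_75_25(400, np.random.default_rng(0))
```

The three toy cases reproduce the limiting behavior described in the text: a perfect prediction, a prediction no better than the mean rate, and a prediction worse than the mean rate.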

### Level-dependent weight model

To test the importance of weight variation with sound level, we developed a model in which the weights are explicitly a function of stimulus level. The quadratic model contains a form of weight variation with sound level. By regrouping the terms in *Eq. 1*, the quadratic model can be rewritten as a first-order weight summation with level-dependent weights as follows

$$r = R_0 + \sum_{j} \left[ w_j + \sum_{k} m_{jk}\, S(f_k) \right] S(f_j) \tag{3}$$

where the weight at frequency *f*_{j} is now a constant (*w*_{j}) plus a linear function of the sound levels in all of the bins (the summation in brackets). Thus the quadratic model applies a correction for sound level, but it is capable of only a first-order correction. Here, we compare the quadratic model with a level-dependent weighting function model (LDWM), in which the weights vary with sound level in a less-constrained way. In the LDWM

$$r = R_0 + \sum_{j} g_j(S_j)\, S_j \tag{4}$$

The symbols are as before, except that the *g*_{j}(*S*_{j}) represents first-order weights that vary with the stimulus level at frequency *f*_{j}, denoted by *S*_{j}. The second-order interactions between different frequencies (i.e., terms in *m*_{jk} for *j* ≠ *k* in *Eqs. 1* and *3*) are not explicitly present in this model. The weight in a particular frequency bin is assumed to be a function of the sound level in that bin only. Although this assumption reduces the performance of the model at low spectral contrast, it was made to keep down the number of parameters that have to be estimated. Figure 2 shows an example of a weight function *g*_{j}(*S*_{j}) as a function of bin frequency *f*_{j} (abscissa) and the stimulus level in that bin *S*_{j} (ordinate). The weights as a function of stimulus energy in the bin at BF, *g*_{BF}(*S*_{BF}), are shown as the red line on the back wall of the plot. In both fitting the model and computing its response to a stimulus, *Eq. 4* is used and the weight in each frequency bin is determined from the stimulus energy in the bin by interpolation as subsequently described. This local linear variation of weights makes the LDWM locally quadratic, meaning that the weights vary with level as in *Eq. 3*, but only for the diagonal terms *m*_{jj} and not for the cross-frequency terms *m*_{jk} with *j* ≠ *k*. However, the slope of the weight variation also changes with level, which differentiates *Eq. 4* from *Eq. 3*.

The functions *g*_{i}(*S*_{i}) are piecewise linear and are specified by a matrix of weights **W** in successive segments, defined by the elbow points in the vector **e**

$$g_i(S_i) = W_{ik} + \left(W_{i,k+1} - W_{ik}\right) \frac{S_i - e_k}{e_{k+1} - e_k}, \qquad e_k \le S_i < e_{k+1} \tag{5}$$

where *W*_{ik} is the weight in frequency bin *i* at elbow *e*_{k}. The elbows are placed every Δ dB and the weights within each Δ dB step are linearly interpolated based on the weights at the two ends of the step. Because the stimuli had some sound levels outside the endpoints of **e**, linear extrapolation of the gains was done at these levels, continuing the slope of the segment adjacent to the boundary of **e**. Because the rate for **S** = **0** is *R*_{0}, by definition (*Eq. 4*), the weight at 0 dB cannot be estimated from the data. Thus the elbow points nearest 0 dB were usually offset from 0, i.e., placed at ±Δ/2. The highest elbow was usually 10–15 dB above the highest reference level in the data and similarly for the lowest elbow; this limit was necessary to guarantee sufficient data to estimate the gains.
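The piecewise-linear weight function for a single frequency bin, including the linear extrapolation beyond the end elbows, can be sketched as below. The function name, the elbow spacing of 10 dB, and the weight values are illustrative assumptions, not values from the data.

```python
import numpy as np

def piecewise_weight(level_db, elbows, weights):
    """Weight g(S) for one frequency bin.

    elbows  : increasing 1D array of elbow levels (dB)
    weights : weight at each elbow (one row of the weight matrix)
    Levels outside the elbow range continue the slope of the end segment.
    """
    elbows = np.asarray(elbows, dtype=float)
    weights = np.asarray(weights, dtype=float)
    level_db = np.asarray(level_db, dtype=float)
    # Index of the left elbow of the segment; clipping makes out-of-range
    # levels reuse the first or last segment (linear extrapolation).
    k = np.searchsorted(elbows, level_db, side="right") - 1
    k = np.clip(k, 0, len(elbows) - 2)
    slope = (weights[k + 1] - weights[k]) / (elbows[k + 1] - elbows[k])
    return weights[k] + slope * (level_db - elbows[k])

# Hypothetical example: delta = 10 dB, elbows offset from 0 dB by delta/2.
elbows = np.array([-15.0, -5.0, 5.0, 15.0])
weights = np.array([0.0, 1.0, 2.0, -1.0])
g_mid = piecewise_weight(0.0, elbows, weights)     # interpolated: 1.5
g_high = piecewise_weight(20.0, elbows, weights)   # extrapolated: -2.5
g_low = piecewise_weight(-25.0, elbows, weights)   # extrapolated: -1.0
```

Note that `np.interp` holds the end values constant outside the elbow range, so it would not reproduce the extrapolation described in the text; the segment index is therefore computed explicitly.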

The LDWM was fit to a neuron's responses to several sets of RSS stimuli with different contrasts and overall sound levels. However, in the LDWM, the stimulus energies are all expressed relative to a single reference or all-0-dB stimulus component level, which is the **s** = **0** stimulus for the LDWM. Thus when used with *Eq. 4* the stimuli from an RSS set with reference (all-0-dB) level *A* dB SPL are corrected for the reference sound level *B* of the LDWM by adding *A* − *B* to the *S*_{i} of the RSS set. The model was fit by minimizing the chi-square error between rates and model predictions using a gradient-descent algorithm (the Matlab function *lsqcurvefit* available in the Matlab Optimization Toolbox). When computing rates from the model, its output was thresholded at 0, disallowing negative firing rates. The number of parameters estimated for the LDWM is the number of elements in the vector **e** times the number of frequency bins, which is of the order of 100. To maximize the ratio of data to parameters, the LDWM weights were computed over a continuous range of frequencies (≥1.25 octaves wide) symmetric around the BF of the neurons. To compare performances, the corresponding quadratic models were computed over the same range of frequencies. This range is wide enough to include all the significant nonzero weights in most neurons (Yu 2003).
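Computing a rate from the LDWM involves three steps described above: shifting the stimulus levels of an RSS set to the model's reference level, forming the level-dependent weighted sum, and thresholding the output at 0. The sketch below illustrates these steps under assumed names and made-up parameter values; it does not include the fitting step, which in the original work used the Matlab function *lsqcurvefit*.

```python
import numpy as np

def ldwm_rate(S_rel, set_ref_db, model_ref_db, elbows, W, r0):
    """Predicted rates for an array of stimuli.

    S_rel        : (n_stim, n_bins) bin levels in dB re the RSS set reference
    set_ref_db   : reference level A of the RSS set (dB SPL)
    model_ref_db : reference level B of the LDWM (dB SPL)
    W            : (n_bins, n_elbows) weights at the elbow points
    """
    elbows = np.asarray(elbows, dtype=float)
    # Re-express levels relative to the model reference by adding A - B.
    S = np.asarray(S_rel, dtype=float) + (set_ref_db - model_ref_db)
    k = np.clip(np.searchsorted(elbows, S, side="right") - 1, 0, len(elbows) - 2)
    bins = np.arange(W.shape[0])[None, :]
    w_lo, w_hi = W[bins, k], W[bins, k + 1]
    g = w_lo + (w_hi - w_lo) * (S - elbows[k]) / (elbows[k + 1] - elbows[k])
    rate = r0 + np.sum(g * S, axis=1)     # level-dependent weighted sum
    return np.maximum(rate, 0.0)          # disallow negative firing rates

# Hypothetical two-bin model; the second bin has zero weight throughout.
elbows = np.array([-15.0, -5.0, 5.0, 15.0])
W = np.array([[0.0, 1.0, 2.0, -1.0],
              [0.0, 0.0, 0.0, 0.0]])
S_rel = np.array([[5.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
rates = ldwm_rate(S_rel, 30.0, 30.0, elbows, W, r0=5.0)            # [15, 10, 0]
shifted = ldwm_rate(np.zeros((1, 2)), 35.0, 30.0, elbows, W, r0=5.0)  # [15]
```

In this toy example the rate first rises and then falls to the threshold as the level in the first bin increases, a nonmonotonic dependence on level that a single fixed weight could not produce. Apart from the threshold, the predicted rate is linear in the entries of **W**, although the sketch does not exploit this.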

## RESULTS

### Variation of gain with stimulus amplitude in different frequency bins

Two assumptions are important to the LDWM: first, that the weight varies with the stimulus level and second that the stimulus level in a particular frequency bin is the primary determinant of the weight in that bin. The plausibility of these assumptions is supported by Fig. 3, which shows first-order weights (*w*_{i} in *Eq. 1*) computed from subsets of the overall RSS stimulus set by constraining the amplitudes in one frequency bin. A subset of 100 of the 400 RSS stimuli from a set with spectral contrast of 12 dB was chosen, these being the ones with the smallest 100 amplitudes in a particular frequency bin, say *f*_{C}. The result was to constrain the amplitudes *S*(*f*_{C}) to range from approximately −3.8 to +3.8 dB, with small variation depending on the choice of *f*_{C}. Because the bin amplitudes are independent, this selection did not systematically change the distribution of amplitudes in the other bins, and those remained at 12-dB spectral contrast. The first-order weights were then recomputed for the chosen subset of stimuli. In Fig. 3, these constrained weights are compared with the weights for the full 3- and 12-dB RSS stimulus sets, for three constraint bins, as indicated in the figure legend. The symbols mark the bin with the stimulus-amplitude constraint.
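The subset selection used for this analysis can be illustrated with synthetic bin levels. The sketch below draws independent Gaussian levels with 12-dB SD and keeps the 100 stimuli with the smallest absolute level in one bin; the function name, bin count, and seed are assumptions. For an exactly Gaussian distribution, the central 25% of values lies within about ±0.32 SD, i.e., roughly ±3.8 dB for a 12-dB SD, consistent with the range quoted above.

```python
import numpy as np

def constrained_subset(S, bin_index, n_keep=100):
    """Indices of the n_keep stimuli with the smallest |S| in one bin."""
    order = np.argsort(np.abs(S[:, bin_index]))
    return np.sort(order[:n_keep])

rng = np.random.default_rng(2)
S = rng.normal(0.0, 12.0, size=(400, 10))   # synthetic 12-dB-contrast levels
f_c = 4                                      # hypothetical constrained bin
keep = constrained_subset(S, f_c)

max_abs_constrained = np.abs(S[keep, f_c]).max()   # expected near 3.8 dB
sd_other_bin = S[keep, f_c + 1].std()              # expected to stay near 12 dB
```

First-order weights recomputed from `S[keep]` and the corresponding rates would then play the role of the constrained weights compared in Fig. 3.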

Figure 3 shows that constraining the stimulus amplitude in one bin causes the first-order weights to change substantially from those computed with 12-dB contrast stimuli (heavy dotted line); it is important that the change occurs primarily in the constrained bin. For example, the weights in bins 6 (circle) and 7 (left triangle) increase significantly when those bins are constrained. The weight in bin 5 (diamond) does not change when it is constrained, perhaps because the 3- and 12-dB contrast stimuli give roughly the same weight in bin 5. In this example, the weights in constrained bins approach the weight size for 3-dB contrast (heavy dashed line). In other neurons, the weight in the constrained bin does not always approach the 3-dB-contrast value, but the change in weight due to constraint is always in the constrained bin. This result shows that the estimated weight *w*_{C} in a particular bin is affected by the presence of high energy [large *S*(*f*_{C})] in that bin. Although the constrained weights do not equal the weights for 3-dB spectral contrast, this analysis shows that the effects of changes in level are largely local, confined to the frequency bin itself, and provides a motivation for the LDWM formulation.

### Level-dependent weight model (LDWM)

The LDWM of *Eq. 4* was fit to data from 21 DCN type IV neurons. Generally, data from multiple spectral contrasts and reference levels were used depending on the data available. For comparison, quadratic models were fit to the same data. Usually the LDWM was fit to the whole data set and the quadratic model was fit separately to each RSS data set (i.e., a set of RSS stimuli at one reference level and spectral contrast).

Figure 4 shows the weights for a type IV DCN neuron (BF = 21.1 kHz) in a three-dimensional (3D) plot (Fig. 4*A*). The same weights are plotted as contours of weight versus frequency at various sound levels (Fig. 4*B*). These sound levels are the elbow points for the piecewise linear fit of the weights (*e*_{k} in *Eq. 5*). In this case, the reference level was set below the threshold of the neuron (−20-dB SPL per component). At levels just above threshold (17–23 dB re reference in Fig. 4*B*), the weights are positive and narrowly tuned near BF. At higher levels, the neuron is inhibited by frequencies at and below BF and is excited by frequencies just above BF. This pattern of first-order weight variation is observed in quadratic models of about 60% of DCN type IV neurons (Yu 2003). Looking across sound level at a fixed frequency, it is clear that the weight variation with sound level is not linear, as assumed in the quadratic model (*Eq. 1*). Furthermore, the weights in each frequency bin change differently, disallowing a separable model consisting of a frequency tuning function multiplied by a single nonlinear function of level.

The SDs of the model parameters were estimated by bootstrap, where the estimation was done 100 times based on 75% of the data, randomly chosen without replacement from the full set. The SDs of the weights are shown as error bars in Fig. 4*B*; they are small because this is a highly overdetermined system of equations (∼100 parameters estimated from ∼2,000 equations). The robustness of the estimation algorithm was also tested by starting from random initial values for the weights and repeating the gradient descent; the resulting weight estimates had small variations, comparable to the error bars in Fig. 4*B*.

The quality of the model was tested using the *fv* (*Eq. 2*) when predicting responses to the 25% of data not included in the model estimation. Figure 4*C* shows a distribution of *fv* values for this neuron. The performance is quite good for all spectral contrasts and sound levels, except for the cases with *fv* near 0 (arrow). These were all cases of 3-dB spectral contrast at a sound level near threshold where the rate responses were small and the threshold nonlinearity was not well fit by the model. Cases with 3-dB spectral contrast at higher reference levels gave good performance and lie in the peak centered near 0.8.

A second example of an LDWM is shown in Fig. 5. In this case the model was estimated from data collected with 3- and 12-dB spectral contrasts at only one reference level, so the range of levels over which the LDWM was estimated is small compared with the neuron in Fig. 4. Again, the weights are nonmonotonic with level, decreasing at the highest level. Clearly the quadratic model will not be able to fit this weight variation. In Fig. 5*B*, the performance of the LDWM in Fig. 5*A* is compared with that of three quadratic models in predicting data obtained with 3-, 6-, and 12-dB spectral contrast (in each case the quadratic model was fit to data at the same spectral contrast). At 3 dB, both models work well (*fv* = 0.77 for the LDWM and 0.87 for the quadratic model). At 6 and 12 dB, the LDWM does better than the quadratic model (at 6-dB contrast, *fv* = 0.66 LDWM and 0.20 quadratic; at 12-dB contrast, *fv* = 0.47 LDWM and 0.0 quadratic). An important point to note is that the LDWM does better than the quadratic model in predicting responses to the 6-dB stimuli, even though stimuli of 6-dB spectral contrast were not used in fitting the LDWM. The *fv* values given in the previous sentences are bootstrap averages and not the values for the examples in the figure.

### Relative performance of the LDWM and quadratic model

Over all the 21 neurons studied, the LDWM generally performed better at predicting rates for 12-dB spectral contrast, whereas the quadratic model generally performed better at 3-dB spectral contrast. Figure 6 shows the *fv* values for rate predictions at the two spectral contrasts with the two models. The *fv* values for the quadratic model's fits are shown along the abscissa; they are better for 3-dB contrast (**×** symbols, median 0.71) than for the 12-dB contrast (circles, median 0.26, significantly different *P* ≈ 6 × 10^{−6} by rank-sum). For the LDWM on the ordinate, the data have similar medians (0.49 for 12 dB and 0.46 for 3 dB, NS). However, it is better to compare *fv* values within a neuron; for this comparison, notice that the **×** symbols, which show data for 3-dB contrast, are mostly below the dashed line (41/58), meaning better performance for the quadratic model; the circles, for 12-dB contrast, are mostly above the line (26/38; different at *P* < 10^{−4} by χ^{2}), meaning better performance for the LDWM.

Finally in comparing prediction performance of the models directly, for 12-dB contrast the median performance of the LDWM (0.49) is significantly greater (*P* ≈ 0.0003 by rank-sum) than that of the quadratic model (0.26). On the other hand, for 3-dB contrast the median performance of the quadratic model (0.71) is significantly greater (*P* ≈ 0.0033 by rank-sum) than that of the LDWM (0.46). Thus the quadratic model does better for small contrast, whereas the LDWM does better for large contrast.

### Quadratic models derived from the LDWM

In a previous paper (Reiss et al. 2007) it was shown that the weights of the quadratic model are larger for responses to stimuli with 3-dB contrast (as in Fig. 1*B*) than with 12-dB contrast. If the LDWM is an accurate measure of the neuron's receptive field, then it should predict this change in weight amplitude with spectral contrast. A test of this idea is to fit the LDWM to a set of rate responses to RSS stimuli, compute artificial rate responses from this model for 3- and 12-dB RSS stimulus sets, and then fit the quadratic model to the two sets of artificial data. Figure 7 shows this calculation for the same neuron as in Fig. 4. Quadratic model weights computed from the actual and model data are compared; results from the 3-dB contrast are in the top half of the figure and for 12-dB contrast in the bottom half. The first-order weights computed from actual rate data (dotted lines in Fig. 7, *A* and *C*) and model data (solid lines in Fig. 7, *A* and *C*) agree qualitatively in that they have the same excitatory and inhibitory regions, although the weight values often differ by >1 SD. The two kinds of weights also change shape in the same way as stimulus level changes (indicated by different colors). More important, as with actual data, the weights computed from the model data are significantly larger for the 3-dB contrast (*top row*) compared with the 12-dB contrast (see following text). The second-order weights (Fig. 7, *B* and *D*) are compared at three sound levels and show a similar qualitative agreement. Note that the on-diagonal second-order weights (terms *m*_{jj}) are better reproduced than the off-diagonal weights. Apparently the effects of the off-diagonal weights are small, although they may account for some of the difference between the data and model in Fig. 7, *A* and *C*.

For this neuron, the quadratic model weights (Fig. 7) are similar in shape to the cross-sectional weights of the LDWM (Fig. 4*B*). Note, however, that the magnitude of the weights for the LDWM cannot be directly compared with the weights of the quadratic model. The quadratic model's weights are always expressed with reference to the all-0-dB stimulus of the RSS stimulus set. However, for the LDWM, the weights are expressed with respect to a particular fixed reference level for all stimuli. As explained in METHODS, the two reference levels are not necessarily the same. Changing the LDWM's reference will change the magnitude of the weights of the LDWM.

## DISCUSSION

### Variation in weights with sound level

The LDWM differs from previous STRF and weight-function models by explicitly accounting for the sound levels of the frequency components of a stimulus when calculating a weighted sum across frequency. As such it provides a method of evaluating the importance of stimulus level in the formulation of auditory receptive-field models. The results across a population of DCN type IV neurons show a nonmonotonic variation of weights with sound level in which weights first increase with level and then decrease rapidly and become negative, or approach negative values, at levels a few tens of dB above threshold (Figs. 2, 4, and 5). It is important to note that the variation of weights with level is different for different frequencies and thus cannot be represented by a separable model of frequency and level tuning. A test of separability done by performing a singular value decomposition (SVD) on the matrix of LDWM weights (Sen et al. 2001) indicates inseparability in frequency and level.
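One common way to quantify separability from an SVD is the fraction of the total squared singular values carried by the first singular value; a value of 1 indicates that the weight matrix is an outer product of a frequency function and a level function. The sketch below uses this index as an assumption, since the specific criterion applied to the LDWM weights is not stated here, and the matrices are made-up examples.

```python
import numpy as np

def separability_index(W):
    """Fraction of squared singular values in the first component (0 to 1)."""
    s = np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

# Separable example: frequency tuning times one function of level.
tuning = np.array([0.2, 1.0, 0.5])
level_fn = np.array([1.0, 2.0, -1.0, -3.0])
W_sep = np.outer(tuning, level_fn)

# Inseparable example: level dependence differs between frequency bins.
W_insep = W_sep + np.array([[0.0, 0.0, 0.0, 0.0],
                            [0.0, 0.0, 2.0, 0.0],
                            [0.0, -1.0, 0.0, 0.0]])

idx_sep = separability_index(W_sep)       # approximately 1
idx_insep = separability_index(W_insep)   # clearly below 1
```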

The nonmonotonic behavior of LDWMs seems to be sufficient to account for the apparent nonlinearity of DCN type IV neurons (Reiss et al. 2007; Yu 2003; Yu and Young 2000) observed with RSS stimuli or with natural spectra such as head-related transfer functions, which show a similar 10- to 20-dB range of variation of component stimulus levels as the RSS stimuli (Musicant et al. 1990; Rice et al. 1992). In both cases the quadratic model does a poor job of predicting responses to stimuli with approximately 12-dB spectral contrast (Fig. 6). The quadratic model assumes a linear dependence of weights on level (*Eq. 3*), which is inadequate to fit the weights plotted in Figs. 2, 4, and 5. As a result the calculation of weights for such neurons represents an averaging between larger weights for small stimulus-level deviations and smaller weights for large deviations. The result is a poorly fitting quadratic model that is a compromise between the small- and large-deviation regimes causing weights to be smaller for 12- than for 3-dB spectral contrast. For stimuli with 3-dB spectral contrast, the quadratic model is adequate because the small level-deviations fall within a range where the weights are linearly dependent on level.

The considerations in the previous paragraph do not explain why the quadratic model does better than the LDWM at 3-dB spectral contrast (Fig. 6). Presumably the quadratic model includes important interactions between frequencies (the terms involving *m*_{jk} for *j* ≠ *k* in *Eqs. 1* and *3*) that are not included in the LDWM explicitly. At 12-dB contrast, the effect of sound level is the dominant effect and the LDWM does better, but at 3-dB contrast the effect of sound level is smaller and is well approximated by the quadratic model, so the interaction among frequencies becomes the important effect.
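The argument of the two preceding paragraphs can be made concrete by regrouping the quadratic model by frequency bin. The expression below is a sketch that assumes *Eq. 1* has the form summarized in the Abstract: a constant rate, first-order weights, and second-order weights *m*_{jk} applied to the dB levels of the frequency components relative to a reference level. The symbols R_0, w_j, and s_j are assumed notation for those quantities.

```latex
r = R_0 + \sum_j w_j s_j + \sum_j \sum_k m_{jk}\, s_j s_k
  = R_0 + \sum_j \Bigl( \underbrace{w_j + m_{jj}\, s_j}_{\text{effective weight, linear in } s_j}
          + \sum_{k \neq j} m_{jk}\, s_k \Bigr) s_j
```

Under this reading, the effective weight on bin *j* can vary only linearly with its own level, through *m*_{jj}, so it cannot follow a nonmonotonic weight-versus-level curve. The terms with *j* ≠ *k* carry the explicit cross-frequency interactions that the LDWM lacks.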

### Comparison with tone response maps and implicit cross-frequency interactions

The dimensions of the LDWM, frequency and level, match those of the classical receptive field characterization, i.e., tone response maps (e.g., Fig. 1*A*). In spite of this match, the two are fundamentally different characterizations. The LDWM gives the sensitivity of the neuron to changes in stimulus energy in a particular frequency bin at a particular energy level in a broadband sound. Such sensitivity cannot be derived using tones. The differences between the spectral weighting functions (Fig. 1*B*) and the tone response map profiles (Fig. 1*A*) clearly underline this fact. Tone response maps provide information about the sensitivity of the neuron to narrowband stimuli in the absence of energy in surrounding frequencies. Thus the independence of frequency channels shown in Fig. 3 should not be taken to mean that the LDWM can be derived using tones or narrowband stimuli separately in each frequency channel. Nelken and colleagues (1994a,b) investigated a similar question using multiunit activity in auditory cortex. They found that the responses were determined mainly by the single-tone tuning with strong modulation by two-tone interactions and only weak modulation by additional tones. Thus tone and broadband tuning are expected to differ substantially, as in Fig. 1, with most of the difference occurring in the transition from one frequency component to two.

Furthermore, the broadband nature of the stimuli used to derive the LDWM creates implicit frequency interactions. The LDWM weight in a particular frequency bin at a particular level is the sensitivity of the neuron to changes in stimulus energy in that bin and level when energy in all the other bins is at the average energy level of all the stimuli. Thus an average cross-frequency interaction is present in the LDWM, and this interaction effectively changes with reference level. As a result, the second-order models derived from the LDWM (Fig. 7, *B* and *D*) have weak cross-frequency terms in spite of having no such explicit interactions in the model. Frequency response maps, on the other hand, are devoid of any such interactions.

### Contrast and luminance dependence in vision

A considerable literature addresses contrast and luminance gain control in the early visual system (Bonin et al. 2006; Mante et al. 2005; Shapley and Victor 1981; Zaghloul et al. 2005). Luminance adaptation occurs at the level of the retina, and contrast gain control is present in the retina and enhanced in later stages. In all these cited studies, gain control has been studied in terms of a spatiotemporal or temporal filter, and not just spatial filters that are analogous to the frequency weighting functions in the present study. Contrast gain control in vision shows similar results, in that the sizes of the filters change with stimulus contrast, whereas a similar shape is maintained. Additionally, these filters have shorter integration times with increased contrast. However, unlike our study, prediction performance of the linear filters (followed by a static nonlinearity) usually remained equally good at all contrasts; for example, in the LGN, mean explained variances are consistently >70% (Bonin et al. 2006). Thus even in early vision, although fairly linear, contrast gain control requires determination of the filters for each different contrast and mean luminance to obtain a predictive model. Finding a single predictive model for all contrasts together therefore requires a luminance-dependent weight model much like the LDWM. However, independence of contrast and luminance gain controls (Mante et al. 2005) suggests other ways of combining data from different luminances and contrasts into a single model. Such independence of contrast and sound level is absent in DCN type IV neurons.

### Implications for STRFs

Because the STRF, minus its temporal component, is similar to the first-order weight function of the quadratic model (Young et al. 2005), the results shown here imply that STRFs should also depend on stimulus contrast and stimulus level. In addition, the predictive ability of STRFs should improve for stimuli with lower contrast. STRFs derived from stimuli with “natural” contrasts, on the scale of 12 dB, may reflect a compromise process similar to that postulated earlier for weights, providing an explanation for the poor prediction performance of STRFs (Machens et al. 2004) and the stimulus dependence of STRF shape (Theunissen et al. 2000; Valentine and Eggermont 2004). STRFs represent a linear function of the stimulus parameters, but those parameters are obtained with a nonlinear transformation (energy or envelope) of the stimulus (Escabi and Read 2003; Theunissen et al. 2000), like dB energy in the present study. It may be possible to improve the prediction performance of STRFs by properly choosing the nonlinear measure, as in the square-law function in models of complex cells in visual cortex (Carandini et al. 2005). However, it is doubtful that such a strategy could capture both the nonmonotonicity of the LDWM and the benefits of frequency interactions, demonstrated at low spectral contrast. Finally, this discussion is based on the spectral nonlinearity alone. It may be that nonlinearities are present in temporal interactions as well, which could additionally affect performance of STRFs.

### Sources of nonlinearity in the DCN

The nonlinear behavior typical of DCN neurons is not seen in the inputs to the DCN from the auditory nerve (Young and Calhoun 2005) nor in neurons of the VCN (Yu 2003; Yu and Young 2000). Thus the nonlinearity of DCN principal cells is a computational property of its interneuronal circuits. Nonmonotonicity of rate responses across sound level is a defining feature of DCN principal-cell (type IV) responses (Spirou and Young 1991) and has been attributed to inhibitory inputs from so-called type II interneurons, vertical cells (Voigt and Young 1990). Previous analyses of DCN nonlinearity led to the conclusion that nonlinear responses are observed in DCN principal neurons for stimuli that activate the type II interneurons (Nelken and Young 1997; Nelken et al. 1997). Although these interneurons project to the VCN (Ostapoff et al. 1999; Wickesberg and Oertel 1988; Zhang and Oertel 1993) and inhibitory responses are seen in VCN neurons in vivo (e.g., Caspary et al. 1994; Ingham et al. 2006; Kopp-Scheinpflug et al. 2002), inhibitory effects seem to be weaker in VCN and specific effects of type II inhibition in VCN have not been identified in vivo. Thus the nonmonotonicity of type IV responses produced by inhibitory inputs from vertical cells remains the most likely source of the nonlinearity of the DCN output representation. The role of other inhibitory inputs (Davis and Young 2000; Reiss and Young 2005) is not clear.

### Different degrees of edge sensitivity in type IV neurons: predictions of the LDWM

A previous paper (Reiss and Young 2005) identified three classes of DCN type IV neurons according to their sensitivity to steep rising spectral edges, such as the lower-frequency edge of a noise band or the upper-frequency edge of a noise notch. These three groups were generally characterized by different responses to broadband noise, as seen in rate versus level functions. Figure 8, *A*–*C* shows rate–level functions for three neurons studied here whose properties correspond to the groups defined by Reiss and Young. All three neurons have nonmonotonic tone rate–level functions (solid lines) that are typical of type IV neurons; the noise rate–level functions (dashed) define the three groups, as subsequently described. The corresponding LDWMs are plotted in Fig. 8, *D*–*F*.

The LDWM was used to predict the (unknown) responses of these three neurons to a broadband noise with a 30-dB notch (or stopband) positioned at various frequencies relative to the neuron's BF; the predictions are plotted in Fig. 8, *G*–*I*, which plot discharge rate versus the upper-frequency edge of the notch, as in Reiss and Young. The example in Fig. 8*A* has a weak response to noise, typical of the largest group of type IV neurons, which give a peak of discharge rate when the rising edge of the notch is aligned on BF. This is what the LDWM predicts for this neuron (Fig. 8*G*). The second example, in Fig. 8, *B*, *E*, and *H*, has very strong noise responses; these neurons give inhibitory responses to the notch without the rate peak when the edge is at BF (Fig. 8*H*). Finally, the third example (same neuron as in Fig. 4) has strongly nonmonotonic noise responses (Fig. 8*C*) and an excitatory response to the notch (Fig. 8*I*). In the 21 neurons studied, the LDWM predicted a rate peak at the spectral edge in 13 cases, notch inhibition in four cases, notch excitation in two cases, and a rate peak at the falling spectral edge in two cases. This distribution is similar to that reported previously (Reiss and Young 2005). These responses to notch sweeps are usually not well predicted by quadratic models (Reiss 2005).
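The notch-sweep prediction described above can be sketched as follows. The sketch assumes that the LDWM rate is a constant plus, for each frequency bin, the bin's dB level (relative to the reference) multiplied by a weight looked up at that level; the exact formulation is not restated in this section, so this form is an assumption. The function names, the linear interpolation between level bins, the bin counts, and all weight values are hypothetical.

```python
import numpy as np


def ldwm_rate(weights, level_grid, spectrum_db, r0):
    """Assumed LDWM form: r = r0 + sum_j w_j(s_j) * s_j.

    weights:     (n_bins, n_levels) weight at each frequency bin and level
    level_grid:  (n_levels,) increasing dB levels at which weights are defined
    spectrum_db: (n_bins,) stimulus level in each bin, dB re reference level
    """
    rate = r0
    for j, s in enumerate(spectrum_db):
        # Level-dependent weight for this bin (values are held constant
        # outside the range of level_grid by np.interp).
        w = np.interp(s, level_grid, weights[j])
        rate += w * s
    return rate  # no rectification is applied in this sketch


def notch_spectrum(n_bins, upper_edge, width, depth_db):
    """Flat spectrum (0 dB) with a notch occupying bins [upper_edge - width, upper_edge)."""
    spec = np.zeros(n_bins)
    lo = max(upper_edge - width, 0)
    spec[lo:upper_edge] = -depth_db
    return spec


# Hypothetical 7-bin model; bin 3 plays the role of BF.
level_grid = np.array([-30.0, -15.0, 0.0, 15.0])
weights = np.zeros((7, 4))
weights[3] = [1.0, 1.0, 0.5, -0.5]       # excitatory at low levels, inhibitory at high
weights[2] = [-0.2, -0.2, -0.3, -0.4]    # inhibitory flanks
weights[4] = [-0.2, -0.2, -0.3, -0.4]

# Predicted rate as the upper-frequency edge of a 30-dB notch is swept across bins.
sweep = [
    ldwm_rate(weights, level_grid, notch_spectrum(7, edge, 2, 30.0), r0=50.0)
    for edge in range(1, 8)
]
print(sweep)
```

Plotting `sweep` against the edge position gives the analog of the curves in Fig. 8, *G*–*I* for the toy weights; changing only the inhibitory entries of `weights` is a simple way to explore how the shape of the sweep depends on them.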

The differences between the three LDWMs in Fig. 8 are subtle, involving the depth and strength of inhibitory inputs. These results illustrate that small differences in the receptive field can lead to large differences in responses to properly chosen stimuli. The fact that the LDWM shows differences in responses to edge stimuli that correspond to those predicted for these neurons provides support for the usefulness of the LDWM.

### Advantages and limitations of the LDWM

The LDWM has the drawback of all weighting-function models relative to STRFs, in that it does not contain information about time-domain responses. Nevertheless, it is an efficient way to characterize the spectral characteristics of a neuron. Although the number of stimuli used in deriving the LDWMs in this study was usually about 1,000, such models can be derived from far fewer stimuli. In fact, an advantage of the LDWM over a series of quadratic models covering the same range of sound levels is that the latter requires more stimuli because more parameters must be estimated. Of course, this efficiency occurs because the LDWM lacks explicit interactions between frequencies. Inspection of Fig. 7, *B* and *D* shows that the second-order weights derived from model data have smaller interactive terms (off-diagonal elements) than those from the actual data. Interactions across frequency can be explicitly added to the LDWM, at a cost of many additional parameters.
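The parameter-count argument above can be illustrated with a short calculation. It assumes a symmetric second-order weight matrix for the quadratic model and one LDWM weight per frequency-level bin plus a constant term; the bin counts in the example are hypothetical and are not those used in this study.

```python
def quadratic_param_count(n_bins: int) -> int:
    """Constant + first-order weights + unique entries of a symmetric second-order matrix."""
    return 1 + n_bins + n_bins * (n_bins + 1) // 2


def ldwm_param_count(n_bins: int, n_levels: int) -> int:
    """Constant + one weight per (frequency bin, level bin) pair (assumed parameterization)."""
    return 1 + n_bins * n_levels


# Hypothetical sizes: 10 frequency bins, 6 level bins / reference levels.
n_bins, n_levels = 10, 6
series_of_quadratics = n_levels * quadratic_param_count(n_bins)
single_ldwm = ldwm_param_count(n_bins, n_levels)
print(series_of_quadratics, single_ldwm)
```

Under these assumptions, the series of quadratic models needs several times as many parameters as a single LDWM, and the gap widens as the number of frequency bins grows because the second-order term scales with the square of the bin count.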

One alternative to models like the LDWM is a network model based on the hypothesized organization of the DCN. Network models replicate many aspects of DCN type IV responses (Hancock and Voigt 1999; Reiss and Young 2005; Zheng and Voigt 2006), although estimating parameters of the network from principal cell responses suffers from a lack of uniqueness. Further, because of incomplete understanding of the circuitry and properties of the circuit elements in the DCN (Davis and Young 2000; Reiss and Young 2005) and higher auditory nuclei, such models require additional assumptions. Systems models like the LDWM do not suffer from such problems.

## GRANTS

This work was supported by National Institute on Deafness and Other Communication Disorders Grants DC-00115, DC-05211, and DC-00441.

## Acknowledgments

We thank S. M. Chase and B. J. May for comments on the manuscript and M. J. Anderson for help with the surgery.

Present address of L. Reiss: Department of Speech Pathology and Audiology, University of Iowa, Iowa City, IA 52242.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Copyright © 2007 by the American Physiological Society