## Abstract

We characterize primary auditory cortex (AI) units using a neural model for the detection of frequency and amplitude transitions. The model is a generalization of a model for the detection of amplitude transition. A set of neurons, tuned in the spectrotemporal domain, is created by means of neural delays and frequency filtering. The sensitivity of the model to frequency and amplitude transitions is achieved by applying a 2-dimensional rotatable receptive field to the set of spectrotemporally tuned neurons. We evaluated the model using data recorded in AI of anesthetized ferrets. We show that the model is able to fit the responses of AI units to variety of stimuli, including single tones, delayed 2-tone stimuli and various frequency-modulated tones, using only a small number of parameters. Furthermore, we show that the topographical order in maps of the model parameters is higher than in maps created from response indices extracted directly from the responses to any single stimulus. These results suggest a possible ordered organization of a simple rotatable spectrotemporal receptive field in the mammalian AI.

## INTRODUCTION

Fast frequency and amplitude transitions are important features of the auditory input, and many studies have demonstrated the sensitivity of the auditory system to these transients (e.g., for amplitude transitions: Eggermont 1993; Kitzes et al. 1978; Phillips 1988; Rees and Møller 1983; Schreiner and Langner 1988a; Suga 1971; and for frequency transitions: Heil et al. 1992a-c; Kowalski et al. 1995; Mendelson et al. 1993; Nelken and Versnel 2000; Phillips et al. 1985; Schreiner and Mendelson 1990; Shamma et al. 1993; Tian and Rauschecker 1994, 1998).

Many physiological studies suggest that both the timing and the strength of neural responses for amplitude and frequency changes are correlated with the dynamics of the physical change (Barth and Burkard 1993; Heil 1997a,b; Heil and Irvine 1996, 1997, 1998a,b; Nelken and Versnel 2000; Phillips 1988, 1998; Phillips and Burkard 1999; Phillips et al. 1995). These results led us recently to propose a neural model for the detection of amplitude transients. The basic operation of this model is the calculation of the smoothed time derivative of the log-compressed envelope of the stimulus (Fishbach et al. 2001). In that model, a standard neural representation of the auditory input is being progressively delayed by a sequence of neurons that form a “delay layer.” The time-derivative computation is implemented as a weighted sum of the activity along the delay layer, with weights that form an ON-OFF receptive field. In that study we have shown that the model is able to reproduce and predict physiological responses to amplitude transients collected at multiple levels of the auditory pathways using a variety of experimental procedures. In addition, we have demonstrated the ability of the model to reproduce the effect of amplitude transients on several psychoacoustical phenomena.

To account for neural responses to frequency transitions we add, in the present work, a spectral dimension to the temporal delay layer by forming an array of neurons, which differ in their best frequencies and temporal delays. The sensitivity of the model to frequency and amplitude transitions is achieved by applying a 2-dimensional receptive field to the spectrotemporally tuned neurons. The receptive field, which in its basic form is a separable function of a Gaussian in the spectral dimension and a 1st-order derivative of a Gaussian in the temporal dimension, can be rotated in the spectrotemporal plane. We compare the model responses to published responses of primary auditory cortex (AI) units to a variety of stimuli such as tone bursts, 2-tone complexes, and linear and exponential FM sweeps. Our results suggest that the model is able to fit the responses of AI units to these stimuli, adjusting only 4 parameters for each unit. Moreover, we demonstrate that the model parameters have a more ordered topographical organization than the standard parameters that are extracted from responses of the units to a single class of stimuli.

These results, along with the topographical order that is being revealed by the 4 parameters of the model, suggest that a simple well-ordered rotatable spectrotemporal receptive field (STRF) may capture some fundamental aspects of the computation performed by neurons in the mammalian AI.

## METHODS

### Neural model principles

The model is a generalization of a model for the detection of amplitude transitions, which is described in detail elsewhere (Fishbach et al. 2001). In short, the basic operation of that model is the calculation of the first-order time derivative of the log-compressed envelope of the stimulus. To be able to compute the time derivative, we hypothesize the existence of a temporal delay dimension, along which the stimulus is progressively delayed. We construct the delay dimension using a layer of neurons with ascending membrane time constants (τ_{m}); each neuron is modeled by a standard version of the integrate-and-fire model (I&F). Our I&F makes use of a kernel function in the form 1 where *x* represents the time elapsed since the occurrence of a synaptic input. The kernel function, when convolved with the neuron's presynaptic input, determines its postsynaptic potential (Gerstner 1999a). Higher membrane time-constant values induce greater delay in the neuron's response (Agmon-Snir and Segev 1993). The output of the delay layer neurons converges to a single output neuron by a set of connections with various efficacies that reflects a receptive field of a Gaussian derivative. This combination of excitatory and inhibitory connections embodies the operation of time-derivative computation.

In the current model, we add a spectral dimension to the temporal delay layer by forming an array of neurons, which differ in their best frequencies and temporal delays. Accordingly, the one-dimensional (1D) temporal receptive filed is replaced by a 2-dimensional (2D) STRF.

Figure 1 presents a schematic diagram of the model and the flow of data along the different model components. Exact implementation details are given in the appendix. An example of a tone burst with linear ramps, which was one of the inputs tested, is displayed in Fig. 1*A*.

neural representation. The neural representation (NR) (Fig. 1*B*) is a simplified approximation of the expected excitatory representation of sound by neurons of a subcortical auditory station; e.g., type II and type III neurons in the ventral cochlear nucleus (Young and Voigt 1982) or type V and type I neurons in the inferior colliculus (IC) (Ramachandran et al. 1999). This representation is generated using a simple preprocessing stage that includes band-pass filtering, demodulation to extract the temporal envelope, nonlinear compression, and low-pass filtering. A bank of 81 band-pass filters are used; the filters' center frequencies *f _{i}* are logarithmically spaced around the middle frequency (

*MF*) of the model, which is adjustable. The frequency response magnitude of the

*i*th band-pass filter in dB is given by 2 where

*T*determines the filter bandwidth. The low-pass filtering is achieved by convolving the log-envelope with an alpha kernel function (

*Eq. 1*) with a time constant (τ

_{m}) in the millisecond range.

frequency-delay layer. The preprocessed input is fed to the frequency-delay layer of the model, which consists of an array of neurons *U _{i,j}* with ascending characteristic frequencies (

*f*) in one dimension, and ascending membrane time constants (τ

_{i}_{j}) in the 2nd dimension. Each unit

*U*receives input from only one NR filter

_{i},j*N*, and represents a population of neurons with identical characteristics.

_{i}*U*is modeled as an analog variable by convolving the neuronal representation

_{i,j}*N*(

_{i}*t*) with a kernel (Gerstner 1999b) whose time constant is τ

_{j}. I&F kernel functions and membrane time-constant values are shown for several units (Fig. 1

*C*). The membrane potential of each neuron in the frequency-delay layer is then saturated using a sigmoidal function 3 where

*F*

_{max}is the maximal instantaneous output firing rate (225 spikes/s) and

*D*is a scaling factor that determines the dynamic range of the transformation.

receptive field. The frequency-delay layer neurons are connected to a single neuron using inhibitory and excitatory connections with various efficacies (Fig. 1*D*) that reflect the receptive field shape. For each frequency channel, the receptive field has the same shape as the receptive field of the purely temporal model (Fishbach et al. 2001), that is, an on-off weighting of the delayed versions of the stimulus envelope, implementing a derivative operation on the temporal envelope. However, the delays over which the derivative operation is performed are shifted successively along the delay dimension as a function of frequency. This shift is modeled by the term *R*_{τ} (*Eq. 4*). Finally, the results from each frequency channel are summed up together with weights that depend on the distance (in octaves) from the MF of the receptive field 4 where and where μ is the mean delay of the receptive field, σ_{f} is the frequency integration bandwidth, σ_{τ} is the width of the delay time window, and α is the slope of the frequency-dependent shifts along the delay dimension (in octaves/ms) that produce the sensitivity to the direction of frequency-modulated (FM) tones. We use this shearing transformation of the receptive field instead of a simple rotation transformation because the latter would destroy the temporal characteristics of the STRF in each individual frequency channel. However, because this transformation resembles a rotation transformation in the context of our receptive field, α is termed throughout this study as the pseudo-rotation parameter. Note that the *MF* of the model is not necessarily equal to the best frequency (BF) of the model, as defined by the frequency of a tone burst that yields the lowest intensity threshold; high values of α and σ_{f} may yield a BF that significantly deviates from the STRF's middle frequency.

edge-detector neuron. The model's output neuron (Fig. 1*E*) is a single I&F neuron with a noisy integration and membrane time constant τ_{ed} (Gerstner 1999a). Because the model demonstrates sensitivity to sharp transitions in amplitude and frequency, the model's output neuron is referred to as an edge-detector neuron. PARAMETERS OF THE MODEL. The responses of the model are adjusted to fit the response of a specific neuron by adjusting 4 parameters. Two of the parameters relate to the STRF: α, the pseudorotation parameter, and σ_{f}, the frequency integration bandwidth (*Eq. 4*). The other 2 parameters are the slope (*T*) of the symmetrical band-pass filter of the neural representation *N _{i}* (

*Eq. 2*), and

*D*, the scaling factor of the delay layer saturation transformation (

*Eq. 3*). In addition, the middle frequency of the model (

*MF*) and the threshold of the edge-detector neuron were varied. However, these parameters were not manipulated independently; instead, their values were always set to best approximate the threshold and the BF of the neuron that was fitted after the values of the 4 adjustable parameters are set. There are 8 additional fixed parameters of the model, 4 of which are parameters of the I&F model. These parameters and their values are listed in full in the appendix. Their specific values have only minor or redundant effect on the responses of the model.

### Simulation paradigms and analysis

We evaluated the adequacy of the model to match the responses of AI units by matching the responses of the model with the reported responses of a neuron to a variety of stimuli, using a single set of parameters per neuron. We used the data of Nelken and Versnel (2000), who recorded multiunit clusters in the AI of barbiturateanesthetized ferrets. To attain a reliable comparison between the experimental and the simulated results, we simulated the responses of the model to the same set of stimuli and used the same data analysis methods as in Nelken and Versnel (2000).

stimuli. The stimuli are described in detail by Nelken and Versnel (2000). In short, pure tones, of 200-ms duration and 10-ms rise/fall time, were used to characterize the BF of the model (in octaves relative to the model *MF*), and to obtain a rate-level function at the BF. Two-tone stimuli consisted of a 1st tone, which varied in frequency, followed 50 ms later by a 2nd tone at BF and intensity of 20 dB above the model BF threshold; both tones were of 200-ms duration and 10-ms rise/fall time. This paradigm was used to estimate the excitatory and inhibitory response regions of the model.

Nelken and Versnel used 6 different FM stimulation paradigms in 3 pairs. Within each pair, the fine structure of the rate of change of frequency was manipulated, being either linear with frequency (linear FM sweeps) or linear with log frequency (logarithmic FM sweeps, sometimes also called exponential FM sweeps in the literature). Between pairs of FM paradigms, the coarse structure of the frequency trajectory was manipulated. The 1st FM frequency trajectory started with a fixed low frequency followed by an upward frequency sweep to a fixed high-frequency, which was then followed by a downward sweep back to the low frequency. The sweep range was typically 4 octaves (oct) and its center frequency was chosen around the BF. The total stimulus duration was 500 ms. These paradigms were called by Nelken and Versnel LH-lin (for the linear version) and LH-log (for the logarithmic version). In the 2nd pair of FM paradigms, the stimulus started at the high frequency, swept to the lower one, and then up again. These paradigms were annotated as HL-lin and HL-log. In the last pair of FM paradigms, the stimulus started at the low frequency for 10 ms and than swept up to the high frequency on one trial and on the next trial the stimulus started at the high frequency for 10 ms and then swept to the low frequency. Thus there was a period of silence of about 1 s between the upward and downward sweeps. These paradigms were annotated as N-lin and N-log for the linear and logarithmic versions, respectively. For all the FM paradigms, 10 FM velocities were used, equally distributed between 0.2 and 2 kHz/ms for the linear variants, and between 30 and 300 oct/s for the logarithmic variants.

The responses of the model to the LH and the HL paradigms used by Nelken and Versnel (2000) are identical, which is inconsistent with the experimental results. This is attributed to the fact that the model integrates stimulus information over a time window of at most a few tens of milliseconds, and therefore responds to a frequency transition regardless of the stimulus history that precedes its integration time window. In contrast, cortical activity seemed to undergo a fast adaptation during the stimulation period, presumably because the upward and downward sweeps were presented as part of a continuous sound in these paradigms. Because of this discrepancy between the experimental and the simulated results, the model was tested with N-log and N-lin stimuli only. In cases for which these data were missing, we used the neural responses for the upward sweep of the LH stimulus and the responses to the downward sweep of the HL stimulus. This method of analyzing responses to FM of similar shape was also used by Heil et al. (1992a).

data analysis. The data analysis methods are described in full in Nelken and Versnel (2000), and are reviewed here only briefly. Two-tone responses were summarized as the *M* index, measuring the asymmetry of the side-band inhibition (Shamma et al. 1993) where *R*_{>BF} and *R*_{<BF} are the total number of spikes elicited by both tones when the 1st tone is above BF and below BF, respectively.

FM responses were summarized using 2 indices; the 1st is the directional sensitivity (Shamma et al. 1993) where *R*_{↑} and *R*_{↓} represent the total number of spikes evoked by upward and downward sweeps, respectively.

The second FM index is the velocity sensitivity, which is the weighted average of all the FM velocities, each weighted by the total number of spikes it evoked where *x* runs over all velocities tested, and *R*(*x*) is the total number of spikes evoked by the sweep at that velocity. The directional and velocity sensitivity for the log FM stimuli are annotated as *C-log* and *V-log*, respectively, and for the linear FM stimuli they are annotated as *C-lin* and *V-lin*.

Standard deviations of response indices were estimated using bootstrap by sampling with repetitions the responses to each experimental condition. The resulting fictive data sets were summarized using the response indices (*M, C*, and *V*). This process was repeated 100 times. The SD for each index was derived from the 25-75% percentile interval of the bootstrap distribution by comparing this interval to the 25-75% percentile interval of a normal distribution with SD = 1. This indirect method was used to overcome possible effects of outliers of the bootstrap distribution.

A brute force search was used to find the parameter set (α, σ_{f}, *T, D*) that minimizes the target functionϕ, given by where ϕ_{R} measures the fit error of the model with regard to the raw responses and is given by where and are the total number of spikes elicited by the *k*th stimulus by the model and experimental multiunit cluster, respectively; and *N _{k}* is the number of different experimental conditions used for the

*k*th stimulus. ϕ

_{I}measures the model's fit error with regard to the response indices (

*M, C, V, C-lin, V-lin*) and is given by where and are the model and experimental response indices, respectively, and

*a*is the SD of . Sensitivity analysis showed that the shape of the target function around its minima was reasonably compact, leading to well-defined parameter values.

_{k}Functional maps of the distribution of experimental response indices and of the model parameters on the cortical surface were produced as follows. A Voronoi tessellation of the cortical surface around the penetration points was performed and each of the polygons was shaded according to the value of the point in its center.

The tendency of the experimental response indices and of the model parameters to cluster was measured using a procedure first suggested by Nelken and Versnel (2000) and improved in Rotman et al. (2001). For every pair of penetrations, the topographical distance (*d _{i}*) and the absolute difference in their parameter values (

*p*) were calculated, thus forming a sequence of

_{i}*n*× (

*n*- 1)/2 pairs

*S*= {

*d*,

_{i}*p*}, where

_{i}*n*is the total number of penetrations. Next, the pairs were divided into 10 overlapping subsets

*S*,

_{k}*S*

_{1}⊂

*S*

_{2}⊂... ⊂

*S*

_{10}=

*S*, such that

*S*contains all the pairs for which

_{k}*d*is smaller than or equal to the

_{i}*k*× 10% percentile of all the distances {

*d*}. We denote by {, } pairs such that {, } ∈

_{i}*S*.

_{k}When a map is highly clustered, it is expected that for small values of *k*, {} will be small on the average (except for possible large values that may originate at cluster borders), whereas for large values of *k*, {} will consist of both small and large values. To quantify this, the 80% percentile of {} is calculated for each *k*. The 80% percentile is used because it captures the behavior of the majority of {} but is not corrupted by occasional large values of {} expected at cluster borders. Confidence intervals for this measure were calculated using bootstrap, by randomly distributing the parameter values between the penetrations so that no clustering was possible except by chance.

Statistical tests are considered as significant for *P* < 0.05. In some cases, exact *P* values are reported as a rough indication for the strength of the results.

## RESULTS

### Responses to tone bursts and comparison with the 1D model

The main new features of the 2D model, which differentiate it from the 1D model, are the adjustable parameters of the model's STRF, α and σ_{f}. For the STRF in its basic form (α = 0) the responses of the 2 models to any stimulus with stationary spectra are comparable. This implies that the 2D model is able to reproduce the responses of the 1D model to amplitude transitions for STRF with α = 0. Because we use a pseudorotation that maintains the temporal characteristics at each frequency channel of the STRF (see *Eq. 4*), the 2D model reproduces the responses of the 1D model to amplitude transitions of spectrotemporally separable sounds for any value of α.

### Responses to frequency sweeps

The model is capable of reproducing many of the physiological characteristics of responses of AI neurons to linear and exponential FM sweeps. In particular, the model is capable of showing various degrees of sensitivity to the velocity and to the direction of the FM sweep.

Figure 2 illustrates the way the model responds to FM sweeps and the effect of the STRF on the directional sensitivity of the model. The frequency trajectory of an upward N-log FM sweep is shown in Fig. 2*A*. For clarity, we consider a simplified configuration of the model, which consists of 2 NR filters (Fig. 2*B*), 2 frequencies by 3 delays (2 × 3) frequency-delay layer (Fig. 2*C*) and accordingly a 2 × 3 STRF (Fig. 2*D*). The best frequencies of the lower and upper NR filters are indicated in Fig. 2*A* by the lower and upper dashed lines, respectively. Given that the simplified NR filters are identical except for a translation on a logfrequency axis, their responses to the stimulus are identical except for a time shift that corresponds to the frequency shift between the center frequencies of the 2 filters divided by the FM velocity (Fig. 2*B*).

The simplified frequency-delay layer consists of 2 series of neurons with time constants of 3, 5, and 7 ms (Fig. 2*C*); each series is fed with the output of one NR filter. Figure 2*D* illustrates 3 different STRFs with different α values; α = 0 (STRF1), α > 0 (STRF2), and α < 0 (STRF3). Note that the STRFs for α ≠ 0 are not rotations of the STRF with α = 0: such rotations would cause the weights in all the “corners” of STRF2 and STRF3 to be nonzero. Instead, the weights for the 2 frequency channels are shifted with respect to each other, such that for α > 0, higher-frequency channels have weights shifted to larger delay values, whereas for α < 0, higherfrequency channels have weights shifted to smaller delay values. The responses of neurons at the 2 center frequencies, but with different delays, are identical except for the time shift that is caused by the NR filters. Zero or positive α (STRF1 and STRF2) preserve or enlarge the time shift caused by the upward frequency change of the stimulus. On the other hand, negative α (STRF3) compensates for this time shift, aligning the time derivatives calculated in each frequency series of the delay neurons. This synchronization causes the membrane potential of the edge-detector neuron to peak at a higher value. Thus for negative α, the total area of the membrane potential above a high enough threshold (which is monotonically related to the expected number of spikes) will be larger (Fig. 2*E*). Although negative α generally yields model neurons that are sensitive to upward sweeps (*negative C* according to the definitions used here), the direction sensitivity of the model is a complex function of all parameters and, under some conditions, a downward-sensitive neuron can be generated with negative α. For example, a small value of *D*, the scaling factor of the delay layer saturation transformation (*Eq. 3*), combined with a narrow NR filter bandwidth (*Eq. 2*) and negative α, may yield a very short excitation of the edge detector in response to an upward FM sweep, generating only a small number of spikes. In such cases, a downward FM sweep will cause a more prolonged excitation period. Because of the refractoriness of the edge-detector neuron, such a set of parameters may produce more spikes for a downward than for an upward FM sweep.

An important property of the model, which is illustrated by Fig. 2, is that the detection of frequency and amplitude transitions uses the same basic computation, that is, calculating the time derivative of the envelope of a band-pass filtered version of the input. This computational principle may explain some reported relations between the responses of cortical neurons to frequency transitions and amplitude transitions. Phillips et al. (1985) recorded the responses of cat AI units to narrow excursion FM sweeps. Each sweep consists of a 200-ms tone of a fixed frequency with which the sweep started, a linear FM sweep of 2 kHz, and then a 400-ms tone of a fixed frequency with which the sweep ended. The center frequency of the sweep was varied over a 4-9 kHz range centered at the unit's BF. Phillips et al. report that the directional sensitivity of AI neurons to narrow excursion FM sweeps depends on the center frequency of the sweep. In particular, they found a good congruence between the directional sensitivity, measured at some center frequency, and the slope of the frequency response map, measured at that frequency. Figure 3*A* illustrates this finding, as replotted from Phillips et al. (1985). Figure 3*B* demonstrates that the model reproduces this phenomenon very accurately. In the barbiturate-anesthetized cats, most responses to pure tones occur at sound onset. The model generates these responses as derivatives of the instantaneous amplitude envelope of the tones, and therefore the frequency tuning curve measures the amplitude of the temporal derivatives in each frequency channel. The model accounts for the data of Phillips et al. (1985) by summing up such temporal derivatives in each frequency channel, thus linking the computations underlying the detection of frequency transitions and amplitude transitions in AI neurons.

### Responses to two-tone complexes

The model responses to 2-tone complexes are suppressed when the OFF-BF tone is close to the model's BF, in agreement with the responses of AI neurons to these stimuli. Moreover, the model is able to show a full range of asymmetrical responses with respect to the frequency of the OFF-BF tone. Several factors are believed to contribute to the 2-tone suppression phenomenon along the auditory pathway. At the level of the auditory nerve the 2-tone suppression phenomenon is thought to be related to nonlinear effects of the basilar membrane, whereas at higher levels of the auditory pathway neural inhibitory mechanism between neuron with adjacent BFs are thought to enhance the suppression effect.

The simplified neural representation of the model does not include nonlinearities that can account for 2-tone suppression, and hence the model's 2-tone suppression is formed elsewhere. In fact, the model supplies an alternative explanation to the enhancement of the 2-tone suppression by higher levels of the auditory pathways. Figure 4 illustrates the way the model gives rise to the 2-tone suppression phenomenon. Figure 4, *A, B*, and *C* sketch 3 different scenarios of 2-tone stimuli used by Shamma and his collaborates (Kowalski et al. 1995; Shamma et al. 1993) and by Nelken and Versnel (2000). In all 3 cases the BF tone (T2) starts 50 ms after the OFF-BF tone (T1). The frequency of T1 is far from BF in Fig. 4*A*, close to BF in Fig. 4*B*, and on BF in Fig. 4*C*. For simplicity, we consider a simplified configuration of the model in which the only significant input to the model comes from the NR filter at the model *MF* (very small σ_{f}; see *Eq. 4*). The output of this NR filter to the 3 different 2-tone combinations is plotted in Fig. 4, *D, E*, and *F*. In Fig. 4*D* the T1 tone has no effect on the output of the NR filter because its frequency is far from BF. The response of the model is therefore attributed only to the delayed T2 tone. When the frequency of T1 is near (but not at) BF, its effect on the NR filter output is larger but still attenuated (Fig. 4*E*). However, because of the log compression of the neural representation the amplitude envelope of the combined tones is almost the same as of that of T2 when presented alone. After the temporal differentiation performed by the STRF, the membrane potential of the edge-detection neuron will show 2 positive deflections (Fig. 4*H*), each smaller than the deflection caused by T2 alone (Fig. 4*G*). When the frequency of T1 is at BF, it is not attenuated at all by the NR filter, and because of the log compression of the NR, T2 has almost no effect on the output of the NR filter (Fig. 4*F*). This causes the model responses in this case (Fig. 4*I*) to be dominated by the responses to the T1 tone. These are similar to the model responses when the T1 frequency is far from BF (Fig. 4*G*), except for an advance in latency of 50 ms. Figure 4*J* shows a schematic diagram, replotted from Shamma et al. (1993), that shows typical responses of an AI neuron to such 2-tone stimuli. The recorded spikes are attributed to T2 or T1 according to their latencies (thin and thick lines, respectively). Taking into consideration that the simulated spikes are proportional to the integrated membrane potential of the edge-detector neuron above some positive threshold level (dashed line in Fig. 4 *G*-*I*), it is clear that the model follows very closely both the timing and the strength of responses of AI neurons to 2-tone complexes (compare the simulated responses to T2 and T1 in Fig. 4, *G*-*I* with the experimental responses at T1 frequencies pointed by the arrows in Fig. 4*J*). The description of the model responses to 2-tone complexes becomes more complicated when the STRF integrates NR filters other than the *MF* filter. In these cases, α plays a role in determining the asymmetry of the suppression with respect to T2 frequency, by synchronizing responses across NR filters in a way similar to the effect α has on the directional sensitivity of the model (Fig. 2).

### Evaluation of the model: fit of AI multiunit data

Nelken and Versnel (2000) recorded a total of 212 multiunit clusters in the AI of 6 ferrets. For the purpose of fitting the model responses to the cluster responses we considered only clusters whose responses to a sufficient stimuli set were recorded (single tones, 2-tone complexes, and at least one pair out of the 3 FM stimulation paradigms). After this screening process, we discarded data from 3 ferrets with <10 multiunit clusters per ferret. This left a total of 74 multiunit clusters from 3 ferrets (*experiments 162, 166*, and *168* of Nelken and Versnel 2000). The responses of each multiunit cluster to the entire set of stimuli (see methods) were fitted by adjusting the 4 adjustable parameters of the model. Figure 5 demonstrates the agreement between the experimental responses of one AI cluster and the simulated responses of the model with one set of parameters that best fitted that cluster. Although there may be differences in the overall strength of response between the experimental and simulated responses (Fig. 5*A*), the model follows the general pattern of the experimental responses and matches reasonably well the experimental response indices that summarized them (*M, C*, and *V*).

Figure 6 shows the ability of the model to fit the 74 multiunit clusters by comparing the simulated versus experimental response indices (*M, C*, and *V*) that summarize the responses to 2-tone complexes and to linear and logarithmic FM sweeps. The vertical error bars represent the SD of the experimentally derived indices as estimated by bootstrap (see methods). The SD for the simulated data are not shown because they are arbitrarily determined by the integration noise of the edgedetector neuron (Fig. 1*E*). The model is able to fit well all of the experimental indices except for the linear velocity sensitivity (Fig. 6*D*), for which the model shows a small but consistent undershoot.

The distribution of the 74 values of each of the 4 model parameters used to fit the 74 multiunit clusters is shown in Fig. 7. The histograms of σ_{f}, *D*, and *T* (Fig. 7, *A, C*, and *D*, respectively) are plotted on a log-scale, whereas Fig. 7*B* plots the histogram of the absolute values of α. The circles in each plot indicate the parameter values that divide each distribution to groups of equal size. The use of these values is described below. The σ_{f} and α parameters, describing the shape of the STRF, are unimodally distributed. The parameter *T*, which determines the frequency selectivity of the NR filters, is clearly bimodally distributed. The parameter *D*, which determines the saturation level of the delay neurons, also shows a tendency for 2 modes. The 4 adjustable parameters are pairwise uncorrelated except for *T* and σ_{f}, which are weakly correlated on a log- log scale (*r*_{72} = 0.26, *P* < 0.05).

### Correlation between response indices: experimental versus simulated data

Shamma et al. (1993) and Kowalski et al. (1995) recorded AI units in barbiturate-anesthetized ferrets and showed a weak but significant positive correlation between the *M* and the *C* response indices using 2-tone complexes and FM sweeps following the LH-log paradigm as defined by Nelken and Versnel (2000) (Fig. 8, *A* and *B*, respectively). Nelken and Versnel (2000) also reported a significant correlation between the *M* index and the *C* index measured using the LH-log paradigm, but this correlation was absent for *C* indices computed using the other FM paradigms. For the subset of experimental data used here the correlation between *M* and *C-log* is significant (*r*_{72} = 0.35, *P* < 0.0023), but the correlation was not signifi-cant for *C-lin*, reproducing the same pattern as that of the full data. The simulated response indices show a small but significant positive correlation for *C-log* (*r*_{72} = 0.39, *P* < 0.0006) and a weaker, although still significant correlation, for *C-lin* (*r*_{63} = 0.28, *P* < 0.05).

Kowalski et al. (1995) recorded from the ferret AI and anterior field (AAF) and tested the relations between the unit excitatory bandwidth at a level of 20 dB above threshold (*BW20*) and the velocity sensitivity of the unit to logarithmic FM sweeps. Kowalski et al. (1995) found a significant correlation between these indices at the AAF but not at the AI (Fig. 8, *D* and *E*). The scatter plots relating BW20 and V-log had a wedge shape, so that in both cases *BW20* had a limiting effect on *V-log*: at higher values of *BW20* the dynamic range of *V-log* is smaller than at lower values of *BW20*. A similar phenomenon was reported by Mendelson et al. (1993) in cat AI and by Heil et al. (1992a) in Field L of chicks, and was found in the subset of experimental data used here as well as in the simulated data.

We tested whether these relationships between the response indices can be attributed to the model, or whether these correlations are attributable to some other mechanism, and the model reproduces these phenomena by merely adjusting its parameters to best match the experimental results. For this purpose we selected representative values of the 4 adjustable parameters of the model (6 values for σ_{f}, 6 for α, 5 for *D*, and 4 for *T*) and constructed a canonical set of 720 model neurons by using all the possible combinations of these values. The representative values of each parameter were chosen to divide the parameter distribution (as a result of the fitting procedure of the 74 multiunit clusters) to sets of equal size (see Fig. 7). Each model neuron in the canonical set was presented with the entire set of stimuli and the response indices were calculated as described in METHODS. This method enables us to test the properties of the model while minimizing the influence of the particular choice of parameters and maintaining a reasonable distribution of the parameters. Figure 8, *C* and *F* show the scatter plot of the *M* versus *C-log* and of the *BW20* versus *V-log* response indices, respectively, for the canonical set. Clearly, the model reproduces the experimental phenomena regardless of the specific choice of parameters. Note that 118 out of the 720 parameter combinations in the canonical set yielded model neurons that had extremely narrow excitatory bandwidth and extreme asymmetry values (marked as crosses in Fig. 8, *C* and *F*). The significant correlations between the response indices in the canonical set was preserved when these nonphysiologically parameter combinations were excluded from the analysis.

The canonical set reproduced even finer features of the correlation structure of the experimental data. For example, when the canonical set was divided into 2 groups of equal size according to their velocity sensitivity *V-log*, the correlation between the *M* and *C-log* indices was higher for the group with the higher *V-log* values (*r*_{592} = 0.3 and *r*_{704} = 0.71, respectively). The same phenomenon held for the experimental response indices; the correlation between *M* and *C-log* for the low *V-log* group was relatively low (*r*_{35} = 0.27) and nonsignificant, whereas for the high *V-log* group the correlation was higher and significant (*r*_{35} = 0.47, *P* < 0.0018). It can be concluded that the model parameters form a correlation-free representation of the experimentally measured representation.

### Topographical organization: response indices versus model parameters

One of the main results of Nelken and Versnel (2000) is that the functional maps of the *C* index and to a smaller extent the *V* index are sensitive to the fine details of the FM stimulus, and that in many cases these maps do not show significant clustering. Because each multiunit cluster can be described not only by the experimental *C, V*, and *M* values, but also by the values of the 4 model parameters that best approximate it, we could plot topographical maps of the model parameters and compare them with the experimentally measured ones. Figures 9 and 10 show functional maps that are derived from the 2 experiments (*162* and *166*, respectively) for which the largest number of multiunit clusters were fitted by the model. Figure 9 compares the functional maps of the pseudo-rotation parameter α with the functional maps for the 3 experimental indices that are most correlated with it, *M, C-log*, and *C-lin*, as derived from *experiment 162*.

As already noted by Nelken and Versnel (2000), the maps for the directional sensitivity are sensitive to the stimulation paradigm. As a result, the *C-log* and *C-lin* functional maps are different from each other. Furthermore, the functional maps derived from these experimentally measured parameters did not show smooth topographical organization, as confirmed by the statistical clustering analysis described in METHODS. These results confirm the conclusions of Nelken and Versnel (2000). In contrast, the functional map derived from the rotation parameter of the model as fitted to each cluster shows a signifi-cant clustering (*P* < 0.05) along the cortical surface.

Figure 10 shows similar results for *experiment 166*. The functional maps derived from the *C-log* and *C-lin* indices differ, although sharing some common features (low *C* values at the posterior part and at the anterior-lateral edge of the mapped region). In contrast, the functional maps derived from the *V-log* and *V-lin* indices show much greater similarity, a phenomenon previously described by Nelken and Versnel (2000). None of the functional maps derived from the *M, C*, and *V* indices was significantly clustered at a significance level of 0.05; however, the *C-lin* and *M* maps showed borderline clustering (*P* < 0.075). As in *experiment 162*, the functional maps derived from the model parameters fitted to each cluster are much more organized. In fact, 3 (α, σ_{f}, and *D*) out of the 4 parameters of the model are significantly clustered on the cortical surface.

Figure 11 plots the probability of a map to be clustered by chance for the experimental and simulated response indices (derived for each location from the model fitted to the responses at that location) as well as for the model parameters and the linearly approximated model parameters (the latter are described below). The *P* values, calculated using the procedure described in METHODS, are also presented in Table 1. Figure 11 and Table 1 demonstrate that the number of significantly clustered maps for the model parameters outnumbers the signifi-cantly clustered maps induced by any of the other indices or parameters. In total, only 2 out of 15 maps computed for the 5 experimental response indices (*C-log, C-lin, V-log, V-lin*, and *M*) in the 3 experiments are significantly clustered (*V-lin* of *experiment 162*, and *M* of *experiment 168*). For the simulated response indices, 5 of the 15 maps are significantly clustered. Note that the simulated indices are already “smoothed” in the sense that all the data recorded from each cluster are considered in their calculation. The highest number of clustered maps, however, was found among the maps derived from model parameters: 8 out of the 12 maps computed for the 4 model parameters are significantly clustered.

The model parameters are the result of a nonlinear transformation from the response space to the model parameter space, which, as described in the previous section, decorrelates the experimental indices. The lack of analytical expression for this transformation impairs the ability to track down the origin of the improved topographical organization of the model parameters. In principle, there are 3 explanations as to why the model parameters show more clustering than the experimentally derived indices: across-cluster smoothing; within-cluster smoothing; or some real physiological process that is captured by the model and is not captured well by any of the experimentally derived indices.

The improved topographical organization of the model parameters with respect to the experimental indices does not arise from smoothing of the parameters across neighboring multiunit clusters. This is attributed the fact that the parameters of the model were separately fitted to the experimental data of each multiunit cluster, without any information regarding the topographical location of the cluster or its neighboring clusters.

The second explanation for the improved topographical organization of the model parameters is that the experimentally derived indices are in fact topographically organized, but noisy, resulting in an apparent loss of order. According to this explanation, the fitting procedure smooths the experimental response indices *within* each cluster (e.g., by averaging the *C* and *V* indices across the logarithmic and linear paradigms), causing the simulated response indices to become more topographically organized than the original response indices. Although the simulated response indices generally showed a more ordered topographical distribution than the experimental indices, this explanation is probably incomplete given that the topographical organization of the simulated response indices was worse than that of the model parameters (Fig. 11 and Table 1). To study this question further, we approximated the model parameters (fitted using the nonlinear procedure described above) using a linear regression on the measured response indices (*M, C-log, C-lin, V-log, V-lin*), resulting in a global linear approximation to the nonlinear fitting procedure. Withincluster smoothing would predict that the resulting “linearized” model parameters would also show significant clustering. The model parameters, fitted by the nonlinear procedure, were only moderately correlated with the linearized model parameters (α, *r*^{2} = 0.484; σ_{f}, *r*^{2} = 0.11; *D, r*^{2} = 0.357; *T, r*^{2} = 0.129). Most important, the linearized model parameters failed to produce significantly clustered maps (Fig. 11 and Table 1).

Table 1 also shows that the experiment for which only one model parameter is significantly clustered (*experiment 162*) is the experiment for which the model fits the experimental data less well than for the other 2 experiments. The fit error for the multiunit clusters of *experiment 162* are significantly higher than those of *experiment 166* (*t*_{55} = 2.27, *P* < 0.015) and from those of *experiment 168* (*t*_{34} = 3.05, *P* < 0.0022). The fit errors for the multiunit clusters of *experiments 166* and *168* do not differ significantly. Notably, *experiment 162* is the only experiment out of the 3 reported here that was not tested with the N-log and N-lin paradigms by Nelken and Versnel (2000). Therefore for all the multiunit clusters of *experiment 162* the model responses were matched with the neural responses for the upward portion of the LH stimulus and with the responses to the downward portion of the HL stimulus, as described in methods. The only difference between the N-log/N-lin paradigms and the first half of the LH/HL paradigms is the duration of the fixed tone at the beginning of the stimulus. For the N-log/N-lin this duration is fixed to 10 ms, whereas for the LH-lin/HL-lin it is set to 75 ms and for the LH-log/HL-log the duration varies in the range of 10-90 ms in accordance with the sweep velocity. As noted by Nelken and Versnel (2000), this difference by itself already gave rise to a measurable amount of adaptation in the neural responses to the following frequency sweep.

It is possible that the temporal proximity between the beginning of the stimulus and the start of the FM sweep makes the N-log/N-lin paradigms more suitable for the model fitting, in that the model lacks components that can account for adaptation effects. Therefore it is possible that the lack of N-log/N-lin data for *experiment 162* accounts for the modest ability of the model to fit the responses and to show significant parameter clustering for that experiment.

## DISCUSSION

We present in the current study a neural model for the detection of frequency and amplitude transitions. The basic computational operation of the model is a delayed integration of the amplitude time-derivative information across neighborfrequency channels. This computation is carried out by applying a rotatable 2D receptive field to a spectrotemporal representation of the acoustic information. The spectrotemporal representation is spanned by an array of neurons that differ in their best frequencies and temporal delays. We show that the model is able to reproduce the responses of AI neurons to single tones, 2-tone complexes, narrow-excursion FM transitions, and linear and logarithmic FM sweeps, as recorded from the barbiturate-anesthetized cat and ferret.

Although the model shows a good agreement with the data analyzed here, there are some aspects of the responses of cortical neurons that the model does not account for. Specifi-cally, AI neurons have longer time constants, as expressed for example in the relatively slow dynamics of STRFs measured using reverse correlation or moving ripple spectra (10-20 Hz; Depireux et al. 2001; Linden et al. 2003; Miller et al. 2002; Schnupp et al. 2001), or as expressed in the dependency of AI responses on the stimulus history at a time scale of about ±1 s (Brosch and Schreiner 1997; Calford and Semple 1995; Heil et al. 1992a; Malone et al. 2002; Nelken and Versnel 2000; Ulanovsky et al. 2003). These phenomena cannot be reproduced in the model because the model integrates acoustic input over a time window of ≤10 ms. Furthermore, to keep the model as simple as possible the model components are I&F neurons that lack mechanisms such as synaptic depression and facilitation. As a result, the responses of the model neurons depend on a relatively short stimulus history of about 10 ms. The success of the model as it stands suggest that multiple time constants coexist in cortical neurons. Whether including mechanisms with longer time constants, such as synaptic depression, to the model will adequately address this problem remains an open question (Ulanovsky et al. 2003).

Part of the motivation for our study stemmed from the assumption that transient information may be used by the auditory system as important cues for the task of primary segmentation of complex auditory scenes (Fishbach et al. 2001). Some psychoacoustical studies demonstrate that auditory perception is sensitive to the gradient of amplitude transitions and that a larger gradient enables easier separation of auditory components (Bregman et al. 1994; Turner et al. 1994; also see reanalysis of these studies in Fishbach et al. 2001). We presented in our previous study a neural model for the detection of auditory temporal transitions (edges) and demonstrated that the model is able to reproduce and predict various physiological and psychoacoustical responses to amplitude transitions.

The application of a rotatable receptive field to a 2D spectrotemporal auditory space, as presented in this study, is a natural generalization of the previously suggested temporal model. The 2D model studied here preserves the temporal properties of the 1D model and therefore it suggests a common unifying mechanism for a vast physiological and some psychoacoustical responses to acoustic transitions.

Although the model is abstract, all of its components are physiologically plausible. Even though the existence of a neural frequency-delay layer is not well substantiated, there is evidence that validates the use of such an auditory mechanism. Hattori and Suga (1997) report that the latency (ranging from 4 to 12 ms) of neurons in the IC of unanesthetized mustached bats is topographically organized orthogonally to the tonotopic organization of the IC, forming a frequency versus latency map. A similar organization of onset latencies in the cat IC was reported by Schreiner and Langner (1988b). Both the range of values and the topographic organization of the latency in the bat and in the cat IC are consistent with the model's frequencydelay layer. However, more research is needed to establish a direct link between these findings and the proposed model.

### Relationships to STRF models

During recent years, a number of methods have been proposed for constructing spectrotemporal receptive fields of auditory neurons, using synthetic (e.g., deCharms et al. 1998; Depireux et al. 2001; Miller et al. 2002) and natural (Sen et al. 2001; Theunissen et al. 2000) sounds. The STRF obtained using these techniques is the best linear model that approximates the responses of a neuron to the stimulus ensemble used for the estimation process. It is important to note that the STRF estimated using these techniques is very different from the STRF used in our model. The STRF in our model is being applied to the model's spectrotemporal representation of the acoustic stimulus. This representation is spanned by the frequency-delay neurons that have adjustable bandwidth and their output undergoes 2 compressive nonlinearities (*Eqs. 2* and *3*). Because of these nonlinearities, an STRF of a synthetic neuron simulated by the model, measured using the above-mentioned techniques, will not necessarily coincide with the model's STRF as represented by the weights of the frequency-delay layer (Fig. 1*D* and *Eq. 4*). Moreover, the STRFs used in our study constitute a restricted subset of all possible STRFs, and can be characterized by only 2 free parameters (*Eq. 4*). Although this probably constrains the ability of the model to fully describe the properties of a neuron, it allows a more robust characterization of part of its spectrotemporal properties based on a limited set of stimuli.

These differences between the model's STRF and a linearly estimated STRF have an important implication to the ability of the model to replicate experimental data. For example, the model is able to demonstrate sensitivity to very fast frequency transitions (>150 oct/s), which is common to many AI neurons (Kowalski et al. 1995; Nelken and Versnel 2000; and see Fig. 6*C*). The ability of the model to respond optimally to frequency transitions of such a high velocity is partly attributed to the relatively short time constants (3-9.8 ms; see the appendix) that are used to span the frequencydelay layer; however, the fast dynamics of the model are further boosted by the nonlinearities the delay layer undergoes. This property of the model is illustrated in Fig. 12*A*, which plots the average velocity sensitivity of the model (*V-log*) as a function of the filter bandwidth (*T*) and saturation scaling (*D*) parameters of the model's delay layer. For this plot we use the data of the 720 model neurons in the canonical set we have constructed using representative values for each of the 4 model parameters (see Figs. 7 and 8). Therefore each data point in Fig. 12*A* is the averaged *V-log* of the same 36 configurations of the model's STRF (6 values for α by 6 values for σ_{f}). Despite the fact that the STRF configurations at each data point are the same, the model's velocity sensitivity changes systematically as a function of the delay-layer filter bandwidth (*T*) and saturation scaling (*D*) parameters. Note that the average velocity sensitivity increases as the saturation transformation becomes steeper (smaller values of *D*), which implies that in the absence of this nonlinear operator the average velocity sensitivity of the model would be lower.

Typically, the STRFs estimated for AI neurons show much more sluggish characteristics than what is needed to produce sensitivity to high-velocity transitions (e.g., deCharms et al. 1998; Depireux et al. 2001; Miller et al. 2001). To verify this qualitative observation, we used 3 of the STRFs displayed in Depireux et al. (2001) and calculated their response to the stimuli used by Nelken and Versnel (2000). The STRFs and their corresponding velocity sensitivity (*V-log*) are shown in Fig. 12, *B*-*D*. The velocity sensitivity for these STRFs is indeed much lower than the typical values estimated for AI neurons. A similar observation was made by Bar-Yosef et al. (2002), who showed that AI neurons in the cat are sensitive to fluctuations in stimulus envelopes on a time scale that is much faster than predicted by most STRF studies.

### The model does more than fitting the data

The fact that the model is able to fit the responses of AI neurons to a variety of stimuli suggests that the model may capture some fundamental aspect of the computation that is being performed by neurons in the primary auditory cortex. Moreover, we show that a correlation-free combination of the model parameters reproduces the correlations between the experimental response indices, as reported in several studies. Thus these correlations are being reproduced by the intrinsic properties of the model and not by a particular choice of parameters. These findings support the claim that the model parameters better characterize the recorded neurons than the response indices that are extracted from the responses to a single type of stimulus. In this respect, the model performs a nonlinear transformation of the neural characteristics from a nonorthogonal experimentally measured coordinate system to a much more orthogonal coordinate system of the underlying computation.

The concept of a model-based characterization of the recorded neurons is further bolstered by the clustered topographical organization of the model parameters along the cortical surface. In 2 out of the 3 experiments we have analyzed, 7 out of the 8 possible maps for the 4 model parameters show a significant topographical clustering. In the other experiment (*162*), only the pseudo-rotation parameter was found to be significantly clustered. Interestingly, the quality of the model fit to the responses recorded from that experiment was significantly lower than that for the other 2 experiments. The relation between the quality of the fit and the quality of the topographical organization of the model parameters is not trivial, given that the experimental response indices showed equally weak topographical clustering in all 3 experiments. Also, as we already noted, the model parameters were adjusted separately to fit the responses of each multiunit cluster, without any information regarding its topographical location, nor the properties of its neighbors.

Primary sensory areas in the cortex of all mammals investigated to date demonstrated topographically ordered organization (Kaas 1997). Across the cortical surface, neurons that share a topographical proximity tend to have similar values of some functional properties. In many cases the sensory cortical maps relate to some dimension of the receptor organ. However, there are examples for cortical maps of computational properties (e.g., Bonhoeffer and Grinvald 1991; Kaas et al. 1979). This in mind, the fact that the model parameters reveal topographical organization that is substantially better than the topographical organization of the indices extracted from the responses the model fits strongly support the claim that the model parameters more fully and more simply characterize the recorded neurons than the experimental response indices. Thus we suggest that the model reflects some essential aspects of the computation performed by neurons in primary auditory cortex.

## APPENDIX: MATHEMATICAL EQUATIONS AND PARAMETERS OF THE MODEL

### Neural representation

The neural representation consists of a bank of 81 band-pass filters. The output of the *i*th filter is given by A1 where *E _{k}* and

*F*are the instantaneous amplitude level (in dB SPL) and the frequency of each of the

_{k}*n*input sinusoidals.

*MF*is the middle frequency of the model, τ

_{m}is set to 1 ms, and

*T*is one of the 4 adjustable parameters of the model.

### Delay layer

The operation of each unit *U _{i,j}*(

*t*) of the delay layer on its input

*N*(

_{i}*t*) is given by A2 In our simulations we used 69 τ

_{j}values equally spaced between 3 and 9.8 ms. The output of the units is saturated using the following sigmoidal transformation A3 where

*F*

_{max}is set to 225 spines/s and

*D*is one of the adjustable parameters of the model.

### Receptive field

The delay layer units are connected to a single neuron. The neuron's input *I*(*t*) is given by A4 where A5 and where μ_{τ} is the mean delay of the receptive field, set to 6.4 ms; σ_{τ} is the width of the delay time window, set to 0.2 ms; α, the pseudo-rotation parameter, and σ_{f}, the frequency integration bandwidth, are 2 of the 4 adjustable parameters of the model.

### Edge-detection neuron

The neuron is modeled as a simple leaky integrator with a voltage threshold (*T*), an absolute refractory period (δ^{abs} = 1 ms), and a refractoriness function A6 with a constant *K* → ∞, λ = 1.5 ms, and where *S*(*t*) is the positive step function (Gerstner 1999a).

The membrane potential *M*(*t*) of the neuron is given by A7 where {*s*_{1},..., *s _{n}*} represents the set of firing times of the neuron, the membrane time constant τ

_{ed}is set to 3 ms, and ξ(

*x*) is a random Gaussian noise with a zero mean and SD σ = 0.2

*T*.

## DISCLOSURES

This research was supported by a grant from the Human Frontiers Science Program.

## Acknowledgments

The authors thank S. Shamma in whose laboratory the experiments were conducted and H. Versnel who participated in the data collection and commented on a previous version of this manuscript.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “

*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2003 by the American Physiological Society