To understand the neural representation of broadband, dynamic sounds in primary auditory cortex (AI), we characterize responses using the spectro-temporal response field (STRF). The STRF describes, predicts, and fully characterizes the linear dynamics of neurons in response to sounds with rich spectro-temporal envelopes. It is computed from the responses to elementary “ripples,” a family of sounds with drifting sinusoidal spectral envelopes. The collection of responses to all elementary ripples is the spectro-temporal transfer function. The complex spectro-temporal envelope of any broadband, dynamic sound can be expressed as the linear sum of individual ripples. Previous experiments using ripples with downward drifting spectra suggested that the transfer function is separable, i.e., it is reducible into a product of purely temporal and purely spectral functions. Here we measure the responses to upward and downward drifting ripples, assuming separability within each direction, to determine if the total bidirectional transfer function is fully separable. In general, the combined transfer function for the two directions is not symmetric, and hence units in AI are not, in general, fully separable. Consequently, many AI units have complex response properties such as sensitivity to direction of motion, though most inseparable units are not strongly directionally selective. We show that for most neurons, the lack of full separability stems from differences between the upward and downward spectral cross-sections but not from the temporal cross-sections; this places strong constraints on the neural inputs of these AI units.
Only a few general organizational features are known in primary auditory cortex (AI). They include a spatially ordered tonotopic axis (Evans et al. 1965), bands of alternating binaural response properties (Imig and Adrian 1977; Middlebrooks et al. 1980), and a variety of other response features that change systematically along the isofrequency planes such as thresholds (Heil et al. 1994; Schreiner et al. 1992), bandwidths (Schreiner and Sutter 1992), FM selectivity (Heil et al. 1992; Mendelson et al. 1993; Shamma et al. 1993), and asymmetry of response areas (RAs; the span of frequencies that influence, both through excitation and inhibition, the response of a cell) (Shamma et al. 1993). To derive a functionally coherent picture of these maps, it is necessary to integrate these features within a comprehensive descriptor of the unit responses, one that can be quantitatively derived and employed to predict responses to novel stimuli.
Traditionally measured response areas are inadequate because they rarely include response dynamics and cannot be used to predict responses quantitatively. An alternative is the response field (RF) (Schreiner and Calhoun 1994; Shamma et al. 1995), a static, purely spectral function analogous to the RA except for the use of broadband sounds (but see Nelken et al. 1994; Sutter et al. 1996). A dynamic generalization of the RF is the spectro-temporal response field (STRF), a characteristic function of a neuron obtained using broadband sounds (Aertsen and Johannesma 1981; deCharms et al. 1998; Eggermont 1993 and references therein; Escabi and Schreiner 1999; Kowalski et al. 1996a; Kvale and Schreiner 1995; Theunissen et al. 2000). A schematic of an idealized STRF is illustrated in Fig. 1. Qualitatively, its spectral axis reflects the range of frequencies that influence the response or firing rate of the neuron being characterized, and its temporal axis reflects how this influence changes as a function of time. Positive-valued regions of the STRF describe excitatory influence, and negative regions describe inhibitory influence. The interplay between the spectral and temporal axes allows multiple interpretations of the STRF, e.g., as a time-evolving spectral response field or as a family of impulse responses labeled by frequency band.
Over the last few years, we have developed new methods to derive the STRFs and characterize the responses of both single and multiple units in the ferret AI (Kowalski et al. 1996a,b). These methods use “moving ripples”: time-varying broadband sounds with sinusoidal spectral envelopes that drift at a constant velocity along the logarithmic frequency axis. Figure 2 illustrates the spectrogram of such a stimulus. Neuronal responses are vigorous and well phase-locked to these spectral and temporal envelope modulations over a range of ripple velocities and densities. Measuring the amplitude and phase of the locked component of the response enables one to construct transfer functions. A transfer function can be inverse-Fourier transformed to obtain the STRF that characterizes a unit's dynamics and selectivity along the tonotopic axis.
In developing these measurement and analysis methods, we use two fundamental assumptions. The first is that the responses are substantially linear with respect to the time-varying spectral envelope of stimuli. In particular, this implies that the response to a spectro-temporally rich stimulus, whose envelope can always be described as the sum of multiple moving ripples, will be the sum of the responses to the individual ripple components. This assumption was confirmed by successfully predicting responses to the superposition of multiple ripples (Kowalski et al. 1996b).
The second important assumption deals with the separability of the temporal and spectral aspects of the responses. Specifically we have demonstrated in other reports that temporal and spectral transfer functions can be measured independently of each other and then combined with a simple product to compute the total transfer function (Kowalski et al. 1996a). The importance of this finding stems from its experimental implications for measuring the STRFs and theoretical consequences for the biophysical and functional models of the STRFs. On the experimental side, separability makes it possible to infer responses to all ripple velocities and peak densities based on only a pair of temporal and spectral transfer functions. Without this assumption, measuring the two-dimensional transfer function is difficult because of the extended times needed to collect adequate spike counts. On the theoretical side, separability suggests that certain features of the STRF (as we shall discuss in detail in the following text) are formed by independent (and likely sequential) spectral and temporal processing stages.
In our earlier study (Kowalski et al. 1996a), separability was validated for ripples moving only in one direction (spectral envelope moving downward in frequency), a notion also known as “quadrant separability.” In this report, we compare the separable functions (spectral and temporal) across the upward and downward quadrants. If the functions are the same across quadrants, the responses are “fully separable” (i.e., separable for both directions of ripple motion); otherwise they are only quadrant separable, which is a (specialized) form of inseparability.
Like quadrant separability, full separability has experimental and theoretical implications. On the experimental side, fully separable STRFs can be measured with either upward or downward moving ripples. Theoretically, fully separable responses imply an STRF that is fully decomposable into the product of a purely temporal impulse response and a purely spectral response field. They also imply a unit that responds equally well to upward and downward moving ripples and hence necessarily has a transfer function magnitude that is symmetric with respect to direction (Watson and Ahumada 1985). By contrast, cells that are only quadrant separable necessarily respond in an asymmetric fashion with respect to direction, i.e., are direction sensitive.
We restrict our presentation in this paper to measurements with singly presented moving ripples in contrast to simultaneously presented ripples discussed in Klein et al. (2000).
There are several goals of this paper. We present a method of measuring the complete descriptor of the linear spectro-temporal properties of an auditory cell, the STRF. We describe examples of STRFs measured in AI and summarize the distribution of the STRF and transfer function parameters encountered. We show that there is a directional sensitivity in the response to the upward versus downward moving components of a sound's spectral envelope. This breaks the symmetry of full spectro-temporal separability and produces quadrant separability. We propose measures to quantify quadrant and full separability. Finally, we discuss the significance of the results and their relationship to results from similar auditory and analogous visual experimental paradigms.
Surgery and animal preparation
Data were collected from a total of 11 domestic ferrets (Mustela putorius) supplied by Marshall Farms (Rochester, NY). The ferrets were anesthetized with pentobarbital sodium (40 mg/kg) and maintained under deep anesthesia during the surgery. Once the recording session started, a combination of ketamine (8 mg · kg−1 · h−1), xylazine (1.6 mg · kg−1 · h−1), atropine (10 μg · kg−1 · h−1), and dexamethasone (40 μg · kg−1 · h−1) was given throughout the experiment by continuous intravenous infusion, together with dextrose, 5% in Ringer solution, at a rate of 1 ml · kg−1 · h−1 to maintain metabolic stability. The ectosylvian gyrus, which includes the primary auditory cortex, was exposed by craniotomy and the dura was reflected. The contralateral ear canal was exposed and partly resected, and a cone-shaped speculum containing a miniature speaker (Sony MDR-E464) was sutured to the meatal stump. For more details on the surgery, see Shamma et al. (1993).
Action potentials from single units were recorded using glass-insulated tungsten microelectrodes with 5–7 MΩ tip impedance at 1 kHz. Neural signals were fed through a window discriminator, and the time of spike occurrence relative to stimulus delivery was stored using a computer. In each animal, electrode penetrations were made orthogonal to the cortical surface. In each penetration, cells were typically isolated at depths of 350–600 μm corresponding to cortical layers III and IV (Shamma et al. 1993). In many instances, it was difficult to isolate reliably a single unit for extended recordings, and hence several units were recorded instead. Such data were labeled “multiunit recordings” and are explicitly designated as such and separated from the single-unit records in all data presentations in the paper.
All stimuli are computer synthesized. For each unit isolated, initial tests are carried out using tonal stimuli to measure the basic frequency response at several intensities to determine the best frequency (BF) and response threshold. All other stimuli used in these experiments have broadband spectra with a sinusoidally modulated (or rippled) envelope. We used the knowledge of the cell's BF to adjust the frequency range of the broadband sound so that the cell's excitatory and inhibitory regions lay well within the frequency range of the sounds.
In practice, it is hard to generate noise and then shape it with filters to a desired dynamic spectral envelope, so we generate ripples over a range of five octaves by taking logarithmically spaced pure tones with random (temporal) phases. The amplitude S(t, x) of each tone is then

$$S(t, x) = L \cdot \left[1 + \Delta A \cdot \sin\left(2\pi(w \cdot t + \Omega \cdot x) + \Phi\right)\right] \tag{1}$$

where x = log2(f/f0) is the number of octaves above the base frequency f0. The ripple envelope resembles a drifting one-dimensional grating as illustrated in Fig. 2. Five independent parameters characterize the ripple envelope: background level or loudness of the stimulus (L); AM of the ripple (ΔA) in percentage or decibels; ripple velocity (w) in units of cycles/s (or Hz); ripple density (Ω) in units of cycles/octave; and the initial phase of the ripple Φ. The spectra consist either of 20 or 100 tones per octave equally spaced along the logarithmic frequency axis or of tones spaced 1 Hz apart with an amplitude decay producing equal power per octave. The spectra typically span five octaves (e.g., 0.25–8 kHz), with the range chosen such that the response area of the cell tested lay within the stimulus spectrum. The choice of a density of 20 or 100 tones per octave does not alter the cortical responses; hence we do not specify which density was used.
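The synthesis just described can be sketched in Python/NumPy. This is an illustration under stated assumptions, not the stimulus code used in these experiments: it takes the linear-amplitude form of Eq. 1 (ΔA as a fraction, e.g., 0.9 for 90%), a hypothetical base frequency f0 = 250 Hz, 20 tones/octave over five octaves, a 44.1-kHz sample rate, and uniformly random carrier phases. The function names `ripple_envelope` and `ripple_stimulus` are hypothetical, and the 8-ms onset/offset ramps are omitted.

```python
import numpy as np

def ripple_envelope(t, x, w, omega, delta_a=0.9, phase=0.0, level=1.0):
    """Eq. 1 in linear-amplitude form: L * [1 + dA * sin(2*pi*(w*t + Omega*x) + Phi)]."""
    return level * (1.0 + delta_a * np.sin(2 * np.pi * (w * t + omega * x) + phase))

def ripple_stimulus(w, omega, f0=250.0, n_oct=5, tones_per_oct=20,
                    dur=1.0, fs=44100, delta_a=0.9, seed=0):
    """Sum of log-spaced tones with random phases, each scaled by the ripple envelope."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_oct * tones_per_oct + 1) / tones_per_oct   # octaves above f0 (101 tones)
    freqs = f0 * 2.0 ** x                                      # 250 Hz ... 8 kHz
    carrier_phase = rng.uniform(0.0, 2 * np.pi, size=x.size)   # random (temporal) phases
    t = np.arange(int(dur * fs)) / fs
    env = ripple_envelope(t[:, None], x[None, :], w, omega, delta_a)   # shape (time, tone)
    carriers = np.sin(2 * np.pi * freqs[None, :] * t[:, None] + carrier_phase[None, :])
    return t, (env * carriers).sum(axis=1)

# A short downward-moving ripple (w > 0, Omega > 0).
t, s = ripple_stimulus(w=8.0, omega=0.4, dur=0.05)
```

Building the full (time × tone) matrix keeps the code short; for 1- to 1.7-s bursts one would more likely synthesize in chunks to limit memory.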
A single-ripple stimulus at overall level L dB SPL would typically be composed of N logarithmically spaced components, each at L − 10 log10(N) ≈ L − 20 dB for N = 101. The overall stimulus level was chosen on the basis of threshold at BF; typically L was set 10–20 dB above threshold. High levels (L > 70 dB) were avoided to ensure the linearity of our stimulus delivery system. The amplitude of a single ripple was defined as the maximum percentage or logarithmic change in the component amplitudes. Ripple amplitudes were either 90% (linear) or 10 dB (logarithmic) modulations.
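The per-component level follows from the fact that N equal-amplitude components with random phases add in power. A minimal check of the ≈ 20-dB figure (the helper name is hypothetical):

```python
import math

def component_level_db(overall_db, n_components):
    """Level of each of n equal, random-phase components whose powers sum to overall_db."""
    return overall_db - 10.0 * math.log10(n_components)

# 101 components: 20 tones/octave over five octaves, plus the end point.
print(round(component_level_db(60.0, 101), 2))  # 39.96
```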
The ripple velocities w and ripple densities Ω used were determined by the response properties of the neuron, but the typical range was ‖w‖ < 25 Hz (with some units requiring up to 100 Hz) and ‖Ω‖ < 1.6 cycles/octave (with some units requiring up to 4 cycles/octave). Single ripples were always presented with Φ = 0.
By the convention established in Eq. 1, a ripple whose spectral envelope is moving downward in frequency, as in Fig. 2, has positive w and positive Ω; equivalently, it can be described by a ripple with negative w and negative Ω and an added phase shift of π, by Eq. 1 and the identity sin(α) = sin(−α + π). A ripple whose spectral peaks are moving upward in frequency has negative w and positive Ω or, by Eq. 1 and the same identity, positive w, negative Ω, and an added phase shift of π.
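The sign equivalences above can be checked numerically on the modulating part of Eq. 1 (level and depth omitted). The `envelope` helper and the grid values are illustrative only:

```python
import numpy as np

def envelope(t, x, w, omega, phase):
    # Modulating part of Eq. 1: sin(2*pi*(w*t + Omega*x) + Phi)
    return np.sin(2 * np.pi * (w * t + omega * x) + phase)

t = np.linspace(0.0, 0.5, 201)[:, None]   # s
x = np.linspace(0.0, 5.0, 101)[None, :]   # octaves above f0

down = envelope(t, x, 4.0, 0.8, 0.0)           # w > 0, Omega > 0: moves downward
down_alt = envelope(t, x, -4.0, -0.8, np.pi)   # both signs flipped, plus pi
up = envelope(t, x, -4.0, 0.8, 0.0)            # w < 0, Omega > 0: moves upward
up_alt = envelope(t, x, 4.0, -0.8, np.pi)      # both signs flipped, plus pi

print(np.allclose(down, down_alt), np.allclose(up, up_alt))  # True True
```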
The stimulus bursts had an 8-ms rise/fall time and duration of 1.0 or 1.7 s, repeated every 3–4 s. All stimuli were gated and fed through an equalizer into the earphone. Calibration of the sound delivery system (to obtain a flat frequency response up to 20 kHz) was performed in situ with the use of a -in Brüel and Kjaer 4170 probe microphone. The microphone was inserted into the ear canal through the wall of the speculum to within 5 mm of the tympanic membrane. The speculum and microphone setup resembles closely that suggested by Evans (1979).
DEFINING THE STRF.
The fundamental tool for measuring the linearity and separability of primary cortical cells is their STRF, a spectro-temporal function STRF(t, x). The linear response rate y(t) of a cell is related to its STRF(t, x) and to the spectro-temporal envelope of the stimulus S(t, x) by

$$y(t) = \iint S(t - t', x)\, \mathrm{STRF}(t', x)\, dt'\, dx,$$

i.e., convolution along the time dimension t and integration along the spectral dimension x.
The STRF is measured through its two-dimensional Fourier transform, or transfer function T(w, Ω) = ℱwΩ[STRF(t, x)], and then inverse transformed to compute the STRF, where the coordinates dual to t and x are w and Ω, respectively (see Fig. 3). By measuring the sinusoidal component, at temporal frequency w, of the response ywΩ(t) of a cell to a ripple of specific ripple velocity w and ripple density Ω, we can obtain the transfer function T(w, Ω) at one point in w–Ω space (Depireux et al. 1998)

$$y_{w\Omega}(t) = \lVert T(w, \Omega)\rVert \sin\left(2\pi w t + \Phi(w, \Omega)\right), \qquad T(w, \Omega) = \lVert T(w, \Omega)\rVert\, e^{j\Phi(w, \Omega)} \tag{2}$$

This way, we derive the amplitude ‖T(w, Ω)‖ and phase Φ(w, Ω) of the complex transfer function T(w, Ω) by measuring the amplitude and phase of the (real) response of the cell. Note that the use of complex numbers is not theoretically necessary, but it does simplify the calculations in the transfer function space considerably. By the definition of the transfer function, it follows that the inverse Fourier transform of T(w, Ω) is the STRF of the cell

$$\mathrm{STRF}(t, x) = \iint T(w, \Omega)\, e^{j2\pi(wt - \Omega x)}\, dw\, d\Omega \tag{3}$$

Because STRF(t, x) is real but T(w, Ω) is complex, there is complex conjugate symmetry

$$T(-w, -\Omega) = T^{*}(w, \Omega) \tag{4}$$

which also holds for the Fourier transform of any real function of t and x.
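As a consistency check on Eqs. 1–4, the sketch below discretizes the convolution for a hypothetical Gabor-like STRF and confirms that the response to a unit-depth ripple sin(2π(wt + Ωx)) equals ‖T‖ sin(2πwt + Φ), with T computed using the forward kernel e^{−j2π(wt − Ωx)} that pairs with Eq. 3. The STRF shape, the grids, and the function names are illustrative assumptions.

```python
import numpy as np

def ripple_response(strf, t_lag, x, w, omega, t):
    """Discretized y(t) = sum over t', x of S(t - t', x) * STRF(t', x) * dt' * dx, S = unit ripple."""
    dt, dx = t_lag[1] - t_lag[0], x[1] - x[0]
    arg = 2 * np.pi * (w * (t[:, None, None] - t_lag[None, :, None]) + omega * x[None, None, :])
    return (np.sin(arg) * strf[None, :, :]).sum(axis=(1, 2)) * dt * dx

def transfer_value(strf, t_lag, x, w, omega):
    """T(w, Omega) = sum of STRF(t, x) * exp(-j*2*pi*(w*t - Omega*x)) * dt * dx."""
    dt, dx = t_lag[1] - t_lag[0], x[1] - x[0]
    kernel = np.exp(-2j * np.pi * (w * t_lag[:, None] - omega * x[None, :]))
    return (strf * kernel).sum() * dt * dx

# Hypothetical STRF: Gaussian in time and in frequency, with an asymmetric inhibitory sideband.
t_lag = np.arange(0.0, 0.1, 0.001)     # s
x = np.linspace(0.0, 5.0, 51)          # octaves above f0
strf = (np.exp(-((t_lag[:, None] - 0.03) / 0.01) ** 2)
        * np.exp(-((x[None, :] - 2.5) / 0.4) ** 2)
        * np.cos(2 * np.pi * (x[None, :] - 2.3)))

w, omega = 8.0, 0.6
t = np.linspace(0.0, 0.5, 50)
y = ripple_response(strf, t_lag, x, w, omega, t)
T = transfer_value(strf, t_lag, x, w, omega)
print(np.allclose(y, np.abs(T) * np.sin(2 * np.pi * w * t + np.angle(T))))  # True (Eq. 2)
print(np.isclose(transfer_value(strf, t_lag, x, -w, -omega), np.conj(T)))  # True (Eq. 4)
```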
DEFINING AND ASSESSING SEPARABILITY.
Separability is an important property of the transfer functions. A fully separable transfer function is one that factorizes into a function of w and a function of Ω over all quadrants: T(w, Ω) = F(w) · G(Ω). This implies that STRF(t, x) is time-spectrum separable: STRF(t, x) = IR(t) · RF(x). In this case, one need only measure the transfer function for all Ω at a convenient w and for all w at a convenient Ω. F(w) and G(Ω) are each complex-conjugate symmetric [F(−w) = F*(w), G(−Ω) = G*(Ω)] because IR(t) and RF(x) are real, so one need only consider the positive values of each. This dramatically decreases the number of measurements needed to characterize the STRF.
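The factorization can be illustrated numerically with NumPy's 2-D FFT standing in for the transform. NumPy's forward kernel uses e^{−j} on both axes, which flips the sign of Ω relative to Eq. 3 but does not affect separability. The random IR and RF are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ir = rng.standard_normal(32)        # hypothetical temporal impulse response IR(t), 32 lags
rf = rng.standard_normal(16)        # hypothetical spectral response field RF(x), 16 channels
strf = np.outer(ir, rf)             # separable STRF(t, x) = IR(t) * RF(x)

T = np.fft.fft2(strf)               # sampled transfer function on the (w, Omega) grid
F, G = np.fft.fft(ir), np.fft.fft(rf)
sing = np.linalg.svd(T, compute_uv=False)

print(np.allclose(T, np.outer(F, G)))     # True: T(w, Omega) = F(w) * G(Omega)
print(sing[1] / sing[0] < 1e-10)          # True: a single nonzero singular value
print(np.isclose(F[-1], np.conj(F[1])))   # True: F(-w) = F*(w) because IR is real
```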
A transfer function may also be only partially separable, in that it is separable only for ripples moving in a given direction (upward vs. downward). In this case, the transfer function is called quadrant separable and, within each quadrant, can be expressed as the product of a temporal and a spectral function

$$T(w, \Omega) = \begin{cases} F_1(w)\, G_1(\Omega), & w > 0,\ \Omega > 0 \\ F_2(w)\, G_2(\Omega), & w < 0,\ \Omega > 0 \end{cases} \tag{5}$$

where the subscript 1 indicates the w > 0, Ω > 0 quadrant, and the subscript 2 the w < 0, Ω > 0 quadrant (see Fig. 3). Note that by reality of the STRF, the value of the transfer function in quadrants 3 (w < 0, Ω < 0) and 4 (w > 0, Ω < 0) is complex conjugate to the value in quadrants 1 and 2, respectively. In this case, the STRF is not separable in spectrum and time but is the linear superposition of two functions, one with support only in quadrant 1 (and 3) and one with support only in quadrant 2 (and 4).
Separability need not be an all-or-none property but rather can be assessed in a graded fashion. To do so, we apply singular value decomposition (SVD) to the matrix T of measured transfer-function values (Haykin 1996). T can be viewed as a matrix created by sampling the ideal transfer function at regularly spaced discrete values of w and Ω, with random noise added to each sample. SVD decomposes T as

$$\mathbf{T} = \mathbf{U}\,\Lambda\,\mathbf{V}^{\dagger} = \sum_{i=1}^{n} \lambda_i\, \mathbf{u}_i \mathbf{v}_i^{\dagger} \tag{6}$$

Here † denotes the Hermitian transpose, Λ is the diagonal matrix of singular values λ1 ≥ λ2 ≥ … ≥ λn ≥ 0, and U, V are matrices whose columns are the “singular” vectors ui and vi, corresponding to spectral and temporal cross-sections, respectively, of separable transfer functions. Thus the SVD can be viewed as decomposing T into a linear sum of n separable matrices, each the outer product of a spectral and a temporal vector, weighted by the “singular value” λi that measures its contribution to T. Because of the presence of noise in the measurement, the λ's are all expected to be nonzero, with their values decreasing monotonically to a noise floor that depends on the level of the noise.
With respect to this floor, the number of significant singular values depends on the nature of the measured transfer function T. The closer T is to being separable, the more dominant the first singular value λ1 will be over its counterparts, which share the residual error in a manner that depends on the precise nature of the inseparability. We have used this fact to define a single measure of the “distance” of the system from separability, or alternatively the “degree of inseparability” αSVD

$$\alpha_{\mathrm{SVD}} = 1 - \frac{\lambda_1^2}{\sum_i \lambda_i^2} \tag{7}$$

which is the proportion of T's total power (= Σi λi²) that is not accounted for by its best separable approximation. Values near zero indicate that only the first singular value has a large nonzero value (hence the STRF is separable). Values approaching 1 indicate an increasing degree of inseparability.
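A minimal sketch of αSVD, assuming T is arranged with rows indexing Ω > 0 and columns indexing w (quadrant 2, then quadrant 1), so that only the two independent quadrants enter. The cross-sections F, G1 and G2 are hypothetical: a fully separable T yields αSVD ≈ 0, while giving quadrant 2 its own spectral cross-section (quadrant separability only) pushes αSVD above zero. Measurement noise, which sets the floor discussed above, is left out.

```python
import numpy as np

def alpha_svd(T):
    """Eq. 7: share of the total power sum(lambda_i**2) not captured by the largest singular value."""
    lam = np.linalg.svd(T, compute_uv=False)
    return 1.0 - lam[0] ** 2 / np.sum(lam ** 2)

omega = np.array([0.2, 0.4, 0.8, 1.6])          # rows: ripple densities Omega > 0
w_pos = np.array([4.0, 8.0, 16.0])
w = np.concatenate([-w_pos[::-1], w_pos])       # columns: quadrant 2 (w < 0), then quadrant 1
F = 1.0 / (1.0 + 1j * w / 8.0)                  # hypothetical temporal function, F(-w) = F*(w)
G1 = np.exp(-omega)                             # hypothetical spectral cross-section, quadrant 1
G2 = omega * np.exp(-omega)                     # a different spectral cross-section, quadrant 2

T_full = np.outer(G1, F)                                              # fully separable
T_quad = np.where(w[None, :] > 0, np.outer(G1, F), np.outer(G2, F))   # quadrant separable only

print(alpha_svd(T_full) < 1e-12, alpha_svd(T_quad) > 0.01)  # True True
```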
The αSVD measure quantifies the strength of inseparability but otherwise reveals nothing of its nature. Therefore we examine the origin of inseparability by other means. Specifically, we analyze three factors that give rise to inseparability.
1) The relative power in the first and second quadrants

$$\alpha_d = \frac{P_1 - P_2}{P_1 + P_2} \tag{8}$$

where P1 = power in quadrant 1 and P2 = power in quadrant 2. Note that power is measured by summing the squared magnitudes of all transfer function values within the appropriate quadrant. An absolute value of αd near one implies strong selectivity of the responses to the direction of ripple movement and hence strong inseparability.
2) The asymmetry of the spectral transfer function around Ω = 0 is

$$\alpha_s = 1 - \left| \frac{\sum_{\Omega} G_1(\Omega)\, G_2^{*}(\Omega)}{\sqrt{\sum_{\Omega} |G_1(\Omega)|^2 \, \sum_{\Omega} |G_2(\Omega)|^2}} \right| \tag{9}$$

where the quantity inside the large absolute value bars is the (complex) correlation between G1(Ω) and G2(Ω). Values of the index αs near one imply strong asymmetry (i.e., lack of correlation) in the transfer function to different directions and hence strong inseparability.
3) The asymmetry of the temporal transfer function around w = 0 is

$$\alpha_t = 1 - \left| \frac{\sum_{w > 0} F_1(w)\, F_2(-w)}{\sqrt{\sum_{w > 0} |F_1(w)|^2 \, \sum_{w > 0} |F_2(-w)|^2}} \right| \tag{10}$$

where the quantity inside the large absolute value bars is the (complex) correlation between F1(w) and F2*(−w). Values of the index αt near 1 imply strong asymmetry (i.e., lack of correlation) in the transfer function to different directions, and hence strong inseparability.
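The three indices can be sketched as follows, assuming the forms of Eqs. 8–10 above (one minus the magnitude of a normalized complex correlation for αs and αt). The convention that `F2_mirrored[i]` holds F2 sampled at −w_i, where `F1[i]` holds F1 at +w_i, is an assumption of this sketch, as are all function names. Under full separability, F2(−w) = F*(w) and G1 = G2, so all three indices vanish.

```python
import numpy as np

def direction_index(T1, T2):
    """Eq. 8: (P1 - P2) / (P1 + P2), with P_q the summed |T|**2 over quadrant q."""
    p1, p2 = np.sum(np.abs(T1) ** 2), np.sum(np.abs(T2) ** 2)
    return (p1 - p2) / (p1 + p2)

def decorrelation(a, b):
    """1 - |sum(a * conj(b))| / sqrt(sum|a|**2 * sum|b|**2)."""
    num = np.abs(np.vdot(b, a))                 # np.vdot conjugates its first argument
    return 1.0 - num / np.sqrt(np.vdot(a, a).real * np.vdot(b, b).real)

def spectral_asymmetry(G1, G2):
    """Eq. 9: compare G1(Omega) with G2(Omega) on the same Omega > 0 samples."""
    return decorrelation(G1, G2)

def temporal_asymmetry(F1, F2_mirrored):
    """Eq. 10: compare F1(w) with F2*(-w)."""
    return decorrelation(F1, np.conj(F2_mirrored))

# Hypothetical fully separable unit: F2(-w) = F*(w) and G1 = G2.
w = np.array([4.0, 8.0, 16.0])
F1 = 1.0 / (1.0 + 1j * w / 8.0)
F2_mirrored = 1.0 / (1.0 - 1j * w / 8.0)        # the same F evaluated at -w
G = np.exp(-np.array([0.2, 0.4, 0.8]))
T1, T2 = np.outer(G, F1), np.outer(G, F2_mirrored)
print(direction_index(T1, T2), spectral_asymmetry(G, G), temporal_asymmetry(F1, F2_mirrored))
```

All three printed values are zero up to rounding; scaling T2 by 0.5 (a downward-preferring unit) gives αd = (1 − 0.25)/(1 + 0.25) = 0.6.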
EFFECT OF FINITE SAMPLING.
We measure the transfer function of cells by varying two parameters, ripple velocity and ripple density. For consistency's sake, we used the same range of parameters for a majority of cells. However, for some cells, the transfer function has not decreased significantly at the “edges” of this range (for instance, in Fig. 9 C, the temporal transfer function is clearly still strong at ±64 Hz and above). This is equivalent to multiplying the true transfer function by a rectangular function that is 1 between −64 and 64 Hz and zero everywhere else. In the dual Fourier space of the transfer function space, that is, in the STRF space with coordinates t and x, this corresponds to convolving the STRF along each dimension with the Fourier transform of a rectangular pulse, that is, with a sin(x)/x function. This leads to spurious oscillations in the display of the STRF, as can be seen in Fig. 9 C and others. These oscillations would disappear if we had measured the transfer functions all the way to their vanishing values.
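The ringing can be reproduced in one dimension. The sketch below takes a hypothetical non-negative impulse response sampled at 1 ms, discards every Fourier component above 64 Hz (as if the transfer function had been measured only up to ±64 Hz), and inverts; the reconstruction acquires negative sidelobes that the original does not have. The function name and signal are assumptions for illustration.

```python
import numpy as np

def truncate_transfer(signal, dt, f_max):
    """Zero every Fourier component above f_max (Hz), mimicking a finite measured range, and invert."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size, d=dt)
    spectrum[np.abs(freqs) > f_max] = 0.0
    return np.fft.ifft(spectrum).real

dt = 0.001                                   # 1-ms lag resolution (hypothetical)
t = np.arange(256) * dt
ir = np.exp(-t / 0.003)                      # hypothetical non-negative impulse response
rec = truncate_transfer(ir, dt, 64.0)        # transfer function known only for |w| <= 64 Hz
print(ir.min() >= 0.0, rec.min() < 0.0)      # True True: spurious negative lobes appear
```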
Since all the characteristic parameters in this paper (see Table 1) are derived in transfer function space, this truncation does not affect the analysis, but it may lead to misleading features in the displayed STRFs.
DEVIATIONS FROM LINEARITY.
Because the STRF is a measure of the linear part of the dynamics of a cell, we only consider effects that might modify the measurement of the first component of the Fourier transform of the period histograms. The most prominent nonlinearities are (approximate) half-wave rectification and compression. The half-wave rectification is primarily due to the positivity of spike rates (ordinarily the steady-state response to a flat spectrum is significantly less than half the peak firing rate of the unit); the distortion of a sinusoid due to half-wave rectification does not affect the phase of the response, and its effect on the amplitude of the first Fourier component is a constant factor, independent of w and Ω. The distortion due to compression or saturation, similarly, does not affect the phase of the Fourier transform components of the response and similarly affects the amplitude only by an overall constant factor for stimuli of moderate level.
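The claims about rectification and compression can be checked on one cycle of a sinusoidal response. For half-wave rectification, the first Fourier component is exactly half that of the unrectified sinusoid, with the same phase. More generally, any memoryless nonlinearity applied to a sinusoid yields a waveform symmetric about the sinusoid's peak, so its fundamental keeps the original phase. The square-root compression below is one hypothetical example; all parameter values are illustrative.

```python
import numpy as np

def first_component(r):
    """First Fourier component of one period of a response sampled at len(r) points."""
    return np.fft.fft(r)[1]

N = 1024
u = 2 * np.pi * np.arange(N) / N       # one stimulus cycle
A, phi = 3.0, 0.7                       # hypothetical modulation amplitude and phase
linear = A * np.sin(u + phi)            # response of a purely linear stage
rectified = np.maximum(linear, 0.0)     # half-wave rectification (rates cannot go negative)
compressed = np.sqrt(rectified)         # one hypothetical compressive nonlinearity

c_lin, c_rect, c_comp = (first_component(r) for r in (linear, rectified, compressed))
print(abs(c_rect / c_lin))                                 # 0.5, up to rounding
print(np.angle(c_rect / c_lin), np.angle(c_comp / c_lin))  # both approximately 0
```

For the compressive case, the amplitude factor depends on A (roughly as √A here), which is why its effect is a single constant only when the stimulus level is held at a moderate, fixed value.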
Nonlinearities of other types, such as static nonlinearities, if they exist, are quite small and have not shown up in our studies. Ultimately, the proof of linearity, and of the relevance of the STRF, is found by comparing the predicted response of a cell to a new sound with its actual response. We have not found any evidence of systematic deviation between predicted and actual responses that would indicate the presence of static nonlinearities.
Many of the data analysis methods described here are similar to, or straightforward extensions of, those developed earlier in Kowalski et al. (1996a) and are only briefly reviewed here. Figures 4 and 5 illustrate the nature of the responses to the ripple stimuli and the analysis to extract the spectral (Fig. 4) and temporal (Fig. 5) transfer functions. In Fig. 4 A, the ripples are presented at 8 Hz for ripple densities from −1.6 to 1.6 cycles/octave in steps of 0.2 cycles/octave. Each stimulus is presented 15 times.
For each ripple density, we compute a 16-bin period histogram based on the responses starting at 120 ms (to exclude the onset response; Fig. 4 B). A 16-point fast Fourier transform (FFT) is then performed on the period histogram, and the amplitude and phase of the first component are taken to be the amplitude and phase of the transfer function. If the modulation of the response were that of a purely linear system, the higher FFT coefficients would be negligible, but because of half-wave rectification and compression, they are sometimes significant. In general Tw(Ω) can be written as

$$T_w(\Omega) = \lVert T_w(\Omega)\rVert\, e^{j\Phi_w(\Omega)} \tag{11}$$

where j = √−1. Figure 4 C illustrates the magnitude ‖Tw(Ω)‖ and the unwrapped phase Φw(Ω) of the transfer function Tw(Ω). The ripple density at which ‖Tw(Ω)‖ is a maximum is designated as Ωm (= 0.0 cycles/octave in Fig. 4 C).
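A sketch of this step, under assumptions that are ours rather than the paper's: spike times are pooled over sweeps and restricted to a whole number of ripple cycles after 120 ms, so every bin is sampled equally; the phase is referenced to stimulus onset; and half a bin (π/16) is subtracted so that the returned phase is Φ in the sense of Eq. 2. Spikes are simulated as an inhomogeneous Poisson process (by thinning) in place of recorded sweeps. Function names are hypothetical.

```python
import numpy as np

def simulate_sweeps(rate_fn, r_max, dur, n_sweeps, rng):
    """Inhomogeneous Poisson spike trains by thinning (a stand-in for recorded sweeps)."""
    sweeps = []
    for _ in range(n_sweeps):
        t = np.sort(rng.uniform(0.0, dur, size=rng.poisson(r_max * dur)))
        sweeps.append(t[rng.uniform(size=t.size) < rate_fn(t) / r_max])
    return sweeps

def first_component(sweeps, w, t_start=0.12, t_stop=1.7, n_bins=16):
    """Amplitude (spikes/s) and phase (rad) of the response component at ripple velocity w."""
    n_cycles = np.floor((t_stop - t_start) * w)          # whole ripple cycles after the onset window
    t_end = t_start + n_cycles / w
    spikes = np.concatenate(sweeps)
    spikes = spikes[(spikes >= t_start) & (spikes < t_end)]
    cycle_phase = np.mod(spikes * w, 1.0)                 # position within the ripple cycle
    counts, _ = np.histogram(cycle_phase, bins=n_bins, range=(0.0, 1.0))
    bin_dur = 1.0 / (w * n_bins)
    rate = counts / (len(sweeps) * n_cycles * bin_dur)    # period histogram in spikes/s
    c1 = np.fft.fft(rate)[1]                              # 16-point FFT, first component
    amplitude = 2.0 * np.abs(c1) / n_bins
    # For rate = r0 + m*sin(2*pi*theta + Phi) sampled at bin centres, angle(c1) = Phi - pi/2 + pi/n_bins.
    phase = np.angle(c1) + np.pi / 2 - np.pi / n_bins
    return amplitude, np.angle(np.exp(1j * phase))        # wrap to (-pi, pi]

# Hypothetical unit locked to an 8-Hz ripple with response phase 1.0 rad, 15 sweeps of 1.7 s.
rng = np.random.default_rng(1)
sweeps = simulate_sweeps(lambda t: 60 + 40 * np.sin(2 * np.pi * 8.0 * t + 1.0), 100.0, 1.7, 15, rng)
amp, ph = first_component(sweeps, 8.0)
```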
Analogous steps are followed in measuring the temporal transfer function, as shown in Fig. 5, where ripples are presented at 0.2 cycles/octave for ripple velocities from −24 to 24 Hz in steps of 4 Hz.
Note that in the previous paper (Kowalski et al. 1996a), we weighted the measurement of the first component of the Fourier transforms of the period histograms by a weighted sum of the higher frequency components of the transform. This, however, is not compatible with a linear system model, so the resulting STRF, or equivalently the ripple transfer function T, would not be expected to be the best possible predictor of the response to new sounds. Therefore in this paper, the values of T correspond directly to the first component of the Fourier transform.
Once the ripple transfer function has been measured, it can be inverse Fourier transformed to display the STRF. Since the transfer function is typically measured at fewer than 8 points along each dimension in each quadrant, the STRF computed directly would look very jagged even if the underlying STRF were smooth. We therefore interpolate to a smooth STRF for display purposes by padding the transfer function with zeros to a size of 64 × 64. All statistics and predictions use the measured, unsmoothed STRF.
To construct the two-dimensional transfer function, we assume quadrant separability, measure the transfer function along the cross-sections shown in Fig. 3, and combine these spectral and temporal cross-sections as illustrated in Fig. 6. For each quadrant, the transfer function is the outer product of the two cross-sections, divided by the (complex) value of the transfer function at the crossover (×) point. In Fig. 6, this point is (w×1, Ω×1) = (8 Hz, 0.2 cycles/octave) in quadrant 1 and (w×2, Ω×2) = (−8 Hz, 0.2 cycles/octave) in quadrant 2.

$$T(w, \Omega) = \frac{T(w, \Omega_{\times q})\, T(w_{\times q}, \Omega)}{T(w_{\times q}, \Omega_{\times q})} \tag{12}$$

where q = 1 and q = 2 are the independent quadrants 1 and 2. In practice, the value of the transfer function along the two cross-sections was measured at two different times, giving two measurements of the transfer function at each crossover point T(w×q, Ω×q). The results of the two measurements may differ, so we use the (complex) geometric mean of the two measured values as the divisor in Eq. 12,

$$T_{\mathrm{eff}}(w_{\times q}, \Omega_{\times q}) = \left[T_{\mathrm{1st}}(w_{\times q}, \Omega_{\times q})\, T_{\mathrm{2nd}}(w_{\times q}, \Omega_{\times q})\right]^{1/2}$$
The ratio T1st(w×q, Ω×q)/T2nd(w×q, Ω×q), which should be unity, reflects noise in the system and is used to estimate reliability in the following text.
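A sketch of Eq. 12 for one quadrant. The complex square root has two branches; as an assumption of this sketch, the root whose phase lies midway (along the shorter arc) between the two crossover measurements is chosen, so that two identical measurements return themselves. Argument names and the index convention (`i_cross`, `j_cross`) are hypothetical.

```python
import numpy as np

def complex_geometric_mean(a, b):
    """sqrt(a*b), choosing the root whose phase lies midway between angle(a) and angle(b)."""
    return np.sqrt(np.abs(a) * np.abs(b)) * np.exp(1j * (np.angle(a) + 0.5 * np.angle(b / a)))

def quadrant_transfer(temporal, spectral, i_cross, j_cross):
    """Eq. 12 for one quadrant.

    temporal[i] = T(w_i, Omega_x): temporal cross-section measured at the crossover density.
    spectral[j] = T(w_x, Omega_j): spectral cross-section measured at the crossover velocity.
    i_cross, j_cross locate the crossover point (w_x, Omega_x) in each cross-section.
    Returns the quadrant's T (rows: w, columns: Omega) and the reliability ratio T_1st / T_2nd.
    """
    t_1st, t_2nd = temporal[i_cross], spectral[j_cross]
    t_eff = complex_geometric_mean(t_1st, t_2nd)
    return np.outer(temporal, spectral) / t_eff, t_1st / t_2nd

# Hypothetical separable quadrant: the construction should reproduce it exactly.
F = 1.0 / (1.0 + 1j * np.array([4.0, 8.0, 16.0]) / 8.0)
G = np.exp(-np.array([0.2, 0.4, 0.8])) * np.exp(1j * 0.3)
T_true = np.outer(F, G)
T_hat, reliability = quadrant_transfer(T_true[:, 0], T_true[1, :], i_cross=1, j_cross=0)
```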
The value of the transfer function along the w = 0 axis is set to zero because the modulation transfer function is not well defined there, i.e., there is no modulation of firing rate around the DC (average) rate with a frequency of 0 Hz. The value of the transfer function along the Ω = 0 axis is not measured directly, so the value used is the mean of the value inferred from the boundary of quadrant 1 and that inferred from the boundary of quadrant 2.
Once the values of the transfer function for quadrants 1 and 2 and their boundaries are measured, the values for quadrants 3 and 4 are given by Eq. 4 (see also Fig. 3). The STRF is then computed by an inverse Fourier transform (as in Eq. 3) and is illustrated in Fig. 6 B (left). This interpolated version of the STRF (used for display) is obtained by applying Eq. 3 to the transfer function padded with zeros at high ‖w‖ and ‖Ω‖ (see Fig. 6 A).
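The assembly can be sketched with NumPy's inverse 2-D FFT. Assumptions of this sketch: quadrant values sit on a regular grid (w = ±kΔw, Ω = lΔΩ); each +Ω value is stored at column index −l so that NumPy's e^{+j} inverse kernel reproduces the e^{j2π(wt − Ωx)} convention of Eq. 3; quadrants 3 and 4 and the negative-w half of the Ω = 0 axis are filled by Eq. 4; the w = 0 axis stays zero; and the 64 × 64 zero padding follows the text. The function and argument names are hypothetical.

```python
import numpy as np

def assemble_strf(q1, q2, axis_q1, axis_q2, size=64):
    """Fill the (w, Omega) plane from quadrants 1 and 2 (Eq. 4), zero-pad, and inverse transform.

    q1[k-1, l-1] = T(+k*dw, +l*dOmega)   (quadrant 1)
    q2[k-1, l-1] = T(-k*dw, +l*dOmega)   (quadrant 2)
    axis_q1[k-1], axis_q2[k-1]: values on Omega = 0 at w = +k*dw and w = -k*dw, as inferred
    from the boundaries of quadrants 1 and 2, respectively.
    """
    K, L = q1.shape
    if max(K, L) >= size // 2:
        raise ValueError("padded grid too small for the measured transfer function")
    spec = np.zeros((size, size), dtype=complex)   # zero padding at high |w| and |Omega|
    k = np.arange(1, K + 1)[:, None]
    l = np.arange(1, L + 1)[None, :]
    spec[k, -l] = q1                 # quadrant 1 (column -l encodes +Omega)
    spec[-k, l] = np.conj(q1)        # quadrant 3 = conjugate of quadrant 1
    spec[-k, -l] = q2                # quadrant 2
    spec[k, l] = np.conj(q2)         # quadrant 4 = conjugate of quadrant 2
    k1 = np.arange(1, K + 1)
    axis = 0.5 * (axis_q1 + np.conj(axis_q2))      # mean of the two Omega = 0 estimates
    spec[k1, 0] = axis
    spec[-k1, 0] = np.conj(axis)                   # w = 0 axis is left at zero
    strf = np.fft.ifft2(spec)                      # rows: time lag, columns: octave position
    return strf.real, np.abs(strf.imag).max()      # second value should be ~0 (real STRF)

rng = np.random.default_rng(3)
q1 = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
q2 = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
a1 = rng.standard_normal(4) + 1j * rng.standard_normal(4)
a2 = rng.standard_normal(4) + 1j * rng.standard_normal(4)
strf, leak = assemble_strf(q1, q2, a1, a2)
```

The vanishing imaginary part is a direct check that the filled plane obeys Eq. 4.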
Deriving STRF parameters from the phase functions
Numerous parameters can be derived from the STRF (or equivalently the transfer function) that are analogous to traditional response measures such as BF, tuning curve bandwidth, and latency. Most of these parameters are best derived from analysis of the phase of the transfer functions (Fig. 7).
We model the phase of the transfer function within each quadrant, Φq(w, Ω), q = 1, 2 (see Eq. 2), as a linear function of w and Ω

$$\Phi_q(w, \Omega) = -2\pi w\, \tau_{d,q} + 2\pi \Omega\, x_{m,q} + \chi_q \tag{13}$$

where τd is the mean or group delay of the STRF (a portion of which comes from the response latency), xm = log2(fm/f0) is the mean frequency (in octaves above the base frequency of the ripple; see Eq. 1) around which the STRF is centered (putting it near the BF), and χq is a constant phase angle, for each quadrant q. The complex-conjugate symmetry of the transfer function means that these six independent parameters (τd, xm, and χq for each of the two quadrants) describe the phase everywhere in the w–Ω plane. The convention of the minus sign before τd allows the time-dependent responses to be functions of (t − τd), as is appropriate for a delay.
The justification for assuming linear fits of the phase functions has been discussed in detail by Depireux et al. (1998) and is strongly motivated by the data (Kowalski et al. 1996a). Note, however, that the assumption of phase linearity is used only for parameter estimation and is not assumed in computing the STRF. The first linear term in Eq. 13 stems from the fact that auditory units differing in their mean neural delays will exhibit linear phase dependence on w, with a slope that depends on the delay. Analogous arguments apply for units that are located at different places along the tonotopic axis: the response phase of different units (with otherwise identical STRFs) changes linearly with Ω at different rates, depending on the relative center frequency locations. In both cases, the slopes of the linear phase function indicate the absolute shift of the STRF relative to the origin, i.e., the mean time delay τd relative to the start of the stimulus, and the center frequency xm relative to the low-frequency edge of the ripple spectrum. The linear phase model does not assume that the linear phase shifts, τd and xm, are equal across quadrants, but tonotopy suggests that fm in quadrants 1 and 2 should be approximately equal, and τd1 ≈ τd2, since the temporal delays of the neural inputs are not segregated by quadrant. This is shown experimentally in the following text.
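Within one quadrant, Eq. 13 is linear in its three unknowns, so the parameters can be estimated by ordinary least squares on the unwrapped phase. The sketch below is one way to do this, not necessarily the fitting procedure used here; the function name and the synthetic grid and parameter values are hypothetical.

```python
import numpy as np

def fit_linear_phase(w, omega, phase):
    """Least-squares fit of Eq. 13, Phi = -2*pi*w*tau_d + 2*pi*Omega*x_m + chi, in one quadrant.

    w (Hz), omega (cycles/octave) and phase (rad, already unwrapped) are 1-D arrays of equal
    length. Returns (tau_d in s, x_m in octaves above f0, chi in rad).
    """
    design = np.column_stack([-2 * np.pi * w, 2 * np.pi * omega, np.ones_like(w)])
    (tau_d, x_m, chi), *_ = np.linalg.lstsq(design, phase, rcond=None)
    return tau_d, x_m, chi

# Synthetic quadrant-1 phases for tau_d = 25 ms, x_m = 2.3 octaves, chi = 0.4 rad.
ww, oo = np.meshgrid([4.0, 8.0, 12.0, 16.0, 20.0], [0.2, 0.4, 0.6, 0.8], indexing="ij")
phase = -2 * np.pi * ww * 0.025 + 2 * np.pi * oo * 2.3 + 0.4
tau_d, x_m, chi = fit_linear_phase(ww.ravel(), oo.ravel(), phase.ravel())
```

The fit is only as good as the unwrapping that precedes it: wrapped phases would have to be unwrapped along both w and Ω before the regression.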
An interpretation of τd, for each quadrant, is that it is the sum of the pure response latency and (roughly) half the temporal width of the STRF. This is in contrast to the STRF's peak delay, τSTRF, defined to be the delay for which the STRF achieves its maximum value, which may lead or lag τd, depending on the constant temporal phase shift, θ, defined in the following text. Similarly, fm for each quadrant may or may not fall on the STRF's best frequency, BFSTRF, defined to be the frequency at which the STRF achieves its maximum value, depending on the constant spectral phase shift, φ, defined in the following text.
A convenient convention for interpreting the constant component of the phase is to break up the constant phase angle χq into two parts

$$\chi_1 = -\theta + \varphi, \qquad \chi_2 = \theta + \varphi \tag{14}$$

where θ and φ are, respectively, the temporal polarity and spectral asymmetry of the STRF. Spectral asymmetry parameterizes the balance of the STRF along the spectral axis about its center. For example, a unit with φ = 0 would have its BFSTRF in the center of the spectral envelope of the STRF, possibly surrounded by inhibitory regions. A unit with φ > 0 would have its BFSTRF at a lower frequency than the center of the STRF, with an inhibitory sideband above BFSTRF. A unit with φ < 0 would have its BFSTRF at a higher frequency than the center of the STRF, with an inhibitory sideband below BFSTRF (see example in Fig. 4C of Shamma et al. 1995). Similarly, the temporal polarity parametrizes the balance of the STRF along the temporal axis about its center: whether the peak response occurs before or after regions of inhibition, respectively θ < 0 (“onset response at BF”) or θ > 0 (“offset response at BF”). There is an ambiguity in fixing θ and φ that we remove by restricting φ to lie between −90 and +90°, while θ ranges over the full −180 to +180°. See Fig. 7 for an illustration of the phase behavior in the different quadrants.
In past reports (Kowalski et al. 1996a), θ and φ could be measured without measuring the transfer function in the upward moving quadrant 2, by measuring the constant component of the phase in quadrant 1 (χ1 = −θ + φ) and along the w axis, where the constant component of the phase is expected to be the mean across the quadrants [(χ1 − χ2)/2 = −θ; note the change in convention θ → −θ between the present work and Kowalski et al. (1996a)].
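Taking the decomposition χ1 = −θ + φ and χ2 = θ + φ (consistent with the expressions for χ1 and (χ1 − χ2)/2 quoted above), θ and φ can be recovered from the two fitted constant phases. The Python sketch below is illustrative only: the function names are hypothetical, angles are in degrees, and the (θ, φ) → (θ + 180°, φ + 180°) ambiguity is resolved by forcing φ into [−90°, 90°).

```python
def wrap_deg(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def chi_from_theta_phi(theta, phi):
    """Constant phases of quadrants 1 and 2, assuming chi1 = -theta + phi
    and chi2 = theta + phi (degrees, wrapped)."""
    return wrap_deg(-theta + phi), wrap_deg(theta + phi)


def theta_phi_from_chi(chi1, chi2):
    """Invert the decomposition, restricting phi to [-90, 90).

    chi1 + chi2 = 2*phi is known only modulo 360, so phi is known modulo 180;
    choosing phi in [-90, 90) fixes it, and theta then follows from chi2."""
    phi = wrap_deg(chi1 + chi2) / 2.0
    theta = wrap_deg(chi2 - phi)
    return theta, phi
```

For example, `theta_phi_from_chi(*chi_from_theta_phi(10, 120))` returns `(-170.0, -60.0)`, the equivalent pair (10° − 180°, 120° − 180°) with φ inside the allowed range.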
Because of response variability, we fit only those points of the transfer function that have more than half of their response power in the first component of the Fourier transform. The fit is then performed across the entire two-dimensional phase plane of each quadrant. Our phase-unwrapping method is not ideal, and the estimates of θ and φ in particular reflect this (Ghiglia and Pritt 1998).
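The point selection and planar phase fit can be sketched as follows. This is a minimal illustration under stated assumptions: the phases are taken as already unwrapped (the difficult step), one quadrant is modeled as a plane Φ ≈ a·w + b·Ω + χ, and the "more than half of the response power" criterion is interpreted as the first-harmonic share of the non-DC power of the period histogram. How the slopes a and b map onto τ_d and f_m (for instance a = −2πτ_d) depends on the sign convention of Eq. 13, which is not reproduced here; the function names are hypothetical.

```python
import numpy as np


def first_harmonic_power_fraction(period_hist):
    """Share of the non-DC power of a period histogram carried by the first
    Fourier component (both +1 and -1 bins of the two-sided spectrum)."""
    power = np.abs(np.fft.fft(np.asarray(period_hist, dtype=float))) ** 2
    return (power[1] + power[-1]) / power[1:].sum()


def fit_linear_phase(w, omega, phase, keep):
    """Least-squares plane  phase ~ a*w + b*omega + chi  over the kept points.

    w, omega, phase: arrays of equal shape for one quadrant (phase unwrapped,
    in radians); keep: boolean mask of points that pass the power criterion.
    Returns (a, b, chi)."""
    keep = np.asarray(keep, dtype=bool)
    w_k, om_k, ph_k = (np.asarray(v, dtype=float)[keep] for v in (w, omega, phase))
    design = np.column_stack([w_k, om_k, np.ones_like(w_k)])
    (a, b, chi), *_ = np.linalg.lstsq(design, ph_k, rcond=None)
    return a, b, chi
```

In use, the mask would be built point by point as `fraction > 0.5` from each (w, Ω) period histogram before calling `fit_linear_phase`.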
Estimating response variability: the bootstrap method
Variability in our experiments originates from multiple sources, including internal neural mechanisms (e.g., Poisson-like distributions of spike times), extracellular recording and unit-identification methods, and equipment noise. Quantitative estimates of the reliability of our measurements are crucial to their analysis and subsequent interpretation. A method of variability estimation that is especially appropriate to these measurements is the bootstrap method (Efron and Tibshirani 1993; Politis 1998).
The essence of this method is to use “resamples,” in which N samples of bootstrap data are drawn with replacement from the N original samples of data. Repeating this procedure a large number of times creates a population of bootstrap resamples whose probability distribution is a good estimator of the probability distribution from which the original data were drawn.
To illustrate this procedure, consider measuring the transfer function at a point (w, Ω). This is done by presenting the same (w, Ω) stimulus N times and constructing a period histogram from all N sweeps. The amplitude and phase of the first Fourier component of the period histogram are assigned to the amplitude and phase of the transfer function. A single bootstrap resample of the responses also has N sweeps, but because they are drawn from the original responses with replacement, some sweeps are duplicated and others are unused. A period histogram is nevertheless constructed, and its first Fourier component is taken as the bootstrap estimate of the transfer function. Performing a large number of bootstrap resamples yields a population of estimates of the transfer function. This population has a mean, a variance, and higher-order moments, which are estimators of the corresponding moments of the original population (of all transfer functions of all allowable neuronal responses to the stimulus). For example, the standard deviation of all bootstrap estimates of the transfer function is an estimator of the standard deviation of measurements of the transfer function. This allows us to put error bars on our transfer functions and STRFs.
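The resampling procedure described above can be sketched in Python with NumPy. The sketch assumes the sweeps for one (w, Ω) stimulus are stored as an (N sweeps × bins-per-period) array of spike counts; the function names, the number of resamples, and the seed are illustrative choices, not values from this study.

```python
import numpy as np


def first_component(sweeps):
    """Complex first Fourier component of the period histogram obtained by
    summing an (n_sweeps, n_bins) array of spike counts over sweeps."""
    hist = sweeps.sum(axis=0)
    return np.fft.fft(hist)[1]


def bootstrap_first_component(sweeps, n_boot=1000, seed=0):
    """Bootstrap mean and standard deviation of the first Fourier component.

    Each resample draws N sweeps with replacement from the N recorded sweeps,
    so some sweeps appear more than once and others not at all."""
    rng = np.random.default_rng(seed)
    sweeps = np.asarray(sweeps, dtype=float)
    n = sweeps.shape[0]
    estimates = np.empty(n_boot, dtype=complex)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        estimates[b] = first_component(sweeps[idx])
    # np.std of a complex array is the root-mean-square deviation |z - mean|
    return estimates.mean(), estimates.std()
```

The amplitude and phase of the estimate follow from `np.abs` and `np.angle`; the returned standard deviation plays the role of the error bar on the transfer function at that point.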
Effects of crossover point errors
Another significant source of error is the difference between repeated measurements of the response at the transfer function crossover points. The ratio of these independent measurements, T_1st(w_×, Ω_×)/T_2nd(w_×, Ω_×), should be unity. Deviations from unity reflect the same variability measured by the bootstrap method, plus additional systematic error arising from measuring the two transfer function cross-sections at different times. To account for this disparity, the total squared error of the STRF is set to the sum of the bootstrap STRF variance and the square of the crossover error ς×:

$$\sigma^2(t,x) = \sigma_{\mathrm{boot}}^2(t,x) + \varsigma_\times^2(t,x) \qquad (15)$$

where ς×(t, x) captures the systematic error from not having taken all data at the same time and is given by Eq. 16. Finally, we collapse the error over the entire (t, x) plane into two dimensionless terms, δ (Eq. 17) and ε (Eq. 18), where ΔT and ΔX are the length of time and the number of octaves over which the STRF was measured.
δ is a measure of the average standard deviation in units of the maximum of the STRF. ε is a measure of the variance in units of power. If noise is additive, then ε = Pε/(P + Pε) = 1/(SNR + 1), where P is the signal power, Pε is the noise power, and SNR = P/Pε is the signal-to-noise ratio. ε should decrease as the number of recordings increases, assuming the system can be described as a time-invariant random process.
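The two relations stated explicitly in the preceding paragraphs, the error combination of Eq. 15 and the additive-noise link between ε and SNR, can be written down directly. Eqs. 16–18 are not reproduced here, so this sketch does not compute δ or ε from data; the function names are hypothetical.

```python
import numpy as np


def total_strf_error(boot_var, crossover_err):
    """Total STRF standard deviation: square root of the bootstrap variance
    plus the squared crossover error (Eq. 15). Accepts scalars or arrays."""
    return np.sqrt(np.asarray(boot_var) + np.asarray(crossover_err) ** 2)


def epsilon_from_snr(snr):
    """eps = P_eps / (P + P_eps) = 1 / (SNR + 1); valid only for additive noise."""
    return 1.0 / (snr + 1.0)


def snr_from_epsilon(eps):
    """Inverse relation: the SNR implied by eps (0 < eps <= 1), additive noise."""
    return 1.0 / eps - 1.0
```

Under the additive-noise assumption, the rejection threshold ε ≤ 0.7 used in the results corresponds to SNR ≥ 1/0.7 − 1 ≈ 0.43, and the value ε = 0.04 reported for Fig. 11 B corresponds to SNR ≈ 24.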
Data presented here were collected from 22 single-unit and 54 multiunit recordings in 11 ferrets. The summary histograms include both single-unit and multiunit recordings, distinguished from each other.
Most units encountered in AI respond well to moving ripples. Responses are typically phase-locked to the moving envelope of the ripple over a range of ripple velocities and densities. However, of the 172 recordings made, only 76 provided responses of adequate quality and quantity. The reasons for this low yield vary. For example, we encountered a few units whose responses were either poorly phase-locked or inconsistent from trial to trial; such units were abandoned because our analysis methods are unsuitable for characterizing them. Also, because recording times were long (typically over an hour), units were sometimes lost before sufficient data could be collected for a full analysis. In other cases, the unit or animal changed state during the recording session, rendering the data unreliable. Recording times are long because, in addition to single ripples, we present sounds consisting of combinations of ripples, so that we can verify linearity by using the STRFs to predict the responses of the cell to new sounds. We found empirically that about 10,000 spikes are typically needed to obtain an STRF with well-defined features from responses to single ripples, which with our sound paradigm usually corresponds to a 20-min presentation per cross-section. To eliminate data from unreliable cells, as described in the preceding text, we use only units with δ ≤ 0.12 and ε ≤ 0.7 (see methods) and reject the rest. These reliability statistics take into account most of the preceding sources of error. The thresholds of 0.12 and 0.7 are somewhat arbitrary, although we found that cells tended to separate into two populations on either side of them, and that this mathematical criterion for reliable versus noisy cells corresponded well with our intuitive judgment based on visual inspection.
Responses to moving ripples
On average, AI units synchronize their responses to upward and downward moving ripples equally effectively, at ripple velocities ranging from 2 to over 100 Hz and ripple densities up to 4 cycle/octave. Examples of several temporal and spectral transfer function magnitudes are shown in Figs. 8–10, each with its corresponding STRF. In all cases, units respond well only over a specific range of ripple velocities and ripple densities, but the detailed shape and extent of the transfer functions vary from one unit to another. For instance, the unit in Fig. 9 A responds well only to ripple velocities of ±4 Hz, whereas the unit in Fig. 9 C responds well at least up to ±64 Hz. The unit in Fig. 6 responds well to ripple densities within ±0.4 cycle/octave, whereas the unit in Fig. 10 A responds over a wider range of densities but poorly at 0 cycle/octave.
As described in the preceding text, the transfer function at w = 0 is set to 0 because it is not well defined (so it makes no contribution to the STRF). Additionally, for 12 cells (not shown), the transfer function was measured from ±8 to ±1 Hz in 1-Hz steps, and in all cases the transfer function was negligible at the slowest ripple velocities (in contrast to the average firing rates, which remained significant).
Units also vary significantly in the asymmetry of their transfer functions with respect to the direction of the moving ripple. For example, responses to the two directions are relatively equal (transfer functions are roughly symmetric) in Figs. 6 and 9 A. By comparison, the temporal transfer functions in Fig. 8, A–C, are asymmetric. The unit in Fig. 8 B responds better to upward moving ripples; the unit in Fig. 8 C responds over a wider range to downward moving ripples. These asymmetries are discussed in depth later in the context of transfer function separability.
The STRFs derived from these transfer functions commonly exhibit alternating significant regions of positive peaks and negative basins, interpreted here as excitatory and inhibitory regions, respectively. The four STRFs illustrated in Figs. 6 and 8 are all of units tuned between 1 and 2 kHz; however, the shapes of their surrounding inhibitory regions vary considerably, reflecting the different temporal and spectral transfer functions (see Fig. 11). For instance, STRFs may be relatively symmetric (Fig. 8 A) or asymmetric (Fig. 9 C). They can be clearly directional, i.e., tilted one way (Fig. 8 B) or the other (Fig. 8 C) on the spectro-temporal surface.
STRFs display a wide variety of shapes, which are briefly described in the following text. The majority of AI cells exhibit STRFs with a simple excitatory field and varying amounts of inhibitory surround. The first peak of the excitatory portion indicates the BF_STRF of the unit, while its extent reflects the unit's tuning curve at a given level.
In many cases, the inhibitory surround is spectrally asymmetric around the BF_STRF (Fig. 9 C); such asymmetry is effectively captured by the parameter φ (Eq. 14), where φ values near zero indicate roughly symmetric STRFs, φ ≈ 90° indicates strong inhibition below the BF_STRF, and φ ≈ −90° indicates strong inhibition above the BF_STRF. The φ distribution in our sample is summarized in Fig. 12 C. It closely resembles the distributions seen earlier with downward moving and stationary ripples (Kowalski et al. 1996a; Schreiner and Calhoun 1994; Versnel et al. 1995).
STRFs also vary considerably in their temporal dynamics, which are best seen in the t − x domain. Some are fast, with envelopes that decay relatively rapidly (Figs. 9 C and 10 A). Others are slow, taking over 150 ms to decay (as in Figs. 9 A and 10 B). These response dynamics reflect details of the temporal transfer function, such as the ripple velocity at which it peaks (characteristic ripple velocity) and its width (ripple velocity bandwidth). STRFs also exhibit an onset delay (or latency) that is captured by the τ_d values derived from the phase function (Eq. 13). The distribution of this delay is well clustered around 25 ms, as seen in Fig. 12 B. Finally, unit STRFs can generally be classified as either onset (Figs. 9, A–C, and 10, A and B; most cells) or offset (Fig. 10 C), a property that corresponds, respectively, to the negative or positive sign of the parameter θ. Onset STRFs are far more common in our sample, as seen in the θ distribution in Fig. 12 C.
Finally, STRFs may display very complex dynamics and spectro-temporal selectivity that are not easily captured by simple parameters. Two examples of such STRFs are shown in Fig. 11. One might be tempted to dismiss such STRFs as mere aberrations or noise, except that they are derived from repeatable responses (δ = 0.10 and ε = 0.49 for Fig. 11 A; δ = 0.03 and ε = 0.04 for Fig. 11 B).
Separability and its relation to STRF shape
Separability is an important property of the transfer functions that has significant experimental and theoretical implications. In this paper, we assume quadrant separability and ask whether responses are fully separable and, if not, what the degree and origin of the inseparability are. Each of these indicators has a potentially useful interpretation in terms of the shape of the STRF and the underlying structure of the processes that give rise to it.
The simplest and most general way to examine full separability is to compute the singular value decomposition (SVD) of the transfer function (Eq. 6) and, from its singular values, the index αSVD (Eq. 7). Figure 13 illustrates the distribution of αSVD computed from all the cells used. Values near 0 indicate that only the first singular value is substantially nonzero and hence that the STRF is fully separable; increasing values indicate an increasing degree of inseparability. A significant fraction of cells deviate from full separability.
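The idea behind an SVD-based separability index can be sketched as follows. A fully separable matrix is an outer product and therefore has a single nonzero singular value; power in further singular values signals inseparability. The specific index 1 − σ1²/Σσi² used here is an illustrative assumption and may differ from the paper's Eq. 7; the function name is hypothetical. It applies equally to a sampled STRF or to a transfer function matrix.

```python
import numpy as np


def svd_inseparability(matrix):
    """Illustrative inseparability index 1 - s1**2 / sum(s**2).

    Equals 0 for a rank-1 (fully separable) matrix and grows as power spreads
    into additional singular values. Assumed form, not necessarily Eq. 7."""
    s = np.linalg.svd(np.asarray(matrix), compute_uv=False)
    return 1.0 - s[0] ** 2 / np.sum(s ** 2)
```

For a separable STRF built as `np.outer(f, g)` the index is zero to machine precision, whereas `np.eye(2)`, with two equal singular values, gives 0.5.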
It can be shown that fully separable transfer functions must have magnitudes that are symmetric about the (w, Ω) origin. On its own, however, αSVD offers no insight into the specific nature of departures from this symmetric, separable case. It will be shown that three parameters (Eqs. 8–10) combine to form αSVD, each corresponding to a specific distortion of a separable transfer function:
1) αd, the response directionality, or the imbalance in the overall strength of the responses to the upward and downward moving ripples;
2) αt, the asymmetry in the temporal transfer function F(w);
3) αs, the asymmetry in the spectral transfer function G(Ω).
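The magnitude symmetry of separable transfer functions stated above follows because a separable STRF f(t)g(x) has transfer function F(w)G(Ω), and since f is real, |F(−w)| = |F(w)|, so |T(−w, Ω)| = |T(w, Ω)|. The sketch below checks this numerically with a 2D FFT of a sampled STRF and computes a hypothetical directionality index: the normalized power difference between the w > 0 and w < 0 halves of the Ω > 0 plane. That index is only in the spirit of αd; Eqs. 8–10 are not reproduced here, and which sign corresponds to upward or downward motion depends on the stimulus convention. All function names are illustrative.

```python
import numpy as np


def transfer_function(strf):
    """2D FFT of a real STRF sampled on a (time, octave) grid.
    Axis 0 indexes ripple velocity w; axis 1 indexes ripple density Omega."""
    return np.fft.fft2(strf)


def mirror_w(T):
    """T evaluated at -w for each Omega, using FFT index k -> -k mod N."""
    return np.roll(T[::-1, :], 1, axis=0)


def directionality_index(T):
    """Hypothetical power-imbalance index between w > 0 and w < 0 for Omega > 0.
    Excludes w = 0, Omega = 0, and Nyquist bins. Assumed form, not Eq. 8."""
    n_w, n_om = T.shape
    pos_w = slice(1, (n_w + 1) // 2)     # w > 0
    neg_w = slice(n_w // 2 + 1, n_w)     # w < 0 (mirror images of pos_w)
    pos_om = slice(1, (n_om + 1) // 2)   # Omega > 0
    p_pos = np.sum(np.abs(T[pos_w, pos_om]) ** 2)
    p_neg = np.sum(np.abs(T[neg_w, pos_om]) ** 2)
    return (p_pos - p_neg) / (p_pos + p_neg)
```

For a separable STRF, `np.abs(T)` and `np.abs(mirror_w(T))` agree and the index is essentially zero. For a tilted STRF such as a Gaussian envelope times cos(2π(w0 t + Ω0 x)) with w0, Ω0 > 0, almost all Ω > 0 power lies at w > 0, and the index approaches +1.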
The distributions of these three parameters are shown in Fig. 14. The directionality parameter αd is approximately normally distributed, spanning both negative and positive values. This parameter is closely related to the directional selectivity of the STRF: STRFs with large ‖αd‖ values exhibit obviously directional shapes, as seen in Fig. 14 (top, middle). A significant proportion of units (37%) also have spectral dissimilarity values (αs) exceeding 0.3. An STRF with an especially large αs is shown in Fig. 14 (middle). Note that such STRFs need not exhibit obviously directionally selective shapes.
A strikingly different finding is the dearth of units (12%) with significant temporal dissimilarity (αt > 0.3), as seen in the distribution in Fig. 14 (bottom, left). An STRF with αt = 0.30 is displayed in Fig. 14 (bottom, middle); it is difficult to detect simple correlates of large αt values in the shape of the STRF. This result is not an artifact of measuring the temporal transfer function at six points and the spectral transfer function at eight points in each quadrant: when the last two points of the spectral cross-section are removed, the same results are obtained.
The three inseparability indicators do not appear to be significantly correlated, based on the pairwise scatter plots in Fig. 15, suggesting that independent mechanisms underlie the expression of each factor. By contrast, each factor (as expected) is well correlated with the total SVD index, as seen in Fig. 14 (right).
We can define a composite measure of inseparability as the mean of αt, αs, and ‖αd‖. Figure 16 illustrates that this measure is highly correlated with αSVD and hence is an equally valid measure of inseparability.
There is no sharp threshold for inseparability. In Fig. 13, for instance, αSVD ≈ 0.35 clearly corresponds to an inseparable cell, but because αSVD values form a continuum, there is no obvious cutoff between separable and inseparable cells.
Summary of results
The emphasis of this work has been on presenting a technique to describe neural response patterns of units in the cortex. More precisely, we use moving ripples to characterize the spectral and temporal properties of responses of auditory cortical neurons, although this is a general method that can be used for any population of neurons for which responses are shown to be substantially linear for broadband stimuli.
We have examined the nature of AI responses to rippled spectra moving in both upward and downward directions and incorporated these responses into the STRF. A summary of the main results follows.
1) We confirm earlier findings (Kowalski et al. 1996a) that AI units respond in a phase-locked fashion to the moving ripples over a range of velocities and directions that depend on the ripple density of the spectrum. In particular, responses are usually tuned around a specific ripple velocity and density. In the ferret, responses are commonly best in the 4- to 16-Hz range and densities lower than 2 cycle/octave. These findings are roughly consistent with those found in different species using different experimental paradigms: experiments with dynamic spectra (e.g., narrowband such as AM and FM tones or broadband such as modulated noise and click trains) have found similar maximum rates of synchronized responses in AI (Eggermont 1994; Schreiner and Urbas 1988).
2) We demonstrate a similarity between responses to upward and downward moving ripples. Specifically, the response parameter values and distributions for either direction are comparable (even if unequal) and hence reflect general dynamic response properties rather than direction-specific properties per se.
3) Complete spectro-temporal transfer functions are measured that exhibit a rich variety of shapes and cover a wide range of stimulus parameters. The STRF describes the way AI units integrate stimulus power along the spectro-temporal dimensions.
4) We illustrate a variety of STRFs with a broad range of BFs, bandwidths, asymmetrical inhibition, temporal dynamics, and direction selectivity. We have assessed the prevalence of these features over all sampled units by examining the distribution of specific parameters that reflect each of these features.
5) The degree and origin of inseparability of the unit transfer functions are assessed using two methods. In the first, SVD analysis is applied to the entire transfer function to determine the number and ratio of the resulting singular values. The results indicate that AI units span a relatively uniform distribution from full separability to moderate inseparability. In the second method, we examine the origin of inseparability and find that it is primarily due to two factors: an imbalance in the response power and an asymmetry in the spectral transfer function relative to the direction of ripple motion. Interestingly, we find that temporal (but not spectral) transfer functions are relatively symmetric and hence contribute little to overall transfer function inseparability.
In Kowalski et al. (1996a,b), pentobarbital was used for anesthesia; in the present study, a ketamine/xylazine combination was used. Kohn et al. (1996) presented the effect of different anesthetics on the tuning properties of auditory cortical cells as a whole. Under ketamine, a wider variety of responses was found, tuning to ripple density was slightly lower (from 1.05 cycle/octave under pentobarbital to 0.8 cycle/octave under ketamine), and no significant change in temporal tuning was observed. Other properties, such as the linearity of the STRF for downward moving ripples, were also unchanged. These results can be accounted for by assuming that, overall, response fields measured with ripples have less inhibition under ketamine than under pentobarbital.
Separability and its implications
An important property of the responses is that for ripples moving in only one direction, the spectral and temporal functions are separable: within each quadrant they can be measured independently of each other. Quadrant separability makes it possible to measure the overall spectro-temporal transfer function in a reasonable time using only single ripples, because only a few combinations of velocity and spectral density need to be measured. We have established (Kowalski et al. 1996a) that all recorded transfer functions in AI exhibit quadrant separability. In the experiments reported here, we assumed quadrant separability (Kowalski et al. 1996a,b) and proceeded to examine whether the resulting two-dimensional transfer functions are fully separable. Our findings indicate that AI responses fall uniformly on a continuum from moderately inseparable to fully separable.
A fully separable cell cannot be directionally selective in its responses. Inseparability is a necessary condition for the formation of more complex STRFs, and direction selectivity is one possible consequence of inseparability. A directionally selective STRF usually has a distinctive form, elongated along a spectro-temporal direction that matches that of the ripple stimulus to which it is most sensitive. For example, the STRF illustrated in Fig. 8 B is most responsive to a ripple with Ω = −0.4 cycle/octave and w = −8 Hz, whose spectrogram closely matches the spacing and orientation of the STRF. Direction selectivity implies that a unit responds differentially to the two directions of ripple movement and hence must have a significantly nonzero directionality index; therefore direction selectivity necessarily implies an inseparable STRF. The converse is not true: an inseparable STRF might reflect other factors, such as asymmetric temporal and/or spectral transfer functions (αt or αs ≠ 0), which do not manifest themselves as an obviously elongated form or as preferential responses to one direction (as shown in Fig. 14, center column, middle and bottom).
Separability also places strong constraints on the underlying biological processes that give rise to the STRF shapes. For example, full separability suggests that the STRF is constituted of independent temporal and spectral processing stages. By contrast, inseparability (or mere quadrant separability) implies spectrally and temporally intertwined stages of processing, with the specific form of the model depending entirely on the details of the transfer functions. Quadrant separability in particular is a very strong constraint on both the neural inputs and the processing of the unit: almost all neural networks (whether linear or nonlinear) that take multiple fully separable STRFs as inputs will in general produce a totally inseparable STRF. In particular, the naive procedure of constructing a directionally sensitive STRF by taking the simple sum of two fully separable STRFs with differing f_m and τ_d produces a totally inseparable STRF that is not quadrant separable. Producing a quadrant-separable STRF requires special inputs and/or special processing.
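The claim that summing two fully separable STRFs with different f_m and τ_d destroys quadrant separability can be checked numerically. Shifting an STRF in time and along the tonotopic axis multiplies its transfer function by linear-phase factors (the Fourier shift theorem), so within quadrant 1 the sum has the form F(w)G(Ω)[1 + a(w)b(Ω)], which has rank 2 rather than 1. In the sketch below, the transfer functions F and G, the grid of six velocities by eight densities, and the shifts are arbitrary illustrative choices, not measured data.

```python
import numpy as np


def quadrant_singular_ratio(T_quadrant):
    """Ratio s2/s1 of the two largest singular values of one quadrant of a
    transfer function; a value near 0 means the quadrant is rank 1 (separable)."""
    s = np.linalg.svd(T_quadrant, compute_uv=False)
    return s[1] / s[0]


# Quadrant 1 grid (w > 0, Omega > 0): six velocities, eight densities (assumed values).
w = np.array([4.0, 8.0, 12.0, 16.0, 20.0, 24.0])   # ripple velocity, Hz
omega = np.linspace(0.2, 1.6, 8)                    # ripple density, cycles/octave

# Hypothetical separable unit: temporal factor F(w) times spectral factor G(Omega).
F = 1.0 / (1.0 + 1j * w / 10.0)
G = np.exp(-omega ** 2)
T_single = np.outer(F, G)

# Second copy delayed by dtau seconds and shifted by dx octaves: the shifts
# become linear-phase factors in w and Omega that multiply the transfer function.
dtau, dx = 0.02, 0.5
shift = np.outer(np.exp(-2j * np.pi * w * dtau), np.exp(-2j * np.pi * omega * dx))
T_sum = T_single + T_single * shift
```

Here `quadrant_singular_ratio(T_single)` is zero to machine precision, while `quadrant_singular_ratio(T_sum)` is clearly nonzero, so the summed unit is not separable even within a single quadrant.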
It can be shown that a quadrant separable, temporally symmetric (i.e., αt ≪ 1), cortical neuron can be easily constructed by taking inputs from (potentially) many units with (potentially) different spectral response fields and even with (potentially) different temporal impulse response properties as long as the temporal dynamics of the inputs to the cortical cell are fast compared with the temporal dynamics of the cortical cell itself (Simon et al. 2000). Quadrant separability then occurs when the inputs are temporally phase-lagged relative to each other [though not necessarily 90° as in Saul and Humphrey (1990) and Dong and Atick (1995)].
This is consistent with the input connectivity expected for layer IV cortical neurons, which receive input from the medial geniculate body (MGB) of the thalamus. MGB neurons may have fully separable STRFs [as is the case for typical neurons of the central nucleus of the inferior colliculus (ICC) (Escabi and Schreiner 1999)] with different spectral response fields (differing in width, in the extent and location of inhibitory bands, and, to a lesser extent, in best frequency). The temporal cross-sections of MGB transfer functions are essentially constant when low-pass filtered at a cutoff frequency appropriate to cortical behavior (e.g., typically well below 100 Hz) (Yeshurun et al. 1985). Furthermore, some MGB neurons may have a temporal phase lag, as do the “lagged cells” of the visual system's lateral geniculate nucleus (Saul and Humphrey 1990).
Significantly, the property of quadrant separability with temporal symmetry does not allow for any cortical inputs unless those inputs have the same temporal behavior as the neuron studied. If, for instance, all neurons in the same cortical column have similar temporal properties, including similar neural delays, this would be consistent with quadrant separability. Otherwise, cortical inputs would break quadrant separability and create a totally inseparable neuron. Total inseparability would be expected for cortical neurons in layers that receive significant input from other cortical columns or from any other neural source with significantly different temporal processing, including (but not limited to) any significant delays.
It is possible that this extremely constraining result is an anesthesia-induced effect. If not, the result is a fascinating constraint on the neural network providing input to a given cortical cell.
We particularly thank A. Saul. We also thank J. Eggermont, I. Ohzawa, and M. Slaney for very helpful and illuminating discussions.
This work was supported by Office of Naval Research Multidisciplinary University Research Initiative Grant N00014-97-1-0501, National Institute on Deafness and Other Communication Disorders Training Grant T32 DC-00046-01, and National Science Foundation Grant NSFD CD8803012.
J. Z. Simon (E-mail:).
- Copyright © 2001 The American Physiological Society