1Center for Neuroscience and the 2Section for Neurobiology, Physiology and Behavior, University of California, Davis, California
Submitted 14 January 2005; accepted in final form 24 August 2005
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Here we describe and employ an adaptive stimulus optimization technique that manipulates the spectral composition of a stimulus, a multi-tone complex, in an attempt to maximize responses from a single neuron. It relies on feedback from the neuron to explore a multi-dimensional parameter space, searching for the best stimulus in that space. It estimates a neuron's preferred spectral input, regardless of the spectral complexity of the stimulus. It also has the advantage of significantly reducing the size of the potential parameter space because it focuses on the portion of the space that is important to the cell. In this study, adaptive stimulus optimization provided a direct and efficient way to determine the relevant stimulus features and basic functional properties for AC neurons. The results also demonstrate the tremendous potential of the adaptive stimulus optimization technique.
The resulting preferred spectra displayed a structural simplicity and consistency not previously reported for AC neurons. The spectra possessed a nearly scale-invariant, prototypical form having a relatively simple quantitative description. This structure appears well suited for identifying important spectral features and for efficiently representing the information in natural sounds.
| METHODS |
|---|
Two adult rhesus monkeys (Macaca mulatta; 1 male, 1 female) with normal hearing, maintained on a restricted water-access protocol, served as subjects. All procedures performed on the subjects conformed to the PHS policy on experimental animal care and were approved by the UC Davis animal care and use committee.
Electrophysiological recording and data acquisition
Each monkey was implanted with a head post and chronic recording chamber for access to auditory cortex. Recordings were made while the monkeys were comfortably restrained and sitting quietly in an acoustically "transparent" primate chair within a sound-attenuated, foam-lined booth (IAC; 9.5 × 10.5 × 6.5 ft). Subjects received diluted fruit juice or water intermittently. High-impedance tungsten microelectrodes (FHC) were inserted into the brain using a remotely controlled hydraulic microdrive (FHC) through guide tubes held by a plastic grid (Crist Instrument) in the recording chamber. Extracellular potentials were amplified and filtered (0.3–5 kHz; AM Systems 1800) and selected using a dual (amplitude-time) window discriminator (Bak RP-1). Auditory cortex was identified by single- and multiunit responses to pure-tone pips, broad- and narrow-band noise bursts, and clicks. Primary (core) auditory cortex was identified by the vigor, selectivity, and latency of single-unit responses to pure tones and by the gradient of best frequency obtained along rostrocaudal and mediolateral anatomical coordinates. During adaptive optimization sessions, counts of well-isolated single-unit potentials were made during presentation of the 170-ms stimulus and for 100 ms immediately after. Experimental control, data collection, and analysis were accomplished using customized C-language and Matlab (MathWorks) programs running on a personal computer.
Stimulus generation and presentation
The optimization stimulus was a multi-tone complex created by summing a large number of pure tones (Fig. 1A) with randomized phases. Each tone complex comprised either 12 or 16 tones per octave (typically 24–36 tones) spaced at equal log-frequency intervals. The range of frequencies was usually three octaves but was adjusted if needed to suit the pure-tone frequency selectivity of the cell (range: 2–6 octaves). An attempt was made to center the range on the neuron's preferred frequency as determined by pure-tone stimulation, although the range of frequencies able to evoke strong responses was often quite broad. The tone complex was temporally shaped with a Gaussian amplitude envelope with a width (at one-half amplitude) of ~50 ms, producing a temporal Gabor stimulus (Fig. 1A). The intensity of the stimulus was adjusted to a moderate level within the cell's best-intensity region as estimated from the initial search. The intensity range across experiments was between 27 and 66 dB SPL (mean = 46.4 dB; SD = 8.1 dB; Bruel & Kjaer 2231 meter, unfiltered calibration).
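The stimulus construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' C/Matlab code: the function name, the sampling rate, the placement of the envelope at the center of a 170-ms window, and the reading of "width at one-half amplitude" as the full width of the envelope at half its peak are all assumptions made here.

```python
import math
import random


def synthesize_tone_complex(f_low_hz, n_octaves, tones_per_octave,
                            amps_db, phases_rad,
                            fs_hz=50000, dur_s=0.170, half_amp_width_s=0.050):
    """Sum log-spaced tones, apply a Gaussian envelope, normalize to unit peak."""
    n_tones = n_octaves * tones_per_octave
    # Equal log-frequency spacing: each step multiplies frequency by 2**(1/tones_per_octave).
    freqs = [f_low_hz * 2.0 ** (k / tones_per_octave) for k in range(n_tones)]
    # Gaussian SD chosen so the envelope's full width at half amplitude is half_amp_width_s.
    sigma = half_amp_width_s / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    t_center = dur_s / 2.0
    wave = []
    for n in range(int(round(dur_s * fs_hz))):
        t = n / fs_hz
        envelope = math.exp(-0.5 * ((t - t_center) / sigma) ** 2)
        carrier = sum(10.0 ** (a / 20.0) * math.sin(2.0 * math.pi * f * t + p)
                      for f, a, p in zip(freqs, amps_db, phases_rad))
        wave.append(envelope * carrier)
    peak = max(abs(s) for s in wave)
    if peak > 0.0:
        wave = [s / peak for s in wave]  # peak normalization of the digital signal
    return freqs, wave


# Example: 3 octaves above 1 kHz, 12 tones per octave, flat spectrum, random phases.
rng = random.Random(0)
n = 3 * 12
freqs, wave = synthesize_tone_complex(
    1000.0, 3, 12, [0.0] * n,
    [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)], fs_hz=20000)
```

The amplitude vector (`amps_db`) and phase vector (`phases_rad`) together play the role of the parameter vector that the optimization procedure adjusts.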
[Equation 1]

In Eq. 1, $\bar{r}_i$ is the average response over the $i$th set, $\Delta r_{\max(i)}$ is the maximum absolute difference from the mean across the $i$th set, $\mathbf{v}_{ij}$ is the parameter vector representing the $j$th stimulus variant in the $i$th set, and a weighting factor determines the magnitude of parameter perturbation. This rule can be summarized as follows. For each iteration, it determines the differences between the response to each stimulus in the set and the average response, normalizes these differences, weights each stimulus variant by its corresponding normalized response, averages across the set of stimuli and, finally, determines the new base parameter vector by weighting the average and adding the result to the previous base parameter vector. The resulting parameter vectors were then used to synthesize a new base stimulus. The base stimulus is thereby moved along a gradient in multi-dimensional parameter space, at each step moving toward the form of the stimulus that evokes the largest response.
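The verbal summary of the update rule can be written out as a short sketch. One reading consistent with that summary is $\mathbf{b}_{i+1} = \mathbf{b}_i + \alpha \cdot \frac{1}{N}\sum_{j=1}^{N} \frac{r_{ij} - \bar{r}_i}{\Delta r_{\max(i)}}\,\mathbf{v}_{ij}$, where the names $\mathbf{b}$ (base vector), $\alpha$ (weighting factor), $N$ (set size), and $r_{ij}$ (response to the $j$th stimulus) are assumptions introduced here for illustration and may not match the original notation.

```python
def update_base(base, variants, responses, weight):
    """One gradient-ascent step of the base parameter vector (assumed form)."""
    n = len(variants)
    mean_r = sum(responses) / n
    # Difference between each response and the set average.
    diffs = [r - mean_r for r in responses]
    max_abs = max(abs(d) for d in diffs)
    if max_abs == 0.0:
        return list(base)  # flat responses carry no gradient information
    # Normalize by the maximum absolute difference from the mean.
    norm = [d / max_abs for d in diffs]
    # Response-weighted average of the stimulus variants.
    step = [sum(norm[j] * variants[j][k] for j in range(n)) / n
            for k in range(len(base))]
    # Weight the average and add it to the previous base vector.
    return [b + weight * s for b, s in zip(base, step)]


# Variant 0 evokes the largest response, variant 1 the smallest.
new_base = update_base([0.0, 0.0],
                       [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]],
                       [4.0, 0.0, 2.0, 2.0], weight=2.0)
```

Because the normalized differences sum to zero, the step is the same whether each variant is expressed as a full parameter vector or as its perturbation from the base, so the sketch does not depend on which of those the original equation used.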
The base stimulus and set of randomized variants (48 stimuli) were presented in random order within blocks, several times (range: 2–6) on each presentation (iteration). Iterations continued until the amplitude vector (spectrum) stabilized (1 session). The experiment was then repeated for a second optimization session, starting from different initial conditions. The mean inter-stimulus interval was ~1.2 s with a random uniform variation of ±0.25 s. The time required to complete one experimental session was usually ~1.5 h.
For the starting base stimulus on each session, the amplitudes of all frequency components were set to the same level and all phases were randomized. For the first presentation and every subsequent iteration, a set of stimulus variants was generated by randomly perturbing the amplitudes and phases of the base tone components. The second testing session began with a different set of randomized variants than the first. The phase of each tone component was randomly advanced or delayed by 36° (or, in some experiments, 45°) from the base value. The amplitude of each frequency component was randomly increased or decreased by a magnitude that was constant within an iteration. The magnitude of the amplitude perturbations was gradually increased over the course of an experimental session: typically beginning at 6 dB, amplitude perturbations were usually increased in 2-dB steps to 12 dB (step size was controlled by the weighting factor in Eq. 1). This was done to avoid local maxima (the gradient ascended toward the stimulus evoking the largest response) and to counteract response habituation by presenting stimulus variants sufficiently different from the base.
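As a rough sketch of the perturbation scheme just described: each variant shifts every amplitude by plus or minus a fixed step and every phase by plus or minus a fixed angle. The function names are hypothetical, and the schedule that spreads the 6, 8, 10, and 12 dB stages evenly over a session is an assumption, since the text does not say when each increase occurred.

```python
import random


def make_variants(base_amp_db, base_phase_deg, n_variants, amp_step_db,
                  phase_step_deg=36.0, rng=random):
    """Generate variants by random +/- perturbation of every tone component."""
    variants = []
    for _ in range(n_variants):
        amps = [a + rng.choice((-1.0, 1.0)) * amp_step_db for a in base_amp_db]
        phases = [(p + rng.choice((-1.0, 1.0)) * phase_step_deg) % 360.0
                  for p in base_phase_deg]
        variants.append((amps, phases))
    return variants


def amp_step_db(iteration, n_iterations, levels_db=(6.0, 8.0, 10.0, 12.0)):
    """Perturbation magnitude for an iteration (evenly staged; an assumption)."""
    stage = min(len(levels_db) - 1, iteration * len(levels_db) // n_iterations)
    return levels_db[stage]


rng = random.Random(0)
variants = make_variants([0.0] * 5, [10.0] * 5, n_variants=3,
                         amp_step_db=amp_step_db(0, 8), rng=rng)
```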
In the initial experiments, the amplitude of each tone in the complex was independently varied. However, we found that better results could often be obtained if the perturbations initially occurred in segments or bins of adjacent frequencies, particularly if the spectral sensitivity of the cell was broad (see Fig. 2A). When employing this "coarse-search" technique, the number of frequency segments independently varied in amplitude was gradually increased during the session from a small set (e.g., 5 or 6) to the total number of frequencies comprising the stimulus (typically 24–36). Before the coarse-search strategy was adopted, the probability of obtaining a pair of optimization sessions from a cell for which the amplitude vectors were significantly correlated was slightly less than one in two (0.44); after its implementation, the probability increased to greater than one in two (0.65).
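The coarse-search idea can be illustrated with a small sketch in which adjacent tones share one random sign per segment. The function name and the rule that splits the tones into contiguous, near-equal segments are assumptions made for illustration.

```python
import random


def coarse_perturbation(n_tones, n_segments, step_db, rng=random):
    """Per-tone amplitude offsets that are constant within each frequency segment."""
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n_segments)]
    # Tone k falls in segment k * n_segments // n_tones (contiguous bins).
    return [signs[k * n_segments // n_tones] * step_db for k in range(n_tones)]


rng = random.Random(1)
early = coarse_perturbation(36, 6, 6.0, rng)    # 6 segments of 6 adjacent tones
late = coarse_perturbation(36, 36, 12.0, rng)   # every tone varied independently
```

Raising `n_segments` toward `n_tones` over a session recovers the original fully independent perturbation as the final stage.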
The stimuli were normalized with respect to digital (16-bit) signal peak amplitude. This provided a limit to overall energy level because during the optimization process the settings on the attenuators were fixed. We did a post hoc analysis of the energy level changes of the base stimuli using two techniques: we calculated the level difference ($\Delta L$) in dB between each base stimulus on the first presentation and last iteration using the equation $\Delta L = 20 \cdot \log_{10}(A_1/A_2)$, where $A_1$ and $A_2$ represent the root-mean-square digitized waveform amplitudes of the base stimuli on the first presentation and last iteration, respectively, and we measured the difference in sound-pressure level ($\Delta\mathrm{SPL}$) between the base stimulus on the first presentation and last iteration, over all neurons and sessions. The distributions of both measures peaked near zero, with a modest variation in intensity (means: $\Delta\mathrm{SPL}$ = 0.1 dB, $\Delta L$ = 0.6 dB; SD: $\Delta\mathrm{SPL}$ = 2.3 dB, $\Delta L$ = 1.1 dB). There was no significant correlation between the two metrics, indicating that, within the range tested here, changes in the generated signal waveforms due to stimulus optimization did not yield appreciable differences in measurable sound energy.
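The waveform-based level difference is a one-line calculation; the sketch below spells it out for two sampled waveforms. The function names are illustrative only.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a digitized waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(first_waveform, last_waveform):
    """Level difference, 20 * log10(A1 / A2), between two base stimuli in dB."""
    return 20.0 * math.log10(rms(first_waveform) / rms(last_waveform))


# Halving the RMS amplitude corresponds to a difference of about 6 dB.
delta_l = level_difference_db([1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5])
```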
Data analysis
A final estimate of the preferred spectrum was obtained by averaging the amplitude vectors from the two experimental sessions, provided that the vectors were significantly correlated (Pearson r, P < 0.05, 1-tailed). As Table 1 shows, most (26) of these correlations were highly significant (P < 0.005). The preferred spectrum can be considered an estimate of the neuron's spectral receptive field in the sense that it reveals the frequencies that influence a neuron's response, and because historically the term "receptive field" has referred to the stimulus space to which a neuron is sensitive (Hartline 1938). However, recently the term has become associated with the notion of a quantitative model of a spectral or spectral-temporal receptive field (e.g., Theunissen et al. 2000). Therefore we have used the term "preferred spectrum" in this paper to avoid confusion.
Widths of the preferred spectra were estimated by measuring the distance (in Hz) between the peaks and troughs of the amplitude spectrum. If the spectrum comprised a center peak surrounded by two flanking troughs, the difference between the frequencies at the upper and lower troughs was taken, provided that their level was at least one SD more negative than the vector mean amplitude (the analogous operation was performed for the cells with center troughs). If only one lower trough met this criterion, the difference between the peak and trough was taken as the spectral width. If a neuron's preferred spectrum was symmetrical, the center frequency of the receptive field (RF) corresponded to the maximum peak (or minimum trough) frequency. If the spectrum was asymmetrical, then the center frequency was defined as the geometric mean of the peak and trough frequencies.
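The width rule for a spectrum with a center peak might be sketched as follows. Several details are assumptions: the population SD is used, a single qualifying trough on either side is accepted, and a spectrum with two qualifying troughs is treated as symmetrical so that its center is the peak frequency. The center-trough case would be handled by negating the amplitudes.

```python
import math
import statistics


def spectrum_width(freqs_hz, amps):
    """Return (width_hz, center_hz) for a center-peak spectrum, or None."""
    mean = statistics.fmean(amps)
    sd = statistics.pstdev(amps)  # population SD; an assumption
    i_peak = max(range(len(amps)), key=lambda i: amps[i])

    def qualifying_trough(indices):
        # Deepest point on one side of the peak, kept only if it lies at
        # least one SD below the vector mean.
        if len(indices) == 0:
            return None
        i = min(indices, key=lambda k: amps[k])
        return i if amps[i] <= mean - sd else None

    lo = qualifying_trough(range(0, i_peak))
    hi = qualifying_trough(range(i_peak + 1, len(amps)))
    if lo is not None and hi is not None:
        # Two flanking troughs: width is the trough-to-trough distance.
        return freqs_hz[hi] - freqs_hz[lo], freqs_hz[i_peak]
    side = lo if lo is not None else hi
    if side is None:
        return None
    # One trough: width is the peak-to-trough distance, center is the geometric mean.
    width = abs(freqs_hz[i_peak] - freqs_hz[side])
    center = math.sqrt(freqs_hz[i_peak] * freqs_hz[side])
    return width, center


freqs = [1000, 1189, 1414, 1682, 2000, 2378, 2828]
two_sided = spectrum_width(freqs, [0, -3, 0, 4, 0, -3, 0])
one_sided = spectrum_width(freqs, [0, 0, 0, 4, 0, -3, 0])
```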
A Gabor function and a difference of Gaussians (DoG) function were fit to the amplitude spectrum obtained from averaging the vectors from both optimization sessions for each cell. The Gabor function,

$$y = a\,\frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\sin(\omega x + \phi),$$

is the product of a Gaussian and a sine function, where the parameters $\mu$ and $\sigma$ are the center frequency and SD of the Gaussian, $\omega$ and $\phi$ are the frequency and phase of the sinusoid, and $a$ is a scale factor. In the DoG function,

$$y = \frac{a}{\sigma_1\sqrt{2\pi}}\,\exp\!\left[-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right] - \frac{b}{\sigma_2\sqrt{2\pi}}\,\exp\!\left[-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right],$$

the parameters $\mu_1$ and $\mu_2$, and $\sigma_1$ and $\sigma_2$, are the center frequencies and SDs of the two Gaussians, and $a$ and $b$ are scale factors. The fits were performed using an iterative, nonlinear (reflective-Newton) least-squares algorithm. Care was taken to avoid local minima by performing the fit several times using different starting parameters and choosing the best fit.
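The two model functions and the "choose the best of several fits" step can be sketched as below. The symbols `omega` and `phi` for the sinusoid's frequency and phase are assumed names, and the optimizer itself is deliberately left out: `best_fit` only compares candidate parameter sets that some least-squares routine, run from different starting points, would have returned.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gaussian(x, mu, sigma):
    """Unit-area Gaussian with center mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * SQRT_2PI)


def gabor(x, a, mu, sigma, omega, phi):
    """Gaussian multiplied by a sinusoid of frequency omega and phase phi."""
    return a * gaussian(x, mu, sigma) * math.sin(omega * x + phi)


def dog(x, a, mu1, sigma1, b, mu2, sigma2):
    """Difference of two scaled Gaussians."""
    return a * gaussian(x, mu1, sigma1) - b * gaussian(x, mu2, sigma2)


def sum_sq_error(model, params, xs, ys):
    """Sum of squared residuals between the data and the model."""
    return sum((y - model(x, *params)) ** 2 for x, y in zip(xs, ys))


def best_fit(model, candidate_params, xs, ys):
    """Pick the candidate parameter set with the smallest squared error."""
    return min(candidate_params, key=lambda p: sum_sq_error(model, p, xs, ys))


xs = [0.1 * k for k in range(40)]
true_params = (1.0, 2.0, 0.6, 3.0, 0.5)
ys = [gabor(x, *true_params) for x in xs]
chosen = best_fit(gabor, [(1.0, 1.0, 0.6, 3.0, 0.5), true_params], xs, ys)
```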
For statistical evaluation of the fitted Gabor functions, the analysis was limited to that portion of the obtained amplitude vector falling within the width of the Gaussian window (equal to twice the half-power width centered over the Gaussian mean). For the DoG function, the analysis was limited to the amplitude vector falling within the bounds of the upper and lower Gaussians, calculated in the same way. The r² statistic (representing the proportion of total variance accounted for by the fitted function) was determined as a measure of the goodness of fit. An F statistic (the ratio between the variation from the dependent variable and the residual variation about the regression) was then calculated, giving the statistical significance for each fitted function (the degrees of freedom were K − 1 and N − K − 1, where N was the number of elements in the vector and K the number of parameters in the function) (Daniel and Wood 1980). Twenty-three (77%) of the Gabor fits were significant at the P < 0.05 level (21 at the P < 0.025 level, and 16 at the P < 0.01 level); 24 (80%) of the DoG fits were significant at the P < 0.05 level (20 at the P < 0.025 level and 18 at the P < 0.01 level).
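A sketch of the goodness-of-fit statistics follows. It assumes that the variation attributed to the fit is the total minus the residual sum of squares, and that the degrees of freedom are K − 1 and N − K − 1; both are readings of the description above rather than confirmed details. Converting F to a P value requires the F distribution and is omitted here.

```python
def r_squared(y, y_fit):
    """Proportion of total variance accounted for by the fitted function."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, y_fit))
    return 1.0 - ss_res / ss_tot


def f_statistic(y, y_fit, n_params):
    """F ratio with assumed degrees of freedom (K - 1, N - K - 1)."""
    n = len(y)
    df1 = n_params - 1
    df2 = n - n_params - 1
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, y_fit))
    ss_fit = ss_tot - ss_res  # variation attributed to the fit (assumption)
    return (ss_fit / df1) / (ss_res / df2), df1, df2


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_fit = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]
r2 = r_squared(y, y_fit)
f_value, df1, df2 = f_statistic(y, y_fit, n_params=3)
```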
| RESULTS |
|---|
Optimization and convergence
The evolution of stimulus optimization is illustrated for one of these successful cases in Fig. 2A, which depicts the relative amplitude of each frequency component in the multi-tone complex at three stages in the process. The resulting amplitude vector is the neuron's preferred spectrum. It reflects the neuron's affinity (positive level) or aversion (negative level) for each frequency when simultaneously present in the stimulus.
Before proceeding further, two obvious questions concerning the optimization process should be addressed: does the process converge toward a global rather than a local optimum and, if so, how quickly? For the first question, we used the criterion that, for each cell, two independent sessions produce spectra that were essentially alike, i.e., significantly correlated (Pearson r), meeting a minimal criterion of at least P < 0.05 (e.g., Fig. 2B). As Table 1 shows, most of the significance values were considerably smaller than this criterion, with 72% (26/36) of the correlations significant at the smallest level (P < 0.005). Figure 2C shows the probability distribution for these correlation coefficients, as well as that for the 21 neurons for which two sessions were completed but the resulting amplitude vectors were not significantly correlated. Only a single session was performed for the remaining 15 units because, during that first session, there was no evident change in the base spectrum or no progress toward any discernible spectral pattern. As the plot shows, there was no overlap in these distributions.
To examine the rate of convergence, we tracked the similarity of spectra from successive iterations as optimization progressed. To do this, we computed the direction cosine of the angle (the normalized dot product) between the base vectors of adjacent pairs of iterations, which is equivalent to the correlation coefficient. Correlation coefficients computed from unit data for each session were plotted against iteration number and a negative exponential growth function was fit to the points. Thirty of the 36 units (83%) reaching criterion displayed at least one significant negative exponential fit (P < 0.05); the fits from both sessions were significant for 20 (56%) units. In the case of the amplitude spectra, most units displayed rapid pattern convergence as a negatively accelerated function of iteration number (Fig. 3, A and B). More than half of the growth functions fit to individual unit data had time constants of two iterations or less, and asymptotic values derived from these fits clustered close to one, indicating that convergence occurred quickly on most sessions (Fig. 3, C and D).
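The convergence measure can be sketched as follows. The direction cosine equals the Pearson correlation coefficient when both vectors are expressed as deviations from their own means, which the second function makes explicit. The saturating form used for the "negative exponential growth function" is an assumption, since its exact parameterization is not given.

```python
import math


def direction_cosine(u, v):
    """Normalized dot product (cosine of the angle) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(u, v):
    """Correlation coefficient: direction cosine of the mean-centered vectors."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    return direction_cosine([a - mu for a in u], [b - mv for b in v])


def similarity_track(base_vectors):
    """Similarity between the base vectors of each adjacent pair of iterations."""
    return [direction_cosine(base_vectors[i], base_vectors[i + 1])
            for i in range(len(base_vectors) - 1)]


def exp_growth(iteration, asymptote, tau):
    """Assumed growth form: asymptote * (1 - exp(-iteration / tau))."""
    return asymptote * (1.0 - math.exp(-iteration / tau))


track = similarity_track([[1.0, -1.0, 0.0], [2.0, -1.0, -1.0], [2.0, -1.5, -0.5]])
```

With a time constant of two iterations, this growth form reaches about 63% of its asymptote by the second iteration.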
Preferred spectrum measurement and structure
The plots in Fig. 2, A and B, indicate the characteristic form of the preferred spectra obtained: a circumscribed, antagonistic multi-lobed organization in which positive and negative regions (suggestive of excitation and inhibition) appear to be about balanced. Figure 4 demonstrates the variation in spectral structure found within this form, which includes spectra with centered (positive) peaks (e.g., Fig. 4, A and B) and centered (negative) troughs (e.g., Fig. 4, E and F) as well as those of intermediate symmetry. All 36 neurons exhibited a preferred spectrum having this type of basic structure.
The fundamental form of the spectra displayed in Fig. 4 seems quite consistent, being relatively independent of size or shifts in center frequency. We examined this consistency by measuring the spectra (e.g., histograms in Fig. 4) and their excitatory and inhibitory subfields. Spectrum widths (in kHz) are plotted against spectrum center frequency in Fig. 7A. The average change in spectrum width as a function of frequency is well described by the regression line in Fig. 7A with a slope (exponent) close to one. This shows that spectrum width and center frequency increase at roughly the same rate; that is, that the ratio of spectrum width to center frequency varies about some constant value. This means that relative spectrum size (size in octaves) remains, on average, unchanged as a function of frequency. This is illustrated in Fig. 7B, which plots spectrum width in octaves (median = 0.69) over frequency. This scaling relationship also holds well for the subfields within the spectrum as is shown in Fig. 7C. The widths of the upper and lower subfields change at nearly the same rate with subfield peak (trough) frequencies changing at about the same rate as center frequency. The plots reveal striking constancy in spectrum structure, the spacing of the subfields closely maintaining their relationship over a large frequency range. Figure 7C suggests that the upper and lower frequency subfields have roughly the same relative width. This relationship is examined in Fig. 7D, which compares the width of the upper and lower subfields of individual preferred spectra in octaves. The points cluster near the positive diagonal, with lower bands, on average, slightly larger than upper bands.
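The scaling argument can be checked numerically with a small sketch: a log-log regression slope of one means that width is proportional to center frequency, so width in octaves is constant. The data below are made up for illustration, and the conversion to octaves assumes that the center frequency is the geometric mean of the spectrum's two edges.

```python
import math


def loglog_fit(x, y):
    """Least-squares slope and intercept of log10(y) against log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx


def octaves_from_ratio(width_over_center):
    """Octave span of a band whose center is the geometric mean of its edges."""
    q = width_over_center
    s = (q + math.sqrt(q * q + 4.0)) / 2.0  # s = 2 ** (octaves / 2)
    return 2.0 * math.log2(s)


centers_khz = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
widths_khz = [0.5 * c for c in centers_khz]  # hypothetical scale-invariant data
slope, intercept = loglog_fit(centers_khz, widths_khz)
octave_widths = [octaves_from_ratio(w / c) for w, c in zip(widths_khz, centers_khz)]
```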
The structure of these preferred spectra is suggestive of the RFs found for simple cells in primary visual cortex (VI) which, like the AI preferred spectra found here, display a circumscribed, antagonistic multi-lobed organization. Two functions that have been used to provide a simple quantitative description of this sort of structure are the Gabor and DoG (Hawken and Parker 1987; Jones and Palmer 1987). The Gabor function has also been used to describe the temporal and spectral components (profiles) of spectrotemporal RFs (STRFs) for many neurons in the central nucleus of the inferior colliculus (Qiu et al. 2003), though as we note in the following text, with different results than our own. To test these functions against our data, we fit a Gabor and DoG function to each neuron's optimized spectrum (Fig. 4). The fits of both functions track the large undulations in the spectra, and both are able to account for a significant proportion of the variance for a majority of the spectra [Gabor: 78% (28/36); DoG: 83% (30/36)]. Both fits also appear to provide good estimates of the extent of inhibitory and excitatory subregions (Fig. 4).
The preferred spectra resulting from adaptive stimulus optimization appear to be generally similar in shape despite large changes in their size (scale) or in center frequency (translation); that is, they seem to be scale invariant. To further quantify the structure of the spectra and substantiate this scale invariance, we examined the parameters obtained from significant fits of the Gabor and DoG functions. For the DoG, scale invariance implies that the widths of the two Gaussian functions, corresponding to excitation and inhibition, change at equal rates as a function of spectrum width. This is supported by Fig. 8A, which shows that the mean excitatory and inhibitory Gaussian widths (in kHz) change at nearly the same rate when plotted against spectrum width. In the case of the Gabor function, for scale invariance to be preserved, there should be an inverse relationship between the Gaussian width and the frequency of the sine-spectral profile function. That is, as width increases there should be a corresponding decrease in frequency, such that the number of lobes or cycles within the spectrum remains relatively unchanged. Conversely, the period of the Gabor sine function and the Gaussian width should be in direct proportion. These relationships are supported by the plots in Fig. 8, B and C. Gabor width and sine-spectral profile frequency tend to be inversely related (Fig. 8B), while width and period tend to be directly proportional (Fig. 8C). The remaining parameter defining Gabor function shape, the phase of the sine profile, determines the relative symmetry of the spectrum. A phase of 0° corresponds to a center peak with flanking troughs (e.g., Fig. 4, A and B), a phase of 180° to a center trough with flanking peaks (e.g., Fig. 4, E and F). The histogram in Fig. 8D shows that the phase parameter is about evenly distributed between 0° and 360°, indicating that spectral troughs are about as prevalent as peaks.
These results differ strikingly from Qiu et al.'s results showing that the phase of the spectral Gabors fitted to inferior colliculus (IC) STRFs was bounded between approximately 0° and 90°. We do not yet know, of course, to what degree this difference is due to processing effects between IC and cortex or to the differences in method (adaptive search vs. linear estimation).
Stimulus optimization and neural responses
Because cortical neurons are stochastically nonstationary and habituate to repeated stimulation, we would not predict their responses to simply increase monotonically during the adaptive optimization process. Rather, their actual behavior is an important empirical question that must be examined to understand the optimization technique. Figure 9A depicts the response dynamics for three neurons with significantly correlated final vectors. These cells' responses (spikes/trial), averaged over all stimuli and plotted as a function of iteration for the first (left) and second (right) optimization sessions, illustrate the degree of response variability encountered. In most cases the base stimulus responses tracked the mean responses (Fig. 9A). This supports the notion that the stimulus set, which was designed to randomly explore the multidimensional parameter space immediately surrounding the base stimulus, did so, given that this set and the base produced positively correlated responses in almost all cases.
To assess the general response trends during the optimization process, we performed linear regressions on the mean rate-by-iteration plots (e.g., those in Fig. 9A) for all neurons. Not surprisingly, given the degree of response variability over iterations, few of these fits were significant. For the successful optimization sessions, the correlations were about equally distributed about zero, as were the correlations computed for base stimuli only. In contrast, the correlations for the remaining cases tended to be negative [median r = −0.319; Wilcoxon T(57) = 476.5; P < 0.01], indicating that overall response strength tended to decrease during sessions for neurons that did not successfully meet criterion. There was a slight but nonsignificant negative bias for the corresponding base stimulus correlations (median r = −0.098).
The fact that response rates were more likely to decrease on sessions where optimization was unsuccessful might mean that an increase in response rate contributed to success, even if it was not actually instrumental. This view is supported by examination of the response rates for neurons meeting criterion and those that did not. These are compared in Table 2, which displays the medians over sessions for the responses averaged over all stimuli and for the base stimuli alone. The response rates for successful optimization sessions tended to be larger than those for unsuccessful sessions, for all stimuli as well as for base stimuli, although neither of these differences quite reached significance. The median maximum response rates for successful optimization sessions were significantly larger for all stimuli as well as for the base stimuli (Table 2). It is important to point out that responses measured on the first and second sessions tended to be highly correlated and did not differ significantly in magnitude. The correlation was highest for the maximum responses made to base stimuli and lowest (the only nonsignificant case) for averages over all stimuli for the unsuccessful sessions (Table 3). Despite the rather high degree of variance in neural response, then, there was a high degree of retest reliability across sessions in the optimization process. These results suggest that, although high response rates were not critical to successful stimulus optimization, they did play a role, perhaps in attenuating the effects of habituation.
| DISCUSSION |
|---|
Scale invariance, RF structure, and efficiency
The preferred spectra resulting from adaptive optimization have a circumscribed, antagonistic multi-lobed organization that appears scale invariant. How does this relatively simple prototypical form come about? It seems reasonable that both linear and nonlinear interactions contribute. The extent to which AC neurons summate linearly over frequency is a matter of current controversy. There is, however, strong recent evidence from studies using multi-tone complexes (Calhoun and Schreiner 1998; Nelken et al. 1994a) and linear RF estimation techniques (Barbour and Wang 2003; Machens et al. 2004; Sahani and Linden 2003) that many AC neurons behave in a substantially nonlinear manner in response to complex spectral input. It certainly seems possible, then, that nonlinear spectral interactions played a significant role in determining the preferred spectra found here. It is also important to note that this form and organization may arise before the level of auditory cortex; recent work using reverse correlation on neurons in the IC reveals that the Gabor functions approximating their spectral RFs show a trade-off between the widths of the Gabors and their sine-profile frequency (Qiu et al. 2003). An obvious question is: what purpose might this structure serve for audition? One possibility is that these RFs are designed to extract local features in sound spectra (deCharms et al. 1998), such as the peaks and notches in power that are important for identifying and localizing natural sounds (Middlebrooks and Green 1991; Reiss and Young 2005); that is, they might operate as "edge" and "line" detectors for a spectrographic-like sound representation. It has also been suggested (Shamma et al. 1994) that AI RFs act as local linear filters, essentially performing a Fourier analysis of a power spectrum, analogous to the Fourier analysis on images that has been proposed for VI neurons (De Valois and De Valois 1988).
Another possibility is that this type of structure is designed for the analysis of natural acoustic scenes. Our preferred spectra show some properties consistent with Lewicki's predictions of efficient filters for the analysis of natural sounds (Lewicki 2002). These filters were shown to have proportional increases in bandwidth, along with decreases in temporal extent, with increasing frequency, and so represent an efficient trade-off for locating sounds in both the time and frequency domains. The preferred spectra obtained in our study obey the same relationship between bandwidth and center frequency and so may be efficiently designed for the analysis of environmental sounds.
Furthermore, natural sounds are not randomly organized but display local correlations in structure in both time and frequency (Attias and Schreiner 1997; Hoth 1941; Singh and Theunissen 2003; Voss and Clark 1977). The task of the auditory system, then, may be to decorrelate overlapping signal and background sounds. Psychophysical evidence from studies of comodulation masking release (Hall et al. 1984) and neurophysiological evidence from the responses of cat AI cells to stimuli composed of either modulated or unmodulated background noise (Nelken et al. 1999) support the idea that the auditory system may be designed for this function.
A localized, antagonistic spectral organization, such as found in our study, may efficiently perform this function because it is responsive to features that stand out against a uniform background (Daugman 1989). Interestingly, algorithms designed to optimize the efficiency of model visual RFs using natural images derive localized, antagonistic multi-lobed forms (Hyvarinen and Hoyer 2000; Olshausen and Field 1996). The resulting model RFs are similar in shape despite large changes in scale or translation. In the auditory system, this type of structure would permit AI cells to extract feature information from noisy backgrounds at multiple bandwidth levels or scales, with equal fidelity.
Spectral RF size and structure: comparisons with previous methods
Our results showing that preferred spectrum width scales with frequency might not seem surprising, given previous evidence that the frequency response areas (FRA) of neurons tend to broaden (in kHz) as a function of frequency in the lemniscal tonotopic pathway. However, the precise quantitative nature of this relationship has been more difficult to establish. The most common way of assessing this relation has been to measure the width of a neuron's excitatory FRA at an intensity level just above threshold (typically 10 dB above). The best frequency (BF) divided by the width (in Hz) yields the familiar Q (quality) factor. If RF width scales with frequency, then Q values should remain constant relative to frequency. This is generally not true, however, for values of Q measured at all levels of the auditory pathway from the auditory nerve to cortex. There is a tendency for Q to increase as a function of frequency, indicating that excitatory FRAs are relatively narrower (in octaves) at high frequencies, at least near threshold (Aitkin and Webster 1972; Aitkin et al. 1972, 1975; Batzri-Izraeli and Wollberg 1992; Cheung et al. 2001; Egorova et al. 2001; Ehret and Moffat 1985; Evans 1972; Kiang et al. 1965; Nuding et al. 1999; Pelleg-Toiba and Wollberg 1989; Phillips and Irvine 1981; Recanzone et al. 1999).
These results are consistent with studies reporting that AI excitatory bandwidth measured in octaves declines by about a factor of two across a 10-fold increase in BF (Evans and Whitfield 1964; Kowalski et al. 1995; measured 15 and 20 dB above threshold, respectively). Another study, however, found no significant relationship between excitatory octave-bandwidth (10 and 40 dB above threshold) and BF, although this might be accounted for by the fact that only one neuron with a BF <4 kHz was included in the analysis, as the authors remark (Schreiner and Sutter 1992). Still other studies have found a more modest though still significant (P < 0.05) decline of ~0.25 octave in bandwidth (40 dB above threshold) per decade increase in frequency (Loftus and Sutter 2001; Sutter and Loftus 2003; unpublished findings). Overall, the measurements of excitatory FRAs suggest a relative decline in octave width as a function of BF, though this result may, to at least some extent, depend on the type of measure used as well as other experimental variables.
Another measure of excitatory FRA width is the square-root transformation. This is defined as the difference between the square roots of the high- and low-frequency bounds of the FRA just (10–20 dB) above threshold (Whitfield and Purser 1972). For FRAs that scale with frequency, square-root transform values should be linearly related to frequency when plotted on log-log coordinates with a best-fit regression slope (exponent) of 0.5. However, measurements made from the IC, medial geniculate and AI have not revealed any particular relationship between square-root transform values and frequency (Batzri-Izraeli and Wollberg 1992; Calford and Webster 1981; Calford et al. 1983; Pelleg-Toiba and Wollberg 1989; Whitfield and Purser 1972). We analyzed our own data using a variant of this technique, defining the upper and lower bounds of the preferred spectra as the peak and trough frequency points. A log-log plot of the square-root transform of these values against frequency revealed the best-fitting regression line to have a slope of 0.47 (P < 0.01), quite close to the predicted value of 0.5 for scale-invariant spectra.
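The predicted slope of 0.5 follows because bounds that are fixed multiples of the center frequency f give a square-root transform proportional to the square root of f. The sketch below checks this numerically on synthetic, perfectly scale-invariant bands; the function names, the one-octave width, and the list of center frequencies are illustrative assumptions, not the data or code used in this study.

```python
import math


def sqrt_transform(low_hz, high_hz):
    """Difference between the square roots of the upper and lower bounds."""
    return math.sqrt(high_hz) - math.sqrt(low_hz)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log10(y) regressed on log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# Synthetic scale-invariant bands: every band spans exactly one octave.
center_freqs = [250.0 * 2 ** k for k in range(7)]  # 250 Hz to 16 kHz
transforms = [sqrt_transform(f / 2 ** 0.5, f * 2 ** 0.5) for f in center_freqs]

slope = loglog_slope(center_freqs, transforms)
print(round(slope, 6))
```

With real data the fitted slope also reflects scatter in the measured bounds, so a value such as 0.47 is compared against 0.5 rather than expected to match it exactly.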
One difference between the current and previous studies using Q and the square-root transformation is that the previous studies recorded excitatory responses to single tones, whereas the present study incorporates inhibitory areas and possible nonlinear interactions including facilitation. Several studies have examined the role of excitation and inhibition on cortical FRA width using tone-plus-tone or tone-plus-noise stimulus combinations. Some of the two-tone studies have focused on the size and shape of excitatory and inhibitory response areas and their relative symmetry, in AI. Two-tone experiments conducted in cat AI have shown most neurons to have single excitatory FRAs and a smaller number (~20%) to exhibit multi-peaked excitatory bands, primarily in dorsal AI (Sutter and Schreiner 1991). The most common FRA type consists of a single excitatory band with two inhibitory flanking sidebands, ~50% in ventral AI (Sutter et al. 1999). In general agreement with these findings, recordings made in ferret AI using two-tone stimuli revealed most neurons to have excitatory centers with some degree of flanking inhibition (Kowalski et al. 1995; Shamma et al. 1993). It is interesting that the mean bandwidth of multi-peaked FRAs appears to be more nearly constant as a function of BF than excitatory FRA width, although there is considerable variability in the data (Loftus and Sutter 2001; Sutter and Loftus 2003; unpublished findings). This may mean that the addition of flanking inhibition (and possibly excitation) increases overall FRA width to a greater extent at high frequencies than low, perhaps serving to normalize FRA size with respect to BF.
Several other techniques have recently become popular for estimating the structure of auditory RFs. One of these employs ripple or spectral sine-profile-modulated stimuli (Green 1986), analogous to the sine-wave gratings used in vision research. Studies using static ripple stimuli have shown that the best ripple frequencies for cortical neurons range between ~1–4 cycles/octave (Schreiner and Calhoun 1994) and 0.3–3 cycles/octave (Shamma et al. 1994) with best-mean frequencies of 1.1 and 1.0 cycles/octave, respectively. These results correspond well to the Gabor, sine-profile frequencies fit to the spectra in our study: these range between 0.2 and 3 cycles/octave with a mean of 1.17. In the Shamma et al. study, the distribution of best sine-profile phase sensitivities was sharply peaked at about 0° with very few neurons responding maximally near 180°. This implies that most neurons had symmetric or nearly symmetric RFs with an excitatory center, a finding in agreement with two-tone studies (see preceding text). These results stand in contrast to our own, in which inhibitory-center RFs were found to be about as common as excitatory-center RFs (Fig. 8D).
The reverse correlation technique (or closely related spike-triggered average) is another approach for assessing a neuron's response to complex acoustic stimulation. It employs a spectrally complex, temporally varying stimulus (such as white noise) to determine the average stimulus preceding a neuron's spike. This technique generates a neuron's time-varying, frequency-sensitive RF, or spectro-temporal RF (STRF), which provides the best linear model of a neuron's response to a dynamic stimulus. In agreement with the RF estimates made using two-tone, tone-plus-noise, and static ripple stimuli, reverse correlation STRFs obtained from AI neurons tend to show a strong excitatory region (usually in response to stimulus onset) flanked by one or more weaker inhibitory and excitatory bands (Blake and Merzenich 2002; deCharms et al. 1998; Depireux et al. 2001; Miller et al. 2002; Rutkowski et al. 2002). In the reverse correlation STRFs, inhibitory regions of longer duration, but lesser magnitude, often follow excitatory ones.
Estimates of spectral RF width from reverse correlation vary quite a bit from study to study and seem to depend on the precise technique used. deCharms et al. (1998), using random-chord stimuli, found a median STRF width (for the total response area) of 1.80 octaves. This is larger than our median bandwidth of 1.29 octaves measured at twice the one-half power width of the spectral Gaussian Gabor envelope. Blake and Merzenich (2002), also using random-chord stimuli, found STRF width to depend on spectrotemporal tone density, with both mean excitatory and inhibitory bandwidth increasing (by 0.8–2 and 2–2.5 octaves, respectively) as tone density decreased. Miller et al. (2002), using dynamic ripple stimuli, found the distribution of best ripple frequencies to be very skewed toward low values (mean = 0.46; median = 0.25 cycles/octave), implying a preponderance of RFs with very broad bandwidths (50% >4 octaves). The results from Miller et al. are in contrast to those from static ripple experiments, as well as from our own study, showing that the AI preferred spectra average about one-half of this width. Most bandwidth estimates from reverse correlation, then, give larger values than our study and the more conventional techniques, though it is clear that the spectrotemporal structure of the stimulus is important in bandwidth determination.
A recent technique of estimating a static rather than temporally dynamic spectral RF is termed the "random spectral stimulus" (RSS) method, which employs stimuli similar to those in our study (Barbour and Wang 2003; Yu and Young 2000). Barbour and Wang used a linear RSS technique on AC neurons and revealed linear weighting functions with excitatory centers and surrounding inhibitory sidebands, the sort of RF type that predominates in other studies. They also obtained AC RFs at different mean intensities for individual neurons and found RF width to be relatively level tolerant or invariant. Our results extend this finding of structural invariance to large changes in spectral width and translation.
The weighting functions of Barbour and Wang bear some similarity to the preferred spectra in our study (though they are expressed in different units). The examples they show, however, seem to indicate a preponderance of excitatory-center/inhibitory-surround RF types, but because the authors did not present a quantitative summary of these data (the proportion of off-center neurons), comparison with our own results is difficult. The predominance of this structure would be congruent with the STRF results described above. It seems likely that adaptive search is better able to reveal inhibition because it does not rely on the elicitation of high response rates to do so, which is the case with the stimulus averaging techniques.
The similarity and scale invariance of the preferred spectra in our study are striking given the range of intensities used across experiments. It is possible that in both types of study, across-frequency interactions normalized cortical responses, producing at least two different sorts of invariance: invariance with respect to intensity and invariance with respect to scale. Whether the preferred spectra obtained from individual neurons using adaptive stimulus optimization will display level-invariant size and structure must await further work.
In summary, our findings show that average, relative AI preferred spectrum size remains constant as a function of frequency and that basic preferred spectrum structure has a relatively simple quantitative description with inhibition about as prominent as excitation. When considering the differences between our results and those of previous studies, it is important to bear in mind that most of these studies used cats, although several different species were studied, and that most, but not all, used anesthetized subjects.
Masking, the critical band and AI preferred spectra
The "critical band" (the frequency range within which sound energy is effective in perceptually masking a tone) has been extensively studied psychophysically, and investigators have looked for its neural correlate, the neural critical bandwidth (CBW). The CBWs of central IC (ICC) neurons parallel psychophysically determined CBWs when both are plotted against frequency although they are generally broader (this similarity included a leveling off of the functions for frequencies below ~0.5 kHz) (Ehret and Merzenich 1985, 1988). The relationship between CBW and frequency appeared linear on log-log coordinates (above ~0.5 kHz), and best-fitting regression lines (>0.5 kHz) had slopes ranging from 0.63 to 0.97 [mean = 0.72 ± 0.038 (SE)], indicating that bandwidth increased at a slower rate than frequency. A similar relationship between CBW and BF (slope = 0.63) was obtained for neurons in ventral and central cat AI when CBW was determined using narrowband noise maskers but not when broadband noise and the critical ratio (the noise intensity required to mask a tone divided by the tone intensity) were used (Ehret and Schreiner 1997). The CBW measurements from ICC and AI neurons imply that their RFs tend to be relatively narrower at high frequencies when characterized using combinations of tones and noise, a result similar to that obtained from single-tone excitatory FRAs.
From their results, Ehret and his colleagues have argued that the ICC is a likely site for the neural mechanisms underlying critical-band related phenomena, more so than the cochlea. This claim is based on neural tuning properties such as bandwidth, level tolerance, and linearity, and is substantiated by the similarity between the neural and psychophysical CBW-frequency plots for the cat, which yield log-log slopes centering on 0.7–0.8 (Ehret and Merzenich 1985, 1988; Pickles 1979; Pickles and Comis 1976). This is evidence that neural and psychophysical CBWs are narrower at high frequencies in this species (Felis catus). In our study, the log-log plot of spectrum width and center frequency shows a best-fit slope of 0.99 (Fig. 7A), indicating that width remains relatively constant as a function of frequency. An important question is how this relationship compares with that of the behaviorally measured critical band and frequency in macaques. When plotted against frequency on log-log axes, critical-band measures in pig-tailed macaques (Macaca nemestrina) reveal a slope of 1.01 (Gourevitch 1970).2 (Such data are not available for rhesus macaques.) The pig-tailed macaque CBWs are comparable to those of the human within the frequency range tested (Fay 1988; Zwicker et al. 1957), suggesting that psychophysical CBWs may be similar across primates. The macaque and human psychophysical CBWs are plotted as a function of frequency in Fig. 10, along with the preferred spectrum widths found in our study. The neural bandwidths tend to be broader with the behavioral CBWs arrayed along the lower bound of the scatterplot, results that are similar to those found with ICC CBWs in the cat. We do not mean to imply with these data that AI neurons are solely or principally responsible for critical-band phenomena in macaques or humans. The results do, however, parallel the relationship between estimated RF size and behavioral and neural masking in the cat, and so are also consistent with the structure of the preferred spectra found in our study relating to frequency selectivity or analysis.
As noted in the preceding text, studies using single- and two-tone stimuli have demonstrated that some frequency-response areas in AI have multiple excitatory bands separated by an inhibitory band (Abeles and Goldstein 1972; Sutter and Schreiner 1991). Our adaptive optimization results also revealed this type of spectral sensitivity in neurons with ~180° sine-phase, i.e., those having a center trough with surrounding peaks (Fig. 6D). Studies using two-tone stimuli to reveal inhibitory bands also show some very complex multi-lobed AI FRAs (Sutter and Loftus 2003; Sutter et al. 1999), a form that is not obvious in our results. It may be that the central two or three lobes of the RF are those that overwhelmingly affect a neuron's response during the adaptive search process. It is also possible that the increased number and/or range of possible frequency interactions available in our broadband stimuli yield a simpler spectral form; i.e., that across-frequency interactions are critical in creating this form.
Although both the Gabor and DoG models described the basic structure of the preferred spectra, they did not always capture their finer details (it should be noted that the DoG fit can, at most, capture 1 central and 2 surround bands). For instance, the spectrum peaks and troughs sometimes under- or overshoot the fitted functions (e.g., Fig. 4, G, I, and J). Also, the functions sometimes failed to account for variations at the fringes of the preferred spectra (e.g., Fig. 4, B, D, and F). The alternating pattern of these outlying variations suggests that they are genuine low-amplitude skirts of the preferred spectra. These patterns are consistent with two-tone studies demonstrating more complex multi-banded antagonistic regions in AI FRAs (Sutter et al. 1999). Despite these limitations, both functions appear to provide reasonable simplifying formal descriptions of the largest components of AI preferred spectra.
Response variability and stimulus optimization
Gradient optimization methods might seem unsuitable for single units in cortex because of response variability. Our positive results, and the success of a similar procedure for VI neurons (Foldiak 2001), counter this argument. We were able to obtain two independent, convergent optimization runs for one-half of the 72 single units we tested.
Other adaptive search techniques have been used in the auditory system with some success. Nelken and colleagues (1994b) applied a stimulus optimization technique using the simplex algorithm, maximizing the (filtered and summed) activity of clusters of units in auditory cortex. The parameters to be optimized in their study were the frequencies, but not amplitudes, of pure and complex tones, where the complex tones were the sum of two, four, or nine pure tones. In the successful optimization cases, they obtained stimuli the tone elements of which (from different optimization runs) were close in frequency and found that optimization improved with the more complex stimuli. They concluded that the location of a spectral peak in a sound, or the locations of pairs of peaks, is an important parameter for cortical sound