1 Keck Center for Integrative Neuroscience, University of California, San Francisco, California 94143; 2 Department of Otolaryngology, University of California, San Francisco, California 94143; 3 Department of Physiology, University of California, San Francisco, California 94143; 4 Sloan-Swartz Center for Theoretical Neurobiology, University of California, San Francisco, California 94143; 5 Gatsby Computational Neuroscience Unit, University College, London, WC1N 3AR, United Kingdom
Submitted 3 September 2002; accepted in final form 13 June 2003
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Previous studies have distinguished at least five auditory fields in mouse cortex (Stiebler et al. 1997) on the basis of tonotopy, spontaneous activity, and general characteristics of responses to tones, noise bursts, and frequency sweeps. These fields include the primary auditory field (AI), the anterior auditory field (AAF), the ultrasonic field (UF, which may be a continuation of AI and/or AAF into ultrasonic frequency sensitivities) (Hofstetter and Ehret 1992), and two higher-order auditory areas, the secondary auditory field (AII) and the dorsoposterior field (DP). The two tonotopic areas AI and AAF appear to be organized as in other mammals with a reversal of tonotopy across the high-frequency border between the two fields. However, except for one very brief report (Shen et al. 1999), there are no published data on the structure of auditory receptive fields in these areas. Better descriptions of auditory receptive field structure in mouse AI and AAF are a prerequisite for understanding cortical processing of auditory information in the mouse.
With that goal in mind, we here describe the auditory receptive fields of neurons recorded extracellularly in thalamorecipient layers of mouse AI and AAF. At each recording site, we employed two different response characterization strategies. Conventional frequency-intensity tuning curves and poststimulus time histograms (PSTHs) were determined from multiunit responses to isolated tone bursts to identify auditory fields and to assess the properties of multiunit activity at each site in a manner consistent with many previous studies of auditory cortex responses in other species. Then spectrotemporal receptive fields (STRFs) were estimated from single-unit or unit-cluster responses to dynamic random chord stimuli at the same recording sites, to characterize the receptive fields of single neurons or small numbers of neurons in greater spectrotemporal detail. Similar spectrographic reverse-correlation methods have previously been used to characterize STRFs in the auditory cortex of the guinea pig (Rutkowski et al. 2002), cat (Miller et al. 2001, 2002; Schnupp et al. 2001), ferret (Depireux et al. 2001; Kowalski et al. 1996a), and monkey (deCharms et al. 1998) and also to describe the properties of auditory neurons in other brain areas and other species (e.g., Eggermont et al. 1983a; Escabi and Schreiner 2002; Keller and Takahashi 2000; Qiu et al. 2003; Sen et al. 2001; Theunissen et al. 2000; for a review, see Eggermont et al. 1983b).
These complementary analyses provide a detailed picture of the spectral, temporal, and spectrotemporal properties of neuronal responses in mouse auditory cortex. They reveal that several features of mouse auditory responses are common to both areas AI and AAF, including very short minimum response latencies, broad spectral tuning, and, for nearly one-quarter of the neurons, significant spectrotemporal inseparability (i.e., interaction between spectral and temporal sensitivities). Areas AI and AAF differ, however, in the distributions of both first-spike latencies and peak latencies and the durations of responses and of receptive fields. These results are consistent with findings from similar studies in other species and suggest that although neurons in areas AI and AAF share many response characteristics, AAF may be specialized for processing faster temporal modulations. This detailed characterization of mouse auditory receptive fields now provides a foundation for future studies of auditory cortical processing and plasticity in mice.
| METHODS |
|---|
Twelve adult CBA/CaJ mice (6–15 wk old) were used in this study. The CBA/CaJ strain was chosen because mice of this inbred strain have excellent hearing with minimal age-related hearing loss (Zheng et al. 1999) and are often used as a standard in studies of mouse auditory physiology and behavior (e.g., Willott et al. 1993, 2000).
Surgical procedures
All surgical procedures conformed to protocols approved by the University of California at San Francisco's Committee on Animal Research and were in accordance with federal guidelines for care and use of animals in research. Mice were anesthetized and maintained at a surgical plane of anesthesia with ketamine and medetomidine. Dexamethasone was administered to control edema, atropine or glycopyrrolate to minimize respiratory secretions, and Ringer solution or saline to ensure adequate hydration. Heart rate, respiration rate, and temperature were monitored throughout each experiment; temperature was maintained near 37.5°C with a rectal probe and a homeothermic blanket system (Harvard Instruments). Tracheotomies were performed on some of the mice to allow for artificial respiration with a pressure-controlled ventilator (Kent Scientific).
Once anesthetized and prepared for surgery, each mouse was placed in a nose clamp to immobilize its head. Topical anesthetics were applied to the scalp, and the skin transected to expose the skull. A hand drill and scalpel were then used to remove a section of bone over the left auditory cortex. Silicone oil was applied to the dural surface to keep the exposed cortex moist, and electrode penetrations were made through the dura.
Recording procedures
All experiments were conducted in a sound-shielded anechoic chamber (Industrial Acoustics). Auditory stimuli were delivered from two free-field speakers (Dynaudio and Ultrasound Advice), one covering the low-frequency portion of the mouse hearing range and another covering the ultrasound range. Animals were positioned with the right ear near the opening of an acoustical horn (Ultrasound Advice) and a sound-attenuating plug in the left ear so that the free-field stimuli were presented monaurally to the right ear. Acoustic calibration was performed before every experiment with a 1/4-in Bruel and Kjaer microphone placed at the opening of the acoustical horn so that the speaker output could be corrected to ensure a flat frequency response (±2 dB SPL) and <2% total harmonic distortion over the appropriate frequency range for each speaker.
Epoxylite-coated tungsten electrodes (1–4 MΩ impedance; Fred Haer and Co.) were introduced into the left auditory cortex in penetrations orthogonal to the cortical surface. Recordings targeted thalamorecipient layers III/IV (Smith and Populin 2001) by cortical depth (350–600 µm below the dural surface) and by the polarity and size of stimulus-evoked local field potentials. These criteria were previously validated in separate histological studies (data not shown). Neuronal responses to noise bursts, frequency sweeps, and repeated tone bursts were then examined to determine the location of each recording in the mouse auditory cortex map (Fig. 1), according to criteria described by Stiebler et al. (1997). Neurons in areas AI and AAF were identified by their nonadapting responses to repeated tone bursts at interstimulus intervals of 400–500 ms and by a reversal of tonotopy along the rostral-caudal axis. Neurons in area UF were distinguished by strong tuning to ultrasonic frequencies and sensitivity to frequency sweeps. Area AII was characterized by its fractured tonotopy and by strongly habituating neuronal responses to repeated tone bursts. Neurons in area DP were identified by their bursty spontaneous activity and persistent locking to repeated stimuli.
The auditory cortex was not mapped extensively in these experiments but only enough to identify the likely locations of AI and AAF. At recording sites judged to be in thalamorecipient layers of these two auditory fields, neuronal responses were then characterized in detail. This characterization proceeded in two stages. In the first stage, simple tonal stimuli (see following text) were presented with the intensity and frequency of the tone bursts varying pseudorandomly over either a low- or a high-frequency range. Because different speakers were required to cover the two frequency ranges (and because simultaneous playback from both speakers was not feasible for technical reasons), responses to tonal stimuli in each frequency range were recorded separately. Electrode signals were amplified (Axon Instruments), band-pass filtered between 300 and 6,000 Hz (Stewart) and then thresholded in software (Brainware, Tucker-Davis Technologies) to extract the times of neuronal action potentials. Thresholds were set above the level of noise in the recording but usually low enough to include spikes of varying amplitude; therefore the recorded responses to tonal stimuli typically represented multiunit activity.
In the second stage of the characterization, repeated trials of the dynamic random chord stimuli (see following text) were played continuously for many minutes. Like the tonal stimuli, these complex stimuli spanned either a low- or a high-frequency range; the two types of stimuli were presented sequentially at each recording site (separated in time by at least a few minutes). Neuronal responses to dynamic random chord stimuli were recorded with minimal band-pass filtering (100–10,000 Hz), and the amplified electrode signals were sampled continuously (National Instruments) for further processing off-line. The recordings were later band-pass filtered in software (300–6,000 Hz, 5th-order Butterworth filters) and then analyzed using Bayesian spike-sorting techniques (Lewicki 1994) (user interface software by M. Kvale, UCSF) to extract responses from single units or small clusters of neurons. These single-unit or unit-cluster responses were used to estimate STRFs (see following text). Because multiple single units or distinct neuronal clusters could often be obtained from spike sorting of the continuous recordings, the total number of STRFs included in the analyses of responses to dynamic random chord stimuli differed from the number of recording sites included in analyses of responses to tonal stimuli.
Tonal stimuli
Tonal stimuli consisted of 60-ms tone bursts, ramped up and down with 5-ms cosine gates. The frequency and intensity of each tone burst varied pseudorandomly over the range of possible values in the stimulus set. Frequencies covered 2–32 kHz for the low-frequency stimulus set and 25–100 kHz for the high-frequency stimulus set, in 1/10-octave increments. Intensities ranged from 0 to 70 dB SPL in 5-dB increments. Each of the possible frequency-intensity combinations was presented only once per stimulus set. The time interval between successive tone bursts was
400 ms.
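The tonal stimulus set described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the sampling rate `fs`, the random seed, and the helper name `tone_burst` are hypothetical and do not come from the original methods, and calibration to dB SPL is omitted.

```python
import numpy as np

def tone_burst(freq_hz, dur_s=0.060, ramp_s=0.005, fs=250_000):
    """Unit-amplitude tone burst with raised-cosine onset and offset ramps.

    fs is an assumed sampling rate, chosen only to exceed twice 100 kHz.
    """
    n = int(round(dur_s * fs))
    t = np.arange(n) / fs
    tone = np.sin(2 * np.pi * freq_hz * t)
    n_ramp = int(round(ramp_s * fs))
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))  # rises from 0 toward 1
    envelope = np.ones(n)
    envelope[:n_ramp] = ramp
    envelope[-n_ramp:] = ramp[::-1]
    return tone * envelope

# Frequency and level grids in 1/10-octave and 5-dB steps.
low_freqs = 2_000 * 2 ** (np.arange(41) / 10)    # 2-32 kHz (4 octaves)
high_freqs = 25_000 * 2 ** (np.arange(21) / 10)  # 25-100 kHz (2 octaves)
levels_db = np.arange(0, 75, 5)                  # 0-70 dB SPL

# Each frequency-level pair appears once, in pseudorandom order (seed is arbitrary).
pairs = [(f, l) for f in low_freqs for l in levels_db]
order = np.random.default_rng(0).permutation(len(pairs))
low_set = [pairs[i] for i in order]
```

With these grids the low-frequency set contains 41 × 15 = 615 tone bursts and the high-frequency set 21 × 15 = 315.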
Analysis of responses to tonal stimuli
Spike times collected during presentation of tonal stimuli were analyzed off-line using interactive Matlab software (Ben Bonham, UCSF). The analysis proceeded as follows. First, rastergrams in which trials were grouped by tone frequency and/or by tone intensity were examined to identify a time window that appeared to encompass the maximal stimulus-evoked response. This time window, which was determined by eye based on the concentration of spikes in the rastergrams, averaged 37 ms in length (range: 14–70 ms) across all files analyzed. Spike counts within this time window were then plotted as a function of tone intensity and frequency, and the outline of the frequency-intensity tuning curve was estimated by eye. Although responses to both low- and high-frequency stimulus sets were collected at each recording site, the stimulus-evoked response and frequency-intensity tuning curve usually fell mostly within one frequency range or the other, and so further analyses for each site were performed only on the recording in the appropriate stimulus range.
Response characteristics were subsequently defined with reference to this frequency-intensity tuning curve. The threshold was chosen to be the minimum stimulus intensity included in the frequency-intensity curve, and the characteristic frequency (CF) was the tone frequency that evoked a response at threshold. Bandwidth at 10 dB above threshold was the frequency width of the tuning curve at that intensity level; normalized BW10 was then defined to be this bandwidth normalized by the CF (i.e., the inverse of Q10). The first-spike latency was chosen to be the apparent asymptote in a plot of response latency versus increasing stimulus intensity for tone burst frequencies within 1/10 octave of the CF. Finally, the response duration was defined as the time from the start of the response (defined by the first-spike latency) to the end of the peak in the PSTH formed by averaging responses to the subset of stimuli that fell within the frequency-intensity range of the tuning curve. More precisely, the end of this PSTH peak was usually defined as the time at which the spike rate fell to within 1 SD of the mean spontaneous firing rate; this time value was occasionally corrected if it did not correspond well to the apparent end of the response peak that would have been chosen by eye. This method for determining response duration thus produced a measure that represented the time from the beginning of the earliest response to the end of the longest response within the frequency-intensity tuning curve. To address the possibility that this measure of response duration could be biased by large variations in response latency or duration for different stimuli within the frequency-intensity tuning curve, the data were also analyzed using a much more restricted, threshold-dependent measure of response duration. 
This alternative measure of response duration was based on a PSTH constructed only from responses to stimuli with frequencies within 1/10 octave of the CF and amplitudes within 10 dB of threshold, and was defined as the width of the first peak in the restricted PSTH that exceeded the mean spontaneous firing rate by 1 SD.
Dynamic random chord stimuli
The dynamic random chord stimuli used in these experiments were similar to those used in previous studies (deCharms et al. 1998; Rutkowski et al. 2002; Schnupp et al. 2001) except that the intensity of component tone pulses was variable. A schematic spectrographic representation of the stimulus is shown in Fig. 2A. Tone pulses were 20 ms in length, ramped up and down with 5-ms cosine gates. The times, frequencies, and sound intensities of all tone pulses were chosen randomly and independently within the discretizations of those variables (20-ms bins in time, 1/12-octave bins covering either 2–32 or 25–100 kHz in frequency and 5-dB-SPL bins covering 25–70 dB SPL in sound level). At any time point, the stimulus averaged two tone pulses per octave, with an expected loudness of 73 dB SPL for the low-frequency stimulus and 70 dB SPL for the high-frequency stimulus. Each stimulus trial was 60 s in duration and was repeated either 20 times (for the low-frequency stimulus) or 10 times (for the high-frequency stimulus) at each site. Sound presentation from one trial to the next was continuous with no inter-trial interval; thus the total duration of playback was 20 min for the low-frequency stimulus and 10 min for the high-frequency stimulus.
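A spectrographic grid with the statistics described above might be drawn as follows. This is a minimal sketch, not the authors' stimulus generator: the function name, the seed, and the use of 0 to mark "no tone pulse" are assumptions made for illustration. A density of two pulses per octave with 12 bins per octave corresponds to an on-probability of 1/6 per time-frequency bin.

```python
import numpy as np

def random_chord_grid(n_time_bins, n_freq_bins, density_per_octave=2.0,
                      bins_per_octave=12, levels_db=np.arange(25, 75, 5), seed=0):
    """Return an (n_freq_bins, n_time_bins) grid of tone-pulse levels in dB SPL.

    Each bin independently contains a pulse with probability
    density_per_octave / bins_per_octave; pulse levels are drawn uniformly
    from levels_db. A value of 0 is used here as a marker for "no pulse"
    (an illustrative convention, not a sound level).
    """
    rng = np.random.default_rng(seed)
    p_on = density_per_octave / bins_per_octave
    pulse_present = rng.random((n_freq_bins, n_time_bins)) < p_on
    levels = rng.choice(levels_db, size=pulse_present.shape)
    return np.where(pulse_present, levels, 0)

# One 60-s low-frequency trial: 3000 bins of 20 ms, 48 bins of 1/12 octave (2-32 kHz).
low_trial = random_chord_grid(n_time_bins=3000, n_freq_bins=48)
```

The 24-bin high-frequency stimulus (25–100 kHz) would use `n_freq_bins=24` with the same remaining arguments.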
Analysis of responses to dynamic random chord stimuli
STRF ESTIMATION. Responses to dynamic random chord stimuli that exhibited extreme instability in total spike count over the 10- or 20-min stimulus presentation time were discarded from the data set before STRF estimation. Because minor instability in a long recording cannot easily be distinguished from true neuronal response variability, we chose as conservative a threshold as possible for rejection of recordings. The criterion for rejecting a recording was that the Fano factor (variance divided by mean) for the total number of spikes per 60-s trial had to exceed 200; that is, the trial-to-trial spike-count variance had to be >200 times greater than would be expected from a Poisson process. We considered this criterion to be an appropriately conservative threshold for data rejection because Fano factors typically reported for cortex are at least an order of magnitude smaller than this threshold (Kisley and Gerstein 1999; Tolhurst et al. 1983) and because the few recordings with Fano factors >200 tended to exhibit smooth variations in firing rate over the 10- to 20-min recording time that seemed very likely to be related to changes in the animal's anesthetic state. Only 7 of 198 STRF recordings (3.5%) were rejected for exceeding the Fano factor threshold of 200, and the majority of the remaining recordings had Fano factors <10.
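The rejection criterion amounts to a one-line computation on the per-trial spike counts. The sketch below assumes the unbiased sample variance (`ddof=1`), which the text does not specify; the function names and example counts are hypothetical.

```python
import numpy as np

def fano_factor(spike_counts):
    """Variance-to-mean ratio of total spike counts per trial."""
    counts = np.asarray(spike_counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()  # ddof=1 is an assumption

def is_rejected(spike_counts, threshold=200.0):
    """True if the recording exceeds the Fano factor rejection threshold."""
    return fano_factor(spike_counts) > threshold

# Hypothetical spike counts for ten 60-s trials.
stable = [980, 1010, 1000, 995, 1015, 990, 1005, 1000, 1002, 1003]
drifting = np.linspace(200, 4000, 10)  # smooth drift in firing rate across trials
```

In this toy example the stable recording has a Fano factor well below 1, whereas the smoothly drifting recording exceeds the threshold of 200.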
Histograms of the neuronal responses to dynamic random chord stimuli were constructed by collecting spikes from all 10–20 trials of the sorted spike trains (e.g., Fig. 2B) into 20-ms bins aligned with the tone-pulse components of the stimuli (e.g., Fig. 2C). STRFs for each recording site were then estimated as explained in the following text, using the histogrammed responses and a spectrographic representation of the stimulus in terms of tone-pulse times, frequencies, and amplitudes. Thus both stimulus and response were treated as discrete-time processes, with the time step (20 ms) given by the duration of tone-pulse components of the dynamic random chord stimuli. The strong autocorrelations present in the stimuli at all shorter time scales made STRFs estimated with finer temporal precision noisier than STRFs estimated with this 20-ms time step. Our Bayesian estimation techniques (see following text) were designed to reduce noise in STRF models by eliminating noisier modes of the models and smoothing the reliable modes; since the reliable modes were those at a 20-ms time scale, it was not productive to estimate the STRF with a finer temporal precision than 20 ms.
Conceptually, the STRF estimation procedure used here was similar to reverse-correlation on the response histogram or, more precisely, to computation of an optimally smoothed and de-noised, autocorrelation-corrected, spike-count-weighted average of 300-ms stimulus segments preceding each 20-ms bin in the histogram. Mathematical details of our STRF estimation method, called automatic smoothness determination and relevance determination (ASD/RD), are given elsewhere (Sahani and Linden 2003a) and are described only briefly here. Bayesian techniques were first used to derive optimal spectral and temporal smoothing and scale parameters for each recording. The STRF was then estimated by maximum a posteriori linear regression between the response histogram and stimulus, using the previously determined optimal smoothing and scale parameters to set the prior distribution on the weights. Linear regression with no smoothing or scaling prior yields exactly the same result as the discrete-time Wiener filter usually associated with reverse correlation (Aertsen and Johannesma 1981). When combined with Bayesian smoothing and scale selection as done here, regression yields STRF estimates that are better able to predict neuronal responses to novel data than the conventional Wiener filter (Sahani and Linden 2003a). Thus this procedure provides a better estimate of the true spectrogram-linear component of the neuronal response function than does the Wiener filter, which overfits more severely to the noise inevitably present in the limited available data. Other researchers have previously used different methods (for example, elimination of singular-value decomposition components of the Wiener filter by cross-validation) (Theunissen et al. 2001) to pursue the same goal of obtaining improved STRF estimates.
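The regression view of STRF estimation can be illustrated with a simplified stand-in for the method described above. The sketch below is not ASD/RD: it swaps in a plain ridge penalty (an isotropic Gaussian prior with a hand-chosen strength `ridge`) for the learned smoothness and scale priors, and all function names are hypothetical. With `ridge=0` it reduces to ordinary least squares, the discrete-time Wiener filter case mentioned in the text. Treating the "preceding" segment as lags 1 to `n_lags` (excluding the concurrent bin) is also an assumption.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Build a design matrix from a (n_freq, n_time) stimulus grid.

    Row t holds the stimulus in the n_lags bins preceding bin t
    (lags 1..n_lags), zero-padded at the start of the recording.
    Column index is (lag - 1) * n_freq + freq.
    """
    n_freq, n_time = stim.shape
    X = np.zeros((n_time, n_freq * n_lags))
    for lag in range(1, n_lags + 1):
        shifted = np.zeros((n_freq, n_time))
        shifted[:, lag:] = stim[:, :n_time - lag]
        X[:, (lag - 1) * n_freq:lag * n_freq] = shifted.T
    return X

def estimate_strf(stim, response, n_lags=15, ridge=1.0):
    """Ridge-regression STRF estimate, returned as (n_freq, n_lags) weights."""
    X = lagged_design(stim, n_lags)
    Xc = X - X.mean(axis=0)              # remove stimulus mean
    yc = response - response.mean()      # remove mean firing rate
    A = Xc.T @ Xc + ridge * np.eye(Xc.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    return w.reshape(n_lags, stim.shape[0]).T
```

For the stimuli described here, `n_lags=15` corresponds to the 300-ms STRF length at 20-ms resolution, giving a 48 × 15 or 24 × 15 weight matrix.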
Predicted neuronal responses were computed from each STRF by convolving the STRF with a time-frequency representation of the stimulus. Specifically, the predicted firing rate for each 20-ms time bin of the response histogram was the dot-product between the 300-ms-long STRF and the time-frequency representation of tone amplitudes in the 300-ms stimulus segment preceding that time bin. The prediction error for the STRF was then determined by comparing predicted and measured responses to novel segments of the dynamic random chord stimulus (i.e., stimulus segments that were not used for STRF estimation). The average prediction error was estimated by cross-validation, a standard statistical procedure (Duda and Hart 1973) performed as follows. Each stimulus trial was divided 10 times into a training segment and a test segment (9/10 and 1/10 of the 60-s stimulus trial length, respectively), such that the 10 test segments were all disjoint. Ten STRF estimates were then obtained from the histogrammed neuronal responses to each of the 10 different training segments. For each of these 10 STRFs, we calculated the mean squared error between the histogrammed response to the test segment that would be predicted based on the STRF and the corresponding histogram of actual measured responses to the test segment. These mean squared errors for each of the ten data subdivisions were then averaged together to yield the final estimated prediction error. A standard error on the estimated prediction error was also obtained, by dividing the SD of the mean squared error estimates for the 10 data subdivisions by the square root of the number of data subdivisions. Recordings for which the STRF prediction error was
2 SEs smaller than could be achieved by simply predicting a constant mean response were deemed predictive STRFs. Only these predictive STRFs (114 of 191 STRFs; see RESULTS) were included in the final analysis of STRF characteristics.
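The cross-validation procedure and the predictiveness criterion can be sketched as follows. Several details here are assumptions rather than the authors' implementation: a simple ridge fit stands in for the Bayesian estimator, the comparison with the constant-mean prediction is made fold by fold (a paired difference), and the criterion is applied as a strict inequality. `X` is taken to be a design matrix of lagged stimulus values and `y` the response histogram; all names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, ridge=1.0):
    """Ridge regression with an intercept; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def cross_validated_gain(X, y, n_folds=10, ridge=1.0):
    """Mean and SE across folds of (constant-mean MSE) - (linear-model MSE)."""
    folds = np.array_split(np.arange(len(y)), n_folds)  # contiguous, disjoint test segments
    gains = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        w, b = fit_ridge(X[train], y[train], ridge)
        mse_model = np.mean((y[test_idx] - (X[test_idx] @ w + b)) ** 2)
        mse_const = np.mean((y[test_idx] - y[train].mean()) ** 2)
        gains.append(mse_const - mse_model)
    gains = np.asarray(gains)
    return gains.mean(), gains.std(ddof=1) / np.sqrt(n_folds)

def is_predictive(X, y, n_se=2.0, **kwargs):
    """True if the model beats the constant-mean prediction by more than n_se SEs."""
    mean_gain, se = cross_validated_gain(X, y, **kwargs)
    return mean_gain > n_se * se
```

The standard error is the SD of the per-fold values divided by the square root of the number of folds, as in the text.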
There are two reasons why some recordings might have failed to yield predictive STRFs. The responses of the neurons might have been largely nonlinear in the spectrogram of the stimulus and therefore not predictable from the spectrogram-linear STRFs. Alternatively (or additionally), the signal-to-noise levels in those recordings might have been too low for the predictive capabilities of the STRFs to be detected. In work described elsewhere (Sahani and Linden 2003b), we have shown that rodent auditory cortex responses to dynamic random chord stimuli, including the mouse auditory cortex responses considered in the present paper, are significantly nonlinear. Consistent with this observation, we found that factors associated with low signal-to-noise levels, such as low spike count and high trial-to-trial variability, could account only in part for failures of recordings to yield predictive STRFs.
STRF CHARACTERIZATION. Analysis of various features of STRFs, including temporal and spectral profiles and spectrotemporal inseparability (see following text), was based on mathematical decomposition of the STRF matrix. The STRF can be viewed as a matrix of weights R in time-frequency space with each row corresponding to a single frequency and each column to a single time lag (Fig. 3A). These weights correspond mathematically to the regression coefficients from the STRF estimation and conceptually to the sensitivity of the neuron to different frequencies at different times preceding a spike. Like any other matrix, the STRF matrix can be decomposed into a series of uncorrelated components, each consisting of a single vector along one dimension (time) and a single vector along the other (frequency). These components can be recombined completely to obtain the original matrix or recombined in subsets to yield various approximations to the original matrix. The process of finding these components for rectangular matrices (such as the 15 time bin by 24 or 48 frequency bin STRFs considered here) is called singular value decomposition (SVD).
The SVD of the time-frequency STRF matrix was calculated as R = USV, with the rows and columns of the factor matrices U, S, and V arranged so that the singular values along the diagonal of S appeared in decreasing order. In this representation, the product of the first column of U with the first row of V, scaled by the first singular value, gives the spectrotemporally separable matrix (or separable model) that best approximates the full STRF matrix R in the least-squares sense. Thus the first column of U and first row of V may be taken to represent the spectral profile and temporal profile for the full STRF (illustrated in Fig. 3A as curves along the left and top edges of the STRF). Like the comparable approach used in Depireux et al. (2001), this procedure is preferable to averaging across time or frequency to obtain spectral or temporal profiles, because such averages would be confounded by reversals in tuning polarity along the averaged dimension (such as the alternation of dark and light regions along the time dimension in Fig. 3A); it is also preferable to analyzing time or frequency slices through the peak in the STRF, because peak-dependent definitions of spectral or temporal profiles would be more sensitive to noise in the STRF estimation procedure.
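The extraction of spectral and temporal profiles from the leading SVD component can be sketched with NumPy. `np.linalg.svd` returns the third factor already arranged by rows, matching the "first row of V" convention used in the text. The function name is hypothetical, and the joint sign of the two profiles is arbitrary in any SVD, so a sign convention would have to be chosen separately.

```python
import numpy as np

def separable_profiles(strf):
    """Return (spectral profile, temporal profile, best separable approximation).

    strf is a (n_freq, n_lags) weight matrix. The profiles are the first
    column of U and the first row of V; their outer product scaled by the
    first singular value is the best rank-1 least-squares approximation.
    """
    U, s, Vt = np.linalg.svd(strf, full_matrices=False)  # singular values in decreasing order
    spectral = U[:, 0]
    temporal = Vt[0, :]
    separable = s[0] * np.outer(spectral, temporal)
    return spectral, temporal, separable
```

For a perfectly separable STRF the returned approximation reproduces the original matrix exactly, up to floating-point error.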
Temporal and spectral properties of the receptive fields were defined based on the temporal and spectral profiles as illustrated in Fig. 3A. The peak latency was the time to the center of the peak in the first subfield of the receptive field (usually an excitatory subfield, but occasionally an inhibitory subfield). Excitatory subfield duration was defined to be the width at half-maximum of the positive peak in temporal profile, and inhibitory subfield duration was defined to be the width at half-minimum of the negative peak in the temporal profile. The receptive-field duration was then defined as the time from the beginning of the first subfield to the end of the last subfield, a measure insensitive to the ordering of excitatory and inhibitory subfields. In the spectral profile, the best frequency was the frequency corresponding to the absolute maximum (larger of positive or negative peaks) in the spectral profile. The bandwidth was then defined to be the width at half-height of that spectral peak, and the normalized bandwidth was this bandwidth normalized by the best frequency.
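A width-at-half-maximum measurement of the kind defined above might be computed as in the sketch below. It works at the resolution of the bins with no interpolation, which is an assumption; the text does not say how fractional widths were handled. The function name is hypothetical. For an inhibitory (negative) peak the same function can be applied to the negated profile.

```python
import numpy as np

def peak_and_half_width(profile, bin_width=20.0):
    """Return (index of the positive peak, width at half-maximum).

    The width is the extent of the contiguous run of bins around the peak
    whose values are at least half the peak value, in units of bin_width
    (20 ms per bin for temporal profiles in this data set).
    """
    profile = np.asarray(profile, dtype=float)
    peak = int(np.argmax(profile))
    half = profile[peak] / 2.0
    left = peak
    while left > 0 and profile[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(profile) - 1 and profile[right + 1] >= half:
        right += 1
    return peak, (right - left + 1) * bin_width

# Hypothetical temporal profile: three bins at or above half-maximum.
example = peak_and_half_width([0, 1, 3, 4, 3, 1, 0])  # (3, 60.0)
```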
STRF INSEPARABILITY. Time-frequency inseparability of the STRF was assessed by comparing the predictive capabilities of rank 1 (separable) and rank 3 (inseparable) SVD approximations to the full STRF matrix. As noted in the preceding text, the best separable (i.e., rank 1) approximation to the full STRF R is the matrix formed by the product of the first column of U, first element of S, and first row of V. More generally, the matrix formed from the product of the first n columns of U, first n x n block of S and first n rows of V gives the best rank n approximation to R in the least-squares sense (Fig. 3B). Higher values of n will produce increasingly accurate approximations to R; however, higher-rank models often performed poorly at predicting responses to novel stimulus segments, indicating greater overfitting to noise. We chose a rank of 3 to optimize the trade-off between minimizing the model rank and capturing potentially important inseparable structure in the full STRFs, on the basis of two observations. First, when we used a Bayesian de-noising technique related to our STRF estimation method (Sahani and Linden 2003a) to choose the number of relevant SVD components for each STRF, we found that the distribution of resulting ranks had a mode of 3. Second, we noticed that STRF approximations of rank 3 were usually qualitatively indistinguishable from full-rank STRFs; that is, the rank 3 approximations generally captured what appeared by eye to be structure rather than noise in the full STRFs.
Predictive capabilities of the separable and rank 3 inseparable models for each STRF were determined by a cross-validation procedure like that described in the previous section on STRF estimation. We divided the data into a training segment and a test segment (9/10 and 1/10 of each stimulus trial, respectively), estimated a full STRF from the training segment, and computed the SVD of this STRF to obtain its separable and rank 3 inseparable approximations. Then we predicted neuronal responses in the test segment of the data using both the separable and rank 3 inseparable models, calculated the error in these predictions as described previously, and computed the difference in error between the two predictions. Repeating this procedure for each of 10 disjoint divisions of the data into a training segment and a test segment, we were able to obtain both an estimate of the average difference in prediction error between the two models, and a SE on this estimate. If the average difference in prediction error between the separable model and rank 3 inseparable model was
2 SEs greater than zero, then the STRF was declared to be significantly inseparable. In other words, an inseparable STRF was one for which the rank 3 inseparable model predicted responses to test data significantly more accurately than did the separable model. Because the null hypothesis was that the STRF was separable, this approach gave us a conservative test for inseparability. STRFs that were judged to be not significantly inseparable might either be truly separable or else simply not distinguishable from separable given our limited data.
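The two ingredients of this test, the rank-n approximation and the standard-error criterion on paired differences in prediction error, can be sketched as follows. The per-fold error arrays are assumed to come from a cross-validation loop like the one described in the text; the function names are hypothetical, and the strict inequality is an assumption.

```python
import numpy as np

def rank_n_approx(strf, n):
    """Best rank-n least-squares approximation to strf via truncated SVD."""
    U, s, Vt = np.linalg.svd(strf, full_matrices=False)
    return (U[:, :n] * s[:n]) @ Vt[:n, :]

def significantly_inseparable(err_separable, err_rank3, n_se=2.0):
    """Compare per-fold test errors of the rank-1 and rank-3 models.

    Returns True if the mean reduction in error achieved by the rank-3
    model exceeds n_se standard errors of that mean.
    """
    diff = np.asarray(err_separable, dtype=float) - np.asarray(err_rank3, dtype=float)
    se = diff.std(ddof=1) / np.sqrt(diff.size)
    return diff.mean() > n_se * se
```

In use, `rank_n_approx(strf, 1)` and `rank_n_approx(strf, 3)` would each be computed from the STRF fitted to a training segment, used to predict the held-out test segment, and the resulting ten pairs of errors passed to `significantly_inseparable`.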
To obtain the prediction-based inseparability index used in population data plots, the average difference in prediction error between the separable model and rank 3 inseparable model was normalized by an estimate of the total predictable stimulus-related power in the neuronal response [the "signal power"; see Sahani and Linden (2003b) for details on the derivation of this quantity]. Thus a value of 0.1 for the prediction-based inseparability index would mean that the inseparable model predicted 10% more of the stimulus-related power in the neuronal response than the separable model. [We have shown elsewhere (Sahani and Linden 2003b) that rodent auditory cortical responses to dynamic random chord stimuli are so nonlinear that linear STRF models typically capture no more than half of the stimulus-related response power; therefore a change of 10% could represent a substantial improvement in response prediction.] For comparison with previous studies, we also quantified inseparability with a SVD-based inseparability index previously used for characterizing STRFs from ferret auditory cortex (Depireux et al. 2001) and related to similar measures used in other studies (e.g., Sen et al. 2001). This index, called αSVD in Depireux et al. (2001), quantifies the concentration of power in higher singular values of the STRF matrix SVD.
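For reference, the index as we understand it from Depireux et al. (2001) can be written in terms of the singular values of the STRF matrix. The symbol s_i for the i-th diagonal element of S is notation introduced here for illustration:

```latex
\alpha_{\mathrm{SVD}} \;=\; 1 - \frac{s_1^{2}}{\sum_{i} s_i^{2}}
```

Under this form the index is 0 for a perfectly separable (rank 1) STRF, for which only s_1 is nonzero, and it approaches 1 as power spreads into the higher singular values.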
Population analyses
Population distributions were compared using the two-sample Kolmogorov-Smirnov test, a nonparametric test of the null hypothesis that two distributions are similar (Lindgren 1993). Correlations between measured population variables were quantified with the nonparametric Spearman rank correlation test; population fractions were compared with Fisher's exact test; and differences in population means were assessed with the two-sample Student's t-test (Lindgren 1993; Zar 1996). The t-test results on means are reported in preference to nonparametric Wilcoxon rank-sum test results on medians because the two-sample t-test is more robust than the Wilcoxon rank-sum test to violations of the assumption that the two distributions under consideration have the same shape, and because the Gaussian approximation to the posterior mean distribution inherent in the t-test was not unreasonable given the sample sizes. However, in all cases in which a significant difference in population means is reported, the difference in population medians was also significant according to the Wilcoxon rank-sum test.
Results of all statistical tests were deemed significant if the null hypothesis was rejected at a significance level of 0.05. Throughout the text, "K-S test" is used as an abbreviation for "two-sample Kolmogorov-Smirnov test," and "t-test" implies "two-sample Student's t-test." Test statistic values are reported as Dn for the K-S test and tn for the t-test, where n is the number of degrees of freedom; rs denotes the Spearman rank correlation coefficient. All tests are identified as 1- or 2-tailed in the text as appropriate for the alternative hypothesis being tested.
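For the two-sample Student's t-test, the degrees of freedom equal the combined sample size minus 2, so two groups of 60 and 54 observations give 112. The following Python sketch computes the pooled-variance t statistic and its degrees of freedom; the function name `pooled_t` is hypothetical, and the equal-variance (pooled) form is an assumption about which variant of the test applies.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, df) for a two-sample Student's t-test with pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```

The P value would then be read from the t distribution with `df` degrees of freedom, using one or both tails as appropriate.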
| RESULTS |
|---|
Neuronal responses to prolonged dynamic random chord stimuli and simple tone bursts were recorded at 35 AI sites and 31 AAF sites. Multiunit responses to tone bursts, used to determine the position of each recording site in the AI-AAF tonotopy, revealed no significant differences between the distributions of characteristic frequencies or response thresholds for AI and AAF recording sites (2-tailed K-S tests and t-tests). As shown in Fig. 4, the CF values for AI sites ranged from 6 to 40 kHz and thresholds varied from 4 to 36 dB SPL; for AAF sites, CFs ranged from 10 to 35 kHz and thresholds from 4 to 39 dB SPL.
Off-line spike sorting and analysis of electrode signals continuously recorded during presentation of low- and high-frequency dynamic random chord stimuli at each site yielded a total of 191 STRFs. There were no significant differences observed between AI and AAF responses to dynamic random chord stimuli in either noise level [see Sahani and Linden (2003b) for details on the noise power calculation] or stability of firing rate across repeated trials (quantified as the inverse Fano factor). Moreover, applying analysis techniques described at length in Sahani and Linden (2003b), we found no significant differences in the goodness-of-fit of linear STRFs for AI versus AAF, nor for low- versus high-frequency recordings. Of the 191 STRF recordings analyzed, 114 (60 from AI and 54 from AAF) proved to be predictive; that is, these STRFs could be used to obtain significantly more accurate predictions of neuronal responses to novel stimulus segments than could be achieved based on knowledge of the mean firing rate alone (see METHODS). These 114 predictive STRFs form the database for all further STRF analyses presented in RESULTS.
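The Fano factor is the variance of a count divided by its mean, so its inverse (mean over variance) grows as responses become more repeatable across trials. The following Python sketch illustrates the calculation; the function name `inverse_fano` is hypothetical, and applying it to per-trial spike counts with the sample variance is an assumption, since the exact quantity used by the authors is described elsewhere.

```python
from statistics import mean, variance


def inverse_fano(counts):
    """Inverse Fano factor (mean / variance) of per-trial spike counts."""
    v = variance(counts)  # sample variance across trials
    if v == 0:
        return float("inf")  # identical counts on every trial
    return mean(counts) / v
```

For a Poisson process the Fano factor is 1, so values of the inverse above 1 indicate firing that is more regular than Poisson.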
Because responses to low- and high-frequency dynamic random chord stimuli were recorded and analyzed separately at each site and because an average of 1.5 distinct single units or small neuronal clusters could be extracted from each recording by spike-sorting, many recording sites produced multiple predictive STRFs. STRFs in the same frequency range recorded at the same site could be assumed to have arisen from different neurons or distinct neuronal clusters because the spike waveforms had been discriminated during spike sorting. However, because responses to low- and high-frequency dynamic random chord stimuli were recorded and analyzed separately, low- and high-frequency STRFs obtained from the same recording site might have arisen either from distinct neurons or neuronal clusters or from the same neuron or neuronal cluster responding to both low- and high-frequency stimuli. Definitive identification of the same spike waveform in separate low- and high-frequency recordings was often not possible; therefore, we pooled data from all predictive STRFs in the analyses shown here, regardless of stimulus frequency range or site of recording. To address the possibility that our results could have been affected by the inclusion in the database of separate high- and low-frequency STRFs for the same neurons, we also re-analyzed all the data using only low-frequency STRFs, which formed the majority (71%) of the predictive STRFs. All results obtained using a database restricted to low-frequency STRFs were similar to those reported in the following text for the database pooling high- and low-frequency STRFs.
Data from single-unit and cluster recordings were also pooled in all STRF analyses. Among the recordings in the STRF database, 40% from AI and 15% from AAF appeared to be single units based on the combined results of Bayesian spike-sorting (Lewicki 1994) and visual inspection of spike waveforms chosen randomly from throughout each 10- or 20-min recording. No significant differences between STRFs derived from single-unit recordings and STRFs obtained from cluster recordings were observed for any of the receptive-field parameters examined here. Figure 5 displays six representative mouse STRFs from AI and AAF and illustrates the similarity of single-unit and cluster STRFs; there are no clear differences between the STRFs derived from single-unit recordings in Fig. 5, B and E, and the cluster STRFs shown in the other panels of the figure.
STRF temporal structure
The examples of AI and AAF STRFs shown in Fig. 5 illustrate the observed temporal differences between AI and AAF. All six STRFs in the figure suggest tuning to sound onsets in a similar frequency range, but the temporal structure of the AI STRFs (Fig. 5, A, C, and E) differs from that of the AAF STRFs (Fig. 5, B, D, and F) in two ways. First, the peaks of the excitatory subfields (lightest regions) are shifted farther left in the AI than in the AAF STRFs. This shift indicates that there was a longer delay for the AI neurons than for the AAF neurons between the moment at which a preferred stimulus occurred and the time at which the neuron most reliably fired a spike; in other words, the STRF peak latency was longer for these AI neurons than for the AAF neurons. Second, the combined excitatory and inhibitory subfields (light and dark regions, respectively) seem to extend across more time bins in the AI than in the AAF STRFs. This elongation implies that the preferred stimuli (or more precisely, the stimuli that would best activate the linear filter approximation to the true neuronal response function) for the AI neurons were more slowly modulated in amplitude than the preferred stimuli for the AAF neurons, or equivalently, that excitatory/inhibitory receptive-field duration was longer for the AI than the AAF example neurons.
These observations hold across the database of AI and AAF STRFs, as shown in Fig. 6. Distributions of STRF peak latencies were different for AI and AAF (Fig. 6A); the AI distribution was significantly shifted to larger values (1-tailed K-S test, P < 0.0005, D114 = 0.39), and the mean peak latency was 17 ms longer for AI than for AAF (mean ± SE, 44 ± 2 ms for AI and 27 ± 2 ms for AAF; 1-tailed t-test, P < 0.0001, t112 = 5.87). Similarly, despite their considerable overlap, the distributions of AI and AAF receptive-field durations differed (Fig. 6B), again with a significant shift in the AI distribution toward larger values (1-tailed K-S test, P < 0.0001, D114 = 0.39). Mean receptive-field duration was nearly 30 ms longer for AI than for AAF (135 ± 5 ms for AI and 108 ± 4 ms for AAF; 1-tailed t-test, P < 0.0001, t112 = 3.94). Excitatory subfields appeared to contribute more to this difference in receptive-field duration than inhibitory subfields, but both excitatory and inhibitory subfield durations were significantly longer in AI than in AAF (1-tailed K-S tests and t-tests, P < 0.05 in all cases; subfield data not shown). Thus both peak latency and receptive-field duration were longer for STRFs in AI than in AAF, and the difference in receptive-field duration involved both excitatory and inhibitory subfields.
As would be expected given these differences between AI and AAF, peak latency and receptive-field duration were significantly correlated across the entire STRF database; the correlation was strongest for peak latency and excitatory subfield duration (2-tailed Spearman rank correlation test, rs = 0.65, P < 0.0001). Peak latency and excitatory subfield duration were also correlated within AI alone (rs = 0.72, P < 0.0001), and (less strongly) within AAF alone (rs = 0.29, P < 0.05), indicating that these temporal properties of receptive fields co-varied not only between auditory fields but also across STRFs recorded within each field.
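The Spearman rank correlation coefficient rs is the Pearson correlation computed on the ranks of the two variables, with tied values sharing their average rank. The following standard-library Python sketch shows the calculation; the function names `ranks`, `pearson`, and `spearman` are hypothetical, and no significance test is included.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between, for example, peak latency and subfield duration yields a coefficient of 1 or -1 regardless of its shape.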
STRF spectral structure
As suggested by the examples shown in Fig. 5, AI and AAF STRFs appeared to be more similar in spectral structure than in temporal structure; however, some small differences between the two auditory fields were evident in the population analyses. STRFs in both areas had broad frequency tuning, but the STRF bandwidths tended to be slightly larger in AI than in AAF. Figure 7 displays the distributions of normalized bandwidth (STRF peak width at half-height, normalized by STRF best frequency) for all STRFs in which the STRF peak did not fall near the edge of the STRF frequency range (61 of the 114 predictive STRFs). The normalized bandwidth distribution for AI was significantly shifted to larger bandwidths (1-tailed K-S test, P < 0.005, D61 = 0.43), and the mean bandwidth for AI was larger than for AAF (mean ± SE, 1.14 ± 0.10 for AI and 0.86 ± 0.08 for AAF; 1-tailed t-test, P < 0.05, t59 = 2.21). Thus while frequency tuning was broad in both auditory areas, STRF bandwidths were broader in AI than in AAF. Because AI STRFs also tended to have longer times to peak and longer receptive-field durations than AAF STRFs, spectral and temporal STRF measures were correlated across the entire population of recorded STRFs (2-tailed Spearman rank correlation test, P < 0.05 for all comparisons). This trend toward co-variation in spectral and temporal measures was also evident within AI alone but did not reach significance within AAF alone.
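The normalized bandwidth described above (peak width at half-height divided by best frequency) can be illustrated with a short Python sketch. The function name `normalized_bandwidth` is hypothetical, and several details are assumptions rather than the authors' procedure: the spectral profile is taken as a list of values sampled at known frequencies, width is measured as a linear frequency difference, and no interpolation between frequency bins is performed.

```python
def normalized_bandwidth(freqs, profile):
    """Width at half-height of the profile peak, divided by the peak frequency."""
    peak = max(range(len(profile)), key=lambda i: profile[i])
    half = profile[peak] / 2
    # Walk outward from the peak while the profile stays at or above half-height.
    lo = peak
    while lo > 0 and profile[lo - 1] >= half:
        lo -= 1
    hi = peak
    while hi < len(profile) - 1 and profile[hi + 1] >= half:
        hi += 1
    return (freqs[hi] - freqs[lo]) / freqs[peak]
```

If the walk reaches the first or last frequency bin, the true width is unknown, which is consistent with excluding STRFs whose peaks fall near the edge of the frequency range.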
In addition to their rather broad bandwidths, recordings from AI and AAF often shared another spectral characteristic: ultrasound sensitivity. While most of the recording sites produced only low-frequency STRFs, more than one-third of the 66 recording sites (13 sites in AI, 13 sites in AAF) yielded predictive STRFs in both the low-frequency (2-32 kHz) and high-frequency (25-100 kHz) ranges. (Frequency-intensity tuning curves collected at these sites also showed both low-frequency and ultrasound sensitivity.) Of these 26 sites, 11 sites (3 in AI, 8 in AAF) gave low- and high-frequency STRFs with clearly distinct low- and high-frequency peaks; 13 sites (9 in AI, 4 in AAF) yielded STRFs that displayed broad frequency tuning extending smoothly from low frequencies to 50 kHz or above; and the remaining 2 sites (1 each in AI and AAF) produced ultrasound STRFs with high-frequency sensitivity confined to the 25- to 32-kHz region of overlap between the two stimulus frequency ranges.
Figure 8 shows ultrasound STRFs of the first type, derived from sites that were judged during mapping with tonal stimuli to be tuned to lower frequencies consistent with the tonotopic organization of AI and AAF. All four of these recording sites also produced predictive low-frequency STRFs with distinct receptive-field peaks below 25 kHz, as well as frequency-intensity tuning curves with low-frequency tuning peaks but also some sensitivity in the ultrasound range (not shown). Our single-electrode recordings did not provide definitive proof that the same neurons were responding both to lower frequencies and to higher frequencies at these sites; indeed, it is possible that the low- and high-frequency responses at these sites reflect the sensitivities of separate and differently tuned neurons. However, in either case, the results indicate that neurons with ultrasound sensitivity exist at recording sites judged to be within the boundaries of AI and AAF as well as within the previously defined ultrasound field.
STRF spectrotemporal structure
Analysis of temporal or spectral properties of STRFs requires the simplifying assumption that STRFs may be viewed as separable so that the temporal or spectral profiles of the receptive field can be examined separately. This simplification, although useful for highlighting obvious differences between AI and AAF, obscures the complexity that was observed in some of the mouse STRFs. Nearly one-quarter of the STRFs recorded in both AI and AAF had significantly inseparable spectrotemporal structure, i.e., structure that could not be described fully without reference to an interaction between spectral and temporal features of the receptive field. Three representative examples of such spectrotemporally inseparable STRFs are shown in Fig. 9. Each set of two panels displays the separable model of the STRF on the left, for comparison with the rank 3 inseparable model of the STRF (see METHODS) on the right. (For most STRFs, including these 3 examples, the rank 3 inseparable model was nearly indistinguishable by eye from the full STRF.) As explained in METHODS, these STRFs were judged to be inseparable because the inseparable model significantly outperformed the separable model at prediction of neuronal responses to novel segments of the dynamic random chord stimulus. Inseparable features of the STRFs, which presumably account for the improvements in response prediction, can be identified by comparing the two models. In Fig. 9A, the inseparability appears as offset excitatory and inhibitory subfields in the inseparable model; the excitatory subfield is shifted to lower frequencies than the inhibitory subfield preceding it. The inseparable model in Fig. 9B displays a different form of inseparability: a smooth slant in both the excitatory and inhibitory portions of the receptive field, suggesting tuning to fast frequency sweeps. Figure 9C combines features of both of the other two examples in what appear to be multiple stacked excitatory and inhibitory subfields displaced along a spectrotemporal slant.
Across the population, a total of 26/114 STRFs (23%) had significantly inseparable spectrotemporal structure, like that shown in the examples in the preceding text. Significantly inseparable STRFs appeared in similar proportions in both areas AI and AAF (16/60 STRFs in AI vs. 10/54 STRFs in AAF; Fisher's exact test, P > 0.5), and among both single-unit and cluster recordings (8/32 single-unit STRFs vs. 18/82 cluster STRFs; Fisher's exact test, P > 0.8). There were no significant differences between AI and AAF STRFs (or between single-unit and cluster STRFs) in their degree of spectrotemporal inseparability, so all data from the two auditory fields were pooled in Fig. 10A to illustrate the total population spread in two measures of STRF inseparability. The prediction-based inseparability index, explained in METHODS, quantifies the improvement in neuronal response predictions that could be obtained using an inseparable rather than a separable STRF model. Significantly inseparable STRFs (i.e., those for which the prediction-based inseparability index was significantly greater than 0) are marked in the scatter plot and in both marginal histograms. The distribution of the prediction-based inseparability index is compared in the scatter plot to the distribution of an SVD-based inseparability index similar to that used in previous studies of STRF inseparability (Depireux et al. 2001; cf. Sen et al. 2001). Values of the two inseparability indices were significantly correlated across the population (2-tailed Spearman rank correlation test, rs = 0.46, P < 0.0001). Note, however, that values of the SVD-based inseparability index were quite low even for STRFs that were significantly inseparable according to the prediction-based measure. [Quantitatively similar results were obtained even when the SVD-based measure was defined relative to only the first 3 singular values of the SVD, as in Sen et al. (2001).] For further comparison with previous studies, Fig. 10B shows the distribution of the SVD-based inseparability index applied to the first and second quadrants of the two-dimensional Fourier transform of each STRF; the strong peak near zero in both distributions suggests that STRFs in mouse auditory cortex tend to be "quadrant separable" (Depireux et al. 2001).
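Comparisons of population fractions such as 16/60 versus 10/54 use Fisher's exact test on a 2 x 2 table of counts. The following Python sketch computes a two-sided P value from the hypergeometric distribution; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, assumed rather than confirmed for this analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

For the AI versus AAF comparison, the call would be `fisher_exact_two_sided(16, 44, 10, 44)`, where each row lists inseparable and separable STRF counts for one field.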
Responses to tonal stimuli
The major findings of the STRF analysis regarding the structure of receptive fields in AI and AAF were confirmed by analysis of multiunit responses to tonal stimuli. Figure 11 displays smoothed frequency-intensity tuning curves and cumulative PSTHs for responses to tonal stimuli at the same AI and AAF recording sites that produced the STRFs depicted in Fig. 5, A and B. To facilitate comparison of the frequency-intensity tuning curves and STRFs, solid lines along the top of each plot show the frequency tuning curve averaged over the 25- to 70-dB SPL intensity range spanned by tone pips in the dynamic random chord stimulus, and dashed lines show the spectral profile for the corresponding STRF from Fig. 5. Like their corresponding STRFs, the tuning curves from these AI and AAF sites had similar spectral tuning, with maximum sensitivity to sounds in the 10- to 16-kHz frequency range and comparable response bandwidths. However, the PSTHs at the two recording sites were very different; both first-spike latency and response duration were longer for the AI than for the AAF site. The longer response duration at the AI site arose not simply from wider variation in response latency for tones of different frequencies, but from more prolonged responses to individual tone bursts (not shown). Thus these AI and AAF responses to tonal stimuli, like the corresponding STRFs for the same recording sites, appeared to be quite different in temporal structure despite their similar spectral characteristics.
Results for the entire population of AI and AAF recording sites are illustrated in Fig. 12. As shown in Fig. 12A, the distribution of first-spike latencies for AI sites was significantly shifted toward longer values relative to the distribution for AAF sites (1-tailed K-S test, P < 0.0001, D66 = 0.55). Moreover, although the very shortest response latencies recorded in the two fields were similar (5 to 6 ms), the mean first-spike latency for AI was longer than the mean latency for AAF (mean ± SE, 17 ± 1 ms for AI and 11 ± 1 ms for AAF; 1-tailed t-test, P < 0.0001, t64 = 4.28). Likewise, although AI and AAF response durations spanned a similar range (Fig. 12B), the distribution was significantly shifted toward longer durations for AI than AAF (1-tailed K-S test, P < 0.005, D66 = 0.42), and the mean of the AI distribution was 20 ms longer than the mean of the AAF distribution (59 ± 4 ms for AI, 39 ± 3 ms for AAF; 1-tailed t-test, P < 0.0001, t64 = 4.24). (Similar results were obtained when response duration was measured only for stimuli with frequencies near the CF and amplitudes near threshold; see METHODS.) In contrast, normalized bandwidths at 10 dB above threshold were very broad in both AI and AAF (Fig. 12C) with no significant difference observed between the AI and AAF distributions (2-tailed K-S test, P > 0.8) or between the distribution means (0.48 ± 0.03 for AI, 0.54 ± 0.05 for AAF; 2-tailed t-test, P > 0.3). No significant correlations were observed between any of these response measures, except for a correlation between response duration and normalized BW10 for AAF recording sites (2-tailed Spearman rank correlation test, rs = 0.53, P < 0.005).
Overall, these findings corroborate the results of the STRF analysis by demonstrating that response latencies and durations were longer on average in AI than in AAF and that the frequency tuning of the responses in both areas was very broad. Indeed, recording site by recording site, there were significant positive correlations between traditional measures of responses to tonal stimuli and STRF measures derived from responses to dynamic random chord stimuli. As illustrated in Fig. 13, characteristic frequencies of frequency-intensity tuning curves were correlated with peak frequencies of STRFs (Fig. 13A; 2-tailed Spearman rank correlation test, rs = 0.65, P < 0.0001); first-spike latencies for responses to tonal stimuli were correlated with STRF peak latencies (Fig. 13B; rs = 0.66, P < 0.0001); normalized BW10s for tuning curves were correlated with normalized STRF bandwidths (Fig. 13C; rs = 0.46, P < 0.005); and the durations of responses to tonal stimuli were correlated with excitatory subfield durations for STRFs (Fig. 13D; rs = 0.33, P < 0.005). These correlations suggest strong analogies between traditional receptive-field measures based on responses to tonal stimuli and STRF measures based on responses to dynamic random chord stimuli.
| DISCUSSION |
|---|