Laboratory of Auditory Neurophysiology, Medical School, K.U.Leuven, B-3000 Leuven, Belgium
Submitted 20 August 2003; accepted in final form 17 December 2003
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Pure-tone stimuli are restrictive because, owing to cochlear nonlinearities, they afford only limited prediction of temporal responses to broadband stimuli. They are also unsuited to the study of envelope synchronization. Vector strength has a number of drawbacks, one being that it is calculated at a predetermined frequency (e.g., carrier or modulation frequency of the stimulus) and thus is most useful for periodic signals.
Our interest in characterizing the temporal information in the monaural channels feeding into the binaural system led us to search for metrics that are applicable independent of stimulus bandwidth or periodicity and that do not require knowledge of the stimulus in their computation. Temporal studies of responses to nonperiodic stimuli have employed poststimulus time (PST) dot rasters or histograms, reverse correlation (revcor), or autocorrelation. Revcor and PST analyses document temporal structure in spike trains but do not provide straightforward metrics for comparison across stimuli or across neuronal populations (e.g., different cell types, different anatomical levels, and different species). Strict autocorrelation analysis, based on interval statistics within spike trains (Perkel et al. 1967a), provides a rich temporal description but has the disadvantage of obscuring fast temporal fluctuations because of the presence of the refractory period. Joris (2003) introduced an autocorrelation analysis that provides a straightforward means to compare monaural and binaural responses and that can be applied to responses to a variety of stimuli but does not require knowledge of the stimulus. The analysis entails the comparison of spike times across (rather than within) spike trains to identical or nonidentical stimuli. Because the envelope of sound waveforms, but not their fine-structure, is independent of polarity, use of polarity-inverted pairs of stimuli allows disambiguation of phase-locking to fine-structure and envelopes.
Here, we extend this analysis by extracting metrics from the autocorrelograms. We show that there are consistencies between these metrics and measures based on pure tones, that the same metrics can be applied to the study of temporal information based on envelope and on fine-structure, and that these metrics reveal new relationships not described earlier.
| METHODS |
|---|
|
|
|---|
Cats with normal eardrum and air-filled middle ear were anesthetized with a mixture of acepromazine (0.2 mg/kg) and ketamine (20 mg/kg). A venous cannula allowed infusion of Ringer solution and sodium pentobarbital at doses sufficient to maintain an areflexic state. After a tracheotomy, the pinna and temporalis muscle of one side were removed and the bulla exposed and vented with a 30-cm-long polyethylene tube (0.9 mm ID). The animal was placed in a double-walled soundproof room (Industrial Acoustics Company, Niederkrüchten, Germany). Through a posterior fossa craniotomy, a small portion of the cerebellum was aspirated and the auditory nerve (AN) exposed. A plastic base was glued to the skull to support a hydraulic microdrive (Trent Wells, Coulterville, CA) and was filled with warm 3% agar after a glass micropipette, filled with 3 M NaCl, was positioned in the AN under visual control.
Hard- and software environment
A dynamic phone (supertweeter, Radio Shack, Fort Worth, TX) was connected to a hollow Teflon earpiece that fit in the transversely cut ear canal. Custom software, run within Matlab (The MathWorks, Natick, MA) on a personal computer, was used to calculate the stimuli and control the digital hardware (Tucker-Davis Technologies, Alachua, FL). The transfer function of the closed acoustic assembly was obtained via a probe, the tip of which was placed within a few millimeters of the tympanic membrane and which was coupled to a 1/2-in (12.7 mm) condenser microphone and conditioning amplifier (Brüel and Kjær, Nærum, Denmark). All stimuli were compensated for this transfer function, and the stimuli were specified in SPL (dB re. 20 µPa). The neural signal was amplified and filtered (300 Hz to 3 kHz; DAM 80, World Precision Instruments, Sarasota, FL), and spikes were converted to standard TTL pulses with a level discriminator (DIS-1, BAK Electronics, Germantown, MD) or a custom-built peak-picker (Carney and Yin 1988). These pulses were time-stamped to an accuracy of 1 µs (ET-1, Tucker-Davis Technologies, Alachua, FL).
Stimuli
Spontaneous rate (SR), minimum threshold, and characteristic frequency (CF) of single fibers were determined with an automated tuning curve program. Short tone bursts at CF (duration: 25 ms, repeated every 100 ms, 200 repetitions, rise-fall time 2.5 ms, starting in sine phase) were then presented at increasing SPL in 10-dB steps. The PST histogram was displayed on-line.
A standard frozen noise waveform (Gaussian broadband noise, 0.1–30 kHz, 1 s, repeated every 1.2 s, 10 repetitions) was presented at sound levels from 10 to 90 dB SPL in 10-dB steps to obtain a rate-level curve. Next, the noise waveform was presented 50 times at 70 dB SPL (spectrum level: 25 dB SPL) and when possible also at 50, 60, and 80 dB SPL. A second series of responses to the same noise waveform was then obtained, the only difference being that the waveform was inverted in polarity. Inverting the polarity corresponds to a 180° phase shift of all frequency components. Because the normalized correlation coefficient of the reference waveform and its inverted version is −1, we refer to the two waveforms as reference noise and anti-correlated noise.
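The anti-correlation property can be checked numerically. The following is a minimal sketch, not the authors' stimulus code: the helper name `normalized_correlation`, the seed, and the waveform length are arbitrary choices made for illustration.

```python
import math
import random

def normalized_correlation(x, y):
    """Pearson (normalized) correlation coefficient of two equal-length waveforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # arbitrary seed, only to make the sketch repeatable
reference = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for Gaussian noise
inverted = [-s for s in reference]                          # polarity inversion
rho = normalized_correlation(reference, inverted)           # expected to be -1 up to rounding
```

Any waveform, not only Gaussian noise, gives the same result, because inversion multiplies every sample by −1.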
Synchronization to tones
For pure-tone stimuli, vector strength was calculated over an analysis window of 10–25 ms relative to the stimulus onset to eliminate the onset response, which was not always in phase with the sustained response. Significance (P < 0.001) of phase-locking was evaluated with the Rayleigh test (Mardia and Jupp 2000).
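As a rough illustration of this computation (a sketch, not the authors' Matlab code), the function below keeps the spikes in a 10 to 25 ms window, computes vector strength as the mean resultant length at a chosen frequency, and estimates significance with the common large-sample Rayleigh approximation p ≈ exp(−nR²). The function and argument names are hypothetical, and the use of this particular approximation is an assumption.

```python
import math

def vector_strength(spike_times_ms, freq_hz, window_ms=(10.0, 25.0)):
    """Return (vector strength R, spike count n, approximate Rayleigh p-value)."""
    lo, hi = window_ms
    spikes = [t for t in spike_times_ms if lo <= t < hi]  # drop the onset response
    n = len(spikes)
    if n == 0:
        return 0.0, 0, 1.0
    # Phase of each spike relative to the stimulus frequency (times are in ms).
    phases = [2.0 * math.pi * freq_hz * t / 1000.0 for t in spikes]
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    r = math.hypot(c, s)
    p_value = math.exp(-n * r * r)  # large-n approximation (assumed, see lead-in)
    return r, n, p_value

# Toy example: one spike per cycle of a 1-kHz tone, perfectly phase-locked.
locked = [10.0 + k for k in range(15)]
r, n, p = vector_strength(locked, freq_hz=1000.0)
```

With this approximation, P < 0.001 corresponds to nR² > 6.9.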
Construction of shuffled autocorrelograms
The construction of shuffled autocorrelograms (SACs) follows Joris (2003) and is schematized in Fig. 1. For each fiber, N spike trains were available to the same noise stimulus (Fig. 1A). Next, we listed all possible pairs of nonidentical spike trains (Fig. 1B). N spike trains yielded $N(N-1)$ ordered pairs. For all these pairs, we measured the forward time intervals between all spikes of the first spike train and all spikes of the second spike train (Fig. 1C). All intervals from all listed pairs were tallied in a histogram, which we refer to as a SAC. By measuring intervals across spike trains (thus excluding intervals within spike trains), the obscuring effect of the refractory period on short intervals was avoided (see following text). Shuffling is a classical trick used to reveal stimulus-induced temporal structure in cross-correlograms of paired neuronal recordings, sometimes referred to as the "shift predictor" (Perkel et al. 1967b). This shuffling should not be confused with shuffling of intervals within spike trains (see Moore et al. 1966). Thus shuffled refers to the exclusion of identical spike trains, auto to the exclusive use of responses obtained from one single fiber, and correlation to the formal equivalence between this procedure and the cross-correlation of two spike trains. In practice, the SACs were only calculated and analyzed for intervals up to 30 ms, which are small relative to the duration of the noise stimulus so that we did not need to correct the SAC for the finite stimulus duration.
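The tallying procedure described above can be sketched in a few lines of Python. This is an illustrative brute-force version under stated assumptions, not the authors' implementation: the function name, the list-of-lists data layout, and the toy spike times are hypothetical.

```python
def shuffled_autocorrelogram(trains, binwidth, max_delay):
    """Tally forward intervals across all ordered pairs of non-identical spike trains.

    trains    : list of spike-time lists, one per stimulus repetition
    binwidth  : histogram bin width (same time unit as the spike times)
    max_delay : largest interval that is tallied
    """
    nbins = int(round(max_delay / binwidth))
    counts = [0] * nbins
    for i, first in enumerate(trains):
        for j, second in enumerate(trains):
            if i == j:
                continue  # "shuffled": never compare a spike train with itself
            for t1 in first:
                for t2 in second:
                    d = t2 - t1
                    if 0 <= d < max_delay:  # forward intervals only
                        counts[min(int(d / binwidth), nbins - 1)] += 1
    return counts

# Toy example: three identical trains with spikes at 1 and 3 time units.
trains = [[1.0, 3.0], [1.0, 3.0], [1.0, 3.0]]
counts = shuffled_autocorrelogram(trains, binwidth=1.0, max_delay=3.0)
```

In the toy example the 6 ordered pairs each contribute two zero-delay coincidences and one interval of 2, so the histogram is `[12, 0, 6]`. The nested loops scale with the product of spike counts and are meant for clarity rather than speed.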
|

, and stimulus duration D. This was achieved by dividing by $N(N-1)r^2\Delta\tau D$, where r is the average firing rate and $\Delta\tau$ the binwidth. This results in SACs with a dimensionless ordinate that we refer to as normalized SACs. A crucial property of this choice of normalization is the occurrence of a unity "baseline": in the absence of any temporal coding, the SAC will consist of a horizontal line with unity intercept. Any temporal structure will result in deviations from this baseline. In our data, the unity baseline is often visible as an asymptote for high delays (see the discussion of Fig. 3). The mathematics behind the normalization constant is explained in the following text. Preliminary analysis showed that the shape and magnitude of the normalized SAC were not critically dependent on binwidth up to widths of 70 µs, a value probably reflecting minimal jitter of spike timing. Binwidths >70 µs obscured fast oscillations in the correlogram and reduced bin values around 0 delay. We therefore used a standard binwidth of 50 µs.
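A minimal sketch of this normalization step follows, assuming the raw histogram was tallied over all ordered pairs of non-identical trains and that the average rate r is estimated from the total spike count. The function name and the toy numbers are hypothetical.

```python
def normalize_sac(counts, trains, binwidth, duration):
    """Divide raw SAC counts by N(N-1) * r^2 * binwidth * duration."""
    n_trains = len(trains)
    total_spikes = sum(len(t) for t in trains)
    rate = total_spikes / (n_trains * duration)  # average firing rate r
    factor = n_trains * (n_trains - 1) * rate ** 2 * binwidth * duration
    return [c / factor for c in counts]

# Toy example: 3 identical trains of 2 spikes each in a window of 4 time units.
trains = [[1.0, 3.0], [1.0, 3.0], [1.0, 3.0]]
raw_counts = [12, 0, 6]  # raw SAC of these trains for unit bins
normalized = normalize_sac(raw_counts, trains, binwidth=1.0, duration=4.0)
```

Here r = 0.5 and the factor is 3 × 2 × 0.25 × 1 × 4 = 6, so the normalized values are `[2.0, 0.0, 1.0]`. A value of 1 marks the chance level for uncorrelated trains of the same rate.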
|
Like SACs, cross-stimulus autocorrelograms (XACs) are also all-order interval histograms, but here the spike times are compared across responses to two different stimuli (say A and B) rather than across responses to the same stimulus. In this paper, A and B are the reference noise and its inverted version. In contrast to autocorrelograms, no identical spike trains are to be excluded in the calculation of XACs. Therefore there is a total of $N_A N_B$ ordered pairs of spike trains. Consequently, the normalization factor becomes $N_A N_B r_A r_B \Delta\tau D$.
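The XAC differs from the SAC only in which pairs are compared and in the normalization factor. The sketch below is illustrative and not taken from the paper's software; it assumes spike times in a common time unit and tallies both negative and positive delays. All names are hypothetical.

```python
import math

def normalized_xac(resp_a, resp_b, binwidth, max_delay, duration):
    """Cross-stimulus correlogram over delays in [-max_delay, max_delay).

    resp_a, resp_b : lists of spike-time lists for stimuli A and B
    Normalization  : N_A * N_B * r_A * r_B * binwidth * duration
    """
    nbins = int(round(2 * max_delay / binwidth))
    counts = [0] * nbins
    for a in resp_a:            # every A train is paired with every B train;
        for b in resp_b:        # no pair is excluded
            for ta in a:
                for tb in b:
                    d = tb - ta
                    if -max_delay <= d < max_delay:
                        k = int(math.floor((d + max_delay) / binwidth))
                        counts[min(k, nbins - 1)] += 1
    n_a, n_b = len(resp_a), len(resp_b)
    r_a = sum(len(t) for t in resp_a) / (n_a * duration)
    r_b = sum(len(t) for t in resp_b) / (n_b * duration)
    factor = n_a * n_b * r_a * r_b * binwidth * duration
    return [c / factor for c in counts]

# Toy example: one response to A (spikes at 1 and 3) and one to B (spike at 2).
xac = normalized_xac([[1.0, 3.0]], [[2.0]], binwidth=1.0, max_delay=2.0, duration=4.0)
```

The two intervals (−1 and +1) land in the second and fourth bins, and the factor is 1 × 1 × 0.5 × 0.25 × 1 × 4 = 0.5, giving `[0.0, 2.0, 0.0, 2.0]`.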
Normalized SACs of some artificial pulse trains
For illustrative purposes, Fig. 2 shows autocorrelograms for simple artificial pulse trains. The left panels show dot rasters and the right panels the corresponding normalized SACs. Figure 2A shows a collection of identical pulse trains. Each dot represents a pulse or spike. The pulse trains have a duration D but only 10 ms are shown. The pulses are spaced at 1.33 ms (period of 750 Hz; T750 Hz). The corresponding normalized SAC (Fig. 2A, right) is periodic and shows peaks, one bin wide and spaced at the period of the pulse train (1.33 ms). Because counting all-order intervals is equivalent to counting coincident spikes between two spike trains shifted over different delays, SACs can be graphed as number of intervals versus interval duration, or as number of coincidences versus delay: the latter labels are used in Fig. 2 (right) and throughout the remainder of this paper. Figure 2B shows a similar collection of pulse trains, but here each pulse time is jittered by adding a small time interval randomly chosen from a normal distribution with an SD (σ) of 80 µs. Jitter results in irregular alignment of spikes in the dot rasters. The corresponding normalized SAC also has peaks with a 1.33-ms spacing, but the peaks are lower and broader (Fig. 2B, right; note the difference in Y scale). Pulse trains in C have the same periodicity and jitter as B, but pulses that could occur at any time were added. Compared with the normalized SAC of B, the presence of such pulses lowers the peaks but does not affect their width. The added pulses also cause a "noise floor" of coincidences. Pulses in D have the same periodicity as in B, but the amount of jitter is increased to σ = 170 µs. Compared with B, lowering and broadening of the peaks in the SAC is more pronounced (Fig. 2D, right). Figure 2E shows pulse trains similar to those of Fig. 2D, but half of the pulses were randomly removed. This causes a reduction of average pulse rate of 50% but does not affect the normalized SAC (Fig. 2E, right).
|

$\Delta\tau$-wide bins; this would yield a number of bins equal to $B = D/\Delta\tau$. If the average number of spikes in the pulse trains equals $\bar{n}$, the average pulse rate will be $r = \bar{n}/D$. If the pulse train is a realization of a uniform distribution on the interval [0,D], the probability for the occurrence of a spike in a bin is the same for all bins of all pulse trains and equals $\bar{n}/B$, provided the binwidth is sufficiently small for each bin to contain at most one spike. In the case of independent pulse trains, the joint probability for a spike in bin i of train 1 and a spike in bin j of train 2 (a "coincidence") is $(\bar{n}/B)^2$, and amounts to $(\bar{n}/B)^2 B$ for the entire pulse train. Because we count coincidences between ordered pairs and each pair can be ordered in two ways, the number of coincidences per pair equals $2(\bar{n}/B)^2 B$. Note that this is an average; dispersion around this average will increase with smaller
, higher dispersion around
and smaller 
. Using the preceding expressions for r and B, this quantity can be rewritten as $2r^2\Delta\tau D$. This is equal to the normalization factor for N = 2, and hence the normalized number of coincidences per pair of pulse trains fulfilling the preceding assumptions is unity. Figure 2G (left) shows a collection of identical pulse trains with pulses chosen from a uniform distribution. The corresponding normalized SAC (Fig. 2G, right) has the shape of a one-bin-wide peak at delay zero flanked by a plateau at unity level.
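The substitution step in this argument can be written out explicitly. In the sketch below, $\bar{n}$ (average spike count per train) and $\Delta\tau$ (binwidth) are notation assumed here for the quantities described in the text, with $B = D/\Delta\tau$ and $r = \bar{n}/D$:

$$
B\left(\frac{\bar{n}}{B}\right)^{2} = \frac{\bar{n}^{2}}{B} = \frac{(rD)^{2}}{D/\Delta\tau} = r^{2}\,\Delta\tau\,D ,
\qquad
2\,r^{2}\,\Delta\tau\,D = N(N-1)\,r^{2}\,\Delta\tau\,D \ \Big|_{N=2} .
$$

As a purely illustrative number, with an assumed rate of r = 100 spikes/s, a binwidth of 50 µs, D = 1 s, and N = 50 repetitions, the expected chance count per bin is $50 \times 49 \times 100^{2} \times 5\times10^{-5} \times 1 = 1225$ coincidences, which the normalization maps to 1.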
| RESULTS |
|---|
|
|
|---|
All-order interspike interval (ISI) histograms have been used in the analysis of responses to wideband stimuli (Cariani and Delgutte 1996a,b; Ruggero 1973; Shofner 1991; ten Kate and van Bekkum 1988). Figure 3, top, shows all-order ISIs for four AN fibers, arranged from low (left) to high (right) CF. In response to broadband noise, low-CF AN fibers tend to discharge spikes with intervals close to the period of their CF (Ruggero 1973), henceforth referred to as TCF. The dots below the abscissa indicate interval durations or delays equaling TCF, 2TCF, and 3TCF. For the fiber with the lowest CF (0.55 kHz), these dots are close to modes in the ISI distribution. For the other fibers (CFs ≥2.5 kHz), the intervals TCF and 2TCF fall within the trough reflecting the refractory period. The ordinate here is scaled to units of rate (Abeles 1982; Ruggero 1973; ten Kate and van Bekkum 1988) by dividing the number of events in a bin by $Nr\Delta\tau D$. Immediately after firing a spike, the probability of firing is zero due to the refractory period. For delays larger than a few milliseconds, the probability of firing settles toward the average firing rate as routinely measured by counting all spikes over the stimulus duration. This average firing rate is indicated with horizontal lines (Fig. 3, top).
To discard refractory effects and reveal temporal patterns on a short time scale, we compare spike times across rather than within stimulus repetitions by constructing a SAC. The SACs in Fig. 3 (middle) clearly reveal an oscillatory temporal pattern that lines up with integer multiples of TCF for the cells with CF in the phase-locking range (columns 1–3), and a single peak for the cell with CF of 5 kHz. At long delays, the histograms settle to a constant rate which equals the average firing rate. Note that this "background" level of coincidences does not reflect SR but rather the base level of chance coincidences expected for uncorrelated spike trains. Also, rate can decrease to values below SR (e.g., Fig. 3E at delay of 1 ms), as is well known from tonal responses. Besides revealing temporal structure at all CFs, SACs have the advantage of being much smoother than ISI histograms due to the higher number of intervals in pairs of spike trains [intervals from $N(N-1)$ pairs for SACs versus intervals from N spike trains for all-order ISI histograms].
Figure 3, bottom, illustrates normalized SACs as used in the further analysis. Normalized SACs have a dimensionless ordinate (see also METHODS). Note that whereas the correlograms of top and middle at large delays approach the average firing rate, the normalized SACs approach unity. Furthermore, we display the SACs in full, i.e., symmetrical around 0 ms; each positive interval of spike train pair (a, b) is a negative interval in pair (b, a) and vice versa.
The CF dependence in the SACs of Fig. 3 is representative for all AN fibers studied and is consistent with that described by Joris (2003). For fibers with CFs well within the range of pure-tone phase-locking, the SAC shape was that of a damped oscillation with an oscillation frequency near the CF of the fiber (Fig. 3, I and J). Fibers with CF above the phase-locking range showed a single central peak (Fig. 3L). At intermediate CFs, the SAC showed both components and consisted of a damped oscillation superimposed on a single central peak (Fig. 3K).
The damped oscillatory shape of the SAC at low CFs is consistent with the autocorrelation function of the broadband stimulus after band-pass filtering and reflects the center frequency and tuning width of cochlear filtering preceding spike initiation in the auditory nerve. At high CFs the oscillatory "fine-structure" is removed due to limits of phase-locking (Johnson 1980; Rose et al. 1967) and the single peak in the SAC depends on envelope phase-locking (Joris 2003). Note that the SAC does not represent synchronization to the stimulus per se but to the "effective" stimulus to the AN fiber as determined by the mechanical and transduction events that precede spike initiation at the cochlear site where the AN fiber originates. In summary, the normalized SAC reveals how spikes are constrained in their timing jointly by cochlear filtering and phase-locking to fine-structure and envelope. Analysis in this paper mainly concerns phase-locking. The issue of what these correlograms tell us about cochlear filtering is deferred to a later paper.
Maximum SAC values were always reached at delays near 0 ms and will be referred to as the central peak. The magnitude of other peaks and troughs at non-0 delays covaried with that of the central peak. To quantify phase-locking, we focus on two obvious parameters of the central peak: its height and width. Central peak-height is larger than unity to the extent that spikes tend to occur in the same temporal position (within the 50-µs binwidth) on repeated presentation of the same stimulus. Peak-width depends on the temporal precision at which spikes are constrained to certain timings.
Height of the central peak of normalized SACs from responses to broadband noise presented at 70 dB SPL (spectrum level of 25 dB SPL) was obtained for 219 fibers pooled from eight animals (Fig. 4). Three fibers had a tuning curve threshold >50 dB SPL and were not or barely driven by the broadband noise stimuli: these fibers are excluded from the analysis.
|
Separation of fine-structure and envelope
Joris (2003) argued that the central peak of SACs reflects synchronization to different waveform features at different CF regions: fine-structure and envelope for low- and high-CF fibers, respectively, and we used the same method to disambiguate synchronization to these stimulus features. To tease synchronization to envelope and fine-structure apart, we obtained responses to a broadband noise and, separately, at the same SPL, to the same noise inverted in polarity. The response component that changes on inverting the polarity is due to synchronization to fine-structure, whereas the response component common to the reference and anti-correlated noise reflects synchronization to the envelope. We constructed normalized XACs from responses to reference and anti-correlated noise (see METHODS). Figure 5 (top) shows normalized SACs and XACs for the same fibers as Fig. 3. XACs differed from SACs only for fibers with CF in the phase-locking range (<5 kHz) (Fig. 5, A–C). XACs of low-CF fibers showed a damped oscillation with frequency close to the CF of the fiber but with a central trough rather than a central peak at delay zero, as expected from the polarity inversion. Note that whereas SACs are perfectly symmetric around 0 delay, XACs are only approximately symmetric because values at positive and negative delays are obtained from different intervals.
|
Synchronization to fine-structure
Difcors enabled us to quantify synchronization based on the fine-structure of the effective stimulus waveform. In contrast to SACs, difcors oscillated around zero and were flat at CFs above the phase-locking limit (Fig. 5, H vs. D), and they did not exhibit a combined morphology at intermediate CFs (Fig. 5, G vs. C). Figure 6A illustrates the measures taken: difcor peak-height is the height at 0 delay, difcor halfwidth is the width of the central peak taken at (difcor peak-height)/2.
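The two difcor measures can be illustrated with a short sketch. It assumes that the difcor is the bin-by-bin difference between the normalized SAC and XAC (this section does not spell out the definition), that both are sampled on the same symmetric delay axis, and that halfwidth is found by linear interpolation between bins. Function names and the toy values are hypothetical.

```python
def difcor(sac, xac):
    """Bin-by-bin difference of normalized SAC and XAC (assumed definition)."""
    return [s - x for s, x in zip(sac, xac)]

def central_peak_metrics(delays, values):
    """Return (peak-height at the bin nearest 0 delay, width at half that height)."""
    i0 = min(range(len(delays)), key=lambda i: abs(delays[i]))
    height = values[i0]
    if height <= 0:
        return height, None  # no central peak to measure
    half = height / 2.0

    def half_crossing(step):
        # Walk away from the peak until the curve drops to half height or below.
        i = i0
        while 0 <= i + step < len(values) and values[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(values):
            return None  # curve never falls to half height on this side
        frac = (values[i] - half) / (values[i] - values[j])
        return delays[i] + frac * (delays[j] - delays[i])

    left, right = half_crossing(-1), half_crossing(+1)
    if left is None or right is None:
        return height, None
    return height, right - left

# Toy example: triangular central peak on a flat XAC.
delays = [-2.0, -1.0, 0.0, 1.0, 2.0]
d = difcor([1.0, 2.0, 3.0, 2.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0])
height, halfwidth = central_peak_metrics(delays, d)
```

For the toy triangle the peak-height is 2 and the half-height crossings fall at delays −1 and +1, so the halfwidth is 2.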
|
4 kHz and was not analyzed for higher CFs. Halfwidth (Fig. 7B, 152 fibers) decreased monotonically with increasing CF. Halfwidth was only measured in fibers with CF ≤3.5 kHz and with difcor peak-height ≥0.1. For CFs above 1 kHz, the halfwidth is well predicted by the width of a half-wave rectified cosine of corresponding frequency (Fig. 7B, and inset), equaling TCF/3. Halfwidths of difcors for CFs below 1 kHz were narrower and spanned a considerable range. The dependence of difcors on CF, in both the height and the width of the central peak, thus gives a picture that shows similarities to the dependence of vector strength to pure tones at CF (Johnson 1980
|
|
) and difcor peak-height (x and
) plotted on a single dimensionless ordinate. Both measures show a decrease with increasing CF.
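The TCF/3 prediction for difcor halfwidth mentioned earlier follows directly from the shape of a half-wave rectified cosine. Writing the period as $T_{CF}$ and taking the width at half the peak height:

$$
\cos\!\left(\frac{2\pi t}{T_{CF}}\right) = \frac{1}{2}
\;\Rightarrow\;
t = \pm\frac{T_{CF}}{6}
\;\Rightarrow\;
\text{halfwidth} = \frac{T_{CF}}{3} .
$$

As an illustrative case (the CF value is chosen here, not taken from the data), a fiber with CF = 2 kHz has $T_{CF}$ = 0.5 ms, giving a predicted halfwidth of about 167 µs.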
|
Having difcors to both pure tones and broadband noise, we can directly compare temporal behavior to these stimuli, with the same metric, in the same fibers. Figure 10 shows the relation between difcor peak-height of responses to the 70-dB broadband noise and 50-dB pure tones at CF, for a population of fibers for which both responses were available. The two measures are well correlated and, more importantly, difcor peak-height to broadband noise is larger than that to pure tones in the vast majority of fibers. Because difcor peak-height is also dependent on SPL (see following text) and the choice of levels in the comparison of Fig. 10 is somewhat arbitrary, we checked whether the larger peak-heights to noise hold for other SPL combinations. We compared difcor peak-height for all pairwise combinations of noise responses obtained at 50, 60, 70, and 80 dB SPL and tone responses obtained at 30, 40, 50, 60, and 70 dB SPL. In all cases, the pattern observed was the same as in Fig. 10, showing larger difcor peak-heights for the responses to noise. The median difference in difcor peak-height between noise and tone stimuli was 0.6 for the choice of SPLs of Fig. 10; at other SPLs it ranged from 0.4 (for 80-dB noise and 50-dB tones) to 1.2 (for 50-dB noise and 30-dB tones). Thus the tendency of low-CF nerve fibers to discharge spikes in the same temporal position for different presentations of the same stimulus is stronger to broadband noise than to pure tones.
|
In 160 fibers, noise responses were obtained at multiple levels. Figure 11 shows superimposed difcors for different SPLs for three AN fibers, arranged from low CF (top) to intermediate CF (bottom). Difcors at different SPLs differed in three respects. First, peak-height could clearly differ for different SPLs (Fig. 11, left). This is further illustrated in Fig. 12A, which shows peak-height for a collection of AN fibers for which noise responses were available at 4 SPLs. Typically, difcor height is maximal at intermediate SPLs (50–70 dB) and is slightly smaller at higher SPLs. Note that this is not attributable to the increase in average rate since peak-height is measured after normalization. A similar decrease with SPL is also present for vector strength (Javel et al. 1983) and has been attributed to compression (Greenwood 1986). As mentioned earlier (Fig. 7A), difcor peak-height at 70 dB SPL was on average higher for low/medium-SR fibers than for high-SR fibers; Fig. 12A shows that this difference is even more pronounced at lower SPLs.
|
|
Third, the delay at which side peaks occurred varied with SPL. This is illustrated in Fig. 11 (middle) by scaling the abscissa such that the location of the side peak at the lowest SPL was set at 1. For the AN fibers with CF of 0.38 and 1 kHz (Fig. 11, D and E), the side peaks shifted toward shorter delays with increasing SPL, but for the AN fiber with a CF of 3.18 kHz (Fig. 11F), the shift was in the opposite direction. A shift in the location of side peaks indicates a change in the oscillation frequency of the difcor. We further quantified this change by calculating the Fourier spectrum of the difcor (Figs. 6B and 11, right). A full analysis of these spectra is beyond the scope of this paper; here we only look at the dominant frequency (DF), measured as the frequency corresponding to the maximum in the spectrum (Fig. 6B). Figure 12B shows the shift of DF with SPL for a collection of AN fibers. DF tended to shift to higher frequencies for CFs below approximately 1 kHz and to lower frequencies for CFs above approximately 1 kHz (Fig. 12B). Similar shifts have been reported for the best frequency of revcors (Evans 1977). Figure 13 shows the DFs from noise responses at 70 dB for 175 AN fibers plotted against the CF obtained from the threshold tuning curve. The two measures were well correlated up to 2.5 kHz.
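The DF measure can be sketched as a search for the maximum of the magnitude spectrum of the difcor. The version below is a plain discrete-time Fourier sum over a frequency grid, chosen for transparency; it is not the authors' procedure, and the function name, the grid, and the synthetic damped cosine used as input are assumptions.

```python
import math

def dominant_frequency(values, dt_ms, f_max_hz, f_step_hz):
    """Frequency (Hz) at which the magnitude spectrum of `values` is largest.

    values : correlogram samples at a uniform spacing of dt_ms milliseconds
    """
    best_f, best_mag = 0.0, -1.0
    n_steps = int(round(f_max_hz / f_step_hz))
    for k in range(n_steps + 1):
        f = k * f_step_hz
        w = 2.0 * math.pi * f * dt_ms / 1000.0  # phase advance per sample
        re = sum(v * math.cos(w * i) for i, v in enumerate(values))
        im = sum(v * math.sin(w * i) for i, v in enumerate(values))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

# Synthetic stand-in for a difcor: cosine at 1 kHz with a 2-ms exponential decay,
# sampled in 50-µs bins from -5 to +5 ms.
dt = 0.05
delays = [(i - 100) * dt for i in range(201)]
synthetic = [math.exp(-abs(t) / 2.0) * math.cos(2.0 * math.pi * 1.0 * t) for t in delays]
df = dominant_frequency(synthetic, dt, f_max_hz=5000.0, f_step_hz=10.0)
```

For the synthetic input the spectral maximum falls at or very near 1 kHz. An FFT with zero padding would be the usual faster alternative for real data.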
|
As shown earlier (Joris 2003) and illustrated in Fig. 5, the shape of SACs and XACs changes with increasing CF: for CFs above the range of pure-tone phase-locking, SACs and XACs become indistinguishable. Here we define "high-CF" fibers on the basis of the similarity in SAC and XAC, using the ratio of the height of these two functions at delay zero. Note that this definition differs from that of Joris (2003), who used a correlation measure. When the SAC and XAC heights differed <20%, resulting in a ratio between 0.8 and 1.2, we regard the fiber as "high-CF" because phase-locking to fine-structure has a minor or negligible influence in shaping the response. Figure 14 shows this ratio for our population of AN fibers. The ratio shows a sigmoidal relationship to CF with all fibers with CF >4 kHz falling into the high-CF class. The following analysis only concerns these high-CF fibers. We use sumcors (the average of SAC and XAC) when both types of responses are available, and SACs in the other cases. The rationale for using the sumcor is that a signal's envelope is independent of stimulus polarity (phase shift rule) (Hartmann 1997) while its fine-structure is not. To the extent that the response is linearly dependent on the stimulus, summation of SAC and XAC will preserve response components due to envelope but not to fine-structure.
|
|
|
400 µs and did not show any systematic differences between low/medium- and high-SR fibers.
|
| DISCUSSION |
|---|
|
|
|---|
The main findings are as follows: 1) at CFs below a few kiloHertz, SACs were dominated by synchronization to fine-structure and had the shape of a damped oscillation (Fig. 3). For high-CF (>4 kHz) fibers, SACs were dominated by envelope synchronization and had the shape of a single peak around zero. A superposition of both shapes was observed at intermediate CFs. Peak-height showed a monotonic decrease with CF (Fig. 4). 2) Fine-structure synchronization to noise was studied with difcors (Fig. 5). The central peak of these difcors systematically decreased in height and width with CF (Fig. 7). Peak-height also depended on sound level (Figs. 11 and 12A). The difcor periodicity (DF) was well-matched with CF (Fig. 13) and showed small but systematic shifts with SPL (Figs. 11 and 12B). 3) Fine-structure synchronization to pure tones at CF was studied with difcors and vector strength (Fig. 8). Peak-height shows a strong dependence on CF (Fig. 9), even at CFs where maximum vector strength shows little change (<1 kHz). 4) Envelope synchronization in high-CF fibers (>4 kHz) was measured with sumcors and SACs. Peak-height showed strong dependence on SPL (Figs. 15 and 16) but not on CF (Fig. 4). Halfwidth decreased with CF (Fig. 17). 5) In all autocorrelograms (across CFs, and both to noise and to low-CF tones), there was a marked difference between fibers of different SR classes. Peak-height was higher for low/medium- than for high-SR fibers (Figs. 4, 7, 9, 10, 12, and 16), but there was no systematic difference for halfwidth (Figs. 7 and 17). 6) The peak-heights of difcors to CF tones and noise were correlated but were larger for noise than for tones (Fig. 10).
Points 1–4 are largely consistent with previous observations made with periodic stimuli and Fourier analysis, while points 5 and especially 6 reveal new aspects of temporal properties.
Interpretation of peak-height and halfwidth of correlograms
Traditionally, a single metric, vector strength, has been used for the measurement of phase-locking, while here we report two metrics: SAC peak-height and halfwidth. In principle, a vector average can be calculated for the central peak of low-frequency SACs or difcors (for a similar use, see Yin et al. 1987). However, the height and width of the central peak of the correlograms capture different aspects of temporal behavior and offer a finer physiological dissection tool than a combined metric would. This is nicely illustrated in the differences observed between low/medium- and high-SR fiber populations: peak-height is smaller in high-SR than in low-SR fibers (Figs. 4 and 7A), whereas halfwidth does not markedly differ between the two SR populations (Fig. 7B).
Peak-height of normalized correlograms does not reflect differences in average rate but rather the tendency of neurons to generate spikes at the same point in time on repeated stimulation with the same stimulus. For instance, a normalized SAC height of 3 means that pairwise comparison of neural spike trains yields three times more coincidences than expected for random pulse trains with the same spike rate (see METHODS). Large peak-height values thus require not only phase-locking in the traditional sense (synchronization to the effective stimulus waveform) but also an additional component of consistency in response to repetitions of the same stimulus.
Halfwidth quantifies the spread of spike times around certain events on repeated stimulation with the same noise stimulus. At least two components affect halfwidth. First, put simply, in response to noise, the filtering properties at the cochlear point of innervation result in slow vibrations of the basilar membrane at the cochlear apex and fast ones at the cochlear base. Because spike generation is coupled to basilar membrane vibration, the spike trains of the innervating fibers show correspondingly slow or fast oscillations, resulting in broad or narrow central peaks, respectively, in the autocorrelograms. Second, this coupling is affected by temporal limits in the transduction process, here defined as all steps intervening between basilar membrane motion and spike generation.
The dependence of halfwidth on CF is complicated by the presence of two forms of temporal behavior in response to noise: fine-structure and envelope. Mechanically, both forms are present throughout the cochlea, but the effect of temporal transduction limits is such that the SAC central peak reflects fine-structure for apical fibers and envelope for basal fibers. The main cause for the decrease in halfwidth with CF for apical fibers is shortening of TCF, whereas for basal fibers, it is the increase in bandwidth of cochlear tuning (Fig. 17) (Joris 2003). There is an area of transition, at CFs between 2 and 4 kHz, in which coding of fine-structure gives way to coding of envelope (Fig. 14) (and, using a different metric, Joris 2003). Importantly, the halfwidth of difcors is not a constant proportion of the stimulus cycle. At frequencies between 1 and 4 kHz, difcor halfwidth is close to TCF/3, but it is smaller at the lowest CFs (Fig. 7B). We return to this observation in the following text.
Elimination of refractory effects
Previous authors have used unshuffled interspike-interval or autocorrelation analysis to study temporal patterns in response to wideband sounds (Cariani and Delgutte 1996a,b; Horst et al. 1986; Ruggero 1973; Shofner 1991). Because of the obscuring effect of the refractory period on small intervals, the presence of high-frequency (roughly >1 kHz) temporal information was probably underestimated in these analyses. For example, the coding of formant frequencies (Delgutte 1997), of components in harmonic complexes at high SPLs (Horst et al. 1986), and of short delays in rippled noise (Shofner 1991) may be more robust than appears from unshuffled autocorrelograms.
The study that comes closest to ours in intent is that by Ruggero (1973), who used unshuffled all-order interval histograms for a temporal analysis of AN responses to broadband noise in the squirrel monkey. He found a correspondence between the frequency of the oscillations in the histograms and best frequency up to 2 kHz, which is about an octave below the phase-locking limit for that species (Rose et al. 1967). For higher best frequencies, no modes could be detected in the histograms, presumably because of the obscuring effect of the refractory period (Fig. 3).
Joris (2003) introduced SACs to compare the temporal structure in AN fibers with noise-delay functions measured in the inferior colliculus. By shuffling, the obscuring of short intervals by refractoriness is eliminated, revealing fast modulations in fine-structure and envelope. This procedure has a physiological counterpart because virtually all cochlear nucleus neurons receive inputs from multiple nerve fibers (Liberman 1991; Ryugo and Sento 1991) and because the available evidence suggests that spike generation in different nerve fibers is independent (Johnson and Kiang 1976; Kiang 1990; Kiang et al. 1965). Thus even though stimulus-locked intervals shorter than the refractory period are not transmitted by a single fiber, convergence of at least two fibers makes these short intervals effectively available to the CNS. The clearest illustration that the CNS makes use of this type of temporal information is in the binaural system, where cells are found whose dependence on interaural time differences is strikingly similar to SACs (Yin and Chan 1990; Yin et al. 1986). Joris (2003) pointed out that a SAC can be viewed as the output of the simplest binaural coincidence circuit conceivable: a coincidence detector that receives inputs from two identical fibers and has an integration window equal to the binwidth used in the SAC computation, with the delay axis denoting interaural time delay. Whether across-fiber spike timing is also important in monaural processing is less established because experimental control over the relative timing of inputs to a cell is more difficult monaurally than binaurally. However, evidence for monaural coincidence processes has been provided by Carney and collaborators (Carney 1990, 1994; Carney et al. 2002; Heinz et al. 2001; Joris et al. 1994a).
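The shuffling idea can be illustrated with a short Python sketch that tallies spike-time differences across, and never within, repeated spike trains, so that refractoriness within a train cannot empty the short-delay bins. The function name, its parameters, the default 50-µs binwidth, and the normalization by N(N-1)r²ΔτD (number of trains N, mean rate r, binwidth Δτ, stimulus duration D) are assumptions made for this illustration; the study's Methods define the actual procedure.

```python
def shuffled_autocorrelogram(trains, duration, binwidth=50e-6, max_delay=5e-3):
    """Sketch of a shuffled autocorrelogram (SAC).

    trains:   list of spike trains (lists of spike times in seconds),
              all recorded in response to the same stimulus.
    duration: stimulus duration in seconds.
    Returns (delays, sac); with the assumed normalization, a value of 1
    is expected for spike trains without stimulus-locked timing.
    """
    n = len(trains)
    if n < 2:
        raise ValueError("shuffling needs at least two spike trains")
    n_side = int(round(max_delay / binwidth))
    counts = [0] * (2 * n_side + 1)

    # Compare every ordered pair of *different* trains.
    for i, train_a in enumerate(trains):
        for j, train_b in enumerate(trains):
            if i == j:
                continue  # within-train intervals are excluded
            for ta in train_a:
                for tb in train_b:
                    k = int(round((tb - ta) / binwidth))
                    if -n_side <= k <= n_side:
                        counts[k + n_side] += 1

    # Assumed normalization: N(N-1) * r^2 * binwidth * duration.
    rate = sum(len(t) for t in trains) / (n * duration)
    norm = n * (n - 1) * rate ** 2 * binwidth * duration
    delays = [(k - n_side) * binwidth for k in range(2 * n_side + 1)]
    return delays, [c / norm for c in counts]
```

In this picture the central bin plays the role of a coincidence detector fed by two identical fibers with an integration window equal to `binwidth`. The nested loops are quadratic in the number of spikes and are meant for clarity, not speed.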
Comparison with studies of phase-locking to pure tones
Pure tones have been the stimulus used most often to quantify phase-locking. Johnson (1980) first reported a detailed description of the dependence of vector strength on CF, and numerous studies have replicated his findings. In the cat, AN responses to pure tones near CF show a narrow range of Rmax values up to 1 kHz, followed by a decrease at CFs >1 kHz, reaching insignificant levels for CFs and stimulus frequencies near 4–5 kHz (Fig. 9A). This relationship is also found in other species and is usually characterized as a low-pass filter (Weiss and Rose 1988). There are, however, two problems with this characterization: maximum synchronization to CF tones, measured for a population of fibers of different CFs, does not appear to coincide with maximum synchronization to different frequencies within a fiber (Joris et al. 1994b), and vector strength is a compressive metric with a ceiling at 1 and needs to be graphed on an expansive scale to obtain an approximately isovariant axis (Johnson 1980).
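For readers less familiar with the metric, the following sketch computes vector strength in its standard form: each spike is mapped to a unit vector at its phase relative to a chosen frequency, and the length of the mean vector is returned. The function name and argument names are chosen for this illustration only. The sketch also makes the limitation discussed here explicit: the analysis frequency must be supplied in advance, and the result is bounded between 0 and 1.

```python
import math


def vector_strength(spike_times, frequency):
    """Vector strength of spike times (s) relative to `frequency` (Hz).

    Returns a value near 1 when all spikes fall at the same stimulus
    phase and near 0 when spike phases are spread evenly over the cycle.
    """
    if not spike_times:
        raise ValueError("need at least one spike")
    phases = [2.0 * math.pi * frequency * t for t in spike_times]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    return math.hypot(cos_sum, sin_sum) / len(spike_times)
```

For example, spikes at 0, 2, and 4 ms are perfectly locked to a 500-Hz tone, whereas spikes spaced a quarter cycle apart cancel out.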
Difcors, which also capture phase-locking to fine-structure, show a CF dependence with certain similarities to that of vector strength: they show large-amplitude oscillations at low CFs that become unmeasurable at CFs >4–5 kHz (Figs. 5 and 7). Small-amplitude oscillations were seen up to CFs of 5 kHz, but in this study, we did not attempt to obtain a precise statistical estimate of the upper CF limit. Both difcor peak-height and halfwidth show a monotonically decreasing dependence on CF. These relationships differ, however, from that of pure-tone vector strength in two respects. First, for corresponding CFs, peak-height (but not halfwidth) is markedly larger for low/medium- than for high-SR fibers. We will return to this observation in the following text. Second, while vector strength shows a low-frequency plateau up to 1 kHz (Fig. 9A), difcor peak-height and halfwidth show a clear CF dependence in that range. This is perhaps not surprising with respect to halfwidth because, as we already argued, the abscissa is not scaled to the effective stimulus period as it is for period histograms. For peak-height, however, the clear decrease with CF in the range <1 kHz is unexpected and also suggests that the characterization of phase-locking as a low-pass filter is flawed.
The most likely explanation for the decrease in peak-height with CF, for CFs <1 kHz, lies in the small halfwidth, relative to TCF, at very low CFs. It is known that the probability of nerve fibers to discharge a spike in a given stimulus cycle is inversely proportional to stimulus frequency (Rose et al. 1967). The probability that, on repeated presentations of a stimulus, a spike occurs in the same nth cycle of the stimulus, is therefore higher at low than at high frequencies. On the other hand, this effect is offset by the increase in stimulus period T: if two spike trains contain a spike in the same nth cycle, chances that they will occur in a 50-µs coincidence window are higher at high than at low CFs. However, because halfwidth relative to TCF is smaller at CFs of a few hundred hertz than near 1 kHz (Fig. 7B), consistency in spike timing between spike trains will be higher at low than at high CFs. The reason why halfwidth relative to TCF is smaller at low CFs in the first place is not clear from our data and analysis but may be related to the distorted waveforms reported to tonal low-frequency stimuli in inner hair cells (Mountain and Cody 1999).
To enable a more direct comparison between autocorrelation and vector strength metrics, we calculated both metrics for responses to pure tones (Figs. 8 and 9A). As was the case for noise, the peak-height of difcors to pure tones also showed a clear dependence on CFs, even <1 kHz. Graphing of the two metrics against each other (Fig. 9B) clearly illustrates that responses which result in similar vector strengths can differ in their difcor peak-heights.
Besides a description of the general dependence of vector strength on frequency, studies of neural phase-locking to pure tones, in conjunction with other techniques such as reverse correlation (de Boer and de Jongh 1978), have provided much information on cochlear processing. At a qualitative level, SACs and difcors show features that are consistent with phenomena known from such studies, such as the SPL-dependent changes in best frequency and broadening of filtering (Figs. 11 and 12). Because SACs and revcors can be calculated from the same noise responses, a direct comparison of these analyses can be made but is outside of the scope of the present paper.
Comparison with studies of phase-locking to sinusoidally amplitude-modulated tones
With broadband noise stimulation, the effective stimulus waveform at a point in the cochlea can be described as a randomly amplitude-modulated carrier with carrier frequency near the CF of the fiber, where the range of modulation frequencies is limited by the bandwidth of the cochlear filter. It is appropriate therefore to compare the temporal characteristics in responses of high-CF fibers to broadband noise with envelope synchronization in responses to sinusoidally amplitude-modulated (SAM) tones. Because we did not obtain both sets of responses in individual fibers, we cannot make a direct comparison as we did for pure tones, but we can compare at the population level.
As is the case for synchronization to AM (Cooper et al. 1993; Javel 1980; Joris and Yin 1992; Smith and Brachman 1980; Wang and Sachs 1993), the peak-height of responses to broadband noise is strongly nonmonotonic with SPL (Fig. 16). Also, as pointed out in the preceding text, peak-height of SACs and sumcors was larger for low/medium-SR fibers compared with high-SR fibers, while synchronization to AM is also higher in low/medium-SR fibers (Joris and Yin 1992; Wang and Sachs 1993). For CFs >5 kHz, peak-height does not seem to depend systematically on CF (Fig. 4), which is also true for maximal synchronization values to AM (see nerve data in Fig. 12A of Joris and Yin 1998).
At high CFs, autocorrelogram halfwidth is likely inversely related to the width of the modulation transfer function (MTF) because both are limited by filter bandwidth. MTFs show an increase in corner frequency with CF, but the relationship shows a saturating tendency at 10 kHz (Greenwood and Joris 1996; Joris and Yin 1992; Rhode and Greenberg 1994). Similarly, halfwidths of autocorrelograms show a decrease with CF but there is little change >10 kHz (Fig. 17).
Finally, some SACs show a central peak surrounded by very shallow troughs that give it a "Mexican hat" appearance (Fig. 15): these troughs probably correspond to the shallow high-pass slope found in some MTFs (Cooper et al. 1993; Joris and Yin 1992).
Differences between spontaneous rate classes and between responses to noise and pure tones
Previous studies have described differences in maximum vector strength between different SR classes. The differences are small in responses to pure tones near CF (Johnson 1980) but larger for tones in the tuning curve tail (Joris et al. 1994b) and for envelope phase-locking to SAM tones, particularly at low CFs (Cooper et al. 1993; Joris and Yin 1992; Wang and Sachs 1993). Two striking findings of this study are larger peak-heights in response to noise than in response to pure tones (Fig. 10) and large differences in central peak-height between SR classes, being lower in high-SR fibers than in low/medium-SR fibers (Figs. 4, 7A, and 9).
A difference between SR classes was also reported by Cariani and Delgutte (1996a), who calculated an autocorrelation-based "fiber salience": this is the peak-to-background ratio of unshuffled autocorrelation histograms, at non-0 delays, in response to various periodic complex waveforms. The highest fiber saliences were also obtained in medium/low-SR fibers. The reasons for the larger peak-heights in low/medium-SR fibers are not clear. In AN fibers with high SR, there is obviously a mechanism that triggers spikes in the absence of a stimulus. To the extent that this mechanism is also in effect in the presence of a stimulus, spikes of high-SR fibers may be less constrained to certain timings by the stimulus than spikes of low-SR fibers, in which all spikes are stimulus-evoked and therefore likely more stimulus-coupled. However, the different SR classes also differ in properties other than the mere presence of spontaneous spikes, and other factors are likely involved in the difference in peak-height, particularly at high driven rates.
Interestingly, the difference in peak-height between SR classes is observed in response