J Neurophysiol 94: 347-362, 2005. First published March 23, 2005; doi:10.1152/jn.01114.2004
0022-3077/05 $8.00
Pitch of Complex Tones: Rate-Place and Interspike Interval Representations in the Auditory Nerve

Leonardo Cedolin1,2 and Bertrand Delgutte1,2,3

1Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston; 2Speech and Hearing Bioscience and Technology Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology; and 3Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts

Submitted 26 October 2004; accepted in final form 17 March 2005


    ABSTRACT
Harmonic complex tones elicit a pitch sensation at their fundamental frequency (F0), even when their spectrum contains no energy at F0, a phenomenon known as "pitch of the missing fundamental." The strength of this pitch percept depends upon the degree to which individual harmonics are spaced sufficiently apart to be "resolved" by the mechanical frequency analysis in the cochlea. We investigated the resolvability of harmonics of missing-fundamental complex tones in the auditory nerve (AN) of anesthetized cats at low and moderate stimulus levels and compared the effectiveness of two representations of pitch over a much wider range of F0s (110–3,520 Hz) than in previous studies. We found that individual harmonics are increasingly well resolved in rate responses of AN fibers as the characteristic frequency (CF) increases. We obtained rate-based estimates of pitch dependent upon harmonic resolvability by matching harmonic templates to profiles of average discharge rate against CF. These estimates were most accurate for F0s above 400–500 Hz, where harmonics were sufficiently resolved. We also derived pitch estimates from all-order interspike-interval distributions, pooled over our entire sample of fibers. Such interval-based pitch estimates, which are dependent on phase-locking to the harmonics, were accurate for F0s below 1,300 Hz, consistent with the upper limit of the pitch of the missing fundamental in humans. The two pitch representations are complementary with respect to the F0 range over which they are effective; however, neither is entirely satisfactory in accounting for human psychophysical data.


    INTRODUCTION
A harmonic complex tone is a sound consisting of frequency components that are all integer multiples of a common fundamental (F0). The pitch elicited by a harmonic complex tone is normally very close to that of a pure tone at the fundamental frequency, even when the stimulus spectrum contains no energy at that frequency, a phenomenon known as "pitch of the missing fundamental."

Investigating the neural mechanisms underlying the perception of the pitch of harmonic complex tones is of great importance for a variety of reasons. Changes in pitch convey melody in music, and the superposition of different pitches is the basis for harmony. Pitch has an important role in speech, where it carries prosodic features and information about speaker identity. In tone languages such as Mandarin Chinese, pitch also cues lexical contrasts. Pitch plays a major role in auditory scene analysis: differences in pitch are a major cue for sound source segregation, while frequency components that share a common fundamental tend to be grouped into a single auditory object (Bregman 1990; Darwin and Carlyon 1995).

Pitch perception with missing fundamental stimuli is not unique to humans; it also occurs in birds (Cynx and Shapiro 1986) and nonhuman mammals (Heffner and Whitfield 1976; Tomlinson and Schwartz 1988), making animal models suitable for studying neural representations of pitch. Pitch perception mechanisms in animals may play a role in processing conspecific vocalizations, which often contain harmonic complex tones.

The neural mechanisms underlying pitch perception of harmonic complex tones have been at the center of a debate among scientists for over a century (Ohm 1843; Seebeck 1841). This debate arises because the peripheral auditory system provides two types of cues to the pitch of complex tones: place cues dependent upon the frequency selectivity and tonotopic mapping of the cochlea and temporal cues dependent on neural phase locking.

The peripheral auditory system can be thought of as containing a bank of band-pass filters representing the mechanical frequency analysis performed by the basilar membrane. When two partials of a complex tone are spaced sufficiently apart relative to the auditory filter bandwidths, each of them produces an individual local maximum in the spatial pattern of basilar membrane motion. In this case, the two harmonics are said to be "resolved" by the auditory periphery. On the other hand, when two or more harmonics fall within the pass-band of a single peripheral filter, they are said to be "unresolved." Because the bandwidths of the auditory filters increase with their center frequency, only low-order harmonics are resolved. Based on psychophysical data, the first 6–10 harmonics are thought to be resolved in humans (Bernstein and Oxenham 2003b; Plomp 1964).

When a complex tone contains resolved harmonics, its pitch can be extracted by matching the pattern of activity across a tonotopic neural map to internally stored harmonic templates (Cohen et al. 1994; Goldstein 1973; Terhardt 1974; Wightman 1973). This type of model accounts for many pitch phenomena, including the pitch of the missing fundamental, the pitch shift associated with inharmonic complexes, and the pitch ambiguity of complex tones comprising only a few harmonics. However, a key issue in these models is the exact nature of the neural representation upon which the hypothetical template matching mechanism operates.

Pitch percepts can also be produced by complex tones consisting entirely of unresolved harmonics. In general, though, these pitches are weaker and more dependent on phase relationships among the partials than the pitch based on resolved harmonics (Bernstein and Oxenham 2003b; Carlyon and Shackleton 1994; Houtsma and Smurzynski 1990). With unresolved harmonics, there are no spectral cues to pitch, and therefore harmonic template models are not applicable. On the other hand, unresolved harmonics produce direct temporal cues to pitch because the waveform of a combination of unresolved harmonics has a period equal to that of the complex tone. These periodicity cues, which are reflected in neural phase locking, can be extracted by an autocorrelation-type mechanism (Licklider 1951; Meddis and Hewitt 1991; Moore 1990; Yost 1996), which is mathematically equivalent to an all-order interspike-interval distribution for neural spike trains. The autocorrelation model also works with resolved harmonics, since the period of the F0 is always an integer multiple of the period of any of the harmonics; this common period can be extracted by combining (e.g., summing) autocorrelation functions from frequency channels tuned to different resolved harmonics (Meddis and Hewitt 1991; Moore 1990).

Previous neurophysiological studies of the coding of the pitch of complex tones in the auditory nerve and cochlear nucleus have documented a robust temporal representation based on pooled interspike-interval distributions obtained by summing the interval distributions from neurons covering a wide range of characteristic frequencies (Cariani and Delgutte 1996a,b; Palmer 1990; Palmer and Winter 1993; Rhode 1995; Shofner 1991). This representation accounts for a wide variety of pitch phenomena, such as the pitch of the missing fundamental, the pitch shift of inharmonic tones, pitch ambiguity, the pitch equivalence of stimuli with similar periodicity, the relative phase invariance of pitch, and, to some extent, the dominance of low-frequency harmonics in pitch. Despite its remarkable effectiveness, the autocorrelation model has difficulty in accounting for the greater pitch salience of stimuli containing resolved harmonics compared to stimuli consisting entirely of unresolved harmonics (Bernstein and Oxenham 2003a; Carlyon 1998; Carlyon and Shackleton 1994; Meddis and O'Mard 1997). This issue was not addressed in previous physiological studies because they did not have a means of assessing whether individual harmonics are resolved or not. Moreover, the upper F0 limit over which the interspike-interval representation of pitch is physiologically viable has not been determined. The existence of such a limit is expected due to the degradation in neural phase locking with increasing frequency (Johnson 1980).

In contrast to the wealth of data on the interspike-interval representation of pitch, possible rate-place cues to pitch that might be available when individual harmonics are resolved by the peripheral auditory system have rarely been investigated. The few studies that provide relevant information (Hirahara et al. 1996; Sachs and Young 1979; Shamma 1985a,b) show no evidence for rate-place cues to pitch, even at low stimulus levels where the limited dynamic range of individual neurons is not an issue. The reason for this failure could be that the stimuli used had low fundamental frequencies in the range of human voice (100–300 Hz) and therefore produced few, if any, resolved harmonics in typical experimental animals, which have a poorer cochlear frequency selectivity compared to humans (Shera et al. 2002). Rate-place cues to pitch might be available in animals for complex tones with higher F0s in the range of conspecific vocalizations, which corresponds to about 500–1,000 Hz for cats (Brown et al. 1978; Nicastro and Owren 2003; Shipley et al. 1991). This hypothesis is consistent with a report that up to 13 harmonics of a complex tone could be resolved in the rate responses of high-CF units in the cat anteroventral cochlear nucleus (Smoorenburg and Linschoten 1977).

In this study, we investigated the resolvability of harmonics of complex tones in the cat auditory nerve and compared the effectiveness of rate-place and interval-based representations of pitch over a much wider range of fundamental frequencies (110–3,520 Hz) than in previous studies. We found that the two representations are complementary with respect to the F0 range over which they are effective, but that neither representation is entirely satisfactory in accounting for human psychophysical data. Preliminary reports of our findings have been presented (Cedolin and Delgutte 2003, 2005a).


    METHODS
Procedures

Methods for recording from auditory nerve (AN) fibers in anesthetized cats are as described by Kiang et al. (1965) and Cariani and Delgutte (1996a). Cats were anesthetized with Dial in urethane (75 mg/kg), with supplementary doses given as needed to maintain an areflexic state. The posterior portion of the skull was removed, and the cerebellum was retracted to expose the auditory nerve. The tympanic bullae and the middle-ear cavities were opened to expose the round window. Throughout the experiment, the cat was given injections of dexamethasone (0.26 mg/kg) to prevent brain swelling and Ringer solution (50 ml/d) to prevent dehydration.

The cat was placed on a vibration-isolated table in an electrically shielded, temperature-controlled, soundproof chamber. A silver electrode was positioned at the round window to record the compound action potential (CAP) in response to click stimuli, in order to assess the condition and stability of cochlear function.

Sound was delivered to the cat's ear through a closed acoustic assembly driven by an electrodynamic speaker (Realistic 40–1377). The acoustic system was calibrated to allow accurate control over the sound-pressure level at the tympanic membrane. Stimuli were generated by a 16-bit D/A converter (Concurrent DA04H) using sampling rates of 20 or 50 kHz. Stimuli were digitally filtered to compensate for the transfer characteristics of the acoustic system.

Spikes were recorded with glass micropipettes filled with 2 M KCl. The electrode was inserted into the nerve and mechanically advanced using a micropositioner (Kopf 650). The electrode signal was band-pass filtered and fed to a custom spike detector. The times of spike peaks were recorded with 1-µs resolution and saved to disk for subsequent analysis.

A click stimulus at ~55 dB SPL was used to search for single units. Upon contact with a fiber, a frequency tuning curve was measured by an automatic tracking algorithm (Kiang et al. 1970) using 100-ms tone bursts, and the characteristic frequency (CF) was determined. The spontaneous firing rate (SR) of the fiber was measured over an interval of 20 s. The responses to complex-tone stimuli were then studied.

Complex-tone stimuli

Stimuli were harmonic complex tones whose F0 was stepped up and down over a two-octave range. The harmonics of each complex tone were all of equal amplitude, and the fundamental component was always missing. Depending on the fiber’s CF, one of four presynthesized stimuli covering different F0 ranges was selected so that some of the harmonics would likely be resolved (Table 1). For example, for a fiber with a 1,760-Hz CF, we typically used F0s ranging from 220 to 880 Hz so that the order of the harmonic closest to the CF would vary from 2 to 8. In each of the four stimuli, the harmonics were restricted to a fixed frequency region as F0 varied (Table 1). For each fiber, the stimulus was selected so that the CF fell approximately at the center of the frequency region spanned by the harmonics. In some cases, data were collected from the same fiber in response to two different stimuli whose harmonics spanned overlapping frequency ranges.


TABLE 1. Parameters of the four complex tone stimuli with varying F0 and range of CFs for which each stimulus was used

 
Each of the 50 F0 steps (25 up, 25 down) lasted 200 ms, including a 20-ms transition period during which the waveform for one F0 gradually decayed while overlapping with the gradual buildup of the waveform for the subsequent F0. Spikes recorded during these transition periods were not included in the analysis. Responses were typically collected over 20 repetitions of the 10-s stimulus (50 steps × 200 ms) with no interruption.

We used mostly low and moderate stimulus levels in order to minimize rate saturation, which would prevent us from accurately assessing harmonic resolvability by the cochlea. Specifically, the sound pressure level of each harmonic was initially set at 15–20 dB above the fiber's threshold for a pure tone at CF and ranged from 10 to 70 dB SPL, with a median of 25 dB SPL. Because our stimuli contain many harmonics, overall stimulus levels are about 5–10 dB higher than the level of each harmonic, depending on F0. In some cases, responses were measured for two or more stimulus levels differing by 10–20 dB.
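As a rough arithmetic check on the 5–10 dB figure: if N equal-amplitude harmonics at distinct frequencies add in power (an assumption that holds for the r.m.s. level averaged over a period), the overall level exceeds the per-harmonic level by 10 log10(N) dB, so a 5–10 dB difference corresponds to roughly 3 to 10 harmonics. The function name is illustrative.

```python
import math

def overall_level_db(per_harmonic_db: float, n_harmonics: int) -> float:
    """Overall SPL of n equal-amplitude harmonics, assuming their powers add."""
    return per_harmonic_db + 10.0 * math.log10(n_harmonics)

# Median per-harmonic level of 25 dB SPL with 3 or 10 harmonics (illustrative counts)
print(round(overall_level_db(25, 3), 1))   # 29.8
print(round(overall_level_db(25, 10), 1))  # 35.0
```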

To compare neural responses to psychophysical data on the phase dependence of pitch, three versions of each stimulus were generated with different phase relationships among the harmonics: cosine phase, alternating (sine-cosine) phase, and negative Schroeder phase (Schroeder 1970). The three stimuli have the same power spectrum and autocorrelation function, but differ in their temporal fine structure and envelope: while the cosine-phase and alternating-phase stimuli have very "peaky" envelopes, the envelope of the Schroeder-phase stimulus is nearly flat (Fig. 1). Moreover, the envelope periodicity is at F0 for the cosine-phase stimulus, but at 2 × F0 for the alternating-phase stimulus. Alternating-phase stimuli have been widely used in previous studies of neural coding (Horst et al. 1990; Palmer and Winter 1992, 1993).
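The three phase relationships can be illustrated with a short numerical sketch. It assumes one common convention for each case, not necessarily the one used to synthesize the original stimuli: alternating phase puts odd harmonics in cosine phase and even harmonics in sine phase, and negative Schroeder phase uses φn = −π n(n − 1)/N with N taken as the highest harmonic number. The crest factor (peak over r.m.s.) serves as a simple proxy for how "peaky" the envelope is.

```python
import numpy as np

def complex_tone(f0, harmonics, phase="cosine", fs=50_000, dur=0.02):
    """Sum of equal-amplitude harmonics of f0 with a chosen phase relationship."""
    n = np.asarray(harmonics)
    if phase == "cosine":
        phi = np.zeros(n.size)
    elif phase == "alternating":
        # Odd harmonics in cosine phase, even harmonics in sine phase (sin x = cos(x - pi/2))
        phi = np.where(n % 2 == 1, 0.0, -np.pi / 2)
    elif phase == "schroeder-":
        # Assumed convention for negative Schroeder phase: -pi * n * (n - 1) / N
        phi = -np.pi * n * (n - 1) / n.max()
    else:
        raise ValueError(f"unknown phase: {phase}")
    t = np.arange(int(round(fs * dur))) / fs
    # (samples, harmonics) matrix of instantaneous phases, summed across harmonics
    return np.cos(2 * np.pi * f0 * np.outer(t, n) + phi).sum(axis=1)

def crest_factor(x):
    """Peak absolute amplitude divided by r.m.s. amplitude."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

harmonics = np.arange(2, 12)  # missing fundamental: harmonics 2..11 (illustrative)
crest = {ph: crest_factor(complex_tone(200.0, harmonics, ph))
         for ph in ("cosine", "alternating", "schroeder-")}
print(crest)
```

Because the three signals share the same amplitude spectrum, their r.m.s. values are equal over whole periods, so differences in crest factor reflect only the phase relationships.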



FIG. 1. Different phase relationships among harmonics give rise to different stimulus waveforms. For harmonics in cosine phase (top), waveform shows one peak per period of fundamental frequency (F0). When the harmonics are in alternating phase (middle), waveform peaks twice every period of the F0. A negative Schroeder phase relationship among the harmonics (bottom) minimizes amplitude of oscillations of the envelope of waveform.

 
Average-rate analysis

For each step in the F0 sequence, spikes were counted over a 180-ms window extending over the stimulus duration but excluding the transition period between F0 steps. Spike counts from the two stimulus segments having the same F0 (from the ascending and descending parts of the F0 sequence) were added together because responses to both directions were generally similar. The spike counts were converted to units of discharge rate (spikes/s) and plotted either as a function of F0 for a given fiber or as a function of fiber CF for a given F0 to form a "rate-place profile" (Sachs and Young 1979).

To assess the statistical reliability of these discharge rate estimates, "bootstrap" resampling (Efron and Tibshirani 1993) was performed on the data recorded from each fiber. One hundred resampled data sets were generated by drawing with replacement from the set of spike trains in response to each F0. Spike counts in the ascending and descending parts of the F0 sequence were drawn independently from each other. Spike counts from each bootstrap data set were converted to discharge rate estimates as for the original data, and the standard deviation of these estimates was used as an error bar for the mean discharge rate.
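A minimal sketch of this bootstrap, assuming per-trial spike counts are available separately for the ascending and descending segments at one F0 (the array layout and function names are hypothetical). Trials are resampled with replacement independently for each direction, and the standard deviation across the 100 replications serves as the error bar.

```python
import numpy as np

def discharge_rate(asc_counts, desc_counts, window=0.18):
    """Mean rate (spikes/s), pooling ascending and descending segments with the same F0."""
    total = np.sum(asc_counts) + np.sum(desc_counts)
    n_windows = len(asc_counts) + len(desc_counts)
    return total / (n_windows * window)

def bootstrap_rate_sd(asc_counts, desc_counts, n_boot=100, window=0.18, seed=0):
    """SD of the rate over bootstrap replications drawn with replacement from trials."""
    rng = np.random.default_rng(seed)
    asc = np.asarray(asc_counts)
    desc = np.asarray(desc_counts)
    rates = np.empty(n_boot)
    for b in range(n_boot):
        a = rng.choice(asc, size=asc.size, replace=True)    # resample ascending trials
        d = rng.choice(desc, size=desc.size, replace=True)  # independently resample descending trials
        rates[b] = discharge_rate(a, d, window)
    return rates.std(ddof=1)

# Hypothetical counts from 20 repetitions in each direction
rng = np.random.default_rng(1)
asc = rng.poisson(20, size=20)
desc = rng.poisson(20, size=20)
print(discharge_rate(asc, desc), bootstrap_rate_sd(asc, desc))
```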

Simple phenomenological models were used to analyze average-rate responses to the complex-tone stimuli. Specifically, a single-fiber model was fit to responses of a given fiber as a function of stimulus F0 to quantify harmonic resolvability, while a population model was used to estimate pitch from profiles of average discharge rate against CF for a given F0.

The single-fiber model (Fig. 2) is a cascade of three stages. The linear band-pass filtering stage, representing cochlear frequency selectivity, is implemented by a symmetric rounded exponential function (Patterson 1976). The model of Sachs and Abbas (1974) of rate-level functions is then used to derive the mean discharge rate r from the r.m.s. amplitude p at the output of the band-pass filter:

$$ r = r_{sp} + r_{dmax}\,\frac{p^{\alpha}}{p^{\alpha} + p_{50}^{\alpha}} \qquad (1) $$
In this expression, rsp is the spontaneous rate, rdmax is the maximum driven rate, and p50 is the value of p for which the driven rate reaches one-half of its maximum value. The exponent α was fixed at 1.77 to obtain a dynamic range of about 20 dB (Sachs and Abbas 1974). The single-fiber model has a total of five free parameters: the center frequency and bandwidth of the band-pass filter, rsp, rdmax, and p50. This number is considerably smaller than the 25 F0 values for which responses were obtained in each fiber. The model was fit to the data by the least-squares method using the Levenberg-Marquardt algorithm as implemented by Matlab's "lsqcurvefit" function.
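The single-fiber cascade can be sketched in Python under stated assumptions: the symmetric roex(p) weighting W(g) = (1 + pg)exp(−pg), with g = |f − fc|/fc, is applied to the power of each harmonic; p is derived from an equivalent rectangular bandwidth via p = 4fc/ERB; levels are in dB re an arbitrary reference; and the rate-level stage is r = rsp + rdmax·p^α/(p^α + p50^α). All parameter values are hypothetical. The last function checks the quoted dynamic range, taking dynamic range (an assumed definition) as the span of p over which the driven rate goes from 10% to 90% of rdmax, i.e. 20 log10(81^(1/α)) dB.

```python
import numpy as np

def roex_weight(f, fc, p):
    """Symmetric rounded-exponential power weighting of a filter centered at fc."""
    g = np.abs(np.asarray(f, dtype=float) - fc) / fc
    return (1.0 + p * g) * np.exp(-p * g)

def sachs_abbas_rate(p_rms, r_sp, r_dmax, p50, alpha=1.77):
    """Rate-level stage: r = r_sp + r_dmax * p^alpha / (p^alpha + p50^alpha)."""
    pa = np.power(p_rms, alpha)
    return r_sp + r_dmax * pa / (pa + p50 ** alpha)

def single_fiber_rate(f0, cf, erb, level_db, r_sp, r_dmax, p50, band=(1000.0, 10000.0)):
    """Mean rate for a missing-fundamental complex tone with harmonics confined to band."""
    n = np.arange(2, int(band[1] // f0) + 1)    # harmonic numbers, fundamental omitted
    freqs = n * f0
    freqs = freqs[freqs >= band[0]]
    amp = 10.0 ** (level_db / 20.0)              # r.m.s. amplitude per harmonic (arbitrary units)
    power = np.sum(amp ** 2 * roex_weight(freqs, cf, 4.0 * cf / erb))
    return sachs_abbas_rate(np.sqrt(power), r_sp, r_dmax, p50)

def dynamic_range_db(alpha=1.77):
    """Range of p (dB) over which the driven rate goes from 10% to 90% of r_dmax."""
    return 20.0 * np.log10(81.0 ** (1.0 / alpha))

# Hypothetical fiber: CF 4 kHz, ERB 500 Hz, 30 dB per harmonic, p50 at 25 dB
params = dict(cf=4000.0, erb=500.0, level_db=30.0, r_sp=50.0, r_dmax=200.0, p50=10 ** (25 / 20))
rate_on = single_fiber_rate(4000.0 / 4.0, **params)   # CF on the 4th harmonic
rate_off = single_fiber_rate(4000.0 / 4.5, **params)  # CF halfway between harmonics 4 and 5
print(round(dynamic_range_db(), 1), rate_on > rate_off)
```

A higher rate with CF on a harmonic than halfway between two harmonics is the rate-place signature of a resolved harmonic that the fitted model captures as F0 varies.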



FIG. 2. Single-fiber average-rate model. First stage represents cochlear frequency selectivity, implemented by a symmetric rounded-exponential band-pass filter (Patterson 1976). The model of Sachs and Abbas (1974) of rate-level functions is used to compute the mean discharge rate from the r.m.s. amplitude at the output of the band-pass filter.

 
The population model is an array of single-fiber models indexed on CF so as to predict the AN rate response to any stimulus as a function of cochlear place. The population model has no free parameters; rather, it is used to find the stimulus parameters (F0 and SPL) most likely to have produced the measured rate-place profile, assuming that the spike counts are statistically independent random variables with Poisson distributions whose expected values are given by the model response at each CF. The resulting maximum-likelihood F0 estimate gives a rate-based estimate of pitch that does not require a priori knowledge of the stimulus F0. This strategy effectively implements the concept of "harmonic template" used in pattern recognition models of pitch (Cohen et al. 1994; Goldstein 1973; Terhardt 1974; Wightman 1973): here, the template is the model response to a harmonic complex tone with equal-amplitude harmonics. In practice, the maximum-likelihood F0 estimate was obtained by computing the model responses to complex tones covering a wide range of F0 in fine increments (0.1%) and finding the F0 value that maximizes the likelihood of the data.
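A minimal sketch of the maximum-likelihood template match, assuming a hypothetical population in which every fiber shares the same spontaneous rate, maximum driven rate, and p50, and in which ERB follows a hypothetical power law of CF. Expected spike counts are model rates multiplied by an assumed total counting time (20 repetitions × 2 directions × 180 ms), the Poisson log-likelihood drops the log(k!) term because it does not depend on the stimulus, and the search runs over F0 in 0.1% steps and over a coarse grid of levels.

```python
import numpy as np

def roex_weight(f, fc, p):
    g = np.abs(f - fc) / fc
    return (1.0 + p * g) * np.exp(-p * g)

def population_rates(f0, level_db, cfs, r_sp=50.0, r_dmax=200.0, p50=10 ** (25 / 20),
                     alpha=1.77, band=(1000.0, 10000.0)):
    """Model rate (spikes/s) of each fiber in cfs for an equal-amplitude missing-F0 complex."""
    n = np.arange(2, int(band[1] // f0) + 1)
    freqs = n * f0
    freqs = freqs[freqs >= band[0]]
    erb = 0.66 * cfs ** 0.8                                     # hypothetical bandwidth power law
    p = 4.0 * cfs / erb
    w = roex_weight(freqs[None, :], cfs[:, None], p[:, None])   # (fibers, harmonics)
    p_rms = np.sqrt((10 ** (level_db / 20)) ** 2 * w.sum(axis=1))
    pa = p_rms ** alpha
    return r_sp + r_dmax * pa / (pa + p50 ** alpha)

def ml_pitch(counts, cfs, count_time, f0_grid, level_grid):
    """Return the (F0, level) pair maximizing the Poisson log-likelihood of the counts."""
    best_ll, best = -np.inf, (None, None)
    for f0 in f0_grid:
        for lev in level_grid:
            lam = count_time * population_rates(f0, lev, cfs)   # expected counts
            ll = np.sum(counts * np.log(lam) - lam)
            if ll > best_ll:
                best_ll, best = ll, (f0, lev)
    return best

cfs = np.geomspace(2000.0, 8000.0, 40)              # hypothetical CF sample (Hz)
f0_grid = 700.0 * 1.001 ** np.arange(700)           # 0.1% steps, about 700 to 1410 Hz
level_grid = np.arange(20, 41, 5)                   # dB per harmonic
count_time = 40 * 0.18                              # 40 segments of 180 ms (assumed)
true_f0, true_level = f0_grid[360], 30
expected = count_time * population_rates(true_f0, true_level, cfs)
noisy = np.random.default_rng(0).poisson(expected)  # one simulated rate-place profile
est_clean = ml_pitch(expected, cfs, count_time, f0_grid, level_grid)
est_noisy = ml_pitch(noisy, cfs, count_time, f0_grid, level_grid)
print(true_f0, est_clean, est_noisy)
```

A brute-force grid is used here for clarity; it mirrors the fine-increment F0 search described in the text but is not meant as an efficient optimizer.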

While the population model has no free parameters, five fixed (i.e., stimulus-independent) parameters still need to be specified for each fiber in the modeled population. These parameters were selected so as to meet two separate requirements: 1) the model's normalized driven rate must vary smoothly with CF, and 2) the model must completely specify the Poisson distribution of spike counts for each fiber so as to be able to apply the maximum-likelihood method. To meet these requirements, three of the population-model parameters were directly obtained from the corresponding parameters for the single-fiber model: the center frequency of the band-pass filter (effectively the CF), the spontaneous rate rsp, and the maximum driven rate rdmax. The sensitivity parameter p50 in the population model was set to the median value of this parameter over our fiber sample. Finally, the bandwidth of the band-pass filter was derived from its center frequency by assuming a power law relationship between the two (Shera et al. 2002). The parameters of this power function were obtained by fitting a straight line in double logarithmic coordinates to a scatter plot of filter bandwidth against center frequency for our sample of fibers.
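The bandwidth-versus-CF power law can be fit as a straight line in double logarithmic coordinates. A minimal sketch with NumPy's `polyfit`; the bandwidths are synthetic (hypothetical) values generated from a known power law so that the fit can be checked, and the helper names are illustrative.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**b by linear least squares on log10(y) versus log10(x); returns (a, b)."""
    slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
    return 10.0 ** intercept, slope

cf = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])  # hypothetical center frequencies (Hz)
bw = 0.5 * cf ** 0.7                                    # hypothetical bandwidths on a power law
a, b = fit_power_law(cf, bw)

def bandwidth_for_cf(f):
    """Bandwidth assigned to a model fiber of CF f from the fitted power law."""
    return a * f ** b

print(a, b, bandwidth_for_cf(3000.0))
```

The same log-log fit describes the growth of F0min with CF reported in RESULTS.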

Interspike-interval analysis

As in previous studies of the neural coding of pitch (Cariani and Delgutte 1996a,b; Rhode 1995), we derived pitch estimates from pooled interspike-interval distributions. The pooled interval distribution is the sum of the all-order interspike-interval distributions for all the sampled auditory-nerve fibers and is closely related to the summary autocorrelation in the model of Meddis and Hewitt (1991). The single-fiber interval distribution (bin width 0.1 ms) was computed for each F0 using spikes occurring in the same time window as used in the rate analysis.
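A sketch of a pooled all-order interval distribution: every positive difference between two spike times from the same train (not only between successive spikes) is counted, using only spikes inside the analysis window, and histograms are summed across trains and fibers. The 0.1-ms bin width follows the text; the 30-ms maximum interval, the synthetic spike trains, and the function names are assumptions for illustration.

```python
import numpy as np

def all_order_intervals(spike_times, max_interval):
    """All differences t_j - t_i (j > i) within one spike train, up to max_interval."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    i, j = np.triu_indices(t.size, k=1)
    d = t[j] - t[i]
    return d[(d > 0) & (d <= max_interval)]

def pooled_interval_histogram(spike_trains, window, bin_width=1e-4, max_interval=0.03):
    """Sum of all-order interval histograms over spike trains (trials and fibers)."""
    n_bins = int(round(max_interval / bin_width))
    edges = np.arange(n_bins + 1) * bin_width
    pooled = np.zeros(n_bins)
    for train in spike_trains:
        train = np.asarray(train, dtype=float)
        in_window = train[(train >= window[0]) & (train < window[1])]
        counts, _ = np.histogram(all_order_intervals(in_window, max_interval), bins=edges)
        pooled += counts
    return pooled, edges

# Two hypothetical perfectly periodic trains (period 5.03 ms) standing in for two fibers
train = 0.01 + 0.00503 * np.arange(30)
hist, edges = pooled_interval_histogram([train, train], window=(0.0, 0.2))
print(edges[np.argmax(hist)])
```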

To derive pitch estimates from pooled interval distributions, we used "periodic templates" that select intervals at a given period and its multiples. Specifically, we define the contrast ratio of a periodic template as the ratio of the weighted mean number of intervals for bins within the template to the weighted mean number of intervals per bin in the entire histogram. The estimated pitch period is the period of the template that maximizes the contrast ratio. In computing the contrast ratio, each interval is weighted by an exponentially decaying function of its length to give greater weight to short intervals. This weighting implements the idea that the lower F0 limit of pitch at about 30 Hz (Pressnitzer et al. 2001) implies that the auditory system is unable to use very long intervals in forming pitch percepts. A 3.6-ms decay time constant was found empirically to minimize the number of octave and suboctave errors in pitch estimation. The statistical reliability of the pitch estimates was assessed by generating 100 bootstrap replications of the pooled interval distribution (using the same resampling techniques as in the rate analysis) and computing a pitch estimate for each bootstrap replication.
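A sketch of the periodic-template pitch estimate. It reads "weighted mean" as Σ w·h / Σ w, with weights w = exp(−T/3.6 ms) evaluated at each bin's center interval T, and places the template at the bin containing each multiple of the candidate period. Both readings, the candidate-period grid, and the synthetic histogram are assumptions for illustration, not the authors' code.

```python
import numpy as np

def contrast_ratio(hist, bin_width, period, tau=3.6e-3):
    """Weighted mean count in template bins (period and its multiples) over weighted mean per bin."""
    hist = np.asarray(hist, dtype=float)
    centers = (np.arange(hist.size) + 0.5) * bin_width
    w = np.exp(-centers / tau)                          # favor short intervals
    k = np.arange(1, int(hist.size * bin_width / period) + 1)
    idx = np.floor(k * period / bin_width).astype(int)  # bins holding k * period
    idx = idx[idx < hist.size]
    template_mean = np.sum(w[idx] * hist[idx]) / np.sum(w[idx])
    overall_mean = np.sum(w * hist) / np.sum(w)
    return template_mean / overall_mean

def estimate_pitch_period(hist, bin_width, periods):
    """Candidate period with the largest contrast ratio."""
    ratios = [contrast_ratio(hist, bin_width, p) for p in periods]
    return periods[int(np.argmax(ratios))]

# Hypothetical pooled histogram: flat background plus peaks at multiples of 5.03 ms
bin_width = 1e-4
hist = np.ones(300)
for order, count in zip(range(1, 6), [29, 28, 27, 26, 25]):
    hist[int(order * 5.03e-3 / bin_width)] += count
periods = np.arange(100, 1500) * 1e-5              # hypothetical candidates, 1.00 to 14.99 ms
best = estimate_pitch_period(hist, bin_width, periods)
print(best, 1.0 / best)                            # period (s) and pitch estimate (Hz)
```

In this example the template at twice the true period also lands on peaks, but the exponential weighting favors the shorter period, which illustrates how the weighting suppresses suboctave errors.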


    RESULTS
Our results are based on 122 measurements of responses to harmonic complex tones recorded from 75 AN fibers in two cats. Of these fibers, 54 had high SR (>18 spikes/s), 10 had low SR (<0.5 spike/s), and 11 had medium SR. The CFs of the fibers ranged from 450 to 9,200 Hz. We first describe the rate responses of single fibers as a function of F0 to characterize harmonic resolvability. We then derive pitch estimates from both rate-place profiles and pooled interspike-interval distributions and quantify the accuracy and precision of these estimates as a function of F0.

Single-fiber cues to resolved harmonics

Figure 3 shows the average discharge rate as a function of complex-tone F0 (harmonics in cosine phase) for two AN fibers with CFs of 952 (A) and 4,026 Hz (B), respectively. Data are plotted against the dimensionless ratio of fiber CF to stimulus F0, which we call harmonic number (lower horizontal axis). Because this ratio varies inversely with F0, F0 increases from right to left along the top axis in these plots. The harmonic number takes an integer value when the CF coincides with one of the harmonics of the stimulus, while it is an odd integer multiple of 0.5 (2.5, 3.5, etc.) when the CF falls halfway between two harmonics. Thus resolved harmonics should appear as peaks in firing rate for integer values of the harmonic number, with valleys in between. This prediction is verified for both fibers at lower values of the harmonic number (higher F0s), although the oscillations are more pronounced and extend to higher harmonic numbers for the high-CF fiber than for the low-CF fiber. This observation is consistent with the higher quality factor (Q = CF/Bandwidth) of high-CF fibers compared with low-CF fibers (Kiang et al. 1965; Liberman 1978).



FIG. 3. Average discharge rate against complex-tone F0 for 2 auditory nerve (AN) fibers from the same cat with characteristic frequencies (CFs) of 952 (A) and 4,026 Hz (B). Because the bottom axis shows the harmonic number CF/F0, F0 increases from right to left along the top axis. Filled circles with error bars show mean discharge rate ± SD obtained by bootstrap resampling of the stimulus trials. Solid lines show response of best-fitting single-fiber model (Fig. 2). The top and bottom envelopes of the fitted curve are shown by dotted lines. Intersection of the bottom envelope with 2 typical SD from the top envelope (gray shading) gives the maximum harmonic number Nmax for which harmonics are resolved (vertical lines).

 
To quantify the range of harmonics that can be resolved by each fiber, a simple peripheral auditory model was fit to the data (see METHODS). For both fibers, the response of the best-fitting model (Fig. 3, solid lines) captures the oscillatory trend in the data. The rate of decay of these oscillations is determined by the bandwidth of the band-pass filter representing cochlear frequency selectivity in the model (Fig. 2). The harmonics of F0 are considered to be resolved so long as the oscillations in the fitted curve exceed two typical standard deviations of the discharge rate obtained by bootstrapping (gray shading). The maximum resolved harmonic number Nmax is 4.1 for the low-CF fiber, smaller than Nmax for the high-CF fiber (6.3). The ratio CF/Nmax gives F0min, the lowest fundamental frequency for which harmonics are resolved in a fiber's rate response. In the examples of Fig. 3, F0min is 232 Hz for the low-CF fiber and 639 Hz for the high-CF fiber.
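The harmonic-number bookkeeping reduces to two ratios. A small sketch (function names are illustrative) reproduces the F0min values quoted above from the CFs and Nmax values, and the harmonic orders 2 to 8 quoted in METHODS for a 1,760-Hz fiber stimulated with F0s from 220 to 880 Hz.

```python
def harmonic_number(cf_hz: float, f0_hz: float) -> float:
    """CF/F0: an integer when CF sits on a harmonic, n + 0.5 when halfway between two."""
    return cf_hz / f0_hz

def lowest_resolved_f0(cf_hz: float, n_max: float) -> float:
    """F0min = CF / Nmax, the lowest F0 whose harmonics are resolved in the rate response."""
    return cf_hz / n_max

print(round(lowest_resolved_f0(952, 4.1)), round(lowest_resolved_f0(4026, 6.3)))  # 232 639
print(harmonic_number(1760, 880), harmonic_number(1760, 220))                     # 2.0 8.0
```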

Figure 4 shows how F0min varies with CF for our entire sample of fibers. To be included in this plot, the variance of the residuals after fitting the single-fiber model to the data had to be significantly smaller (P < 0.05, F-test) than the variance of the raw data so that Nmax (and therefore F0min) could be reliably estimated. Thirty-five of 122 measurements were thus excluded; 23 of these had CFs <2,000 Hz. On the other hand, the figure includes data from fibers (shown by triangles) for which F0min was bounded by the lowest F0 presented and was therefore overestimated. F0min increases systematically with CF, and the increase is well fit by a power function with an exponent of 0.63 (solid line). This increase is consistent with the increase in tuning curve bandwidths with CF (Kiang et al. 1965).
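The inclusion criterion can be sketched as a one-sided F-test on the ratio of the raw-data variance to the residual variance. The degrees of freedom used here (n − 1 for the raw data, n − 5 for the residuals of the five-parameter fit) are an assumption, as are the synthetic data; `scipy.stats.f.sf` gives the upper-tail probability of the F distribution.

```python
import numpy as np
from scipy import stats

def fit_is_significant(rates, fitted, n_params=5, alpha=0.05):
    """True if the residual variance is significantly smaller than the variance of the raw rates."""
    rates = np.asarray(rates, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = rates.size
    var_raw = np.sum((rates - rates.mean()) ** 2) / (n - 1)
    var_res = np.sum((rates - fitted) ** 2) / (n - n_params)
    p_value = stats.f.sf(var_raw / var_res, n - 1, n - n_params)
    return p_value < alpha, p_value

harm = np.linspace(2.0, 8.0, 25)                        # 25 F0 steps expressed as CF/F0
model = 100.0 + 50.0 * np.cos(2 * np.pi * harm)         # hypothetical fitted rate-F0 curve
data = model + 5.0 * np.sin(1.7 * np.arange(25))        # deterministic stand-in for noise
print(fit_is_significant(data, model))                  # oscillating data, good fit
print(fit_is_significant(data, np.full(25, data.mean())))  # flat "fit" explains nothing
```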



FIG. 4. Lowest resolved F0 as a function of CF. Each point shows data from 1 AN fiber. Triangles show data points for which F0min was somewhat overestimated because harmonics were still resolved for the lowest F0 presented. Solid line shows best-fitting straight line on double logarithmic coordinates (a power law).

 
Rate responses of AN fibers to complex stimuli are known to depend strongly on stimulus level (Sachs and Young 1979). The representation of resolved harmonics in rate responses is expected to degrade as the firing rates become saturated. The increase in cochlear filter bandwidths with level may further degrade harmonic resolvability. We were able to reliably fit single-fiber models to the rate-F0 data for stimulus levels as high as 38 dB above the threshold at CF per component. In general, this limit increased with CF, from roughly 15 dB above threshold for CFs below 1 kHz to about 30 dB above threshold for CFs above 5 kHz. No obvious dependence of this limit on the fiber's spontaneous rate was noticed.

To more directly address the level dependence of responses, we held 24 fibers long enough to record the responses to harmonic complex tones at two or more stimulus levels differing by 10–20 dB. In 23 of these 24 cases, the maximum resolved harmonic number Nmax decreased with increasing level. One example is shown in Fig. 5 for a fiber with CF at 1,983 Hz. For this fiber, Nmax decreased from 7.1 at 20 dB SPL to 4.9 at 30 dB SPL.



FIG. 5. Effect of stimulus level on harmonic resolvability in rate responses of an AN fiber (CF = 1,983 Hz). Open and filled circles show mean discharge rate against F0 for complex tones at 20 and 30 dB SPL, respectively. Solid lines show response of best-fitting model when model parameters were constrained to be the same for both stimulus levels. Other features as in Fig. 3.

 
The observed decrease in Nmax with level could reflect either broadened cochlear tuning or rate saturation. To distinguish between these two hypotheses, two versions of the single-fiber model were compared when data were available at two stimulus levels. In one version, all the model parameters were constrained to be the same at both levels; in the other version, the bandwidth of the band-pass filter representing cochlear frequency selectivity was allowed to vary with level. The variable-bandwidth model is guaranteed to fit the data better (in a least-squares sense) than the fixed-bandwidth model because it has an additional free parameter. However, an F-test for the ratio of the variances of the residuals revealed no statistically significant difference between the two models at the 0.05 level for any of the 24 fibers, meaning that the additional free parameter of the variable-bandwidth model gave it only an insignificant advantage over the fixed-bandwidth model. This result suggests that rate saturation, which is present in both models, may be the main factor responsible for the decrease in Nmax with stimulus level.

Pitch estimation from rate-place profiles

Having characterized the limits of harmonic resolvability in rate responses of AN fibers, the next step is to determine how accurately pitch can be estimated from rate-place cues to resolved harmonics. For this purpose, we fit harmonic templates to profiles of average discharge rate against CF and derive pitch estimates by the maximum likelihood method, assuming that the spike counts from each fiber are random variables with statistically independent Poisson distributions. In our implementation, a harmonic template is the response of a peripheral auditory model to a complex tone with equal-amplitude harmonics. The estimated pitch is therefore the F0 of the complex tone most likely to have produced the observed response if the stimulus-response relationship were defined by the model.
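The structure of this maximum-likelihood procedure can be sketched as follows. Everything about the template is a hypothetical stand-in: `template_rate` replaces the peripheral auditory model with a raised-cosine ripple whose depth shrinks with harmonic number, and the spontaneous and maximum driven rates are fixed rather than fit per fiber. Only the overall scheme (Poisson log-likelihood of integer spike counts, summed over statistically independent fibers, maximized over a grid of template F0s) follows the text.

```python
import math


def template_rate(cf, f0, spont=5.0, max_driven=100.0, depth_scale=10.0):
    """Hypothetical harmonic template: expected rate (spikes/s) of a fiber at CF `cf`.

    Stand-in for a peripheral model: the rate peaks when cf/f0 is an integer,
    and the ripple depth shrinks as the harmonic number grows.
    """
    h = cf / f0                                          # harmonic number CF/F0
    ripple = 0.5 * (1.0 + math.cos(2.0 * math.pi * h))   # 1 at harmonics, 0 midway
    depth = math.exp(-h / depth_scale)                   # poorer resolvability at high h
    return spont + max_driven * (1.0 - depth * (1.0 - ripple))


def poisson_log_likelihood(counts, expected):
    """Sum of independent Poisson log-probabilities log P(n | lambda)."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1)
               for n, lam in zip(counts, expected))


def ml_pitch(cfs, counts, candidate_f0s, duration=1.0):
    """Grid search for the template F0 that maximizes the Poisson likelihood."""
    best_f0, best_ll = None, -math.inf
    for f0 in candidate_f0s:
        expected = [template_rate(cf, f0) * duration for cf in cfs]
        ll = poisson_log_likelihood(counts, expected)
        if ll > best_ll:
            best_f0, best_ll = f0, ll
    return best_f0, best_ll


# Usage with a hypothetical CF sample and noise-free counts generated at F0 = 600 Hz.
cfs = [450.0 + 37.0 * i for i in range(40)]
counts = [round(template_rate(cf, 600.0)) for cf in cfs]
f0_hat, _ = ml_pitch(cfs, counts, [float(f) for f in range(400, 801)])
print(f0_hat)
```

Spike counts rather than normalized rates enter the likelihood because the Poisson model is defined only for integer counts.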

Figure 6 shows the normalized driven discharge rate of AN fibers as a function of CF in response to two complex tones (harmonics in cosine phase) with F0s of 541.5 (A) and 1,564.4 Hz (C). The rate is normalized by subtracting the spontaneous rate and dividing by the maximum driven rate (Sachs and Young 1979), and these parameters are estimated by fitting the single-fiber model to the rate-F0 data. As for the single-fiber responses in Figs. 3 and 5, responses are plotted against the dimensionless harmonic number CF/F0, with the difference that F0 is now fixed while CF varies, instead of the opposite. Resolved harmonics should again result in peaks in firing rate at integer values of the harmonic number. Despite considerable scatter in the data, this prediction is verified for both F0s, although the oscillations are more pronounced for the higher F0. Many factors are likely to contribute to the scatter, including the threshold differences among fibers with the same CF (Liberman 1978), pooling data from two animals, intrinsic variability in neural responses, and inaccuracies in estimating the minimum and maximum discharge rates used in computing the normalized rate.



FIG. 6. Maximum-likelihood pitch estimation from rate-place profiles using harmonic templates for 2 complex tones with F0s at 542 (A and B) and 1,564 Hz (C and D), respectively. A and C: filled circles show normalized driven rate as a function of both CF (top axis) and harmonic number CF/F0 (bottom axis). Solid lines show maximum-likelihood harmonic template, which is the response of a population model to a complex tone with equal-amplitude harmonics. B and D: log-likelihood of the harmonic template model in producing the data points as a function of template F0.

 
The solid lines in Fig. 6, A and C, show the normalized rate response of the population model to the complex tone whose F0 maximizes the likelihood, i.e., the best-fitting harmonic template. Note that, while Fig. 6 shows the normalized model response, unnormalized rates (actually, spike counts) are used when applying the maximum likelihood method because only spike counts have the integer values required for a Poisson distribution. For both F0s, the model response shows local maxima near integer values of the harmonic number, indicating that the pitch estimates are very close to the stimulus F0s. This point is shown more precisely in Fig. 6, B and D, which show the log-likelihood of the model response as a function of template F0. Despite the moderate number of data points in the rate-place profiles, for both F0s, the likelihood shows a sharp maximum when the template F0 is very close to the stimulus F0. For the complex tone with F0 at 541.5 Hz, the estimated pitch is 554.2 Hz, about 2% above the actual F0. For the 1,564.4-Hz tone, the estimated pitch is 1,565.7 Hz, only 0.1% above the actual F0. The likelihood functions also show secondary maxima for template F0s that form ratios of small integers (e.g., 4/3, 3/4) with respect to the stimulus F0. However, because these secondary maxima are much lower than the absolute maximum, the estimated pitch is unambiguous, consistent with psychophysical observations for complex tones containing many harmonics (Houtsma and Smurzynski 1990).

To assess the reliability of the maximum-likelihood pitch estimates, estimates were computed for 100 bootstrap resamplings of the data for each F0 (see METHODS). Figure 7A shows the median absolute estimation error of these bootstrap estimates as a function of F0 for complex tones with harmonics in cosine phase. With few exceptions, median pitch estimates only deviate by a few percent from the stimulus F0 above 500 Hz. Larger deviations are more common for lower F0s. The number and CF distribution of the fibers had to meet certain constraints for each F0 to be included in the figure because, to reliably estimate F0, the sampling of the CF axis has to be sufficiently dense to capture the harmonically related oscillations in the rate-CF profiles. This is why Fig. 7 shows no estimates for F0s below 220 Hz and for a small subset of F0s (12 of 56) above 220 Hz.
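The bootstrap accuracy measure can be sketched generically. This sketch assumes the resampling unit is the fiber (each item in `data` being one fiber's measurement) and that `estimator` is any pitch estimator, such as a maximum-likelihood grid search; the resampling scheme actually described in METHODS may differ.

```python
import random
import statistics


def bootstrap_median_abs_error(data, estimator, true_value, n_boot=100, seed=0):
    """Median absolute estimation error (% of true_value) over bootstrap resamples.

    Each resample draws len(data) items with replacement; the estimator is
    applied to every resample and its error relative to true_value recorded.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))
        estimate = estimator(sample)
        errors.append(100.0 * abs(estimate - true_value) / true_value)
    return statistics.median(errors)


# Toy check with a trivial estimator (the sample mean) standing in for a pitch estimator.
measurements = [99.0, 101.0] * 5
print(bootstrap_median_abs_error(measurements, statistics.mean, true_value=100.0))
```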



FIG. 7. Pitch estimation based on rate-place profiles: median absolute estimation error (A) and normalized Fisher Information (B) as a function of F0. Median is obtained over 100 bootstrap resamplings of the data. Triangles indicate data points for which the median was out of the range defined by the vertical axes. The Fisher Information is a measure of pitch strength defined as the curvature of the log-likelihood with respect to template F0 at the location of the maximum.

 
To quantify the salience of the rate-place cues to pitch as a function of F0, we used the Fisher information, which is the expected value of the curvature of the log-likelihood function, evaluated at its maximum. The expected value was approximated by averaging the likelihood function over 100 bootstrap replications of the rate-place data. A steep curvature means that the likelihood varies rapidly with template F0 and therefore that the F0 estimate is very reliable. The Fisher information was normalized by the number of data points in the rate profile for each F0 to allow comparisons between data sets of different size. Figure 7B shows that the Fisher information increases monotonically with F0 up to about 1,000 Hz and then remains essentially constant. Overall, pitch estimation from rate-place profiles works best for F0s above 400–500 Hz, although reliable estimates were obtained for F0s as low as 250 Hz.
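A sketch of the normalized Fisher information computation follows, under stated assumptions: log-likelihood curves are evaluated on a uniform grid of template F0s, the curvature is taken as the negative second finite difference at the grid maximum, and the bootstrap average is taken over log-likelihood curves. Because curvature is linear in the function, this is equivalent to averaging the individual curvatures at that same grid point. The function names are hypothetical.

```python
def curvature_at_max(log_lik, f0_grid):
    """Negative second finite difference of log_lik at its grid maximum.

    Assumes f0_grid is uniformly spaced and the maximum is interior.
    """
    values = [log_lik(f0) for f0 in f0_grid]
    i = max(range(len(values)), key=values.__getitem__)
    if i == 0 or i == len(values) - 1:
        raise ValueError("maximum lies on the grid edge; widen the grid")
    step = f0_grid[i + 1] - f0_grid[i]
    second_diff = (values[i + 1] - 2.0 * values[i] + values[i - 1]) / step ** 2
    return -second_diff


def normalized_fisher_information(bootstrap_log_liks, f0_grid, n_points):
    """Curvature of the bootstrap-averaged log-likelihood, per data point."""
    def mean_log_lik(f0):
        return sum(ll(f0) for ll in bootstrap_log_liks) / len(bootstrap_log_liks)
    return curvature_at_max(mean_log_lik, f0_grid) / n_points


# Two made-up quadratic log-likelihoods peaking at 500 Hz (curvatures 4 and 8).
grid = [490.0 + 0.5 * i for i in range(41)]
lls = [lambda f0: -2.0 * (f0 - 500.0) ** 2, lambda f0: -4.0 * (f0 - 500.0) ** 2]
print(normalized_fisher_information(lls, grid, n_points=3))  # prints 2.0
```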

Harmonic templates were fit to rate-place profiles obtained in response to complex tones with harmonics in alternating phase and in Schroeder phase as well as in cosine phase to test whether the pitch estimates depend on phase. Figure 8 shows an example for an F0 of 392 Hz. The numbers of data points differ somewhat for the three phase conditions because we could not always "hold" a unit sufficiently long to measure responses to all three conditions. Despite these sampling differences, the pitch estimates for the three phase conditions are similar to each other (Fig. 8, A–C) and similar to the pitch estimate obtained by combining data across all three phase conditions (Fig. 8D).



FIG. 8. Effect of the relative phase of the harmonics on pitch estimation based on rate-place profiles. Stimulus F0 = 392 Hz. A–C: normalized driven rate (symbols) and maximum-likelihood harmonic templates (black lines) for harmonics in cosine (squares), alternating (triangles), and Schroeder (circles) phase, respectively. D: normalized driven rate and maximum-likelihood harmonic template (gray line) for data pooled across phases (symbols as in A–C). The maximum-likelihood harmonic template for data pooled across phases is also plotted in gray in A–C.

 
We devised a statistical test for the effect of phase on pitch estimates. This test compares the likelihoods of the rate-place data given two different models. In one model, the estimated pitch is constrained to be the same for all three phase conditions by finding the F0 value that maximizes the likelihood of the combined data (Fig. 8D). In the other model, a maximum likelihood pitch estimate is obtained separately for each phase condition (Fig. 8, A–C), and then the maximum log-likelihoods are summed over the three conditions, based on the assumption that the three data sets are statistically independent. If the rate-place patterns for the different phases differed appreciably, the maximum likelihood for the phase-dependent model should be higher than that for the phase-independent model because the phase-dependent model has the additional flexibility of fitting each data set separately. Contrary to this expectation, when the two models were fit to 1,000 bootstrap replications of the rate-place data, the distributions of the maximum likelihoods for the two models did not significantly differ (P = 0.178), indicating that the additional free parameters of the phase-dependent model offer no significant advantage for this F0.
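The two quantities being compared can be computed from per-condition log-likelihood curves evaluated on a shared grid of template F0s, as in this sketch. The dict-of-lists layout is a hypothetical choice, and the bootstrap step that builds distributions of both quantities over 1,000 replications is omitted.

```python
def phase_model_log_likelihoods(ll_by_condition):
    """Return (joint, separate) maximum log-likelihoods.

    ll_by_condition maps a phase condition (e.g. "cosine") to a list of
    log-likelihood values evaluated on one shared grid of template F0s.
    """
    curves = list(ll_by_condition.values())
    n_grid = len(curves[0])
    if any(len(c) != n_grid for c in curves):
        raise ValueError("all curves must use the same F0 grid")
    # Phase-independent model: a single F0 for all conditions, so the
    # independent log-likelihoods are summed before taking the maximum.
    joint = max(sum(c[j] for c in curves) for j in range(n_grid))
    # Phase-dependent model: each condition gets its own best F0.
    separate = sum(max(c) for c in curves)
    return joint, separate


# Toy curves on a 3-point F0 grid (made-up numbers).
joint, separate = phase_model_log_likelihoods({
    "cosine": [0.0, -1.0, -5.0],
    "alternating": [-1.0, 0.0, -5.0],
    "schroeder": [-5.0, -5.0, 0.0],
})
print(joint, separate)  # prints -6.0 0.0
```

Because each condition may choose its own F0, `separate` can never be smaller than `joint`; the test asks whether the gap between them is larger than expected from resampling variability.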

This test was performed for three different values of F0 (612, 670, and 828 Hz) in addition to the 392-Hz case shown in Fig. 8.¹ In three of these four cases, the results were as in Fig. 8 in that the differences in maximum likelihoods for the two models did not reach statistical significance at the 0.05 level. For 612 Hz, the comparison did reach significance (P = 0.007), but for this F0, the rate-place profiles for harmonics in alternating and Schroeder phase showed large gaps in the distribution of data points over harmonic numbers, making the reliability of the F0 estimates for these two phases questionable. When the actual pitch estimates for the different phase conditions were compared, there was no clear pattern to the results across F0s; i.e., the pitch estimate for any given phase condition could be the largest in one case and the smallest in another. These results indicate that phase relationships among the partials of a complex tone do not appear to greatly influence the pitch estimated from rate-place profiles, consistent with psychophysical data on the phase invariance of pitch based on resolved harmonics (Houtsma and Smurzynski 1990).

Pitch estimation from pooled interspike-interval distributions

Pitch estimates were derived from pooled interspike-interval distributions to compare the accuracy of these estimates with that of rate-place estimates for the same stimuli. Figure 9, A and B, shows pooled all-order interspike-interval distributions for two complex-tone stimuli with F0s of 320 and 880 Hz (harmonics in cosine phase). For both F0s, the pooled distributions show modes at the period of F0 and its integer multiples. However, these modes are less prominent at the higher F0 for which only the first few harmonics are located in the range of robust phase locking.
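All-order intervals include every forward difference between two spikes in the same train, not only between consecutive spikes. A minimal sketch of computing them and pooling them into one histogram, assuming spike times in milliseconds and a list of spike trains pooled across fibers and stimulus repetitions (the function names are hypothetical):

```python
def all_order_intervals(spike_times, max_interval):
    """All forward differences t_j - t_i (j > i) up to max_interval, for one sorted train."""
    intervals = []
    for i, t in enumerate(spike_times):
        for u in spike_times[i + 1:]:
            d = u - t
            if d > max_interval:
                break  # later spikes give even longer intervals
            intervals.append(d)
    return intervals


def pooled_interval_histogram(trains, max_interval, bin_width):
    """Pool all-order intervals from many spike trains into one histogram."""
    n_bins = int(round(max_interval / bin_width))
    counts = [0] * n_bins
    for train in trains:
        for d in all_order_intervals(sorted(train), max_interval):
            k = int(d / bin_width)
            if k < n_bins:
                counts[k] += 1
    return counts


trains = [[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]]  # toy spike times in ms
print(pooled_interval_histogram(trains, max_interval=5.0, bin_width=1.0))  # prints [0, 0, 4, 0, 2]
```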



FIG. 9. Pitch estimation based on pooled all-order interspike-interval distributions for 2 F0s: 320 (A and C) and 880 Hz (B and D). A and B: pooled all-order interspike interval distributions with periodic templates maximizing the "contrast ratio" (vertical dotted lines). C and D: periodic template contrast ratio as a function of its F0.

 
In previous work (Cariani and Delgutte 1996a,b; Palmer 1990; Palmer and Winter 1993), the pitch period was estimated from the location of the largest maximum (mode) in the pooled interval distribution. This simple method is also widely used in autocorrelation models of pitch (Meddis and Hewitt 1991; Yost 1982, 1996). However, when tested over a wide range of F0s, this method was found to yield severe pitch estimation errors for two reasons. First, for higher F0s as in Fig. 9B, the first interval mode near 1/F0 is always smaller than the modes at integer multiples of the fundamental period, due to the neural relative refractory period. Moreover, the location of the first mode is slightly but systematically delayed with respect to the period of F0 (Fig. 9B), an effect also attributed to refractoriness (McKinney and Delgutte 1999; Ohgushi 1983). In fact, a peak at the pitch period is altogether lacking if the period is shorter than the absolute refractory period, about 0.6 ms for AN fibers (McKinney and Delgutte 1999). These difficulties at higher F0s might be overcome by using shuffled autocorrelograms (Louage et al. 2004), which, unlike conventional autocorrelograms, are not distorted by neural refractoriness. Second, and more fundamentally, at lower F0s the modes at 1/F0 and its multiples all have approximately the same height (Fig. 9A), so that, due to the intrinsic variability in neural responses, some of the later modes will unavoidably be larger than the first mode in many cases and therefore lead to erroneous pitch estimates at integer submultiples of F0.

We therefore modified our pitch estimation method to make use of all pitch-related modes in the pooled interval distribution rather than just the first one. Specifically, we used periodic templates that select intervals at a given period and its multiples and determined the template F0 that maximizes the contrast ratio, a signal-to-noise ratio measure of the number of intervals within the template relative to the mean number of intervals per bin (see METHODS). When computing the contrast ratio, short intervals were weighted more than long intervals according to an exponentially decaying weighting function of interval length. This weighting implements the psychophysical observation of a lower limit of pitch near 30 Hz (Pressnitzer et al. 2001) by preventing long intervals from contributing significantly to pitch. Figure 9, C and D, shows the template contrast ratio as a function of template F0 for the same two stimuli as in A and B. For both stimuli, the contrast ratio reaches an absolute maximum when the template F0 is very close to the stimulus F0, although the peak contrast ratio is larger for the lower F0. The contrast ratio also shows local maxima one octave above and below the stimulus F0. In Fig. 9C, these secondary maxima are small relative to the main peak at F0, but in Fig. 9D, the maximum at F0/2 is almost as large as the one at F0. Despite the close call, F0 was correctly estimated in both cases of Fig. 9, and overall, our pitch estimation algorithm produced essentially no octave or sub-octave errors over the entire range of F0 investigated (110–3,520 Hz).
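The exact contrast-ratio definition and weighting constants are given in METHODS and are not reproduced here, so the following is only a sketch under stated assumptions. It takes the contrast ratio to be the weighted mean count in the template bins (at 1/F0, 2/F0, ...) divided by the weighted mean count over all bins, with an exponential weight whose time constant `tau_w` (a hypothetical 33 ms, roughly the period of a 30-Hz tone) de-emphasizes long intervals. Under this definition a flat interval distribution gives a ratio of 1, as described in the text.

```python
import math


def contrast_ratio(counts, bin_width, f0, tau_w=0.033):
    """Weighted contrast of a periodic template at 1/f0, 2/f0, ... (times in s, f0 in Hz)."""
    n = len(counts)
    # Exponential weighting favors short intervals (hypothetical time constant tau_w).
    weights = [math.exp(-(k + 0.5) * bin_width / tau_w) for k in range(n)]
    background = sum(w * c for w, c in zip(weights, counts)) / sum(weights)
    if background == 0.0:
        return 0.0
    num = den = 0.0
    m = 1
    while True:
        k = int(m / f0 / bin_width)  # bin containing the m-th multiple of the period
        if k >= n:
            break
        num += weights[k] * counts[k]
        den += weights[k]
        m += 1
    return (num / den) / background if den > 0.0 else 0.0


def estimate_pitch(counts, bin_width, f0_grid, tau_w=0.033):
    """Template F0 with the largest contrast ratio."""
    return max(f0_grid, key=lambda f0: contrast_ratio(counts, bin_width, f0, tau_w))


bw = 0.0001          # 0.1-ms bins
hist = [10] * 300    # 30 ms of flat background
for m in range(1, 10):  # synthetic modes at multiples of 1/320 s
    hist[int(m / 320.0 / bw)] += 50
print(estimate_pitch(hist, bw, [float(f) for f in range(200, 801)]))
```

Averaging over all template bins, rather than reading off the single largest mode, is what lets the later modes support the F0 hypothesis instead of competing with it.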

Figure 10 shows measures of the accuracy and strength of the interval-based pitch estimates as a function of F0 for harmonics in cosine phase. The accuracy measure is the median absolute value of the pitch estimation error over bootstrap replications of the pooled interval distributions. The estimates are highly accurate below 1,300 Hz, where their medians are within 1–2% of the stimulus F0 (Fig. 10A). However, the interval-based estimates of pitch abruptly break down near 1,300 Hz. While the existence of such an upper limit is consistent with the degradation in phase locking at high frequencies, the location of this limit at 1,300 Hz is low compared with the 4- to 5-kHz upper limit of phase locking, a point to which we return in the DISCUSSION.



FIG. 10. Pitch estimation based on pooled all-order interspike-interval distributions: median absolute estimation error and best-template contrast ratio as a function of F0. A: median (over 100 bootstrap resampling trials) pitch absolute estimation error, expressed as percentage of the true F0. Triangles: pitch absolute estimation error exceeded 10% of the true F0. B: contrast ratio of the best-matching periodic template.

 
Figure 10B shows a measure of the strength of the estimated pitch, the contrast ratio of the best-fitting periodic template, as a function of F0. The contrast ratio is largest below 500 Hz and decreases gradually with increasing F0, to reach essentially unity (meaning a flat interval distribution) at 1,300 Hz. For F0s >1,300 Hz, the modes in the pooled interval distribution essentially disappear into the noise floor. Thus the strength of interval-based estimates of pitch is highest in the F0 range where rate-based pitch estimates are the least reliable due to the lack of strongly resolved harmonics. Conversely, rate-based estimates of pitch become increasingly strong in the range of F0s where the interval-based estimates break down.

For a few F0s, interval-based estimates of pitch were derived for complex tones with harmonics in alternating phase and in Schroeder phase as well as for harmonics in cosine phase. Figure 11 compares the pooled all-order interval distributions in the three phase conditions for two F0s: 130 (left) and 612 Hz (right). Based on the rate-place results, the harmonics of the 130-Hz F0 are not resolved, whereas some of the harmonics of the 612-Hz F0 are resolved: we obtained a reliable pitch estimate from rate-place profiles at 612 Hz but not at 130 Hz (Fig. 7).



FIG. 11. Effect of the relative phase of the harmonics on pitch estimation based on pooled all-order interval distributions for 2 F0s (130 Hz in A–D and 612 Hz in E–H). Pooled interval distributions for harmonics in cosine phase (A and E), alternating sine-cosine phase (B and F), and Schroeder phase (C and G). Arrows in B point to secondary local maxima in the interval distribution at one-half the period (i.e., twice the frequency) of the stimulus F0 and its odd multiples. D and H: periodic template contrast ratio as a function of its F0 for the 3 phase conditions.

 
For both F0s and all phase conditions, the distributions have modes at the period of the fundamental frequency and its integer multiples (Fig. 11, A–C and E–G). For the lower F0 (130 Hz), the pooled interval distribution for harmonics in alternating phase (Fig. 11B) also shows secondary peaks at half the period of F0 and its odd multiples (arrows), reflecting the periodicity of the stimulus envelope at 2 × F0 (Fig. 1). Such frequency doubling has previously been observed in AN fiber responses to alternating-phase stimuli using both period histograms (Horst et al. 1990; Palmer and Winter 1992) and autocorrelograms (Palmer and Winter 1993). At 130 Hz, the pooled interval distribution for harmonics in negative Schroeder phase (Fig. 11C) shows pronounced modes at the period of F0 and its multiples and strongly resembles the interval distribution for harmonics in cosine phase (Fig. 11A), even though the waveforms of the two stimuli have markedly different envelopes (Fig. 1).

For the lower F0 (130 Hz), the interval-based pitch estimates are nearly identical for all three phase conditions, but the maximum contrast ratio is substantially lower for harmonics in alternating phase than for harmonics in cosine or in Schroeder phase (Fig. 11D). In addition, for harmonics in alternating phase, the contrast ratio of the periodic template at the envelope frequency 2 × F0 is almost as large as the contrast ratio at F0. In contrast, for the higher F0 (612 Hz), there are no obvious differences between phase conditions in the pooled all-order interval distributions (Fig. 11, E–G). In particular, the secondary peaks at half the period of F0, which were found at 130 Hz for the alternating-phase stimulus, are no longer present at 612 Hz. Moreover, the maximum contrast ratios are essentially the same for all three phase conditions (Fig. 11H).

Overall, these results show that, while phase relationships among harmonics have little effect on the pitch values estimated from pooled interval distributions, which are always close to the stimulus F0, the salience of these estimates can be significantly affected by phase when harmonics are unresolved. These results are consistent with psychophysical results showing a greater effect of phase on pitch and pitch salience for stimuli consisting of unresolved harmonics than for stimuli containing resolved harmonics (Houtsma and Smurzynski 1990; Shackleton and Carlyon 1994). However, these results fail to account for the observation that the dominant pitch is often heard at the envelope frequency 2 × F0 for unresolved harmonics in alternating phase.


    DISCUSSION
Harmonic resolvability in AN fiber responses

We examined the response of cat AN fibers to complex tones with a missing fundamental and equal-amplitude harmonics. We used low and moderate stimulus levels (15–20 dB above threshold) to minimize rate saturation, which would prevent us from accurately assessing cochlear frequency selectivity, and therefore harmonic resolvability, from rate responses. In general, the average discharge rate of a single AN fiber was higher when its CF was near a low-order harmonic of a complex tone than when the CF fell halfway between two harmonics (Fig. 3). This trend could be predicted using a phenomenological model of single-fiber rate responses incorporating a band-pass filter representing cochlear frequency selectivity (Fig. 2). The amplitude of the oscillations in the response of the best-fitting single-fiber model, relative to the typical variability in the data, gave an estimate of the lowest F0 of complex tones whose harmonics are resolved at a given CF (Fig. 3). This limit, which we call F0min, increases systematically with CF, and this increase is well fit by a power function with an exponent of 0.63 (Fig. 4). That the exponent is less than 1 is consistent with the progressive sharpening of peripheral tuning with increasing CF when expressed as a Q factor, the ratio CF/bandwidth. The exponent for Q would be 0.37 (i.e., 1 − 0.63), which closely matches the 0.37 exponent found by Shera et al. (2002) for the CF dependence of Q10 in pure-tone tuning curves from AN fibers in the cat.
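The step from the F0min exponent to the Q exponent is a one-line power-law argument, assuming that F0min at each CF scales in proportion to the cochlear filter bandwidth BW:

```latex
F0_{\min} \propto \mathrm{CF}^{0.63}, \qquad \mathrm{BW} \propto F0_{\min}
\;\Longrightarrow\;
Q = \frac{\mathrm{CF}}{\mathrm{BW}} \propto \frac{\mathrm{CF}}{\mathrm{CF}^{0.63}}
  = \mathrm{CF}^{1 - 0.63} = \mathrm{CF}^{0.37}
```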

Our definition of the lower limit of resolvability F0min is to some extent arbitrary because it depends on the variability in the average discharge rates, which in turn depends on the number of stimulus repetitions and the duration of the stimulus. Nevertheless, our results are consistent with those of Wilson and Evans (1971) for AN fibers in the guinea pig using ripple noise (comb-filtered noise), a stimulus with broad spectral maxima at harmonically related frequencies. These authors found that the number of such maxima that can be resolved in the rate responses of single fibers (equivalent to our Nmax) increases with CF from 2–3 at 200 Hz to about 10 at 10 kHz and above. Similarly, Smoorenburg and Linschoten (1977) reported that the number of harmonics of a complex tone that are resolved in the rate responses of single units in the cat anteroventral cochlear nucleus (AVCN) increases from 2 at 250 Hz to 13 at 10 kHz. Despite the different metrics used to define resolvability, both studies are in good agreement with the data of Fig. 4 if we use the conversion F0min = CF/Nmax.
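Applying the conversion F0min = CF/Nmax to the values quoted above gives the following rough equivalents (simple arithmetic on the cited numbers, not values reported in those studies):

```latex
\begin{aligned}
\text{Wilson and Evans (1971):}\quad
  & F0_{\min} \approx \tfrac{200\ \mathrm{Hz}}{3} \approx 67\ \mathrm{Hz}
    \ \text{to}\ \tfrac{200\ \mathrm{Hz}}{2} = 100\ \mathrm{Hz};
  \qquad F0_{\min} \approx \tfrac{10\ \mathrm{kHz}}{10} = 1\ \mathrm{kHz} \\
\text{Smoorenburg and Linschoten (1977):}\quad
  & F0_{\min} = \tfrac{250\ \mathrm{Hz}}{2} = 125\ \mathrm{Hz};
  \qquad F0_{\min} = \tfrac{10\ \mathrm{kHz}}{13} \approx 770\ \mathrm{Hz}
\end{aligned}
```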

Consistent with a previous report for AVCN neurons (Smoorenburg and Linschoten 1977), we found that the ability of AN fibers to resolve harmonics in their rate response degrades rapidly with increasing stimulus level. This degradation could be due either to the broadening of cochlear tuning with increasing level or to saturation of the average rate. Saturation seems the more likely explanation because a single-fiber model with level-dependent bandwidth did not fit the data significantly better than a model with fixed bandwidth. However, the level dependence of cochlear filter bandwidths might have a greater effect on responses to complex tones if level were varied over a wider range than the 10–20 dB used here (Cooper and Rhode 1997; Ruggero et al. 1997).

Rate-place representation of pitch

A major finding is that the pitch of complex tones could be reliably and accurately estimated from rate-place profiles for fundamental frequencies above 400–500 Hz by fitting a harmonic template to the data (Figs. 6 and 7, A and B). The harmonic template was implemented as the response of a simple peripheral auditory model to a complex tone with equal-amplitude harmonics, and the estimated pitch was the F0 of the complex tone most likely to have produced the rate-place data, assuming that the stimulus-response relationship is characterized by the model. Despite the nonuniform sampling of CFs and the moderate number of fibers sampled at each F0 (typically 20–40), these pitch estimates were accurate to within a few percent.

Pitch estimation became increasingly unreliable for F0s below 400–500 Hz, with large estimation errors becoming increasingly common. Nevertheless, some reliable estimates could be obtained for F0s as low as 250 Hz. This result is consistent with the failure of previous studies to identify rate-place cues to pitch in AN responses to harmonic complex tones with F0s below 300 Hz (Hirahara et al. 1996; Sachs and Young 1979; Shamma 1985a,b), although Hirahara et al. did find a weak representation of the first two to three harmonics in rate-place profiles for vowels with an F0 at 350 Hz.

In interpreting these results, it is important to keep in mind that the precision of the rate-based pitch estimates depends on many factors such as the number of fibers sampled, the CF distribution of the fibers, pooling of data from two animals, the number of stimulus repetitions, and the particular method for fitting harmonic templates. For example, since the lowest CF sampled was 450 Hz, the second harmonic and, in some cases, the third could not be represented in the rate-place profiles for F0s <220 Hz, possibly explaining why we never obtained a reliable pitch estimate in that range. In fact, because our stimuli had missing fundamentals, we cannot rule out that the fundamental might always be resolved when it is present.

In one respect, our method may somewhat overestimate the accuracy of the rate-based pitch estimates because we only included data from measurements for which the rate response as a function of F0 oscillated sufficiently to be able to reliably fit a single-fiber model. This constraint was necessary because, for responses that do not oscillate, we could not reliably estimate the minimum and maximum discharge rates that are essential in fitting harmonic templates to the rate-place data. Thirty-five of 122 responses were thus excluded. Because our design minimizes rate saturation, and because 23 of these 35 excluded responses were from fibers with CFs <2 kHz, we infer that insufficient frequency selectivity for resolving harmonics rather than rate saturation was the primary reason for the lack of F0-related oscillations in these measurements.

A factor whose effect on pitch estimation performance is hard to evaluate is that the rate-place profiles included responses to stimuli presented at different sound levels. At first sight, pooling data across levels might seem to increase response variability and therefore decrease estimation performance. However, because the stimulus level was usually selected to be 15–20 dB above the threshold of each fiber so that responses would be robust without being saturated, our procedure might actually have reduced the variability due to threshold differences among fibers. The rationale for this procedure is that an optimal central processor would focus on unsaturated fibers because these fibers are the most informative. Because level relative to threshold, rather than absolute level, is the primary determinant of rate responses, we are effectively invoking a form of the "selective listening hypothesis" (Delgutte 1982, 1987; Lai et al. 1994), according to which the central processor attends to low-threshold, high-spontaneous rate fibers at low levels and to high-threshold, low-spontaneous rate fibers at high levels.

Our harmonic template differs from those typically used in pattern recognition models of pitch in that it has very broad peaks at the harmonic frequencies. Most pattern recognition models (Duifhuis et al. 1982; Goldstein 1973; Terhardt 1974) use very narrow templates or "sieves," whose widths are typically a few percent of each harmonic's frequency. One exception is the model of Wightman (1973), which effectively uses broad cosinusoidal templates by performing a Fourier transform operation on the spectrum. Our method also resembles the Wightman model and differs from the other models in that it avoids an intermediate, error-prone stage that estimates the frequencies of the individual resolved harmonics; rather, a global template is fit to the entire rate-place profile. Broad templates are well adapted to the measured rate-place profiles because the dips between the harmonics are often sharper than the peaks at the harmonic frequencies (Figs. 6 and 8). On the other hand, the templates are the response of the peripheral model to complex tones with equal-amplitude harmonics, which exactly match the stimuli that were presented. It remains to be seen how well such templates would work when the spectral envelope of the stimulus is unknown or when the amplitudes of the individual harmonics are roved from trial to trial, conditions that cause little degradation in psychophysical performance (Bernstein and Oxenham 2003a; Houtsma and Smurzynski 1990).

Given the uncertainties about how the various factors discussed above may affect our pitch estimation procedure, a comparison of the pitch estimation performance with psychophysical data should focus on robust overall trends as a function of stimulus parameters rather than on absolute measures of performance. Both the precision of the pitch estimates (Fig. 7A) and their salience (as measured by the Fisher information; Fig. 7B) improve with increasing F0 as the harmonics of the complex become increasingly resolved. This result is in agreement with psychophysical observations that both pitch strength and pitch discrimination performance improve as the degree of harmonic resolvability increases (Bernstein and Oxenham 2003b; Carlyon and Shackleton 1994; Houtsma and Smurzynski 1990; Plomp 1967; Ritsma 1967). However, the absence of any decline in Fisher information for F0s above 1,000 Hz conflicts with the existence of an upper limit to the pitch of missing-fundamental stimuli, which occurs at about 1,400 Hz in humans (Moore 1973b). This discrepancy between the rapid degradation in pitch discrimination at high frequencies and the lack of a concomitant degradation in cochlear frequency selectivity is a general problem for place models of pitch perception and frequency discrimination (Moore 1973a).

We also found that the relative phases of the resolved harmonics of a complex tone do not greatly influence rate-based estimates of pitch (Fig. 8). This result is consistent with expectations for a purely place representation of pitch, as well as with psychophysical results for stimuli containing resolved harmonics (Houtsma and Smurzynski 1990; Shackleton and Carlyon 1994; Wightman 1973).

The restriction of our data to low and moderate stimulus levels raises the question of whether the rate-place representation of pitch would remain robust at the higher stimulus levels typically used in speech communication or when listening to music. Previous studies have used signal detection theory to quantitatively assess the ability of rate-place information in the AN to account for behavioral performance in tasks such as intensity discrimination (Colburn et al. 2003; Delgutte 1987; Viemeister 1988; Winslow and Sachs 1988; Winter and Palmer 1991) and formant-frequency discrimination for vowels (Conley and Keilson 1995; May et al. 1996). These studies give a mixed message. On the one hand, the rate-place representation generally contains sufficient information to account for behavioral performance up to the highest sound levels tested. On the other hand, because the fraction of high-threshold fibers is small compared to low-threshold fibers, predicted performance of optimal processor models degrades markedly with increasing level, whereas psychophysical performance remains stable. Thus while a rate-place representation cannot be ruled out, it fails to account for a major trend in the psychophysical data. Extending this type of analysis to pitch discrimination for harmonic complex tones is beyond the scope of this paper. Given the failure of the rate-place representation to account for the level dependence of performance in the other tasks, a more productive approach may be to explore alternative spatio-temporal representations that would rely on harmonic resolvability like the rate-place representation, but would be more robust with respect to level variations by exploiting phase locking (Heinz et al. 2001; Shamma 1985a). Preliminary tests of one such spatio-temporal representation are encouraging (Cedolin and Delgutte 2005b).

Interspike-interval representation of pitch

Our results confirm previous findings (Cariani and Delgutte 1996a,b; Palmer 1990; Palmer and Winter 1993) that the fundamental frequencies of harmonic complex tones are precisely represented in pooled all-order interspike-interval distributions of the AN. These interval distributions have prominent modes at the period of F0 and its integer multiples (Fig. 9, A and B). Pitch estimates derived using periodic templates that select intervals at a given period and its multiples were highly accurate (often within 1%) for F0s up to 1,300 Hz (Fig. 10). The determination of this upper limit to the interval-based representation of pitch is a new finding. Moreover, using periodic templates for pitch estimation improves on the traditional method of picking the largest mode in the interval distribution by greatly reducing suboctave errors.
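The two estimation strategies contrasted above can be sketched in Python. This is a toy illustration under stated assumptions, not the analysis code used in this study. The spike trains are synthetic: 50 hypothetical fibers fire on about half of the periods of a 250-Hz periodicity, with 0.2-ms timing jitter. The scoring rule in the hypothetical `periodic_template_f0` is one simple choice of periodic template: it sums the excess of the histogram over its mean at each multiple of a candidate period. Subtracting the mean penalizes template points that fall between modes (errors an octave above F0), and the unnormalized sum favors the shortest period that collects every mode (suboctave errors).

```python
import numpy as np

def pooled_all_order_intervals(spike_trains, max_interval):
    """Pool all-order interspike intervals: differences between every pair of
    spikes in the same train (not only consecutive ones), up to max_interval."""
    pooled = []
    for spikes in spike_trains:
        s = np.sort(np.asarray(spikes, dtype=float))
        lags = s[None, :] - s[:, None]              # lags[i, j] = s[j] - s[i]
        pooled.append(lags[(lags > 0) & (lags <= max_interval)])
    return np.concatenate(pooled)

def periodic_template_f0(hist, bin_width, candidate_f0s):
    """Hypothetical periodic template: for each candidate F0, sum the excess of
    the histogram over its mean at the period 1/F0 and its integer multiples."""
    max_interval = len(hist) * bin_width
    baseline = hist.mean()
    scores = np.empty(len(candidate_f0s))
    for i, f0 in enumerate(candidate_f0s):
        multiples = np.arange(1, int(max_interval * f0) + 1) / f0   # k/F0 <= max_interval
        idx = np.minimum((multiples / bin_width).astype(int), len(hist) - 1)
        scores[i] = np.sum(hist[idx] - baseline)
    return candidate_f0s[np.argmax(scores)], scores

# Synthetic data (assumed): 50 fibers locked to a 250-Hz periodicity for 200 ms.
rng = np.random.default_rng(1)
F0_TRUE, DUR, BIN, MAX_LAG = 250.0, 0.2, 1e-4, 0.03
periods = np.arange(int(DUR * F0_TRUE)) / F0_TRUE              # onset of each period (s)
trains = []
for _ in range(50):
    fired = periods[rng.random(len(periods)) < 0.5]            # fire on about half the periods
    trains.append(fired + rng.normal(0.0, 2e-4, len(fired)))   # 0.2-ms timing jitter

intervals = pooled_all_order_intervals(trains, MAX_LAG)
hist, _ = np.histogram(intervals, bins=int(round(MAX_LAG / BIN)), range=(0.0, MAX_LAG))

candidates = np.arange(80.0, 1000.0, 1.0)
template_f0, _ = periodic_template_f0(hist, BIN, candidates)
largest_mode_f0 = 1.0 / ((np.argmax(hist) + 0.5) * BIN)        # traditional largest-mode pick
print(f"template: {template_f0:.0f} Hz, largest mode: {largest_mode_f0:.0f} Hz")
```

Because the modes at multiples of the period have nearly equal heights, the single tallest bin can fall at twice the period or beyond by chance, giving an estimate an octave or more below F0. The template estimate instead pools evidence across all modes.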

While the existence of an upper limit to the representation of pitch in interspike intervals is expected from the degradation of phase locking at high frequencies, the location of this limit at 1,300 Hz is low compared with the usually quoted 4- to 5-kHz limit of phase locking in the AN (Johnson 1980; Rose et al. 1967). Of course, both the limit of pitch representation and the limit of phase locking depend to some extent on the signal-to-noise ratio of the data, which in turn depends on the duration of the stimulus, the number of stimulus repetitions, and, for pooled interval distributions, the number of sampled fibers. However, the discrepancy between the two limits appears too large to be entirely accounted for by differences in signal-to-noise ratio. Fortunately, the discrepancy can be largely reconciled by taking into account harmonic resolvability and the properties of our stimuli. For F0s near 1,300 Hz, all the harmonics within the CF range of our data (450–9,200 Hz) are well resolved (Fig. 4), so that information about pitch in pooled interval distributions must depend on phase locking to individu