1Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston; 2Speech and Hearing Bioscience and Technology Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology; and 3Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts
Submitted 26 October 2004; accepted in final form 17 March 2005
ABSTRACT

INTRODUCTION
Investigating the neural mechanisms underlying the perception of the pitch of harmonic complex tones is of great importance for a variety of reasons. Changes in pitch convey melody in music, and the superposition of different pitches is the basis for harmony. Pitch has an important role in speech, where it carries prosodic features and information about speaker identity. In tone languages such as Mandarin Chinese, pitch also cues lexical contrasts. Pitch plays a major role in auditory scene analysis: differences in pitch are a major cue for sound source segregation, while frequency components that share a common fundamental tend to be grouped into a single auditory object (Bregman 1990; Darwin and Carlyon 1995).
Pitch perception with missing-fundamental stimuli is not unique to humans; it also occurs in birds (Cynx and Shapiro 1986) and nonhuman mammals (Heffner and Whitfield 1976; Tomlinson and Schwartz 1988), making animal models suitable for studying neural representations of pitch. Pitch perception mechanisms in animals may play a role in processing conspecific vocalizations, which often contain harmonic complex tones.
The neural mechanisms underlying pitch perception of harmonic complex tones have been at the center of a debate among scientists for over a century (Ohm 1843; Seebeck 1841). This debate arises because the peripheral auditory system provides two types of cues to the pitch of complex tones: place cues, which depend on the frequency selectivity and tonotopic mapping of the cochlea, and temporal cues, which depend on neural phase locking.
The peripheral auditory system can be thought of as containing a bank of band-pass filters representing the mechanical frequency analysis performed by the basilar membrane. When two partials of a complex tone are spaced sufficiently far apart relative to the auditory filter bandwidths, each of them produces an individual local maximum in the spatial pattern of basilar membrane motion. In this case, the two harmonics are said to be "resolved" by the auditory periphery. On the other hand, when two or more harmonics fall within the pass-band of a single peripheral filter, they are said to be "unresolved." Because the bandwidths of the auditory filters increase with their center frequency, only low-order harmonics are resolved. Based on psychophysical data, the first 6–10 harmonics are thought to be resolved in humans (Bernstein and Oxenham 2003b; Plomp 1964).
When a complex tone contains resolved harmonics, its pitch can be extracted by matching the pattern of activity across a tonotopic neural map to internally stored harmonic templates (Cohen et al. 1994; Goldstein 1973; Terhardt 1974; Wightman 1973). This type of model accounts for many pitch phenomena, including the pitch of the missing fundamental, the pitch shift associated with inharmonic complexes, and the pitch ambiguity of complex tones comprising only a few harmonics. However, a key issue in these models is the exact nature of the neural representation upon which the hypothetical template-matching mechanism operates.
Pitch percepts can also be produced by complex tones consisting entirely of unresolved harmonics. In general, though, these pitches are weaker and more dependent on phase relationships among the partials than the pitch based on resolved harmonics (Bernstein and Oxenham 2003b; Carlyon and Shackleton 1994; Houtsma and Smurzynski 1990). With unresolved harmonics, there are no spectral cues to pitch, and therefore harmonic template models are not applicable. On the other hand, unresolved harmonics produce direct temporal cues to pitch because the waveform of a combination of unresolved harmonics has a period equal to that of the complex tone. These periodicity cues, which are reflected in neural phase locking, can be extracted by an autocorrelation-type mechanism (Licklider 1951; Meddis and Hewitt 1991; Moore 1990; Yost 1996), which is mathematically equivalent to an all-order interspike-interval distribution for neural spike trains. The autocorrelation model also works with resolved harmonics, since the period of the F0 is always an integer multiple of the period of any of the harmonics; this common period can be extracted by combining (e.g., summing) autocorrelation functions from frequency channels tuned to different resolved harmonics (Meddis and Hewitt 1991; Moore 1990).
Previous neurophysiological studies of the coding of the pitch of complex tones in the auditory nerve and cochlear nucleus have documented a robust temporal representation based on pooled interspike-interval distributions obtained by summing the interval distributions from neurons covering a wide range of characteristic frequencies (Cariani and Delgutte 1996a,b; Palmer 1990; Palmer and Winter 1993; Rhode 1995; Shofner 1991). This representation accounts for a wide variety of pitch phenomena, such as the pitch of the missing fundamental, the pitch shift of inharmonic tones, pitch ambiguity, the pitch equivalence of stimuli with similar periodicity, the relative phase invariance of pitch, and, to some extent, the dominance of low-frequency harmonics in pitch. Despite its remarkable effectiveness, the autocorrelation model has difficulty accounting for the greater pitch salience of stimuli containing resolved harmonics compared with stimuli consisting entirely of unresolved harmonics (Bernstein and Oxenham 2003a; Carlyon 1998; Carlyon and Shackleton 1994; Meddis and O'Mard 1997). This issue was not addressed in previous physiological studies because they did not have a means of assessing whether individual harmonics are resolved or not. Moreover, the upper F0 limit over which the interspike-interval representation of pitch is physiologically viable has not been determined. The existence of such a limit is expected because of the degradation in neural phase locking with increasing frequency (Johnson 1980).
In contrast to the wealth of data on the interspike-interval representation of pitch, possible rate-place cues to pitch that might be available when individual harmonics are resolved by the peripheral auditory system have rarely been investigated. The few studies that provide relevant information (Hirahara et al. 1996; Sachs and Young 1979; Shamma 1985a,b) show no evidence for rate-place cues to pitch, even at low stimulus levels where the limited dynamic range of individual neurons is not an issue. The reason for this failure could be that the stimuli used had low fundamental frequencies in the range of the human voice (100–300 Hz) and therefore produced few, if any, resolved harmonics in typical experimental animals, which have poorer cochlear frequency selectivity than humans (Shera et al. 2002). Rate-place cues to pitch might be available in animals for complex tones with higher F0s in the range of conspecific vocalizations, which corresponds to about 500–1,000 Hz for cats (Brown et al. 1978; Nicastro and Owren 2003; Shipley et al. 1991). This hypothesis is consistent with a report that up to 13 harmonics of a complex tone could be resolved in the rate responses of high-CF units in the cat anteroventral cochlear nucleus (Smoorenburg and Linschoten 1977).
In this study, we investigated the resolvability of harmonics of complex tones in the cat auditory nerve and compared the effectiveness of rate-place and interval-based representations of pitch over a much wider range of fundamental frequencies (110–3,520 Hz) than in previous studies. We found that the two representations are complementary with respect to the F0 range over which they are effective, but that neither representation is entirely satisfactory in accounting for human psychophysical data. Preliminary reports of our findings have been presented (Cedolin and Delgutte 2003, 2005a).
METHODS
Methods for recording from auditory nerve (AN) fibers in anesthetized cats are as described by Kiang et al. (1965) and Cariani and Delgutte (1996a). Cats were anesthetized with Dial in urethane (75 mg/kg), with supplementary doses given as needed to maintain an areflexic state. The posterior portion of the skull was removed, and the cerebellum was retracted to expose the auditory nerve. The tympanic bullae and the middle-ear cavities were opened to expose the round window. Throughout the experiment, the cat was given injections of dexamethasone (0.26 mg/kg) to prevent brain swelling and Ringer solution (50 ml/d) to prevent dehydration.
The cat was placed on a vibration-isolated table in an electrically shielded, temperature-controlled, soundproof chamber. A silver electrode was positioned at the round window to record the compound action potential (CAP) in response to click stimuli, in order to assess the condition and stability of cochlear function.
Sound was delivered to the cat's ear through a closed acoustic assembly driven by an electrodynamic speaker (Realistic 401377). The acoustic system was calibrated to allow accurate control over the sound-pressure level at the tympanic membrane. Stimuli were generated by a 16-bit D/A converter (Concurrent DA04H) using sampling rates of 20 or 50 kHz. Stimuli were digitally filtered to compensate for the transfer characteristics of the acoustic system.
Spikes were recorded with glass micropipettes filled with 2 M KCl. The electrode was inserted into the nerve and mechanically advanced using a micropositioner (Kopf 650). The electrode signal was band-pass filtered and fed to a custom spike detector. The times of spike peaks were recorded with 1-µs resolution and saved to disk for subsequent analysis.
A click stimulus at 55 dB SPL was used to search for single units. Upon contact with a fiber, a frequency tuning curve was measured by an automatic tracking algorithm (Kiang et al. 1970) using 100-ms tone bursts, and the characteristic frequency (CF) was determined. The spontaneous firing rate (SR) of the fiber was measured over an interval of 20 s. The responses to complex-tone stimuli were then studied.
Complex-tone stimuli
Stimuli were harmonic complex tones whose F0 was stepped up and down over a two-octave range. The harmonics of each complex tone were all of equal amplitude, and the fundamental component was always missing. Depending on the fiber's CF, one of four presynthesized stimuli covering different F0 ranges was selected so that some of the harmonics would likely be resolved (Table 1). For example, for a fiber with a 1,760-Hz CF, we typically used F0s ranging from 220 to 880 Hz so that the order of the harmonic closest to the CF would vary from 2 to 8. In each of the four stimuli, the harmonics were restricted to a fixed frequency region as F0 varied (Table 1). For each fiber, the stimulus was selected so that the CF fell approximately at the center of the frequency region spanned by the harmonics. In some cases, data were collected from the same fiber in response to two different stimuli whose harmonics spanned overlapping frequency ranges.
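The worked example above can be checked with a few lines of arithmetic. The function name `harmonic_number_range` is a hypothetical helper introduced here; it only restates the ratio CF/F0 used throughout the paper.

```python
def harmonic_number_range(cf_hz, f0_min_hz, f0_max_hz):
    """Return (lowest, highest) harmonic number CF/F0 over a stepped F0 range.

    The harmonic number falls as F0 rises, so the highest F0 gives the lowest value.
    """
    return cf_hz / f0_max_hz, cf_hz / f0_min_hz


# Example from the text: CF = 1,760 Hz, F0 stepped from 220 to 880 Hz.
lowest, highest = harmonic_number_range(1760.0, 220.0, 880.0)
print(f"harmonic number spans {lowest:g} to {highest:g}")  # 2 to 8
# A two-octave F0 range always spans a factor of 4 in harmonic number.
print(highest / lowest)  # 4.0
```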
We used mostly low and moderate stimulus levels in order to minimize rate saturation, which would prevent us from accurately assessing harmonic resolvability by the cochlea. Specifically, the sound pressure level of each harmonic was initially set at 15–20 dB above the fiber's threshold for a pure tone at CF and ranged from 10 to 70 dB SPL, with a median of 25 dB SPL. Because our stimuli contain many harmonics, overall stimulus levels are about 5–10 dB higher than the level of each harmonic, depending on F0. In some cases, responses were measured for two or more stimulus levels differing by 10–20 dB.
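The 5–10 dB difference between overall and per-harmonic level follows from power summation: components at distinct frequencies add in power regardless of phase, so N equal-amplitude harmonics raise the overall level by 10 log10(N) dB, i.e., roughly 3 to 10 harmonics. The harmonic counts below are hypothetical, since Table 1 is not reproduced here.

```python
import math


def overall_level_db(per_harmonic_db: float, n_harmonics: int) -> float:
    """Overall SPL of n equal-amplitude harmonics at distinct frequencies.

    Mean-square values of sinusoids at different frequencies add, independent of
    their relative phases, so the level rises by 10*log10(n) dB.
    """
    return per_harmonic_db + 10.0 * math.log10(n_harmonics)


# Median per-harmonic level from the text; the harmonic counts are hypothetical.
for n in (3, 5, 10):
    print(n, round(overall_level_db(25.0, n), 1))
# 3 29.8
# 5 32.0
# 10 35.0
```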
To compare neural responses to psychophysical data on the phase dependence of pitch, three versions of each stimulus were generated with different phase relationships among the harmonics: cosine phase, alternating (sine-cosine) phase, and negative Schroeder phase (Schroeder 1970). The three stimuli have the same power spectrum and autocorrelation function, but differ in their temporal fine structure and envelope: while the cosine-phase and alternating-phase stimuli have very "peaky" envelopes, the envelope of the Schroeder-phase stimulus is nearly flat (Fig. 1). Moreover, the envelope periodicity is at F0 for the cosine-phase stimulus, but at 2 × F0 for the alternating-phase stimulus. Alternating-phase stimuli have been widely used in previous studies of neural coding (Horst et al. 1990; Palmer and Winter 1992, 1993).
For each step in the F0 sequence, spikes were counted over a 180-ms window extending over the stimulus duration but excluding the transition period between F0 steps. Spike counts from the two stimulus segments having the same F0 (from the ascending and descending parts of the F0 sequence) were added together because responses to both directions were generally similar. The spike counts were converted to units of discharge rate (spikes/s) and plotted either as a function of F0 for a given fiber or as a function of fiber CF for a given F0 to form a "rate-place profile" (Sachs and Young 1979).
To assess the statistical reliability of these discharge rate estimates, "bootstrap" resampling (Efron and Tibshirani 1993) was performed on the data recorded from each fiber. One hundred resampled data sets were generated by drawing with replacement from the set of spike trains in response to each F0. Spike counts in the ascending and descending parts of the F0 sequence were drawn independently of each other. Spike counts from each bootstrap data set were converted to discharge rate estimates as for the original data, and the standard deviation of these estimates was used as an error bar for the mean discharge rate.
Simple phenomenological models were used to analyze average-rate responses to the complex-tone stimuli. Specifically, a single-fiber model was fit to responses of a given fiber as a function of stimulus F0 to quantify harmonic resolvability, while a population model was used to estimate pitch from profiles of average discharge rate against CF for a given F0.
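The single-fiber model (Fig. 2) is a cascade of three stages. The linear band-pass filtering stage, representing cochlear frequency selectivity, is implemented by a symmetric rounded exponential function (Patterson 1976). The model of Sachs and Abbas (1974) of rate-level functions is then used to derive the mean discharge rate r from the r.m.s. amplitude p at the output of the band-pass filter:

$$ r = r_{sp} + r_{dmax}\,\frac{p^{\alpha}}{p^{\alpha} + p_{50}^{\alpha}} \qquad (1) $$

where $r_{sp}$ is the spontaneous rate, $r_{dmax}$ is the maximum driven rate, and $p_{50}$ is the filter-output amplitude at which the driven rate reaches half its maximum. The exponent $\alpha$ was fixed at 1.77 to obtain a dynamic range of about 20 dB (Sachs and Abbas 1974).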
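The first two model stages can be sketched as follows, assuming the roex filter acts as a power weighting W(g) = (1 + s·g)exp(−s·g), with g the frequency deviation normalized by the center frequency and s a slope parameter (equivalent rectangular bandwidth 4·fc/s). The parameter values (CF, slope, r_sp, r_dmax, p50, amplitude) are hypothetical, and the fitting procedure is not reproduced. The sketch shows the predicted rate peaking when CF/F0 is an integer, and checks that α = 1.77 gives roughly a 20-dB dynamic range under a 10%–90% definition.

```python
import numpy as np

ALPHA = 1.77  # exponent in Eq. (1), from the text


def roex_weight(f, fc, slope):
    """Symmetric rounded-exponential power weighting W(g) = (1 + slope*g) * exp(-slope*g)."""
    g = np.abs(f - fc) / fc
    return (1.0 + slope * g) * np.exp(-slope * g)


def filter_output_rms(f0, fc, slope, amp=1.0, f_max=10_000.0):
    """r.m.s. amplitude at the filter output for equal-amplitude harmonics 2..f_max/f0."""
    n = np.arange(2, int(f_max // f0) + 1)  # n = 1 omitted: missing fundamental
    return np.sqrt(np.sum(roex_weight(n * f0, fc, slope) * amp**2 / 2.0))


def rate_eq1(p, r_sp, r_dmax, p50, alpha=ALPHA):
    """Eq. (1): r = r_sp + r_dmax * p**alpha / (p**alpha + p50**alpha)."""
    return r_sp + r_dmax * p**alpha / (p**alpha + p50**alpha)


def single_fiber_rate(f0, cf, slope, r_sp=5.0, r_dmax=150.0, p50=1.0, amp=1.0):
    """Predicted mean rate (spikes/s) of one model fiber for a complex tone of fundamental f0."""
    return rate_eq1(filter_output_rms(f0, cf, slope, amp), r_sp, r_dmax, p50)


# Hypothetical fiber: CF = 2 kHz, roex slope 30 (ERB = 4 * CF / slope, about 267 Hz).
cf, slope = 2000.0, 30.0
for h in (3.0, 3.5, 4.0, 4.5, 5.0):  # harmonic number CF/F0
    print(f"CF/F0 = {h:.1f}: {single_fiber_rate(cf / h, cf, slope):6.1f} spikes/s")

# Dynamic range (10% to 90% of the maximum driven rate) implied by alpha = 1.77:
print(f"dynamic range = {20 * np.log10(81 ** (1 / ALPHA)):.1f} dB")  # 21.6 dB
```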
While the population model has no free parameters, five fixed (i.e., stimulus-independent) parameters still need to be specified for each fiber in the modeled population. These parameters were selected so as to meet two separate requirements: 1) the model's normalized driven rate must vary smoothly with CF, and 2) the model must completely specify the Poisson distribution of spike counts for each fiber so that the maximum-likelihood method can be applied. To meet these requirements, three of the population-model parameters were directly obtained from the corresponding parameters for the single-fiber model: the center frequency of the band-pass filter (effectively the CF), the spontaneous rate rsp, and the maximum driven rate rdmax. The sensitivity parameter p50 in the population model was set to the median value of this parameter over our fiber sample. Finally, the bandwidth of the band-pass filter was derived from its center frequency by assuming a power-law relationship between the two (Shera et al. 2002). The parameters of this power function were obtained by fitting a straight line in double logarithmic coordinates to a scatter plot of filter bandwidth against center frequency for our sample of fibers.
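The bandwidth rule can be sketched as a least-squares line in log-log coordinates, bw = a·CF^b. The synthetic scatter (a = 1.5, b = 0.6, 10% multiplicative noise) is hypothetical and is not the fitted result from the fiber sample; the helper names are illustrative.

```python
import numpy as np


def fit_power_law(cf_hz, bw_hz):
    """Fit bw = a * cf**b by least squares on log10(bw) versus log10(cf)."""
    b, log10_a = np.polyfit(np.log10(cf_hz), np.log10(bw_hz), deg=1)
    return 10.0**log10_a, b


def population_bandwidth(cf_hz, a, b):
    """Bandwidth assigned to a model fiber of a given CF from the fitted power law."""
    return a * np.asarray(cf_hz) ** b


# Hypothetical scatter: bandwidths following bw = 1.5 * cf**0.6 with multiplicative noise.
rng = np.random.default_rng(0)
cf = np.geomspace(300.0, 20_000.0, 50)
bw = 1.5 * cf**0.6 * np.exp(rng.normal(0.0, 0.1, cf.size))
a, b = fit_power_law(cf, bw)
print(f"bw ~= {a:.2f} * CF^{b:.2f}")
print(population_bandwidth([1000.0, 4000.0], a, b))
```

Fitting in log coordinates treats the scatter as multiplicative, which suits bandwidths spanning more than a decade.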
Interspike-interval analysis
As in previous studies of the neural coding of pitch (Cariani and Delgutte 1996a,b; Rhode 1995), we derived pitch estimates from pooled interspike-interval distributions. The pooled interval distribution is the sum of the all-order interspike-interval distributions for all the sampled auditory-nerve fibers and is closely related to the summary autocorrelation in the model of Meddis and Hewitt (1991). The single-fiber interval distribution (bin width 0.1 ms) was computed for each F0 using spikes occurring in the same time window as used in the rate analysis.
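A minimal sketch of the pooled all-order interval distribution with 0.1-ms bins. The 20-ms maximum interval and the data layout (a list of fibers, each a list of per-repetition spike-time arrays already restricted to the analysis window) are assumptions made for illustration.

```python
import numpy as np

BIN_S = 1e-4  # 0.1-ms bin width, from the text


def all_order_intervals(spike_times, max_interval):
    """All forward differences t_j - t_i (j > i) no longer than max_interval."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    diffs = t[None, :] - t[:, None]              # diffs[i, j] = t_j - t_i
    diffs = diffs[np.triu_indices(t.size, k=1)]  # keep j > i only
    return diffs[diffs <= max_interval]


def pooled_interval_histogram(trains_per_fiber, max_interval=0.02, bin_s=BIN_S):
    """Sum of all-order interval histograms over every fiber and stimulus repetition.

    trains_per_fiber: list (one entry per fiber) of lists of spike-time arrays (s),
    one array per repetition.
    """
    n_bins = int(round(max_interval / bin_s))
    edges = np.arange(n_bins + 1) * bin_s
    pooled = np.zeros(n_bins)
    for trains in trains_per_fiber:
        for train in trains:
            counts, _ = np.histogram(all_order_intervals(train, max_interval), bins=edges)
            pooled += counts
    return edges, pooled


# One fiber, one repetition, spikes at 0, 1.025 and 2.05 ms: intervals of
# 1.025 ms (twice) and 2.05 ms fall in bins 10 and 20.
edges, pooled = pooled_interval_histogram([[[0.0, 0.001025, 0.00205]]])
print(np.nonzero(pooled)[0], pooled[np.nonzero(pooled)])
```

"All-order" means every forward pair of spikes contributes, not just consecutive spikes, which is what makes the histogram equivalent to an autocorrelation of the spike train.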
To derive pitch estimates from pooled interval distributions, we used "periodic templates" that select intervals at a given period and its multiples. Specifically, we define the contrast ratio of a periodic template as the ratio of the weighted mean number of intervals for bins within the template to the weighted mean number of intervals per bin in the entire histogram. The estimated pitch period is the period of the template that maximizes the contrast ratio. In computing the contrast ratio, each interval is weighted by an exponentially decaying function of its length to give greater weight to short intervals. This weighting implements the idea that the lower F0 limit of pitch at about 30 Hz (Pressnitzer et al. 2001) implies that the auditory system is unable to use very long intervals in forming pitch percepts. A 3.6-ms decay time constant was found empirically to minimize the number of octave and suboctave errors in pitch estimation. The statistical reliability of the pitch estimates was assessed by generating 100 bootstrap replications of the pooled interval distribution (using the same resampling techniques as in the rate analysis) and computing a pitch estimate for each bootstrap replication.
RESULTS
Single-fiber cues to resolved harmonics
Figure 3 shows the average discharge rate as a function of complex-tone F0 (harmonics in cosine phase) for two AN fibers with CFs of 952 (A) and 4,026 Hz (B), respectively. Data are plotted against the dimensionless ratio of fiber CF to stimulus F0, which we call harmonic number (lower horizontal axis). Because this ratio varies inversely with F0, F0 increases from right to left along the top axis in these plots. The harmonic number takes an integer value when the CF coincides with one of the harmonics of the stimulus, while it is an odd integer multiple of 0.5 (2.5, 3.5, etc.) when the CF falls halfway between two harmonics. Thus resolved harmonics should appear as peaks in firing rate at integer values of the harmonic number, with valleys in between. This prediction is verified for both fibers at lower values of the harmonic number (higher F0s), although the oscillations are more pronounced and extend to higher harmonic numbers for the high-CF fiber than for the low-CF fiber. This observation is consistent with the higher quality factor (Q = CF/Bandwidth) of high-CF fibers compared with low-CF fibers (Kiang et al. 1965; Liberman 1978).
Figure 4 shows how F0min varies with CF for our entire sample of fibers. To be included in this plot, the variance of the residuals after fitting the single-fiber model to the data had to be significantly smaller (P < 0.05, F-test) than the variance of the raw data so that Nmax (and therefore F0min) could be reliably estimated. Thirty-five of 122 measurements were thus excluded; 23 of these had CFs <2,000 Hz. On the other hand, the figure includes data from fibers (shown by triangles) for which F0min was bounded by the lowest F0 presented and was therefore overestimated. F0min increases systematically with CF, and the increase is well fit by a power function with an exponent of 0.63 (solid line). This increase is consistent with the increase in tuning curve bandwidths with CF (Kiang et al. 1965).
To more directly address the level dependence of responses, we held 24 fibers long enough to record the responses to harmonic complex tones at two or more stimulus levels differing by 10–20 dB. In 23 of these 24 cases, the maximum resolved harmonic number Nmax decreased with increasing level. One example is shown in Fig. 5 for a fiber with CF at 1,983 Hz. For this fiber, Nmax decreased from 7.1 at 20 dB SPL to 4.9 at 30 dB SPL.
Pitch estimation from rate-place profiles
Having characterized the limits of harmonic resolvability in rate responses of AN fibers, the next step is to determine how accurately pitch can be estimated from rate-place cues to resolved harmonics. For this purpose, we fit harmonic templates to profiles of average discharge rate against CF and derive pitch estimates by the maximum likelihood method, assuming that the spike counts from each fiber are random variables with statistically independent Poisson distributions. In our implementation, a harmonic template is the response of a peripheral auditory model to a complex tone with equal-amplitude harmonics. The estimated pitch is therefore the F0 of the complex tone most likely to have produced the observed response if the stimulus-response relationship were defined by the model.
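The estimation procedure can be sketched as a grid search: for each candidate F0, a model gives each fiber's expected spike count λ, and the Poisson log-likelihood Σ(k·log λ − λ) of the observed counts k is maximized (the log k! term does not depend on F0 and is dropped). The simplified template below (roex weighting followed by Eq. 1), the hypothetical population of 40 fibers, the parameter values, and the single 180-ms counting window per fiber are assumptions, not the fitted population model.

```python
import numpy as np

ALPHA = 1.77      # exponent of Eq. (1), from the text
WINDOW_S = 0.180  # counting window (s), from the text


def model_rates(f0, cfs, slopes, r_sp, r_dmax, p50, amp=1.0, f_max=10_000.0):
    """Template: model rate (spikes/s) of every fiber for equal-amplitude harmonics of f0."""
    n = np.arange(2, int(f_max // f0) + 1)  # missing fundamental
    g = np.abs(n[None, :] * f0 - cfs[:, None]) / cfs[:, None]
    w = (1.0 + slopes[:, None] * g) * np.exp(-slopes[:, None] * g)  # roex weights
    p = np.sqrt(np.sum(w * amp**2 / 2.0, axis=1))  # r.m.s. filter output
    return r_sp + r_dmax * p**ALPHA / (p**ALPHA + p50**ALPHA)


def ml_pitch(counts, f0_grid, cfs, slopes, r_sp, r_dmax, p50):
    """F0 on the grid that maximizes the Poisson log-likelihood of the spike counts."""
    best_f0, best_ll = None, -np.inf
    for f0 in f0_grid:
        lam = model_rates(f0, cfs, slopes, r_sp, r_dmax, p50) * WINDOW_S  # expected counts
        ll = np.sum(counts * np.log(lam) - lam)  # log(k!) dropped: independent of F0
        if ll > best_ll:
            best_f0, best_ll = f0, ll
    return best_f0


# Hypothetical population: 40 fibers, CFs 1-5 kHz, roex slope growing with CF.
cfs = np.geomspace(1000.0, 5000.0, 40)
slopes = 20.0 * (cfs / 1000.0) ** 0.37
params = dict(r_sp=5.0, r_dmax=150.0, p50=1.0)

rng = np.random.default_rng(0)
counts = rng.poisson(model_rates(600.0, cfs, slopes, **params) * WINDOW_S)
f0_grid = np.arange(300.0, 1200.0, 1.0)
print("ML pitch estimate:", ml_pitch(counts, f0_grid, cfs, slopes, **params), "Hz")
```

With noise-free counts equal to the template at the true F0, the likelihood is maximized exactly at that F0, since each term k·log λ − λ peaks at λ = k.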
Figure 6 shows the normalized driven discharge rate of AN fibers as a function of CF in response to two complex tones (harmonics in cosine phase) with F0s of 541.5 (A) and 1,564.4 Hz (C). The rate is normalized by subtracting the spontaneous rate and dividing by the maximum driven rate (Sachs and Young 1979), and these parameters are estimated by fitting the single-fiber model to the rate-F0 data. As for the single-fiber responses in Figs. 3 and 5, responses are plotted against the dimensionless harmonic number CF/F0, with the difference that F0 is now fixed while CF varies, instead of the opposite. Resolved harmonics should again result in peaks in firing rate at integer values of the harmonic number. Despite considerable scatter in the data, this prediction is verified for both F0s, although the oscillations are more pronounced for the higher F0. Many factors are likely to contribute to the scatter, including the threshold differences among fibers with the same CF (Liberman 1978), pooling data from two animals, intrinsic variability in neural responses, and inaccuracies in estimating the minimum and maximum discharge rates used in computing the normalized rate.
To assess the reliability of the maximum-likelihood pitch estimates, estimates were computed for 100 bootstrap resamplings of the data for each F0 (see METHODS). Figure 7A shows the median absolute estimation error of these bootstrap estimates as a function of F0 for complex tones with harmonics in cosine phase. With few exceptions, median pitch estimates only deviate by a few percent from the stimulus F0 above 500 Hz. Larger deviations are more common for lower F0s. The number and CF distribution of the fibers had to meet certain constraints for each F0 to be included in the figure because, to reliably estimate F0, the sampling of the CF axis has to be sufficiently dense to capture the harmonically related oscillations in the rate-CF profiles. This is why Fig. 7 shows no estimates for F0s below 220 Hz and for a small subset of F0s (12 of 56) above 220 Hz.
Harmonic templates were fit to rate-place profiles obtained in response to complex tones with harmonics in alternating phase and in Schroeder phase as well as in cosine phase to test whether the pitch estimates depend on phase. Figure 8 shows an example for an F0 of 392 Hz. The numbers of data points differ somewhat for the three phase conditions because we could not always "hold" a unit long enough to measure responses to all three conditions. Despite these sampling differences, the pitch estimates for the three phase conditions are similar to each other (Fig. 8, A–C) and similar to the pitch estimate obtained by combining data across all three phase conditions (Fig. 8D).
This test was performed for three different values of F0 (612, 670, and 828 Hz) in addition to the 392-Hz case shown in Fig. 8.¹ In three of these four cases, the results were as in Fig. 8 in that the differences in maximum likelihoods for the two models did not reach statistical significance (criterion: P < 0.05). For 612 Hz, the comparison did reach significance (P = 0.007), but for this F0, the rate-place profiles for harmonics in alternating and Schroeder phase showed large gaps in the distribution of data points over harmonic numbers, making the reliability of the F0 estimates for these two phases questionable. When the actual pitch estimates for the different phase conditions were compared, there was no clear pattern to the results across F0s; i.e., the pitch estimate for any given phase condition could be the largest in one case and the smallest in another. These results indicate that phase relationships among the partials of a complex tone do not greatly influence the pitch estimated from rate-place profiles, consistent with psychophysical data on the phase invariance of pitch based on resolved harmonics (Houtsma and Smurzynski 1990).
Pitch estimation from pooled interspike-interval distributions
Pitch estimates were derived from pooled interspike-interval distributions to compare the accuracy of these estimates with that of rate-place estimates for the same stimuli. Figure 9, A and B, shows pooled all-order interspike-interval distributions for two complex-tone stimuli with F0s of 320 and 880 Hz (harmonics in cosine phase). For both F0s, the pooled distributions show modes at the period of F0 and its integer multiples. However, these modes are less prominent at the higher F0 for which only the first few harmonics are located in the range of robust phase locking.
We therefore modified our pitch estimation method to make use of all pitch-related modes in the pooled interval distribution rather than just the first one. Specifically, we used periodic templates that select intervals at a given period and its multiples and determined the template F0 that maximizes the contrast ratio, a signal-to-noise measure of the number of intervals within the template relative to the mean number of intervals per bin (see METHODS). When computing the contrast ratio, short intervals were weighted more than long intervals according to an exponentially decaying function of interval length. This weighting implements the psychophysical observation of a lower limit of pitch near 30 Hz (Pressnitzer et al. 2001) by preventing long intervals from contributing significantly to pitch. Figure 9, C and D, shows the template contrast ratio as a function of template F0 for the same two stimuli as in A and B. For both stimuli, the contrast ratio reaches an absolute maximum when the template F0 is very close to the stimulus F0, although the peak contrast ratio is larger for the lower F0. The contrast ratio also shows local maxima one octave above and below the stimulus F0. In Fig. 9C, these secondary maxima are small relative to the main peak at F0, but in Fig. 9D, the maximum at F0/2 is almost as large as the one at F0. Despite the close call, F0 was correctly estimated in both cases of Fig. 9, and overall, our pitch estimation algorithm produced essentially no octave or suboctave errors over the entire range of F0 investigated (110–3,520 Hz).
Figure 10 shows measures of the accuracy and strength of the interval-based pitch estimates as a function of F0 for harmonics in cosine phase. The accuracy measure is the median absolute value of the pitch estimation error over bootstrap replications of the pooled interval distributions. The estimates are highly accurate below 1,300 Hz, where their medians are within 1–2% of the stimulus F0 (Fig. 10A). However, the interval-based estimates of pitch abruptly break down near 1,300 Hz. While the existence of such an upper limit is consistent with the degradation in phase locking at high frequencies, the location of this limit at 1,300 Hz is low compared with the 4- to 5-kHz upper limit of phase locking, a point to which we return in the DISCUSSION.
For a few F0s, interval-based estimates of pitch were derived for complex tones with harmonics in alternating phase and in Schroeder phase as well as in cosine phase. Figure 11 compares the pooled all-order interval distributions in the three phase conditions for two F0s: 130 Hz (left) and 612 Hz (right). Based on the rate-place results, the harmonics of the 130-Hz F0 are not resolved, whereas some of the harmonics of the 612-Hz F0 are resolved. We infer this from the fact that we obtained a reliable pitch estimate from rate-place profiles at 612 Hz but not at 130 Hz (Fig. 7).
For the lower F0 (130 Hz), the interval-based pitch estimates are nearly identical for all three phase conditions, but the maximum contrast ratio is substantially lower for harmonics in alternating phase than for harmonics in cosine or in Schroeder phase (Fig. 11D). In addition, for harmonics in alternating phase, the contrast ratio of the periodic template at the envelope frequency 2 × F0 is almost as large as the contrast ratio at F0. In contrast, for the higher F0 (612 Hz), there are no obvious differences between phase conditions in the pooled all-order interval distributions (Fig. 11, E–G). In particular, the secondary peaks at half the period of F0, which were found at 130 Hz for the alternating-phase stimulus, are no longer present at 612 Hz. Moreover, the maximum contrast ratios are essentially the same for all three phase conditions (Fig. 11H).
Overall, these results show that, while phase relationships among harmonics have little effect on the pitch values estimated from pooled interval distributions, which are always close to the stimulus F0, the salience of these estimates can be significantly affected by phase when harmonics are unresolved. These results are consistent with psychophysical results showing a greater effect of phase on pitch and pitch salience for stimuli consisting of unresolved harmonics than for stimuli containing resolved harmonics (Houtsma and Smurzynski 1990; Shackleton and Carlyon 1994). However, these results fail to account for the observation that the dominant pitch is often heard at the envelope frequency 2 × F0 for unresolved harmonics in alternating phase.
DISCUSSION
We examined the responses of cat AN fibers to complex tones with a missing fundamental and equal-amplitude harmonics. We used low and moderate stimulus levels (15–20 dB above threshold) to minimize rate saturation, which would prevent us from accurately assessing cochlear frequency selectivity, and therefore harmonic resolvability, from rate responses. In general, the average rate of a single AN fiber was higher when its CF was near a low-order harmonic of a complex tone than when the CF fell halfway between two harmonics (Fig. 3). This trend could be predicted using a phenomenological model of single-fiber rate responses incorporating a band-pass filter representing cochlear frequency selectivity (Fig. 2). The amplitude of the oscillations in the response of the best-fitting single-fiber model, relative to the typical variability in the data, gave an estimate of the lowest F0 of complex tones whose harmonics are resolved at a given CF (Fig. 3). This limit, which we call F0min, increases systematically with CF, and this increase is well fit by a power function with an exponent of 0.63 (Fig. 4). That the exponent is less than 1 is consistent with the progressive sharpening of peripheral tuning with increasing CF when expressed as a Q factor, the ratio CF/Bandwidth. If bandwidth scales like F0min, the exponent for Q would be 1 − 0.63 = 0.37, which closely matches the 0.37 exponent found by Shera et al. (2002) for the CF dependence of Q10 in pure-tone tuning curves from AN fibers in the cat.
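The step from the F0min exponent to the Q exponent can be written out as follows, assuming (as the argument implicitly does) that filter bandwidth is proportional to F0min. The worked numbers only illustrate what the fitted exponents imply for a doubling of CF; they are not additional data.

```latex
% Assumption: BW is proportional to F0min, and F0min scales as CF^0.63 (Fig. 4).
F_{0,\min} \propto \mathrm{CF}^{0.63}, \qquad \mathrm{BW} \propto F_{0,\min}
\;\Longrightarrow\;
Q = \frac{\mathrm{CF}}{\mathrm{BW}} \propto \frac{\mathrm{CF}}{\mathrm{CF}^{0.63}} = \mathrm{CF}^{0.37}

% Worked example: doubling CF
\frac{F_{0,\min}(2\,\mathrm{CF})}{F_{0,\min}(\mathrm{CF})} = 2^{0.63} \approx 1.55,
\qquad
\frac{Q(2\,\mathrm{CF})}{Q(\mathrm{CF})} = 2^{0.37} \approx 1.29
```

Because both factors exceed 1, absolute bandwidth widens with CF while relative tuning (Q) still sharpens, which is why more harmonics are resolved at high CFs even though F0min rises.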
Our definition of the lower limit of resolvability F0min is to some extent arbitrary because it depends on the variability in the average discharge rates, which in turn depends on the number of stimulus repetitions and the duration of the stimulus. Nevertheless, our results are consistent with those of Wilson and Evans (1971
) for AN fibers in the guinea pig using ripple noise (comb-filtered noise), a stimulus with broad spectral maxima at harmonically related frequencies. These authors found that the number of such maxima that can be resolved in the rate responses of single fibers (equivalent to our Nmax) increases with CF from 2–3 at 200 Hz to about 10 at 10 kHz and above. Similarly, Smoorenburg and Linschoten (1977
) reported that the number of harmonics of a complex tone that are resolved in the rate responses of single units in the cat anteroventral cochlear nucleus (AVCN) increases from 2 at 250 Hz to 13 at 10 kHz. Despite the different metrics used to define resolvability, both studies are in good agreement with the data of Fig. 4 if we use the conversion F0min = CF/Nmax.
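The conversion F0min = CF/Nmax can be applied directly to the endpoints quoted above. In this sketch, the value 2.5 is an assumed midpoint of the "2–3" harmonics reported at 200 Hz; the other numbers are the endpoints given in the text.

```python
def f0min_from_nmax(cf_hz, nmax):
    """Lowest resolvable F0 implied by resolving nmax harmonics at a given CF."""
    return cf_hz / nmax

# Endpoints from Wilson and Evans (1971) and Smoorenburg and Linschoten (1977);
# 2.5 is an assumed midpoint of "2-3".
examples = {
    "guinea pig AN, CF 200 Hz": f0min_from_nmax(200.0, 2.5),
    "guinea pig AN, CF 10 kHz": f0min_from_nmax(10_000.0, 10),
    "cat AVCN, CF 250 Hz": f0min_from_nmax(250.0, 2),
    "cat AVCN, CF 10 kHz": f0min_from_nmax(10_000.0, 13),
}
for label, f0 in examples.items():
    print(f"{label}: F0min ~ {f0:.0f} Hz")
```

The converted values can then be placed on the same axes as the F0min data of Fig. 4 for comparison.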
Consistent with a previous report for AVCN neurons (Smoorenburg and Linschoten 1977
), we found that the ability of AN fibers to resolve harmonics in their rate responses degrades rapidly with increasing stimulus level. This degradation could be due either to the broadening of cochlear tuning with increasing level or to saturation of the average rate. Saturation seems the more likely explanation because a single-fiber model with level-dependent bandwidth did not fit the data significantly better than a model with fixed bandwidth. However, the level dependence of cochlear filter bandwidths might have a greater effect on responses to complex tones if level were varied over a wider range than the 10–20 dB used here (Cooper and Rhode 1997
; Ruggero et al. 1997
).
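Why saturation alone can erase harmonic resolvability, even with a fixed filter, can be illustrated with a toy rate-level function. Everything here is hypothetical: the logistic rate-level shape, its 25-dB dynamic range, the spontaneous and maximum rates, and the assumption that the fixed filter passes 10 dB less energy when CF lies midway between two harmonics than when it lies on a harmonic.

```python
import math

def rate_level(level_db, spont=5.0, max_rate=250.0, dyn_range_db=25.0):
    """Hypothetical sigmoidal rate-level function (spikes/s); level_db is re threshold."""
    slope = 8.0 / dyn_range_db  # logistic rises from ~0.02 to ~0.98 over dyn_range_db
    x = slope * (level_db - dyn_range_db / 2.0)
    return spont + (max_rate - spont) / (1.0 + math.exp(-x))

def oscillation_depth(level_db, dip_db=10.0):
    """Rate with CF on a harmonic minus rate with CF midway between two harmonics,
    assuming the fixed filter passes dip_db less energy at the midpoint."""
    return rate_level(level_db) - rate_level(level_db - dip_db)

for level in (20, 30, 40, 60):
    print(f"{level} dB re threshold: depth = {oscillation_depth(level):.1f} spikes/s")
```

Because the filter is held fixed, the shrinking rate difference at higher levels comes entirely from the compressive rate-level function, which is the mechanism favored in the text.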
Rate-place representation of pitch
A major finding is that the pitch of complex tones could be reliably and accurately estimated from rate-place profiles for fundamental frequencies above 400–500 Hz by fitting a harmonic template to the data (Figs. 6 and 7, A and B). The harmonic template was implemented as the response of a simple peripheral auditory model to a harmonic complex tone with equal-amplitude harmonics, and the estimated pitch was the F0 of the complex tone most likely to have produced the rate-place data, assuming that the stimulus-response relationship is characterized by the model. Despite the nonuniform sampling of CFs and the moderate number of fibers sampled at each F0 (typically 20–40), these pitch estimates were accurate to within a few percent.
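The template-fitting step can be sketched as a grid search over candidate F0s. This is a minimal illustration, not the study's implementation: the Gaussian filter shape, the 10% relative bandwidth (`rel_bw`), the saturating rate function, the known minimum and maximum rates, the simulated CFs, and the equal-variance Gaussian rate noise are all assumptions. Under that noise assumption, the most likely F0 is the one minimizing the squared error between the data and the template.

```python
import math
import random

def template_rate(cf, f0, r_min=10.0, r_max=200.0, rel_bw=0.1, max_freq_hz=10_000.0):
    """Hypothetical single-fiber model: rate driven by harmonics 2*f0, 3*f0, ...
    (fundamental missing) passed through a Gaussian filter centered at cf."""
    sd = rel_bw * cf
    drive = 0.0
    k = 2
    while k * f0 <= max_freq_hz:
        drive += math.exp(-0.5 * ((k * f0 - cf) / sd) ** 2)
        k += 1
    return r_min + (r_max - r_min) * drive / (1.0 + drive)

def estimate_f0(cfs, rates, f0_grid):
    """Grid F0 minimizing squared error; this is maximum likelihood only if
    rate noise is Gaussian with equal variance across fibers (an assumption)."""
    def sse(f0):
        return sum((r - template_rate(cf, f0)) ** 2 for cf, r in zip(cfs, rates))
    return min(f0_grid, key=sse)

# Simulated rate-place profile: 40 fibers with log-uniform CFs, 10 spikes/s noise.
random.seed(1)
true_f0 = 800.0
cfs = sorted(math.exp(random.uniform(math.log(1000), math.log(5000))) for _ in range(40))
rates = [template_rate(cf, true_f0) + random.gauss(0, 10) for cf in cfs]
f0_hat = estimate_f0(cfs, rates, [float(f) for f in range(400, 1201)])
print(f"true F0 = {true_f0:.0f} Hz, estimate = {f0_hat:.0f} Hz")
```

Subharmonic candidates place template peaks where the simulated data dip between harmonics, and octave-up candidates miss the odd harmonics, so both fit poorly when the harmonics are resolved.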
Pitch estimation became progressively less reliable for F0s below 400–500 Hz, with large estimation errors becoming increasingly common. Nevertheless, some reliable estimates could be obtained for F0s as low as 250 Hz. This result is consistent with the failure of previous studies to identify rate-place cues to pitch in AN responses to harmonic complex tones with F0s below 300 Hz (Hirahara et al. 1996
; Sachs and Young 1979
; Shamma 1985a
,b
), although Hirahara et al. did find a weak representation of the first two to three harmonics in rate-place profiles for vowels with an F0 at 350 Hz.
In interpreting these results, it is important to keep in mind that the precision of the rate-based pitch estimates depends on many factors such as the number of fibers sampled, the CF distribution of the fibers, pooling of data from two animals, the number of stimulus repetitions, and the particular method for fitting harmonic templates. For example, since the lowest CF sampled was 450 Hz, the second harmonic and, in some cases, the third could not be represented in the rate-place profiles for F0s <220 Hz, possibly explaining why we never obtained a reliable pitch estimate in that range. In fact, because our stimuli had missing fundamentals, we cannot rule out that the fundamental might always be resolved when it is present.
In one respect, our method may somewhat overestimate the accuracy of the rate-based pitch estimates because we only included data from measurements for which the rate response as a function of F0 oscillated sufficiently to be able to reliably fit a single-fiber model. This constraint was necessary because, for responses that do not oscillate, we could not reliably estimate the minimum and maximum discharge rates that are essential in fitting harmonic templates to the rate-place data. Thirty-five of 122 responses were thus excluded. Because our design minimizes rate saturation, and because 23 of these 35 excluded responses were from fibers with CFs <2 kHz, we infer that insufficient frequency selectivity for resolving harmonics rather than rate saturation was the primary reason for the lack of F0-related oscillations in these measurements.
A factor whose effect on pitch estimation performance is hard to evaluate is that the rate-place profiles included responses to stimuli presented at different sound levels. At first sight, pooling data across levels might seem to increase response variability and therefore decrease estimation performance. However, because the stimulus level was usually selected to be 15–20 dB above the threshold of each fiber so that responses would be robust without being saturated, our procedure might actually have reduced the variability due to threshold differences among fibers. The rationale for this procedure is that an optimal central processor would focus on unsaturated fibers because these fibers are the most informative. Because level relative to threshold, rather than absolute level, is the primary determinant of rate responses, we are effectively invoking a form of the "selective listening hypothesis" (Delgutte 1982
, 1987
; Lai et al. 1994
), according to which the central processor attends to low-threshold, high-spontaneous rate fibers at low levels and to high-threshold, low-spontaneous rate fibers at high levels.
Our harmonic template differs from those typically used in pattern recognition models of pitch in that it has very broad peaks at the harmonic frequencies. Most pattern recognition models (Duifhuis et al. 1982
; Goldstein 1973
; Terhardt 1974
) use very narrow templates or "sieves," typically a few percent of each harmonic's frequency. One exception is the model of Wightman (1973
), which effectively uses broad cosinusoidal templates by performing a Fourier transform operation on the spectrum. Our method also resembles the Wightman model and differs from the other models in that it avoids an intermediate, error-prone stage that estimates the frequencies of the individual resolved harmonics; rather, a global template is fit to the entire rate-place profile. Broad templates are well adapted to the measured rate-place profiles because the dips between the harmonics are often sharper than the peaks at the harmonic frequencies (Figs. 6 and 8). On the other hand, the templates are the response of the peripheral model to complex tones with equal-amplitude harmonics, which exactly match the stimuli that were presented. It remains to be seen how well such templates would work when the spectral envelope of the stimulus is unknown or when the amplitudes of the individual harmonics are roved from trial to trial, conditions that cause little degradation in psychophysical performance (Bernstein and Oxenham 2003a
; Houtsma and Smurzynski 1990
).
Given the uncertainties about how the various factors discussed above may affect our pitch estimation procedure, a comparison of pitch estimation performance with psychophysical data should focus on robust overall trends as a function of stimulus parameters rather than on absolute measures of performance. Both the precision of the pitch estimates (Fig. 7A) and their salience (as measured by the Fisher information; Fig. 7B) improve with increasing F0 as the harmonics of the complex become increasingly resolved. This result agrees with psychophysical observations that both pitch strength and pitch discrimination performance improve as the degree of harmonic resolvability increases (Bernstein and Oxenham 2003b
; Carlyon and Shackleton 1994
; Houtsma and Smurzynski 1990
; Plomp 1967
; Ritsma 1967
). However, the continued increase in Fisher information with F0 beyond 1,000 Hz conflicts with the existence of an upper limit to the pitch of missing-fundamental stimuli, which occurs at about 1,400 Hz in humans (Moore 1973b
). This discrepancy between the rapid degradation in pitch discrimination at high frequencies and the lack of a concomitant degradation in cochlear frequency selectivity is a general problem for place models of pitch perception and frequency discrimination (Moore 1973a
).
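As a generic illustration of Fisher information as a salience measure, the sketch below uses the standard formula for independent fibers with Gaussian rate noise of common standard deviation σ: I(F0) = Σᵢ (∂μᵢ/∂F0)² / σ², together with the Cramér-Rao bound 1/√I on the standard deviation of any unbiased estimate. The Gaussian-noise assumption, the hypothetical rate model, and the CF sampling are illustrative choices, not the computation used for Fig. 7B.

```python
import math

def fisher_information(mean_rate, f0, cfs, sigma, df=0.5):
    """Fisher information about f0 from independent fibers whose rates are Gaussian
    with mean mean_rate(cf, f0) and common s.d. sigma; derivative by central difference."""
    total = 0.0
    for cf in cfs:
        slope = (mean_rate(cf, f0 + df) - mean_rate(cf, f0 - df)) / (2.0 * df)
        total += slope ** 2
    return total / sigma ** 2

def cramer_rao_bound(info):
    """Lower bound on the standard deviation of any unbiased F0 estimate."""
    return 1.0 / math.sqrt(info)

def model_rate(cf, f0, rel_bw=0.1):
    """Hypothetical rate model: Gaussian filter, harmonics 2*f0 ... up to 10 kHz."""
    sd = rel_bw * cf
    drive = sum(math.exp(-0.5 * ((k * f0 - cf) / sd) ** 2)
                for k in range(2, int(10_000 // f0) + 1))
    return 10.0 + 190.0 * drive / (1.0 + drive)

cfs = [1000.0 * 2 ** (i / 8) for i in range(20)]  # 20 CFs, 8 per octave from 1 kHz
for f0 in (300.0, 600.0, 1000.0):
    info = fisher_information(model_rate, f0, cfs, sigma=10.0)
    print(f"F0 = {f0:.0f} Hz: I = {info:.3g} per Hz^2, CRLB = {cramer_rao_bound(info):.2f} Hz")
```

The numbers printed depend entirely on the assumed model and CF sampling; the point is only how rate slopes with respect to F0 translate into an information measure.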
We also found that the relative phases of the resolved harmonics of a complex tone do not greatly influence rate-based estimates of pitch (Fig. 8). This result is consistent with expectations for a purely place representation of pitch, as well as with psychophysical results for stimuli containing resolved harmonics (Houtsma and Smurzynski 1990
; Shackleton and Carlyon 1994
; Wightman 1973
).
The restriction of our data to low and moderate stimulus levels raises the question of whether the rate-place representation of pitch would remain robust at the higher stimulus levels typically used in speech communication or when listening to music. Previous studies have used signal detection theory to quantitatively assess the ability of rate-place information in the AN to account for behavioral performance in tasks such as intensity discrimination (Colburn et al. 2003
; Delgutte 1987
; Viemeister 1988
; Winslow and Sachs 1988
; Winter and Palmer 1991
) and formant-frequency discrimination for vowels (Conley and Keilson 1995
; May et al. 1996
). These studies give a mixed message. On the one hand, the rate-place representation generally contains sufficient information to account for behavioral performance up to the highest sound levels tested. On the other hand, because the fraction of high-threshold fibers is small compared to low-threshold fibers, predicted performance of optimal processor models degrades markedly with increasing level, whereas psychophysical performance remains stable. Thus while a rate-place representation cannot be ruled out, it fails to account for a major trend in the psychophysical data. Extending this type of analysis to pitch discrimination for harmonic complex tones is beyond the scope of this paper. Given the failure of the rate-place representation to account for the level dependence of performance in the other tasks, a more productive approach may be to explore alternative spatio-temporal representations that would rely on harmonic resolvability like the rate-place representation, but would be more robust with respect to level variations by exploiting phase locking (Heinz et al. 2001
; Shamma 1985a
). Preliminary tests of one such spatio-temporal representation are encouraging (Cedolin and Delgutte 2005b
).
Interspike-interval representation of pitch
Our results confirm previous findings (Cariani and Delgutte 1996a
,b
; Palmer 1990
; Palmer and Winter 1993
) that fundamental frequencies of harmonic complex tones are precisely represented in pooled all-order interspike-interval distributions of the AN. These interval distributions have prominent modes at the period of F0 and its integer multiples (Fig. 9, A and B). Pitch estimates derived using periodic templates that select intervals at a given period and its multiples were highly accurate (often within 1%) for F0s up to 1,300 Hz (Fig. 10). The determination of this upper limit to the interval-based representation of pitch is a new finding. Moreover, the use of periodic templates for pitch estimation improves on the traditional method of picking the largest mode in the interval distribution by greatly reducing suboctave errors.
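A periodic template can be sketched as a sum over the interval density at a candidate period and its multiples. This is a minimal sketch, not the study's exact template: the histogram is synthetic (a cosine-modulated density with modes every 2 ms plus noise), it is assumed to be already normalized so that 1 means chance, and subtracting that chance level at each tooth is our assumption. Under it, a suboctave candidate collects only half as many teeth on the modes, while an octave-up candidate places every other tooth in a trough.

```python
import math
import random

def periodic_template_score(hist, bin_width_s, f0_hz, max_lag_s):
    """Sum of (normalized interval density - 1) at 1/f0 and its multiples up to
    max_lag_s; hist[i] is the density at lag i*bin_width_s (1 = chance level)."""
    score = 0.0
    k = 1
    while k / f0_hz <= max_lag_s:
        idx = round(k / f0_hz / bin_width_s)
        if idx < len(hist):
            score += hist[idx] - 1.0
        k += 1
    return score

def estimate_pitch(hist, bin_width_s, max_lag_s, f0_grid):
    """Candidate F0 whose periodic template best matches the interval distribution."""
    return max(f0_grid, key=lambda f0: periodic_template_score(hist, bin_width_s, f0, max_lag_s))

# Synthetic normalized pooled interval distribution for F0 = 500 Hz (not recorded data).
random.seed(0)
BIN_S, MAX_LAG_S, TRUE_F0 = 1e-5, 0.02, 500.0
n_bins = int(round(MAX_LAG_S / BIN_S)) + 1
hist = [1.0 + 0.5 * math.cos(2 * math.pi * TRUE_F0 * i * BIN_S) + random.gauss(0, 0.1)
        for i in range(n_bins)]
f0_hat = estimate_pitch(hist, BIN_S, MAX_LAG_S, [float(f) for f in range(100, 1501)])

# Naive alternative: the single largest mode beyond 0.5 ms, which may sit at any multiple.
largest_mode_lag = max(range(50, n_bins), key=lambda i: hist[i]) * BIN_S
print(f"template estimate: {f0_hat:.0f} Hz; largest-mode estimate: {1 / largest_mode_lag:.0f} Hz")
```

Because every mode in this synthetic distribution has the same expected height, the largest-mode rule can land on a multiple of the period, whereas the template pools evidence across all multiples.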
While the existence of an upper limit to the representation of pitch in interspike intervals is expected from the degradation in phase locking at high frequencies, the location of this limit at 1,300 Hz is low compared with the usually quoted 4- to 5-kHz limit of phase locking in the AN (Johnson 1980
; Rose et al. 1967
). Of course, both the limit of pitch representation and the limit of phase locking depend to some extent on the signal-to-noise ratio of the data, which in turn depends on the duration of the stimulus, the number of stimulus repetitions and, for pooled interval distributions, the number of sampled fibers. However, the discrepancy between the two limits appears too large to be entirely accounted for by differences in signal-to-noise ratio. Fortunately, the discrepancy can be largely reconciled by taking into account harmonic resolvability and the properties of our stimuli. For F0s near 1,300 Hz, all the harmonics within the CF range of our data (450–9,200 Hz) are well resolved (Fig. 4), so that information about pitch in pooled interval distributions must depend on phase locking to individual resolved harmonics rather than on phase locking to the envelope generated by interactions between harmonics within a cochlear filter. Moreover, because our stimuli have missing fundamentals, an unambiguous determination of pitch from the pooled distribution requires phase locking to at least two resolved harmonics. As F0 increases above 1,300 Hz, the third harmonic (3,900 Hz at F0 = 1,300 Hz) begins to exceed the upper limit of phase locking, leaving only ambiguous pitch information and therefore leading to severe estimation errors.
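The harmonic-counting argument above can be made explicit. The sketch uses the 450–9,200 Hz CF range and the "at least two phase-locked harmonics" criterion from the text; the effective phase-locking limit of 4,000 Hz is an assumption, chosen so that the 3,900-Hz third harmonic of a 1,300-Hz F0 sits just inside it.

```python
def harmonics_in_cf_range(f0_hz, cf_lo_hz=450.0, cf_hi_hz=9200.0):
    """Harmonic numbers n >= 2 (fundamental missing) with n*f0 inside the sampled CF range."""
    return [n for n in range(2, int(cf_hi_hz // f0_hz) + 1) if n * f0_hz >= cf_lo_hz]

def phase_locked_harmonics(f0_hz, locking_limit_hz=4000.0):
    """Those harmonics lying below an assumed effective phase-locking limit."""
    return [n for n in harmonics_in_cf_range(f0_hz) if n * f0_hz <= locking_limit_hz]

for f0 in (1000.0, 1300.0, 1400.0, 1600.0):
    usable = phase_locked_harmonics(f0)
    verdict = "unambiguous" if len(usable) >= 2 else "ambiguous"
    print(f"F0 = {f0:.0f} Hz: phase-locked harmonics {usable} -> {verdict}")
```

With a single phase-locked harmonic n·F0, the interval distribution is equally consistent with any F0 that has that frequency as a harmonic, which is the ambiguity described above.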
A major finding is that the range of F0s over which interval-based estimates of pitch are reliable roughly covers the entire human perceptual range of the pitch of missing-fundamental stimuli, which extends up to 1,400 Hz for stimuli containing many harmonics (Moore 1973b
). It is widely recognized that the upper limit of phase locking to pure tones matches the limit in listeners' ability to identify musical intervals (Semal and Demany 1990
; Ward 1954
). The present results extend this correspondence to complex tones with missing F0.
Our results predict that pitch based on pooled interval distributions is strongest for F0s <400 Hz (Fig. 10B), a range for which the decreased effectiveness of pitch estimation based on rate-place information implies that individual harmonics are poorly resolved in the cat. Thus the interval-based representation of pitch seems to have trouble predicting the greater salience of pitch based on resolved harmonics compared to that based on unresolved harmonics (Bernstein and Oxenham 2003b
; Shackleton and Carlyon 1994
). This conclusion based on physiological data supports similar conclusions previously reached for autocorrelation models of pitch perception (Bernstein and Oxenham 2003a
; Carlyon 1998
; Meddis and O'Mard 1997
).
We found that neither the pitch values nor the pitch strength estimated from pooled interval distributions depend on the phase relationships among the harmonics at higher F0s where some harmonics are well resolved (Fig. 11, E–G). This finding is consistent with psychophysical observations on the phase invariance of pitch and pitch salience for stimuli containing resolved harmonics (Carlyon and Shackleton 1994
; Houtsma and Smurzynski 1990
; Wightman 1973
). However, pitch salience and, in some cases, pitch values do depend on phase relationships for stimuli consisting of unresolved harmonics (Houtsma and Smurzynski 1990
; Lundeen and Small 1984
; Ritsma and Engel 1964
). In particular, for stimuli in alternating sine-cosine phase, the pitch often matches the envelope periodicity at 2 x F0 rather than the fundamental (Lundeen and Small 1984
). Consistent with previous studies (Horst et al. 1990
; Palmer and Winter 1992
, 1993
), we found a correlate of this observation in the pooled interval distributions in that our interval-based measure of pitch strength was almost as large at the envelope frequency 2 x F0 as at the fundamental F0 for alternating-phase stimuli with unresolved harmonics (Fig. 11D). Despite this frequency doubling, the pitch values estimated from interval distributions were always at F0 and never at 2 x F0, in contrast to psychophysical judgments for unresolved harmonics in alternating phase (Lundeen and Small 1984
). Thus pitch estimation based on interspike intervals does not seem to be sufficiently sensitive to the relative phases of unresolved harmonics compared to psychophysical data. A similar conclusion has been reached for the autocorrelation model of pitch, and modifications to the model have been proposed in part to handle this difficulty (de Cheveigné 1998
; Patterson and Holdsworth 1994
).
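The envelope doubling for unresolved harmonics in alternating phase can be illustrated at the stimulus level. This sketch computes the Hilbert envelope of an equal-amplitude band of harmonics directly from its analytic signal, z(t) = Σₙ exp[i(2πnF0t + φₙ)]. Assumptions: harmonics 12–20 of a 200-Hz F0 stand in for an unresolved band, and "alternating phase" is taken as odd harmonics in cosine phase and even harmonics in sine phase. The function compares the largest envelope value around t = T/2 with the largest around t = 0.

```python
import cmath
import math

def envelope(t, f0, harmonics, phase_rule):
    """Hilbert envelope |z(t)| of an equal-amplitude harmonic complex."""
    z = sum(cmath.exp(1j * (2 * math.pi * n * f0 * t + phase_rule(n))) for n in harmonics)
    return abs(z)

def cosine_phase(n):
    return 0.0  # every component in cosine phase

def alternating_phase(n):
    return 0.0 if n % 2 else -math.pi / 2  # odd: cosine, even: sine (sin x = cos(x - pi/2))

def half_period_peak_ratio(f0, harmonics, phase_rule, m=4000):
    """Largest envelope value in (T/4, 3T/4) divided by the largest in (-T/4, T/4)."""
    period = 1.0 / f0
    env = [envelope(j * period / m, f0, harmonics, phase_rule) for j in range(m)]
    near_zero = max(env[j] for j in range(m) if j < m // 4 or j >= 3 * m // 4)
    near_half = max(env[j] for j in range(m // 4, 3 * m // 4))
    return near_half / near_zero

band = range(12, 21)  # assumed unresolved band of harmonics
print("cosine phase:", round(half_period_peak_ratio(200.0, band, cosine_phase), 3))
print("alternating phase:", round(half_period_peak_ratio(200.0, band, alternating_phase), 3))
```

For the alternating-phase band, the envelope is symmetric about T/4 (|z(T/2 − t)| = |z(t)|), so it has two equal peaks per period, i.e., periodicity at 2 × F0; in cosine phase there is a single dominant peak per period.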
Vocalizations and pitch perception
A widely held view is that pitch perception for harmonic complex tones is closely linked to the extraction of biologically relevant information from conspecific vocalizations including human speech. For example, Terhardt (1974
) argued: "The virtual pitch cues can be generated only if a learning process previously has been performed... . In that process, the correlations between the spectral-pitch cues of voiced speech sounds ... are recognized and stored. The knowledge about harmonic pitch relations which is acquired in this way is employed by the system in the generation of virtual pitch." While the role of a learning mechanism may be questioned given that the perception of missing-fundamental pitch appears to be already present in young infants (Clarkson and Clifton 1985
; Montgomery and Clarkson 1997
), a link between vocalization and pitch perception is supported by other arguments. Many vertebrate vocalizations, such as the vowels of human speech, contain harmonic complex tones. Because the fundamental component is rarely the most intense component in these vocalizations, it would often be masked in the presence of environmental noise, thereby creating a selective pressure for a missing-fundamental mechanism. The link between vocalization and pitch perception has recently been formalized using a model which predicts many psychophysical pitch phenomena from the probability distribution of human voiced speech sounds, without explicit reference to any specific pitch extraction mechanism (Schwartz and Purves 2004
).
In cats, the fundamental frequency range most important for vocalizations lies at 500–1,000 Hz (Brown et al. 1978
; Nicastro and Owren 2003
; Shipley et al. 1991
). In this range, pitch is robustly represented in both rate-place profiles and pooled interspike-interval distributions. The difficulties encountered by the rate-place representation in the F0 region below 300 Hz, which is the most important for the human voice, may reflect the poorer frequency selectivity of the cat cochlea compared with that of humans (Shera et al. 2002
). If we assume that human cochlear filters are about three times as sharply tuned as cat filters, consistent with the Shera et al. data, the rate-place representation would hold in humans for F0s at least as low as 100 Hz, thereby encompassing most voiced speech sounds. Thus, in humans as well as in cats, the pitch of most conspecific vocalizations may be robustly represented in both rate-place profiles and interspike-interval distributions. Such a dual representation may be advantageous when either mechanism is degraded, by cochlear damage or by central disorders of temporal processing, because it would still allow impaired individuals to extract pitch information.
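The cat-to-human extrapolation is simple proportional scaling. This sketch assumes, as a simplification, that the lower F0 limit of rate-place cues scales in direct proportion to cochlear filter bandwidth, so a threefold sharpening divides the limit by 3. The cat limits fed in are the boundaries mentioned in the text (some reliable estimates down to 250 Hz, difficulties below 300 Hz, reliable estimates above 400–500 Hz).

```python
def scaled_f0_limit(cat_limit_hz, sharpness_ratio=3.0):
    """Lower F0 limit of rate-place pitch cues after sharpening every cochlear filter
    by sharpness_ratio, assuming the limit scales in proportion to bandwidth."""
    return cat_limit_hz / sharpness_ratio

for cat_limit in (250.0, 300.0, 500.0):
    print(f"cat limit {cat_limit:.0f} Hz -> human estimate ~{scaled_f0_limit(cat_limit):.0f} Hz")
```

The 100-Hz figure follows from the 300-Hz boundary; the estimate moves with whichever cat boundary and sharpness ratio are assumed.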
Despite its appeal, the idea that the pitch of conspecific vocalizations has a dual representation in spatial and temporal codes is not likely to hold across vertebrate species. At one extreme is the mustached bat, where the F0s of vocalizations range from 8 to 30 kHz (Kanwal et al. 1994
), virtually ruling out any temporal mechanism. Evidence for perception of the pitch of missing-fundamental stimuli at ultrasonic frequencies is available for one species of bat (Preisler and Schmidt 1998
). At the other extreme is the bullfrog, where the fundamental frequency of vocalizations near 100 Hz appears to be coded in the phase locking of AN fibers to the sound's envelope rather than by a place mechanism (Dear et al. 1993
; Simmons et al. 1990
). Although this species is sensitive to the fundamental frequency of complex tones (Capranica and Moffat 1975
), it is not known whether it experiences a missing fundamental phenomenon similar to that in humans. These examples suggest that a tight link between pitch and vocalization may be incompatible with the existence of a general pitch mechanism common to all vertebrate species. Either different species use separate pitch mechanisms to different degrees or the primary function of the pitch mechanism is not to extract information from conspecific vocalizations, or both.
Conclusion
We compared the effectiveness of two possible representations of the pitch of harmonic complex tones in the responses of the population of AN fibers at low and moderate stimulus levels: a rate-place representation based on resolved harmonics and a temporal representation based on pooled interspike-interval distributions. A major finding is that the rate-place representation was most effective for F0s >400–500 Hz, consistent with previous reports of a lack of rate-place cues to pitch for lower F0s, and with the improvement in cochlear frequency selectivity with increasing frequency. The interspike-interval representation gave precise estimates of pitch for low F0s, but broke down near 1,300 Hz. This upper limit is consistent with the psychophysical limit of the pitch of the missing fundamental for stimuli containing many harmonics, extending to missing-fundamental stimuli the correspondence between the frequency range of phase locking and that of musical pitch. Both rate-place and interspike-interval representations were effective in the F0 range of cat vocalizations, and a similar result may hold for human voice if we take into account the differences in cochlear frequency selectivity between the two species. Consistent with psychophysical data, neither of the two pitch representations was sensitive to the relative phases of the partials for stimuli containing resolved harmonics.
On the other hand, neither representation of pitch is entirely consistent with the psychophysical data. The rate-place representation fails to account for the upper limit of musical pitch and is known to degrade rapidly with increases in sound level and decreases in signal-to-noise ratio. The interval representation has trouble accounting for the greater salience of pitch based on resolved harmonics compared to pitch based on unresolved harmonics and appears to be insufficiently sensitive to phase for stimuli consisting of unresolved harmonics. These conclusions suggest a search for alternative neural codes for pitch that would combine some of the features of place and temporal codes to overcome the limitations of either code. One class of codes that may meet these requirements is that of spatio-temporal codes, which depend on both harmonic resolvability and phase locking (Heinz et al. 2001
; Loeb et al. 1983
; Shamma 1985a
; Shamma and Klein 2000
).
FOOTNOTES |
|---|
1 A programming error resulted in erroneous phase relationships among the harmonics for most F0s. We only performed the test for the four F0 values for which the phases were correct.
Address for reprint requests and other correspondence: L. Cedolin, Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, 243 Charles St., Boston, MA 02114 (E-mail: cedro{at}mit.edu)
REFERENCES |
|---|
Bernstein JG and Oxenham AJ. Pitch discrimination of diotic and dichotic tone complexes: harmonic resolvability or harmonic number? J Acoust Soc Am 113: 3323–3334, 2003b.
Bregman AS. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press, 1990.
Brown CH, Beecher MD, Moody DB, and Stebbins WC. Localization of primate calls by old world monkeys. Science 201: 753–754, 1978.
Capranica RR and Moffat AJM. Selectivity of the peripheral auditory system of spadefoot toads (Scaphiopus couchi) for sounds of biological significance. J Comp Physiol 100: 231–249, 1975.
Cariani PA and Delgutte B. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J Neurophysiol 76: 1698–1716, 1996a.
Cariani PA and Delgutte B. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch. J Neurophysiol 76: 1717–1734, 1996b.
Carlyon RP. Comments on "A unitary model of pitch perception." [J Acoust Soc Am 102: 1811–1820, 1997]. J Acoust Soc Am 104: 1118–1121, 1998.
Carlyon RP and Shackleton TM. Comparing the fundamental frequencies of resolved and unresolved harmonics: evidence for two pitch mechanisms? J Acoust Soc Am 95: 3541–3554, 1994.
Clarkson MG and Clifton RK. Infant pitch perception: evidence for responding to pitch categories and the missing fundamental. J Acoust Soc Am 77: 1521–1528, 1985.
Cedolin L and Delgutte B. Dual representation of the pitch of complex tones in the auditory nerve. Abstr Assoc Res Otolaryngol 26, 2003.
Cedolin L and Delgutte B. Representations of the pitch of complex tones in the auditory nerve. In: Auditory Signal Processing: Physiology, Psychoacoustics, and Models, edited by Pressnitzer D, de Cheveigné A, McAdams S, and Collet L. New York: Springer, 2005a, p. 107–116.
Cedolin L and Delgutte B. Spatio-temporal representation of the pitch of complex tones in the auditory nerve. Abstr Assoc Res Otolaryngol 28, 2005b.
Cohen MA, Grossberg S, and Wyse LL. A spectral network model of pitch perception. J Acoust Soc Am 98: 862–879, 1994.
Colburn HS, Carney LH, and Heinz MG. Quantifying the information in auditory-nerve responses for level discrimination. J Assoc Res Otolaryngol 4: 294–311, 2003.
Conley RA and Keilson SE. Rate representation and discriminability of second formant frequencies for / /-like steady-state vowels in cat auditory nerve. J Acoust Soc Am 98: 3223–3234, 1995.
Cooper NP and Rhode WS. Mechanical responses to two-tone distortion products in the apical and basal turns of the mammalian cochlea. J Neurophysiol 78: 261–270, 1997.
Cynx J and Shapiro M. Perception of missing fundamental by a species of songbird (Sturnus vulgaris). J Comp Psychol 100: 356–360, 1986.
Darwin CJ and Carlyon RP. Auditory grouping. In: The Handbook of Perception and Cognition, vol. 6, Hearing, edited by Moore BCJ. London: Academic, 1995, p. 387–424.
de Cheveigné A. Cancellation model of pitch perception. J Acoust Soc Am 103: 1261–1271, 1998.
Dear SP, Fritz J, Haresign T, Ferragamo M, and Simmons JA. Tonotopic and functional organization in the auditory cortex of the big brown bat. J Neurophysiol 70: 1988–2009, 1993.
Delgutte B. Some correlates of phonetic distinctions at the level of the auditory nerve. In: The Representation of Speech in the Peripheral Auditory System, edited by Carlson R and Granström B. Amsterdam: Elsevier, 1982, p. 131–150.
Delgutte B. Peripheral auditory processing of speech information: implications from a physiological study of intensity discrimination. In: The Psychophysics of Speech Perception, edited by Schouten M. Dordrecht: Nijhoff, 1987, p. 333–353.
Duifhuis H, Willems LF, and Sluyter RJ. Measurement of pitch in speech: An implementation of Goldstein's theory of pitch perception. J Acoust Soc Am 71: 15681580, 1982.[CrossRef][Web of Science][Medline]
Efron B and Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall, 1993.
Goldstein JL. An optimum processor theory for the central formation of the pitch of complex tones. J Acoust Soc Am 54: 14961516, 1973.[CrossRef][Web of Science][Medline]
Heffner H and Whitfield IC. Perception of the missing fundamental by cats. J Acoust Soc Am 59: 915919, 1976.[CrossRef][Web of Science][Medline]
Heinz MG, Colburn HS, and Carney LH. Rate and timing cues associated with the cochlear amplifier: level discrimination based on monaural cross-frequency coincidence detection. J Acoust Soc Am 110: 20652084, 2001.[CrossRef][Web of Science][Medline]
Hirahara T, Cariani PA, and Delgutte B. Representation of low-frequency vowel formants in the auditory nerve. In: Proceedings European Speech Communication Association Workshop on Auditory Basis of Speech Perception. Keele University, UK, July 1519, 1996, p. 8386.
Horst JW, Javel E, and Farley GR. Coding of spectral fine structure in the auditory nerve. II. Level-dependent nonlinear responses. J Acoust Soc Am 88: 26562681, 1990.[CrossRef][Web of Science][Medline]
Houtsma AJM and Smurzynski J. Pitch identification and discrimination for complex tones with many harmonics. J Acoust SocAm 87: 304310, 1990.[CrossRef]
Johnson DH. The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones. J Acoust Soc Am 68: 11151122, 1980.[CrossRef][Web of Science][Medline]
Kanwal JS, Matsumura S, Ohlemiller K, and Suga N. Analysis of acoustic elements and syntax in communication sounds emitted by mustached bats. J Acoust Soc Am 96: 12291254, 1994.[CrossRef][Web of Science][Medline]
Kiang NYS, Moxon EC, and Levine RA. Auditory-nerve activity in cats with normal and abnormal cochleas. In: Sensorineural Hearing Loss. Ciba Foundation Symposium. edited by Wolstenholme G and Knight J. London: J. & A. Churchill, 1970, p. 241273.
Kiang NYS, Watanabe T, Thomas EC, and Clark LF. Discharge Patterns of Single FIbers in the Cat's Auditory Nerve. Cambridge, MA: MIT Press, 1965.
Lai YC, Winslow RL, and Sachs MB. A model of selective processing of auditory-nerve inputs by stellate cells of the antero-ventral cochlear nucleus. J Comput Neurosci 1: 167194, 1994.[CrossRef][Medline]
Liberman MC. Auditory-nerve responses from cats raised in a low-noise chamber. J Acoust Soc Am 63: 442455, 1978.[CrossRef][Web of Science][Medline]
Licklider JCR. A duplex theory of pitch perception. Experientia 7: 128134, 1951.[CrossRef][Web of Science][Medline]
Loeb GE, White MW, and Merzenich MM. Spatial cross-correlation. A proposed mechanism for acoustic pitch perception. Biol Cybern 47: 149163, 1983.[CrossRef][Web of Science][Medline]
Louage DH, van der Heijden M, and Joris PX. Temporal properties of responses to broadband noise in the auditory nerve. J Neurophysiol 91: 20512065, 2004.
Lundeen C and Small AMJ. The influence of temporal cues on the strength of periodicity pitches. J Acoust Soc Am 75: 15781587, 1984.[CrossRef][Web of Science][Medline]
May BJ, Huang AY, Le Prell GS, and Hienz RD. Vowel formant frequency discrimination in cats: comparison of auditory nerve representations and psychophysical thresholds. Aud Neurosci 3: 135162, 1996.
McKinney MF and Delgutte B. A possible neurophysiological basis of the octave enlargement effect. J Acoust Soc Am 73: 16941700, 1999.[CrossRef]
Meddis R and Hewitt MJ. Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I. Pitch identification. J Acoust Soc Am 89: 2866–2882, 1991.
Meddis R and O'Mard L. A unitary model of pitch perception. J Acoust Soc Am 102: 1811–1820, 1997.
Montgomery C and Clarkson M. Infants' pitch perception: masking by low- and high-frequency noises. J Acoust Soc Am 102: 3665–3672, 1997.
Moore BCJ. Frequency difference limens for short-duration tones. J Acoust Soc Am 54: 610–619, 1973a.
Moore BCJ. Some experiments relating to the perception of complex tones. Q J Exp Psychol 25: 451–475, 1973b.
Moore BCJ. Introduction to the Psychology of Hearing. London: Academic, 1990.
Nicastro N and Owren MJ. Classification of domestic cat (Felis catus) vocalizations by naive and experienced human listeners. J Comp Psychol 117: 44–52, 2003.
Ohgushi K. The origin of tonality and a possible explanation of the octave enlargement phenomenon. J Acoust Soc Am 73: 1694–1700, 1983.
Ohm G. Über die Definition des Tones nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Ann Phys Chem 59: 513–565, 1843.
Palmer AR. The representation of the spectra and fundamental frequencies of steady-state single- and double-vowel sounds in the temporal discharge patterns of guinea pig cochlear-nerve fibers. J Acoust Soc Am 88: 1412–1426, 1990.
Palmer AR and Russell IJ. Phase-locking in the cochlear nerve of the guinea pig and its relation to the receptor potential of inner hair cells. Hear Res 24: 1–15, 1986.
Palmer AR and Winter IM. Cochlear nerve and cochlear nucleus responses to the fundamental frequency of voiced speech sounds and harmonic complex tones. In: Auditory Physiology and Perception, edited by Horner K. Oxford: Pergamon, 1992, p. 231–240.
Palmer AR and Winter IM. Coding of the fundamental frequency of voiced speech sounds and harmonic complex tones in the ventral cochlear nucleus. In: Mammalian Cochlear Nuclei: Organization and Function, edited by Mugnaini E. New York: Plenum, 1993, p. 373–384.
Patterson R. Auditory filter shapes derived with noise stimuli. J Acoust Soc Am 59: 640–654, 1976.
Patterson R and Holdsworth J. A functional model of neural activity patterns and auditory images. In: Advances in Speech, Hearing and Language Processing, edited by Ainsworth W. London: JAI, 1994, p. 547–563.
Plomp R. The ear as a frequency analyzer. J Acoust Soc Am 36: 1628–1636, 1964.
Plomp R. Pitch of complex tones. J Acoust Soc Am 41: 1526–1533, 1967.
Preisler A and Schmidt S. Spontaneous classification of complex tones at high and ultrasonic frequencies in the bat, Megaderma lyra. J Acoust Soc Am 103: 2595–2607, 1998.
Pressnitzer D, Patterson RD, and Krumbholz K. The lower limit of melodic pitch. J Acoust Soc Am 109: 2074–2084, 2001.
Rhode WS. Interspike intervals as a correlate of periodicity pitch in cat cochlear nucleus. J Acoust Soc Am 97: 2413–2429, 1995.
Ritsma RJ. Frequencies dominant in the perception of the pitch of complex sounds. J Acoust Soc Am 42: 191–198, 1967.
Ritsma RJ and Engel FL. Pitch of frequency-modulated signals. J Acoust Soc Am 36: 1637–1644, 1964.
Rose JE, Brugge JF, Anderson DJ, and Hind JE. Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey. J Neurophysiol 30: 769–793, 1967.
Ruggero MA, Rich NC, Recio A, Narayan SS, and Robles L. Basilar-membrane responses to tones at the base of the chinchilla cochlea. J Acoust Soc Am 101: 2151–2163, 1997.
Sachs MB and Abbas PJ. Rate versus level functions for auditory-nerve fibers in cats: tone-burst stimuli. J Acoust Soc Am 56: 1835–1847, 1974.
Sachs MB and Young ED. Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate. J Acoust Soc Am 66: 470–479, 1979.
Schroeder MR. Synthesis of low peak-factor signals and binary sequences with low autocorrelation. IEEE Trans Inform Theory 16: 85–89, 1970.
Schwartz DA and Purves D. Pitch is determined by naturally occurring periodic sounds. Hear Res 194: 31–46, 2004.
Seebeck A. Beobachtungen über einige Bedingungen der Entstehung von Tönen. Ann Phys Chem 53: 417–436, 1841.
Semal C and Demany L. The upper limit of "musical" pitch. Music Percept 8: 165–175, 1990.
Shackleton TM and Carlyon RP. The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination. J Acoust Soc Am 95: 3529–3540, 1994.
Shamma SA. Speech processing in the auditory system. I: The representation of speech sounds in the responses of the auditory nerve. J Acoust Soc Am 78: 1612–1621, 1985a.
Shamma SA. Speech processing in the auditory system. II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve. J Acoust Soc Am 78: 1622–1632, 1985b.
Shamma SA and Klein D. The case of the missing pitch templates: how harmonic templates emerge in the early auditory system. J Acoust Soc Am 107: 2631–2644, 2000.
Shera CA, Guinan JJ Jr, and Oxenham AJ. Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements. Proc Natl Acad Sci USA 99: 3318–3323, 2002.
Shipley C, Carterette EC, and Buchwald JS. The effect of articulation on the acoustical structure of feline vocalizations. J Acoust Soc Am 89: 902–909, 1991.
Shofner WP. Temporal representation of rippled noise in the anteroventral cochlear nucleus of the chinchilla. J Acoust Soc Am 90: 2450–2466, 1991.
Siebert WM. Stimulus transformations in the peripheral auditory system. In: Recognizing Patterns, edited by Kolers PA and Eden M. Cambridge: MIT Press, 1968, p. 104–133.
Simmons JA, Moss CF, and Ferragamo M. Convergence of temporal and spectral information into acoustic images of complex sonar targets perceived by the echolocating bat, Eptesicus fuscus. J Comp Physiol A 166: 449–470, 1990.
Smoorenburg GF and Linschoten DH. A neurophysiological study on auditory frequency analysis of complex tones. In: Psychophysics and Physiology of Hearing, edited by Wilson JP. London: Academic, 1977, p. 175–184.
Suta D, Kvasnak E, Popelar J, and Syka J. Representation of species-specific vocalizations in the inferior colliculus of the guinea pig. J Neurophysiol 90: 3794–3808, 2003.
Terhardt E. Pitch, consonance, and harmony. J Acoust Soc Am 55: 1061–1069, 1974.
Tomlinson RWW and Schwartz DWF. Perception of the missing fundamental in nonhuman primates. J Acoust Soc Am 84: 560–565, 1988.
Viemeister NF. Psychophysical aspects of auditory coding. In: Auditory Function. Neurobiological Bases of Hearing, edited by Edelman GM, Gall WE, and Cowan WM. New York: John Wiley, 1988, p. 213–241.
Ward WD. Subjective musical pitch. J Acoust Soc Am 26: 369–380, 1954.
Wightman FL. The pattern-transformation model of pitch. J Acoust Soc Am 54: 407–416, 1973.
Wilson J and Evans E. Grating acuity of the ear: psychophysical and neurophysiological measures of frequency resolving power. In: Proc 7th Int Congr Acoust, Budapest, 1971, p. 397–400.
Winslow RL and Sachs MB. Single-tone intensity discrimination based on auditory-nerve rate responses in backgrounds of quiet, noise, and with stimulation of the crossed olivocochlear bundle. Hear Res 35: 165–190, 1988.
Winter IM and Palmer AR. Intensity coding in low-frequency auditory-nerve fibers of the guinea pig. J Acoust Soc Am 90: 1958–1967, 1991.
Yost WA. The dominance region and ripple noise pitch: a test of the peripheral weighting model. J Acoust Soc Am 72: 416–425, 1982.
Yost WA. Pitch of iterated rippled noise. J Acoust Soc Am 100: 511–518, 1996.