1Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona; and 2Department of Psychology, Utah State University, Logan, Utah
Submitted 19 November 2004; accepted in final form 29 July 2005
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
A simple stimulus for studying spectral segregation based on harmonicity is a complex tone in which one component in a harmonic series is shifted in frequency, or "mistuned" (Hartmann 1988). A component mistuned by 12% or more can be heard as a second sound source, perceived simultaneously with the original harmonic tone but having a different pitch (Hartmann et al. 1990; Moore et al. 1986). Smaller amounts of mistuning that do not lead to the perception of a second tone may still be detectable (Lee and Green 1994; Moore et al. 1985). The mistuned tone appears to engage the same neural mechanisms that underlie the processing of simultaneous competing sounds in more natural listening situations. Sinex et al. (2002b) described the responses of inferior colliculus (IC) neurons to mistuned tones. They varied stimulus parameters including the amount of mistuning and the harmonic number that was mistuned, and found that these stimulus manipulations could produce dramatic changes in the temporal discharge patterns of IC neurons. In contrast, comparable stimulus changes produced only modest quantitative changes in the discharge patterns of auditory nerve fibers (Sinex et al. 2003a). They concluded that a significant transformation in the processing of complex sounds occurred between the auditory nerve and the IC, and that information related to the presence of multiple sound sources might be conveyed by the temporal discharge patterns of IC neurons.
This study extends the findings of Sinex et al. (2002b), who focused on the representation of many different mistuned tones. In the present study, the representation of one particular tone was examined in detail, with the goal of determining how the same stimulus was processed by different neurons. Although the amount of mistuning was not systematically varied, effects of changing additional stimulus parameters including level and component phases were examined. Observations of responses to stimuli synthesized with different component phases and presented at different levels provided a control against the possibility that the distinctive temporally patterned responses observed previously were a fortuitous consequence of the use of sine-phase multicomponent tones. Effects of component phase may also provide evidence about mechanisms that underlie the representation of mistuned tones. Hypotheses about those potential mechanisms were tested by presenting the same stimuli to a simple computational model that simulated the responses of brain stem and IC neurons. The model explored the consequences of combining the responses of simulated peripheral and brain stem neurons in various ways, with the goal of identifying integrative strategies capable of generating discharge patterns like those typically exhibited by IC neurons. Portions of these results were presented as abstracts (Sinex 2004; Sinex and Li 2004; Sinex et al. 2003a).
| METHODS |
|---|
Responses were obtained from single neurons in the IC of the chinchilla. The procedures were similar to those described previously (Sinex et al. 2002b), although several significant changes were made. Different anesthetics and different recording electrodes were used. In addition, complex tones were synthesized with individual components in different phase relations, component amplitudes were normalized to be equal at the eardrum, and different earphones were used. None of these changes produced results that were inconsistent with those of Sinex et al. (2002b). All procedures were approved by the Institutional Animal Care and Use Committee at Arizona State University.
Animals were anesthetized by injection of a mixture of 36 mg/kg ketamine and 4 mg/kg xylazine. Supplemental injections of ketamine or ketamine-xylazine were given as required to maintain a surgical level of anesthesia. Animals were placed in a stereotaxic instrument (David Kopf Instruments) in a sound-attenuating booth (Acoustic Systems, Austin, TX). The skull was opened and a portion of nonauditory cerebral cortex directly above the IC on the right side was aspirated to expose the dorsal surface of the IC. Recording electrodes were placed above the IC at predetermined stereotaxic coordinates referenced to visual landmarks (Nuding et al. 1999), then advanced from outside the sound booth with a hydraulic microdrive (Trent-Wells, Coulterville, CA). The electrode trajectory passed through the central nucleus of the IC, in a parasagittal plane.
In early experiments, tungsten electrodes insulated with Parylene C were used. In those experiments, the locations of selected recording sites were marked with electrolytic lesions made by passing 5 µA of current through the electrode tip for 10 s, and their locations within the IC were confirmed histologically, using standard methods. In later experiments, recordings were obtained with carbon-fiber electrodes (Kation Scientific, Minneapolis, MN). These electrodes were directed toward the same stereotaxic coordinates as in earlier experiments, but recording sites were not marked or confirmed histologically. Most neurons recorded with either electrode type exhibited secure, short-latency responses to tone bursts, and had sharp tuning with characteristic frequencies (CFs) that increased with increasing depth. These properties are consistent with those reported in previous investigations of the central nucleus of the IC (Langner and Schreiner 1988; Merzenich and Reid 1974; Nuding et al. 1999). Based on these physiological properties and the histological observations, it is likely, but not certain, that the data presented here were obtained from neurons in the central nucleus.
Stimuli
The stimuli were complex tones consisting of eight sinusoidal components. Tones were 500 ms long, presented once per second at levels between 10 and 70 dB SPL per component. The amplitudes of individual components were adjusted during synthesis so that they would have equal SPL at the eardrum. The overall SPL was always 9 dB higher than the level of each individual component. In the harmonic tones, the components were the first eight harmonics of a 250-Hz fundamental. In the mistuned tones, the frequency of component 4 was increased by 12%, from 1.000 to 1.120 kHz. In the previous study (Sinex et al. 2002b), this stimulus often elicited responses with a distinctive modulated temporal pattern that was useful as a standard against which changes across stimulus conditions and neurons could be evaluated. That pattern is described in more detail below and is referred to in this report as the "stereotypical pattern."
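The two numerical statements above (the 9-dB difference between overall and per-component level, and the 1.120-kHz mistuned component) can be checked with a short calculation. This is only an illustrative sketch: the function names are hypothetical, and the level calculation assumes that the eight equal-level components add in power.

```python
import math

def overall_spl(level_per_component_db, n_components):
    """Overall level of n equal-level components, assuming power summation."""
    return level_per_component_db + 10.0 * math.log10(n_components)

def mistuned_frequency(f0_hz, harmonic_number, mistuning_percent):
    """Frequency of a harmonic after shifting it upward by a percentage."""
    return f0_hz * harmonic_number * (1.0 + mistuning_percent / 100.0)

# Eight components raise the overall level by 10*log10(8), about 9 dB.
gain_db = overall_spl(0.0, 8)

# Component 4 of a 250-Hz fundamental, mistuned by 12%.
f_mistuned_hz = mistuned_frequency(250.0, 4, 12.0)
```

With these inputs, `gain_db` is close to 9 dB and `f_mistuned_hz` is 1120 Hz, matching the values given in the text.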
The starting phase relation between individual components in the complex tones was also varied. In the "sine" condition, each individual component in the complex tone started in sine phase. In the "cosine" condition, each individual component started in cosine phase. In the "random" conditions, each component had a different, randomly chosen starting phase. As many as three different sets of randomly chosen phases were used. Varying component starting phase changes the waveform and amplitude envelope of the stimulus; however, in informal listening tests there were no audible differences between stimuli synthesized with different starting phases. Using a restricted set of possible starting phases allowed responses to a common stimulus to be compared across neurons. Using more than one set of phases, and more than one randomization, reduced the possibility that the observed responses reflected some idiosyncratic property of one set of phases.
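The phase conditions described above can be sketched as follows. This is a minimal illustration, not the synthesis code used in the study: the sampling rate, the random seed, the unit component amplitudes, and all function names are assumptions introduced here.

```python
import math
import random

def component_frequencies(f0=250.0, n=8, mistuned_harmonic=None, mistuning=0.12):
    """First n harmonics of f0; optionally shift one harmonic upward."""
    freqs = [f0 * k for k in range(1, n + 1)]
    if mistuned_harmonic is not None:
        freqs[mistuned_harmonic - 1] *= (1.0 + mistuning)
    return freqs

def starting_phases(condition, n=8, seed=0):
    """Starting phases (radians) for the sine, cosine, or random condition."""
    if condition == "sine":
        return [0.0] * n
    if condition == "cosine":
        return [math.pi / 2.0] * n
    if condition == "random":
        rng = random.Random(seed)  # seed is an arbitrary, assumed value
        return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    raise ValueError("unknown phase condition: " + condition)

def synthesize(freqs, phases, duration_s=0.5, fs=50000):
    """Sum of equal-amplitude sinusoids; fs is an assumed sampling rate."""
    n_samples = int(round(duration_s * fs))
    return [
        sum(math.sin(2.0 * math.pi * f * i / fs + p) for f, p in zip(freqs, phases))
        for i in range(n_samples)
    ]

# Sine-phase mistuned tone, 500 ms long.
mistuned_wave = synthesize(component_frequencies(mistuned_harmonic=4),
                           starting_phases("sine"))
```

Calling `starting_phases("random", seed=1)` with a different seed would give another randomization, analogous to the multiple random-phase sets mentioned above.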
The waveforms of the harmonic tones were perfectly periodic, with a flat envelope and a period of 4 ms (Fig. 1, A–C). Mistuning changed the envelope and also the fine structure of each waveform (Fig. 1, D–F). With 12% mistuning of component 4, the frequencies of the eight spectral components were integer multiples of 10 Hz, which became the actual fundamental frequency of the mistuned tone. The waveforms exhibited shallow modulation with a period of 100 ms, the reciprocal of 10 Hz. Mistuning did not change the 4-ms spacing of waveform peaks, although the amplitudes of successive peaks were no longer identical as they were in the harmonic tones (Fig. 1, A–C). Mistuning did not change the overall level of the tone.
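The 10-Hz fundamental of the mistuned tone follows from the greatest common divisor of the component frequencies. The short sketch below works through that arithmetic; the variable and function names are illustrative only.

```python
from functools import reduce
from math import gcd

# Component frequencies in Hz (integers, so gcd applies directly).
harmonic_hz = [250 * k for k in range(1, 9)]
mistuned_hz = list(harmonic_hz)
mistuned_hz[3] = 1120  # component 4 shifted from 1000 to 1120 Hz

def fundamental_hz(freqs):
    """Largest frequency of which every component is an integer multiple."""
    return reduce(gcd, freqs)

f0_harmonic = fundamental_hz(harmonic_hz)        # 250 Hz, period 4 ms
f0_mistuned = fundamental_hz(mistuned_hz)        # 10 Hz, period 100 ms
period_mistuned_ms = 1000.0 / f0_mistuned
```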
Stimulus generation, stimulus presentation, and data collection were controlled by computer. Waveforms were digitally synthesized, passed through digital-analog converters, programmable attenuators, and antialiasing filters (all from TDT, Alachua, FL), then delivered to a closed acoustic system incorporating ER2A insert earphones and an ER7 probe-tube microphone (Etymotic Research, Elk Grove Village, IL). The acoustic system was calibrated for each experiment. Tones were nearly always delivered monaurally to the contralateral ear. Binaural presentation was used as a last resort if neurons were unresponsive to monaural stimuli; one example of binaural responses is included in the RESULTS below.
When a neuron was isolated, estimates of CF and (if necessary) binaural sensitivity were obtained manually. After this initial characterization of the neuron, a detailed frequency-response map was obtained with an automated procedure that presented tones at multiple frequencies and levels (Nuding et al. 1999). These data were used to make a more precise estimate of CF and threshold at CF. In most cases the frequency-response map was remeasured in the presence of a fixed-level CF tone. The fixed tone elicited a consistent response that made it possible to estimate the frequency-SPL regions that provided inhibitory input to the neuron.
Responses to 500-ms tone bursts at CF were also obtained and used to classify each neuron according to the shape of its peristimulus time (PST) histogram and first-spike latency (Nuding et al. 1999). From the PST histogram obtained at 20–30 dB above threshold, the ratio of discharge rate in the first 100 ms to the rate during the final 400 ms was calculated. Sustained units responded throughout the CF tone, although the PST histogram often exhibited a peak at tone onset. First-spike latencies were usually <20 ms, and the ratio of the onset discharge rate to the steady-state discharge rate typically ranged between 1 and 5. Pauser units exhibited a short-latency transient response and a second sustained response, separated by a silent interval lasting several milliseconds; for pauser units, first-spike latencies were usually <20 ms, but the ratio of discharge rates was often <1. Some sustained neurons exhibited pauser responses at higher SPLs, so the distinction between these categories should be considered somewhat arbitrary.
Transient units exhibited spikes just after stimulus onset but were silent or nearly silent afterward. First-spike latencies were usually <20 ms, and discharge-rate ratios varied from 5 to infinite. Long-latency units exhibited sustained or transient discharge patterns, but their first-spike latencies were >30 ms. These units also tended to respond with lower discharge rates, and the possibility that at least some of these were located outside the central nucleus cannot be ruled out. Neurons could be assigned to the transient and long-latency categories with more confidence than was the case for the other categories.
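The onset-to-steady-state rate ratio used in this classification can be sketched as below. The function name, the half-open window boundaries, and the example spike times are assumptions made for illustration; the full classification also depended on first-spike latency and on the shape of the PST histogram, which this sketch does not attempt to reproduce.

```python
import math

def onset_to_sustained_ratio(spike_times_ms, onset_end_ms=100.0, tone_end_ms=500.0):
    """Rate in the first 100 ms divided by rate in the final 400 ms.

    spike_times_ms are measured from tone onset. Returns math.inf when
    there are onset spikes but no spikes in the later window.
    """
    n_onset = sum(1 for t in spike_times_ms if 0.0 <= t < onset_end_ms)
    n_late = sum(1 for t in spike_times_ms if onset_end_ms <= t < tone_end_ms)
    onset_rate = n_onset / (onset_end_ms / 1000.0)
    late_rate = n_late / ((tone_end_ms - onset_end_ms) / 1000.0)
    if late_rate == 0.0:
        return math.inf if onset_rate > 0.0 else float("nan")
    return onset_rate / late_rate

# Hypothetical spike times (ms): 2 onset spikes, 4 later spikes.
example_ratio = onset_to_sustained_ratio([5.0, 50.0, 150.0, 250.0, 350.0, 450.0])
```

In this hypothetical example the onset rate is 20 spikes/s and the later rate is 10 spikes/s, giving a ratio of 2, which falls in the range described for sustained units. A response with spikes only near onset gives an infinite ratio, as described for transient units.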
The data presented below are representative of the responses of 101 IC neurons. Of these, 95 neurons were studied with at least one pair of complex tones (a harmonic tone and a mistuned tone that was equivalent in every respect except for the frequency of the mistuned component). Forty-eight of those units were classified as "sustained," 18 units as pauser, 11 as transient, and 18 as long-latency.
Spikes were displayed as PST or cycle histograms with 1-ms bins. Cycle histograms were constructed from spikes occurring between 100 and 500 ms after tone onset, with a 5-ms correction for response latency. A correction was made to include as many spikes as possible in the analysis. A constant correction was used, rather than a unique correction for each neuron, so that time shifts illustrated in the figures below can always be attributed to properties of the stimuli or the neurons, rather than to the latency corrections. The length of the cycle histograms, 200 ms, included two cycles of the fundamental period of the mistuned tone. For some analyses, the discrete Fourier transforms (DFTs) of cycle histograms were calculated; for this calculation, the histograms were regenerated with 256 bins per 200 ms.
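A cycle histogram of the kind described above can be sketched as follows. This is an assumed reading of the procedure rather than the authors' analysis code: the 5-ms latency correction is subtracted from each spike time before the 100 to 500 ms window is applied, and the function name and example spike times are hypothetical.

```python
def cycle_histogram(spike_times_ms, period_ms=200.0, n_bins=200,
                    start_ms=100.0, end_ms=500.0, latency_ms=5.0):
    """Fold latency-corrected spike times onto one analysis cycle.

    With the defaults, the histogram spans 200 ms (two 100-ms cycles of
    the mistuned tone) in 1-ms bins.
    """
    counts = [0] * n_bins
    bin_width = period_ms / n_bins
    for t in spike_times_ms:
        t_corr = t - latency_ms              # constant latency correction
        if start_ms <= t_corr < end_ms:      # keep the sustained response only
            phase_ms = (t_corr - start_ms) % period_ms
            counts[min(int(phase_ms / bin_width), n_bins - 1)] += 1
    return counts

# Hypothetical spike times in ms re: tone onset.
hist = cycle_histogram([105.0, 305.0, 155.5, 50.0])
```

Setting `n_bins=256` gives the finer histogram used for the DFT analysis.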
Complex tones were presented in a quasi-random order intended to counterbalance harmonic and mistuned conditions, and the various phase conditions, as well as cover a range of SPLs. Typically, each complex tone was presented 100 times.
Computational model
The model incorporated three sequentially ordered stages of processing, which loosely correspond to the cochlear, brain stem, and midbrain levels of the auditory pathway. The model simplified or omitted many details of brain stem processing and it did not attempt to simulate membrane conductances, membrane voltages, or spike discharges, as has been done in more sophisticated biophysical models (Borisyuk et al. 2002; Cai et al. 1998a,b; Hewitt and Meddis 1994; Nelson and Carney 2004). Gammatone filters (Patterson et al. 1992), as implemented by Slaney (1994), were used to simulate the frequency selectivity of Stage 1 channels. Stage 2 channels represent neurons in nuclei of the lower auditory brain stem, including those that provide either excitatory or inhibitory input to the IC (Burger and Pollak 2001; Li and Kelly 1992; Oliver and Shneiderman 1991). Stage 2 channels derived their frequency selectivity directly from a single Stage 1 input; no lateral interactions between Stage 2 channels were simulated. However, the output of each Stage 2 channel was low-pass filtered at 400 Hz to extract the response envelope produced by a restricted frequency region of the stimulus. Stage 3 explicitly represented the IC. The key features of Stage 3 were that each channel received input from Stage 2 channels with different CFs, and that excitatory and also inhibitory inputs were combined. The Stage 2 inputs were weighted and summed; the CFs, weights, and number of these inputs are the parameters that were varied in this study. The summed response was low-pass filtered at 200 Hz, to simulate the decreased ability of IC neurons to follow AM, compared with neurons in the lower brain stem (Krishna and Semple 2000; Langner and Schreiner 1988; Rees and Møller 1983). Thus for a Stage 3 channel that received input from "N" Stage 2 frequency bands, the response at Stage 3 (before half-wave rectification and low-pass filtering) was
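$$R_3(t) = \sum_{i=1}^{N} w_i \, R_{2,i}(t)$$

where $R_{2,i}(t)$ is the output of the $i$th Stage 2 channel and $w_i$ is the weight assigned to that input.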
At each stage of the model, the output of a channel was an analog waveform; spikes or spike trains were not generated. Temporal details of these output waveforms can be compared with the temporal discharge patterns of actual neurons.
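The Stage 3 operations described above (weighted summation of Stage 2 outputs, half-wave rectification, then low-pass filtering at 200 Hz) can be sketched as follows. This is a toy illustration under several assumptions that are not taken from the paper: the low-pass filter is a first-order recursive filter, the Stage 2 outputs are replaced by raised-cosine envelopes at 250 and 130 Hz instead of gammatone-filtered waveforms, inhibition is represented as a negative weight, and all function names are hypothetical.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order recursive low-pass filter (assumed filter type)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for v in x:
        state += alpha * (v - state)
        y.append(state)
    return y

def stage3(stage2_outputs, weights, fs, cutoff_hz=200.0):
    """Weighted sum of Stage 2 waveforms, half-wave rectified, low-passed."""
    n = len(stage2_outputs[0])
    summed = [sum(w * ch[i] for w, ch in zip(weights, stage2_outputs))
              for i in range(n)]
    rectified = [max(0.0, v) for v in summed]
    return one_pole_lowpass(rectified, cutoff_hz, fs)

def toy_envelope(freq_hz, fs, duration_s):
    """Raised-cosine stand-in for a Stage 2 envelope (not a real Stage 2)."""
    n = int(round(fs * duration_s))
    return [0.5 * (1.0 + math.cos(2.0 * math.pi * freq_hz * i / fs))
            for i in range(n)]

fs = 10000  # assumed sampling rate in Hz
excitatory = toy_envelope(250.0, fs, 0.2)   # envelope of a higher-CF channel
inhibitory = toy_envelope(130.0, fs, 0.2)   # envelope of a lower-CF channel
# Excitation weighted +1; inhibition weighted -1.15 (magnitude from the text).
output = stage3([excitatory, inhibitory], [1.0, -1.15], fs)
```

Because the combination is rectified before filtering, the output can contain components at frequencies that are present in neither input alone, which is the property the model was used to explore.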
| RESULTS |
|---|
The stereotypical discharge pattern described by Sinex et al. (2002b) was observed in response to mistuned tones in many, but not all, IC neurons. The pattern was most likely to occur, and to be most distinctive, in neurons with CFs within a broad range between about 0.25 and 3 kHz that exhibited short-latency sustained or pauser responses to pure tones.
Responses of one representative sustained neuron to simple and complex tones are shown in Fig. 2. The neuron's sustained response to CF tones included two to three synchronized spikes at response onset, accounting for the peak in the PST histogram in Fig. 2A; later spikes occurred with more variable latencies. Except for the onset peak, the PST histogram exhibited no stimulus-related temporal pattern. The response to the harmonic complex tone (Fig. 2B) was similar, except that the discharge rate was lower. As was the case for the pure-tone stimulus, no obvious temporal pattern could be seen during the sustained response. However, synchrony indices (SIs; Johnson 1980) obtained by Fourier analysis revealed that weak but statistically significant discharge synchrony did occur at 250 Hz (SI = 0.19, P < 0.01, Rayleigh statistic; Rhode 1976). The periodicity of the response most likely reflected the envelope frequency of the harmonic tone. In contrast, the mistuned tone elicited a larger sustained response with a distinctive modulated temporal pattern, which was clearly different from the response to the harmonic tone (Fig. 2C). After the onset response, the discharge pattern was characterized by peaks separated by about 8 ms, and slow modulation with a period of 100 ms. This is the pattern described in detail in the preliminary report (Sinex et al. 2002b) and referred to in the METHODS as the "stereotypical pattern" against which changes associated with stimulus parameters or characteristics of neurons will be evaluated. As noted previously, this discharge pattern bore no simple relation to the stimulus waveform, in which peaks occurred every 4 ms. Although the waveforms of mistuned stimuli were modulated with the same 100-ms period, the depth of modulation was much greater in this response than it was in the stimulus. Fourier analysis of this discharge pattern indicated that the response included large components synchronized to several frequencies of interest. In this example, the SIs were 0.19 at 10 Hz, 0.63 at 120 Hz, 0.38 at 130 Hz, and 0.39 at 250 Hz; each value was statistically significant (P < 0.01, Rayleigh statistic). The pronounced 10-Hz modulation most likely reflected beating between the two larger response components at 120 and 130 Hz (the term "beating" as used here refers to an interaction between phase-locked responses, not beating between acoustic stimulus components). The origin of response components at those particular frequencies is discussed in the following text and in Sinex et al. (2002b). By contrast, in the response to the harmonic tone, the SIs at the same four frequencies were lower, and as noted only the component at 250 Hz reached statistical significance.
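A synchrony index at a given frequency can be computed directly from spike times as the length of the mean phase vector. The sketch below is illustrative only: the function names are hypothetical, and the significance calculation uses the common large-sample approximation to the Rayleigh test, P ≈ exp(−n·SI²), which may differ from the exact criterion applied in the study.

```python
import cmath
import math

def synchrony_index(spike_times_ms, freq_hz):
    """Length of the mean phase vector of the spikes at freq_hz (0 to 1)."""
    n = len(spike_times_ms)
    if n == 0:
        return 0.0
    vec = sum(cmath.exp(2j * math.pi * freq_hz * t / 1000.0)
              for t in spike_times_ms)
    return abs(vec) / n

def rayleigh_p(si, n_spikes):
    """Large-sample approximation to the Rayleigh test P value."""
    return math.exp(-n_spikes * si ** 2)

# Hypothetical spike train locked perfectly to a 4-ms (250-Hz) period.
locked_spikes = [4.0 * k for k in range(100)]
si_250 = synchrony_index(locked_spikes, 250.0)   # close to 1
si_125 = synchrony_index(locked_spikes, 125.0)   # close to 0
```

Under this approximation, an SI of 0.19 reaches P < 0.01 once the spike count exceeds roughly 130, which illustrates why weak synchrony can still be statistically significant when many spikes are collected.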
P < 0.01). Unlike the sustained neuron, there was only a small peak at the onset of the response to the harmonic tone. The duration of the pause was reduced, relative to the response to the CF tone. In response to the mistuned tone, the temporal discharge pattern of this pauser neuron exhibited some of the same features, although the pattern was less apparent than that for the sustained neuron shown in Fig. 2. As in the data shown in Fig. 2, statistically significant response components occurred at 10, 120, 130, and 250 Hz; however, the SIs were lower: 0.19, 0.25, 0.35, and 0.22, respectively.
Temporal discharge patterns that resembled the stereotypical pattern were observed in neurons with a range of CFs. Each panel in Fig. 7 presents the cycle histogram of responses from a different neuron to the same mistuned tone. The first 5 neurons (Fig. 7, A–E) were recorded from one animal and their CFs covered a range of 3.4 octaves, from 0.25 to 2.7 kHz. A variant of the stereotypical pattern was observed in each example. Consistent features of the response included the presence of peaks separated by 8 ms and a response envelope modulated with a period of 100 ms. The figure emphasizes the similarity of the discharge patterns across a broad range of CFs. The responses of neurons with CFs falling within a narrow range, sampled from different animals, would also be similar to one another. That is, there was no systematic dependency of the discharge pattern on CF.
The stereotypical discharge pattern differed across neurons in one other way. In some responses, the response fine structure at envelope minima differed from the fine structure at the envelope maximum. In Fig. 7, this can be seen most clearly in the discharge pattern of the neuron with the lowest CF, 0.25 kHz (Fig. 7A). At envelope maxima, peaks in the PST histogram were separated by about 8 ms, as noted. At the envelope valley, the spacing of histogram peaks changed to 4 ms. The 4-ms spacing obviously corresponds to the period of f0, 1/250 Hz. For this low-CF neuron, that spacing could arise directly as a phase-locked response to the fundamental component of the stimulus. Similar patterns were observed in other higher-CF neurons not illustrated in this figure; in those cases, 4-ms periodicity would more likely arise from beating between adjacent tuned harmonics of the stimulus.
Figure 7F presents the responses of one additional neuron to the mistuned tone. This response is noteworthy because of the neuron's high CF, 5.9 kHz. Although the discharge rate was extremely low, the stereotypical pattern was unmistakable. Neurons with CFs >5.9 kHz were not studied, so it is not possible to say whether this CF represents the upper limit of the range over which the stereotypical pattern can be observed.
Effect of stimulus SPL
In general, in neurons that exhibited the stereotypical pattern, the pattern was preserved with changes in stimulus level. Exceptions occurred at levels near threshold, where responses typically exhibited no pattern, or no pattern that could be detected in the absence of large numbers of spikes. Examples are shown in Fig. 8, which presents cycle histograms obtained from one representative neuron at levels from 20 to 60 dB SPL per component. The cycle histograms were similar but not identical. The previously mentioned change in periodicity at the minimum of the response envelope can also be seen, especially at 40 dB SPL. However, the most obvious change was a shift in the phase of the response envelope, similar to the shifts observed across CFs in Fig. 7. The envelope shift could occur with little change in the underlying temporal discharge pattern, as illustrated in Fig. 9, for which the data shown in Fig. 8, B and E were reanalyzed and plotted on the same axes. For the reanalysis, the beginning of the analysis window for the histogram plotted as a dashed line was delayed by 35.2 ms. A cycle length of 100 ms was also used, in part to accommodate the change in analysis window and in part to allow the fine structure of the histogram to be seen more clearly in Fig. 9. The change in analysis times was derived from the difference in response envelope phase (REP) for the two cycle histograms. REP was defined as the phase of the 10-Hz component of the DFT of the cycle histogram. REP was calculated for each histogram, then the phase difference, 2.21 radians at 10 Hz, was converted into a delay in ms. The figure indicates that except for the time shift, the responses obtained at 30 and 60 dB SPL were quite similar, especially for the larger histogram peaks. Across all the levels shown in Fig. 8, the largest envelope time shift was 48 ms, approaching one-half of the modulation cycle (one-half cycle is the maximum possible shift if the direction is ignored). 
Shifts in REP of this magnitude were observed in many other units.
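The REP calculation and the conversion from a phase difference to a time shift can be sketched as follows. The function names are hypothetical, and the sign convention of the DFT phase is an assumption; only the conversion of 2.21 radians at 10 Hz into a delay of about 35.2 ms is taken from the text.

```python
import cmath
import math

def response_envelope_phase(hist, bin_ms, freq_hz=10.0):
    """Phase (radians) of the freq_hz component of a cycle histogram's DFT."""
    comp = sum(c * cmath.exp(-2j * math.pi * freq_hz * (k * bin_ms) / 1000.0)
               for k, c in enumerate(hist))
    return cmath.phase(comp)

def phase_to_delay_ms(phase_diff_rad, freq_hz=10.0):
    """Convert a phase difference at freq_hz into a time shift in ms."""
    return phase_diff_rad / (2.0 * math.pi) * 1000.0 / freq_hz

# 2.21 radians at 10 Hz corresponds to about 35.2 ms of a 100-ms cycle.
delay_ms = phase_to_delay_ms(2.21)

# Largest possible shift when direction is ignored: half a cycle (pi radians).
max_delay_ms = phase_to_delay_ms(math.pi)
```

Here `delay_ms` is approximately 35.2 ms, matching the delay applied in the reanalysis, and `max_delay_ms` is 50 ms, the bound that the 48-ms shift approaches.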
In the previous study (Sinex et al. 2002b), each individual sine wave component in the complex tones was synthesized in sine phase. In the present study, the phases of individual components in the complex tones were varied, in part to determine whether the distinctive temporal pattern observed in many IC neurons occurred fortuitously as a result of the use of sine-phase stimuli. Responses of one neuron to mistuned tones synthesized with various component starting phases are shown in Fig. 10. The neuron exhibited the stereotypical temporal pattern for some but not all stimulus-phase conditions. Details of the discharge patterns varied with component phase, but the most obvious change was a shift in REP. The magnitude of the shift was comparable to the shifts seen across CF or across SPL in previous figures. The changes were large, but not obviously systematic or predictable, either within or across neurons. Varying the phases of stimulus components is likely to produce changes in the relative response latencies of neurons, but the changes cannot be larger than the period of the stimulus component(s) that elicit the responses, in the range 0.5 to 4 ms for the component frequencies used here. The way in which these small latency shifts could produce such large envelope changes is considered further in "Output of the computational model compared with IC discharge patterns" below.
The emphasis so far has been on changes in the temporal discharge pattern produced by mistuning. Mistuned tones also tended to evoke higher average discharge rates from IC neurons, compared with harmonic complex tones. An example of a large rate increase is shown in Fig. 11. The neuron's response to pure tones at CF adapted over the first 100–200 ms of the 500-ms stimulus, and although a few sustained spikes did occur, the neuron met the definition for the transient category (Fig. 11A). The average response to a harmonic complex tone was larger, 41 spikes/s compared with 15 spikes/s for the CF tone, and discharge synchrony with the periodicity of f0 was apparent in the 4-ms spacing of peaks in the histogram (Fig. 11B). As in the response to the pure tone, adaptation occurred during the first 100–200 ms of the 500-ms tone. In contrast, the mistuned tone elicited a discharge rate of 95 spikes/s, an increase of more than a factor of 2 over the response to the harmonic tone and a factor of 6 over the response to a CF tone presented at the same SPL as the individual components of the complex tones (Fig. 11C). The rate increase was not simply a result of the higher overall SPL of the mistuned tone; the mistuned tone presented at 40 dB SPL per component, an overall level 1 dB below that of the pure tone that elicited the response in Fig. 11A, elicited a rate of 49 spikes/s (Fig. 11D). Another effect of mistuning was that the magnitude of the response to the mistuned tone showed no sign of adapting over time. In conjunction with the rate increases, clear examples of the stereotypical temporal pattern also emerged.
10 spikes/s. Mistuning led to a rate decrease of
10 spikes/s in only 42 cases (9%). Although the increases produced by mistuning were small, they were statistically significant (Wilcoxon signed-rank test, n = 480 pairs, P < 0.0001).
As noted, tones were nearly always presented monaurally to the contralateral ear, but exceptions were made for neurons that responded poorly to monaural stimuli but vigorously to binaural tones. Binaural complex tones were presented to five neurons, and responses were obtained for a total of 24 phase and SPL conditions. Although only a few examples were collected, there was no indication in those data that responses elicited with binaural stimulus presentation differed in any significant way from responses to monaural contralateral tones. One representative binaural discharge pattern is presented in Fig. 14. Figure 14, A and C (top row) presents cycle histograms of the responses obtained with binaural stimulation. The temporal discharge pattern obtained in response to the harmonic tone exhibited particularly strong synchrony at 250 Hz; because of the neuron's low CF, this pattern could reflect synchrony to the 250-Hz component in the stimulus or to the envelope. The response to the mistuned tone exhibited the features described previously for contralateral stimulation: the response envelope modulated with a 100-ms period, and fine structure consisting largely of peaks separated by 8 ms. As in some previous examples, this neuron also exhibited the temporal pattern associated with f0 during minima in the response envelope. Responses to monaural contralateral tones are illustrated in Fig. 14, B and D. These tones generated only a few spikes, and no temporal pattern could possibly be detected.
As noted, the same stimuli were presented to a computational model incorporating three sequential processing stages. Figure 15 illustrates the outputs of channels from all three stages of the model in response to the sine-phase mistuned-tone stimulus. This simulation included channels with two CFs, 1.242 and 2.089 kHz. The higher CF closely matched the CF of the IC neuron whose discharge pattern is included in D of the figure. At Stage 1 (Fig. 15A), the output of each channel was simply a band-pass-filtered, half-wave-rectified version of the stimulus waveform. Neither channel was able to resolve individual components of the spectrum of the complex tone, resulting in a response with a modulated envelope whose frequency was determined by the spacing between the unresolved components. For the channel with the lower CF, the largest unresolved components were at 1.120 kHz (the mistuned component) and 1.250 kHz (the next-higher unmodified harmonic), and the response waveform had a beat or envelope frequency of 130 Hz. Response modulation with a period of 100 ms can also be seen, but that modulation was quite shallow. For the higher-CF channel, the unresolved components were at 1.750 and 2.000 kHz, resulting in an envelope frequency of 250 Hz. Each Stage 1 output also exhibited fine structure determined by the frequencies passed by the gammatone filter; individual cycles of the fine structure were too short to be visible in the figure. The modulation and fine structure observed in Stage 1 outputs generally reproduced the characteristics of auditory nerve fiber responses to similar mistuned tones reported by Sinex et al. (2003a).
At Stage 3 (Fig. 15C), the outputs originating in the two separate Stage 2 channels were summed. For this example, the channel with CF = 2.089 kHz provided excitatory input and the channel with CF = 1.242 kHz provided inhibition. The weight of the inhibitory input was set to 1.15, a value chosen empirically to produce a waveform whose shape closely approximated the discharge pattern of the neuron shown in D. The match between the discharge pattern of the IC neuron and the output of model Stage 3 was quite good, even though none of the waveforms generated at Stages 1 or 2 bore much resemblance to the neuron's response.
Stage 3 outputs that resembled the stereotypical discharge patterns of IC neurons to mistuned tones could be achieved with many different parameter sets. The examples shown in Fig. 16 were all produced by integrating across two channels, one of excitation and one of inhibition. In Fig. 16, A–C, the excitatory CF was held constant at 1.242 kHz. The waveform of Stage 2 output for that CF was dominated by a 130-Hz envelope, as was shown in Fig. 15B. The CF of the inhibitory channel was varied, as was the weight of the inhibitory input. Similar but not identical Stage 3 outputs could be generated with inhibitory inputs originating in different frequency regions and, in each case, the stereotypical modulated response was obtained. The similarity in Stage 3 outputs reflected the fact that the waveforms of these Stage 2 outputs were not strongly affected by the variation in inhibitory CF because in each case the response envelope was produced by beating between adjacent tuned harmonics. Although the particular harmonics varied with CF, the frequency difference was always 250 Hz, producing a response envelope modulated with 4-ms periodicity.
One other difference across the examples in Fig. 16 was that the fine structure of the Stage 3 waveform could change during envelope minima. In Fig. 16C, the spacing between waveform peaks changed from about 8 ms around the maximum of the envelope, to about 4 ms at the envelope minimum. In Fig. 16, A and B, the periodicity did not change. Both of these patterns were observed in the responses of IC neurons (e.g., Fig. 7).
In the examples shown in Fig. 16, D–F, the CF of inhibition was fixed and the CF of excitation was varied; the weight of inhibition was also allowed to vary to achieve a match to the stereotypical pattern. As before, similar output waveforms could be obtained for each pair of input channels and, as before, the major effect of varying the excitatory CF was a shift in REP.
The effects of varying temporal stimulus parameters are examined in Fig. 17. The phases of individual components in the mistuned tone were varied for the simulations shown in Fig. 17, A–C. Parameters of the model were held constant across the three stimulus conditions in the figure. Varying the component phase had little effect on the fine structure of the Stage 3 response. Model responses to sine- and cosine-phase stimuli were similar, except for a small difference in overall amplitude. For the random-phase condition, a shift in REP of nearly half a modulation cycle was observed, analogous to the effects of component phase seen in the responses of IC neurons (Fig. 10).
| DISCUSSION |
|---|
As noted above and in Sinex et al. (2002b), a 130-Hz input could easily be produced by beating between responses synchronized to the mistuned fourth harmonic and the fifth harmonic in the stimulus used here. That is, a hypothetical neuron receiving two excitatory inputs, one synchronized to 1.120 kHz and another synchronized to 1.250 kHz, would be more likely to produce spikes when the two inputs were in phase, and less likely to produce spikes when the two inputs were out of phase, each phase relation occurring 130 times/s. In the same way, beating between any two exact harmonics would produce a response synchronized to f0, in this case 250 Hz. Second-order beats, between inputs locked to 130 and 250 Hz, would occur at 120 Hz. The computational model was used to evaluate hypotheses about the kind of mechanisms that could produce and integrate inputs at those two frequencies. These are considered further in Integrative mechanisms that produce the stereotypical discharge pattern below.
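The beat arithmetic in this paragraph can be checked with a short calculation. The sketch below uses only the frequencies quoted in the text (1.120 kHz, 1.250 kHz, and f0 = 250 Hz); the choice of the fifth and sixth harmonics as the example pair of exact harmonics, and all function and variable names, are illustrative assumptions.

```python
F0_HZ = 250.0                  # fundamental frequency quoted in the text
MISTUNED_FOURTH_HZ = 1120.0    # mistuned fourth harmonic quoted in the text
FIFTH_HZ = 5 * F0_HZ           # 1.250 kHz
SIXTH_HZ = 6 * F0_HZ           # an arbitrary adjacent exact harmonic (example choice)

def beat_frequency_hz(f_a_hz, f_b_hz):
    """Beat rate between two phase-locked inputs: the absolute frequency difference."""
    return abs(f_a_hz - f_b_hz)

def period_ms(freq_hz):
    """Period in milliseconds of a periodicity at freq_hz."""
    return 1000.0 / freq_hz

# First-order beats.
mistuned_beat_hz = beat_frequency_hz(MISTUNED_FOURTH_HZ, FIFTH_HZ)   # 130 Hz
harmonic_beat_hz = beat_frequency_hz(FIFTH_HZ, SIXTH_HZ)             # 250 Hz, equal to f0

# Second-order beat between inputs locked to the two first-order rates.
second_order_beat_hz = beat_frequency_hz(mistuned_beat_hz, harmonic_beat_hz)  # 120 Hz

for label, freq in [("mistuned beat", mistuned_beat_hz),
                    ("harmonic beat", harmonic_beat_hz),
                    ("second-order beat", second_order_beat_hz)]:
    print(f"{label}: {freq:.0f} Hz, period {period_ms(freq):.2f} ms")
```

The 250-Hz harmonic beat corresponds to the 4-ms periodicity mentioned for the Stage 2 outputs; the 120- and 130-Hz rates have periods close to 8 ms.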
Responses across neurons
With modest variation, the stereotypical pattern was observed in neurons with different pure-tone discharge patterns and with CFs as low as 0.25 kHz and as high as 5.9 kHz. Neurons that responded to pure tones with sustained or pauser discharge patterns were most likely to exhibit the stereotypical discharge pattern. In general, sustained neurons exhibited the most highly modulated patterns. Exceptions were observed in neurons with long latencies, even if they responded in a sustained manner to other tones. Neurons with pauser discharge patterns generally exhibited a recognizable temporal pattern in response to mistuned tones. The responses of pauser neurons exhibited the 8-ms periodicity that was part of the stereotypical pattern, but they tended to exhibit less-prominent slow modulation than the responses of sustained neurons. Neurons that exhibited transient responses to pure tones generally responded poorly to complex tones, although there were exceptions (e.g., Fig. 11). In addition, the temporal discharge patterns of these neurons to mistuned tones included the 120- and 130-Hz components that distinguished them from responses to harmonic tones in fewer than half of the tested conditions (Fig. 6C). Sinex et al. (2002a) previously reported that transient neurons in the IC of the chinchilla responded poorly to sinusoidally amplitude-modulated (SAM) tones. As noted above, the stereotypical pattern is interpreted to be a response that results from integration of inputs slowly modulated at two different frequencies. Responses to harmonic tones also appeared to reflect beating inputs; in that case, the beats occurred at a single frequency, equal to f0. The present results are consistent with the results of Sinex et al. (2002a), which suggest that harmonic and mistuned tones should not be effective stimuli for transient neurons.
Mistuned tones would not have evoked such similar discharge patterns from neurons with widely varying CFs if each neuron were processing only a restricted region of the spectrum, which is what the typical narrow excitatory tuning of IC neurons (and auditory neurons in general) implies. Sharply tuned neurons may receive broadband inhibition (LeBeau et al. 2001; Li et al. 2002) and, for many neurons in this study, pure tones an octave or more from CF were capable of inhibiting responses at CF. In addition, Li et al. (2002) reported that inhibition originating at frequencies remote from CF preserves the temporal pattern of the eliciting stimulus. Patterned inhibitory input appears to be critically important for producing the distinctive modulated discharge patterns observed in the present study, and the results presented here, showing similar discharge patterns evoked by the same complex stimulus across a range of CFs, may suggest an important functional role for that inhibition. This unusual result can be accounted for if it is assumed that individual IC neurons receive excitatory input that reflects two or more components of the complex tones, and patterned inhibitory input that reflects two or more components from a different part of the spectrum. This view is explained in greater detail in Integrative mechanisms that produce the stereotypical discharge pattern.
Shifts in response envelope phase
Similar discharge patterns could be elicited over at least a 40-dB range of levels, and by stimuli in which the phases of individual stimulus components were varied. In each case, the most obvious difference in discharge patterns was a shift in the phase of the modulated response envelope. Shifts of nearly one-half of a modulation cycle, approximately 50 ms in time, were common. These shifts accompanied stimulus changes that had little or no effect on the ability of listeners to identify the mistuned component as a separate sound. For example, Hartmann (2004) reported that, although the detectability of a mistuned component was a nonmonotonic function of level, performance was relatively consistent within the range of levels most often used in the present study. We interpret this to mean that envelope phase is not useful for conveying information about the stimulus. However, it may be useful as an indicator of the integrative mechanisms that produce the observed discharge patterns.
It is likely that large shifts in the response envelope originate in smaller latency or phase shifts in the responses of neurons that provide direct or indirect input to the IC. Support for this possibility was obtained from simulated responses, which confirmed that fixed time delays on the order of 1–2 ms added to one Stage 1 input produced much larger shifts in Stage 3 REP. For pure tones, the first-spike latency of many auditory neurons decreases with increases in SPL (auditory nerve: Heil and Irvine 1997; Liberman 1978; Siegel et al. 1982; cochlear nucleus: Kitzes et al. 1978; Winter and Palmer 1995; Young et al. 1988; IC: Nuding et al. 1999; Semple and Kitzes 1985). The latencies of later-occurring spikes are usually not discussed, and in the absence of sustained phase-locked discharges cannot be related to the stimulus as easily as the first spike can. The phases of auditory-nerve fiber discharges to CF tones are relatively stable with changes in level (Kiang 1965; Rose et al. 1967). However, for tones away from CF, systematic phase changes occur with level (Anderson et al. 1971) and, for brain stem neurons that do not exhibit pronounced phase-locking, it is at least plausible to assume that a shift in first-spike latency will be reflected in the timing of later spikes. Thus a given change in stimulus SPL can easily produce differential changes in the response latencies of neurons providing input to a particular IC neuron; recruitment of new neurons into the responding population would also amount to a change in input phases. In either case, a shift in REP would occur. Similarly, a shift in stimulus phase would also shift the latencies of synchronized spikes; for example, a shift of one-half cycle at 1 kHz would be expected to produce latency shifts of, on average, 0.5 ms. These latency changes, of a few milliseconds or less, could produce shifts on the order of 50 ms in the envelope of the response to the mistuned tone, if that envelope was produced by beating between phase-locked inputs as is suggested here. The effect on the IC response envelope would be essentially the same, whether the shifts arose from changes in the phases of stimulus components or changes in stimulus SPL, as was observed in the data.
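The amplification of small latency shifts into large envelope shifts follows from the beat relation itself, and can be illustrated with a short calculation. This is a sketch under explicit assumptions, not a reproduction of the model: it treats the envelope as the beat between two equal-amplitude sinusoidal inputs, and it takes 130 and 120 Hz as the beating pair, a choice suggested by the roughly 100-ms modulation cycle implied above but not stated in this section. Function names are hypothetical.

```python
def envelope_shift_s(delayed_hz, other_hz, delay_s):
    """Time shift of the beat envelope when one of two inputs is delayed.

    For cos(2*pi*f1*(t - d)) + cos(2*pi*f2*t), the sum-to-product identity gives
    an envelope proportional to |cos(pi*(f1 - f2)*t - pi*f1*d)|, whose peak moves
    to t = f1*d / (f1 - f2). A negative result means the envelope is advanced.
    """
    if delayed_hz == other_hz:
        raise ValueError("inputs at the same frequency do not beat")
    return delayed_hz * delay_s / (delayed_hz - other_hz)

def half_cycle_latency_s(freq_hz):
    """Latency change equivalent to a half-cycle phase shift at freq_hz."""
    return 0.5 / freq_hz

# Assumed beating pair (see lead-in): inputs modulated at 130 and 120 Hz.
DELAYED_HZ = 130.0
OTHER_HZ = 120.0

shift_for_1_ms = envelope_shift_s(DELAYED_HZ, OTHER_HZ, 0.001)
delay_for_half_cycle = half_cycle_latency_s(DELAYED_HZ)
shift_for_half_cycle = envelope_shift_s(DELAYED_HZ, OTHER_HZ, delay_for_half_cycle)

print(f"1-ms delay -> envelope shift of {shift_for_1_ms * 1000:.1f} ms")
print(f"{delay_for_half_cycle * 1000:.2f}-ms delay -> envelope shift of "
      f"{shift_for_half_cycle * 1000:.1f} ms")
print(f"half cycle at 1 kHz = {half_cycle_latency_s(1000.0) * 1000:.1f} ms")
```

Under these assumptions the envelope shift is the input delay multiplied by f1/(f1 - f2), here a factor of 13, so a delay of a few milliseconds is enough to move the envelope by about half of its 100-ms cycle.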
Response fine structure
Most of the discussion has focused on the envelope of the responses of IC neurons to mistuned tones. This is because the existence of the response envelope and the way stimulus manipulations change the response envelope appear to provide clues to the integrative mechanisms that affect the representation of mistuned tones. However, as noted in the previous section, because the phase of this envelope was not preserved across neurons or stimulus manipulations, it may not be particularly useful for conveying information about the stimulus.
Harmonic and mistuned tones also elicited a stereotypical fine structure from IC neurons, which may be even more important for conveying information about harmonic structure that can be used for perceptual segregation. For harmonic tones, response fine structure, if there was any, was consistent; the periodicity matched the 250-Hz fundamental period of the complex tone as can be se