Animal Physiology and Behaviour Group, Institute for Biology and Environmental Sciences, Carl von Ossietzky University Oldenburg, D-26111 Oldenburg, Germany
Submitted 9 September 2003; accepted in final form 17 March 2004
ABSTRACT
ΔF) between the A and B tones and the tone repetition time (TRT). The A tones were presented at the neurons' characteristic frequency (CF), and B tones differed from the CF over a one-octave range. Larger ΔF values and shorter TRTs promote the perceptual segregation of alternating tone sequences in humans and also resulted in larger differences in neural responses to alternating CF (A) and non-CF (B) tones. Our results are consistent with the hypothesis that preattentive auditory processes, such as frequency selectivity and forward masking, contribute to the perceptual segregation of sequential acoustic events having different frequencies into separate auditory streams, but they also suggest that additional processes may be required to account for all known perceptual effects related to sequential auditory stream segregation.
INTRODUCTION
One form of auditory scene analysis that has been the focus of psychophysical (Bregman 1990; Moore and Gockel 2002), theoretical (Beauvois and Meddis 1991, 1996; McCabe and Denham 1997), and physiological (Fishman et al. 2001; Hung et al. 2001; Sussman et al. 1999) studies is sequential auditory stream segregation, which involves the segregation of temporally separated sounds from intervening and overlapping sounds and their integration into separate "auditory streams." A well-known example of sequential stream segregation is the "streaming effect" (Fig. 1; Bregman 1990). Under some acoustic stimulus conditions, human listeners presented with a repeated 3-tone sequence composed of 2 alternating tones (ABA-ABA-ABA-...) report hearing a galloping rhythm. Under other conditions, the A and B tones perceptually "split" into separate auditory streams, and listeners report hearing 2 separate tone sequences with different isochronous rhythms, corresponding to 2 simultaneous sequences of tones occurring at different rates (A-A-A-A-A-A-... and -B---B---B-...). Two of the most important stimulus attributes that determine whether a sequence of alternating A and B tones is heard as one coherent stream of alternating tones or as 2 segregated streams of A and B tones are the frequency separation (ΔF) between the A and B tones and the tone repetition time (TRT) (reviewed in Bregman 1990). As illustrated in Fig. 1, the perceptual streaming effect is more pronounced with a larger ΔF and at shorter TRTs.
ΔF and TRT in ways that parallel their influence on the perceptual streaming effect. Two such auditory processes that have been implicated in previous theoretical and electrophysiological studies as primitive processes involved in sequential stream segregation are frequency selectivity and physiological forward masking (e.g., Beauvois 1998).
Frequency selectivity, which ultimately arises from spectral filtering in the cochlea, is realized in the form of tonotopic maps throughout the ascending auditory system. Tonotopy can promote the spatial separation of excitation by ensuring that alternating tones with different frequencies are encoded by different populations of neurons, with the separation between these populations increasing as a function of ΔF (Beauvois 1998; Beauvois and Meddis 1991, 1996; Fishman et al. 2001; Hartmann and Johnson 1991; McCabe and Denham 1997). Physiological forward masking involves the suppression of neural responses to a sound (the signal) after the presentation of a preceding sound (the masker). Although forward masking can be observed in the responses of auditory nerve fibers (Relkin and Turner 1988; Turner et al. 1994), additional processing is thought to contribute to masked neural responses at higher levels of the auditory system (e.g., Brosch and Schreiner 1997; Calford and Semple 1995; Oxenham 2001). In a series of tones alternating in frequency, such as in the ABA- stimulus paradigm, each tone can potentially serve both as a masker of the subsequent tone and as a signal following a preceding masking tone. Studies of physiological forward masking in cat primary auditory cortex (AI) using pure tones to mask a pure-tone signal presented at the recording site's characteristic frequency (CF) have shown that masking is often more pronounced when the masker and signal are similar in frequency and occur with short masker-signal delays (Brosch and Schreiner 1997; Calford and Semple 1995).
In a recent study using a sequential stream segregation paradigm to study neural ensemble responses in macaque AI, Fishman et al. (2001) provided evidence suggesting that frequency selectivity and forward masking play important roles in sequential auditory stream segregation. They proposed that the differential suppression of responses to tones presented at the best frequency (BF) and to non-BF tones resulted from the relatively stronger physiological forward masking of non-BF tones by preceding BF tones. According to this hypothesis, BF (A) and non-BF (B) tones can each mask the other to varying degrees, but BF (A) tones are more potent maskers than non-BF (B) tones when the two are arranged in an alternating pattern. Fishman et al. (2001) suggested that the time course of suppressive interactions between neural responses to maskers and signals can explain the well-known effects of TRT on the streaming effect.
Here, we report results from a study in an awake songbird, the European starling (Sturnus vulgaris), that investigated the potential roles of frequency selectivity and forward masking in sequential stream segregation using the ABA- experimental paradigm. Like humans, starlings and other songbirds rely primarily on acoustic signals for social communication, and numerous parallels exist in the perceptual processing of birdsong and human language (Ball and Hulse 1998; Doupe and Kuhl 1999). Thus the hearing systems of both humans and starlings have likely experienced common evolutionary selection pressures to solve similar detection and perceptual organization tasks (Klump 1996; Klump et al. 2000). Psychophysical and neurophysiological studies of the starling hearing system confirm that starlings and humans share an impressive number of similarities in the spectral and temporal processing of acoustic stimuli (reviewed in Klump et al. 2000). These similarities extend, for example, to frequency selectivity, frequency discrimination, temporal resolution, temporal summation, and duration discrimination. Furthermore, given that vocal communication by songbirds and humans often occurs in large social groups (e.g., dawn choruses for songbirds and cocktail parties for humans), we should expect that songbirds also face the problem of perceptually segregating multiple overlapping and interleaved sequences of sounds into distinct perceptual objects. Current evidence, in fact, suggests that starlings possess capabilities of auditory scene analysis similar to those of humans (reviewed in Hulse 2002). Most relevant to our study is a previous study of sequential auditory stream segregation by MacDougall-Shackleton et al. (1998), which demonstrated that starlings, like humans, experience the streaming effect in the ABA- stimulus paradigm and are able to discriminate between coherent and segregated streams based on perceived differences in rhythm (Fig. 1). Given such strong similarities between humans and starlings in perceptual and physiological aspects of hearing, and the fact that starlings are known to experience the perceptual streaming effect, starlings are an excellent animal system for investigating the physiological mechanisms underlying auditory scene analysis.
The starlings in our study were passive listeners: the birds were not required to perform a learned discrimination task during recordings and had not been trained to attend to, or to discriminate between, coherent or segregated streams, thus minimizing the potential influence of schema-based processes. We recorded neural responses in the tonotopically organized avian auditory forebrain (field L2) to repeated ABA- triplets in which we varied, in a factorial design, the TRT and the ΔF between A tones presented at the site's CF and non-CF (B) tones. Field L2 is the primary target of ascending projections from the auditory dorsal thalamus (reviewed in Carr and Code 2000) and thus represents the avian equivalent of the mammalian primary auditory cortex, which is believed to play a role in sequential stream segregation (Fishman and Steinschneider 2003; Fishman et al. 2001; Micheyl et al. 2003).
Our study had 3 primary objectives. First, we investigated the effects of ΔF and TRT on neural responses to test the general hypothesis that the degree of overlap in excitation along a tonotopic gradient decreases under stimulus conditions known to promote the perceptual streaming effect. We tested the specific prediction that the differences in responses to CF (A) and non-CF (B) tones would increase with increasing ΔF and with decreasing TRT. Second, we tested the hypothesis proposed by Fishman et al. (2001) that differential responses to CF and non-CF tones are influenced by the relatively stronger physiological forward masking of non-CF tones by preceding CF tones. We tested the specific prediction that this relatively greater forward suppression of non-CF tones by preceding CF tones would increase with decreasing ΔF and with decreasing TRT (Brosch and Schreiner 1997; Calford and Semple 1995). Finally, we compared the effects of ΔF and TRT on neural responses in the starling forebrain with their well-known influences on the perceptual streaming effect observed in previous psychoacoustic studies in humans and starlings.
METHODS
Four wild-caught, adult starlings (2 males, 2 females; 71.2–94.3 g) were used as subjects in this experiment. The care and treatment of the animals were in accordance with the procedures of animal experimentation approved by the Bezirksregierung Weser-Ems. All procedures were performed in compliance with the American Physiological Society's Guiding Principles in the Care and Use of Animals.
Detailed descriptions of the manufacturing of electrodes and of the surgical procedures can be found elsewhere (Hofer and Klump 2003). Briefly, 2 types of extracellular recording electrodes were fashioned from either commercially made tungsten microelectrodes (shank diameter = 75 µm, Frederick Haer and Co., Bowdoinham, ME) or Teflon-insulated platinum-iridium wires (shank diameter = 25 µm, A-M Systems, Carlsborg, WA) that were sharpened at the tip using the procedures described by Hofer and Klump (2003). The impedance of both types of electrodes measured in 0.9% NaCl ranged from 3.6 to 12.1 MΩ (1 kHz a/c). An array of 4 electrodes was fixed with dental acrylic to a small head-mounted microdrive that was used to manually lower the electrodes into the brain. The array was fixed so that the recording tips of different electrodes protruded between 0.5 and 1.5 mm from the opening of a 0.8-mm-diameter tube in the microdrive. The absolute distances between electrode tips typically varied between 0.25 and 1.0 mm.
After an initial subcutaneous injection of atropine (0.05 ml) to reduce salivation, surgery was performed under general anesthesia (Isoflurane: 5% for induction, 1.5–2.5% for maintenance). Anesthetized animals were fixed in a stereotaxic holder, with the bill inclined about 45° below the horizontal plane. The caudal bifurcation of the sinus sagittalis served as a reference for making a small hole in the right hemisphere of the skull that was 0.8–0.9 mm lateral and 1.6–1.8 mm rostral to the bifurcation. These coordinates were chosen to reach the input layer of the field L complex, L2 (Nieder and Klump 1999). Electrodes were implanted in the brain through a small incision made in the dura. Two indifferent electrodes (stainless steel wire, diameter = 75 µm, A-M Systems) were implanted through a second small opening in the skull made in the left rostral hemisphere. The microdrive, indifferent electrodes, and a small socket for attaching a radio transmitter were fixed to the exposed skull with dental acrylic. Recordings began 3–9 days after surgery.
Multiunit recordings from a total of 46 recording sites were made from awake and freely behaving birds placed in a test cage (56 × 36 × 33 cm) located inside a radio-shielded sound chamber (IAC 402A, Industrial Acoustics, Niederküchten, Germany). Neural activity was recorded by radio telemetry using a small FM radio transmitter (FHC type 40-71-1, Frederick Haer and Co). The radio signal was received by a dipole antenna inside the sound chamber and demodulated by an FM tuner (Technics ST-GT 550, Panasonic, Hamburg, Germany) located outside the chamber. The demodulated signals were band-pass filtered (600–4,500 Hz), amplified, digitized (Sound Blaster PCI128, 16-bit, 44.1 kHz), and stored on the hard drive of a Linux workstation (AMD Athlon XP 1900+) for later analysis.
At the beginning of an experimental recording session, the radio transmitter was attached to the head-mounted socket and the bird was temporarily restrained in a cloth jacket to prevent wing and leg movements. The microdrive was lowered stepwise until a site was found at which auditory-evoked activity was elicited in response to a series of test tones. The cells in L2 of the field L complex have an average cell diameter of 5–7 µm (Saini and Leppelsack 1981); therefore we advanced the electrodes a minimum distance of 40 µm into the brain between recordings from the same electrode to ensure that different cells were recorded. Once a suitable recording site was found, the bird was released into the test cage and given unrestricted access to food and water. Because the subjects were completely unrestrained inside the test cage, it was not possible to objectively quantify a bird's arousal level during a recording session. Instead, birds were monitored remotely using a video camera mounted in the chamber and a video monitor located outside the chamber. These video observations indicated that subjects were awake during recording sessions: the birds often took food and water, changed positions by hopping between 2 perches in the cage, and regularly exhibited other behaviors, such as head turning, scratching, ruffling feathers, and bill-wiping.
At the completion of recordings, animals were killed with an overdose of sodium pentobarbital, and their brains were fixed by transcardial perfusion of Zamboni's reagent after an initial flush of the circulatory system with a warm solution containing 0.9% NaCl and 0.5% NaNO2. The brain was stored in 30 ml of the fixative containing an additional 30% saccharose for several days before frozen sagittal sections (50 µm) were sliced and stained with cresyl violet to confirm the position of the electrodes (for additional details see Nieder and Klump 1999). These histological analyses confirmed that recordings were made in field L2.
Stimulus generation and presentation
Acoustic stimuli were generated at a sampling rate of 44.1 kHz and 16-bit resolution using custom-designed software running on the Linux workstation that allowed for the synchronous playback of acoustic stimuli and recording of neural responses. The analog sound output of the computer soundcard was attenuated (Hewlett-Packard 350D, Böblingen, Germany, and TDT PA4, Tucker-Davis Technologies, Alachua, FL), amplified (Rotel RB-1050, Sussex, UK), and presented through a speaker (Type SP3253, KEF Audio, Maidstone, UK) suspended from the ceiling of the sound chamber about 70 cm above the position of a starling sitting in the test cage. The frequency response inside the test cage was flat (±4 dB) over the range of frequencies used in this study.
Just before presenting the stimulus sequences described below, we generated a frequency tuning curve (Fig. 2) by presenting a series of 20 pure tones (200-ms duration, 10-ms Gaussian rise and fall times, 800-ms intertone interval) at each of 11 frequencies separated by 0.25 octave within a 2.5-octave range centered on our estimate of the CF, which was based on responses to the series of test tones. An on-line window discriminator automatically rejected responses containing artifacts attributed to the bird's movements, and tones were repeated until 10 artifact-free responses were obtained at each combination of frequency and level. Presentation levels started at 0 dB SPL (re 20 µPa) and were increased in 5-dB steps up to 70 dB SPL. We defined the recording site's CF as the frequency with the lowest threshold, where threshold was the lowest stimulus level at which the neural response was >1.8 times the spontaneous rate.
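The CF criterion described above can be expressed compactly. The following Python sketch is illustrative only and is not the software used in this study; the function names and the layout of the rate matrix (one row per frequency, one column per level) are assumptions. It builds the 11-frequency, 0.25-octave test grid around a CF estimate and takes the CF as the frequency with the lowest threshold, where threshold is the lowest level at which the rate exceeds 1.8 times the spontaneous rate.

```python
def tuning_frequencies(cf_estimate_hz, n_freqs=11, step_octaves=0.25):
    """Frequencies spaced step_octaves apart and centred on the CF estimate.

    With the defaults (11 frequencies, 0.25-octave steps) the set spans 2.5 octaves.
    """
    centre = (n_freqs - 1) / 2
    return [cf_estimate_hz * 2.0 ** ((i - centre) * step_octaves) for i in range(n_freqs)]


def threshold_db(levels_db, rates, spont_rate, criterion=1.8):
    """Lowest level whose rate exceeds criterion * spont_rate, or None if none does."""
    for level, rate in sorted(zip(levels_db, rates)):
        if rate > criterion * spont_rate:
            return level
    return None


def characteristic_frequency(freqs_hz, levels_db, rate_matrix, spont_rate, criterion=1.8):
    """Return (CF in Hz, threshold in dB SPL) for the frequency with the lowest threshold.

    rate_matrix[i][j] is the mean rate at freqs_hz[i] and levels_db[j].
    Ties keep the first (lowest-index) frequency; returns None if no frequency
    ever meets the criterion.
    """
    best = None
    for freq, rates in zip(freqs_hz, rate_matrix):
        thr = threshold_db(levels_db, rates, spont_rate, criterion)
        if thr is not None and (best is None or thr < best[1]):
            best = (freq, thr)
    return best
```

The tie-breaking rule in this sketch is an arbitrary choice; the text does not specify how equal thresholds were resolved.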
We included 4 types of stimuli as controls. The first control stimulus (the "AAA- stimulus") had the same temporal arrangement as the ABA- triplet but consisted of 3 repetitions of the A tone alone (AAA-AAA-AAA-...). Hereafter, this stimulus is usually denoted as an ABA- stimulus having 0 semitones frequency separation between the A and B tones. A second type of control stimulus (the "BBB- stimulus") was a repeated triplet composed only of the B tone (BBB-BBB-BBB-...). The AAA- and BBB- stimulus sequences were designed to allow us to compare the effects of surrounding A and B tones on responses to A and B tones in the middle position of a triplet. A third control stimulus (the "A-A- stimulus") was similar to the ABA- stimulus, except that the B tones were omitted and replaced with silences equivalent to the tone duration (A-A-A-A-A-A-...). A final type of control stimulus (the "-B-- stimulus") consisted of the B tone alone occurring in the same temporal arrangement, relative to the A tones, as in the ABA- stimuli, with the A tones replaced by silent intervals equivalent to the tone duration (-B---B---B---B-...). The A-A- and -B-- stimulus sequences allowed us to assess responses to isolated, single-frequency tone sequences in relation to triplets containing A tones, B tones, or both. In all 5 types of stimuli (ABA-, AAA-, BBB-, A-A-, and -B--), the "triplet" and the following silent intertriplet interval were repeated 30 times in sequence, and data were collected for artifact-free responses to 20 triplet repetitions.
At each recording site, we presented the stimulus sequences at 70 dB SPL in a different randomized order. A silent interval of 7 s separated consecutive sequences to minimize any possible influence of auditory stream biasing between consecutive stimulus presentations (Beauvois and Meddis 1997; Bregman 1978). Spontaneous activity was recorded for 4 s preceding the onset of the first tone in each stimulus sequence. Generating the frequency tuning curve and completing a presentation of all stimulus sequences required about 4 h.
Experimental design
We examined the effects of ΔF between CF and non-CF tones by fixing the frequency of A tones at the recording site's CF and shifting the frequency of the non-CF (B) tone away from that of the CF (A) tone over a one-octave range along a semitone scale by 2, 4, 6, 8, 10, or 12 semitones (see Fig. 2). The frequency of the B tone within a given stimulus sequence was constant over all 30 triplet repetitions; therefore there were 6 different ABA-, BBB-, and -B-- stimulus sequences corresponding to the 6 levels of ΔF between the CF (A) and non-CF (B) tones. Similar ranges of ΔF values have been used in human psychoacoustic studies of the streaming effect (Anstis and Saida 1985; Carlyon et al. 2001; Rogers and Bregman 1993; van Noorden 1975). In starlings (MacDougall-Shackleton et al. 1998), the streaming effect has been demonstrated to occur at a short TRT when the ΔF was about 9 semitones or larger, but not when the ΔF was less than about 1 semitone. MacDougall-Shackleton et al. (1998) did not test ΔFs between 1 and 9 semitones, nor did they test longer TRTs. For recording sites with CFs above about 1 kHz and below about 3 kHz, the direction of the ΔF imposed on the B tones relative to the CF was determined randomly; for CFs below 1 kHz or above 3 kHz, the frequency of the B tones was increased or decreased, respectively, to ensure that the frequencies of the B tones remained well within the starling's hearing range (Dooling et al. 1986; Klump et al. 2000).
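On the semitone scale described above, the B-tone frequency is f_B = CF × 2^(±ΔF/12). A minimal Python sketch of this calculation and of the direction rule follows; the helper names are hypothetical, and treating the "about 1 kHz" and "about 3 kHz" boundaries as sharp cutoffs is an assumption made only for illustration.

```python
import random

DELTA_F_SEMITONES = (2, 4, 6, 8, 10, 12)  # ΔF values used for the B tones


def b_tone_frequency(cf_hz, delta_f_semitones, direction):
    """B-tone frequency delta_f semitones above (+1) or below (-1) the CF (A tone)."""
    return cf_hz * 2.0 ** (direction * delta_f_semitones / 12.0)


def choose_direction(cf_hz, rng=random):
    """Direction of ΔF for one recording site (assumed sharp cutoffs at 1 and 3 kHz).

    Upward for low CFs, downward for high CFs, random in between, so that
    B tones stay well within the starling's hearing range.
    """
    if cf_hz < 1000.0:
        return +1
    if cf_hz > 3000.0:
        return -1
    return rng.choice((+1, -1))
```

For example, at a 2-kHz CF a 12-semitone upward shift gives a 4-kHz B tone. In this sketch the direction is drawn once per site and applied to all ΔF values, which is consistent with the direction of ΔF being treated as a between-recording-sites factor.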
To investigate the effects of TRT, the repeated tones within a stimulus sequence were presented at TRTs that were 100, 200, 400, or 800% of the tone duration (TD), which was constant within a stimulus sequence and was either 25, 40, or 100 ms (see following text). Here, we use TRT to refer to the time interval between the onsets of 2 consecutive tones within a sequence of 3 tones, for example, either A to B or B to A in the ABA- triplet. Thus shorter TRTs (expressed as percentages of TD) correspond to faster tone rates, and longer TRTs correspond to slower tone rates. The designated TRTs for the isolated tone sequences (A-A- and -B--) refer to the TRTs of the corresponding ABA- triplets, so that at a given TRT the respective periods of the isolated A and B tones were the same as those in the corresponding ABA- stimulus sequence. We included a TRT of 100% of the TD, the fastest possible tone rate without tone overlap, to simulate conditions under which the perceptual streaming effect is known to occur in humans (Miller and Heise 1950; van Noorden 1975) and in starlings (MacDougall-Shackleton et al. 1998). Longer TRTs (e.g., TRT = 800% of TD) were chosen to simulate conditions under which the streaming effect should be weak or absent based on human psychophysical data (Beauvois and Meddis 1991; van Noorden 1975).
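Because TRT is defined onset to onset and scales with TD, the tone schedule for each stimulus type follows directly. The Python sketch below assumes, as an inference rather than a stated specification, that each triplet plus its silent intertriplet interval occupies 4 slots of one TRT each, with "-" marking a silent slot; under that assumption the silent intervals between isolated A tones at TRT = 800% of TD come out at 375, 600, and 1,500 ms for the 3 TDs, matching the values given under Data analysis. Function names are hypothetical.

```python
def trt_ms(td_ms, trt_percent):
    """Onset-to-onset tone repetition time in ms, given TRT as a percentage of TD."""
    return td_ms * trt_percent / 100.0


def tone_onsets(pattern, td_ms, trt_percent, n_triplets=30):
    """(label, onset_ms) for every tone in a repeated 4-slot pattern such as 'ABA-'.

    Each slot lasts one TRT and '-' marks a silent slot (assumed layout).
    """
    trt = trt_ms(td_ms, trt_percent)
    onsets = []
    for rep in range(n_triplets):
        for slot, label in enumerate(pattern):
            if label != "-":
                onsets.append((label, (rep * len(pattern) + slot) * trt))
    return onsets


def silent_gap_ms(pattern, label, td_ms, trt_percent):
    """Silence between the offset of one `label` tone and the onset of the next."""
    times = [t for lab, t in tone_onsets(pattern, td_ms, trt_percent, n_triplets=2)
             if lab == label]
    return times[1] - times[0] - td_ms
```

At TRT = 100% of TD the gap between consecutive tones in a triplet is 0 ms, which is why this TRT is the fastest rate without tone overlap.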
We performed the experiment using 3 different TDs (25, 40, and 100 ms) to provide a basis for comparing our results with a broad range of previously published studies. In their study of the streaming effect in macaque AI, Fishman et al. (2001) used 25-ms-duration tones. In one of the most widely cited studies of the streaming effect in humans (van Noorden 1975), a number of important properties of the streaming effect were demonstrated using 40-ms-duration tones. A TD of 100 ms is similar to tone durations used in other studies of the streaming effect in humans (Rogers and Bregman 1993; Rose and Moore 1997, 2000; Singh and Bregman 1997), and, more important, it is the TD that MacDougall-Shackleton et al. (1998) used to demonstrate that starlings experience the streaming effect. Within a particular stimulus sequence, the duration of all tones was the same, and the amplitude envelope of the individual tones in all stimuli had 5-ms Gaussian rise and fall times. For each of the 3 TDs, a subset of 80 stimulus sequences was created based on the 4 levels of TRT and the 20 possible combinations of ΔF and stimulus type (6 ABA-, 1 AAA-, 6 BBB-, 1 A-A-, 6 -B--), for a total set of 240 stimulus sequences.
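The factorial structure of the stimulus set can be checked by enumeration. In this illustrative Python sketch (names are hypothetical), the AAA- and A-A- sequences are coded with ΔF = 0 because they contain no B tone:

```python
from itertools import product

TDS_MS = (25, 40, 100)
TRT_PERCENTS = (100, 200, 400, 800)
DELTA_F_SEMITONES = (2, 4, 6, 8, 10, 12)


def stimulus_set():
    """All (stimulus_type, delta_f_semitones, trt_percent, td_ms) combinations."""
    per_condition = (
        [("ABA-", df) for df in DELTA_F_SEMITONES]
        + [("AAA-", 0)]
        + [("BBB-", df) for df in DELTA_F_SEMITONES]
        + [("A-A-", 0)]
        + [("-B--", df) for df in DELTA_F_SEMITONES]
    )  # 20 stimulus-type/ΔF combinations
    return [(kind, df, trt, td)
            for td, trt, (kind, df) in product(TDS_MS, TRT_PERCENTS, per_condition)]
```

The enumeration yields 20 × 4 = 80 sequences per TD and 240 in total.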
Data analysis
For each recording site (n = 46) and for each of the 240 stimuli, we determined the mean discharge rate in spikes/s during the first, second, and third tone presentations in a triplet (or their corresponding silences in the A-A- and -B-- stimuli), averaged over artifact-free responses to 20 triplets (for additional details see Nieder and Klump 1999). The timing of analysis windows was adjusted to compensate for the response latency (11–14 ms). Normalized responses are expressed as a percentage of the average response to an isolated CF (A) tone. We normalized discharge rates by dividing the average response to each tone (or corresponding silence) by the average response to the isolated CF (A) tone having the same TD in the A-A- stimulus presented at the slowest rate (TRT = 800% of TD). Thus for the 25-, 40-, and 100-ms TDs, response rates were normalized to responses to isolated CF (A) tones that were separated by silent intervals of 375, 600, and 1,500 ms, respectively. In a neurophysiological study of forward masking in the starling auditory forebrain, Klump and Nieder (2001) showed that masker-signal delays of just 80 ms lead to an approximately 55-dB reduction in forward masking relative to a 5-ms masker-signal delay. Thus our method of normalization standardized A-tone and B-tone responses to the response to isolated CF (A) tones presented at rates slow enough to minimize any potential effects of forward masking. A normalized spontaneous rate was determined by averaging the discharge rate first across 20 artifact-free, 100-ms time windows recorded just before each stimulus presentation and then across all 240 stimulus presentations (480 s total), and finally by normalizing this average spontaneous rate to the average stimulus-driven rate in response to the 100-ms CF (A) tone presented at the longest TRT (800%).
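The latency-shifted analysis windows and the normalization can be sketched as follows. This Python code illustrates the procedure under assumptions and is not the analysis software used here: spike times are assumed to be available in milliseconds relative to sequence onset, each analysis window is assumed to last one TD, and the function names are hypothetical.

```python
def rate_in_window(spike_times_ms, tone_onset_ms, td_ms, latency_ms):
    """Discharge rate (spikes/s) in a TD-long window shifted by the response latency."""
    start = tone_onset_ms + latency_ms
    end = start + td_ms
    n_spikes = sum(1 for t in spike_times_ms if start <= t < end)
    return n_spikes / (td_ms / 1000.0)


def normalized_response(rate, reference_rate):
    """Rate as a percentage of the reference: the mean response to the isolated
    CF (A) tone in the A-A- stimulus with the same TD at TRT = 800% of TD."""
    return 100.0 * rate / reference_rate


def mean_spontaneous_rate(window_spike_counts, window_s=0.1):
    """Mean spontaneous rate (spikes/s) pooled over equal-length pre-stimulus windows."""
    return sum(window_spike_counts) / (len(window_spike_counts) * window_s)
```

Because all windows have the same length and each sequence contributes the same number of windows, pooling the counts gives the same value as averaging per-window rates first within and then across sequences. With 20 windows of 100 ms before each of the 240 sequences, the estimate covers 20 × 0.1 s × 240 = 480 s.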
Statistical analysis
We examined the effects of ΔF and TRT using repeated-measures ANOVA (rmANOVA). Tone duration (TD) was also included as a factor in these analyses to partition out any variance that could be attributed to differences in this variable. Planned comparisons were used to test a number of specific predictions (Rosenthal and Rosnow 1985), which we describe below. For repeated-measures analyses with more than a single numerator degree of freedom (df), we calculated P values using the Greenhouse and Geisser (1959) adjusted df for omnibus tests of within-subjects factors that violated the sphericity assumption of rmANOVA (Mauchly's sphericity test). The unadjusted df values are shown when reporting statistical results. We also computed for each rmANOVA the partial η² as a measure of the effect size for all main effects and interactions. Partial η², which can vary from 0 to 1, is the proportion of the combined effect and error variance that is attributable to the effect, and thus represents a nonadditive "variance-accounted-for" measure of effect size, which serves as an estimate of the extent to which the null hypothesis of "no effect" is false. The interpretation of partial η² values is similar to that of the more familiar coefficient of determination (r²). Although statistical analyses are essential to determine whether there are significant differences among treatments in any experiment, there is a concomitant risk of detecting statistically significant effects of questionable biological importance, especially in studies with high statistical power. In the analyses described below, we therefore pay special attention to the magnitudes of effect sizes and do not judge the influence of a variable solely by the magnitude of the associated P value. All analyses were performed with Statistica 5.5 or SPSS 11.5, and an experiment-wide criterion of α = 0.05 was used to determine statistical significance.
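Partial η² is defined as SS_effect/(SS_effect + SS_error). It can also be recovered from a reported F ratio and its degrees of freedom, because F = (SS_effect/df_effect)/(SS_error/df_error). A short Python sketch of both forms (function names are illustrative):

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of (effect + error) variance attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using an F ratio and its degrees of freedom.

    Scaling both df by the same Greenhouse-Geisser epsilon leaves the result unchanged.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

Because the Greenhouse-Geisser correction multiplies numerator and denominator df by the same ε, the effect size is the same whether adjusted or unadjusted df are used; only the P value changes.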
Preliminary analyses of the normalized responses to the ABA-, BBB-, and -B-- stimuli were conducted to estimate the magnitude of the between-recording-sites effects of the direction of ΔF between the A and B tones (increase vs. decrease). These analyses revealed that the between-recording-sites factor of the direction of ΔF usually explained <5% of the variation in responses to the stimuli. Therefore this factor was not included in subsequent statistical analyses.
RESULTS
We assessed the magnitudes of the effects of ΔF, TRT, TD, and position within a triplet (Tone 1, 2, or 3) on responses to the CF (A) and non-CF (B) tones in the ABA- triplet by computing a 7 (ΔF) × 4 (TRT) × 3 (TD) × 3 (Tone) rmANOVA (Table 1). Three important trends were evident in responses to the ABA- stimulus (Figs. 3 and 4A). First, there was a significant main effect of ΔF and a significant interaction between ΔF and position within a triplet (Table 1). Responses to the non-CF (B) tone in the middle triplet position decreased with increasing ΔF, whereas responses to the CF (A) tones in the first and third triplet positions were largely unaffected by differences in ΔF, at least at the longer TRTs (i.e., 200, 400, and 800%). At the largest values of ΔF (8–12 semitones), the pattern of responses was dominated by excitatory responses to the A tones, and responses to the B tone in the middle triplet position were often not significantly different from, or were significantly lower than, the spontaneous rate (Table 2). Second, there was a significant main effect of TRT and a significant interaction between TRT and position within a triplet (Table 1). Responses to the B tone in the middle triplet position, and to the A tone in the third triplet position, were additionally suppressed at the shortest TRT (TRT = 100% of TD) compared with longer TRTs. Third, the magnitude of this additional suppression at the shortest TRT depended on ΔF, being greater when the A and B tones were more similar in frequency (ΔF = 0–6 semitones). This ΔF-dependent suppression of the B tone and the second A tone at the shortest TRT accounts for the significant ΔF × TRT and ΔF × TRT × Tone interactions (Table 1).
Responses to single-frequency CF and non-CF tone sequences
In responses to all 3 tones in the repeated triplets of the BBB- stimulus (Fig. 4B), response magnitudes decreased with increasing ΔF. At the shortest TRT there was also additional suppression of responses to the B tones in the second and third triplet positions. This suppression at the shortest TRT was larger at small ΔF values, at which the frequency of the B tones was closer to the recording site's CF. Responses to the B tone in the -B-- stimulus (Fig. 4C) also decreased with increases in ΔF. Unlike responses to the middle B tones in the ABA- and BBB- stimulus sequences, however, responses to isolated B tones in the -B-- stimulus presented at the shortest TRT were similar to those at longer TRTs. Responses to all 3 A tones in the AAA- stimulus were generally similar and near the maximum normalized response rate at TRTs of 200, 400, and 800% (Fig. 4D). However, responses to the second and third A tones in the AAA- stimulus were suppressed at the shortest TRT. The responses to the isolated A tones in the A-A- stimulus (Fig. 4E) were similar at all TRTs, as would be expected for repetitions of isolated and temporally separated CF tones.
Effects of ΔF and TRT on differential responses to alternating CF and non-CF tones
One goal of the present study was to test the general hypothesis that the degree of overlap in excitation along a tonotopic gradient decreases under stimulus conditions known to promote the perceptual streaming effect. We predicted that, at any particular site along the tonotopic distribution of CFs in field L2, the differences in responses to CF (A) and non-CF (B) tones in the ABA- stimulus should be larger under acoustic stimulus conditions that promote the perceptual streaming effect, that is, at larger ΔF values and shorter TRTs (Fig. 1). To test these predictions, we examined differential responsiveness to the A and B tones in the ABA- stimulus sequence by computing a difference score (Difference 1, Table 3): the normalized responses to the A tones in the first and third triplet positions were averaged, and the normalized response to the B tone was subtracted from this average. Recall that normalized responses are expressed as a percentage of the average response to an isolated CF (A) tone presented at the longest TRT. Therefore values of Difference 1 close to 0% indicate that the normalized responses to the A and B tones were similar, whereas values close to 100% correspond to the situation in which there was a large difference between responses to the A and B tones and the pattern of responses was dominated by excitatory responses to the A tones alone.
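For a single ABA- condition, Difference 1 reduces to one line of arithmetic. The Python sketch below uses hypothetical argument names for the 3 normalized responses in a triplet:

```python
def difference_1(norm_a_first, norm_b_middle, norm_a_third):
    """Difference 1 (percentage points): mean normalized response to the two
    A tones minus the normalized response to the B tone in the middle position."""
    return (norm_a_first + norm_a_third) / 2.0 - norm_b_middle
```

For example, normalized responses of 95% and 85% to the first and third A tones and 20% to the B tone give a Difference 1 of 70%, whereas identical responses to all 3 tones give 0%.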
F and TRT were assessed in a 7 (
F) x 4 (TRT) x 3 (TD) rmANOVA of the relative responses to A and B tones (Difference 1). The results of this analysis, which are reported in Table 4, revealed significant main effects of
F (
2 = 0.87) and TRT (
2 = 0.54), and a significant
F x TRT interaction (
2 = 0.10). Although there was a significant main effect of TD (
2 = 0.31), the 2-way interactions of
F x TD and TRT x TD, and the 3-way interaction of
F x TRT x TD, were associated with relatively small effect sizes (
2
0.06; Table 4), indicating that differences in TD had little influence on the effects of
F and TRT.
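This section does not state whether the reported η² values are classical or partial. The values for separate effects (e.g., 0.87 and 0.54) sum to more than 1, which is consistent with partial η². Classical η² partitions the total variance and cannot exceed 1 when summed across the effects of one model. The sketch below assumes partial η² and shows the two usual ways to compute it. The sums of squares and degrees of freedom are invented.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Same quantity recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Invented sums of squares and degrees of freedom, only to show the two routes agree.
ss_eff, ss_err, df1, df2 = 30.0, 70.0, 3, 60
f = (ss_eff / df1) / (ss_err / df2)
print(round(partial_eta_squared(ss_eff, ss_err), 3))      # 0.3
print(round(partial_eta_squared_from_f(f, df1, df2), 3))  # 0.3
```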
We examined how the differential responses to A and B tones changed as ΔF became larger in 2 steps. First, we examined the main effects of ΔF separately for each level of TRT. Then, for significant differences, we computed planned linear contrasts comparing the linearly ordered effects of ΔF (in semitones: 0 < 2 < 4 < ... < 12). As the results in Table 5 show, the main effect of ΔF was significant at each level of TRT and explained approximately 85% of the variation in responses. Contrasts testing for a linearly ordered relationship across ΔF were also significant at each level of TRT and explained more than 90% of the variance in responses. At intermediate ΔF values of 4–8 semitones, the discharge rate in responses to the non-CF (B) tones was about 20–60% lower than responses to the surrounding CF (A) tones. At the largest ΔF values of 10–12 semitones, the rate responses to B tones were reduced by 70–80% relative to the average responses to the surrounding A tones (Fig. 5). Even at a ΔF of 2 semitones, there was a 15% reduction in the rate response to B tones at the shortest TRT. These analyses of the effects of ΔF are important because they demonstrate that increasing the ΔF between the CF (A) and non-CF (B) tones in an alternating series resulted in greater differences in the rate responses to CF and non-CF tones. This is expected from the frequency selectivity of neurons in field L2.
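A planned linear contrast over the 7 equally spaced ΔF levels weights each condition mean by its centered position. This is a minimal sketch of the contrast estimate only. The exact coefficients the authors used are not given in this section, but any centered equal-interval set is proportional to the textbook values −3 … 3. Testing the contrast for significance would also need the repeated-measures error term, which is not computed here. The condition means in the usage lines are invented.

```python
def linear_contrast_weights(levels):
    """Centered linear-trend weights for ordered, equally spaced levels (sum to 0)."""
    center = sum(levels) / len(levels)
    return [level - center for level in levels]


def contrast_estimate(cell_means, weights):
    """Weighted sum of condition means; near 0 when there is no linear trend."""
    return sum(w * m for w, m in zip(weights, cell_means))


delta_f = [0, 2, 4, 6, 8, 10, 12]              # semitones, as in the ANOVA on Difference 1
weights = linear_contrast_weights(delta_f)
print(weights)                                 # [-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0]
print(contrast_estimate([50.0] * 7, weights))  # 0.0 for a flat (no-trend) profile
```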
We next examined the effects of TRT on the differential responses separately for each level of ΔF. The main effect of TRT was significant at each level of ΔF (Table 5). The effects of differences in TRT were greatest at ΔF values of 2–6 semitones, at which 37–49% of the variance was explained. By comparison, 23–28% of the variance was explained at ΔF values of 0, 8, and 10 semitones, and 9% at 12 semitones (Table 5). Linear contrasts comparing the ordered relationships among the levels of TRT (800% < 400% < 200% < 100%) at each level of ΔF revealed the predicted pattern of significantly larger differences in responses at shorter TRTs (Table 5). For ΔF values between 2 and 8 semitones, the linear contrasts explained 43–62% of the variance in responses. At these levels of ΔF, there was an additional 10–15% reduction in the rate response to non-CF (B) tones, relative to CF-tone responses, in ABA- triplets that occurred at the shortest TRT (100%) compared with the longest TRT (800%) (Fig. 5). At ΔF values of 10–12 semitones, the linear contrasts of TRT explained 30% or less of the variance. At these ΔF values, the corresponding differences in the rate responses to A and B tones between the shortest and longest TRTs were also smaller (5–7%). These results have 2 important implications. First, shorter TRTs resulted in larger differences in responses to CF (A) and non-CF (B) tones compared with longer TRTs. Second, the effects of TRT on the relative responses to CF and non-CF tones depended on the level of ΔF. The effects of TRT were generally greater when ΔF was between 2 and 8 semitones than at larger ΔF values.
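The 4 TRT levels double from one step to the next (100, 200, 400, and 800% of TD). Equal steps in rank are therefore equal steps in log2(TRT). As a result, the standard equal-interval coefficients −3, −1, 1, 3 also describe a linear trend in log TRT. The sketch below checks this under that assumption. A contrast that is linear in the raw TRT percentages would weight the levels quite differently. Whether the authors used rank-based coefficients is not stated in this section.

```python
import math


def log_spaced_weights(trt_percent):
    """Centered weights on -log2(TRT), so shorter TRTs receive larger weights."""
    logs = [-math.log2(t) for t in trt_percent]
    center = sum(logs) / len(logs)
    return [x - center for x in logs]


trt_levels = [800, 400, 200, 100]  # % of TD, ordered 800% < 400% < 200% < 100% as in the text
weights = log_spaced_weights(trt_levels)
print([round(2 * w, 6) for w in weights])  # [-3.0, -1.0, 1.0, 3.0]
```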
Effects of ΔF and TRT on forward masking by alternating CF and non-CF tones
The second major goal of this study was to test the hypothesis of Fishman et al. (2001), which proposes that physiological forward masking results in the differential suppression of responses to CF and non-CF tones when these are arranged in an alternating tone sequence. If this hypothesis is correct, we should expect relatively greater forward suppression by CF tones. This suppression should also be more pronounced at shorter TRTs, especially when the A and B tones are similar in frequency (Brosch and Schreiner 1997; Calford and Semple 1995). We tested this prediction by determining, as functions of ΔF and TRT, the relative masking effects of CF (A) tones and non-CF (B) tones arranged in the alternating ABA- stimulus sequence. We assessed these effects by computing a number of additional difference scores (see Table 3) between responses to A and B tones in various stimulus sequences.
First, we asked what the masking effects of surrounding CF (A) tones were on the non-CF (B) tone in the middle position of the ABA- triplet. We compared this with an isolated non-CF (B) tone condition, in which the surrounding CF tones were absent and no masking was expected. To address this question, we computed the difference between responses to isolated B tones in the -B-- stimulus and responses to middle B tones in the ABA- stimulus (Difference 2, Table 3, Fig. 6A). The ΔF-dependent changes in responses to the -B-- stimulus result from the neuron's tuning characteristics alone. Any differences between responses to B tones in the ABA- and -B-- stimuli therefore reflect the effects of CF (A) tones on non-CF (B) tones in the context of the ABA- stimulus. As Fig. 6A shows, responses to the non-CF (B) tones in the ABA- triplet were suppressed relative to the condition in which the surrounding CF (A) tones were absent. This suppression was most pronounced at the shortest TRT (100% of TD), at which the degree of suppression was also related to the magnitude of ΔF. At the shortest TRT and a ΔF of 2 semitones, the normalized discharge rate of responses to B tones in the ABA- triplet was 33% lower than isolated B-tone responses. This additional suppression of B tones in the ABA- triplet decreased as ΔF increased toward 12 semitones, at which there was a difference of only 10% in the normalized rates. In contrast to these suppressive effects at the shortest TRT, suppression of responses to a non-CF (B) tone surrounded by CF (A) tones was much less pronounced at the longer TRTs. There was, however, a slight trend toward some suppressive effect at longer TRTs and intermediate ΔF values (2–6 semitones). These results for Difference 2 confirm that non-CF (B) tones in the middle position of an ABA- triplet were additionally suppressed by surrounding CF (A) tones, relative to a condition in which surrounding CF tones were absent. This was especially true at the shortest TRT and when the CF and non-CF tones were more similar in frequency.
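Difference 2 is a per-condition contrast between the same B tone presented with and without surrounding A tones. This is a minimal sketch, and the sign convention is an assumption. Later in this section, negative values of Differences 5 and 6 are said to mean stronger suppression by A tones. That reading works if each context score is computed as "response in ABA- minus response to the isolated tone", so that suppression comes out negative. The dictionary keys (ΔF in semitones, TRT in % of TD) and all response values are invented for illustration.

```python
def context_effect(in_aba, isolated):
    """Hypothetical Difference 2 per (delta_f_semitones, trt_percent) condition.

    Response to the B tone inside ABA- minus response to the same B tone in -B--.
    Negative values mean the surrounding A tones suppressed the B tone.
    """
    return {cond: in_aba[cond] - isolated[cond] for cond in isolated}


# Invented normalized responses (% of reference) for two conditions.
isolated_b = {(2, 100): 70.0, (12, 800): 15.0}
aba_b = {(2, 100): 40.0, (12, 800): 14.0}
print(context_effect(aba_b, isolated_b))  # {(2, 100): -30.0, (12, 800): -1.0}
```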
At the longer TRTs, the presence of non-CF (B) tones had little effect on responses to CF (A) tones in either the first or the third triplet position at any ΔF (Figs. 6B,C). This is in contrast to the results for the shortest TRT. At the TRT of 100%, the presence of non-CF (B) tones had a ΔF-dependent suppressive effect on responses to CF (A) tones in the third triplet position (Fig. 6C), but not on tones in the first triplet position (Fig. 6B). Suppression of responses to the A tone in the third triplet position was greatest when the frequency of the non-CF (B) tone was similar to the CF, and decreased as ΔF increased (Fig. 6C). Finally, we asked what the relative masking effects of CF tones and non-CF tones were when these were arranged in the alternating ABA- stimulus sequence. These relative suppressive effects correspond to the differences between Difference 2 (effects of A tones on B tones) and Differences 3 and 4 (effects of B tones on A tones). We therefore computed 2 additional difference scores (Table 3). Difference 5 was the difference between Difference 2 and Difference 3, and Difference 6 was the difference between Difference 2 and Difference 4. Negative values of Difference 5 and Difference 6 indicate that the suppressive effects of CF (A) tones on non-CF (B) tones were greater than the effects of non-CF (B) tones on CF (A) tones. Positive values indicate the opposite. The values for Difference 5 and Difference 6 are depicted in Fig. 6, D and E, respectively. Figure 6F depicts the mean difference scores averaged over the values of Difference 5 and Difference 6.
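Differences 5 and 6 are second-order scores: each subtracts a "B on A" context effect from the "A on B" context effect. This is a minimal sketch. It assumes all context effects are expressed as "response in ABA- minus response to the isolated tone" (suppression negative), which makes a negative result mean that A tones suppress B tones more than the reverse. Table 3 is not reproduced here, so the exact baselines for Differences 3 and 4 are an assumption, and all example values are invented.

```python
def relative_masking(diff2, diff3, diff4):
    """Return Difference 5, Difference 6 and their mean for one condition.

    diff2: context effect of A tones on the middle B tone
    diff3, diff4: context effects of B tones on the first and third A tones
    (all assumed to be 'in ABA- minus isolated', so suppression is negative).
    """
    diff5 = diff2 - diff3
    diff6 = diff2 - diff4
    return diff5, diff6, (diff5 + diff6) / 2


# Invented values: B strongly suppressed, first A unaffected, third A mildly suppressed.
print(relative_masking(diff2=-30.0, diff3=0.0, diff4=-10.0))  # (-30.0, -20.0, -25.0)
```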
Two important trends are evident from an examination of the average of Difference 5 and Difference 6 (Fig. 6F). First, the values for the difference scores are generally either close to zero or negative. We applied two-tailed, paired-sample t-tests, followed by a sequential Bonferroni correction (Rice 1989) for 72 multiple comparisons (6 ΔF x 4 TRT x 3 TD). In 26 of the 72 possible combinations of ΔF, TRT, and TD, the negative values for the average of Difference 5 and Difference 6 were significantly below the null expectation of zero difference (3.6 < |t(45)| < 8.5, Ps < 0.001). In 23 of the 26 comparisons that were significantly different, the ΔF was 8 semitones or less, and the TRT was 100 or 200% of the TD. In no instance were the average values of Differences 5 and 6 significantly above the null expectation of zero difference. These results confirm that under some stimulus conditions the suppressive effects of CF (A) tones were significantly greater than the effects of non-CF (B) tones. Second, the relatively greater suppression of non-CF (B) tones by CF (A) tones varied as a function of ΔF and TRT. To examine these effects statistically, we analyzed the average of Difference 5 and Difference 6 in a 6 (ΔF) x 4 (TRT) x 3 (TD) rmANOVA (Table 6). There were significant main effects of ΔF (η² = 0.35) and TRT (η² = 0.40), and the ΔF x TRT interaction (η² = 0.10) was significant. No other effects in the main ANOVA model were significant (η² ≤ 0.05).
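The sequential Bonferroni correction cited above (Rice 1989) is usually implemented as Holm's step-down procedure. This is a minimal sketch, assuming that standard form. The p-values are sorted, and the k-th smallest is compared with α/(m − k). Testing stops at the first non-significant result. With m = 72 tests and α = 0.05, the strictest threshold is 0.05/72, about 6.9 × 10⁻⁴. The function name and the example p-values are hypothetical.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's step-down (sequential Bonferroni) procedure.

    Tests are visited from the smallest p-value upward; the k-th smallest
    (k = 0, 1, ...) is compared with alpha / (m - k). Testing stops at the
    first non-significant result. Returns reject flags in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break
    return reject


print(sequential_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

In practice, statsmodels' `multipletests(..., method="holm")` provides the same correction.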
We predicted that the relatively greater suppressive effects of CF (A) tones should be more pronounced at smaller ΔF levels than at larger ΔF levels. We also predicted that the effects of TRT should be more pronounced at smaller ΔF values than at larger ΔF values (Brosch and Schreiner 1997; Calford and Semple 1995). To test these predictions, we computed 2 sets of planned linear contrasts. The first set compared, separately at each level of TRT, the linearly ordered effects of ΔF (in semitones: 2 > 4 > ... > 12). The second set compared, separately at each level of ΔF, the linearly ordered effects of TRT (100% > 200% > 400% > 800%). Results from the first set of contrasts indicated that the linearly ordered effects of ΔF on the relatively greater masking by CF (A) tones were significant at all levels of TRT. The linear effects of ΔF were most pronounced at the shortest TRT (η² = 0.45) and least pronounced at the longest TRT (η² = 0.09). The effects at TRTs of 200 and 400% were intermediate. The second set of contrasts, which examined the linearly ordered effects of TRT, revealed significant effects at all levels of ΔF. The magnitude of the linearly ordered effects among the levels of TRT decreased as ΔF increased from 2 to 12 semitones. At a ΔF of 2 semitones, the linearly ordered effects of TRT explained 57% of the variation in the relatively greater suppressive effects of CF (A) tones on non-CF (B) tones. At ΔF levels of 10–12 semitones, 26% or less of the variation was explained by the linearly ordered effects of TRT.
The data depicted in Fig. 6F, and the statistical results reported in Table 6, can be summarized as follows. The relatively greater forward suppression caused by CF (A) tones, compared with non-CF (B) tones, was most pronounced at stimulus combinations that included short TRTs (e.g., 100%) and small ΔF values (e.g., <6–8 semitones). At the shortest TRT and ΔF values of 2–6 semitones, the relatively greater forward suppression of non-CF (B) tones caused by surrounding CF (A) tones was equivalent to a 20–25% greater reduction in the discharge rate. These results are important because they conform to the pattern of responses expected if forward masking played a role in determining the relative magnitude of responses to the A and B tones when these were embedded in the context of an alternating ABA- sequence.
DISCUSSION
Effects of ΔF and TRT
Frequency differences (ΔF) between tones and the tone repetition time (TRT) have strong influences on the perceptual segregation of interleaved tones into separate streams, with larger ΔF values and shorter TRTs promoting the streaming effect (Bregman 1990; Fig. 1). Frequency selectivity and forward masking are believed to play roles in the perceptual segregation of sequences of alternating tones by promoting the spatial separation of neural responses in tonotopic space (Beauvois and Meddis 1991, 1996; Fishman et al. 2001; Hartmann and Johnson 1991; McCabe and Denham 1997). Our goals were to determine whether larger ΔF values and shorter TRTs result in larger differential responses to CF (A) and non-CF (B) tones, and whether differential forward masking between CF and non-CF tones could account for the effects of TRT on neural responses.
In starlings, the 10-dB bandwidths of excitatory tuning curves in the auditory periphery and in the tonotopically organized field L2 are similar, although bandwidths in field L2 depend somewhat less on level (Klump et al. 2000; Nieder and Klump 1999). We therefore predicted that, as a result of frequency selectivity, the overlap in neural excitation in field L2 elicited by CF (A) and non-CF (B) tones in an ABA- stimulus would decrease as ΔF increased. Over the range of stimulus properties used in this study, ΔF consistently had large effects on the differential responses to CF (A) and non-CF (B) tones. A difference in responses to CF and non-CF tones was observed at the smallest ΔF used in this study (2 semitones). The differences in responses increased as the ΔF between A and B tones increased up to the largest ΔF values (10–12 semitones), at which responses were dominated by excitatory responses to the CF (A) tone alone. Fishman et al. (2001) recorded neural ensembles in macaque AI and found a similar effect of ΔF on responses to A and B tones presented in an alternating ABAB sequence. In their study, the A tones were presented at the site's BF and the B tones were situated at ΔF levels 10–50% (1.65–7.0 semitones) away from the BF. Our results, and those from macaque AI, are thus consistent with a major prediction of hypotheses proposing that frequency selectivity in the auditory system plays a role as a low-level, preattentive process in sequential stream segregation, at least for sequences of pure tones (Beauvois and Meddis 1991, 1996; Fishman et al. 2001; Hartmann and Johnson 1991; McCabe and Denham 1997).
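The semitone values quoted for the macaque study follow from the standard conversion of a frequency ratio r to 12·log2(r) semitones. This is a minimal check, assuming that "10–50% away from the BF" means B tones above the BF (ratios 1.10 and 1.50). A 10% step below the BF would instead correspond to about 1.82 semitones. The same conversion shows that the 12-semitone maximum ΔF used here is one octave. The function names are hypothetical.

```python
import math


def ratio_to_semitones(ratio):
    """Frequency ratio -> musical interval in semitones (12 per octave)."""
    return 12 * math.log2(ratio)


def semitones_to_ratio(semitones):
    """Musical interval in semitones -> frequency ratio."""
    return 2 ** (semitones / 12)


# B tones 10% and 50% above the best frequency (ratios 1.10 and 1.50).
print(round(ratio_to_semitones(1.10), 2))  # 1.65
print(round(ratio_to_semitones(1.50), 2))  # 7.02
print(semitones_to_ratio(12))              # 2.0 (one octave)
```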
We also predicted that if physiological forward masking plays a role in determining the responses to alternating CF and non-CF tones, as proposed by Fishman et al. (2001), two effects should follow. First, the differences in responses of starling forebrain neurons to CF (A) and non-CF (B) tones would increase as the TRT decreased. Second, the effects of TRT would be larger at smaller ΔF values (Brosch and Schreiner 1997; Calford and Semple 1995). Our results show that differential neural responses to A and B tones increased as the TRT decreased, and that the magnitude of the effects of TRT was larger at ΔF values between 2 and 8 semitones than at larger ΔF values. The smaller effects of TRT at ΔF values of 1