Journal of Neurophysiology

Processing of Sound Sequences in Macaque Auditory Cortex: Response Enhancement

Michael Brosch, Andreas Schulz, Henning Scheich

Abstract

It is well established that the tone-evoked response of neurons in auditory cortex can be attenuated if another tone is presented several hundred milliseconds before. The present study explores in detail a complementary phenomenon in which the tone-evoked response is enhanced by a preceding tone. Action potentials from multiunit groups and single units were recorded from primary and caudomedial auditory cortical fields in lightly anesthetized macaque monkeys. Stimuli were two suprathreshold tones of 100-ms duration, presented in succession. The frequency of the first tone and the stimulus onset asynchrony (SOA) between the two tones were varied systematically, whereas the second tone was fixed. Compared with presenting the second tone in isolation, the response to the second tone was enhanced significantly when it was preceded by the first tone. This was observed in 87 of 130 multiunit groups and in 29 of 69 single units with no obvious difference between different auditory fields. Response enhancement occurred for a wide range of SOA (110–329 ms) and for a wide range of frequencies of the first tone. Most of the first tones that enhanced the response to the second tone evoked responses themselves. The stimulus, which on average produced maximal enhancement, was a pair with a SOA of 120 ms and with a frequency separation of about one octave. The frequency/SOA combinations that induced response enhancement were mostly different from the ones that induced response attenuation. Results suggest that response enhancement, in addition to response attenuation, provides a basic neural mechanism involved in the cortical processing of the temporal structure of sounds.

INTRODUCTION

In audition, the temporal structure of sounds conveys significant information. This is clearly true for speech and other communication signals. The auditory cortex is necessary for the neural representation of the temporal structure and order of sounds, as has been revealed in many behavior-lesion studies in monkeys and cats (Cowey and Weiskrantz 1976; Diamond and Neff 1957; Diamond et al. 1962; Heffner and Heffner 1984, 1989; Hupfer et al. 1977; Kaas et al. 1967; Scharlock et al. 1965; Strominger et al. 1980). The computations performed by auditory cortical neurons that underlie this representation, however, remain largely unknown.

An important neural mechanism for the computation of the temporal structure of sounds seems to involve attenuation of neuronal responses by inhibition or adaptation. Using a variety of temporally structured sounds, including sequences of pure tones, click trains, amplitude-modulated pure tones or noise, natural sounds, communication sounds, and modified versions of them, a number of studies have found that the response to a given acoustic event is attenuated if this event occurs shortly after (Brosch and Schreiner 1997; Calford and Semple 1995; Creutzfeldt et al. 1980; Eggermont 1991, 1994, 1998; Glass and Wollberg 1983; Hocherman and Gilat 1981; Phillips et al. 1989; Schreiner and Langner 1988; Schreiner and Urbas 1986, 1988; Steinschneider et al. 1982) or before another event (Brosch et al. 1998). For example, the response to the second tone of a sequence of two tones can be attenuated at intertone intervals of up to ∼1 s. Attenuation becomes stronger as the intertone interval shortens, and below some limit, which varies between 30 and 400 ms across neurons, it can be so strong that a neuron does not respond to the second tone at all. The influence of the first tone on the response to the second tone is determined not only by the temporal but also by the spectral separation of the tones: the influence lasts longest if the second tone has the same frequency as the preceding tone and decreases with increasing frequency difference (Brosch and Schreiner 1997).

A small number of reports suggest the existence of another neural mechanism involved in the coding of the temporal structure of sounds. This mechanism is complementary to response attenuation and causes neurons to respond more strongly to a given tone if this tone is presented shortly after another tone. Thus far, this response enhancement has been investigated systematically only in nonprimary auditory cortex of echolocating bats (Esser et al. 1997; Suga et al. 1978, 1983). Suga and coworkers (1978, 1983) described temporal and spectral combination-selective cells in the FM-FM area of auditory cortex that were specialized for the requirements of echolocation: preferred temporal intervals between consecutive tones were in the range of ∼10 ms, and preferred frequency intervals matched the frequencies contained in the emitted orientation sounds. With temporally modified versions of segments of species-specific communication calls, evidence for response enhancement was obtained in 21% of the neurons in the FM-FM area of the auditory cortex of a bat (Esser et al. 1997). These neurons were highly sensitive to the temporal interval between the segments of a composite call: simply introducing a silent interval of 3 ms between two consecutive segments of a call could abolish the neural response.

For less specialized mammals, the existence of cortical neurons that respond selectively to specific sound sequences has been postulated by Newman and Symmes (1979), but such “feature detectors” have not been found by them or by others. Yet a small number of studies show examples of response enhancement in primary auditory cortex (AI) (Brosch and Schreiner 1997; McKenna et al. 1989; Riquimaroux 1994; Sutter et al. 1996). These reports provide no information on the percentage of neurons exhibiting response enhancement or on the types of tone sequences that evoke it. Studies in humans suggest that response enhancement could be a fairly common neural mechanism in auditory cortex: the N100 component of the acoustically evoked electric (Budd and Michie 1994) and magnetic (Loveless and Hari 1993; Loveless et al. 1989) responses was augmented when tone pairs with a temporal separation of 70–300 ms were presented to subjects. Further evidence for response enhancement comes from a study of paired-pulse facilitation in auditory cortex (Metherate and Ashe 1994) and from studies in visual cortex using temporal sequences of light flashes (Ganz and Felder 1984; Nelson 1991a).

On the basis of the cited reports, we hypothesized that response enhancement provides a general cortical mechanism involved in the processing of the temporal structure of sounds. To test this hypothesis, the following questions were addressed: how many neurons exhibit enhanced responses when two tones are presented in succession? What are the spectral and temporal characteristics of tone sequences producing response enhancement? How does response enhancement relate to response attenuation? What is the relation between response enhancement and spectral filter properties of neurons? Are there differences between neurons in different auditory cortical fields with regard to response enhancement?

These questions were addressed by recording action potentials from multiunit groups and single units in AI and in the caudomedial auditory belt of macaque monkeys. Sound sequences were pairs of brief pure tones presented in succession. The second tone was held constant and served as a probe to explore how the neural response to the second tone was influenced by the frequency of the first tone and the interval between the tones.

METHODS

Animal preparation

Three adult Macaca fascicularis (2.5–4 kg) were used for the present study. Animals initially received an injection of atropine (0.5 mg/kg). A few minutes later, anesthesia was induced by a mixture of ketamine HCl (4 mg/kg) and xylazine (5 mg/kg im). The trachea was intubated and the head was placed in a standard stereotaxic apparatus. An extensive craniotomy was performed over the left auditory cortex including the lateral inferotemporal sulcus and parts of the central sulcus. The dura was removed and parts of the parietal lobe were aspirated to facilitate access to the auditory cortex. To fix the head to the laboratory frame, previously implanted head holders were used in two animals, whereas the third animal was prepared for head restraint during the surgery. Animals received prophylactic injections of an antibiotic (Anicilin; 200 mg/kg) and dexamethasone (0.2 mg/kg). During the experiments, animals were continuously given, through an intraperitoneal catheter, ketamine (30 mg · kg−1 · h−1) and xylazine (25 mg · kg−1 · h−1) in a Ringer’s/glucose solution (5 ml · kg−1 · h−1) titrated to effect. Body temperature was maintained around 38°C with a heating pad. The exposed brain was kept moist with silicone oil.

Experiments lasted 80–85 h; 50–55 h after induction of anesthesia, recordings were briefly interrupted and 100 nl of fluorescent latex beads (0.03–0.5 μm; Microprobe) were injected close to electrophysiologically characterized recording sites. About 30 h later, animals were deeply anesthetized and perfused with 2 L of Ringer solution including 1% heparin, followed by 2 L of 4% paraformaldehyde including 0.1% glutaraldehyde. Brains were cut into 60-μm-thick frontal sections and stained for parvalbumin with an antibody (Sigma), which was visualized with α-chloronaphtol (Sigma). This stain allowed the location of injection sites to be identified relative to the characteristic staining of AI (e.g., Kosaki et al. 1997). The border of the transition from dense to less dense staining corresponded to the electrophysiologically determined tonotopic gradient. Experiments were approved by the local committee for animal care and ethics and conformed to the rules for animal experimentation of the European Communities Council Directive (86/609/EEC).

Neural recording

Experiments were conducted in an electrically shielded, sound-attenuated, double-walled room (IAC, 1202-A). Recordings were made simultaneously from seven individually advanceable fiber microelectrodes arranged in a row with an interelectrode distance of 330 μm (Thomas Recording). Signals on each electrode were amplified separately, band-pass filtered between 0.5 and 5 kHz, and displayed on oscilloscopes. The multiunit activity on each channel could also be monitored with an audio analyzer (FHC). Furthermore, the filtered signals were fed into a 32-channel A/D data-acquisition system (DataWave). The system was set up to separate the action potentials of small groups of neurons from the background signal and to store the waveform of each action potential (0.7 ms before and 1 ms after the maximal amplitude) and its time stamp with a resolution of 100 μs. Single units were extracted off-line from all waveforms recorded on the same electrode with a template-matching algorithm (e.g., Schmidt 1984), which retained only those action potential waveforms from a multiunit signal whose correlation with the template waveform of a single unit was >0.9.
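The single-unit criterion can be illustrated with a minimal pure-Python sketch of correlation-based template matching. It assumes that each stored waveform and the template are sampled at the same time points around the peak; the function names are hypothetical, and the sketch is a generic correlation threshold, not the DataWave software or the specific algorithm of Schmidt (1984).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length waveforms."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat waveform carries no shape information
    return sxy / math.sqrt(sxx * syy)

def match_single_unit(waveforms, template, r_min=0.9):
    """Indices of waveforms whose correlation with the template exceeds r_min.

    Assumption: waveforms and template share the same sampling grid.
    """
    return [i for i, w in enumerate(waveforms) if pearson_r(w, template) > r_min]

# Hypothetical example: a scaled copy, an inverted copy, and an unrelated trace.
template = [0, 1, 4, 1, -2, -1, 0]
waves = [[0, 2, 8, 2, -4, -2, 0], [0, -1, -4, -1, 2, 1, 0], [1, 1, 1, 1, 1, 1, 2]]
accepted = match_single_unit(waves, template)
```

Scaling a waveform does not change its correlation with the template, so the criterion tests waveform shape rather than amplitude; the inverted spike has r = −1 and is rejected.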

The first penetration was aimed at AI, the center of which was located around anterior 10 in stereotaxic coordinates. The electrode row was oriented in the rostrocaudal direction, and electrodes entered the supratemporal plane almost at a right angle. Most recordings were made 200–600 μm below the depth at which neural discharges were first observed in the supratemporal plane and thus were presumably made from upper cortical layers. Consecutive recordings were made from adjacent locations by moving the electrode manipulator along the mediolateral axis in steps of 330 μm or along the rostrocaudal axis in steps of 2.3 mm. This approach yielded data from a region of ∼5 × 5 mm², which included parts of AI and parts of the auditory cortex caudomedial to AI.

Acoustic stimuli

A waveform generator (Tucker-Davis Technologies, TDT WG1) was used to produce acoustic search stimuli such as tone pips, frequency sweeps, and noise bursts. For quantitative measurements, acoustic stimuli were generated digitally in a computer (Pentium PC with an array processor AP2 card, TDT) at a sampling rate of 100 or 200 kHz and with a dynamic range of 88 dB, and D/A converted to an analog signal (TDT DA1). Aliasing was reduced with a low-pass filter with a cutoff frequency of 35 kHz (TDT FT5). The amplitude of all signals could be adjusted by a passive attenuator (TDT PA4). Signals then were passed through an equalizer (Conrad Electronics, SEA 4500) to compensate for the frequency characteristics of the sound chamber, amplified (Pioneer A202), and delivered through a free-field loudspeaker (Manger) located 1 m from the animal’s head, in front of it and to the right. The sound pressure level (SPL) was measured with a ¼-inch Brüel & Kjær 1435 microphone mounted close to the monkey’s ear and a spectrum analyzer (2033-R). The output of the sound delivery system was essentially flat (10 dB-A) in the frequency range of 0.2–20 kHz. At sound pressure levels <90 dB SPL, harmonic distortions were ≥36 dB below the signal level.

Spectral filter properties of neurons were assessed quantitatively by presenting pure tones at 40 frequencies, each of which was repeated 10 times. Frequencies were equidistantly spaced on a logarithmic scale over a range of 4–8 octaves. All tones had the same moderate intensity of 40–60 dB SPL (constant within each sequence), such that harmonic distortions were well below the threshold of most neurons. Tones were presented in pseudorandom order at a rate of 0.66/s. Tone duration was 100 ms, including 5-ms rise and fall times. Measurements of spectral filter properties were often repeated with tones of longer duration (300 or 500 ms). Peristimulus rastergrams were generated on-line from the multiunit activity. They provided estimates of the spectral filter properties of the units at the different electrodes and were used to select the frequencies for the subsequent two-tone paradigm.
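The logarithmic frequency grid can be sketched in a few lines of Python, assuming the grid is anchored at its lowest frequency. The function name and the 300-Hz, 6-octave example are hypothetical, chosen only so that all tones fall inside the 0.2–20 kHz range of the sound system. Over 4–8 octaves, 40 tones are spaced about 0.10–0.21 octave apart.

```python
import math

def log_spaced_frequencies(f_low_hz, n_octaves, n_freqs=40):
    """Return n_freqs tone frequencies equidistant on a log2 axis.

    The lowest tone is f_low_hz and the highest lies n_octaves above it, so
    neighbouring tones are n_octaves / (n_freqs - 1) octaves apart.
    """
    step_oct = n_octaves / (n_freqs - 1)
    return [f_low_hz * 2.0 ** (i * step_oct) for i in range(n_freqs)]

# Hypothetical example: 40 tones over 6 octaves starting at 300 Hz.
freqs = log_spaced_frequencies(300.0, 6.0)
step = math.log2(freqs[1] / freqs[0])  # octave spacing between neighbouring tones
```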

Two-tone interactions were assessed by presenting various pairs of tones with the same intensity and duration as in the single-tone paradigm. The frequency of the second tone remained constant, whereas the frequency of the first tone and the interval between the onsets of the first and second tones (stimulus onset asynchrony, SOA) were varied systematically in pseudorandom order. The frequency of the second tone was selected to produce a response in most of the neurons recorded simultaneously. The frequency range of the first tone was chosen such that most tones produced responses and only a few fell outside the receptive field boundaries. A schematic of the stimulus paradigm is included in Figs. 3-5. We used six SOAs, equally spaced on a logarithmic scale between 110 and 400 ms (600 ms in a few cases), and up to seven frequencies for the first tone, equally spaced on a logarithmic scale over a range of two to eight octaves. In addition to the 42 tone pairs, the second tone was also presented alone without a first tone. Each tone pair was repeated 10 times. The interpair interval was 1.5 s.
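A minimal sketch of how such a pseudorandomized two-tone schedule could be generated is given below. The SOA endpoints (110 and 400 ms), the first-tone frequencies (2–16 kHz), the 8-kHz second tone, and the function names are illustrative assumptions; the intermediate SOAs produced by ideal logarithmic spacing need not match the SOAs actually used in the recordings.

```python
import random

def log_spaced(lo, hi, n):
    """n values equally spaced on a logarithmic scale from lo to hi (inclusive)."""
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** i for i in range(n)]

def two_tone_schedule(first_freqs_khz, soas_ms, second_khz, n_reps=10, seed=0):
    """Shuffled trial list: every (1st tone, SOA) pair plus the 2nd tone alone.

    Each trial is (f1_khz, soa_ms, f2_khz); f1_khz and soa_ms are None for the
    single-tone condition. Every stimulus occurs exactly n_reps times.
    """
    stimuli = [(f1, soa, second_khz) for f1 in first_freqs_khz for soa in soas_ms]
    stimuli.append((None, None, second_khz))  # 2nd tone presented alone
    trials = stimuli * n_reps
    random.Random(seed).shuffle(trials)  # reproducible pseudorandom order
    return trials

# Illustrative parameters (assumptions, not values taken from the recordings).
soas = [round(s) for s in log_spaced(110.0, 400.0, 6)]
first_freqs = log_spaced(2.0, 16.0, 7)
trials = two_tone_schedule(first_freqs, soas, second_khz=8.0)
```

With 43 stimuli × 10 repetitions = 430 trials and a 1.5-s interpair interval, one run lasts 645 s, i.e., a little under 11 min.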

Data analysis

FREQUENCY RESPONSE AREAS.

The spectral filter properties of neurons were assessed quantitatively off-line by calculating peristimulus time histograms (PSTH) with a bin size of 1 ms from the discharges recorded during the single-tone paradigm. The 40 PSTHs then were combined into a two-dimensional plot in which they were ordered by stimulus frequency (Fig. 1). Because of the small number of stimulus repetitions (viz. 10), the time courses of the responses could be estimated only moderately well. Estimates were therefore improved by weighting the response to a given tone with the responses to tones of neighboring frequencies and by low-pass filtering the time course. Specifically, the set of PSTHs was convolved with a two-dimensional Gaussian filter that had a standard deviation of 0.1 octaves in the frequency dimension and 5 ms in the temporal dimension, i.e., its amplitude decayed from 1 to 0.1 within 0.21 octaves and 10.7 ms, respectively. After the convolution, periods with a tone-evoked response were determined by identifying the bins in which the number of discharges was significantly greater than the spontaneous discharge rate. The average spontaneous discharge rate and its variability were determined from the last 500 ms of the filtered PSTHs. From the distribution of the spontaneous discharge rate, a critical rate was derived such that <0.1% of all bins had a higher rate. Subsequently, all bins in the initial 1,000 ms of the filtered PSTHs in which the discharge rate exceeded the critical rate were marked. These bins were considered to indicate excitatory responses. The shape of the resulting frequency response area (FRA) was largely insensitive to the choice of filter parameters.
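The smoothing and thresholding steps can be sketched in pure Python as follows. The sketch assumes 1-ms bins, treats samples beyond the edges of the frequency and time axes as zero, and truncates the Gaussian at 3 SDs; the text does not state how edges or kernel length were handled, and all function names are hypothetical. The helper `decay_to` converts an SD σ into the distance over which the Gaussian falls from 1 to 0.1, namely σ√(2 ln 10) ≈ 2.15σ.

```python
import math

def gaussian_kernel(sigma_bins, truncate=3.0):
    """Unit-sum 1-D Gaussian kernel with SD sigma_bins (assumed 3-SD truncation)."""
    half = max(1, math.ceil(truncate * sigma_bins))
    k = [math.exp(-0.5 * (i / sigma_bins) ** 2) for i in range(-half, half + 1)]
    total = sum(k)
    return [v / total for v in k]

def convolve_same(seq, kernel):
    """Centred convolution returning len(seq) values; out-of-range samples count as 0."""
    half = len(kernel) // 2
    n = len(seq)
    return [sum(w * seq[i + j - half] for j, w in enumerate(kernel)
                if 0 <= i + j - half < n) for i in range(n)]

def smooth_fra(psths, octave_step, sigma_oct=0.1, sigma_ms=5.0, bin_ms=1.0):
    """Separable 2-D Gaussian smoothing; rows = frequencies, columns = time bins."""
    k_t = gaussian_kernel(sigma_ms / bin_ms)
    k_f = gaussian_kernel(sigma_oct / octave_step)
    rows = [convolve_same(r, k_t) for r in psths]  # smooth along time
    n_f, n_t = len(rows), len(rows[0])
    cols = [convolve_same([rows[f][t] for f in range(n_f)], k_f) for t in range(n_t)]
    return [[cols[t][f] for t in range(n_t)] for f in range(n_f)]  # along frequency

def excitatory_mask(smoothed, spont_bins=500, resp_bins=1000, tail=0.001):
    """Flag bins of the first resp_bins that exceed the critical spontaneous rate.

    The critical rate is exceeded by at most `tail` of the bins in the last
    spont_bins of all smoothed PSTHs.
    """
    spont = sorted(v for row in smoothed for v in row[-spont_bins:])
    critical = spont[min(len(spont) - 1, math.ceil((1 - tail) * len(spont)) - 1)]
    return [[v > critical for v in row[:resp_bins]] for row in smoothed], critical

def decay_to(level, sigma):
    """Distance at which a Gaussian with SD sigma falls from 1 to `level`."""
    return sigma * math.sqrt(-2.0 * math.log(level))
```

Because the kernel is separable, smoothing along time and then along frequency is equivalent to convolving with the (truncated) two-dimensional Gaussian; the octave step between neighbouring tones converts the 0.1-octave SD into frequency bins.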

Fig. 1.

Frequency response area of a multiunit group and peristimulus rastergram of its discharges recorded during the presentation of different tone bursts. Tones of 40 different frequencies were presented, each was repeated 10 times, and tones were present during the initial 100 ms of a presentation cycle (tone offset indicated by the thin vertical line). Gray shading indicates periods during which the multiunit group responded to acoustic stimulation, i.e., during which the spike rate was above the undriven spike rate (see methods for further details). Note that the neurons responded not only during stimulation but also thereafter. Open rectangles mark periods of the response that were found to be modulated or to be modulating in a 2-tone paradigm (compare Fig. 3). The rectangle closest to the ordinate indicates the period during which the neural response to a 13-kHz tone was enhanced by preceding tones. The other four rectangles mark parts of the response to preceding tones that overlapped temporally with the period of the enhanced response to the subsequent 13-kHz tone. Asterisk indicates the best frequency (BF). Black bars on the axes indicate the frequency and temporal range of the neural response.

Several parameters were extracted from the FRA to characterize the spectral filter properties and the time course of the response of neurons. The best frequency (BF) was the frequency that evoked the highest number of discharges in any bin of the FRA. The lower (upper) frequency boundary was the lowest (highest) frequency at which a significantly elevated number of discharges was observed. The response onset (offset) was the first (last) time bin after tone onset with a significantly elevated number of discharges. In addition, comparison of FRAs measured with tones of 100- and 300-ms duration revealed in many cases whether a response component was time-locked to the onset of a tone (onset response) or to its offset (offset response).
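Given a smoothed FRA and its significance mask (rows ordered by ascending frequency, columns as time bins from tone onset), the parameters defined above can be read off as in this sketch. The function name, the dictionary keys, and the choice to report latencies as the start times of the first and last significant bins are assumptions.

```python
def fra_parameters(smoothed, mask, freqs_khz, bin_ms=1.0):
    """Best frequency, frequency boundaries and response latencies of an FRA.

    smoothed, mask: rows = tone frequencies (ascending, matching freqs_khz),
    columns = time bins from tone onset. Latencies are bin start times
    (an assumption). Returns None if no bin is significant.
    """
    sig = [(f, t) for f, row in enumerate(mask) for t, on in enumerate(row) if on]
    if not sig:
        return None
    # BF: frequency of the significant bin with the highest discharge count
    f_bf, _ = max(sig, key=lambda ft: smoothed[ft[0]][ft[1]])
    return {
        "bf_khz": freqs_khz[f_bf],
        "lower_khz": freqs_khz[min(f for f, _ in sig)],
        "upper_khz": freqs_khz[max(f for f, _ in sig)],
        "onset_ms": min(t for _, t in sig) * bin_ms,
        "offset_ms": max(t for _, t in sig) * bin_ms,
    }
```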

TWO-TONE INTERACTIONS.

Response enhancement in the two-tone paradigm was identified by testing whether the measured response (actual response) to a tone pair was significantly stronger than the sum of the responses to the individual tones of the pair (expected response). The expected response thus simulates what we would obtain from a stimulus sequence if the two single-stimulus responses simply summed linearly.

For each of the 43 stimuli (6 SOAs × 7 frequencies + 1 single tone), a PSTH with a bin size of 10 ms was computed from the discharges recorded during the two-tone paradigm. The responses to isolated tones, needed for the calculation of the expected response, were also obtained from the responses recorded during the two-tone paradigm. The response to the second tone was obtained from the response to that tone when it was not preceded by a first tone, whereas the response to the first tone was obtained from the initial part of the PSTH of the longest SOA, up to the onset of the second tone. Because the maximal SOA of tone pairs was 400 ms (600 ms in a few cases), the PSTHs for the first tone had a length of 400 ms (600 ms). Thus very late parts of the response to the first tone could not be considered for the expected response. To obtain expected and actual PSTHs of the same length, the PSTHs for the first tones were extended beyond 400 (600) ms, and the new bins were set to the average spontaneous discharge rate.

As illustrated in Fig. 2, the expected response was calculated by shifting the PSTH for the first tone, backward in time, by an amount equal to the SOA of the tone pair for which the actual response had been measured. Then this shifted PSTH was added to the PSTH for the second tone. For a SOA of 110 ms, for example, the number of discharges in the 1st bin of the PSTH for the second tone was added to the 12th bin of the PSTH for the first tone to yield the 1st bin of the expected PSTH. Likewise the 2nd bin of the PSTH for the second tone was added to the 13th bin of the PSTH for the first tone to yield the 2nd bin of the expected PSTH, etc.
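The shift-and-add construction, including the padding of the first-tone PSTH with the spontaneous rate, can be sketched as follows for 10-ms bins. Indices are zero-based, so the 12th bin of the first-tone PSTH for a 110-ms SOA is index 11; the function names are hypothetical.

```python
def pad_first_tone(psth_first, n_bins, spont_per_bin):
    """Extend the 1st-tone PSTH to n_bins, filling unknown late bins with the spontaneous rate."""
    return list(psth_first) + [spont_per_bin] * max(0, n_bins - len(psth_first))

def expected_psth(psth_first, psth_second, soa_ms, spont_per_bin, bin_ms=10):
    """Linear prediction aligned to 2nd-tone onset: second[i] + first[i + SOA / bin_ms]."""
    shift = round(soa_ms / bin_ms)  # e.g. 110 ms / 10 ms = 11 bins
    first = pad_first_tone(psth_first, shift + len(psth_second), spont_per_bin)
    return [s + first[i + shift] for i, s in enumerate(psth_second)]

# Hypothetical PSTHs: 400 ms of 1st-tone response, 500 ms of 2nd-tone response.
first = [float(i) for i in range(40)]
second = [100.0] * 50
exp = expected_psth(first, second, soa_ms=110, spont_per_bin=0.5)
```

Here the first 29 expected bins combine both responses, and from bin 29 onward (past the end of the 400-ms first-tone PSTH) the padded spontaneous value is added instead.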

Fig. 2.

Schematic of the method used for the determination of 2-tone interactions. The expected response to the sequence of a 1st and a 2nd tone (D, shaded) was calculated by temporally shifting the response to the 1st tone alone by an amount equal to the onset interval between the 2 tones (compare the response profile in A with the shifted response profile in C), and adding this shifted response to the response to the 2nd tone (see - - - in B and C). Two-tone interactions are indicated by those bins in which the expected response differed significantly from the actually observed response (D). Note that the expected response in D does not include late parts of the response to the 2nd tone. This was because the corresponding parts of the response to the 1st tone were not known: that response was obtained from the initial part of the peristimulus time histogram (PSTH) of the 2-tone condition and thus ended at the onset of the 2nd tone.

Next, the expected PSTH was subtracted from the actually measured PSTH for each tone pair. In a last step, this difference PSTH (Fig. 6) was low-pass filtered, to reduce noise due to the limited number of stimulus repetitions, by adding half of the values of the two neighboring bins to the value of each bin.
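A minimal sketch of the difference PSTH and the three-point smoothing, taking the description literally (weights 0.5, 1, 0.5, which are not normalized) and assuming that missing neighbours at the two ends count as zero; the function names are hypothetical.

```python
def difference_psth(actual, expected):
    """Bin-wise actual minus expected discharge counts."""
    return [a - e for a, e in zip(actual, expected)]

def smooth_neighbours(d):
    """d[i] + 0.5 * (d[i-1] + d[i+1]); missing neighbours at the ends count as 0 (assumption)."""
    n = len(d)
    return [d[i] + 0.5 * ((d[i - 1] if i > 0 else 0) + (d[i + 1] if i < n - 1 else 0))
            for i in range(n)]
```

Because the weights sum to 2, a slowly varying difference is roughly doubled; this rescaling should not affect the significance test provided the undriven reference values are taken from the same smoothed difference PSTHs.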

Response enhancement in the two-tone paradigm was then identified with two criteria. First, all bins in the difference PSTH in which the actual discharge rate was significantly greater than the expected discharge rate were identified. As for the definition of FRAs, the statistical significance of the difference was determined by analyzing the variability of the difference PSTH when neurons were not driven by any stimulus. The variability of the undriven difference in discharge rates was estimated from the distribution of the values during the last 300 ms of all difference PSTHs. From this distribution, a first critical value was derived such that <5% of all bins had a greater difference. Because this significance level was lenient, this magnitude criterion produced a number of isolated bins in the difference PSTH that spuriously indicated response enhancement. To exclude these spurious entries, a second, length, criterion was applied that considered the length of sequences of consecutive bins passing the magnitude criterion. For this, a distribution of sequence lengths was computed, yielding another critical value such that <5% of all sequences of labeled bins were longer. Next, all bins of the initial 1,000 ms of the difference PSTH that passed both the magnitude and the sequence length criteria were marked. In the following, these bins were considered to indicate periods of enhanced response to the second tone.
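The two criteria can be sketched as below for difference PSTHs with 10-ms bins, so that the last 300 ms and the initial 1,000 ms correspond to 30 and 100 bins. The text does not say from which bins the run-length distribution was drawn; this sketch assumes it comes from the same undriven last 300 ms as the magnitude criterion. That choice and all function names are assumptions.

```python
import math

def upper_critical(values, tail=0.05):
    """Value exceeded by at most a fraction `tail` of the sample."""
    s = sorted(values)
    return s[min(len(s) - 1, math.ceil((1 - tail) * len(s)) - 1)]

def runs(flags):
    """(start, length) of every run of consecutive True entries."""
    out, start = [], None
    for i, f in enumerate(list(flags) + [False]):  # sentinel closes a final run
        if f and start is None:
            start = i
        elif not f and start is not None:
            out.append((start, i - start))
            start = None
    return out

def enhanced_bins(diff_psths, n_undriven=30, n_analysed=100, tail=0.05):
    """Flag bins passing both the magnitude and the run-length criterion.

    diff_psths: smoothed difference PSTHs (10-ms bins) of one neuron.
    Returns (flags per PSTH over the first n_analysed bins, magnitude crit, length crit).
    """
    undriven = [d[-n_undriven:] for d in diff_psths]
    mag_crit = upper_critical([v for d in undriven for v in d], tail)
    # Assumption: run lengths for the length criterion come from the undriven tails.
    lengths = [n for d in undriven for _, n in runs(v > mag_crit for v in d)]
    len_crit = upper_critical(lengths, tail) if lengths else 0
    result = []
    for d in diff_psths:
        flags = [False] * min(len(d), n_analysed)
        for start, n in runs(v > mag_crit for v in d[:n_analysed]):
            if n > len_crit:  # only runs longer than the critical length survive
                flags[start:start + n] = [True] * n
        result.append(flags)
    return result, mag_crit, len_crit
```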

To determine response attenuation rather than response enhancement, a somewhat different method was used (see also Brosch and Schreiner 1997; Calford and Semple 1995): a response to the second tone was considered to be attenuated in the two-tone condition if it was significantly weaker than the response observed in the single-tone condition. Thus the expected response did not include the response to the first tone. All other procedures were the same as those used for response enhancement. The reasons for the different definitions of the two types of response modulation are discussed later.
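The contrast between the two reference responses can be made explicit with a small sketch. The function name is hypothetical, and the inputs are assumed to be PSTHs aligned to the onset of the second tone, with the first-tone response already shifted by the SOA.

```python
def modulation_differences(actual, second_alone, first_shifted):
    """Per-bin differences used for the two kinds of response modulation.

    enhancement: actual - (second_alone + first_shifted)   (positive = enhanced)
    attenuation: second_alone - actual                      (positive = attenuated)
    """
    enh = [a - (s + f) for a, s, f in zip(actual, second_alone, first_shifted)]
    att = [s - a for a, s in zip(actual, second_alone)]
    return enh, att

# Hypothetical two-bin example.
enh, att = modulation_differences([12, 8], [10, 10], [4, 0])
```

In the first bin, the response exceeds that to the second tone alone but falls short of the linear sum, so it counts neither as enhanced nor as attenuated; the second bin is attenuated.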

All off-line data analyses were carried out with MATLAB 4.2 and SPSS 8.0 software packages.

RESULTS

Two-tone interactions were studied in 130 multiunit groups and 69 single units from three M. fascicularis. Response enhancement was observed in 87 of the multiunit groups (67%) and in 29 of the single units (42%). There were no obvious differences regarding the characteristics of response enhancement between neurons recorded in AI and in the caudomedial auditory belt.

Examples of response enhancement

In some neurons, it could be seen readily that the response to a tone was enhanced when this tone was preceded by another one. An example is given in Fig. 3, which shows peristimulus rastergrams of the neural discharges recorded during the presentation of several tone pairs and a single tone. Inspection of the spike density in the single-tone condition (Fig. 3, top) revealed that this multiunit group responded only weakly to a single 13-kHz tone. This response, however, could be modified greatly when that tone was preceded by another tone, as becomes obvious by comparing the spike density during the presentation of the 13-kHz tone (light gray column) in the single-tone condition (top) with the corresponding spike densities in the various two-tone conditions (bottom sections). The spike density within a 100-ms period starting at the onset of the second tone was clearly increased after the presentation of first tones with frequencies between 5.2 and 20.6 kHz and for SOAs <252 ms. For other first tones, the spike density in that window was unchanged, indicating that response enhancement was frequency and SOA dependent. The tone pairs differed in their effectiveness in enhancing the response to the second tone. The response was strongest when the SOA was short (120 ms) and when the frequency of the first tone (8.2 kHz) differed from that of the second tone. Enhanced responses were seen after the presentation of first tones that either excited the neurons themselves (13–20.6 kHz) or did not (5.2–8.2 kHz). For some tone pairs, the response to the second tone was attenuated rather than enhanced. Response attenuation was indicated by a decreased spike density in the two-tone condition in comparison with the single-tone condition. In this example, attenuation was obvious during the initial part of the presentation of the second 13-kHz tone when this tone was preceded by another 13-kHz tone at a SOA of ≤177 ms.

Fig. 3.

Response enhancement in a multiunit group for various tone pairs. Thin horizontal lines divide the peristimulus rastergram into 7 sections. Top: discharges registered during 10 presentations of a single 13-kHz test tone. Bottom 6 sections: discharges observed during the presentation of various tone pairs. In each of these sections, the frequency of the 1st tone was constant (frequency indicated on the ordinate; presence indicated by dark gray shading) and the temporal interval between the 2 tones was varied between 110 and 400 ms. The 2nd tone was the same in all pairs, namely a 13-kHz tone (light gray shading). All tone pairs are aligned graphically with respect to the 2nd tone. Thus periods with a stronger response can be identified by comparing the spike density during the presentation of the 2nd (“test”) tone in the different 2-tone conditions with the spike density in the single-tone condition.

In recordings from single units, similar two-tone interactions could be observed, although they were encountered less often than in multiunit groups and were not always as obvious as in the single unit displayed in Fig. 4. As in the previous example, comparison of the spike density during the presentation of the second (8-kHz) tone demonstrated that the response to that tone was strongly enhanced when tones between 7.5 and 18.9 kHz were presented ≤400 ms before the second tone. In the single-tone condition, this neuron started responding after the offset of the 8-kHz tone (Fig. 4, top). Variation of tone duration revealed that this response was evoked by the offset of the tone and was not a late response to tone onset (results not shown). The dominance of the offset response, however, was observed only in the single-tone condition. In the two-tone condition, an additional onset response emerged, which was as strong as the offset response.

Fig. 4.

Response enhancement in a single neuron. Same conventions as in Fig. 3.

Two-tone interactions resulting in enhanced responses could not always be identified as easily as in the first two examples. In the multiunit group displayed in Fig. 5, several tone pairs were found in which the spike density was increased after the initial transient response to the second tone. For some of the tone pairs, however, it was not clear whether the increased spike density during the presence of the second tone was due to genuine two-tone interactions or was a mere reflection of the persisting response to the first tone. In this example, the response to a first tone with a frequency of 7.5 or 11.9 kHz lasted for several hundred milliseconds and thus could overlap temporally with the response to the second tone.

Fig. 5.

Response enhancement in another multiunit group. Same conventions as in Fig. 3.

To distinguish true response enhancement from overlap with the persisting response to the preceding tone, the actual response to a given tone pair was compared with the expected response, i.e., with the response expected if the responses to the two tones simply superimposed (see methods). For the multiunit group displayed in Fig. 5, the resulting difference PSTHs revealed that the actual response to the second tone was greater than the expected response in only 7 of the 10 tone pairs shown in Fig. 6, A–J. Thus the increased spike density during the presentation of the second tone was generated by true two-tone interactions in those pairs in which the first tone was 11.9 kHz. This was in contrast to tone pairs in which the first tone was 7.5 kHz. For the latter pairs, increased spike densities were not caused by two-tone interactions (except for SOAs >177 ms). For all tone pairs, there were also periods during which the actual response was significantly smaller than the expected response. During these periods, either the response to the second tone was suppressed by the preceding tone (Brosch and Schreiner 1997) or late parts of the response to the first tone were suppressed by the second tone (Brosch et al. 1998).

Fig. 6.

Quantitative determination of response enhancement for 10 tone pairs of the multiunit group displayed in Fig. 5. A–E: results for tone pairs in which the frequency of the 1st tone was 11.9 kHz and the frequency of the 2nd tone was 8 kHz. F–J: corresponding results for tone pairs in which the frequency of the 1st tone was 7.5 kHz. Each panel plots the difference PSTH (bin size 10 ms) for one stimulus onset asynchrony (SOA; indicated on the ordinate) of the tone pair. Difference PSTHs were obtained by subtracting the expected PSTH from the actual PSTH (cf. methods). The origin of each PSTH corresponds to the onset of the 2nd tone, which was present during the initial 100 ms of the PSTH. Red bars signify bins in which the actual response was significantly greater than the expected response. Blue bars identify bins in which the actual response was significantly smaller than the expected response; these bins were indicative of forward and backward response suppression. Black bars indicate bins in which the difference was not significant. Maximal PSTH difference was 28 spikes/bin (scale bar). Note that the last blue bin in each PSTH appears at a position that is equal to the SOA of the tone pair under consideration. This was because the response to the 1st tone was calculated from the 2-tone paradigm and thus ended with the onset of the 2nd tone, resulting in a length of the PSTH for the 1st tone of 400 ms. Therefore the expected response to the 2nd tone had a duration of 400 ms minus the SOA of a given tone pair. For the shortest SOA, namely 110 ms, this period was 290 ms. For longer SOAs, the period was shorter, and for the longest SOA, 400 ms, no expected PSTH could be calculated.

Spectrotemporal response enhancement areas

For a quantitative description of the stimulus conditions inducing response enhancement, spectrotemporal response enhancement areas (STREAs) were constructed for all neurons. For each neuron, a STREA was generated in three steps. First, the maximal period, pooled across all difference PSTHs, during which the actual response was significantly greater than the expected response was determined. Second, the average difference during this period was calculated for each available difference PSTH. Finally, the average differences were plotted in a two-dimensional box (Fig. 7), with the axes representing SOA and the frequency of the first tone and the gray scale showing the amount of enhancement. A STREA thus provides a comprehensive overview of the frequency and SOA dependence of response enhancement of a given neuron. Representative STREAs, including those of the neurons depicted in Figs. 3–5, are displayed in Fig. 7. STREAs were further used to extract several parameters characterizing the selectivity of neurons for tone sequences.
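The three STREA construction steps can be sketched as below. This is an illustrative reading of the text, not the authors' code: it assumes the "maximal period" is the span from the earliest to the latest significantly positive bin pooled over all tone pairs, and it sets non-positive averages to 0 (white cells, as in Fig. 7 D). The names `build_strea`, `diffs`, and `sig_pos` are hypothetical.

```python
def build_strea(diffs, sig_pos, bin_ms=10):
    """diffs[(f1_khz, soa_ms)]   -> difference-PSTH bins (actual - expected)
    sig_pos[(f1_khz, soa_ms)] -> booleans, True where actual > expected significantly
    Returns (strea, (start_ms, end_ms)), or None if no bin is significant."""
    sig_bins = [i for flags in sig_pos.values() for i, s in enumerate(flags) if s]
    if not sig_bins:
        return None
    # Step 1: enhancement period pooled across all tone pairs.
    first, last = min(sig_bins), max(sig_bins)
    strea = {}
    for key, d in diffs.items():
        # Step 2: mean difference in that period (window is shorter for long SOAs).
        window = d[first:last + 1]
        mean = sum(window) / len(window) if window else 0.0
        # Step 3: one cell per frequency/SOA combination; no enhancement -> 0 (white).
        strea[key] = max(mean, 0.0)
    return strea, (first * bin_ms, (last + 1) * bin_ms)
```

The cell with maximal enhancement, `max(strea, key=strea.get)`, then gives the best enhancing frequency and the preferred SOA in one step.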

Fig. 7.

Spectrotemporal dependence of response enhancement and response attenuation in 4 multiunit groups (A, B, D, and F) and in 2 single units (C and E). Response enhancement and response attenuation are shown at the top and bottom, respectively. Each box displays the magnitude of response modulation that was observed during the presentation of a test tone (frequency indicated inside single thin squares) that occurred after presentation of another tone (frequency indicated on ordinate) at different intervals between the onsets of the 2 tones (SOA, indicated on abscissa). Magnitude of response modulation for each frequency/SOA combination is proportional to the shading in the corresponding cell of each box. Strongest response modulation is shown in black, no response modulation in white. Latency range of significant difference as well as difference between actual and expected response in this interval (in spikes/bin) are indicated above each box. Note that the latency range for response enhancement and response attenuation was pooled from results obtained with different tone pairs and that in some cases different tone pairs modulated different periods of the response. Therefore the latency ranges in which either type of 2-tone interaction was detected partially overlapped in some cases (B, E, and F). B–D: spectrotemporal response enhancement area (STREA) and spectrotemporal response attenuation area (STRAA) of the examples displayed in Figs. 3–5, respectively. Note in D that the 2 pixels representing the tone pairs 11.9/8 and 7.5/8 kHz, which were presented with a SOA of 177 ms, are white. This was because the actual average response during the period 40–90 ms after onset of the 2nd tone was not greater than the expected response (compare Fig. 6, B and G).

Stimulus conditions inducing response enhancement

For all neurons exhibiting response enhancement, both the temporal separation between the first and second tone and the frequency of the first tone affected the response to the second tone (Figs. 8 and 9). Although only a small number of widely spaced frequencies was tested, the data reveal some characteristics of the frequency dependence of response enhancement. Three types of STREA could be distinguished based on the frequency range that produced response enhancement. In the first type, response enhancement was induced only by tones within a single frequency band (Fig. 7, A–C). This type was observed in 59% of the multiunit groups and in 60% of the single units. The second type of STREA had two separate frequency bands that induced enhancement (Fig. 7, D and E). This type was found in 31% of the multiunit groups but also in 28% of the single units, suggesting that this type of STREA in multiunit groups was not an artifact of the superposition of the single-band STREAs of the individual neurons contributing to a multiunit recording. The third type of STREA was irregularly shaped (Fig. 7 F) and comprised 10% of the multiunit groups and 12% of the single units. The frequency range that produced response enhancement varied between 0.4 and 32 kHz (Fig. 10, A, B, and D). Median bandwidth was 1.9 octaves for neurons with a single frequency region in their STREA (n = 44) and 2.7 octaves for neurons with two separate frequency regions (n = 25). To determine the best enhancing frequency (BEF), the cell of each STREA with maximal response enhancement was identified. BEF varied between 0.5 and 28.3 kHz (median 7 and 7.5 kHz in the multiunit and single-unit sample, respectively).
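A minimal sketch of how the number of enhancing frequency bands and the STREA bandwidth (octaves between the lowest and highest enhancing frequency, as defined in the Fig. 10 caption) might be derived. It assumes a frequency counts as enhancing if any SOA cell for it is positive; the irregular third type depends on shape and is not captured by a band count. The function name `strea_bands` is hypothetical.

```python
import math


def strea_bands(tested_freqs, enhancing_freqs):
    """Count contiguous runs of enhancing first-tone frequencies on the tested
    grid, and the bandwidth in octaves from lowest to highest enhancing one."""
    freqs = sorted(tested_freqs)
    flags = [f in enhancing_freqs for f in freqs]
    # A band starts wherever an enhancing frequency follows a non-enhancing one.
    n_bands = sum(1 for i, on in enumerate(flags)
                  if on and (i == 0 or not flags[i - 1]))
    if n_bands == 0:
        return 0, 0.0
    enh = [f for f, flag in zip(freqs, flags) if flag]
    return n_bands, math.log2(enh[-1] / enh[0])
```

For two separate bands, this bandwidth spans the gap between them, which is one reason two-band STREAs come out broader than single-band ones.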

Fig. 8.

Spectral characteristics of tone pairs inducing response enhancement. A: frequency of the 2nd tone was significantly correlated (r = 0.37) with the lowest frequency of the 1st tone that produced response enhancement (lower STREA border). Gray line indicates the linear regression of the 2nd frequency on the lower STREA border. Black diagonal line marks cases for which the 2 frequencies were equal. Area of each dot is proportional to the number of cases. Smallest dots represent one case and largest dot represents 7 cases. B: frequency of the 2nd tone was significantly correlated (r = 0.30) with the upper STREA border. C: distribution of the preferred frequency interval. Negative values denote tone pairs in which the 1st tone was higher than the 2nd tone ("downward"). Positive values indicate tone pairs with an "upward" frequency progression. D: distribution of the probability of a specific frequency pair to produce maximal response enhancement. This distribution was generated by dividing the total set of tested pairs into 15 intervals of different width but with the same number of samples. The number of observed preferred frequency intervals in each interval was divided by the number of tested pairs in this interval to give the probability.
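The normalization used for panel D (equal-count intervals, then preferred count divided by tested count) can be sketched as follows. This is a hypothetical reconstruction from the caption alone: tied values may merge neighboring intervals, so fewer than 15 intervals can result, and the original handling of ties is not stated. The function name is an assumption.

```python
def equal_count_probability(tested, preferred, n_groups=15):
    """Split tested frequency intervals (octaves) into n_groups intervals with
    roughly equal numbers of samples; return (low, high, probability) per
    interval, where probability = preferred count / tested count."""
    tested = sorted(tested)
    n = len(tested)
    edges = [tested[i * n // n_groups] for i in range(n_groups)] + [float("inf")]
    edges = sorted(set(edges))  # ties in the tested values can merge intervals
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        n_test = sum(lo <= x < hi for x in tested)
        n_pref = sum(lo <= x < hi for x in preferred)
        result.append((lo, hi, n_pref / n_test))
    return result
```

Dividing by the tested count corrects for frequency intervals that were simply presented more often, which is the bias the text refers to as the unequal distribution of tested frequency pairs.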

Fig. 9.

Temporal characteristics of tone pairs inducing response enhancement. A: distribution of the shortest SOA producing response enhancement. B: distribution of the longest SOA producing response enhancement. C: distribution of preferred SOA.

Fig. 10.

Relation between tones evoking a response and tones inducing an enhanced response to a consecutive tone. Parameters of tone-evoked responses were obtained from FRAs and parameters of response enhancement were obtained from STREAs. A: lowest frequency in frequency response area (FRA) (lower FRA border) was correlated with lowest frequency in STREA (lower STREA border; r = 0.52). B: highest frequency in FRA was correlated with highest frequency in STREA (r = 0.40). C: BF varied significantly with best enhancing frequency (BEF; r = 0.28). D: there was no significant correlation between FRA bandwidth and STREA bandwidth, either for neurons with a single frequency band in the STREA (·) or for neurons with a STREA with 2 separated frequency bands (▴). Bandwidth was calculated as the difference in octaves between the highest and the lowest frequency inducing response enhancement. E: BEF was mostly greater than the lowest frequency in the FRA. F: BEF was mostly smaller than the highest frequency in the FRA. Same conventions as in Fig. 8. Largest dot in B represents 10 cases.

To characterize which frequency pairs produce response enhancement, the spectral relation between the first and second tone of a pair was considered. In most neurons (69%), the frequency of the second tone lay within the range of first-tone frequencies that produced response enhancement (Fig. 8, A and B). For further characterization, the difference in octaves between the BEF and the frequency of the second tone (the preferred frequency interval) was calculated. The distribution of the preferred frequency interval revealed several features of the conditions that produced maximal response enhancement (Fig. 8 C). First, the vast majority of neurons preferred tone pairs in which the frequency of the first tone differed from that of the second tone: 93% of the multiunit groups and 88% of the single units exhibited strongest response enhancement when the two tones differed by >0.25 octaves. Second, the preferred frequency separation varied over a wide range of up to 3.7 octaves. Third, about as many multiunit groups and single units (48%) preferred tone pairs with downward frequency progressions as preferred pairs with upward progressions. Because the distribution of the preferred frequency interval was bimodal and almost symmetric about zero, the absolute value of the preferred frequency difference was calculated for all neurons. The median absolute frequency difference between the first and the second tone of a pair was 1.1 octaves, in both the multiunit and the single-unit sample. These results were confirmed when we accounted for the unequal distribution of tested frequency pairs (Fig. 8 D).
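The preferred-interval statistics in this paragraph reduce to simple octave arithmetic. The sketch below assumes the sign convention of Fig. 8 C (negative = 1st tone higher, "downward") and uses the 0.25-octave criterion from the text; function and key names are hypothetical.

```python
import math
import statistics


def preferred_interval(bef_khz, f2_khz):
    """Octaves from the best enhancing (1st-tone) frequency to the 2nd tone.
    Negative values mean the 1st tone was higher (downward progression)."""
    return math.log2(f2_khz / bef_khz)


def summarize_intervals(pairs, min_sep=0.25):
    """pairs: list of (BEF, 2nd-tone frequency) tuples, one per neuron."""
    intervals = [preferred_interval(bef, f2) for bef, f2 in pairs]
    n = len(intervals)
    return {
        "frac_different": sum(abs(x) > min_sep for x in intervals) / n,
        "frac_downward": sum(x < 0 for x in intervals) / n,
        "median_abs_oct": statistics.median([abs(x) for x in intervals]),
    }
```

For instance, a neuron with a BEF of 11.9 kHz and a 2nd tone of 8 kHz (the pair of Fig. 6, A–E) has a preferred interval of about −0.57 octaves, i.e., a downward pair.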

In the temporal dimension, response enhancement was observed for SOAs between 110 and 329 ms, which was the range of SOAs for which our method allowed the determination of response enhancement (Fig. 9, A and B). Among the neurons with enhanced responses, the shortest SOA, 110 ms, produced response enhancement in 85% of the multiunit groups and in 92% of the single units. With increasing SOA, this percentage declined gradually to about half of the multiunit groups and to 36% of the single units for SOAs of 252 or 329 ms. Although SOAs >252 ms (329 ms in a few cases) could not be analyzed with our method, there was some indication that response enhancement could also occur at SOAs of ≥400 ms. An example of this can be found in Fig. 4, in which the spike density in response to the second tone was increased 400 ms after presentation of an 18.9-kHz tone. In a given neuron, all tone pairs with SOAs between the minimal and maximal enhancing SOA were effective in producing response enhancement, although the amount of response enhancement varied with SOA. To determine the preferred SOA, the cell of each STREA with maximal response enhancement was identified. Preferred SOAs were found within the entire range of SOAs tested, although most neurons exhibited maximal response enhancement at a SOA of 120 ms (Fig. 9 C).
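Given STREAs stored as mappings from (first-tone frequency, SOA) to enhancement, the SOA statistics above might be computed as in this sketch. It assumes that any positive cell counts as enhancement and that the denominator is the set of neurons with enhancement, as in the text. The function names are hypothetical.

```python
def soa_range(strea):
    """Shortest and longest SOA (ms) at which any first tone enhanced the
    response; None if the STREA contains no enhancement."""
    soas = [soa for (f1, soa), v in strea.items() if v > 0]
    return (min(soas), max(soas)) if soas else None


def fraction_enhanced_at(streas, soa):
    """Fraction of neurons (one STREA each) in which at least one first tone
    produced enhancement at the given SOA."""
    hits = sum(any(v > 0 for (f1, s), v in st.items() if s == soa)
               for st in streas)
    return hits / len(streas)
```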

Relation between spectrotemporal response enhancement areas and frequency response areas

The frequency range that evoked a neural response in the single-tone condition was related to the frequency range that induced response enhancement in the two-tone condition (Fig. 10). Comparison of corresponding parameters in FRAs and STREAs revealed moderate correlations between lower FRA border and lower STREA border (Fig. 10 A; r = 0.52), between upper FRA border and upper STREA border (B; r = 0.40), and between BF and BEF (C; r = 0.28). Interestingly, the frequency that produced the strongest response to a single tone was generally not the one that produced the strongest response enhancement. Rather, BF and BEF could differ by up to 4.5 octaves, with a median absolute difference of 1.2 octaves. Although the correlation between BF and BEF was weak, the BEF lay within the FRA in 69% of the neurons (E and F).
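The two population measures in this paragraph (median absolute BF–BEF distance in octaves, and the fraction of neurons whose BEF lies inside the FRA) can be computed as in the sketch below. It assumes the FRA is summarized by its lower and upper frequency borders; the function names are hypothetical.

```python
import math
import statistics


def bf_bef_relation(bf_khz, bef_khz, fra_low_khz, fra_high_khz):
    """Absolute BF-BEF distance in octaves, and whether BEF lies inside the FRA."""
    dist = abs(math.log2(bef_khz / bf_khz))
    inside = fra_low_khz <= bef_khz <= fra_high_khz
    return dist, inside


def summarize(neurons):
    """neurons: list of (BF, BEF, lower FRA border, upper FRA border) tuples."""
    dists, inside = zip(*(bf_bef_relation(*n) for n in neurons))
    return statistics.median(dists), sum(inside) / len(inside)
```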

Aside from the frequency of the first tone in a pair, response enhancement also depended on the position of the second tone within the FRA. Although the second tone was not systematically varied in the present study, population results suggested that response enhancement was more likely the closer the second tone was to the neuron's BF: the difference between the frequency of the second tone and the BF was significantly smaller (t-test, P < 0.001) for neurons exhibiting response enhancement [0.6 ± 0.6 (SD) octaves; n = 80] than for neurons without response enhancement (1.1 ± 1.1 octaves; n = 35).

Other, nonspectral characteristics of FRAs were also analyzed with regard to their relation to response enhancement. These included response onset, response offset, and response type, i.e., whether parts of the response were temporally linked to tone onset or tone offset. Of these, only response offset varied significantly with response enhancement (t-test, t = −4.49, P < 0.0001): neurons with a late-ending response exhibited response enhancement more often than neurons whose response ended early. On average, the response ended later in the group of neurons with response enhancement than in the group without response enhancement [537 ± 204 ms (n = 84) vs. 356 ± 220 ms (n = 41)].
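As an arithmetic check, the reported t statistic can be approximately recomputed from the group means and SDs. The excerpt does not say which t-test variant was used; the sketch below assumes Student's two-sample test with pooled variance. With the published summary values it gives t ≈ −4.54 with 123 degrees of freedom, close to the reported −4.49; the small gap may reflect rounding of the published means and SDs or a different test variant.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Response offset (ms): neurons without vs. with response enhancement.
t, df = pooled_t(356, 220, 41, 537, 204, 84)
print(f"t({df}) = {t:.2f}")
```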

What parts of the response are enhanced?

The first tone in a pair enhanced specific parts of the response to the second tone. The period with an enhanced response began 10–160 ms after onset of the second tone, with medians of 30 and 40 ms in the multi- and single-unit sample, respectively (Fig. 11 A). In most neurons, the period with an enhanced response began later than the response to a single tone (median difference 14 ms). Thus the initial part of the response was mostly unaffected by the preceding tone. The period of enhanced response lasted 10–150 ms (medians of 50 and 30 ms in the multi- and single-unit sample, respectively) and ended 30–200 ms after onset of the second tone (medians of 100 and 90 ms, respectively). Response enhancement at latencies beyond 290 ms (490 ms in a few cases) could not be detected with our method. However, there was an indication in some neurons that enhanced responses could also be present up to 700 ms after tone onset. The period with an enhanced response always ended before or with the offset of the response to single tones (Fig. 11 B).

Fig. 11.

Relation between timing of the response to single tones and the timing of the enhanced response in the 2-tone condition. A: latency to onset of the response to single tones (response onset) varied significantly with the onset of the period with an enhanced response in the 2-tone condition (onset of enhancement; r = 0.38). Note that, in most neurons, response onset preceded onset of enhancement (median 14 ms) and that the 2 parameters were obtained with different temporal resolutions (bin size was 1 and 10 ms on the abscissa and ordinate, respectively). Same conventions as in Fig. 8. Largest dot represents 4 cases. B: response offset varied significantly with offset of enhancement (r = 0.32). Note that no response enhancement was observed after response offset.

An enhanced response in the two-tone condition generally occurred in periods during which the response to an isolated second tone was low or moderate. This was evident from comparing the bin of the maximally enhanced response with the bin of the maximal response (Fig. 12). In half of the neurons, the response to the isolated second tone during the enhancement period was ≤60% of its maximal response. Half of these neurons did not respond at all during the period in which an enhanced response was observed in the two-tone condition.

Fig. 12.

In many neurons, response enhancement was observed during a period when the response in the single-tone condition was weak. Plot shows the cumulative distribution of the ratio of the maximal response observed in the period with enhanced responses to the maximal response observed in the entire period of the response to the 2nd tone. If the discharge rate in the enhancement period was not above the undriven discharge rate, the relative response magnitude of a neuron was set to 0.
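The relative response magnitude plotted in Fig. 12 can be sketched as follows, assuming `psth` is the binned single-tone response to the 2nd tone and `enh_bins` are the indices of the bins in the enhancement period. All names are hypothetical; the 0.6 threshold mirrors the ≤60% criterion in the text.

```python
def relative_magnitude(psth, enh_bins, undriven_rate):
    """Peak rate inside the enhancement period divided by the peak rate of the
    whole single-tone response; 0 if that peak does not exceed the undriven rate."""
    peak_enh = max(psth[i] for i in enh_bins)
    if peak_enh <= undriven_rate:
        return 0.0
    return peak_enh / max(psth)


def fraction_at_or_below(values, threshold=0.6):
    """Value of the cumulative distribution at the given relative magnitude."""
    return sum(v <= threshold for v in values) / len(values)
```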

To estimate the magnitude of the enhancing influence of the first tone on the response to the second tone, we calculated the difference between the number of actually recorded discharges and the number of expected discharges during the enhancement period. The resulting distribution is shown in Fig. 13.

Fig. 13.

Distribution of the magnitude of response enhancement in the 130 multiunit groups. Gray column signifies the multiunit groups in which no enhanced responses were detected.

Difference between response enhancement and response attenuation

The present two-tone experiments did not only reveal enhancement but also attenuation of the response to the second tone. Response attenuation was observed in 72 of the 130 multiunit groups that were investigated and in 55 of the subset of 87 multiunit groups that exhibited response enhancement. The two antagonistic types of two-tone interaction depended on the spectrotemporal composition of tone pairs in an almost complementary fashion and occurred at different latencies of the response to the second tone.

For a quantitative comparison of the two-tone conditions producing response enhancement and response attenuation, spectrotemporal response attenuation areas (STRAAs) were constructed in an analogous fashion to STREAs. Examples of STRAAs are included in Fig. 7. For each recording site, the overlap between STREA and STRAA was calculated by dividing the number of frequency/SOA combinations of tone pairs that produced both response enhancement and response attenuation by the number of frequency/SOA combinations that produced either response enhancement or response attenuation. This analysis revealed that there was no overlap between STREA and STRAA in more than half of the neurons (29/55; e.g., Fig. 7, A and E) and that there was no case with a complete overlap of STREA and STRAA (Fig. 14 A). The small overlap was mostly due to the fact that response enhancement and response attenuation were induced from different frequency ranges rather than from different SOA ranges. This was established by cross-correlating the STREA of each neuron with the corresponding STRAA and obtaining the shift in the frequency and in the SOA direction. The shift in the SOA direction was uniformly distributed (results not shown), whereas the shift in the frequency direction was bimodal and symmetric around the origin, and only a few neurons had a zero frequency shift (Fig. 14 B). This indicates that the tones inducing response enhancement were mostly different from the tones inducing response attenuation.
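The overlap index and the cross-correlation shift described above can be sketched as below. The overlap follows the text (cells with both effects divided by cells with either effect). The normalization of the cross-correlation in the study is not given; this sketch assumes raw products on the tested frequency/SOA grid, with rows ordered from low to high frequency, and chooses the sign so that negative frequency shifts mean the attenuating range lies above the enhancing range, as in Fig. 14 B. Function names are hypothetical.

```python
def overlap_index(enh_cells, att_cells):
    """Cells (freq, SOA) with both effects divided by cells with either effect."""
    union = enh_cells | att_cells
    return len(enh_cells & att_cells) / len(union) if union else 0.0


def best_shift(strea, straa):
    """Shift (d_freq, d_soa), in grid steps, that maximizes the cross-correlation
    sum(strea[i][j] * straa[i - d_freq][j - d_soa]). Rows are frequencies
    (low to high), columns are SOAs. Negative d_freq: attenuating range above
    the enhancing range."""
    nf, ns = len(strea), len(strea[0])
    best, best_val = (0, 0), float("-inf")
    for d_f in range(-(nf - 1), nf):
        for d_s in range(-(ns - 1), ns):
            val = sum(strea[i][j] * straa[i - d_f][j - d_s]
                      for i in range(nf) for j in range(ns)
                      if 0 <= i - d_f < nf and 0 <= j - d_s < ns)
            if val > best_val:
                best, best_val = (d_f, d_s), val
    return best
```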

Fig. 14.

Comparison between response enhancement and response attenuation. All parameters were extracted from STREAs and STRAAs. A: distribution of the overlap between STREA and STRAA. Note that there was no overlap in more than half of the neurons. B: distribution of the difference between the frequency range inducing response enhancement and the frequency range inducing response attenuation. Negative values denote neurons in which the attenuating frequency range was above the enhancing frequency range. Positive values indicate the opposite relation. Note that only a few neurons had the same frequency range for either type of response modulation.

When the time course of the single-tone response was compared with the response in the two-tone condition, it was noticed that response attenuation was generally effective from the very beginning of the response to the second tone, whereas, as stated earlier, periods with enhanced responses began mostly after the initial response to the second tone (Fig. 15). Comparison between STREAs and STRAAs revealed that in 66% of the multiunit groups, response attenuation affected earlier parts of the response than did response enhancement, and that the median difference between onsets of response attenuation and response enhancement was 10 ms.

Fig. 15.

Relation between the onset of the period with an attenuated response and the onset of the period with an enhanced response. Largest dot represents 10 cases. Note that onset of response attenuation was mostly before onset of response enhancement. Logarithmic scaling was chosen for graphic reasons.

DISCUSSION

In ∼2/3 of the neurons recorded in auditory cortex, the response to a tone was enhanced when this tone was preceded by another one. Response enhancement could be obtained with a wide range of frequencies of preceding tones and a considerable range of SOAs between the first and second tone in a pair. On average, the stimulus that produced maximal response enhancement was a tone pair in which the frequencies of the tones were ∼1 octave apart and the SOA of which was ∼120 ms.

Similar results for single- and multiunit activity

The present study revealed that enhanced neuronal responses could be observed both in recordings from single neurons and in recordings from small groups of neurons. The stimulus conditions evoking response enhancement in the single-unit sample were mostly similar to those found in the multiunit sample. The main differences were that fewer single units than multiunit groups exhibited response enhancement and that fewer tone pairs were effective for single units. Several reasons could account for these differences. First, spike rates were lower in recordings from single neurons than from multiunit groups, so the statistical power of our methods to detect response enhancement was lower for the single-neuron data. Second, the discharges of several single neurons superimpose in a multiunit recording. If these neurons had partially overlapping STREAs, their characteristics would superimpose as well, resulting in a greater SOA range and a greater range of first-tone frequencies producing response enhancement in a multiunit group than in single units. Third, single neurons and multiunit groups may reflect activity from different cell classes: single-unit recordings were presumably predominantly from pyramidal cells, which generate the largest action potentials, whereas multiunit recordings could also have included the activity of other cell types.

Definitions of response modulation

Our methods provide a conservative estimate for response enhancement. We considered the response to the second tone of a tone sequence to be enhanced if the actual discharge rate was significantly higher than the linear superposition of the discharge rates measured during the presentation of two single tones. The reason for including the response to the first tone was that the neural response to 100-ms duration tones often lasted considerably longer than the stimulus itself. In our sample, about half of the neurons still exhibited an elevated discharge rate 400 ms after the cessation of a tone (unpublished observation). Thus for tone pairs with SOAs of a few hundred milliseconds, elevated discharge rates observed during the presence of the second tone could be simply due to a late persisting response to the first tone. Therefore a method was tailored that took into account only genuine two-tone interactions. Methods similar to the one used here have been applied in only a few previous studies reporting response enhancement in neurons (Esser et al. 1997; Ganz and Felder 1984) and in acoustically evoked potentials in humans (Budd and Michie 1994; Loveless et al. 1989). Most previous studies did not account for the potential contamination of presumably enhanced responses with the persisting response to previous stimulation (Brosch and Schreiner 1997; McKenna et al. 1989; Nelson 1991a; Riquimaroux 1994; Sutter et al. 1996). Hence results of the latter studies are comparable with present results only for neurons without persisting responses to the preceding tone.

Aside from response enhancement, our method also revealed periods during which the actual response was smaller than the expected response (e.g., Fig. 6). This observation indicated either forward suppression of the response to the second tone by the first tone (e.g., Brosch and Schreiner 1997) or backward suppression of the late response to the first tone by the second tone (Brosch et al. 1998). Because one aim of the present study was to compare the stimulus conditions in which the response to a second tone was enhanced with those in which it was attenuated, a different method, one that revealed as little backward suppression as possible, had to be employed for response attenuation. This was achieved by comparing the response to the second tone in the two-tone condition only with the response to the same tone in the single-tone condition, without accounting for the persisting response to the first tone. Because the expected response for response attenuation was the response to the second tone alone rather than the sum of the responses to the first and second tones, this method detected response attenuation in fewer neurons and for fewer tone pairs than would have been detected had attenuation been assessed relative to the sum of responses.

Forward or backward response enhancement?

For tone sequences, the response to a given tone can be suppressed by a preceding and by a succeeding tone. We have argued earlier (Brosch et al. 1998) that backward effects can be distinguished qualitatively from forward effects by comparing tone pairs with different SOAs: backward influences are likely if periods with a decreased response are time-locked to the onset of the first tone, whereas forward influences are likely if periods with response modulations are time-locked to the onset of the second tone. An example of such a distinction can be seen in the difference PSTHs shown in Fig. 6, right. For the tone pair with the shortest SOA (J), two temporally separated periods of strong response attenuation can be distinguished: the first occurred ∼30 ms and the second ∼130 ms after the onset of the second tone. With increasing SOA (F–I), the first attenuation period remained at approximately the same latency, whereas the second attenuation period approached the origin of the PSTH in proportion to the SOA. Thus the first attenuation period was time-locked to the onset of the second tone and indicated forward suppression of the response to the second tone by the first tone. Correspondingly, the second attenuation period was time-locked to the first tone and indicated backward suppression of the response to the first tone by the second tone.
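The time-locking argument can be made explicit with a simple regression. The paper makes the distinction qualitatively; the slope criterion below is an illustrative assumption. If the latency of a modulation period, measured from 2nd-tone onset, stays constant across SOAs (slope ≈ 0), it is locked to the 2nd tone (forward). If it shrinks one-for-one with SOA (slope ≈ −1), it is locked to the 1st tone (backward). The example latencies in the test are synthetic, not data from Fig. 6.

```python
def lstsq_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def classify_direction(soas_ms, latencies_ms):
    """latencies_ms: latency of a modulation period re 2nd-tone onset, per SOA.
    Slope near 0 -> locked to the 2nd tone (forward);
    slope near -1 -> locked to the 1st tone (backward)."""
    slope = lstsq_slope(soas_ms, latencies_ms)
    return "forward" if abs(slope) < abs(slope + 1) else "backward"
```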

The same reasoning applied to response enhancement revealed that periods with an enhanced response were always time-locked to the second tone, irrespective of the SOA (e.g., Fig. 6, A–E). Thus enhancing two-tone interactions appear to have been caused exclusively by forward mechanisms. Evidence for forward response enhancement can also be obtained from other studies (Budd and Michie 1994; Ganz and Felder 1984; Loveless et al. 1989), although the direction of interaction was not explicitly addressed there.

Prevalence of response enhancement

The present study confirms previous findings from the auditory cortices of different species showing that the neural response to a tone can be enhanced by a preceding tone (Brosch and Schreiner 1997; Esser et al. 1997; McKenna et al. 1989; Riquimaroux 1994; Suga et al. 1978,1983; Sutter et al. 1996). In the present study, ∼2/3 of the neurons in auditory cortex exhibited response enhancement. This number was unexpectedly high given that a quite conservative criterion for response enhancement was applied, that only a few studies previously have described this phenomenon at all, and that the only other quantitative study found enhanced responses in only 21% of the neurons in auditory cortex (Esser et al. 1997). It is possible that the actual proportion of neurons with enhancing two-tone interactions is even higher because only a rather limited set of acoustic stimuli has been tested in these studies.

Further evidence that many cortical neurons exhibit response enhancement is provided by findings that auditory evoked potentials can be enhanced by a preceding tone (Budd and Michie 1994; Loveless and Hari 1993; Loveless et al. 1989; Metherate and Ashe 1994). Because negative deflections in slow-wave field potentials chiefly reflect the summed excitatory postsynaptic potentials (EPSPs) in the vicinity of the electrode (e.g., Mitzdorf 1987), a considerable number of neurons in the supratemporal lobe must exhibit enhanced EPSPs. Additional evidence for enhanced EPSPs comes from intracellular recordings from single cortical neurons in brain slice preparations that have demonstrated paired-pulse facilitation with electric stimulation (e.g., Metherate and Ashe 1994; Volgushev et al. 1997). Neurons in visual cortex also exhibit enhanced responses when stimulated with temporal sequences of light flashes (Ganz and Felder 1984; Nelson 1991a) and electric pulses (Volgushev et al. 1997), suggesting that response enhancement may be a general cortical mechanism involved in the processing of sequential stimuli.

Dependence of response enhancement on the temporal separation of tones

Enhanced responses could be observed for a considerable range of SOAs (Fig. 9). The majority of neurons exhibited response enhancement at a SOA of slightly more than 100 ms (110 ms), the shortest SOA tested here. Studies of the human auditory evoked potential (Budd and Michie 1994; Loveless and Hari 1993; Loveless et al. 1989), paired-pulse facilitation with electric stimulation (Metherate and Ashe 1994), and preliminary results from neurons in cat AI (Sutter et al. 1996) suggest that tone pairs with SOAs <100 ms are also effective in inducing response enhancement. These studies further demonstrate that the magnitude of enhancement can vary nonmonotonically with SOA: enhancement gradually increases from a minimal SOA up to an optimal SOA, at which enhancement is maximal, and then decreases again. The optimal SOA in the evoked potential studies was between 100 and 150 ms, overlapping the median preferred SOA of 120 ms observed in the present study. A similar correspondence can also be found with respect to the longest SOA that produces enhanced responses. The present study found response enhancement up to a SOA of 252 ms (329 ms in a few cases). Because response enhancement was still present in almost half of the neurons at this SOA (the longest SOA that could be analyzed in the current experiments), enhanced responses can probably also be observed with tone pairs of greater SOAs. The upper SOA limit for response enhancement seems to be a few hundred milliseconds, which was the maximal SOA found to produce response enhancement in studies on the auditory evoked potential (Budd and Michie 1994; Loveless and Hari 1993; Loveless et al. 1989) and on paired-pulse facilitation (Metherate and Ashe 1994). Moreover, these relations have been observed with stimuli of different durations, suggesting that response enhancement depends mainly on the onset separation between two stimuli and to a lesser degree on the silent interval between them.
Notably, considerably shorter temporal intervals were found to produce enhanced responses in neurons in the FM-FM areas of bat auditory cortex (Esser et al. 1997; Suga et al. 1978).

Dependence of response enhancement on the frequency of tones

Response enhancement occurred most often when the frequencies of the first and the second tone were different, and it was maximal, on average, when the frequency of the first tone was ∼1 octave above or below that of the second tone. Preferences for upward frequency progressions were encountered as often as preferences for downward progressions, whereas tone pairs with the same frequency produced little or no enhancement. This finding agrees with studies in echolocating bats, in which temporal combination-sensitive neurons responded most strongly when the frequency content of the emitted pulse FM differed from that of the echo FM (Suga et al. 1978, 1983). Interestingly, in those bat studies all preferred combinations were harmonically related.

Comparison between response attenuation and response enhancement

Many neurons in our study exhibited response attenuation in addition to response enhancement. The characteristics of response attenuation in monkey auditory cortex were similar to those described in previous reports from cats (e.g., Brosch and Schreiner 1997; Calford and Semple 1995). Methodological reasons can account for the finding that the proportion of neurons exhibiting attenuated responses was lower than in previous studies. First, only SOAs ≥110 ms were tested. This SOA is rather long, because about half of the neurons in cat auditory cortex no longer showed attenuation at this SOA but only at shorter SOAs (Brosch and Schreiner 1997). Second, in some neurons no response attenuation could be observed because the second tone did not produce sufficient excitation to be subject to statistically significant attenuation (e.g., Fig. 4). Third, the response to the first tone sometimes overlapped with the response to the second tone and thus could have obscured a potentially attenuated response to the second tone (e.g., Fig. 5).

Comparing response attenuation with response enhancement revealed substantial differences between these two types of response modulation (Figs. 14 and 15). Response attenuation was often strongest and longest-lasting when the two tones of a pair had the same frequency (Brosch and Schreiner 1997; present findings). Response enhancement, in contrast, was found most often when the two tones had different frequencies. This difference was also reflected in the small overlap between STREAs and STRAAs. It argues against the view that enhanced responses result from a rebound of activity after a preceding hyperpolarization (e.g., Grenier et al. 1998), because such a rebound would be expected under the same stimulus conditions that produce attenuation.
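One way to quantify the overlap between STREAs and STRAAs is sketched below, assuming that each area is represented as the set of (first-tone frequency, SOA) cells in which a significant effect was found. The Jaccard index used here (shared cells divided by cells in either area) is an illustrative choice and not necessarily the measure used in the study; the function name and the cell values are hypothetical.

```python
def overlap_index(strea: set, straa: set) -> float:
    """Jaccard index of two response areas given as sets of (frequency_hz, soa_ms) cells.

    0.0 means the areas share no cell, 1.0 means they are identical.
    """
    union = strea | straa
    if not union:
        return 0.0
    return len(strea & straa) / len(union)


# Hypothetical areas for a second tone fixed at 4 kHz.
strea = {(2000, 120), (2000, 160), (8000, 120)}  # cells with significant enhancement
straa = {(4000, 120), (4000, 160), (2000, 160)}  # cells with significant attenuation
print(overlap_index(strea, straa))  # 0.2
```

In this toy example the attenuation cells cluster at the frequency of the second tone and the enhancement cells lie about an octave away, so only one cell is shared.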

We observed that response enhancement and response attenuation depended on the SOA of tone pairs in a similar fashion. As discussed earlier, this finding may be valid only for tones with a duration of 100 ms because for tones of shorter duration, response enhancement varies nonmonotonically with SOA and is maximal at a SOA of ∼120 ms (Loveless et al. 1989; Metherate and Ashe 1994; Sutter et al. 1996). In contrast, response attenuation declines monotonically with increasing SOA (Brosch and Schreiner 1997; Calford and Semple 1995; present results).
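The distinction between a monotonic decline and a nonmonotonic dependence on SOA can be stated as a simple test on an effect-versus-SOA curve. The sketch below is illustrative only: the function name is hypothetical, the effect values in the usage example are invented for demonstration and are not data from this or the cited studies, and a real analysis would need to allow for measurement noise.

```python
def soa_profile(soas_ms, effect):
    """Describe how an effect measure depends on SOA.

    Returns 'monotonic decline' if the effect never increases with SOA,
    'peak at X ms' if the maximum lies strictly inside the tested range,
    and 'other' otherwise (e.g., an effect that grows with SOA).
    """
    if len(soas_ms) != len(effect) or len(soas_ms) < 3:
        raise ValueError("need at least three matching SOA/effect values")
    pairs = sorted(zip(soas_ms, effect))
    values = [e for _, e in pairs]
    if all(a >= b for a, b in zip(values, values[1:])):
        return "monotonic decline"
    i_max = values.index(max(values))
    if 0 < i_max < len(values) - 1:
        return f"peak at {pairs[i_max][0]} ms"
    return "other"


# Invented illustrative curves, not measured data.
print(soa_profile([110, 160, 230, 329], [0.9, 0.6, 0.4, 0.2]))           # monotonic decline
print(soa_profile([60, 90, 120, 200, 300], [0.1, 0.4, 0.9, 0.5, 0.2]))  # peak at 120 ms
```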

Moreover, response enhancement and response attenuation modulate different parts of the response to the second tone. With appropriate stimulation, attenuation can be observed from the very beginning of the response to a tone, whereas enhancement emerges ∼10 ms later and is most often effective during periods in which neurons respond weakly or not at all to isolated tones. The previous discussion also suggests that attenuation can act both forward and backward in time, whereas enhancement acts only forward.

Spectral and temporal filtering of cortical neurons

The results of the present study are consistent with and extend previous findings showing that the neural response to a tone depends strongly and specifically on the temporal stimulus context (Brosch and Schreiner 1997; Calford and Semple 1995; deCharms and Merzenich 1998). These findings thus demonstrate that neurons in auditory cortex have spectrotemporal receptive fields (Aertsen and Johannesma 1980). Whereas almost all earlier studies have reported that the context has an attenuating influence, the present study describes conditions in which an enhancing influence prevails.

In general, knowledge of a neuron's spectral filter properties provides little information about which stimuli produce response enhancement, as shown by the weak correlations between FRAs and STREAs and between BF and BEF (Fig. 10, A, B, and D).
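As an illustration, a BF-BEF correlation could be computed on a logarithmic (octave) frequency axis, as sketched below. We read BEF as the first-tone frequency giving the strongest enhancement; the function names, the choice of Pearson's r on log2-transformed frequencies, and the example values are assumptions and not necessarily the analysis behind Fig. 10.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def bf_bef_correlation(bf_hz, bef_hz):
    """Correlate BF and BEF on a log2 (octave) axis, one pair per neuron."""
    return pearson_r([math.log2(f) for f in bf_hz],
                     [math.log2(f) for f in bef_hz])


# Hypothetical neurons whose BEF is unrelated to BF.
r = bf_bef_correlation([1000, 2000, 4000, 8000], [4000, 1000, 8000, 2000])
print(round(r, 2))
```

Log-transforming the frequencies treats equal octave steps as equal distances, which is the natural scale for tonotopic comparisons.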

In the present study, the filter properties of many neurons were unusually complex, both spectrally and temporally: we frequently encountered responses that lasted several hundred milliseconds and consisted of several distinct components in different frequency and latency ranges (e.g., Fig. 1; see also Brosch et al. 1998). The complex FRAs were probably not an artifact of anesthesia, because similarly complex responses have commonly been obtained in experiments on awake animals (see Pelleg-Toiba and Wollberg 1983 for the most systematic investigation to date). Rather, complex FRAs were probably observed in the present study because long intertone intervals were used. Most other studies have used much shorter intertone intervals and have reported only simple, short responses to acoustic stimulation; under these conditions, long-lasting responses may be completely abolished or simply overlooked.

Neural mechanisms of response enhancement

Enhanced responses of cortical neurons could be of cortical origin or a mere reflection of the inputs to cortical neurons.

At subcortical stages of the auditory system, enhanced responses have been observed in the compound action potential of the cochlear nerve (Henry 1991) but not in inner hair cells of the cochlea (Henry and Price 1994). The compound action potential was enhanced maximally when the tones had the same frequency and the SOA was ∼35 ms. At a higher stage of the auditory pathway, the dorsal cochlear nucleus, enhanced responses (paired-pulse facilitation) were also found in brain slice preparations from guinea pigs (Manis 1989). Recordings of the human auditory brain stem response revealed a selective enhancement of wave V, thought to be generated in the superior olivary complex (e.g., Kraus and McGee 1992), when tone pairs of the same frequency and with a SOA <195 ms were presented (Ananthanarayan and Gerken 1983; see Bauer et al. 1980 for negative findings). Enhancement was maximal at a SOA of 75 ms. In summary, the conditions that evoke response enhancement in cortical neurons differ largely from those found at lower stages of the auditory pathway, suggesting that enhanced cortical responses are shaped, at least partially, by cortical mechanisms.

In cortex, two basic mechanisms could account for response enhancement: facilitation and disinhibition. The facilitation model proposes that late EPSPs evoked by the first stimulus summate with the EPSPs evoked by the second stimulus. If both EPSPs are subthreshold, their combined depolarization could become suprathreshold and cause the neuron to fire more action potentials (e.g., Riquimaroux 1994). Enhanced responses could also arise if individual EPSPs are suprathreshold but facilitate each other, producing a depolarization greater than the sum of the depolarizations induced by either EPSP alone. Experimental support for this mechanism comes from studies of paired-pulse facilitation, which found that Ca²⁺ persisting in the presynaptic terminal after a first electric shock enhances transmitter release in response to a second shock (e.g., Vogulshev et al. 1997).
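The summation case of the facilitation model can be illustrated with a deliberately simplified sketch in which each EPSP rises instantaneously and decays exponentially, and a neuron "fires" once the summed depolarization reaches a fixed threshold. All amplitudes, time constants, and the threshold are hypothetical values chosen only so that each EPSP is subthreshold on its own; response latency is ignored, and the sketch covers only linear summation, not supralinear facilitation or the presynaptic Ca²⁺ mechanism.

```python
import math


def epsp(t_ms, onset_ms, amp_mv, tau_ms):
    """EPSP with instantaneous rise and exponential decay (a deliberate simplification)."""
    if t_ms < onset_ms:
        return 0.0
    return amp_mv * math.exp(-(t_ms - onset_ms) / tau_ms)


def crosses_threshold(epsps, threshold_mv, t_max_ms=500.0, dt_ms=0.5):
    """True if the linear sum of the given EPSPs reaches threshold at any sampled time.

    Each EPSP is an (onset_ms, amp_mv, tau_ms) tuple; time 0 is first-tone onset.
    """
    n_steps = int(t_max_ms / dt_ms) + 1
    for i in range(n_steps):
        t = i * dt_ms
        if sum(epsp(t, *p) for p in epsps) >= threshold_mv:
            return True
    return False


THRESHOLD_MV = 10.0                 # hypothetical depolarization needed to fire
late_first = (100.0, 6.0, 80.0)     # hypothetical late EPSP evoked by the first tone
second_at_120 = (120.0, 6.0, 20.0)  # EPSP of a second tone at SOA 120 ms (latency ignored)
second_at_400 = (400.0, 6.0, 20.0)  # the same EPSP at SOA 400 ms

print(crosses_threshold([late_first], THRESHOLD_MV))                 # False
print(crosses_threshold([late_first, second_at_120], THRESHOLD_MV))  # True
print(crosses_threshold([late_first, second_at_400], THRESHOLD_MV))  # False
```

At SOA 120 ms the residual late EPSP (about 4.7 mV) adds to the 6 mV EPSP of the second tone and crosses the threshold; at SOA 400 ms the late EPSP has largely decayed, so summation fails.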

In an alternative view, response enhancement is generated by disinhibition. A first stimulus evokes a series of excitatory and inhibitory potentials with different time constants. Of these, a late, long-lasting inhibitory potential acts presynaptically and releases the neuron from part of the inhibition it would otherwise receive when a subsequent stimulus occurs within the duration of this presynaptic inhibition (Metherate and Ashe 1994; Mitzdorf 1987; Nelson 1991b). In vivo and in vitro experiments on paired-pulse facilitation in rat auditory cortex and in other parts of the neocortex (Metherate and Ashe 1994) have shown that a single electric shock applied to the thalamus elicits, in cortex, a rapid monosynaptic EPSP followed by a rapid inhibitory postsynaptic potential (IPSP) and a late, long-lasting IPSP, which are mediated by glutamate, GABAA, and GABAB receptors, respectively. An additional late EPSP, otherwise hidden, can be detected either when the shock is preceded by another shock at an interstimulus interval of 100–300 ms or after pharmacological manipulations. This late EPSP has a latency-to-peak of 20–50 ms, lasts ∼240 ms, and is mediated by N-methyl-D-aspartate (NMDA) receptors. Pharmacological manipulations revealed that the facilitated response results from activation of presynaptic GABAB receptors, which depresses the GABAergic IPSPs that normally suppress the late EPSP.
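The logic of the disinhibition account can be sketched as a toy calculation: presynaptic GABAB activation triggered by the first stimulus scales down the IPSP that normally masks the late NMDA-mediated EPSP, but only if the second stimulus falls within a window after the first. The box-shaped 100–300 ms window follows the interstimulus intervals quoted above; the depression strength, the potential amplitudes, and the function names are hypothetical, and the sketch ignores the time course of each potential.

```python
from typing import Optional


def gabab_depression(soa_ms: Optional[float], window_ms=(100.0, 300.0), strength=0.8) -> float:
    """Fraction by which presynaptic GABAB activation cuts GABA release.

    Box-shaped window after the first stimulus (hypothetical shape and strength);
    soa_ms=None means the stimulus is presented in isolation.
    """
    if soa_ms is None:
        return 0.0
    lo, hi = window_ms
    return strength if lo <= soa_ms <= hi else 0.0


def net_late_potential(soa_ms: Optional[float], late_epsp_mv=5.0, ipsp_mv=-6.0) -> float:
    """Late NMDA-mediated EPSP plus the GABAergic IPSP that normally masks it (mV)."""
    return late_epsp_mv + ipsp_mv * (1.0 - gabab_depression(soa_ms))


for soa in (None, 150.0, 500.0):
    print(soa, round(net_late_potential(soa), 2))
# None -1.0    isolated stimulus: late EPSP masked
# 150.0 3.8    IPSP depressed: late EPSP unmasked
# 500.0 -1.0   outside the window: masked again
```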

The present results are more compatible with a disinhibition mechanism. This is suggested by the similarity between the temporal stimulus constraints for inducing disinhibition (Metherate and Ashe 1994) and those for inducing response enhancement (Fig. 9); in particular, the range of SOAs over which responses were enhanced agrees with the range of interstimulus intervals for paired-pulse facilitation (Metherate and Ashe 1994). In our material, the first tone mostly affected a specific part of the neural response to the following tone, namely the period during which a neuron responded weakly or not at all to single tones (Fig. 12). In most neurons, this period began ∼30 ms after the onset of the second tone, that is, after the initial part of the response to a single tone (Fig. 11). Enhancement persisted for 60 ms on average and had disappeared no later than 210 ms after tone onset. The present results are less consistent with a facilitation mechanism because, in many cases, the late response to the first tone did not overlap in time with the periods during which the discharge rate in response to the second tone was enhanced (indicated by rectangles in the example in Fig. 1).
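The overlap argument against facilitation amounts to comparing two time windows on a common axis. In the sketch below, a window timed from first-tone onset is shifted by the SOA so that it is expressed relative to second-tone onset, and its overlap with the enhancement window is measured. The 30–90 ms enhancement window follows the typical onset (∼30 ms) and average duration (60 ms) reported above; the late-response windows and the function names are hypothetical.

```python
from typing import Tuple

Window = Tuple[float, float]  # (start_ms, end_ms)


def relative_to_second_tone(window_from_first: Window, soa_ms: float) -> Window:
    """Re-express a window timed from first-tone onset relative to second-tone onset."""
    start, end = window_from_first
    return (start - soa_ms, end - soa_ms)


def overlap_ms(a: Window, b: Window) -> float:
    """Duration shared by two windows (0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


enhancement = (30.0, 90.0)  # ~30 ms onset, ~60 ms duration (averages from the text)
late_first = (60.0, 100.0)  # hypothetical late response to the first tone
shifted = relative_to_second_tone(late_first, soa_ms=120.0)
print(shifted)                           # (-60.0, -20.0)
print(overlap_ms(shifted, enhancement))  # 0.0, inconsistent with facilitation

# A later (hypothetical) late response would overlap, as facilitation requires.
print(overlap_ms(relative_to_second_tone((150.0, 210.0), 120.0), enhancement))  # 60.0
```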

If the disinhibition model is correct, response enhancement and response attenuation would be induced by similar mechanisms. Response attenuation has also been proposed to involve presynaptic inhibition, but acting on the geniculocortical synapse, where it reduces the excitatory drive to the cortical neuron during sensory stimulation (Brosch and Schreiner 1997; Nelson 1991b). Both types of poststimulatory inhibition would thus act on a neuron differently from the inhibition that modulates it during a spectrally complex tone, which is transmitted by axodendritic or axosomatic synapses (e.g., Somogyi 1989).

Functional implications

There is a rich body of psychophysical observations on auditory facilitation (Rubin 1960; Zwislocki et al. 1959), enhancement (Galambos et al. 1972; Irwin and Zwislocki 1971; Viemeister and Bacon 1982), and sensitization (Hughes 1954), all of which lower the threshold or increase the subjective loudness of tones presented in a sequence. However, the conditions that induce these perceptual augmentation phenomena, and their consequences, differ from those that induce response enhancement in neurons of auditory cortex. The perceptual effects are strongest when consecutive tones have similar frequencies, whereas neuronal enhancement was maximal for tones about an octave apart. Moreover, auditory facilitation is observed only after brief (<5 ms) exposure to near-threshold sounds (<25 dB), whereas auditory sensitization is experienced after exposure to intense (>70 dB) tones for 1–3 min. For moderately intense tones (40–70 dB), similar to those used in the present study, auditory enhancement is observed that increases the subjective loudness of the succeeding tone by ≤30 dB (Elmasian and Galambos 1975). This perceptual enhancement, however, is strongest immediately after the end of the first tone and decreases monotonically within ∼500 ms.

The present study demonstrates that many neurons in auditory cortex are more strongly activated by a sequence of two tones than by a single tone. If the number of discharges signals the certainty that a specific sensory stimulus is present, as claimed in the neuron doctrine of Barlow (1972), a number of cortical neurons prefer a tone sequence to a single tone. In this view, cortical neurons can be considered feature detectors for tone sequences (Newman and Symmes 1979). The discharge rate of cortical neurons is modulated by a wide range of frequency combinations of two tones and by SOAs ranging from several tens to several hundreds of milliseconds. The most preferred stimulus is a tone pair with a SOA of 120 ms and frequencies ∼1 octave apart. Thus, if cortical neurons act as sequence detectors, they could be involved in extracting information on the temporal structure of sounds. Specifically, cortical neurons could contribute to the temporal segmentation of sounds, because their sensitivity range for SOA is similar to the predominant segmentation rates of many communication and behaviorally important natural sounds in humans (e.g., Bregman 1990) and in M. fascicularis (Palombit 1992). In the spectral domain, we are not aware of a perceptual correlate in which monkeys prefer sequences of tones about an octave apart to sequences with other frequency ratios. Even among humans, many naïve subjects have great difficulty recognizing octave equivalence in sequences of pure tones (Thurlow and Erchul 1977), although this does not rule out a preference for octaves. Whether and how cortical neurons contribute to the mental representation of the temporal structure of sounds, however, remains an open question.
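For comparison with segmentation rates, SOAs can be converted into the corresponding presentation rates (rate in Hz = 1000 / SOA in ms). The sketch below applies this conversion to the limits of the SOA range tested here (110–329 ms) and to the preferred SOA of 120 ms; the function name is hypothetical.

```python
def soa_to_rate_hz(soa_ms: float) -> float:
    """Presentation rate (events per second) implied by a stimulus onset asynchrony."""
    if soa_ms <= 0:
        raise ValueError("SOA must be positive")
    return 1000.0 / soa_ms


for soa in (110, 120, 329):  # tested range limits and preferred SOA
    print(soa, round(soa_to_rate_hz(soa), 2))
# 110 9.09
# 120 8.33
# 329 3.04
```

The tested SOA range thus corresponds to roughly 3 to 9 events per second, and the preferred SOA to about 8.3 events per second.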

Acknowledgments

The authors thank E. Budinger for help during experiments and brain reconstructions. They also thank Dr. E. Jürgens, E. Budinger, and especially Dr. P. Heil for valuable suggestions for the preparation of the manuscript.

Footnotes

  • Address for reprint requests: M. Brosch, Leibniz-Institut für Neurobiologie, Brenneckestraße 6, 39118 Magdeburg, Germany.

  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

REFERENCES
