J Neurophysiol 88: 1433-1450, 2002;
0022-3077/02 $5.00

The Journal of Neurophysiology Vol. 88 No. 3 September 2002, pp. 1433-1450
Copyright ©2002 by the American Physiological Society

Sound Repetition Rate in the Human Auditory Pathway: Representations in the Waveshape and Amplitude of fMRI Activation

Michael P. Harms (1,2) and Jennifer R. Melcher (1,2,3)

(1) Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston 02114; (2) Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Speech and Hearing Bioscience and Technology Program, Cambridge 02139; and (3) Department of Otology and Laryngology, Harvard Medical School, Boston, Massachusetts 02115


    ABSTRACT

Harms, Michael P. and Jennifer R. Melcher. Sound Repetition Rate in the Human Auditory Pathway: Representations in the Waveshape and Amplitude of fMRI Activation. J. Neurophysiol. 88: 1433-1450, 2002. Sound repetition rate plays an important role in stream segregation, temporal pattern recognition, and the perception of successive sounds as either distinct or fused. This study was aimed at elucidating the neural coding of repetition rate and its perceptual correlates. We investigated the representations of rate in the auditory pathway of human listeners using functional magnetic resonance imaging (fMRI), an indicator of population neural activity. Stimuli were trains of noise bursts presented at rates ranging from low (1-2/s; each burst is perceptually distinct) to high (35/s; individual bursts are not distinguishable). There was a systematic change in the form of fMRI response rate-dependencies from midbrain to thalamus to cortex. In the inferior colliculus, response amplitude increased with increasing rate while response waveshape remained unchanged and sustained. In the medial geniculate body, increasing rate produced an increase in amplitude and a moderate change in waveshape at higher rates (from sustained to one showing a moderate peak just after train onset). In auditory cortex (Heschl's gyrus and the superior temporal gyrus), amplitude changed somewhat with rate, but a far more striking change occurred in response waveshape: low rates elicited a sustained response, whereas high rates elicited an unusual phasic response that included prominent peaks just after train onset and offset. The shift in cortical response waveshape from sustained to phasic with increasing rate corresponds to a perceptual shift from individually resolved bursts to fused bursts forming a continuous (but modulated) percept. Thus at high rates, a train forms a single perceptual "event," the onset and offset of which are delimited by the on and off peaks of phasic cortical responses.
While auditory cortex showed a clear, qualitative correlation between perception and response waveshape, the medial geniculate body showed less correlation (since there was less change in waveshape with rate), and the inferior colliculus showed no correlation at all. Overall, our results suggest a population neural representation of the beginning and the end of distinct perceptual events that is weak or absent in the inferior colliculus, begins to emerge in the medial geniculate body, and is robust in auditory cortex.


    INTRODUCTION

It is well known from human psychophysical experiments that the perception of a succession of sounds depends strongly on the rate of sound presentation. For instance, when bursts of noise are presented repeatedly at a low rate (e.g., <10/s), each burst can be separately resolved (Miller and Taylor 1948; Symmes et al. 1955). In contrast, bursts presented at a higher rate fuse to form a single, modulated percept. In experiments where multiple series of sounds are presented simultaneously (e.g., a series of high and a series of low frequency tone bursts), the rate of sound presentation influences whether the series are perceived as single or separate streams, as well as the perceived temporal pattern within each stream (Bregman 1990; Royer and Robin 1986). The dependencies on rate observed in controlled psychophysical experiments such as these suggest that rate plays an important role in the perception of the more complex acoustic conditions encountered in everyday life.

Since repetition rate plays so basic a role in determining how sounds are heard, it is not surprising that there have been numerous neurophysiological studies of rate in animals. Broad trends concerning the coding of rate in the auditory pathway have emerged from this work. For instance, the highest repetition rates at which neurons respond faithfully to each successive sound in a train (or each successive cycle of amplitude modulated stimuli) tend to decrease from brain stem to thalamus to cortex (e.g., Creutzfeldt et al. 1980; Langner 1992; Schreiner and Langner 1988). In cortex, the neural coding of low and high rates may be accomplished by different populations of neurons, one coding low-rate stimuli through stimulus-synchronized activity and the other coding high rates in the overall amount of discharge activity (Lu and Wang 2000; Lu et al. 2001). While the animal work has shed light on the neural representations of repetition rate, the degree to which the animal findings extend to humans remains uncertain because of interspecies differences, anesthesia differences, and a paucity of data in humans that can serve as a link to the animal work. In the end, direct neurophysiological data in human listeners are important if we are to understand how repetition rate is represented in the activity patterns of the human brain.

Most previous neurophysiological studies of repetition rate in humans have used noninvasive techniques for probing brain function, such as evoked potential and evoked magnetic field measurements. The evoked response work has examined averaged responses at short, middle, and long latencies to various types of brief stimuli (e.g., clicks, tone and noise bursts) presented at different rates (Näätänen and Picton 1987; Picton et al. 1974; Thornton and Coleman 1975). A particular strength of evoked potential and magnetic field measurements is that they can be used to examine responses to individual stimuli within a train up to much higher rates than with other noninvasive brain imaging techniques (see following paragraph). A limitation, however, is that the sites of response generation cannot always be reliably localized. Evoked magnetic field examinations of repetition rate are further limited in that they provide information mainly concerning cortical areas because of inherent limitations in probing subcortical function (Erné and Hoke 1990).

Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), two techniques for spatially mapping brain activity, have also been used to examine the dependence of human brain activation on repetition rate. Compared with evoked potential and magnetic field measurement, fMRI lacks the temporal resolution needed to separately resolve the responses produced by individual stimuli in a train (except at extremely low rates, e.g., approximately 0.1/s), and the temporal resolution of PET is even less. An important advantage, however, is that both PET and fMRI enable activation to be directly localized to brain stem, thalamic, and cortical structures of the auditory pathway (Griffiths et al. 2001; Guimaraes et al. 1998; Lockwood et al. 1999; Melcher et al. 1999). The localization provided by fMRI is particularly precise because of the technique's high spatial resolution and direct mapping to anatomy. Despite the fact that fMRI and PET can show activation at different stages of the auditory pathway, previous PET and fMRI studies varying rate have focused largely on cortical areas (Binder et al. 1994; Dhankhar et al. 1997; Frith and Friston 1996; Giraud et al. 2000; Price et al. 1992; Rees et al. 1997; Tanaka et al. 2000). Additionally, most of the studies focused on the low "rates" characteristic of speech (e.g., <2.5 words or syllables/s) because they were directed at understanding speech processing. Overall, there is limited PET or fMRI data concerning the representations of rate within the human auditory pathway. Specifically, there is little information concerning the transformation of rate representations from structure to structure within the pathway for a wide range of psychophysically relevant rates.

The present fMRI study compared the representation of repetition rate across cortical and subcortical structures of the human auditory pathway using a wide range of rates. Stimuli were trains of repeated noise bursts with repetition rates ranging from low (where each burst could be resolved individually) to high (where individual bursts were not distinguishable and the train was perceived as a continuous, but modulated, sound). Noise bursts were chosen as the elemental stimulus based on the assumption that broadband sound would elicit robust responses by activating neurons across a wide range of characteristic frequencies. fMRI was selected for its high spatial resolution, its localizing capabilities, and its higher temporal resolution (approximately 2 s) compared with PET (>10 s). The latter feature proved important because one of the most striking differences in rate representation across structures occurred in the temporal dynamics of the fMRI response.

Portions of this work were presented at the annual meeting of the Society for Neuroscience (1997), the 21st annual meeting of the Association for Research in Otolaryngology (1998), the 4th and 5th International Conferences on Functional Mapping of the Human Brain (1998 and 1999), and in M. P. Harms' doctoral thesis (Massachusetts Institute of Technology, 2002).


    METHODS

Four series of experiments were conducted. The first two examined the effect of repetition rate on the response to a noise burst train in the inferior colliculus (IC), Heschl's gyrus (HG), and the superior temporal gyrus (STG; exp. I), or the IC and medial geniculate body (MGB; exp. II). The remaining experiments (exps. III and IV) were aimed at understanding one of the findings from exp. I, namely an unusual form of temporal response in the cortex to trains with a high repetition rate.

A total of 12 subjects participated in these experiments. They ranged in age from 19 to 35 yr (mean = 25 yr). Ten of the subjects were male. Nine were right-handed. Subjects had no known audiological or neurological disorders.

This study was approved by the institutional committees on the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear Infirmary, and Massachusetts General Hospital. All subjects gave their written informed consent.

Exps. I and II: noise burst trains with different burst repetition rates

Nine subjects participated in a total of 11 imaging sessions for exps. I and II (exp. I: 5 sessions, subjects 1-5; exp. II: 6 sessions, subjects 2, 5, and 6-9).

The stimuli were bursts of uniformly distributed white noise. Individual noise bursts in all four experiments were 25 ms in duration (full-width half-maximum), with a rise/fall time of 2.5 ms. The bursts were presented at repetition rates of 1, 2, 10, and 35/s (exp. I) or 2, 10, 20, and 35/s (exp. II). The 1/s rate was used in only three of the five sessions of exp. I. The spectrum of the noise stimulus at the subjects' ears was low-pass (6-kHz cutoff), reflecting the frequency response of the acoustic system.

Noise bursts were presented in 30-s trains alternated with 30-s "off" periods, during which no auditory stimulus was presented (Fig. 1, top). Four alternations between "train on" and "off" periods constituted a single scanning "run" (total duration 240 s). For all but two sessions (in exp. I), each of the four rates was presented once during each run, and their order was varied across runs. Within a train, the repeated noise bursts were identical (i.e., "frozen"), but the noise bursts differed across trains and runs. For the other two sessions, the same rate was presented throughout a run, and this rate was varied across runs. For these two sessions, the noise burst was frozen within a run, but differed across runs. In each session, the total number of train presentations at each rate was between 8 and 13.
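The timing of one run can be laid out in a few lines. The following is a minimal sketch, not the authors' stimulus software: the function names are hypothetical, and it assumes that each run begins with a train (the text does not state whether a run starts with a "train on" or an "off" period).

```python
def run_schedule(rates, on_s=30.0, off_s=30.0):
    """Return (start_s, end_s, label) periods for one run.

    Assumption: each train is followed by its "off" period and the run
    starts with a train; the source does not specify the starting period.
    """
    periods = []
    t = 0.0
    for rate in rates:
        periods.append((t, t + on_s, f"train {rate}/s"))
        periods.append((t + on_s, t + on_s + off_s, "off"))
        t += on_s + off_s
    return periods


def burst_onsets(rate_per_s, train_s=30.0):
    """Burst onset times (s, relative to train onset) at a fixed repetition rate."""
    n_bursts = round(rate_per_s * train_s)
    return [k / rate_per_s for k in range(n_bursts)]


# One exp. I style run: four rates, one train each (order is illustrative only).
schedule = run_schedule([2, 10, 35, 1])
total_s = schedule[-1][1]
```

With four 30-s trains and four 30-s "off" periods, `total_s` is 240 s, matching the stated run duration; a 35/s train contains 1,050 burst onsets under this layout.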



Fig. 1. Schematic of the stimulus paradigm for exps. I-III. In exps. I and II, trains of noise bursts at a given repetition rate were presented for 30 s, followed by a 30 s "off" period. This alternation was repeated 4 times for each imaging "run," typically using a different repetition rate for each "train on" period. Tick marks represent an image acquisition (approximately every 2 s). The expanded view uses a smaller time scale to illustrate the stimulus, in this case a portion of a prolonged 10/s noise burst train. In exp. III, "trials" consisting of either 1, 2, or 5 noise bursts were presented once every 18 s (15-16 trials per run). The interstimulus interval for the 2 noise bursts was either 500 ms or 28.6 ms (i.e., 2 NBs@2/s and 2 NBs@35/s; in this case the expanded view shows the complete stimulus for a given trial). The trials with 5 noise bursts used an interstimulus interval of 28.6 ms.

Exp. III: small numbers of noise bursts

To investigate how the initial bursts of a train contribute to cortical responses to the onset of a train, we examined the responses to a single noise burst and short clusters of noise bursts. Responses were collected in three imaging sessions with three subjects (exp. III; subjects 2, 5, and 10).

Either one noise burst or a cluster of noise bursts (2 or 5) was presented once every 18 s, constituting a single "trial" (Fig. 1, bottom). For the clusters of five noise bursts, the interstimulus interval (ISI, onset-to-onset) between noise bursts was 28.6 ms, equivalent to the ISI for a rate of 35/s. For clusters of two noise bursts, two different ISIs were used: 500 ms (2/s rate) and 28.6 ms (35/s rate). For two sessions, the same stimulus was used in all of the trials for a given run (12 runs total; 270 s per run; 45 total repetitions per trial type). In the third session (subject 10), the stimulus was randomized across trials (7 runs; 288 s per run; 28 repetitions per trial type).
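The interstimulus intervals and repetition counts quoted above follow from simple arithmetic, sketched here as a consistency check. The helper names are hypothetical, and two assumptions are made that the text does not state outright: the run counts (12 and 7) are per session, and trials were divided equally among the four trial types.

```python
TRIAL_PERIOD_S = 18   # one trial every 18 s
N_TRIAL_TYPES = 4     # 1 NB, 2 NBs@2/s, 2 NBs@35/s, 5 NBs@35/s


def isi_ms(rate_per_s):
    """Onset-to-onset interval (ms) for a given repetition rate."""
    return 1000.0 / rate_per_s


def reps_per_trial_type(n_runs, run_s):
    """Repetitions of each trial type, assuming an equal split across types."""
    trials_per_run = run_s // TRIAL_PERIOD_S
    return n_runs * trials_per_run // N_TRIAL_TYPES


isi_35 = round(isi_ms(35), 1)                    # ISI for a 35/s rate
reps_blocked = reps_per_trial_type(12, 270)      # one stimulus type per run
reps_randomized = reps_per_trial_type(7, 288)    # stimulus randomized across trials
```

Under these assumptions the numbers reproduce the text: 15 or 16 trials per run, 45 repetitions per trial type in the blocked sessions, and 28 in the randomized session.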

Exp. IV: noise burst trains with different durations

The effect of train duration was examined in two imaging sessions with two subjects (exp. IV; subjects 11 and 12). Trains of four different durations (15, 30, 45, and 60 s) were presented with an "off" period of 40 s following each train. Noise burst repetition rate within each train was always 35/s. Each train duration was presented once per run (8-9 runs; 310 s per run) with the order of durations randomized across runs. Supplementary information concerning the effects of train duration was obtained in two additional experiments that used a single, long-train duration (60 s) and 35/s noise bursts.
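The stated run length for exp. IV can be checked the same way. This short sketch (variable names are illustrative) simply sums the four train durations and their 40-s "off" periods, and counts the bursts in each train at 35/s.

```python
TRAIN_DURATIONS_S = [15, 30, 45, 60]
OFF_S = 40
RATE_PER_S = 35

# Each train is followed by one 40-s "off" period.
run_s = sum(duration + OFF_S for duration in TRAIN_DURATIONS_S)

# Number of noise bursts in each train at the fixed 35/s rate.
bursts_per_train = {d: RATE_PER_S * d for d in TRAIN_DURATIONS_S}
```

The total is 310 s, in agreement with the run duration given above.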

Acoustic stimulation

Separately for each ear, the subject's threshold of hearing to 10/s noise bursts was determined in the scanner room. For all experiments, the stimuli were presented binaurally at 55 dB above this threshold. During both threshold determination and functional imaging, there was an on-going low-frequency background noise produced primarily by the pump for the liquid helium (used to supercool the magnet coils) (Ravicz et al. 2000). This sound reaches levels of ~80 dB SPL in the frequency range of 50-300 Hz. Additionally during functional imaging, each image acquisition generated a "beep" of approximately 115 dB SPL at 1.0 kHz (1.5 T scanner) or ~130 dB SPL at 1.4 kHz (3 T scanner, see Imaging). The stimuli for all experiments were clearly audible during functional imaging. For the low-rate trains, an individual burst was occasionally masked by a coinciding imaging "beep." However, because the imaging was cardiac gated (see Imaging), this coincidence of a noise burst with image acquisition occurred only infrequently throughout the low-rate trains.

Noise bursts were delivered through a headphone assembly that provided approximately 30 dB of attenuation at the primary frequency of the scanner-generated sounds (1.0 or 1.4 kHz; Ravicz and Melcher 2001). Specifically, the noise bursts were produced by a digital-to-analog board (running under LabView), amplified, and fed to a pair of audio transducers housed in a shielded box adjacent to the scanner. The output of the transducers reached the subject's ears via air-filled tubes that were incorporated into sound attenuating earmuffs.

Task

Subjects were instructed to listen to the noise burst stimuli. At the end of each scanning run, subjects reported their alertness on a qualitative scale ranging from 1 (fell asleep during run) to 5 (highly alert). Alertness ratings were almost always in the 3-5 range, and were never 1 (footnote 1).

For exp. II, subjects performed the following task to further ensure that they remained attentive: they indicated whenever they detected an occasional increment or decrement in intensity (of 6 dB, lasting 1 s) by raising or lowering their index finger. Each subject identified more than 90% of the intensity changes.

Imaging

Subjects were imaged using a 1.5 or 3 Tesla whole-body scanner (General Electric) and a head coil (transmit/receive). The scanners were retrofitted for high-speed imaging (i.e., single-shot echo-planar imaging; Advanced NMR Systems, Inc.). Exps. I and II were conducted at 1.5 T. Exps. III and IV were conducted at 3 T (except for one of the supplementary sessions of exp. IV). Subjects rested supine in the scanner. To reduce head motion, a bite bar was custom-molded to the subject's teeth and mounted to the head coil, or pillow and foam were packed snugly around the head. Each imaging session lasted approximately 2 h and included the following procedures:
1) Contiguous sagittal images of the whole head were acquired.
2) An automated, echo-planar-based shimming procedure was performed to increase magnetic field homogeneity within the brain regions to be functionally imaged (Reese et al. 1995).
3) The brain slice to be functionally imaged was selected using the sagittal images as a reference. For exps. I, III, and IV, the selected (near-coronal) slice intersected the IC and the posterior aspect of HG and STG in both hemispheres (Fig. 2, left and middle). When multiple transverse temporal gyri were present, the anterior one was intersected. Based on these criteria, we expect that a portion of primary auditory cortex was intersected in both hemispheres of all subjects for exps. I, III, and IV (Rademacher et al. 1993, 2001). For exp. II, the slice intersected the IC and MGB (located just ventral and lateral to the cerebral aqueduct; Fig. 2, right). A single slice, rather than multiple slices, was imaged in all experiments to reduce the impact of scanner-generated acoustic noise on auditory activation (footnote 2).
4) A T1-weighted, high-resolution anatomical image was acquired of the selected brain slice for subsequent overlay of the functional data [thickness = 7 mm; in-plane resolution = 1.6 × 1.6 mm; TR = 10 s, TI = 1200 ms, TE = 40 ms (exps. I and II) or 57 ms (exps. III and IV)]. A second high-resolution anatomical image was acquired at the end of the session after functional imaging. A comparison of the initial and final T1 images allowed for a gross check of subject movement over the session.
5) Functional images of the selected slice were acquired using a blood oxygenation level-dependent (BOLD) sequence (sessions at 1.5 T: asymmetric spin echo, TE = 70 ms, τ offset = -25 ms, flip = 90°; sessions at 3 T: gradient echo, TE = 40 or 30 ms, flip = 90 or 60°). Slice thickness was 7 mm. In-plane resolution was 3.1 × 3.1 mm. The beginning of each scanning "run" included four discarded images to ensure that image signal level had approached a steady state. During the remainder of the run, functional images of the selected slice were acquired repeatedly (Fig. 1).



Fig. 2. Functional imaging planes superimposed on sagittal, anatomical images. In exps. I, III, and IV, the plane (thick white line) passed through the inferior colliculi (left) and Heschl's gyri (middle). In exp. II, the plane passed through the inferior colliculi (located just lateral to the brachium of the inferior colliculi) and the medial geniculate bodies of the thalamus (located just ventral and lateral to the cerebral aqueduct; right).

Functional imaging was performed using a cardiac gating method that increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998) [except for one session of exp. III that used a fixed interimage interval (TR) of 2 s]. Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, and the interimage interval (TR) was recorded. The average TR across all sessions was 2,035 ms (the average within a session varied from 1,521 to 2,650 ms). Fluctuations in heart rate lead to variations in TR that result in image-to-image variations in image signal strength (i.e., T1 effects). Using the measured TR values, image signal was corrected to account for these variations (Guimaraes et al. 1998).
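To illustrate why a variable TR changes image signal and how a measured TR can be used to compensate, the sketch below applies a simple saturation-recovery model for a 90° flip angle, in which signal is proportional to 1 - exp(-TR/T1). This model, the function name, and the T1 value are all assumptions made for illustration; the correction actually used is the one described by Guimaraes et al. (1998), which is not reproduced here.

```python
import math


def t1_correct(signals, tr_ms, t1_ms, tr_ref_ms):
    """Rescale each image to the signal expected at a reference TR.

    Assumed model (not taken from the source): S = M0 * (1 - exp(-TR / T1)),
    where tr_ms[i] is the interval preceding image i.
    """
    ref = 1.0 - math.exp(-tr_ref_ms / t1_ms)
    return [
        s * ref / (1.0 - math.exp(-tr / t1_ms))
        for s, tr in zip(signals, tr_ms)
    ]


# Synthetic example: constant underlying magnetization, fluctuating TR.
trs = [1900.0, 2100.0, 2035.0]   # ms, as would be recorded per image
t1 = 1000.0                      # ms, hypothetical tissue T1
raw = [100.0 * (1.0 - math.exp(-tr / t1)) for tr in trs]
corrected = t1_correct(raw, trs, t1, tr_ref_ms=2035.0)
```

In this synthetic case the raw values differ from image to image only because TR differs, and the corrected values are equal, which is the behavior a TR-based correction is meant to achieve.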

Analysis

IMAGE PREPROCESSING. The images for each scanning run were corrected for any movements of the head that may have occurred over the course of the imaging session. Each functional image of a session was translated and rotated to fit the first image of the first functional run using standard software (SPM95; without spin history correction; Friston et al. 1995, 1996). Because only one functional slice was acquired, these corrections for motion were necessarily limited to adjustments within the imaging plane. In most cases, the motion correction algorithm was well-behaved and resulted in an improvement in image alignment. However, for one session, the algorithm introduced some clearly artifactual movement, so the premotion corrected data were utilized. Additionally, we did not include the MGB of one subject in the analysis, because the image translations calculated by the motion-correction algorithm were smaller than the movement evident at the location of the MGB in the T1 anatomical images acquired pre- and post- functional imaging. A similar discrepancy did not occur for the IC of this subject, so the IC data were included. The images for each run were further processed in two ways to enhance the likelihood of detecting activation. 1) Image signal versus time for each voxel was corrected for linear or quadratic drifts in signal strength over each run (i.e., drift-corrected). 2) Image signal versus time for each voxel and run was normalized such that the time-average signal had the same (arbitrary) value for all voxels and runs. (Specifically, the signal vs. time data were ratio normalized based on the intercept of a least-square quadratic fit to the data). This normalization was done to eliminate artificial discontinuities in the signal level between runs in the subsequently concatenated data. All subsequent analyses were performed on the drift-corrected, normalized images.
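The two per-voxel processing steps (drift correction and ratio normalization) can be sketched for a single voxel time series as follows. This is an illustrative reading of the description, not the authors' code: the function names and the target value of 1000 are hypothetical, time is rescaled to run from 0 at the first image to 1 at the last, and the intercept is taken at the first image.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def detrend_and_normalize(signal, target=1000.0):
    """Remove a least-squares quadratic drift and ratio-normalize by its intercept."""
    n = len(signal)
    xs = [i / (n - 1) for i in range(n)]          # time rescaled to [0, 1]
    # Normal equations for the fit a + b*x + c*x^2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, signal)) for k in range(3)]
    a, b, c = solve3(
        [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], rhs
    )
    # Subtract the drift terms, then scale so the intercept maps to `target`.
    return [(y - b * x - c * x * x) * target / a for x, y in zip(xs, signal)]


# A pure quadratic drift with no activation flattens to the target value.
xs_demo = [i / 49 for i in range(50)]
drifting = [500.0 + 20.0 * x - 5.0 * x * x for x in xs_demo]
flattened = detrend_and_normalize(drifting)
```

Because every voxel and run ends up at the same arbitrary level, runs can be concatenated without step changes in baseline signal.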

GENERATING ACTIVATION MAPS. Maps of activation to noise burst trains (exps. I, II, and IV) were derived as follows. First, each image was assigned to either a "train on" or "off" period. Stimulus-evoked changes in image signal typically have a delay of 4-6 s (Bandettini et al. 1993; Buckner et al. 1996; Kwong et al. 1992). To account for this (hemodynamic) delay, the first three images taken after the onset of a noise burst train were assigned to the preceding "off" period, and the first three images after the train offset were assigned to the preceding "train on" period. For each rate, the images assigned to each "train on" period and its following "off" period were concatenated into a single file. Image signal strength during train on versus off periods was then compared for each voxel using an unpaired t-test (Press et al. 1992). The resulting P value was plotted for each voxel with P ≤ 0.01 to yield a spatial map of activation. P values were not corrected to account for the correlated nature of fMRI time-series (Purdon and Weisskoff 1998), nor were they adjusted for the repeated application (voxel-by-voxel) of a statistical test (Friston et al. 1994).
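The image assignment and the per-voxel comparison can be sketched as follows. The function names are hypothetical, a nominal 2-s interimage interval is assumed (15 images per 30-s period), and the t statistic uses the pooled-variance (Student) form, which is an assumption about the variant of the unpaired test. Conversion of t to a P value is omitted.

```python
import math
from statistics import mean, variance


def shift_labels(raw_labels, delay_images=3):
    """Delay the stimulus labels by a fixed number of images.

    The first `delay_images` images after each transition keep the label of
    the preceding period; the run is assumed to be preceded by an "off" state.
    """
    return ["off"] * delay_images + raw_labels[:-delay_images]


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


# One run: four 30-s trains alternating with 30-s "off" periods.
raw = (["on"] * 15 + ["off"] * 15) * 4
labels = shift_labels(raw)

# Toy voxel values for the two conditions.
t_stat = unpaired_t([2, 4, 6], [1, 2, 3])
```

For a real voxel, the "on" and "off" samples would be the signal values whose shifted labels fall in each condition.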

DEFINING REGIONS OF INTEREST. Responses were analyzed quantitatively within four anatomically defined regions of interest (ROIs): the IC, MGB, HG, and STG. These ROIs, which were defined separately for each hemisphere, were first identified in the high-resolution anatomical images (in-plane resolution: 1.6 × 1.6 mm) of the functional imaging plane. These "high resolution" ROIs were down-sampled to the same resolution as the functional images (3.1 × 3.1 mm), to yield the ROIs used for all subsequent analyses. The ROI borders were defined as follows:

IC. In exps. I, III, and IV, the IC were readily identified as distinct, circular anatomical areas (e.g., Fig. 3). For exp. II, only the caudal edge of the IC was distinguishable (e.g., Fig. 6), so the area of each IC ROI was defined as a circle sized to fit this visible edge. The IC ROIs were defined liberally to include voxels at the edge of the IC.



Fig. 3. Activation maps for the inferior colliculus (IC) (2 subjects, exp. I). Stimuli were noise burst trains with repetition rates of 2, 10, or 35/s. Each panel shows a T1-weighted anatomic image (grayscale) and superimposed activation map (color). Rectangle superimposed on the diagrammatic image (bottom, right) indicates the area shown in each panel. For the activation maps, regions are colored according to the result of a t-test comparison of image signal strength during "train on" and "off" periods. In this and all subsequent figures, blue and yellow correspond to the lowest (P = 0.01) and highest (P = 2 × 10^-9) significance levels, respectively. (Areas with P > 0.01 are not colored). The activation maps (based on functional images with an in-plane resolution of 3.1 × 3.1 mm) and anatomic images (1.6 × 1.6 mm) have been interpolated. Images are displayed in radiological convention, so the subject's right is displayed on the left. R, right; L, left.

MGB. The ambient cistern defined the caudal border of the MGB ROI. The distance from this caudal border to the rostral edge of the MGB ROI was determined from atlases of histologically prepared brains, as was the distance between the midline and medial edge of the MGB ROI. These distances were computed by first normalizing atlas measurements to maximum brain width, and then multiplying the normalized atlas measurements by the maximum width of the imaged brain slice for each subject. The lateral edge of the ambient cistern defined the lateral edge of the MGB ROI. The lateral extent of the ROI was liberal in that it probably included a portion of the lateral geniculate body in some subjects. Correspondingly, activation generally did not reach the lateral-most edge of the MGB ROI.

HG. When HG was visible as a "mushroom" protruding from the surface of the superior temporal plane, the lateral edge of this mushroom defined the lateral edge of the HG ROI. The medial edge of the ROI was the medial-most aspect of the Sylvian fissure. When a distinct mushroom was not present, the HG ROI covered approximately the medial third of the superior temporal plane (extending from the medial-most aspect of the Sylvian fissure). In the inferior-superior dimension, the HG ROI extended superiorly to the edge of the overlying parietal lobe and inferiorly so as to entirely encompass any activation centered on HG.

STG. The STG ROI was defined as the superior temporal cortex lateral to the HG ROI. The definition of the inferior and superior borders was the same as for the HG ROI.

The resulting average number of total voxels (3.1 × 3.1 × 7 mm) per hemisphere in the ROIs was 5 (IC), 10 (MGB), 29 (HG), or 36 (STG).

CALCULATING RESPONSE TIME COURSES. The time course of response was computed for specific voxels within each ROI. For exps. I and II, the voxels (3.1 × 3.1 × 7 mm) were chosen based on the activation maps for a particular "reference rate" (i.e., the rate that typically produced the strongest activation in the maps): 35/s for IC, 20/s for MGB, 10/s for HG, and 2/s for STG. For each IC and MGB ROI, we used the single voxel with the lowest P value at the reference rate. For each HG and STG ROI, we averaged the responses of the four voxels (not necessarily contiguous) with the lowest P values at the reference rate (footnote 3). Note that for a given structure, session, and hemisphere, the same voxels were used in computing the response time course at each rate. For exps. III and IV, which focused on auditory cortex, the same number of lowest P value voxels (four in each hemisphere) were selected for analysis within the HG and STG ROIs. However, the activation map for selecting voxels was based on a single run of music (4 repetitions of the first 30 s of the fourth movement in Beethoven's Symphony No. 7). Music was used because 1) it typically evokes larger magnitude cortical responses than noise burst trains, so robust cortical activation maps could be obtained with a single run, thereby allowing more time for collecting responses to the primary stimuli of interest in exps. III and IV, and 2) the dominant amplitude modulation frequencies of the music stimulus (<5 Hz) were comparable with the "reference rates" for auditory cortex (footnote 4).
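The voxel selection rule amounts to ranking the voxels of an ROI by P value at the reference rate and keeping the lowest one (IC, MGB) or four (HG, STG). A minimal sketch, with hypothetical function names and made-up P values:

```python
def lowest_p_voxels(p_by_voxel, n):
    """Return the n voxel identifiers with the lowest P values."""
    return sorted(p_by_voxel, key=p_by_voxel.get)[:n]


def average_courses(courses):
    """Point-by-point average of equally long time courses."""
    return [sum(vals) / len(vals) for vals in zip(*courses)]


# Illustrative P values for the voxels of one ROI, keyed by (row, column).
roi_p = {(0, 0): 0.5, (0, 1): 1e-6, (1, 0): 0.003, (1, 1): 1e-4, (2, 0): 0.02}

ic_voxel = lowest_p_voxels(roi_p, 1)    # single voxel, as for IC and MGB
hg_voxels = lowest_p_voxels(roi_p, 4)   # four voxels, as for HG and STG
```

The same selected voxels would then be reused at every rate, with `average_courses` combining the four cortical voxel time courses into one.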

For exps. I and II, response time courses were computed as follows. Because cardiac gating results in an irregular temporal sampling, the time series for each imaging "run" and voxel was linearly interpolated to a consistent 2-s interval between images, using recorded interimage intervals to reconstruct when each image occurred. These data were then temporally smoothed using a three-point, zero-phase filter (with coefficients 0.25, 0.5, 0.25). A response "block" was defined to include the 10 s prior to a noise burst train, the period coinciding with the train, and the off period following the train. These response blocks were averaged according to rate to give an average signal versus time waveform for each rate, session, and hemisphere. The signal at each time point was then converted to a percent change in signal relative to a baseline. The baseline was defined as the average signal from t = -6 to 0 s, with time t = 0 s corresponding to the onset of the noise burst train. Finally, for each rate, the percent change time courses were averaged across sessions and hemispheres (footnote 5). For exp. IV, response time courses were calculated the same way, except response blocks were averaged according to train duration.
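The three signal-processing steps in this paragraph (resampling to a 2-s grid, three-point smoothing, and conversion to percent change) can be sketched as below. The function names are hypothetical, and two details are assumptions: the end points are left unchanged by the smoothing filter, and the baseline includes both end points of the -6 to 0 s window (four samples at 2-s spacing).

```python
def resample(times, values, step=2.0):
    """Linearly interpolate irregular samples onto a regular grid from times[0]."""
    grid, out, j, k = [], [], 0, 0
    while times[0] + k * step <= times[-1]:
        t = times[0] + k * step
        while times[j + 1] < t:          # advance to the bracketing interval
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        grid.append(t)
        out.append(values[j] + frac * (values[j + 1] - values[j]))
        k += 1
    return grid, out


def smooth3(x):
    """Zero-phase three-point filter (0.25, 0.5, 0.25); end points left as-is."""
    y = list(x)
    for i in range(1, len(x) - 1):
        y[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return y


def percent_change(block, onset_index, n_baseline=4):
    """Percent change relative to the mean of the samples ending at train onset."""
    base = block[onset_index - n_baseline + 1 : onset_index + 1]
    b = sum(base) / len(base)
    return [100.0 * (v - b) / b for v in block]


# Irregular acquisition times (s), as reconstructed from recorded TRs.
grid, vals = resample([0.0, 1.5, 4.0, 6.0], [0.0, 1.5, 4.0, 6.0])

# A block starting 10 s before the train has its onset (t = 0) at index 5.
pct = percent_change([100.0] * 6 + [102.0], onset_index=5)
```

For exp. III the same functions would apply with smoothing skipped and `n_baseline=2`.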

For exp. III, time courses for each stimulus were computed as described above, with the following exceptions: 1) no temporal smoothing was applied (to avoid disproportionately altering the responses, which were expected to be brief) and 2) the baseline signal level for converting time courses to percent change was based on the average of just two time points, t = -2 and 0 s. [This was done because the "off" period between stimuli (18 s) was shorter for this experiment than for the others (30 or 40 s), and we wanted to avoid including time points at which the response might not yet have returned to baseline from the preceding stimulus.]

QUANTIFYING RESPONSE MAGNITUDE. For exps. I and II, response magnitude in each auditory structure was quantified using two measures computed from the percent change time courses. "Time-average" percent change, a measure of the overall response strength during a noise burst train, was computed as the mean percent change from t = 4 to 30 s. "Onset" percent change, a measure of the response amplitude near the beginning of a noise burst train, was computed as the maximum percent change from t = 4 to 10 s. Since "time-average" and "onset" percent change were calculated from the percent change time courses, they indicate image signal deviations relative to a 6-s baseline immediately preceding the stimulus (i.e., the baseline period used in calculating the time courses). For exp. III, peak percent change was defined as the maximum value in the percent change time courses.
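
The two magnitude measures for exps. I and II can be illustrated with a short Python sketch. The function names and the inclusive window end points are assumptions, and the two time courses below are synthetic shapes chosen only to show how the measures diverge; they are not data from the study.

```python
def window(times, pct, t_start, t_end):
    """Return percent-change values whose time stamps lie in [t_start, t_end]."""
    return [p for t, p in zip(times, pct) if t_start <= t <= t_end]


def time_average_pct(times, pct, t_start=4.0, t_end=30.0):
    """Mean percent change from t = 4 to 30 s ("time-average" measure)."""
    vals = window(times, pct, t_start, t_end)
    return sum(vals) / len(vals)


def onset_pct(times, pct, t_start=4.0, t_end=10.0):
    """Maximum percent change from t = 4 to 10 s ("onset" measure)."""
    return max(window(times, pct, t_start, t_end))


# Synthetic time courses on a 2-s grid (illustrative values, not study data).
times = list(range(-10, 42, 2))
sustained = [1.0 if 4 <= t <= 30 else 0.0 for t in times]
phasic = [2.0 if t == 6 else 0.0 for t in times]
```

For the synthetic sustained shape the two measures coincide, whereas for the synthetic phasic shape the onset measure is large and the time-average measure is small, which is the kind of dissociation the two measures are meant to capture.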


    RESULTS

Response to noise burst trains: effect of burst repetition rate

INFERIOR COLLICULUS. Activation maps for the IC showed an increase in activation with increasing burst repetition rate. Figure 3 demonstrates this increase for two sessions from exp. I. The maps show activation that is greatest at 35/s, less at 10/s, and absent at 2/s. (Activation strength is reflected in the maps as a lower P value from the statistical comparison of image signal level during train "on" and "off" periods.) Greater IC activation at higher rates is also demonstrated by the maps in Fig. 6, which correspond to two sessions from exp. II.

Figure 4 (left) shows the time course of the responses in the IC averaged across all sessions. At all rates, the response was "sustained" in that image signal increased when the noise burst train was turned on, remained elevated while the train was on, and decreased once the train was turned off. The amplitude of the sustained response during the "train on" period increased with increasing rate.



Fig. 4. Response time courses averaged across sessions and hemispheres [solid lines; IC, n = 22 for 2, 10, and 35/s, n = 12 for 20/s; medial geniculate body (MGB), n = 10 for all rates; HG and STG, n = 10 for 2, 10, and 35/s, n = 6 for 1/s]. Dashed lines give the mean ± SE at each time point. Note that the vertical scale for the IC and MGB responses differs slightly from the scale for Heschl's gyrus (HG) and superior temporal gyrus (STG).

The increase in response amplitude was quantified using two measures: maximum percent signal change near the beginning of the "train on" period ("onset" percent change), and percent signal change time-averaged over the on period ("time-average" percent change; defined in METHODS). On average, both measures increased with increasing rate (Fig. 5, top left). Onset and time-average percent change showed a significant increase from 2/s to 10/s (P = 0.01, onset; P = 0.05, average; paired t-test), and from 10/s to 35/s (P = 0.02, onset; P = 0.006, average). Plots of percent change versus rate for individual IC also showed an overall trend of increasing percent change with increasing rate (Fig. 5, top middle and right). For 19 of 22 IC, the response at 35/s was greater than the response at 2/s (for both measures).
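
The paired comparisons reported above treat each IC (one session and hemisphere) as its own control across rates. A minimal sketch of the paired t statistic follows; the function name is hypothetical, and the sketch returns only the statistic and its degrees of freedom. Converting these to a P value requires the t distribution from a statistics library, which is not shown.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, degrees of freedom) for a paired comparison of b against a.

    `a` and `b` hold one value per IC (e.g. percent change at a lower and a
    higher rate), in the same order. Raises ZeroDivisionError if all paired
    differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d))), len(d) - 1
```

A positive t indicates that values in `b` tend to exceed their partners in `a`.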



Fig. 5. Response magnitude vs. repetition rate in the IC, MGB, HG, and STG. Left: time-average and onset percent change averaged across sessions and hemispheres. Bars indicate the SE (see Fig. 4 legend for the number of sessions and hemispheres represented by each data point). Middle and right: Time-average and onset percent change for each session and hemisphere vs. rate. To facilitate comparison of the trends across rate, each curve has been displaced vertically by adding a constant (specific to each curve), such that the resulting mean of the values for 2, 10, and 35/s is equal to the population mean for these rates (left). In all plots, the repetition rate axis uses a categorical scale. Note that there are no data at 20/s for all of the HG and STG curves and for 10 of the IC curves.
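
The vertical displacement described in the Fig. 5 legend amounts to adding one constant per curve. A minimal sketch, assuming each curve is stored as a mapping from repetition rate to percent change (the function name and data layout are hypothetical):

```python
def displace_curve(curve, population_mean, rates=(2, 10, 35)):
    """Shift a curve by a constant so its mean over `rates` equals `population_mean`.

    `curve` maps repetition rate (bursts/s) to percent change. Rates outside
    `rates` (e.g. 1/s or 20/s) are shifted by the same constant but do not
    enter the mean.
    """
    offset = population_mean - sum(curve[r] for r in rates) / len(rates)
    return {rate: value + offset for rate, value in curve.items()}
```

Because only a constant is added, the shape of each curve across rate is unchanged; only its vertical position moves.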

For the rates that overlapped between exps. I and II (2, 10, and 35/s) there was no significant difference between the percent signal change values (P > 0.1, t-test), suggesting that the two main differences between these experiments (intensity detection task and imaging plane) did not have a strong effect on inferior colliculus responses. There was no significant difference between the values obtained from the left and right IC (P > 0.3, paired t-test, collapsing the data across all rates).

In summary, the IC showed a sustained response to noise burst trains. The amplitude of this response increased with increasing burst repetition rate.

MEDIAL GENICULATE BODY. In contrast with the IC, activation maps for the MGB usually showed a nonmonotonic change in activation with rate. The trend for the MGB is illustrated by the maps for two sessions in Fig. 6. The maps show an increase in MGB activation with increasing rate in the 2/s-20/s range, but a decrease from 20/s to 35/s.



Fig. 6. Activation maps for the MGB and IC (2 subjects, exp. II). Stimuli were noise burst trains with repetition rates of 2, 10, 20, or 35/s. See Fig. 3 legend.

The trend in the activation maps parallels the rate dependence of time-average percent signal change in the MGB, but not onset percent change. The close correspondence between time-average percent change and activation maps is to be expected since the maps are based on a comparison of time-average signal levels during "train on" and "off" periods. Time-average percent change increased significantly from 2/s to 20/s (P = 0.005, paired t-test), and decreased from 20/s to 35/s (P = 0.03; Fig. 5, left). Onset percent signal change showed the same trend from 2/s to 20/s (i.e., a significant increase; P < 0.001), but not at high rates in that there was no difference between 20/s and 35/s (P = 0.9). The different rate dependence for onset versus time-average percent change is also apparent overall in the plots for individual MGBs (Fig. 5, middle and right), despite the intersession variability in the precise trends between the rates. Neither onset nor time-average percent change differed significantly between the left and right MGBs (P > 0.15, paired t-test, collapsing the data across all rates).

The different rate-dependencies for onset and time-average percent change indicate that the time course of the MGB response varies with rate. This variation is illustrated in Fig. 4 (2nd column). On average, responses to a 35/s train peaked just after train onset and then declined by approximately 50% during the remainder of the train. This moderate decrease in the response differs from the largely sustained responses at the lower rates of 2/s, 10/s, and 20/s. A quantitative comparison of onset percent change with the percent change at the end of the train (i.e., at 30 s) confirmed the response difference at the highest rate. For the 35/s train, response amplitude was significantly less at the end of the train than at onset (P = 0.04, paired t-test), consistent with a response decrease. In contrast, there was no significant difference at the lower rates (P > 0.1), consistent with a sustained response. Thus MGB responses varied over the course of high-rate, but not lower-rate, trains.

The rate dependencies seen in time-average percent change and the activation maps can be explained in terms of the time course and onset amplitude of MGB responses. Between 2/s and 20/s, the increase in time-average percent change (and activation in the maps) is largely attributable to the increase in sustained response amplitude, which is simultaneously reflected as an increase in onset percent change. Given that 20/s and 35/s evoke equal onset responses, the decrease in time-average percent change (and in the maps) between these two rates can be primarily attributed to a change in the response to the latter portion of the train (i.e., the change from a sustained response to one with a moderate decrease following onset).

In summary, between 2/s and 20/s, onset percent change increased with increasing rate in MGB, while response time courses remained primarily sustained. Between 20/s and 35/s, there was no change in onset amplitude, but the response waveshape changed from sustained to moderately decreasing following the train onset.

HESCHL'S GYRUS AND SUPERIOR TEMPORAL GYRUS. A nonmonotonic relationship between rate and activation was apparent in the activation maps for HG and STG (Fig. 7). The maps showed an activation increase from 1/s to 2/s, and a decrease from 10/s to 35/s.



Fig. 7. Activation maps for HG and STG (2 subjects, exp. I). Stimuli were noise burst trains with repetition rates of 1, 2, 10, or 35/s. See Fig. 3 legend.

As expected, the trends in the activation maps paralleled the rate dependence of time-average percent signal change. In HG, time-average percent change increased from 2/s to 10/s (P = 0.05, paired t-test), but decreased markedly from 10/s to 35/s (P < 0.001; Fig. 5, left). These trends were observed consistently in individual HG (Fig. 5, middle). In STG, the rate of greatest time-average percent change (2/s) was less than in HG (10/s; Fig. 5, left). For the six STG with 1/s data, time-average percent change at 2/s was greater than at 1/s (P = 0.003, paired t-test). Time-average percent change in STG tended to decrease from 2/s to 10/s and from 10/s to 35/s, so that the overall decrease from 2/s to 35/s was significant (P = 0.002; Fig. 5).

Onset percent change again showed differences compared with time-average percent change. In HG, the difference was primarily one of degree. Onset percent change at 10/s was significantly greater than at both 2/s and 35/s (P = 0.01, paired t-test; Fig. 5, left and right), but the decrease from 10/s to 35/s averaged only 20% for onset percent change compared with 50% for time-average percent change. In STG, onset and time-average percent change had overall different trends. Whereas time-average percent change decreased from 2/s to 35/s (P = 0.002), onset percent change was unchanged over this range (P = 0.4; Fig. 5, left and right).

A dramatic rate-dependent change in response waveshape accounts for the differences between onset and time-average percent change in HG and STG. At low rates, responses were sustained, whereas at high rates they were not (Fig. 4, 3rd and 4th columns).6 At the highest rate of 35/s, image signal increased to a peak occurring 6 s following train onset ("on-peak"), declined substantially over the next 8 s, increased slightly over the remainder of the on period, and peaked again 6 s following train offset ("off-peak"). The most prominent features of this "phasic" response are the peaks just after train onset and offset. In HG, the reduction in time-average percent change between 10/s and 35/s was partly because of 1) the decrease in onset percent change, and 2) the more dramatic signal decline during the on period for the 35/s train. In STG, onset percent change did not vary significantly with rate, so the decline in time-average percent change at high rates was primarily due to the change in response waveshape.

While the rate-dependencies of the HG and STG responses paralleled each other, there were also clear differences between the two structures. In both HG and STG, the signal decline during the on period became increasingly pronounced with increasing repetition rate, so responses had an increasingly phasic appearance. However, at any given rate, the magnitude of the signal decline was greater in STG. In STG, the magnitude of the decline (measured as a percentage decrease from the onset peak to the value at 14 s in the average response time courses; Fig. 4) was 22, 25, 58, and 93% for 1/s, 2/s, 10/s, and 35/s respectively. In HG, the corresponding values were all less: 15, 13, 32, and 78%.
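
The decline measure quoted above (percentage decrease from the onset peak to the value at 14 s) can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, the onset peak is taken as the maximum between 4 and 10 s by analogy with the "onset" measure in METHODS, and the time course must contain a sample at exactly 14 s.

```python
def percent_decline(times, pct, t_ref=14.0, onset_window=(4.0, 10.0)):
    """Percentage decrease from the onset peak to the value at `t_ref`.

    `times` and `pct` are equally long sequences giving a percent-change
    time course; `times` must contain `t_ref`.
    """
    lo, hi = onset_window
    peak = max(p for t, p in zip(times, pct) if lo <= t <= hi)
    ref = pct[list(times).index(t_ref)]
    return 100.0 * (peak - ref) / peak
```

A value near 0% corresponds to a sustained response and a value near 100% to a response that has returned almost to baseline by 14 s.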

In light of the phasic response for high-rate trains, it is not surprising that the activation maps frequently showed little evidence of activity at the 35/s rate (e.g., Fig. 7, top left). The activation maps were based on the difference in time-average signal between "train on" and "off" periods. It is clear that for the response to the 35/s train (Fig. 4), the difference between the time-average of these two periods will be close to zero, even though cortex is responding robustly (albeit transiently). Thus for cortex, the activation maps, calculated using a standard method, provide only a partial picture of cortical rate dependencies.
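
The point can be made concrete with synthetic time courses. The sketch below computes only the difference in mean signal between "train on" and "off" periods, which is the quantity an on/off statistical comparison is sensitive to; it does not reproduce the statistical test used for the maps. The response shapes, the 2-s grid, and the 30-s on and off periods are illustrative assumptions, not data from the study.

```python
def on_off_difference(times, signal, train_duration=30.0):
    """Mean signal during the train minus mean signal during the off period."""
    on = [s for t, s in zip(times, signal) if 0 <= t < train_duration]
    off = [s for t, s in zip(times, signal) if t >= train_duration]
    return sum(on) / len(on) - sum(off) / len(off)


# Synthetic responses on a 2-s grid: a 30-s train followed by a 30-s off period.
times = list(range(0, 60, 2))
# Sustained: elevated throughout the train, delayed by a few seconds.
sustained = [1.0 if 4 <= t < 34 else 0.0 for t in times]
# Phasic: brief peaks a few seconds after train onset and after train offset.
phasic = [1.0 if t in (4, 6, 8, 34, 36, 38) else 0.0 for t in times]
```

With these shapes the sustained response yields a large on-minus-off difference, whereas the phasic response yields a difference of zero because its off-peak falls in the "off" period and cancels the on-peak.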

In HG and STG there were right-left differences in response magnitude. In HG, both time-average and onset percent changes were greater on the right for 16 of 18 possible comparisons (P < 0.001, paired t-test, collapsing the data across all rates). In STG, the same trend was apparent, but was weaker (right greater in 12 of 18 cases; P < 0.02). These right-left differences may reflect a functional difference between right and left auditory cortex. Alternatively, they may reflect a functional difference in auditory cortex in the anterior-posterior dimension. For example, since HG tends to be shifted more anteriorly on the right compared with the left hemisphere (Leonard et al. 1998; Penhune et al. 1996; Rademacher et al. 2001), the imaged slice may have sampled different regions of primary cortex in the two hemispheres (Morosan et al. 2001).

In summary, responses in HG were sustained at low rates, but became phasic at high rates. The most prominent features of this phasic response are signal peaks just after train onset and offset. The amplitude of the on-peak (onset percent change) was greatest at 10/s. Responses in STG also showed a progression from sustained to phasic with increasing rate. However, the amplitude of the on-peak did not vary significantly with rate.

Response to small numbers of noise bursts

To investigate how the initial bursts of a train contribute to cortical responses at train onset, we examined the responses to a single noise burst, and clusters of two or five noise bursts with a burst-to-burst ISI of 28.6 ms (35/s rate) or 500 ms (2/s rate). Both single and clustered noise bursts elicited measurable responses in HG and STG. The responses, averaged across subjects and hemispheres, peaked 4-6 s after the stimulus, and then returned to baseline by 8-10 s (Fig. 8, top). After 8-10 s, the average response dipped below baseline. However, this response feature, unlike the others, was dominated by the data for only one of the three subjects (subject 2).



Fig. 8. Top: average response time courses in HG and STG to either a single noise burst, or a cluster of 2 or 5 noise bursts. Each trace is an average across both hemispheres of 3 subjects. Bottom: normalized peak response for each subject and hemisphere. Dashed line indicates the prediction from a model in which each successive noise burst evokes a response identical to the 1 NB response and the responses to each burst add.

Figure 8 (bottom) shows normalized peak response versus number of noise bursts for each subject and hemisphere. These normalized responses were quantified as the peak percent signal change in the response time course (which always occurred at t = 4 or 6 s), divided by the peak percent change for a single noise burst. The normalized peak response generally increased with increasing number of noise bursts (Fig. 8, bottom). However, the response increase was always less than would be predicted by a model in which each successive noise burst evokes a response equivalent to the 1 NB response and the responses to each burst add (i.e., linear growth). Similarly, for every subject and hemisphere, the peak response to 5 NBs@35/s was <2.5 times the response to 2 NBs@35/s. These results are consistent with a model in which the responses to noise bursts at the beginning of a train are greater than those occurring later. The fact that the peak response for 2 NBs@2/s was always greater than for 2 NBs@35/s indicates that any decline in response from the first burst to the second was greater at high, compared with low, rates.
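
The linear-growth prediction referred to above (dashed line in Fig. 8) can be written out explicitly. The symbols are introduced here for illustration and do not appear in the original: r_1(t) is the response time course to a single noise burst, \Delta is the burst-to-burst ISI, r_N(t) is the predicted response to N bursts, and R_N is its peak.

```latex
\begin{aligned}
r_N(t) &= \sum_{k=0}^{N-1} r_1(t - k\Delta)
  && \text{(identical responses that add)} \\
R_N = \max_t r_N(t) &\approx N \max_t r_1(t) = N R_1
  && \text{(if } (N-1)\Delta \text{ is short relative to the time course of } r_1\text{)} \\
\frac{R_5}{R_2} &\approx \frac{5}{2} = 2.5
\end{aligned}
```

Under these assumptions the normalized peak response R_N / R_1 would equal N, and 5 NBs@35/s (which span only about 0.11 s) would produce 2.5 times the peak response to 2 NBs@35/s. The approximation is weaker for the 2/s clusters, which span up to 2 s.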

We compared the mean peak percent change for single and multiple noise bursts with the mean onset percent change for 35/s trains from exp. I to gain an appreciation for the proportion of the on-peak accounted for by the earliest noise bursts in the train.7 In STG, we estimated that the mean peak percent change for 1 NB and 5 NBs@35/s was approximately 40% and 65%, respectively, of the mean onset percent change. In HG, the corresponding estimates were approximately 25% and 40%. These values indicate that the earliest noise bursts of a high-rate train account for a substantial portion of the on-peak, especially in STG.

Response to high-rate (35/s) noise burst trains: effect of train duration

By considering noise burst trains with different durations, we tested whether the off-peak in cortical phasic responses is specifically linked to train termination. Two subjects were studied using 35/s noise burst trains with durations of 15, 30, 45, and 60 s. For both subjects and all durations, HG responses showed a distinct off-peak after train offset (Fig. 9, top). Regardless of train duration, the off-peak occurred approximately 6 s after train offset, indicating a strong coupling between off-peak and train termination. A similarly strong coupling between off-peak and train termination was also found in STG for both subjects (not shown). One subject (subject 11) was unusual in that the response in STG did not show a clear off-peak for voxels selected by our standard criteria. Nevertheless, there was a clear off-peak for other, nearby voxels, and this off-peak always occurred approximately 6 s after train offset, regardless of train duration. In contrast to cortex, IC responses were largely sustained for all durations and showed no sign of an off-peak (Fig. 9, bottom).



Fig. 9. Response time courses in HG (top) and IC (bottom) to 35/s noise burst trains, with durations of 15, 30, 45, and 60 s. Each trace is an average across hemispheres for a given subject. The off-peak in each HG response is indicated by an arrow.

Data for two additional subjects further support the strong coupling between cortical off-peak and train termination. These subjects, tested with a single train duration of 60 s, showed off-peaks in both HG and STG that occurred approximately 6 s after train offset. All of the train duration data taken together indicate that the cortical off-peak is specifically evoked by the termination of high-rate noise burst trains.


    DISCUSSION

fMRI responses to trains of noise bursts changed substantially with burst repetition rate in every studied structure, although the nature of the changes was highly structure-dependent. In the IC, response amplitude following train onset increased with increasing rate while response waveshape remained unchanged (i.e., sustained). In the MGB, increasing rate produced an increase in onset amplitude up to a point where a further increase in rate instead produced a change in waveshape (from a largely sustained response to one showing a distinct peak just after train onset). In HG, the site of primary auditory cortex, onset amplitude changed somewhat with rate, but the most striking change occurred in response waveshape. At low rates the waveshape was sustained, while at high rates it was strongly phasic in that there were prominent response peaks just after train onset and offset. In STG, which includes secondary auditory areas, onset amplitude showed no systematic dependence on rate, whereas response waveshape showed a strong and dramatic rate dependence paralleling that in HG. Overall, from midbrain to thalamus to cortex, there was a systematic shift in the form of response rate-dependencies from one of amplitude to one of waveshape.

Sustained response waveshapes (as seen in subcortical structures and for low rates in auditory cortex) are commonly reported in the fMRI literature. In contrast, phasic responses (as seen for higher rates in auditory cortex) are not, nor are their individual signature features. One signature feature, the prominent peak following stimulus onset, has been reported for a few prolonged acoustic, odorant, and visual stimuli (Bandettini et al. 1997; Giraud et al. 2000; Jäncke et al. 1999; Sobel et al. 2000) but is nevertheless a fairly uncommon feature for responses in the fMRI literature. A second signature feature of phasic responses, the peak following stimulus offset, is highly unusual. To our knowledge, the only other reported fMRI "off-response" occurred in a subregion of primary visual cortex following the transition from steady white light to darkness (Bandettini et al. 1997). The paucity of previous reports of phasic fMRI responses may be partly an issue of detection since phasic responses are poorly detected by some of the most commonly used analysis approaches (e.g., a t-test comparison of stimulus "on" and "off" periods, or equivalently, correlation or analyses using the SPM software package that assume a sustained response; Bandettini et al. 1993; Sobel et al. 2000). It is also possible that phasic responses have not been seen because they reflect neurophysiological mechanisms that are only invoked in particular, largely unexplored stimulus regimes.

It is widely assumed that different sound features (e.g., frequency, bandwidth, repetition rate) are represented in the amplitude of fMRI activation or amplitude variations with position (e.g., Giraud et al. 2000; Talavage et al. 2000; Wessinger et al. 2001; Yang et al. 2000). In contrast, the possibility of representations in the temporal dimension is not generally entertained, and this makes the wide variations in cortical response waveshape of this study especially intriguing. A few other studies have also reported covariations between sound characteristics and temporal fMRI activation patterns in the auditory system. For instance, Gaschler-Markefski et al. (1997) examined the degree of temporal stationarity of auditory cortical fMRI responses and reported regional variations depending on stimulus and task. In studying fMRI responses to amplitude modulated noise, Giraud et al. (2000) found an increasingly prominent peak at stimulus onset with increasing modulation rate, a result that strongly parallels the findings of the present study (see Comparison to previous fMRI and PET studies). The present study and these previous reports suggest that fMRI temporal patterns (or, more specifically, the temporal variations in neural activity underlying these patterns) may be an important way in which sound is represented in the auditory system.

Role of rate per se in determining fMRI responses

In this study, noise burst duration was held constant while rate was varied, so overall stimulus energy and sound-time fraction (STF) covaried with rate (resulting in an approximately 12-dB differential in sound pressure level for 2/s vs. 35/s noise burst trains). While this raises the possibility that the wide range of response waveshapes in auditory cortex was due primarily to changes in parameters other than rate, we do not believe this to be the case for two reasons. First, in a separate study, we have found that varying the intensity of 2/s or 35/s noise bursts over a 20- to 30-dB range has no effect on response waveshape (Harms et al. 2001). Second, we have found that changing rate from 2/s to 35/s while holding STF constant (and therefore varying burst duration) still produces a change in waveshape from sustained to phasic (although STF does have some influence on response waveshape; Harms 2002; Harms and Melcher 1999). In the case of response amplitude, the precise rate dependencies might be somewhat different if stimulus energy and STF were held constant instead of burst duration, because varying energy alone can produce changes in response amplitude (Hall et al. 2001; Sigalovsky et al. 2001), as may also be the case for changes in STF.

fMRI responses and underlying neural activity

To understand the significance of the different fMRI response waveshapes, it is necessary to first consider the extent to which waveshape is governed by neural, metabolic, and hemodynamic factors. While the relationship between neural activity and fMRI responses is not fully understood, it is generally accepted that neural activity and image signal are ultimately linked through a chain of metabolic and hemodynamic events. For the form of fMRI in this study (BOLD fMRI), this linkage is as follows. When there is an increase in neural activity in the form of synaptic events or neural discharges,8 there is a corresponding increase in local brain metabolism and oxygen consumption (Sokoloff 1989). The increase in oxygen consumption is accompanied by an increase in blood flow and blood volume in the active brain region. However, the increase in flow dominates, such that the local concentration of deoxygenated hemoglobin actually decreases, which is important because deoxygenated hemoglobin is paramagnetic (Pauling and Coryell 1936) and thus influences local image signal levels. The net effect of a decrease in deoxygenated hemoglobin is an increase in image signal. When the entire chain of events is considered together, increases and decreases in neural activity result in concordant changes in image signal strength (Kwong et al. 1992; Ogawa et al. 1993; Springer et al. 1999). Since hemodynamic changes occur over the course of seconds, fMRI effectively provides a temporally low-pass filtered view of neural activity. More specifically, since fMRI is sampling activity over small volumes of brain (i.e., voxels), the responses can be thought of as showing the time-envelope of population neural activity on a voxel-by-voxel basis.

Previous work has shown that the relative timing and magnitude of stimulus-evoked changes in blood flow, blood volume, and oxygen consumption can influence the waveshape of the fMRI response (Buxton et al. 1998; Mandeville et al. 1998). While this raises the possibility that changes in waveshape from sustained to phasic reflect changes in hemodynamics rather than underlying neural activity, we believe this to be unlikely for both of the main components of the phasic response, namely the off-peak and the on-peak. It is particularly unlikely that a hemodynamic explanation accounts for the off-peak. Previous hemodynamic modeling and experimentation have not predicted an off-peak following stimulus termination, and we know of no plausible model that could generate such a component. Therefore the emergence of an off-peak with increasing repetition rate in auditory cortex is almost certainly attributable to a rate-dependent increase in neural activity at stimulus offset.

The other major feature that distinguishes phasic from sustained responses, namely the sharp decline in signal that forms the prominent onset peak, requires more detailed consideration because it is known that declines in signal can theoretically occur over the course of a prolonged stimulus for completely hemodynamic reasons (Buxton et al. 1998). However, measurements of BOLD signal, blood flow, and blood volume responses have failed to illustrate a case for which purely hemodynamic features could generate a signal decline as dramatic as those seen here (e.g., Hoge et al. 1999a; Mandeville et al. 1999). Additionally, separate evidence works against the idea that the signal decline is driven primarily by hemodynamic rather than neural influences. The reasoning follows from the fact that the same voxels were capable of showing either a phasic response (and the associated dramatic signal decline) or a sustained response depending on the stimulus. Since the time course of the phasic and sustained responses is very similar over the first 6-8 s, the "operational history" of the hemodynamic system is presumably similar as well. In light of this common initial response, a hemodynamic system that could subsequently generate grossly different response waveshapes seems unlikely, unless the differences reflect differences in underlying neural activity.

While response waveshape varied with rate within a structure, it also varied across structures for a given rate. It is known that there can be spatial heterogeneity in tissue hemodynamics (Chen et al. 1998; Davis et al. 1998), so the possibility that regional variations in hemodynamics play some role in the waveshape differences across structures cannot be discounted. Still, the heterogeneities in hemodynamics that have been documented are not sufficient to account for the dramatic waveshape changes that occur across the pathway as a whole (from the inferior colliculus to cortex).

fMRI response onset and neural adaptation

Given that fMRI responses reflect the time-envelope of population neural activity, the prominent declines in fMRI signal that occur at high rates in MGB, HG, and STG provide clear evidence for an overall decline in neural activity during the first seconds of a train (<10 s). This decline likely includes decreases in synaptic as well as discharge activity, since both forms of activity can be reflected in fMRI signals.

While an overall decrease in neural activity early in high-rate trains is clear, the subsecond temporal details of activity during this decrease remain unresolved because fMRI provides a low-pass filtered view of neural activity. Previous electrophysiological data suggest various possible forms for the temporal details underlying the overall decline in neural activity. For instance, it may be that the fMRI signal decline reflects a burst-to-burst adaptation in neural activity in which each successive burst early in a train elicits progressively less activity (Fruhstorfer et al. 1970; Ritter et al. 1968; Roth and Kopell 1969). Alternatively, a variant of this may occur in which activity does not always decrease in a strictly progressive fashion across consecutive bursts but sometimes shows an increase from one burst to the next (e.g., facilitation or enhancement; Brosch et al. 1999; Budd and Michie 1994; Loveless et al. 1989). (As long as decreases from burst to burst occur more often than increases, the time-envelope of neural activity would still decrease.) Another possibility is that population neural activity is not synchronized to individual bursts (Lu and Wang 2000; Lu et al. 2001) but instead occurs in response to the train as a whole with an initial peak in activity followed by a lower level of activity. All of these possibilities would result in a decline in the time-envelope of population neural activity and are therefore consistent with the prominent declines seen in fMRI signal.

While we cannot conclusively determine the temporal details of activity during the declines in fMRI signal, it is worth recognizing that aspects of our data are consistent with the idea that there is a burst-to-burst adaptation in neural activity. For instance, the fMRI responses to small numbers of noise bursts are suggestive of an adaptation process because the fMRI response amplitude did not increase in proportion to the number of bursts, but rather showed a slower than linear growth. Whether neural activity and fMRI signal are coupled in an approximately linear manner (and under what circumstances) is an open question under active investigation (Boynton et al. 1996; Dale and Buckner 1997; Gratton et al. 2001; Hoge et al. 1999b; Logothetis et al. 2001; Vazquez and Noll 1998). However, if they are, the slower than linear growth in the fMRI responses to small numbers of noise bursts suggests that the neural activity produced by each successive burst in a train is progressively less (i.e., there is burst-to-burst adaptation in neural activity). If the decline in neural activity from burst to burst were to continue until there is little or no burst-evoked activity, the time-envelope of neural activity would decline substantially following the onset of the train, and so correspondingly would the fMRI signal (as observed for high-rate trains). Another aspect of the data consistent with neural adaptation is the growth in onset amplitude with rate. If neural activity and fMRI response amplitude vary in direct proportion, onset amplitude may be viewed roughly as an indicator of the time-average neural activity during the first seconds of a train. If there were no adaptation and each successive burst in a train produced an identical increase in neural activity, the time-average neural activity during the first seconds of a train would increase in direct proportion to rate, and onset amplitude would be expected to do the same. 
Instead, a proportional increase in onset amplitude did not occur in any structure. This is most obvious at high rates in MGB, HG, and STG, where onset amplitude either declined or did not change with increasing rate (see footnote 9). However, it can also be seen at lower rates and in the IC. For instance, an increase in rate from 2/s to 10/s increased onset amplitude by less than twofold in every structure, well short of the fivefold increase expected if growth were proportional to rate. This result is consistent with neural adaptation occurring in all of the studied structures, even the IC where fMRI response waveshapes are sustained and do not immediately suggest an underlying adaptation.
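The proportionality argument above can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that each burst elicits a response scaled by a fixed factor `decay` relative to the preceding burst (a geometric adaptation rule that is not derived from the present data) and that onset amplitude tracks the time-average activity over an arbitrary 2-s window. The function name and all parameter values are hypothetical.

```python
def mean_activity(rate_per_s, window_s=2.0, decay=1.0, first_burst=1.0):
    """Time-average activity over the first `window_s` seconds of a train.

    ASSUMPTION (illustrative only): burst k elicits first_burst * decay**k,
    so decay == 1.0 means no adaptation and decay < 1.0 means each burst
    elicits less activity than the one before it.
    """
    n_bursts = int(round(rate_per_s * window_s))
    total = sum(first_burst * decay ** k for k in range(n_bursts))
    return total / window_s


# Growth in time-average activity when rate goes from 2/s to 10/s.
ratio_no_adaptation = mean_activity(10) / mean_activity(2)
ratio_with_adaptation = mean_activity(10, decay=0.8) / mean_activity(2, decay=0.8)

print(f"no adaptation:   {ratio_no_adaptation:.2f}-fold")
print(f"with adaptation: {ratio_with_adaptation:.2f}-fold")
```

With `decay=1.0` the growth equals the fivefold ratio of the two rates. Any `decay` below 1.0 gives a smaller growth factor, which is the qualitative pattern seen in the onset amplitudes. The value 0.8 is arbitrary and is not a fit to any structure.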

Looking across structures, the data indicate that any neural adaptation increased with increasing position in the pathway. For instance, at any given rate, the percentage decline in signal following the on-peak increased progressively from IC to MGB to auditory cortex (HG and STG), suggesting an increasing degree of adaptation in the underlying population neural activity. An increase in adaptation across structures is also suggested by the fact that the growth in onset amplitude with rate falls increasingly short of predictions assuming no adaptation. For instance, the increase in onset amplitude from 2/s to 10/s falls increasingly short of the fivefold increase predicted in the absence of adaptation as one moves from IC (1.69) to MGB (1.42) to auditory cortex (HG, 1.26; STG, 0.92). Similarly, the increase in onset amplitude from 10/s to 35/s falls increasingly short of the 3.5-fold prediction [IC, 1.42; MGB, 1.36; auditory cortex, 0.80 (HG), 0.99 (STG)]. Thus if there is burst-to-burst adaptation in population neural activity early in a train, it increases from IC to MGB to auditory cortex.
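The comparison with the no-adaptation prediction can be made explicit with a short calculation. The growth factors below are the values quoted in the preceding paragraph. The prediction is simply the ratio of the two rates, and the variable and function names are illustrative choices, not part of the original analysis.

```python
# Observed growth in onset amplitude between two rates (values from the text).
observed = {
    "2->10/s": {"IC": 1.69, "MGB": 1.42, "HG": 1.26, "STG": 0.92},
    "10->35/s": {"IC": 1.42, "MGB": 1.36, "HG": 0.80, "STG": 0.99},
}

# Growth predicted if onset amplitude were proportional to rate (no adaptation).
predicted = {"2->10/s": 10 / 2, "10->35/s": 35 / 10}


def fraction_of_prediction(step):
    """Observed growth as a fraction of the no-adaptation prediction."""
    return {
        structure: round(growth / predicted[step], 2)
        for structure, growth in observed[step].items()
    }


for step in observed:
    print(step, fraction_of_prediction(step))
```

For the 2/s to 10/s step, the fraction of the predicted growth falls steadily from IC to MGB to HG to STG. For the 10/s to 35/s step, the fraction falls from IC to MGB to cortex, although within cortex the STG value is slightly higher than the HG value.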

RELATIONSHIP TO ELECTROPHYSIOLOGICAL DATA IN ANIMALS. A trend of increasing adaptation with increasing position in the auditory pathway has emerged from several animal neurophysiological studies explicitly designed to compare responses across structures. For instance, microelectrode recordings of the response to paired stimuli with different interstimulus intervals have shown an increase in recovery time with increasing level in the auditory pathway. In unanesthetized animals (cats and rabbits), the average interval required for 50% recovery of the response to the second of two clicks is 2 ms in the auditory nerve, cochlear nucleus, and superior olivary complex, but 7 ms in the inferior colliculus and 20 ms in auditory cortex (Fitzpatrick and Kuwada 1999). In unanesthetized guinea pig, Creutzfeldt et al. (1980) recorded responses to amplitude modulated tones simultaneously from thalamic and cortical neurons (specifically 9 thalamo-cortical unit pairs for which the correlation of spontaneous activity suggested a direct synaptic connection). Activity in the cortical neurons declined more rapidly over successive cycles of the AM tone than did the activity in the thalamic neurons, indicating greater adaptation in the cortical neurons. Finally, recording near-field potentials from the IC and auditory cortex in response to brief noise bursts in unanesthetized chinchilla, Burkard et al. (1999) found that the mean response amplitude (averaged across noise burst presentations) decreased more in cortex than IC as repetition rate was increased. Their results are again consistent with greater adaptation in cortex than lower structures in the auditory pathway.

The extensive animal literature regarding modulation transfer functions (MTFs) also suggests a change in temporal response properties from the IC to auditory cortex. Here we focus on studies that quantify their results in terms of rate MTFs (rMTF; average firing rate vs. modulation frequency), rather than temporal MTFs, since changes in the "synchronization" or phase locking of neural activity (in the absence of average rate changes) are unlikely to be reflected in fMRI activity. Furthermore, since most animal studies use short-duration stimulus trains (approximately 1 s), the most appropriate measure of the present study for comparison to the animal results is onset amplitude. In the IC, the best modulation frequency (BMF; the frequency at which the rMTF has its largest value) for individual neurons is generally greater than approximately 30 Hz (Krishna and Semple 2000; Langner and Schreiner 1988; Müller-Preuss et al. 1994). In contrast, BMFs in auditory cortex tend to be less than approximately 20 Hz (Bieser and Müller-Preuss 1996; Eggermont 1991; Schreiner and Raggio 1996; Schreiner and Urbas 1988). These values are consistent with this study in that onset amplitude steadily increased in the IC for noise burst rates ≤35/s (the highest rate employed), but peaked in HG at a lower rate (10/s). While the variation in onset amplitude with rate in HG was rather small (and in STG, onset amplitude did not vary at all), a similarly weak "tuning" also holds in population neural activity, in that the rMTF averaged across cortical neurons is primarily low-pass, or only weakly band-pass (Eggermont 1994, 1998; Eggermont and Smith 1995; Schreiner and Urbas 1986). The relatively flat nature of the average rMTF in cortex probably reflects a weak tuning in the rMTFs of many individual neurons (Eggermont 1998; Schreiner and Raggio 1996) but could also be due in part to the summation of activity across sharply tuned units having a wide range of BMFs.
Overall, in both IC and HG, changes in onset amplitude as a function of repetition rate were consistent with what might be predicted based on microelectrode recordings in animals of neural spiking in response to amplitude-modulated trains.
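By analogy with the BMF of an rMTF, the rate at which onset amplitude is largest can be read off from the growth factors reported earlier in this discussion (IC: 1.69 from 2/s to 10/s and 1.42 from 10/s to 35/s; HG: 1.26 and 0.80). The sketch below rebuilds onset amplitude relative to its value at 2/s and picks the rate with the largest value. It considers only these three rates, and the function names are hypothetical.

```python
# Growth factors quoted in the text: (2/s -> 10/s, 10/s -> 35/s).
growth = {"IC": (1.69, 1.42), "HG": (1.26, 0.80)}


def relative_onset_amplitude(g_2_to_10, g_10_to_35):
    """Onset amplitude at each rate, normalized to the value at 2/s."""
    return {2: 1.0, 10: g_2_to_10, 35: g_2_to_10 * g_10_to_35}


def peak_rate_of(amplitude_by_rate):
    """Rate (bursts/s) with the largest onset amplitude, analogous to a BMF."""
    return max(amplitude_by_rate, key=amplitude_by_rate.get)


peak_rate = {
    structure: peak_rate_of(relative_onset_amplitude(*factors))
    for structure, factors in growth.items()
}
print(peak_rate)
```

Under these assumptions the peak falls at 35/s for the IC and at 10/s for HG, matching the description in the text. The HG curve is nearly flat (about 1.0, 1.26, and 1.01 relative to 2/s), consistent with the weak tuning noted above.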

RELATIONSHIP TO ELECTRIC RECORDINGS IN HUMANS. A trend of generally greater adaptation at cortical versus brainstem levels of the auditory pathway fits with data concerning two of the most studied components of human auditory evoked potentials: wave V of the brain stem-evoked potential and the long latency potential N1. Wave V is likely generated by neurons projecting to the IC (Melcher and Kiang 1996; Møller 1998), while the primary generators of N1 have been localized to auditory cortex (e.g., Näätänen and Picton 1987