The Journal of Neurophysiology Vol. 88 No. 3 September 2002, pp. 1433-1450
Copyright ©2002 by the American Physiological Society
¹Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston 02114; ²Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Speech and Hearing Bioscience and Technology Program, Cambridge 02139; and ³Department of Otology and Laryngology, Harvard Medical School, Boston, Massachusetts 02115
ABSTRACT
Harms, Michael P. and Jennifer R. Melcher. Sound Repetition Rate in the Human Auditory Pathway: Representations in the Waveshape and Amplitude of fMRI Activation. J. Neurophysiol. 88: 1433-1450, 2002.
Sound
repetition rate plays an important role in stream segregation, temporal
pattern recognition, and the perception of successive sounds as either
distinct or fused. This study was aimed at elucidating the neural
coding of repetition rate and its perceptual correlates. We
investigated the representations of rate in the auditory pathway of
human listeners using functional magnetic resonance imaging (fMRI), an
indicator of population neural activity. Stimuli were trains of noise
bursts presented at rates ranging from low (1-2/s; each burst is
perceptually distinct) to high (35/s; individual bursts are not
distinguishable). There was a systematic change in the form of fMRI
response rate-dependencies from midbrain to thalamus to cortex. In the
inferior colliculus, response amplitude increased with increasing rate
while response waveshape remained unchanged and sustained. In the
medial geniculate body, increasing rate produced an increase in
amplitude and a moderate change in waveshape at higher rates (from
sustained to one showing a moderate peak just after train onset). In
auditory cortex (Heschl's gyrus and the superior temporal gyrus),
amplitude changed somewhat with rate, but a far more striking change
occurred in response waveshape: low rates elicited a sustained
response, whereas high rates elicited an unusual phasic response that
included prominent peaks just after train onset and offset. The shift
in cortical response waveshape from sustained to phasic with increasing
rate corresponds to a perceptual shift from individually resolved
bursts to fused bursts forming a continuous (but modulated) percept.
Thus at high rates, a train forms a single perceptual "event," the
onset and offset of which are delimited by the on and off peaks of
phasic cortical responses. While auditory cortex showed a clear,
qualitative correlation between perception and response waveshape, the
medial geniculate body showed less correlation (since there was less
change in waveshape with rate), and the inferior colliculus showed no
correlation at all. Overall, our results suggest a population neural
representation of the beginning and the end of distinct perceptual
events that is weak or absent in the inferior colliculus, begins to
emerge in the medial geniculate body, and is robust in auditory cortex.
INTRODUCTION
It is well known from human psychophysical experiments that the perception of a succession of sounds depends strongly on the rate of sound presentation. For instance, when bursts of noise are presented repeatedly at a low rate (e.g., <10/s), each burst can be separately resolved (Miller and Taylor 1948; Symmes et al. 1955). In contrast, bursts presented at a higher rate fuse to form a single, modulated percept. In experiments where multiple series of sounds are presented simultaneously (e.g., a series of high and a series of low frequency tone bursts), the rate of sound presentation influences whether the series are perceived as single or separate streams, as well as the perceived temporal pattern within each stream (Bregman 1990; Royer and Robin 1986). The dependencies on rate observed in controlled psychophysical experiments such as these suggest that rate plays an important role in the perception of the more complex acoustic conditions encountered in everyday life.
Since repetition rate plays so basic a role in determining how sounds are heard, it is not surprising that there have been numerous neurophysiological studies of rate in animals. Broad trends concerning the coding of rate in the auditory pathway have emerged from this work. For instance, the highest repetition rates at which neurons respond faithfully to each successive sound in a train (or each successive cycle of amplitude-modulated stimuli) tend to decrease from brain stem to thalamus to cortex (e.g., Creutzfeldt et al. 1980; Langner 1992; Schreiner and Langner 1988). In cortex, the neural coding of low and high rates may be accomplished by different populations of neurons, one coding low-rate stimuli through stimulus-synchronized activity and the other coding high rates in the overall amount of discharge activity (Lu and Wang 2000; Lu et al. 2001). While the animal work has shed light on the neural representations of repetition rate, the degree to which the animal findings extend to humans remains uncertain because of interspecies differences, anesthesia differences, and a paucity of data in humans that can serve as a link to the animal work. In the end, direct neurophysiological data in human listeners are important if we are to understand how repetition rate is represented in the activity patterns of the human brain.
Most previous neurophysiological studies of repetition rate in humans have used noninvasive techniques for probing brain function, such as evoked potential and evoked magnetic field measurements. The evoked response work has examined averaged responses at short, middle, and long latencies to various types of brief stimuli (e.g., clicks, tone and noise bursts) presented at different rates (Näätänen and Picton 1987; Picton et al. 1974; Thornton and Coleman 1975). A particular strength of evoked potential and magnetic field measurements is that they can be used to examine responses to individual stimuli within a train up to much higher rates than with other noninvasive brain imaging techniques (see following paragraph). A limitation, however, is that the sites of response generation cannot always be reliably localized. Evoked magnetic field examinations of repetition rate are further limited in that they provide information mainly concerning cortical areas because of inherent limitations in probing subcortical function (Erné and Hoke 1990).
Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), two techniques for spatially mapping brain activity, have also been used to examine the dependence of human brain activation on repetition rate. Compared with evoked potential and magnetic field measurements, fMRI lacks the temporal resolution needed to separately resolve the responses produced by individual stimuli in a train (except at extremely low rates, e.g., approximately 0.1/s), and the temporal resolution of PET is poorer still. An important advantage, however, is that both PET and fMRI enable activation to be directly localized to brain stem, thalamic, and cortical structures of the auditory pathway (Griffiths et al. 2001; Guimaraes et al. 1998; Lockwood et al. 1999; Melcher et al. 1999). The localization provided by fMRI is particularly precise because of the technique's high spatial resolution and direct mapping to anatomy. Despite the fact that fMRI and PET can show activation at different stages of the auditory pathway, previous PET and fMRI studies varying rate have focused largely on cortical areas (Binder et al. 1994; Dhankhar et al. 1997; Frith and Friston 1996; Giraud et al. 2000; Price et al. 1992; Rees et al. 1997; Tanaka et al. 2000). Additionally, most of these studies focused on the low "rates" characteristic of speech (e.g., <2.5 words or syllables/s) because they were directed at understanding speech processing. Overall, there are limited PET or fMRI data concerning the representations of rate within the human auditory pathway. Specifically, there is little information concerning the transformation of rate representations from structure to structure within the pathway for a wide range of psychophysically relevant rates.
The present fMRI study compared the representation of repetition rate across cortical and subcortical structures of the human auditory pathway using a wide range of rates. Stimuli were trains of repeated noise bursts with repetition rates ranging from low (where each burst could be resolved individually) to high (where individual bursts were not distinguishable and the train was perceived as a continuous, but modulated, sound). Noise bursts were chosen as the elemental stimulus based on the assumption that broadband sound would elicit robust responses by activating neurons across a wide range of characteristic frequencies. fMRI was selected for its high spatial resolution, its localizing capabilities, and its higher temporal resolution (approximately 2 s) compared with PET (>10 s). The latter feature proved important because one of the most striking differences in rate representation across structures occurred in the temporal dynamics of the fMRI response.
Portions of this work were presented at the annual meeting of the Society for Neuroscience (1997), the 21st annual meeting of the Association for Research in Otolaryngology (1998), the 4th and 5th International Conferences on Functional Mapping of the Human Brain (1998 and 1999), and in M. P. Harms' doctoral thesis (Massachusetts Institute of Technology, 2002).
METHODS
Four series of experiments were conducted. The first two examined the effect of repetition rate on the response to a noise burst train in the inferior colliculus (IC), Heschl's gyrus (HG), and the superior temporal gyrus (STG; exp. I), or the IC and medial geniculate body (MGB; exp. II). The remaining experiments (exps. III and IV) were aimed at understanding one of the findings from exp. I, namely an unusual form of temporal response in the cortex to trains with a high repetition rate.
A total of 12 subjects participated in these experiments. They ranged in age from 19 to 35 yr (mean = 25 yr). Ten of the subjects were male. Nine were right-handed. Subjects had no known audiological or neurological disorders.
This study was approved by the institutional committees on the use of human subjects at the Massachusetts Institute of Technology, Massachusetts Eye and Ear Infirmary, and Massachusetts General Hospital. All subjects gave their written informed consent.
Exps. I and II: noise burst trains with different burst repetition rates
Nine subjects participated in a total of 11 imaging sessions for exps. I and II (exp. I: 5 sessions, subjects 1-5; exp. II: 6 sessions, subjects 2, 5, and 6-9).
The stimuli were bursts of uniformly distributed white noise. Individual noise bursts in all four experiments were 25 ms in duration (full-width half-maximum), with a rise/fall time of 2.5 ms. The bursts were presented at repetition rates of 1, 2, 10, and 35/s (exp. I) or 2, 10, 20, and 35/s (exp. II). The 1/s rate was used in only three of the five sessions of exp. I. The spectrum of the noise stimulus at the subjects' ears was low-pass (6-kHz cutoff), reflecting the frequency response of the acoustic system.
Noise bursts were presented in 30-s trains alternated with 30-s "off" periods, during which no auditory stimulus was presented (Fig. 1, top). Four alternations between "train on" and "off" periods constituted a single scanning "run" (total duration 240 s). For all but two sessions (in exp. I), each of the four rates was presented once during each run, and their order was varied across runs. Within a train, the repeated noise bursts were identical (i.e., "frozen"), but the noise bursts differed across trains and runs. For the other two sessions, the same rate was presented throughout a run, and this rate was varied across runs. For these two sessions, the noise burst was frozen within a run, but differed across runs. In each session, the total number of train presentations at each rate was between 8 and 13.
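The timing of one exp. I/II train can be sketched as follows. The helper names `burst_onsets` and `run_duration_s` are hypothetical, and the sketch assumes that the first burst starts exactly at train onset and that a burst counts only if it starts before the train ends; the text does not specify these details.

```python
def burst_onsets(rate_per_s, train_s=30.0):
    """Hypothetical helper: onset times (s) of identical bursts within one train.

    Assumes the first burst starts at train onset (t = 0) and that a burst is
    counted only if it starts before the train ends.
    """
    n_bursts = int(train_s * rate_per_s)  # e.g., 30 s at 35/s -> 1050 bursts
    # Onset-to-onset interval is 1/rate; compute i/rate directly to avoid drift.
    return [i / rate_per_s for i in range(n_bursts)]


def run_duration_s(n_alternations=4, train_s=30.0, off_s=30.0):
    """Length of one scanning run made of train-on/off alternations."""
    return n_alternations * (train_s + off_s)


if __name__ == "__main__":
    for rate in (1, 2, 10, 20, 35):
        onsets = burst_onsets(rate)
        print(f"{rate:>2}/s: {len(onsets)} bursts, ISI = {1000 / rate:.1f} ms")
    print("run length:", run_duration_s(), "s")
```

At 35/s the onset-to-onset interval is 1000/35 ≈ 28.6 ms, the same ISI used for the high-rate clusters of exp. III, and `run_duration_s()` reproduces the 240-s run length stated above.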
Exp. III: small numbers of noise bursts
To investigate how the initial bursts of a train contribute to cortical responses to the onset of a train, we examined the responses to a single noise burst and short clusters of noise bursts. Responses were collected in three imaging sessions with three subjects (exp. III; subjects 2, 5, and 10).
Either one noise burst or a cluster of noise bursts (2 or 5) was presented once every 18 s, constituting a single "trial" (Fig. 1, bottom). For the clusters of five noise bursts, the interstimulus interval (ISI, onset-to-onset) between noise bursts was 28.6 ms, equivalent to the ISI for a rate of 35/s. For clusters of two noise bursts, two different ISIs were used: 500 ms (2/s rate) and 28.6 ms (35/s rate). For two sessions, the same stimulus was used in all of the trials for a given run (12 runs total; 270 s per run; 45 total repetitions per trial type). In the third session (subject 10), the stimulus was randomized across trials (7 runs; 288 s per run; 28 repetitions per trial type).
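The repetition counts above follow from simple bookkeeping. The sketch below reproduces them under two assumptions of ours that the text does not state explicitly: each run is filled with back-to-back 18-s trials, and the four trial types (one burst; two bursts at 2/s; two bursts at 35/s; five bursts at 35/s) are presented equally often over a session. The function names are hypothetical.

```python
TRIAL_TYPES = ("1 burst", "2 bursts @ 2/s", "2 bursts @ 35/s", "5 bursts @ 35/s")


def repetitions_per_type(n_runs, run_s, trial_s=18, n_types=len(TRIAL_TYPES)):
    """Trials of each type, assuming runs contain back-to-back trials and
    every trial type is presented equally often across the session."""
    trials_per_run = run_s // trial_s
    total_trials = n_runs * trials_per_run
    if total_trials % n_types:
        raise ValueError("trial types cannot be balanced in this design")
    return total_trials // n_types


def isi_ms(rate_per_s):
    """Onset-to-onset interval (ms) equivalent to a given repetition rate."""
    return 1000.0 / rate_per_s


if __name__ == "__main__":
    print(repetitions_per_type(12, 270), repetitions_per_type(7, 288))
    print(isi_ms(2), round(isi_ms(35), 1))
```

With 12 runs of 270 s (15 trials per run) this gives 45 repetitions per trial type, and with 7 runs of 288 s (16 trials per run) it gives 28, matching the numbers reported for the two session designs.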
Exp. IV: noise burst trains with different durations
The effect of train duration was examined in two imaging sessions with two subjects (exp. IV; subjects 11 and 12). Trains of four different durations (15, 30, 45, and 60 s) were presented with an "off" period of 40 s following each train. Noise burst repetition rate within each train was always 35/s. Each train duration was presented once per run (8-9 runs; 310 s per run) with the order of durations randomized across runs. Supplementary information concerning the effects of train duration was obtained in two additional experiments that used a single, long train duration (60 s) and 35/s noise bursts.
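A randomized exp. IV run can be laid out as in the following sketch. `make_run` is a hypothetical helper, and starting the run with a train (rather than with a silent period) is our assumption; regardless of order, four trains totaling 150 s plus four 40-s off periods give the stated 310-s run.

```python
import random

DURATIONS_S = (15, 30, 45, 60)  # train durations in exp. IV
OFF_S = 40                      # silent period after every train


def make_run(rng):
    """Hypothetical run builder: one train of each duration in random order.

    Returns (on, off) times in seconds for each train and the total run length.
    Assumes the run starts with a train and ends with that train's off period.
    """
    order = list(DURATIONS_S)
    rng.shuffle(order)
    trains, t = [], 0
    for dur in order:
        trains.append((t, t + dur))  # train plays from t to t + dur
        t += dur + OFF_S             # then 40 s of silence
    return trains, t


if __name__ == "__main__":
    trains, total = make_run(random.Random(1))
    print(trains, total)  # total is always 310 s
```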
Acoustic stimulation
Separately for each ear, the subject's threshold of hearing for 10/s noise bursts was determined in the scanner room. For all experiments, the stimuli were presented binaurally at 55 dB above this threshold. During both threshold determination and functional imaging, there was an ongoing low-frequency background noise produced primarily by the pump for the liquid helium (used to supercool the magnet coils) (Ravicz et al. 2000). This sound reaches levels of ~80 dB SPL in the frequency range of 50-300 Hz. Additionally, during functional imaging, each image acquisition generated a "beep" of approximately 115 dB SPL at 1.0 kHz (1.5 T scanner) or ~130 dB SPL at 1.4 kHz (3 T scanner, see Imaging). The stimuli for all experiments were clearly audible during functional imaging. For the low-rate trains, an individual burst was occasionally masked by a coinciding imaging "beep." However, because the imaging was cardiac gated (see Imaging), image acquisitions were not time-locked to the bursts, so this coincidence of a noise burst with image acquisition occurred only infrequently throughout the low-rate trains.
Noise bursts were delivered through a headphone assembly that provided approximately 30 dB of attenuation at the primary frequency of the scanner-generated sounds (1.0 or 1.4 kHz; Ravicz and Melcher 2001). The noise bursts were produced by a digital-to-analog board (running under LabView), amplified, and fed to a pair of audio transducers housed in a shielded box adjacent to the scanner. The output of the transducers reached the subject's ears via air-filled tubes that were incorporated into sound-attenuating earmuffs.
Task
Subjects were instructed to listen to the noise burst stimuli. At the end of each scanning run, subjects reported their alertness on a qualitative scale ranging from 1 (fell asleep during run) to 5 (highly alert). Alertness ratings were almost always in the 3-5 range, and were never 1.¹
For exp. II, subjects performed the following task to further ensure that they remained attentive: they indicated whenever they detected an occasional increment or decrement in intensity (of 6 dB, lasting 1 s) by raising or lowering their index finger. Each subject identified more than 90% of the intensity changes.
Imaging
Subjects were imaged using a 1.5 or 3 Tesla whole-body scanner
(General Electric) and a head coil (transmit/receive). The scanners
were retrofitted for high-speed imaging (i.e., single-shot echo-planar
imaging; Advanced NMR Systems, Inc.). Exps. I and II were conducted at 1.5 T. Exps. III and
IV were conducted at 3 T (except for one of the
supplementary sessions of exp. IV). Subjects rested supine
in the scanner. To reduce head motion, a bite bar was custom-molded to
the subject's teeth and mounted to the head coil, or pillow and foam
were packed snugly around the head. Each imaging session lasted
approximately 2 h and included the following procedures:
1) Contiguous sagittal images of the whole head were acquired.
2) An automated, echo-planar-based shimming procedure was performed to increase magnetic field homogeneity within the brain regions to be functionally imaged (Reese et al. 1995).
3) The brain slice to be functionally imaged was selected using the sagittal images as a reference. For exps. I, III, and IV, the selected (near-coronal) slice intersected the IC and the posterior aspect of HG and STG in both hemispheres (Fig. 2, left and middle). When multiple transverse temporal gyri were present, the anterior one was intersected. Based on these criteria, we expect that a portion of primary auditory cortex was intersected in both hemispheres of all subjects for exps. I, III, and IV (Rademacher et al. 1993, 2001). For exp. II, the slice intersected the IC and MGB (located just ventral and lateral to the cerebral aqueduct; Fig. 2, right). A single slice, rather than multiple slices, was imaged in all experiments to reduce the impact of scanner-generated acoustic noise on auditory activation.²
4) A T1-weighted, high-resolution anatomical image was acquired of the selected brain slice for subsequent overlay of the functional data [thickness = 7 mm; in-plane resolution = 1.6 × 1.6 mm; TR = 10 s, TI = 1200 ms, TE = 40 ms (exps. I and II) or 57 ms (exps. III and IV)]. A second high-resolution anatomical image was acquired at the end of the session after functional imaging. A comparison of the initial and final T1 images allowed for a gross check of subject movement over the session.
5) Functional images of the selected slice were acquired using a blood oxygenation level-dependent (BOLD) sequence (sessions at 1.5 T: asymmetric spin echo, TE = 70 ms, offset = 25 ms, flip = 90°; sessions at 3 T: gradient echo, TE = 40 or 30 ms, flip = 90 or 60°). Slice thickness was 7 mm. In-plane resolution was 3.1 × 3.1 mm. The beginning of each scanning "run" included four discarded images to ensure that image signal level had approached a steady state. During the remainder of the run, functional images of the selected slice were acquired repeatedly (Fig. 1).
Functional imaging was performed using a cardiac gating method that increases the detectability of activation in the inferior colliculus (Guimaraes et al. 1998) [except for one session of exp. III that used a fixed interimage interval (TR) of 2 s]. Image acquisitions were synchronized to every other QRS complex in the subject's electrocardiogram, and the interimage interval (TR) was recorded. The average TR across all sessions was 2,035 ms (the average within a session varied from 1,521 to 2,650 ms). Fluctuations in heart rate lead to variations in TR that result in image-to-image variations in image signal strength (i.e., T1 effects). Using the measured TR values, image signal was corrected to account for these variations (Guimaraes et al. 1998).
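The idea behind the TR correction can be illustrated with a simplified model. This sketch is not the published procedure of Guimaraes et al. (1998); it assumes saturation recovery after a 90° excitation, S(TR) = S0(1 − exp(−TR/T1)), with a known, fixed T1 per voxel, and rescales each sample to the signal expected at a chosen reference TR. The helper `gated_tr_ms` simply encodes that gating to every other QRS complex spans two cardiac cycles (ignoring any trigger delay), so an average TR of 2,035 ms corresponds to an average R-R interval of about 1,018 ms, or roughly 59 beats/min. Both function names are hypothetical.

```python
import math


def gated_tr_ms(rr_interval_ms):
    """TR when every other QRS complex triggers an image: two cardiac cycles."""
    return 2.0 * rr_interval_ms


def tr_correct(signal, tr_ms, t1_ms, tr_ref_ms):
    """Rescale each sample to the value expected at a reference TR.

    Assumes saturation recovery after a 90-degree excitation,
    S(TR) = S0 * (1 - exp(-TR / T1)), with a known, fixed T1 for the voxel.
    """
    ref = 1.0 - math.exp(-tr_ref_ms / t1_ms)
    return [s * ref / (1.0 - math.exp(-tr / t1_ms)) for s, tr in zip(signal, tr_ms)]


if __name__ == "__main__":
    t1 = 1200.0  # assumed T1 (ms), for illustration only
    trs = [1600.0, 2000.0, 2400.0]
    raw = [1000.0 * (1 - math.exp(-tr / t1)) for tr in trs]
    print(tr_correct(raw, trs, t1, tr_ref_ms=2000.0))  # all samples now equal
```

Under this model, samples acquired after a long TR (more recovery) are scaled down and samples after a short TR are scaled up, which removes the heart-rate-driven fluctuation while leaving stimulus-related changes in S0 intact.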
Analysis
IMAGE PREPROCESSING.
The images for each scanning run were corrected for any movements of the head that may have occurred over the course of the imaging session. Each functional image of a session was translated and rotated to fit the first image of the first functional run using standard software (SPM95; without spin history correction; Friston et al. 1995, 1996). Because only one functional slice was acquired, these corrections for motion were necessarily limited to adjustments within the imaging plane. In most cases, the motion correction algorithm was well-behaved and resulted in an improvement in image alignment. However, for one session, the algorithm introduced some clearly artifactual movement, so the data without motion correction were used for that session. Additionally, we did not include the MGB of one subject in the analysis, because the image translations calculated by the motion-correction algorithm were smaller than the movement evident at the location of the MGB in the T1 anatomical images acquired before and after functional imaging. A similar discrepancy did not occur for the IC of this subject, so the IC data were included. The images for each run were further processed in two ways to enhance the likelihood of detecting activation. 1) Image signal versus time for each voxel was corrected for linear or quadratic drifts in signal strength over each run (i.e., drift-corrected). 2) Image signal versus time for each voxel and run was normalized such that the time-average signal had the same (arbitrary) value for all voxels and runs. (Specifically, the signal vs. time data were ratio normalized based on the intercept of a least-squares quadratic fit to the data.) This normalization was done to eliminate artificial discontinuities in the signal level between runs in the subsequently concatenated data. All subsequent analyses were performed on the drift-corrected, normalized images.
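The two preprocessing steps can be sketched for a single voxel and run as below. The text specifies only "the intercept of a least-squares quadratic fit"; centering time on the middle of the run (so the intercept is the mid-run signal level), the function name, and the target value of 100 are our assumptions. Because least-squares residuals average to zero, subtracting only the non-constant part of the fit leaves a series whose mean equals the intercept, so dividing by the intercept gives every voxel and run the same time-average.

```python
import numpy as np


def drift_correct_and_normalize(signal, order=2, target=100.0):
    """Remove a linear (order=1) or quadratic (order=2) drift, then ratio-normalize.

    Time is centered on the middle of the run, so the fitted intercept is the
    mid-run signal level. Subtracting only the non-constant part of the fit
    leaves a series whose mean equals that intercept, and scaling by
    target / intercept sets the time-average to `target` (an arbitrary value).
    """
    y = np.asarray(signal, dtype=float)
    t = np.arange(y.size, dtype=float)
    t -= t.mean()
    coeffs = np.polyfit(t, y, order)  # highest power first; coeffs[-1] is the intercept
    intercept = coeffs[-1]
    drift = np.polyval(coeffs, t) - intercept  # linear + quadratic part only
    return (y - drift) * (target / intercept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    run = 500 + 0.3 * np.arange(120) + rng.normal(0, 5, 120)  # synthetic drifting run
    print(drift_correct_and_normalize(run).mean())
```

Normalizing by a fitted level rather than by the raw mean keeps the scaling from being pulled by the drift itself, which is the property needed before concatenating runs.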
GENERATING ACTIVATION MAPS.
Maps of activation to noise burst trains (exps. I, II, and IV) were derived as follows. First, each image was assigned to either a "train on" or "off" period. Stimulus-evoked changes in image signal typically have a delay of 4-6 s (Bandettini et al. 1993; Buckner et al. 1996; Kwong et al. 1992). To account for this (hemodynamic) delay, the first three images taken after the onset of a noise burst train were assigned to the preceding "off" period, and the first three images after the train offset were assigned to the preceding "train on" period. For each rate, the images assigned to each "train on" period and its following "off" period were concatenated into a single file. Image signal strength during train on versus off periods was then compared for each voxel using an unpaired t-test (Press et al. 1992). The P value resulting from this test was plotted for each voxel with P < 0.01 to yield a spatial map of activation. P values were not corrected to account for the correlated nature of fMRI time series (Purdon and Weisskoff 1998), nor were they adjusted for the repeated (voxel-by-voxel) application of a statistical test (Friston et al. 1994).
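The period assignment and voxel-wise test can be sketched in plain Python as below. The function names are hypothetical, and labeling the first images of a run as "off" is our assumption. With TR near 2 s, a three-image lag spans roughly 6 s, in line with the 4-6 s hemodynamic delay cited above. Converting t to a P value requires the t distribution with n_on + n_off − 2 degrees of freedom (e.g., `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def assign_periods(stim_on, lag_images=3):
    """Shift per-image stimulus labels by a hemodynamic lag (in images).

    stim_on[i] is True if image i was acquired while a train was playing.
    Image i takes the stimulus state from `lag_images` images earlier, so the
    first `lag_images` images after onset count as "off" and the first
    `lag_images` images after offset count as "on". Images at the start of
    the run with no earlier image are labeled "off" (an assumption).
    """
    return [stim_on[i - lag_images] if i >= lag_images else False
            for i in range(len(stim_on))]


def unpaired_t(on, off):
    """Student's two-sample t statistic with pooled variance."""
    n1, n2 = len(on), len(off)
    pooled = ((n1 - 1) * variance(on) + (n2 - 1) * variance(off)) / (n1 + n2 - 2)
    return (mean(on) - mean(off)) / sqrt(pooled * (1 / n1 + 1 / n2))


def voxel_t(signal, stim_on, lag_images=3):
    """t statistic comparing one voxel's "train on" and "off" images."""
    labels = assign_periods(stim_on, lag_images)
    on = [s for s, lab in zip(signal, labels) if lab]
    off = [s for s, lab in zip(signal, labels) if not lab]
    return unpaired_t(on, off)
```

As the text notes, such P values ignore temporal autocorrelation and multiple comparisons, so the P < 0.01 threshold is descriptive rather than a corrected significance level.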
DEFINING REGIONS OF INTEREST. Responses were analyzed quantitatively within four anatomically defined regions of interest (ROIs): the IC, MGB, HG, and STG. These ROIs, which were defined separately for each hemisphere, were first identified in the high-resolution anatomical images (in-plane resolution: 1.6 × 1.6 mm) of the functional imaging plane. These "high resolution" ROIs were down-sampled to the same resolution as the functional images (3.1 × 3.1 mm), to yield the ROIs used for all subsequent analyses. The ROI borders were defined as follows:
IC. In exps. I, III, and IV, the ICs were readily identified as distinct, circular anatomical areas (e.g., Fig. 3). For exp. II, only the caudal edge of each IC was distinguishable (e.g., Fig. 6), so each IC ROI was defined as a circle sized to fit this visible edge. The IC ROIs were defined liberally to include voxels at the edge of the IC.
CALCULATING RESPONSE TIME COURSES. The time course of response was computed for specific voxels within each ROI. For exps. I and II, the voxels (3.1 × 3.1 × 7 mm) were chosen based on the activation maps for a particular "reference rate" (i.e., the rate that typically produced the strongest activation in the maps): 35/s for IC, 20/s for MGB, 10/s for HG, and 2/s for STG. For each IC and MGB ROI, we used the single voxel with the lowest P value at the reference rate. For each HG and STG ROI, we averaged the responses of the four voxels (not necessarily contiguous) with the lowest P values at the reference rate.³ Note that for a given structure, session, and hemisphere, the same voxels were used in computing the response time course at each rate. For exps. III and IV, which focused on auditory cortex, the same number of lowest P value voxels (four in each hemisphere) was selected for analysis within the HG and STG ROIs. However, the activation map for selecting voxels was based on a single run of music (4 repetitions of the first 30 s of the fourth movement in Beethoven's Symphony No. 7). Music was used because 1) it typically evokes larger cortical responses than noise burst trains, so robust cortical activation maps could be obtained with a single run, thereby allowing more time for collecting responses to the primary stimuli of interest in exps. III and IV, and 2) the dominant amplitude modulation frequencies of the music stimulus (<5 Hz) were comparable with the "reference rates" for auditory cortex.⁴
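The voxel selection rule for exps. I and II can be sketched as follows. The voxel counts per structure come from the text; the function names and the list-of-lists layout for voxel time courses are hypothetical.

```python
# Voxels averaged per ROI (exps. I and II). Reference rates used for the
# selection maps: IC 35/s, MGB 20/s, HG 10/s, STG 2/s.
N_VOXELS = {"IC": 1, "MGB": 1, "HG": 4, "STG": 4}


def lowest_p_voxels(p_values, n):
    """Indices of the n voxels with the smallest P values (need not be contiguous)."""
    return sorted(range(len(p_values)), key=lambda i: p_values[i])[:n]


def roi_time_course(time_courses, p_values_at_ref, structure):
    """Average the time courses of the selected voxels in one ROI.

    `time_courses[i]` is voxel i's signal-vs-time list and `p_values_at_ref[i]`
    is its P value in the activation map for the structure's reference rate.
    The same voxel indices are reused for the time courses at every rate.
    """
    idx = lowest_p_voxels(p_values_at_ref, N_VOXELS[structure])
    n_time = len(time_courses[idx[0]])
    return [sum(time_courses[i][k] for i in idx) / len(idx) for k in range(n_time)]
```

Fixing the voxels from one reference map, rather than reselecting them at each rate, keeps rate comparisons from being biased toward whichever voxels happened to respond best at that rate.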
For exps. I and II, response time courses were computed as follows. Because cardiac gating results in irregular temporal sampling, the time series for each imaging "run" and voxel was linearly interpolated to a consistent 2-s interval between images, using recorded interimage intervals to reconstruct when each image occurred. These data were then temporally smoothed using a three-point, zero-phase filter (with coefficients 0.25, 0.5, 0.25). A response "block" was defined to include the 10 s prior to a noise burst train, the period coinciding with the train, and the off period following the train. These response blocks were averaged according to rate to give an average signal versus time waveform for each rate, session, and hemisphere. The signal at each time point was then converted to a percent change in signal relative to a baseline. The baseline was defined as the average signal from t = −6 to 0 s, with time t = 0 s corresponding to the onset of the noise burst train. Finally, for each rate, the percent change time courses were averaged across sessions and hemispheres.⁵ For exp. IV, response time courses were calculated the same way, except that response blocks were averaged according to train duration.
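The time-course pipeline can be sketched with NumPy as below. The function names are hypothetical; padding the series edges by repetition before smoothing and treating the baseline window as inclusive (t = −6, −4, −2, 0 s on the 2-s grid) are our assumptions. `average_blocks` assumes every block lies inside the run, since `np.interp` clamps to the end values outside the sampled range.

```python
import numpy as np


def resample_to_grid(times_s, signal, dt=2.0):
    """Linearly interpolate irregularly timed (cardiac-gated) samples onto a regular grid."""
    times_s = np.asarray(times_s, dtype=float)
    grid = np.arange(times_s[0], times_s[-1] + 1e-9, dt)
    return grid, np.interp(grid, times_s, signal)


def smooth3(x):
    """Zero-phase three-point filter (0.25, 0.5, 0.25); edges padded by repetition."""
    padded = np.pad(np.asarray(x, dtype=float), 1, mode="edge")
    return np.convolve(padded, [0.25, 0.5, 0.25], mode="valid")


def average_blocks(grid, x, onsets_s, pre_s=10.0, post_s=60.0, dt=2.0):
    """Average epochs from onset - pre_s to onset + post_s (30-s train + 30-s off)."""
    rel = np.arange(-pre_s, post_s + dt / 2, dt)
    epochs = [np.interp(onset + rel, grid, x) for onset in onsets_s]
    return rel, np.mean(epochs, axis=0)


def percent_change(t_rel, block, baseline=(-6.0, 0.0)):
    """Percent change relative to the mean over baseline[0] <= t <= baseline[1]."""
    t_rel = np.asarray(t_rel, dtype=float)
    block = np.asarray(block, dtype=float)
    mask = (t_rel >= baseline[0]) & (t_rel <= baseline[1])
    base = block[mask].mean()
    return 100.0 * (block - base) / base
```

Because the filter coefficients are symmetric, smoothing introduces no time shift, which matters when comparing response latencies across structures. For the exp. III variant described next, one would skip `smooth3` and call `percent_change(..., baseline=(-2.0, 0.0))`.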
For exp. III, time courses for each stimulus were computed as described above, with the following exceptions: 1) no temporal smoothing was applied (to avoid disproportionately altering the responses, which were expected to be brief in duration) and 2) the baseline signal level for converting time courses to percent change was based on the average of just two time points, t = −2 to 0 s. [This was done because the "off" period between stimuli (18 s) was shorter for this experiment than for the others (30 or 40 s), and we wanted to avoid including time points where the response may not yet have returned to baseline from the preceding stimulus.]
QUANTIFYING RESPONSE MAGNITUDE. For exps. I and II, response magnitude in each auditory structure was quantified using two measures computed from the percent change time courses. "Time-average" percent change, a measure of the overall response strength during a noise burst train, was computed as the mean percent change from t = 4 to 30 s. "Onset" percent change, a measure of the response amplitude near the beginning of a noise burst train, was computed as the maximum percent change from t = 4 to 10 s. Since "time-average" and "onset" percent change were calculated from the percent change time courses, they indicate image signal deviations relative to a 6-s baseline immediately preceding the stimulus (i.e., the baseline period used in calculating the time courses). For exp. III, peak percent change was defined as the maximum value in the percent change time courses.
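The three magnitude measures can be computed from a percent change time course as in this sketch. The function names are hypothetical, and treating the window edges as inclusive is our assumption.

```python
import numpy as np


def response_magnitudes(t_s, pct):
    """Return ("time-average", "onset") percent change for one time course.

    Window edges are treated as inclusive (an assumption): the time-average is
    the mean over 4 <= t <= 30 s and the onset value is the maximum over
    4 <= t <= 10 s, with t = 0 at train onset.
    """
    t_s = np.asarray(t_s, dtype=float)
    pct = np.asarray(pct, dtype=float)
    time_average = pct[(t_s >= 4) & (t_s <= 30)].mean()
    onset = pct[(t_s >= 4) & (t_s <= 10)].max()
    return time_average, onset


def peak_percent_change(pct):
    """Exp. III measure: the maximum of the percent change time course."""
    return float(np.max(pct))
```

On a 2-s grid the time-average window contains the 14 samples t = 4, 6, ..., 30 s and the onset window the 4 samples t = 4, 6, 8, 10 s, so a phasic response with a prominent on-peak and a weak sustained phase yields an onset value well above its time-average value, whereas a sustained response yields similar values for both.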
RESULTS
Response to noise burst trains: effect of burst repetition rate
INFERIOR COLLICULUS. Activation maps for the IC showed an increase in activation with increasing burst repetition rate. Figure 3 demonstrates this increase for two sessions from exp. I. The maps show activation that is greatest at 35/s, less at 10/s, and absent at 2/s. (Activation strength is reflected in the maps as a lower P value from the statistical comparison of image signal level during train "on" and "off" periods). Greater IC activation at higher rates is also demonstrated by the maps in Fig. 6, which correspond to two sessions from exp. II.
Figure 4 (left) shows the time course of the responses in the IC averaged across all sessions. At all rates, the response was "sustained" in that image signal increased when the noise burst train was turned on, remained elevated while the train was on, and decreased once the train was turned off. The amplitude of the sustained response during the "train on" period increased with increasing rate.
MEDIAL GENICULATE BODY. In contrast with the IC, activation maps for the MGB usually showed a nonmonotonic change in activation with rate. The trend for the MGB is illustrated by the maps for two sessions in Fig. 6. The maps show an increase in MGB activation with increasing rate in the 2/s-20/s range, but a decrease from 20/s to 35/s.
HESCHL'S GYRUS AND SUPERIOR TEMPORAL GYRUS. A nonmonotonic relationship between rate and activation was apparent in the activation maps for HG and STG (Fig. 7). The maps showed an activation increase from 1/s to 2/s, and a decrease from 10/s to 35/s.
Response to small numbers of noise bursts
To investigate how the initial bursts of a train contribute to cortical responses at train onset, we examined the responses to a single noise burst, and clusters of two or five noise bursts with a burst-to-burst ISI of 28.6 ms (35/s rate) or 500 ms (2/s rate). Both single and clustered noise bursts elicited measurable responses in HG and STG. The responses, averaged across subjects and hemispheres, peaked 4-6 s after the stimulus, and then returned to baseline by 8-10 s (Fig. 8, top). After 8-10 s, the average response dipped below baseline. However, this response feature, unlike the others, was dominated by the data for only one of the three subjects (subject 2).
Figure 8 (bottom) shows normalized peak response versus number of noise bursts for each subject and hemisphere. These normalized responses were quantified as the peak percent signal change in the response time course (which always occurred at t = 4 or 6 s), divided by the peak percent change for a single noise burst. The normalized peak response generally increased with increasing number of noise bursts (Fig. 8, bottom). However, the response increase was always less than would be predicted by a model in which each successive noise burst evokes a response equivalent to the response to a single noise burst (1 NB) and the responses to successive bursts add (i.e., linear growth). Similarly, for every subject and hemisphere, the peak response to 5 NBs@35/s was less than 2.5 times the response to 2 NBs@35/s, the ratio expected under linear growth. These results are consistent with a model in which the responses to noise bursts at the beginning of a train are greater than those occurring later. The fact that the peak response for 2 NBs@2/s was always greater than for 2 NBs@35/s indicates that any decline in response from the first burst to the second was greater at high, compared with low, rates.
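The comparison with linear growth can be written out as a short sketch (NB, noise burst). Under linear growth each burst adds a response equal to the 1 NB response, so the predicted normalized peak for n bursts is n, and the predicted ratio of the 5 NB to the 2 NB response is 5/2 = 2.5. The function names and the dictionary layout (condition label mapped to number of bursts and peak percent change) are hypothetical, and no measured values from this study are included.

```python
def normalized_peaks(peak_pct):
    """Divide each condition's peak percent change by the single-burst peak.

    `peak_pct` maps a condition label to (number_of_bursts, peak_percent_change)
    and must include the single-burst condition under the label "1 NB".
    """
    single = peak_pct["1 NB"][1]
    return {label: (n, peak / single) for label, (n, peak) in peak_pct.items()}


def below_linear_growth(peak_pct):
    """True if every multi-burst normalized peak is below its linear prediction n."""
    return all(norm < n for n, norm in normalized_peaks(peak_pct).values() if n > 1)


def ratio_5_to_2(peak_5nb, peak_2nb):
    """Ratio of 5-burst to 2-burst peaks; linear growth predicts 2.5."""
    return peak_5nb / peak_2nb
```

A result below the linear prediction, as reported for every subject and hemisphere, indicates that later bursts in a cluster contribute less than the first.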
We compared the mean peak percent change for single and multiple noise bursts with the mean onset percent change for 35/s trains from exp. I to estimate the proportion of the on-peak accounted for by the earliest noise bursts in the train.⁷ In STG, we estimated that the mean peak percent change for 1 NB and 5 NBs@35/s was approximately 40% and 65%, respectively, of the mean onset percent change. In HG, the corresponding estimates were approximately 25% and 40%. These values indicate that the earliest noise bursts of a high-rate train account for a substantial portion of the on-peak, especially in STG.
Response to high-rate (35/s) noise burst trains: effect of train duration
By considering noise burst trains with different durations, we tested whether the off-peak in cortical phasic responses is specifically linked to train termination. Two subjects were studied using 35/s noise burst trains with durations of 15, 30, 45, and 60 s. For both subjects and all durations, HG responses showed a distinct off-peak after train offset (Fig. 9, top). Regardless of train duration, the off-peak occurred approximately 6 s after train offset, indicating a strong coupling between off-peak and train termination. A similarly strong coupling was also found in STG for both subjects (not shown), although one subject (subject 11) was unusual in that the STG response did not show a clear off-peak in the voxels selected by our standard criteria. Nevertheless, other nearby voxels showed a clear off-peak, and this off-peak always occurred approximately 6 s after train offset, regardless of train duration. In contrast to cortex, IC responses were largely sustained for all durations and showed no sign of an off-peak (Fig. 9, bottom).
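One way to quantify the coupling between off-peak and train termination is to measure off-peak latency relative to train offset for each duration. The sketch below does this on a synthetic phasic time course. The 1-s sampling, the Gaussian-shaped peaks, and the 12-s search window are all assumptions; this is not the analysis procedure used in the study.

```python
import math


def synthetic_phasic(t, duration):
    """Hypothetical phasic time course: on-peak, low plateau, off-peak ~6 s after offset."""
    on_peak = math.exp(-((t - 6.0) ** 2) / 8.0)
    plateau = 0.3 if 8.0 <= t <= duration else 0.0
    off_peak = math.exp(-((t - (duration + 6.0)) ** 2) / 8.0)
    return on_peak + plateau + off_peak


def off_peak_latency(times, signal, offset, window=12.0):
    """Latency (s) of the largest sample within `window` seconds after train offset."""
    candidates = [(s, t) for t, s in zip(times, signal) if offset < t <= offset + window]
    if not candidates:
        raise ValueError("no samples after train offset")
    _, peak_time = max(candidates)
    return peak_time - offset


latencies = {}
for duration in (15, 30, 45, 60):
    times = [float(t) for t in range(0, duration + 25)]  # hypothetical 1-s sampling
    signal = [synthetic_phasic(t, duration) for t in times]
    latencies[duration] = off_peak_latency(times, signal, offset=float(duration))

print(latencies)  # → {15: 6.0, 30: 6.0, 45: 6.0, 60: 6.0}
```

A latency that stays constant across durations, as here, is the signature of an off-peak tied to train offset rather than to train onset.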
Data for two additional subjects further support the strong coupling between cortical off-peak and train termination. These subjects, tested with a single train duration of 60 s, showed off-peaks in both HG and STG that occurred approximately 6 s after train offset. All of the train duration data taken together indicate that the cortical off-peak is specifically evoked by the termination of high-rate noise burst trains.
DISCUSSION
fMRI responses to trains of noise bursts changed substantially with burst repetition rate in every studied structure, although the nature of the changes was highly structure-dependent. In the IC, response amplitude following train onset increased with increasing rate while response waveshape remained unchanged (i.e., sustained). In the MGB, increasing rate produced an increase in onset amplitude up to a point where a further increase in rate instead produced a change in waveshape (from a largely sustained response to one showing a distinct peak just after train onset). In HG, the site of primary auditory cortex, onset amplitude changed somewhat with rate, but the most striking change occurred in response waveshape. At low rates the waveshape was sustained, while at high rates it was strongly phasic in that there were prominent response peaks just after train onset and offset. In STG, which includes secondary auditory areas, onset amplitude showed no systematic dependence on rate, whereas response waveshape showed a strong and dramatic rate dependence paralleling that in HG. Overall, from midbrain to thalamus to cortex, there was a systematic shift in the form of response rate-dependencies from one of amplitude to one of waveshape.
Sustained response waveshapes (as seen in subcortical structures and for low rates in auditory cortex) are commonly reported in the fMRI literature. In contrast, phasic responses (as seen for higher rates in auditory cortex) are not, nor are their individual signature features. One signature feature, the prominent peak following stimulus onset, has been reported for a few prolonged acoustic, odorant, and visual stimuli (Bandettini et al. 1997; Giraud et al. 2000; Jäncke et al. 1999; Sobel et al. 2000) but is nevertheless a fairly uncommon feature of responses in the fMRI literature. A second signature feature of phasic responses, the peak following stimulus offset, is highly unusual. To our knowledge, the only other reported fMRI "off-response" occurred in a subregion of primary visual cortex following the transition from steady white light to darkness (Bandettini et al. 1997).
The paucity of previous reports of phasic fMRI responses may be partly an issue of detection, since phasic responses are poorly detected by some of the most commonly used analysis approaches (e.g., a t-test comparison of stimulus "on" and "off" periods or, equivalently, correlation or analyses using the SPM software package that assume a sustained response; Bandettini et al. 1993; Sobel et al. 2000). It is also possible that phasic responses have not been seen because they reflect neurophysiological mechanisms that are invoked only in particular, largely unexplored stimulus regimes.
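To illustrate why an analysis that assumes a sustained response can under-detect phasic responses, the sketch below correlates an idealized sustained and an idealized phasic time course with a delayed on/off boxcar reference. All waveforms, the 1-s sampling, and the 4-s delay are hypothetical choices. This is neither the analysis used in the study nor the SPM implementation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


TIMES = range(60)  # hypothetical 1-s samples; stimulus "on" for t = 0-29 s

# Reference assuming a sustained response, delayed by a hypothetical 4 s.
boxcar = [1.0 if 4 <= t < 34 else 0.0 for t in TIMES]

# Idealized sustained response: matches the reference exactly.
sustained = list(boxcar)


def phasic_value(t):
    """Idealized phasic response: on-peak, low plateau, off-peak after offset."""
    if 4 <= t < 10:
        return 1.0  # on-peak
    if 10 <= t < 34:
        return 0.2  # signal declines toward baseline during the train
    if 36 <= t < 42:
        return 0.8  # off-peak after train termination
    return 0.0


phasic = [phasic_value(t) for t in TIMES]

r_sustained = pearson_r(boxcar, sustained)
r_phasic = pearson_r(boxcar, phasic)
print(f"r (sustained) = {r_sustained:.2f}, r (phasic) = {r_phasic:.2f}")
# prints: r (sustained) = 1.00, r (phasic) = 0.30
```

The phasic waveform contains a large, clearly stimulus-locked signal change, yet its correlation with the sustained-response reference is low. This is partly because its decline during the train and its off-peak after offset both work against the reference.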
It is widely assumed that different sound features (e.g., frequency, bandwidth, repetition rate) are represented in the amplitude of fMRI activation or in variations of amplitude with position (e.g., Giraud et al. 2000; Talavage et al. 2000; Wessinger et al. 2001; Yang et al. 2000). In contrast, the possibility of representations in the temporal dimension is not generally entertained, which makes the wide variations in cortical response waveshape in this study especially intriguing. A few other studies have also reported covariations between sound characteristics and temporal fMRI activation patterns in the auditory system. For instance, Gaschler-Markefski et al. (1997) examined the degree of temporal stationarity of auditory cortical fMRI responses and reported regional variations depending on stimulus and task. In studying fMRI responses to amplitude-modulated noise, Giraud et al. (2000) found an increasingly prominent peak at stimulus onset with increasing modulation rate, a result that strongly parallels the findings of the present study (see Comparison to previous fMRI and PET studies). The present study and these previous reports suggest that fMRI temporal patterns (or, more specifically, the temporal variations in neural activity underlying these patterns) may be an important way in which sound is represented in the auditory system.
Role of rate per se in determining fMRI responses
In this study, noise burst duration was held constant while rate was varied, so overall stimulus energy and sound-time fraction (STF) covaried with rate (resulting in an approximately 12-dB differential in sound pressure level for 2/s vs. 35/s noise burst trains). While this raises the possibility that the wide range of response waveshapes in auditory cortex was due primarily to changes in parameters other than rate, we do not believe this to be the case, for two reasons. First, in a separate study, we have found that varying the intensity of 2/s or 35/s noise bursts over a 20- to 30-dB range has no effect on response waveshape (Harms et al. 2001). Second, we have found that changing rate from 2/s to 35/s while holding STF constant (and therefore varying burst duration) still produces a change in waveshape from sustained to phasic (although STF does have some influence on response waveshape; Harms 2002; Harms and Melcher 1999). In the case of response amplitude, the precise rate dependencies might be somewhat different if stimulus energy and STF were held constant instead of burst duration, because varying energy alone can produce changes in response amplitude (Hall et al. 2001; Sigalovsky et al. 2001), as may also be the case for changes in STF.
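The approximately 12-dB figure follows directly from the rate ratio when every burst is identical: time-averaged power then scales with the number of bursts per second. A minimal worked sketch, assuming identical bursts. The 25-ms burst duration used for the STF lines is a hypothetical value chosen for illustration; it is not taken from this section.

```python
import math


def level_difference_db(rate_high, rate_low):
    """Level difference (dB) between two trains of identical bursts.

    With burst duration and per-burst energy fixed, time-averaged power is
    proportional to rate, so the difference is 10*log10 of the rate ratio.
    """
    return 10.0 * math.log10(rate_high / rate_low)


def sound_time_fraction(rate_per_s, burst_duration_s):
    """Fraction of time occupied by sound: bursts per second x burst duration."""
    return rate_per_s * burst_duration_s


BURST_DURATION_S = 0.025  # hypothetical burst duration, for illustration only

print(f"2/s vs 35/s: {level_difference_db(35, 2):.1f} dB")  # prints 12.4 dB
for rate in (2, 35):
    print(f"STF at {rate}/s: {sound_time_fraction(rate, BURST_DURATION_S):.3f}")
```

Whatever the burst duration, both energy and STF change by the same factor (35/2 = 17.5) when only rate is varied. That shared factor is why the two parameters covary with rate in this design.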
fMRI responses and underlying neural activity
To understand the significance of the different fMRI response waveshapes, it is necessary to first consider the extent to which waveshape is governed by neural, metabolic, and hemodynamic factors. While the relationship between neural activity and fMRI responses is not fully understood, it is generally accepted that neural activity and image signal are ultimately linked through a chain of metabolic and hemodynamic events. For the form of fMRI used in this study (BOLD fMRI), this linkage is as follows. When there is an increase in neural activity in the form of synaptic events or neural discharges (footnote 8), there is a corresponding increase in local brain metabolism and oxygen consumption (Sokoloff 1989). The increase in oxygen consumption is accompanied by an increase in blood flow and blood volume in the active brain region. However, the increase in flow dominates, outpacing the increase in oxygen consumption, such that the local concentration of deoxygenated hemoglobin actually decreases. This matters because deoxygenated hemoglobin is paramagnetic (Pauling and Coryell 1936) and thus influences local image signal levels: the net effect of a decrease in deoxygenated hemoglobin is an increase in image signal. When the entire chain of events is considered together, increases and decreases in neural activity result in concordant changes in image signal strength (Kwong et al. 1992; Ogawa et al. 1993; Springer et al. 1999). Since hemodynamic changes occur over the course of seconds, fMRI effectively provides a temporally low-pass filtered view of neural activity. More specifically, since fMRI samples activity over small volumes of brain (i.e., voxels), the responses can be thought of as showing the time-envelope of population neural activity on a voxel-by-voxel basis.
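The low-pass relationship described above is often approximated by convolving a neural time-envelope with a hemodynamic impulse response. The sketch below uses that linear convolution model purely as an assumption; whether neural activity and fMRI signal are linearly coupled is itself an open question. It also uses a hypothetical gamma-shaped impulse response. Under these assumptions, brief onset and offset increases in neural activity produce BOLD on- and off-peaks, whereas constant activity produces a sustained response.

```python
import numpy as np

DT = 1.0                       # hypothetical 1-s time step
t = np.arange(60) * DT
k = np.arange(25) * DT

# Hypothetical gamma-shaped hemodynamic impulse response (peak near 5 s),
# normalized to unit area so a constant input of 1 yields a plateau of 1.
hrf = k ** 5 * np.exp(-k)
hrf /= hrf.sum()


def bold_from_envelope(envelope):
    """Causal convolution of a neural time-envelope with the HRF (linear model)."""
    return np.convolve(envelope, hrf)[: len(envelope)]


# Sustained envelope: constant activity during a 30-s train.
sustained_env = np.where(t < 30, 1.0, 0.0)

# Phasic envelope: brief onset activity, low activity during the train,
# and brief activity at train termination.
phasic_env = np.zeros_like(t)
phasic_env[t < 2] = 1.0
phasic_env[(t >= 2) & (t < 30)] = 0.1
phasic_env[(t >= 30) & (t < 32)] = 0.8

bold_sustained = bold_from_envelope(sustained_env)
bold_phasic = bold_from_envelope(phasic_env)

for label, y in (("sustained", bold_sustained), ("phasic", bold_phasic)):
    # Samples near the on-peak, mid-train, late train, and the off-peak.
    print(label, np.round(y[[6, 20, 25, 36]], 2))
```

Because the impulse response spans several seconds, the sub-second structure of the envelope is smoothed away. Only its slower time course, including the onset and offset transients, survives in the simulated signal.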
Previous work has shown that the relative timing and magnitude of stimulus-evoked changes in blood flow, blood volume, and oxygen consumption can influence the waveshape of the fMRI response (Buxton et al. 1998; Mandeville et al. 1998). While this raises the possibility that changes in waveshape from sustained to phasic reflect changes in hemodynamics rather than in underlying neural activity, we believe this to be unlikely for both of the main components of the phasic response, namely the off-peak and the on-peak. It is particularly unlikely that a hemodynamic explanation accounts for the off-peak. Previous hemodynamic modeling and experimentation have not predicted an off-peak following stimulus termination, and we know of no plausible model that could generate such a component. Therefore the emergence of an off-peak with increasing repetition rate in auditory cortex is almost certainly attributable to a rate-dependent increase in neural activity at stimulus offset.
The other major feature that distinguishes phasic from sustained responses, namely the sharp decline in signal that forms the prominent onset peak, requires more detailed consideration, because it is known that declines in signal can theoretically occur over the course of a prolonged stimulus for purely hemodynamic reasons (Buxton et al. 1998). However, measurements of BOLD signal, blood flow, and blood volume responses have not revealed a case in which purely hemodynamic features generate a signal decline as dramatic as those seen here (e.g., Hoge et al. 1999a; Mandeville et al. 1999). Additionally, separate evidence argues against the idea that the signal decline is driven primarily by hemodynamic rather than neural influences. This reasoning follows from the fact that the same voxels were capable of showing either a phasic response (and the associated dramatic signal decline) or a sustained response, depending on the stimulus. Since the time courses of the phasic and sustained responses are very similar over the first 6-8 s, the "operational history" of the hemodynamic system is presumably similar as well. Given this common initial response, it seems unlikely that the hemodynamic system would subsequently generate grossly different response waveshapes unless the differences reflect differences in underlying neural activity.
While response waveshape varied with rate within a structure, it also varied across structures for a given rate. It is known that there can be spatial heterogeneity in tissue hemodynamics (Chen et al. 1998; Davis et al. 1998), so the possibility that regional variations in hemodynamics play some role in the waveshape differences across structures cannot be discounted. Still, the heterogeneities in hemodynamics that have been documented are not sufficient to account for the dramatic waveshape changes that occur across the pathway as a whole (from the inferior colliculus to cortex).
fMRI response onset and neural adaptation
Given that fMRI responses reflect the time-envelope of population neural activity, the prominent declines in fMRI signal that occur at high rates in MGB, HG, and STG provide clear evidence for an overall decline in neural activity during the first seconds of a train (<10 s). This decline likely includes decreases in synaptic as well as discharge activity, since both forms of activity can be reflected in fMRI signals.
While an overall decrease in neural activity early in high-rate trains is clear, the subsecond temporal details of activity during this decrease remain unresolved because fMRI provides a low-pass filtered view of neural activity. Previous electrophysiological data suggest various possible forms for the temporal details underlying the overall decline in neural activity. For instance, the fMRI signal decline may reflect a burst-to-burst adaptation in neural activity in which each successive burst early in a train elicits progressively less activity (Fruhstorfer et al. 1970; Ritter et al. 1968; Roth and Kopell 1969). Alternatively, a variant of this may occur in which activity does not always decrease in a strictly progressive fashion across consecutive bursts but sometimes shows an increase from one burst to the next (e.g., facilitation or enhancement; Brosch et al. 1999; Budd and Michie 1994; Loveless et al. 1989). (As long as decreases from burst to burst outweigh increases, the time-envelope of neural activity would still decrease.) Another possibility is that population neural activity is not synchronized to individual bursts (Lu and Wang 2000; Lu et al. 2001) but instead occurs in response to the train as a whole, with an initial peak in activity followed by a lower level of activity. All of these possibilities would result in a decline in the time-envelope of population neural activity and are therefore consistent with the prominent declines seen in fMRI signal.
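The first possibility, progressive burst-to-burst adaptation, can be made concrete with a generic resource-depletion model of the kind often used to describe synaptic depression. This is an illustrative model swapped in for the sketch; it was not proposed or fitted in this study. The parameters `use_fraction` and `tau_s` are hypothetical. Under these assumptions, per-burst adaptation lowers the 1-s time-envelope after train onset, more strongly at 35/s than at 2/s, and also yields slower-than-linear growth with the number of bursts.

```python
import math


def burst_responses(rate_per_s, n_bursts, use_fraction=0.5, tau_s=1.0):
    """Per-burst responses under a generic resource-depletion (depression) model.

    Hypothetical assumptions: each burst evokes a response equal to the
    available resource R (initially 1), then depletes R by `use_fraction`;
    between bursts R recovers toward 1 with time constant `tau_s`.
    """
    isi = 1.0 / rate_per_s
    recovery = math.exp(-isi / tau_s)
    resource, responses = 1.0, []
    for _ in range(n_bursts):
        responses.append(resource)
        resource *= 1.0 - use_fraction                # depletion by the burst
        resource = 1.0 - (1.0 - resource) * recovery  # recovery over one ISI
    return responses


def envelope_per_second(rate_per_s, seconds):
    """Summed burst-evoked activity in consecutive 1-s bins (a crude time-envelope)."""
    bursts_per_bin = int(round(rate_per_s))
    responses = burst_responses(rate_per_s, bursts_per_bin * seconds)
    return [sum(responses[i * bursts_per_bin:(i + 1) * bursts_per_bin])
            for i in range(seconds)]


env_slow = envelope_per_second(2, 5)
env_fast = envelope_per_second(35, 5)
print("2/s: ", [round(v, 2) for v in env_slow])
print("35/s:", [round(v, 2) for v in env_fast])

# Slower-than-linear growth with the number of bursts, and a larger
# first-to-second-burst decline at the higher rate.
two_fast = sum(burst_responses(35, 2))
five_fast = sum(burst_responses(35, 5))
two_slow = sum(burst_responses(2, 2))
print(round(five_fast / two_fast, 2), two_slow > two_fast)
```

The bin-by-bin envelope, not the individual burst responses, is what a low-pass measure such as fMRI would follow under this model.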
While we cannot conclusively determine the temporal details of activity during the declines in fMRI signal, it is worth recognizing that aspects of our data are consistent with a burst-to-burst adaptation in neural activity. For instance, the fMRI responses to small numbers of noise bursts are suggestive of an adaptation process, because the fMRI response amplitude did not increase in proportion to the number of bursts but instead showed slower than linear growth. Whether neural activity and fMRI signal are coupled in an approximately linear manner (and under what circumstances) is an open question under active investigation (Boynton et al. 1996; Dale and Buckner 1997; Gratton et al. 2001; Hoge et al. 1999b; Logothetis et al. 2001; Vazquez and Noll 1998). However, if they are, the slower than linear growth in the fMRI responses to small numbers of noise bursts suggests that the neural activity produced by each successive burst in a train is progressively less (i.e., there is burst-to-burst adaptation in neural activity). If the decline in neural activity from burst to burst were to continue until there is little or no burst-evoked activity, the time-envelope of neural activity would decline substantially following the onset of the train, and so, correspondingly, would the fMRI signal (as observed for high-rate trains). Another aspect of the data consistent with neural adaptation is the growth in onset amplitude with rate. If neural activity and fMRI response amplitude vary in direct proportion, onset amplitude may be viewed roughly as an indicator of the time-average neural activity during the first seconds of a train. If there were no adaptation and each successive burst in a train produced an identical increase in neural activity, the time-average neural activity during the first seconds of a train would increase in direct proportion to rate, and onset amplitude would be expected to do the same. Instead, a proportional increase in onset amplitude did not occur in any structure. This is most obvious at high rates in MGB, HG, and STG, where onset amplitude either declined or did not change with increasing rate (footnote 9). However, it can also be seen at lower rates and in the IC. For instance, an increase in rate from 2/s to 10/s increased onset amplitude by less than twofold in every structure, well short of the fivefold increase expected if growth were proportional to rate. This result is consistent with neural adaptation occurring in all of the studied structures, even the IC, where fMRI response waveshapes are sustained and do not immediately suggest an underlying adaptation.
Looking across structures, the data indicate that any neural adaptation increased with increasing position in the pathway. For instance, at any given rate, the percentage decline in signal following the on-peak increased progressively from IC to MGB to auditory cortex (HG and STG), suggesting an increasing degree of adaptation in the underlying population neural activity. An increase in adaptation across structures is also suggested by the fact that the growth in onset amplitude with rate falls increasingly short of predictions that assume no adaptation. For instance, the fold increase in onset amplitude from 2/s to 10/s falls increasingly short of the fivefold increase predicted in the absence of adaptation as one moves from IC (1.69) to MGB (1.42) to auditory cortex (HG, 1.26; STG, 0.92). Similarly, the fold increase in onset amplitude from 10/s to 35/s falls increasingly short of the 3.5-fold prediction [IC, 1.42; MGB, 1.36; auditory cortex (HG, 0.80; STG, 0.99)]. Thus, if there is burst-to-burst adaptation in population neural activity early in a train, it increases from IC to MGB to auditory cortex.
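The comparison with proportional growth reduces to dividing each observed fold change by the corresponding rate ratio. A short sketch using the values reported in this paragraph (the function name is ours). A result of 1.0 would indicate growth in proportion to rate, as expected with no adaptation.

```python
# Fold increases in onset amplitude reported above, keyed by rate step.
OBSERVED = {
    ("2/s", "10/s"): {"IC": 1.69, "MGB": 1.42, "HG": 1.26, "STG": 0.92},
    ("10/s", "35/s"): {"IC": 1.42, "MGB": 1.36, "HG": 0.80, "STG": 0.99},
}
# With no adaptation, onset amplitude would grow in proportion to rate.
PREDICTED = {("2/s", "10/s"): 10 / 2, ("10/s", "35/s"): 35 / 10}


def fraction_of_prediction(observed, predicted):
    """Observed fold change divided by the no-adaptation (proportional) prediction."""
    return {
        step: {site: fold / predicted[step] for site, fold in sites.items()}
        for step, sites in observed.items()
    }


fractions = fraction_of_prediction(OBSERVED, PREDICTED)
for step, sites in fractions.items():
    print(" -> ".join(step), {site: round(f, 2) for site, f in sites.items()})
```

Every structure falls well below 1.0, and for the 2/s to 10/s step the fraction shrinks monotonically from IC to MGB to HG to STG.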
RELATIONSHIP TO ELECTROPHYSIOLOGICAL DATA IN ANIMALS.
A trend of increasing adaptation with increasing position in the auditory pathway has emerged from several animal neurophysiological studies explicitly designed to compare responses across structures. For instance, microelectrode recordings of the response to paired stimuli with different interstimulus intervals have shown an increase in recovery time with increasing level in the auditory pathway. In unanesthetized animals (cats and rabbits), the average interval required for 50% recovery of the response to the second of two clicks is 2 ms in the auditory nerve, cochlear nucleus, and superior olivary complex, but 7 ms in the inferior colliculus and 20 ms in auditory cortex (Fitzpatrick and Kuwada 1999). In unanesthetized guinea pig, Creutzfeldt et al. (1980) recorded responses to amplitude-modulated tones simultaneously from thalamic and cortical neurons (specifically, 9 thalamo-cortical unit pairs for which the correlation of spontaneous activity suggested a direct synaptic connection). Activity in the cortical neurons declined more rapidly over successive cycles of the AM tone than did activity in the thalamic neurons, indicating greater adaptation in the cortical neurons. Finally, recording near-field potentials from the IC and auditory cortex in response to brief noise bursts in unanesthetized chinchilla, Burkard et al. (1999) found that the mean response amplitude (averaged across noise burst presentations) decreased more in cortex than in the IC as repetition rate was increased. Their results are again consistent with greater adaptation in cortex than in lower structures of the auditory pathway.
35/s (the highest rate employed), but peaked in HG at a lower rate (10/s). While the variation in onset amplitude with rate in HG was rather small (and in STG, onset amplitude did not vary at all), a similarly weak "tuning" also holds in population neural activity, in that the rMTF averaged across cortical neurons is primarily low-pass, or only weakly band-pass (Eggermont 1994).
RELATIONSHIP TO ELECTRIC RECORDINGS IN HUMANS.
A trend of generally greater adaptation at cortical versus brainstem levels of the auditory pathway fits with data concerning two of the most studied components of human auditory evoked potentials: wave V of the brainstem evoked potential and the long-latency potential N1. Wave V is likely generated by neurons projecting to the IC (Melcher and Kiang 1996; Møller 1998), while the primary generators of N1 have been localized to auditory cortex (e.g., Näätänen and Picton 1987