Selective auditory attention powerfully modulates neural activity in the human auditory cortex (AC). In contrast, the role of attention in subcortical auditory processing is not well established. Here, we used functional MRI (fMRI) to examine activation of the human inferior colliculus (IC) during strictly controlled auditory attention tasks. The IC is an obligatory midbrain nucleus of the ascending auditory pathway with diverse internal and external connections. The IC also receives a massive descending projection from the AC, suggesting that cortical processes affect IC operations. In this study, 21 subjects selectively attended to left-ear or right-ear sounds and ignored sounds delivered to the other ear. IC activations depended on the direction of attention, indicating that auditory processing in the human IC is determined not only by acoustic input but also by the current behavioral goals.
There has been a long-lasting debate on whether attention modulates processing of sounds as early as in the subcortical structures of the auditory pathway (Hernández-Peón et al. 1956; Jane et al. 1965; Lukas 1980; Maison et al. 2001; Michie et al. 1996; Näätänen 1992; Oatman and Anderson 1977; Ryan and Miller 1977). Although functional MRI (fMRI) studies in humans have ascertained that attention has a strong effect on cortical activity elicited by sounds (Grady et al. 1997; Jäncke et al. 1999; Johnson and Zatorre 2006; Petkov et al. 2004; Rinne et al. 2005, 2007), there is no unambiguous evidence that attention modulates subcortical auditory activity. Here we focus on the bilateral inferior colliculus of the midbrain where the ascending parallel auditory pathways converge before continuing to the auditory cortex (AC) via the thalamus. The inferior colliculus (IC) also receives substantial corticofugal projections from the AC (Winer 2006). The functional significance of the corticofugal pathway is not well understood, but stimulation of cortical neurons has been shown to modulate response properties of IC neurons (Suga and Ma 2003). Because the IC has been implicated in attention in some animal studies (Jane et al. 1965; Ryan and Miller 1977), it is possible that the corticofugal projections serve to mediate attention effects to the IC (Suga and Ma 2003).
We presented asynchronous left and right ear broadband noise bursts to our human subjects at a rapid rate (Fig. 1A). Temporal regularity of the noise bursts was manipulated so that there was a distinct pitch difference between the left and right ear sounds (Griffiths et al. 2001). Subjects were required to selectively attend to sounds at the designated ear to detect frequent (about once per second) pitch increases or decreases among the attended sounds and to indicate, by pressing one of the two buttons, the direction of the pitch change. This paradigm allowed us to compare brain activations during Attend Left (ignore right) and Attend Right (ignore left) conditions with similar (broadband) acoustic inputs. Based on previous studies using similar attention conditions and fast-rate sound streams (Alho et al. 1999; Woldorff and Hillyard 1991), we hypothesized that AC and IC activations would be enhanced in the hemisphere contralateral to the attended ear.
Subjects (14 women and 14 men) were 18–50 yr of age (mean, 27 yr) with normal hearing. Six subjects were excluded because of poor task performance and one subject because of excessive head motion during fMRI scanning. The included subjects (9 women and 12 men; 2 left-handed) were 21–35 yr of age (mean, 27 yr). Informed written consent was obtained from each subject before the experiment. The study protocol was approved by the Ethical Committee of the Hospital District of Helsinki and Uusimaa, Finland.
Stimuli and tasks
Auditory stimuli consisted of bursts of iterated rippled noise (IRN, generated by iteratively adding delayed broadband noise, 16 iterations, delays of 0.5–10 ms corresponding to pitches from 2,000 to 100 Hz, duration 100 ms). IRN sounds have a broadband frequency spectrum but possess a relatively salient pitch (Griffiths et al. 2001). Asynchronous left and right ear sequences of IRN sounds were presented throughout all task blocks in the experiment. Within a sequence, sound onset-to-onset intervals varied randomly between 200 and 390 ms. Each sound sequence consisted of four low- (median pitch corresponding to ∼115 Hz), four intermediate- (190 Hz), or four high-pitch (800 Hz) sounds so that, within a task block, the left ear stream consisted of sounds from one pitch group while the right ear received sounds from another group. Note that, despite the pitch variation, the sounds were similar in (broadband) frequency content. Each sequence was ordered so that one pitch was repeated two to five times, after which the pitch slightly increased or decreased. Subjects were required to attend to the sounds presented to the designated ear, detect pitch increases and decreases (target) in the attended sounds, and indicate the direction of the pitch change by pressing left (decrease) or right (increase) buttons with their right hand. The sounds were presented in 20-s blocks alternating with 14-s silent breaks. During the breaks, the subjects focused on a fixation mark (x) presented in the middle of a screen (viewed through a mirror fixed to the head coil) and waited for the next task. An arrow pointing to the left or right instructed the subjects to focus their attention correspondingly (but to maintain fixation). An arrow replaced the fixation mark 2 s before the onset of the next sound block. The order of attend left and attend right conditions was randomized.
The auditory stimuli were delivered with an UNIDES ADU2a audio system (Unides Design, Helsinki, Finland) via plastic tubes through a porous EAR-tip (ER3, Etymotic Research) acting as an earphone. The noise of the scanner was attenuated by the EAR-tip earplugs, headphones, and viscoelastic mattress inside and around the headcoil and under the subject. The experiment was performed with Presentation software (Neurobehavioral Systems, Albany, CA).
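As a rough illustration of how an IRN burst of the kind described above can be built, the following Python sketch applies a delay-and-add step 16 times to Gaussian noise. The sampling rate (44,100 Hz), the "add-same" variant of the delay-and-add network, the peak normalization, and all function names are assumptions made for illustration; the text does not specify them.

```python
import random


def iterated_rippled_noise(delay_ms, iterations=16, duration_ms=100,
                           fs=44100, seed=0):
    """Return one IRN burst as a list of floats scaled to a peak of 1.

    Assumptions (not stated in the text): sampling rate `fs`, the
    add-same network (the running signal is delayed and added to itself),
    and peak normalization.
    """
    rng = random.Random(seed)
    n = int(round(fs * duration_ms / 1000))   # samples in the burst
    d = int(round(fs * delay_ms / 1000))      # delay in samples
    pad = d * iterations                      # lead-in so every delayed copy exists
    x = [rng.gauss(0.0, 1.0) for _ in range(n + pad)]
    for _ in range(iterations):
        # One delay-and-add iteration: x[i] + x[i - d]
        x = [x[i] + (x[i - d] if i >= d else 0.0) for i in range(len(x))]
    burst = x[pad:]                           # drop the lead-in
    peak = max(abs(v) for v in burst)
    return [v / peak for v in burst]


def normalized_autocorr(x, lag):
    """Autocorrelation at `lag`, normalized by the zero-lag energy."""
    num = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
    den = sum(v * v for v in x)
    return num / den


# A 5-ms delay gives a pitch near 1 / 0.005 s = 200 Hz.
burst = iterated_rippled_noise(delay_ms=5.0)
```

The temporal regularity that carries the pitch shows up as a strong autocorrelation peak at the delay, whereas the long-term spectrum stays broadband.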
Analysis of the behavioral data
Mean hit rates and reaction times were calculated separately for both attention conditions. Correct responses occurring between 200 and 1,200 ms from target onset were accepted as hits, and responses outside this time window were counted as false alarms. Hit rate (HR) was defined as the number of hits divided by the number of targets. The false-alarm rate (FaR) was obtained by dividing the number of false alarms by the number of all responses. Mean reaction time was calculated only for hits. The task was designed to manipulate the subjects' attention between the left and right ear sounds and was intentionally made difficult. Thus we expected relatively low HRs. A subject was excluded if HR was <50% or FaR was >15%.
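The scoring rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the data layout, the function name, the rule that each target can receive at most one hit, and the treatment of wrong-button responses inside the window (counted as responses but as neither hits nor false alarms) are all assumptions.

```python
def score_responses(targets, responses, window_ms=(200, 1200)):
    """Score one attention condition.

    targets:   list of (onset_ms, direction) tuples
    responses: list of (time_ms, direction) tuples
    """
    hit_rts = []
    false_alarms = 0
    used = set()  # assumption: each target can receive at most one hit
    for r_time, r_dir in sorted(responses):
        # Targets whose 200-1,200 ms acceptance window contains this response
        candidates = [
            i for i, (onset, _) in enumerate(targets)
            if i not in used
            and window_ms[0] <= r_time - onset <= window_ms[1]
        ]
        if not candidates:
            false_alarms += 1  # response outside every window
            continue
        match = next((i for i in candidates if targets[i][1] == r_dir), None)
        if match is not None:  # correct button within the window: a hit
            used.add(match)
            hit_rts.append(r_time - targets[match][0])
        # Assumption: a wrong-button response inside a window is neither
        # a hit nor a false alarm, but still counts as a response.
    hit_rate = len(hit_rts) / len(targets)
    fa_rate = false_alarms / len(responses) if responses else 0.0
    mean_rt = sum(hit_rts) / len(hit_rts) if hit_rts else None
    return {
        "hit_rate": hit_rate,
        "false_alarm_rate": fa_rate,
        "mean_rt_ms": mean_rt,
        "excluded": hit_rate < 0.5 or fa_rate > 0.15,  # criteria from the text
    }


targets = [(1000, "up"), (2000, "down"), (3000, "up"), (4000, "down")]
responses = [(1500, "up"), (2600, "down"), (3500, "up"), (5500, "up")]
result = score_responses(targets, responses)
```

In this toy example the last response falls 1,500 ms after the final target and therefore counts as a false alarm.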
fMRI data acquisition and analysis
fMRI data were acquired with a 3.0-T GE Signa system retrofitted with an Advanced NMR operating console and a quadrature birdcage coil. Functional images were acquired using a T2*-weighted gradient-echo echo-planar (GE-EPI) sequence (TR = 1,000 ms, TE = 32 ms, flip angle = 90°, voxel matrix = 96 × 96, FOV = 22 cm, slice thickness = 6 mm with 1-mm gap, in-plane resolution = 2.29 × 2.29 mm). To reduce the acoustic noise associated with imaging, only a limited brain area consisting of three consecutive slices was imaged at relatively long interimage intervals (∼4–5 s). In addition, the acoustic noise was reduced by switching off the helium cooling pump before the functional scans. The first EPI slice was carefully oriented according to an anatomical scout image (sagittal slices, slice thickness = 3 mm, in-plane resolution = 0.94 × 0.94 mm) to include the IC and AC (posterior Heschl's gyrus) bilaterally. The other two slices were situated rostral to the first slice (data from these slices are not reported here). To minimize motion artifacts caused by cardiac pulsations, the acquisition was triggered by the cardiac cycle using the first pulse after 4 s (Guimaraes et al. 1998). The functional scan lasted for 30 min, resulting in ∼415 images. At the end of the session, a fluid-attenuated inversion recovery (FLAIR; TR = 10,000 ms, TE = 120 ms, voxel matrix = 320 × 192, NEX = 2.0, FOV = 22 cm, slice thickness = 6 mm with 1-mm gap, in-plane resolution = 0.688 × 1.145 mm) image covering the whole brain was acquired. This image contained the same slices that were used in the functional scan and was used for anatomical alignment.
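The cardiac-triggering rule (acquire on the first pulse at least 4 s after the previous image) can be sketched as below. The function name, the use of millisecond integers, the choice to start on the first pulse, and the perfectly regular 900-ms heartbeat in the example are illustrative assumptions, not details of the actual scanner implementation.

```python
def gated_acquisition_times(pulse_times_ms, min_interval_ms=4000):
    """Pick acquisition times from a list of cardiac pulse times (ms).

    An image is acquired on the first pulse that occurs at least
    `min_interval_ms` after the previous acquisition. Assumption: the
    first image is taken on the first pulse.
    """
    acquisitions = []
    for pulse in pulse_times_ms:
        if not acquisitions or pulse - acquisitions[-1] >= min_interval_ms:
            acquisitions.append(pulse)
    return acquisitions


# Hypothetical steady heartbeat, one pulse every 900 ms for 30 min
pulses = list(range(0, 30 * 60 * 1000 + 1, 900))
acq = gated_acquisition_times(pulses)
intervals = [b - a for a, b in zip(acq, acq[1:])]
```

With a real, variable heart rate the interimage interval varies from image to image, which is why the acquisition times end up jittered relative to block onset.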
Global voxel-wise analysis was performed using the tools developed by the Analysis Group at the Oxford Centre for Functional MRI of the Brain (FMRIB) and implemented within FMRIB's software library (FSL, release 3.3, www.fmrib.ox.ac.uk/fsl, Smith et al. 2004). Functional data were resliced into higher-resolution space (voxel matrix = 128 × 128, in-plane resolution = 1.72 × 1.72 mm), motion corrected, high-pass filtered (cut-off 90 s), and spatially smoothed (Gaussian kernel of 2-mm full-width half-maximum). First-level statistical analysis was carried out using FMRIB's improved linear model (general linear model, GLM). Each functional image was labeled as either Attend Left, Attend Right, or baseline based on the timing information recorded during the experiment. The hemodynamic response function was modeled with a gamma function (mean lag = 6 s, SD = 3 s) and its temporal derivative. To reduce the effects of interimage variation on the signal magnitude (i.e., T1 effect), interimage interval was used as a covariate. Contrasts were specified to create Z-statistic images testing for the effect of active listening to sounds (each task block vs. silent baseline) and effects associated with the manipulation of the attention task (Attend Left vs. Attend Right and Attend Right vs. Attend Left).
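To make the gamma-function model concrete, the sketch below builds a gamma density with mean 6 s and SD 3 s and convolves it with a 20-s block to predict the response at arbitrary image times. It assumes that the stated mean lag and SD map onto the gamma shape and scale through mean = shape × scale and variance = shape × scale²; it is not FSL's implementation, it omits the temporal derivative, and the function names and the numerical step size are hypothetical.

```python
import math


def gamma_hrf(t, mean_lag=6.0, sd=3.0):
    """Gamma-density response at time t (s) after an impulse."""
    if t <= 0:
        return 0.0
    shape = (mean_lag / sd) ** 2   # 4 for mean 6 s, SD 3 s
    scale = sd ** 2 / mean_lag     # 1.5 s
    return (t ** (shape - 1) * math.exp(-t / scale)
            / (math.gamma(shape) * scale ** shape))


def block_regressor(sample_times, block_onset=0.0, block_dur=20.0,
                    dt=0.1, kernel_len=30.0):
    """Predicted response at each sample time (s): boxcar convolved with the HRF.

    The convolution is evaluated directly at the (possibly irregular)
    image times with a midpoint rule of step `dt`.
    """
    predicted = []
    for ts in sample_times:
        total = 0.0
        for j in range(int(kernel_len / dt)):
            tau = (j + 0.5) * dt   # lag into the HRF
            u = ts - tau           # stimulus time contributing at this lag
            if block_onset <= u < block_onset + block_dur:
                total += gamma_hrf(tau, ) * dt
        predicted.append(total)
    return predicted


# Predicted response before the block, late in the block, and well after it
values = block_regressor([-1.0, 18.0, 40.0])
```

Because the predictor is evaluated at the actual image times, the same approach works for cardiac-gated acquisitions in which the interimage interval is not constant.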
For analysis across participants, all anatomical FLAIR images (i.e., the one slice that intersected the IC) were coregistered (2-dimensional rigid-body motion with 2 scalings) to that of one participant. The coregistration was manually corrected (by translating the coregistered images by a few voxels when needed) to assure maximal intersubject inferior colliculus overlap. Functional data were realigned using the coregistration matrices of the anatomical images. In the group analysis (FMRIB's local analysis of mixed effects, n = 21) of stimulation effects, Z-statistic images were thresholded using clusters determined by Z > 3.9 and a (corrected) cluster significance threshold of P < 0.05 (using Gaussian random field theory). For the group analysis of attention effects, slightly lower thresholds were used (Z > 2.6, P < 0.05).
Regions of interest (ROIs) were defined in AC and IC to extract the activation time series. The IC ROIs were defined as a 2 × 2 voxel square centered on each colliculus (see Supplementary Fig. 1).1 The AC ROIs were defined based on the group activation clusters associated with the task that manipulated attention. The group activation clusters were transformed to the space of the individual functional data. The ROI time series were reconstructed as follows: 1) the data were motion corrected using FSL; 2) each image was scaled to reduce linear variability in global signal magnitudes caused by variation of the preceding TR caused by cardiac triggering; 3) the data were high-pass filtered (cut-off 90 s) using FSL; 4) the ROI data were transferred to percent signal change values relative to the mean ROI signal across all volumes; 5) the time points (volumes) were sorted in time relative to the onset of the block; 6) the ROI time series was linearly interpolated and 7) temporally smoothed using a low-pass Butterworth filter; and 8) finally, the baseline of the ROI time series was set to the mean of the two attention conditions during time windows −5 to 0 and 29–34 s from block onset.
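Steps 4, 5, 6, and 8 of the reconstruction can be sketched in Python as follows. This is a simplified illustration under several assumptions: motion correction, TR scaling, high-pass filtering, and the Butterworth smoothing (steps 1–3 and 7) are omitted; a volume is assigned to every block for which it falls within −5 to 34 s of onset; the baseline is taken from a single series rather than from the mean of the two attention conditions; and all function names are hypothetical.

```python
import bisect


def percent_signal_change(signal):
    """Step 4: express the ROI signal relative to its mean across volumes."""
    mean = sum(signal) / len(signal)
    return [100.0 * (v - mean) / mean for v in signal]


def block_locked_samples(acq_times, values, block_onsets,
                         t_min=-5.0, t_max=34.0):
    """Step 5: sort volumes by acquisition time relative to block onset."""
    samples = []
    for onset in block_onsets:
        for t, v in zip(acq_times, values):
            rel = t - onset
            if t_min <= rel <= t_max:
                samples.append((rel, v))
    samples.sort()
    return samples


def interpolate(samples, grid):
    """Step 6: linear interpolation of sorted (time, value) pairs onto grid."""
    times = [s[0] for s in samples]
    vals = [s[1] for s in samples]
    out = []
    for g in grid:
        if g <= times[0]:
            out.append(vals[0])       # hold the first value before the data
        elif g >= times[-1]:
            out.append(vals[-1])      # hold the last value after the data
        else:
            hi = bisect.bisect_right(times, g)   # times[hi] > g >= times[hi-1]
            lo = hi - 1
            w = (g - times[lo]) / (times[hi] - times[lo])
            out.append(vals[lo] + w * (vals[hi] - vals[lo]))
    return out


def subtract_baseline(grid, series,
                      windows=((-5.0, 0.0), (29.0, 34.0))):
    """Step 8: set the mean of the baseline windows to zero."""
    base = [v for g, v in zip(grid, series)
            if any(a <= g <= b for a, b in windows)]
    mean = sum(base) / len(base)
    return [v - mean for v in series]
```

Because the cardiac-gated images fall at different latencies in different blocks, pooling them by block-relative time yields a much denser sampling of the response than the ∼4–5 s interimage interval alone would allow.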
Behavioral data obtained during the fMRI acquisition showed that, although the task was intentionally difficult, 21 subjects achieved similar levels of performance, well above chance, on both tasks (Fig. 1B).
Attentive listening to the sounds activated ACs bilaterally (Fig. 2A). These activations were enhanced in the hemisphere contralateral to the attended ear compared with activation in the same hemisphere during ipsilateral attention (Fig. 2B). This contralateral enhancement was distinctly visible in the AC activation time courses (Fig. 2C). Repeated-measures ANOVA with factors hemisphere (left, right), attention (contra/ipsilateral), and time (5–10, 10–15, 15–20 s from block onset) showed a significant main effect of attention [F(1,20) = 51, P < 0.0001] and significant interactions hemisphere × attention [F(1,20) = 8.4, P = 0.009] and hemisphere × attention × time [F(2,40) = 3.3, P = 0.048]. Note that the imaging slice does not necessarily cover the same functional areas in the left and right AC. Thus the apparent differences between the left and right AC signal magnitudes cannot be interpreted directly.
The IC was also reliably activated by attentive listening to the sounds (Fig. 2A). ANOVA of the IC activation time courses (Fig. 2D; see Supplementary Figs. 1 and 2) showed that IC activations were higher during attention to the contralateral ear than during attention to the ipsilateral ear [main effect of attention: F(1,20) = 18, P = 0.0004]. IC activations were not constant throughout the task [main effect of time: F(2,40) = 15, P < 0.0001; interaction attention × time: F(2,40) = 3.5, P = 0.041]. This and the relatively low sampling rate (∼4 images per 20-s block) most likely explain why IC attention effects were not detected using the global voxel-wise analysis (Fig. 2B), which assumes constant effects.
In this study, we asked whether auditory spatial selective attention influences processing of auditory inputs in the human IC and showed that this is the case. An effect of selective attention on subcortical auditory processing is theoretically important because it is often assumed that selective attention relies on subcortical gating or filtering of irrelevant stimuli. The subcortical gating hypothesis has been the target of numerous previous studies using scalp-recorded evoked potentials. The results of these studies suggest that attention does not modulate the brain stem responses to clicks (Hackley et al. 1990) generated within 10 ms after sound onset. However, it is possible that the attention-related IC modulations occur later than stimulus-related brain stem responses and might thus overlap in time with cortical activations. Based on the scalp-recorded electric signals, it would be difficult to separate late IC activations generated deep in the brain from concurrent and typically larger cortical activations.
Several previous fMRI studies in human subjects have been able to measure IC activations to auditory stimulation (Guimaraes et al. 1998). Consistently, in this study, reliable IC activations in response to listening to broadband noise bursts (compared with silent baseline) were evident in most subjects even with relatively strict statistical criteria (Supplementary Fig. 1). Furthermore, previous fMRI studies have reported modulations of IC activations by subtle differences in various sound parameters including temporal regularity (Griffiths et al. 2001), presentation rate (Harms and Melcher 2002), bandwidth (Hawley et al. 2005), sound movement (Krumbholz et al. 2005), intensity (Sigalovsky and Melcher 2006), interaural time differences (Thompson et al. 2006), and acoustic context (Schönwiesner et al. 2007). The present findings show that, in addition to stimulus features, the task the subject is performing influences IC activations.
These results were obtained during a strictly controlled selective attention task used in numerous previous studies (Alho et al. 1999; Johnson and Zatorre 2006; Petkov et al. 2004; Rinne et al. 2005; Woldorff and Hillyard 1991). In different blocks, the subjects were required to selectively attend to the sounds at one ear at a time. To facilitate focused selective attention, the pitches of sequences presented to the two ears were distinctly different from each other, and the subjects' demanding task was to detect frequent, slight pitch changes within the attended sequence. Thus the task required highly focused selective attention throughout the experiment. Note that, despite the pitch difference, both ears received similar high-rate acoustic inputs consisting of broadband noise bursts. Because performance was similar for the left and right ear stimuli, it is very unlikely that the observed attention effects were caused by differences in arousal (Wickelgren 1968) related to task difficulty. It is also important to note that, consistent with similar previous studies using fast-rate acoustic stimulation (Alho et al. 1999; Woldorff and Hillyard 1991), the present IC and AC attention-related modulations were observed contralaterally to the attended ear. Such contralateralization would not be expected if the attention effects were caused by differences in the general arousal level of the subject.
Based on these fMRI data, it is not possible to determine whether the IC attention effects were caused by early (∼10 ms from sound onset) modulation of sensory activations (gating/filtering) or by later activation of some attention-related IC processes. Note also that the present IC attention effects could be caused by either attention-related enhancement (contralateral) or suppression (ipsilateral) of IC activations during the two attention conditions.
Because of its proximity to the IC, it might be argued that the present IC attention effects are actually caused by activation of the superior colliculus (SC). The SC contains a systematic map of auditory space (Middlebrooks and Knudsen 1984), has been implicated in covert visual spatial attention (Ignashchenkova et al. 2004), and unilateral cooling of the SC induces contralateral neglect of auditory stimuli (Lomber et al. 2001). In this study, attentive listening (vs. silence) was associated with distinct tectal activations that reached Z-scores corresponding to those found for the same contrast in the AC (Fig. 2A). Although there are auditory-responsive units in the SC, to our knowledge, sound-related fMRI activations in the SC have not been reported. This suggests that the SC contributes a negligible amount to the fMRI activations elicited during attentive listening. This study was carefully designed to detect reliable IC activations: Only a limited brain area consisting of three consecutive slices was imaged at relatively long interimage intervals (∼4–5 s) to reduce the acoustic noise associated with imaging, which could interfere with the attention task and decrease the signal differences between the task and silence blocks, especially in the IC. The first imaging slice was carefully oriented to include the IC and AC bilaterally. Immediately after the functional imaging, anatomical images were obtained using the same slice definitions (but with higher in-plane resolution) that were used for functional images. The IC is very easy to localize, and in most subjects, it could be identified (anatomically) even on the basis of (low-resolution) functional images. Although we cannot exclude the possibility that slight head movements during the scanning affected the anatomical accuracy of these data, we are quite confident that the tectal activations shown in Fig. 2A (see Supplementary Fig. 1 for individual data), and thus the IC ROIs used in the time series analysis that were based on these activations, are in the IC and not in the SC.
Because the contralateral IC attention effect was not detected using the conventional GLM analysis (Fig. 2B), it might be argued that this effect is weak and not reliable. This study was designed assuming that the (unknown) temporal characteristics of the IC attention effects could be different from those in AC and that the acoustic scanner noise could prevent the detection of IC effects. Thus fMRI data were acquired at relatively long interimage intervals (∼4–5 s) to minimize scanner noise, while the fully jittered acquisition (i.e., images were obtained at random time points relative to the block onset) allowed the reconstruction of the IC activation time series with higher temporal resolution. However, such acquisition potentially hinders the GLM analysis, which assumes constant effects. Because of long interimage intervals, there are only about four to five time points per 20-s block in the GLM analysis. Because the sampling rate was low, and the IC activations were not constant throughout the block (main effect of time and interaction of attention × time), it makes sense that the GLM analysis failed to detect any contralateral IC attention effects. Importantly, the analysis of the reconstructed activation time series with higher temporal resolution showed a clear effect of contralateral attention on IC activations (main effect of attention, n = 21, P = 0.0004).
In our previous study (Rinne et al. 2007) using data acquisition and analysis similar to this study (except that a 1.5-T scanner was used in that study), attention effects were found in the AC but not in the IC. In that study, an intermodal task was used in which the subjects attended to either visual or auditory stimuli and detected targets in the attended modality. Although the auditory and visual tasks were difficult, it is possible that attention demands were lower in that study than in this one: the previous task did not require within-modality selective attention, and time pressure was lower than in this study. In the previous study, targets appeared three to four times during a 30-s block, whereas in this study, pitch-change classification was required once per second (on average). Furthermore, in this study, the pitch of the sounds varied both between and within blocks, requiring continuous refocusing of attention, whereas in the previous study, the pitch of the sounds was constant. Thus the present intramodal attention effects might be caused by the higher attention demands resulting in systematic attention-related modulations of IC activation. In addition, this study required spatial auditory attention, whereas in the previous study, all sounds were presented in the same location. Given the role of the IC in spatial processing (King et al. 2001), the spatial task might be critical for producing these attention effects.
Researchers have discussed the possible role of attention in subcortical auditory processing for a long time. This study shows that human IC activation is significantly modulated by auditory selective attention and that this modulation depends on where in space attention is directed. These results were obtained by applying a strictly controlled selective-listening paradigm requiring highly focused selective attention throughout the experiment. These results indicate that IC operations are not solely stimulus driven but are also susceptible to top-down modulations related to behavioral goals. Future research is needed to clarify how such modulations influence auditory processing and to see whether similar modulations can be found in other subcortical processing stages.
This work was supported by Academy of Finland Grants 207180 to T. Rinne, 210587 to K. Alho, and 213938 and 213470 to M. Sams.
1 The online version of this article contains supplementary material.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2008 by the American Physiological Society