1Medical Research Council Institute of Hearing Research, University Park, Nottingham; and 2Medical Research Council Institute of Hearing Research, Glasgow Royal Infirmary, Glasgow, United Kingdom
Submitted 14 March 2005; accepted in final form 5 July 2005
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
50 Hz are important for accurate speech recognition (e.g., Drullman et al. 1994a
The auditory system must be able to represent temporal information across these different timescales. It is well established that the auditory cortex represents slow-rate envelope modulations explicitly using a temporal code that is phase locked to the stimulus. When amplitude-modulated (AM) stimuli are presented at modulation rates that span 1–500 Hz, most auditory cortical neurons have a discharge rate that is selectively tuned to low modulation rates; the best modulation frequencies range from about 3 to 30 Hz (e.g., Eggermont 1994; Schreiner and Urbas 1986, 1988; Wang et al. 2003). The observed selectivity is indeed a temporal, not a spectral, phenomenon because the same tuning pattern is found for frequency-modulated (FM) stimuli (Wang et al. 2003). It has been argued that fine-grained temporal patterns are more likely to be represented by the mean discharge rate of the population (Wang et al. 2003), although using moving ripple sounds it has been possible to demonstrate some preservation of precise spike timing (Elhilali et al. 2004). Results from human auditory neuroimaging have confirmed that sustained cortical responses occur predominantly for rates <10 Hz (Giraud et al. 2000; Harms and Melcher 2002; Liégeois-Chauvel et al. 2004) and equally for both AM and FM sounds (Hart et al. 2003). The human auditory cortex also responds to modulation rates >10 Hz, but the magnitude and shape of the response vary according to modulation rate. Steady-state evoked potentials can be recorded for stimuli that are modulated at rates between 1 and 200 Hz (Picton et al. 2003). Systematic effects of sinusoidal AM rate (10 to 98 Hz) on the steady-state MEG (magnetoencephalographic) response reveal a clearly detectable synchronized response at each rate, but as the rate increases to >40-Hz harmonics, the positive and negative deflections approximate more toward a sinusoid (Roß et al. 2000). Thus neural-coding mechanisms for temporal envelope and periodicity have been well explored.
Functional magnetic resonance imaging (fMRI) takes advantage of the association between neuronal activity and the local control of blood flow to localize activity in the brain. The fMRI signal from the human auditory cortex can localize activity to within a few millimeters, enabling response parameters to be mapped both within and across different auditory fields (e.g., frequency sensitivity; Talavage et al. 2004). Human fMRI data have identified degrees of functional segregation across regions of the auditory cortex both in terms of the range of best modulation rates (Giraud et al. 2000) and whether the response shape is sustained or phasic (Harms and Melcher 2003; Seifritz et al. 2002). The distribution of preferred responses is somewhat patchy and so no obvious organizational scheme has yet emerged from this research except to say that lateral Heschl's gyrus (HG) and planum temporale (PT) prefer low modulation rates (Hart et al. 2003).
Imaging studies have recently begun to map out the cortical distribution of responses to temporal acoustic patterns at rates >50 Hz and have begun to define those regions that are highly sensitive to periodicity (Griffiths et al. 1998, 2001; Krumbholz et al. 2003; Patterson et al. 2002; Penagos et al. 2004; Warren and Griffiths 2003) and fine structure (Budd et al. 2003; Krumbholz et al. 2005). Again HG and PT are strongly implicated in the analysis of these more rapid temporal variations in sound. Significant positive relationships have been revealed between activation in lateral HG and both the degree of monaural temporal regularity in a regular-interval noise (RIN) signal and the degree of correlation between a noise at the two ears (Budd et al. 2003; Griffiths et al. 1998; see also Krumbholz et al. 2003). Moreover, reanalysis of Budd's data (computing the difference between correlated and uncorrelated noise conditions) confirmed that the sensitivity to correlated noise did not extend beyond lateral HG (see Fig. 1). This result has led to speculation that the analysis of temporal pitch and interaural correlation (IAC) might engage the same bilateral auditory region, lateral HG (Budd et al. 2003).
We predict that if the activation in lateral HG reflects a common computational step in the analysis of temporal structure in sound, then the effects of RIN and correlated noise will equally engage the same neural basis in the human auditory cortex. Previous results are inconclusive because periodicity and fine structure have been tested in separate groups of listeners. The present study reports a novel set of results that tests this hypothesis using a single set of listeners.
| METHODS |
|---|
|
|
|---|
Eighteen adults (ten male, eight female) aged between 18 and 42 yr participated in the experiment. None of the participants was a trained psychophysical listener. All participants were screened for normal hearing (<20 dB HL) for pure-tone octave frequencies between 500 and 8,000 Hz and none had a history of audiological or neurological impairment. All participants were familiarized with the scanning environment and listening task before giving written consent. The experimental procedures were approved by the local health service ethics committee.
Stimuli and task
The starting point for stimulus generation was a random noise burst that was band-passed from 500 to 1,500 Hz and sampled at a rate of 44,100 Hz. The high-pass cutoff was set to 500 Hz to exclude the region normally associated with the dominant, resolved harmonics for these low-pitched stimuli, whereas a low-pass cutoff of 1,500 Hz is close to the upper limit for the effectiveness of binaural time difference cues for sound location. Noise bursts were created digitally by taking both real and imaginary components within the pass-band from Gaussian distributions and setting those components outside the pass-band to zero. The real part of the inverse fast Fourier transform of this signal was stored for further processing. Monaural temporal regularity was generated by delaying a copy of the random noise and adding it back to the original. The pitch characteristics of the noise are directly related to the temporal patterning within the signal. The pitch frequency is equal to the reciprocal of the delay (i.e., the period of the repeating interval) and the salience of the pitch is exponentially related to the number of delay-and-add iterations (e.g., Yost 1996, 1998; Yost et al. 1996). RIN is a good candidate for investigating the cortical basis for pitch sensitivity because the signal can be high-pass filtered to minimize resolved spectral components in the excitation pattern and pitch salience can easily be varied parametrically (Griffiths et al. 1998, 2001; Patterson et al. 2002; Warren and Griffiths 2003). Three sound conditions were created that differed in the amount of regular temporal structure. The first RIN condition contained 16 add-and-delay iterations, generating a clear periodic structure and thus a salient percept of temporal pitch. The second RIN condition contained only one add-and-delay iteration, thus generating a weak percept of temporal pitch. The third sound condition was random noise with no regular structure and thus no pitch.
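The delay-and-add idea can be illustrated with a minimal Python sketch. It is not the authors' stimulus code: it skips the 500 to 1,500 Hz band-pass step, assumes unit gain and an "add the delayed copy of the current signal" network, and uses hypothetical names (`delay_and_add`, `normalized_autocorr`). It only shows that temporal regularity at the delay grows with the number of iterations and that the nominal pitch is the reciprocal of the delay.

```python
import random

def delay_and_add(signal, delay_samples, iterations):
    """Apply `iterations` passes of y[n] = x[n] + x[n - delay] (unit gain assumed)."""
    out = list(signal)
    for _ in range(iterations):
        prev = out
        out = [prev[n] + (prev[n - delay_samples] if n >= delay_samples else 0.0)
               for n in range(len(prev))]
    return out

def normalized_autocorr(signal, lag):
    """Autocorrelation at `lag`, normalized by the zero-lag energy."""
    energy = sum(v * v for v in signal)
    cross = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
    return cross / energy

FS = 44100                          # sampling rate given in the text (Hz)
delay = round(0.0091 * FS)          # 9.1-ms delay expressed in samples
pitch_hz = FS / delay               # pitch = reciprocal of the delay

rng = random.Random(0)              # fixed seed so the sketch is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(int(0.45 * FS))]  # one 450-ms burst

# The three regularity levels in the text: 0, 1, and 16 iterations.
autocorr_by_iterations = {}
for n_iter in (0, 1, 16):
    rin = delay_and_add(noise, delay, n_iter)
    autocorr_by_iterations[n_iter] = normalized_autocorr(rin, delay)

print(round(pitch_hz, 1), autocorr_by_iterations)
```

The normalized autocorrelation at the delay is used here only as a convenient proxy for temporal regularity; it is not a measure of perceived pitch salience.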
These three levels of temporal regularity were crossed with two levels of IAC (correlations of zero and unity) to create six noise conditions. Correlated signals, with an IAC of unity, were created by copying the same noise waveform to left and right channels. This diotic signal is perceived as a compact source at the center of the head. Uncorrelated signals were created by generating two statistically independent waveforms so that no time delay would make the waveforms in the left and right channels match. Listeners often report an uncorrelated noise as two separate sound sources at the two ears (e.g., Blauert and Lindemann 1986), although some listeners report a spatially diffuse percept, especially when they are naïve psychophysical subjects. In all six noise conditions, the noise was spectrally equivalent because it always originated from the same samples of random noise. Additional supplementary information, including audio examples, is available.1
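As a rough illustration of the two IAC levels, the sketch below builds a diotic pair (same waveform in both channels) and an independent pair, then estimates the zero-lag correlation between channels. The function name `pearson` and the sample count are illustrative assumptions; the published stimuli were band-limited, which this sketch ignores.

```python
import math
import random

def pearson(left, right):
    """Zero-lag normalized correlation between two channels."""
    n = len(left)
    mean_l, mean_r = sum(left) / n, sum(right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    return cov / math.sqrt(var_l * var_r)

rng = random.Random(1)                       # fixed seed for repeatability
n_samples = 20000                            # arbitrary length for the sketch
source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Correlated (IAC = 1): the same waveform is copied to both ears.
iac_correlated = pearson(source, list(source))

# Uncorrelated (IAC = 0): two statistically independent waveforms.
other = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
iac_uncorrelated = pearson(source, other)

print(iac_correlated, iac_uncorrelated)
```

With finite samples the independent pair gives a value close to, but not exactly, zero.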
Each noise condition contained a sequence of 16 of the same type of noise bursts (450-ms duration with 10-ms raised cosine onset and offset ramps and 50-ms interburst intervals). The burst rate was 2 Hz and each sequence lasted 8 s. Thirty different noise sequences were generated for each of the six noise conditions. For the RIN conditions, the delays in the add-and-delay algorithm were chosen from a set of 16 steps ranging from 9.1 to 19.6 ms (i.e., 51–110 Hz). They were varied randomly from burst to burst to create a changing pitch sequence. Noise sequences plus 30 silent epochs, each lasting 8 s, were presented to listeners in a pseudorandomized order. Listeners were requested to press a button in response to each occurrence of the silent epoch. All noise sequences were presented at 80 dB SPL using high-fidelity headphones that had been specifically modified for use during fMRI (Palmer et al. 1998).
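The timing and pitch figures in this paragraph can be checked with a few lines of Python. The linear spacing of the 16 delay steps is an assumption (the text gives only the end points), and `burst_gain` is a hypothetical helper that sketches one common raised-cosine ramp.

```python
import math

BURST_MS, GAP_MS, RAMP_MS, N_BURSTS = 450, 50, 10, 16   # values from the text

def burst_gain(t_ms):
    """Raised-cosine onset/offset envelope for one burst (0 outside the burst)."""
    if t_ms < 0 or t_ms > BURST_MS:
        return 0.0
    edge = min(t_ms, BURST_MS - t_ms)        # distance to the nearer burst edge
    if edge >= RAMP_MS:
        return 1.0
    return 0.5 * (1 - math.cos(math.pi * edge / RAMP_MS))

sequence_s = N_BURSTS * (BURST_MS + GAP_MS) / 1000       # sequence duration
burst_rate_hz = 1000 / (BURST_MS + GAP_MS)               # bursts per second

# 16 delay steps between 9.1 and 19.6 ms; linear spacing is assumed here.
N_STEPS, SHORT_MS, LONG_MS = 16, 9.1, 19.6
step = (LONG_MS - SHORT_MS) / (N_STEPS - 1)
delays_ms = [SHORT_MS + i * step for i in range(N_STEPS)]
pitches_hz = [1000 / d for d in delays_ms]               # pitch = 1 / delay

print(sequence_s, burst_rate_hz, round(min(pitches_hz)), round(max(pitches_hz)))
```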
fMRI acquisition
Echo-planar brain images were acquired using a 3-T whole body scanner equipped with a head-gradient coil (Bowtell et al. 1994) and a custom-built radio-frequency receiver coil (Nova Medical, Tyngsboro, MA). Each image set constituted 22 slices that were selected to give the best coverage of the superior temporal gyrus. Coverage generally excluded the frontal and occipital poles and so before the functional experiment, an additional whole-brain set of 64 coronal slices was acquired to facilitate image postprocessing. Contiguous T2*-weighted functional images were acquired using the parameters TE = 35 ms and flip angle = 90°. Each set of images was acquired in the coronal orientation (64 × 64 matrix; 4-mm³ voxel resolution; 2.2-s clustered acquisition time). The time interval between each functional image set was 9.0 s. This sparse sampling method enabled the auditory stimulus to be delivered predominantly during the quiet period between image acquisitions to reduce acoustic masking and other contaminating effects of background scanner noise (Edmister et al. 1999; Hall et al. 1999). A sustained hemodynamic response has been demonstrated for a burst rate of 2 Hz (Harms and Melcher 2002), and so the single image set acquired at the end of each noise sequence provides a response measure that is as representative as one acquired at an earlier time point in the sequence. Participants were strongly encouraged to remain as still as possible during scanning.
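A back-of-envelope check of the sparse-sampling timing is sketched below. It assumes one 9.0-s cycle per image set and uses the 211 image sets and 8-s sequences reported elsewhere in the Methods; the exact placement of each sequence within the cycle is not stated in the text, so only a lower bound on the overlap with scanner noise is computed.

```python
CYCLE_S = 9.0      # interval between image sets (from the text)
ACQ_S = 2.2        # clustered acquisition time (from the text)
STIM_S = 8.0       # duration of one noise sequence (from the text)
N_SETS = 211       # image sets per participant (from the text)

quiet_s = CYCLE_S - ACQ_S                      # scanner-quiet time per cycle
min_overlap_s = max(0.0, STIM_S - quiet_s)     # least possible overlap with acquisition
session_min = N_SETS * CYCLE_S / 60            # approximate functional run length

print(round(quiet_s, 1), round(min_overlap_s, 1), round(session_min, 2))
```

Under these assumptions most of each sequence falls in the quiet period, which is consistent with the word "predominantly" in the text.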
Image preprocessing
The functional time series for each participant (211 image sets) was processed and analyzed using SPM99 software (http://www.fil.ion.ucl.ac.uk/spm) that is supported by the Matlab platform (The MathWorks, Natick, MA). The first processing stage included correcting the time series for rigid body brain movement using the first image in the time series as a reference. Movement parameters are calculated for six orientations (translations in the x, y, and z planes and rotations around the three axes). Acceptability criteria were movements not exceeding a translation of 4 mm and a rotation of 4° (with a majority of translations and rotations being <1 mm and 1°). To average brain images across individuals, it is necessary to reduce intersubject variability by transforming all the data into a common brain space. SPM99 offers a range of whole-brain templates that conform to the same standardized brain space defined by the Montreal Neurological Institute (MNI). Our procedure therefore used the individual 64-slice image to compute the spatial transformations needed to warp each participant's brain to the T2*-weighted image template. The resulting transformation parameters were then applied to the coregistered functional time series to create a set of normalized images with an upsampled voxel resolution of 2 mm³. The final processing stage applied a Gaussian smoothing kernel of 8 mm full width at half-maximum to improve the signal-to-noise ratio and to satisfy the assumptions of intersubject averaging. The size of the kernel was much less than the distance of lateral HG from its neighbors. For example, according to Patterson et al. (2002), the normalized center of central HG lies at a Euclidean distance of 15 mm from the center of lateral HG in either hemisphere. Thus our choice of voxel size and smoothing kernel avoids smearing one auditory field into another. Low-frequency artifacts in the time series were dealt with by applying a high-pass filter at 0.3 cycles/min.
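The claim that an 8-mm kernel is small relative to the 15-mm separation can be made concrete with the standard relation between full width at half-maximum and the Gaussian standard deviation. The sketch below is illustrative only (SPM99 performs the actual smoothing); `gaussian_weight` is a hypothetical helper giving the kernel height relative to its peak.

```python
import math

FWHM_MM = 8.0                                   # smoothing kernel from the text
# Standard conversion: FWHM = 2 * sqrt(2 * ln 2) * sigma
sigma_mm = FWHM_MM / (2 * math.sqrt(2 * math.log(2)))

def gaussian_weight(distance_mm):
    """Kernel height at a given distance, relative to the peak (1-D profile)."""
    return math.exp(-distance_mm ** 2 / (2 * sigma_mm ** 2))

weight_at_15mm = gaussian_weight(15.0)          # central HG to lateral HG distance

# High-pass cutoff of 0.3 cycles/min expressed as a period in seconds.
cutoff_period_s = 60 / 0.3

print(round(sigma_mm, 2), weight_at_15mm, round(cutoff_period_s))
```

The relative weight at 15 mm is several orders of magnitude below the peak, which supports the statement that the kernel does not appreciably blur one field into the other.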
Statistical analyses in SPM99 implement the general linear model and partition the observed response as a sum of several weighted variables. Statistical contrasts between stimulus conditions can be specified by any linear combination of these variables. The significance of each contrast is determined relative to the scan-to-scan residual variability. The most parsimonious model is one that describes the greatest influences on the signal using the fewest orthogonal variables (giving a model with the greatest possible degrees of freedom). We used a model that specified 11 variables encompassing four stimulus variables, six head-movement variables, and a constant term for the mean image value for each individual data set. Each variable was a column of digits, one for each set of images. Our stimulus variables were 1) the presence of any noise condition (coded as 1 for presence and −1 for absence), 2) the values of IAC giving rise to different spatial percepts (coded as 1 for correlated and −1 for uncorrelated), and 3) the levels of temporal regularity giving different sensations of pitch salience (coded as 0, 3.0, and 12.3 with values being defined by the function "10 Log [number of add-and-delay iterations + 1]"; Griffiths et al. 1998). A fourth variable (4) was included to account for any additional nonlinearity in the response to temporal regularity. This variable took the form of a stepwise response that simply defined the presence of an RIN condition by a value of 1 and its absence by a value of −1.
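The coding of the four stimulus variables can be sketched for the six noise conditions as follows. This is a hedged illustration, not the SPM99 design matrix itself: the +1/−1 sign convention is assumed, a base-10 logarithm is assumed (it reproduces the stated values 0, 3.0, and 12.3), and the coding of silent epochs, head-movement variables, and the constant term is omitted.

```python
import math

def pitch_salience(iterations):
    """10 * log10(number of add-and-delay iterations + 1), as quoted in the text."""
    return 10 * math.log10(iterations + 1)

# Six noise conditions: three regularity levels crossed with two IAC levels.
conditions = [(iters, iac) for iters in (0, 1, 16) for iac in (1, 0)]

design = []
for iters, iac in conditions:
    design.append([
        1,                                   # variable 1: a noise condition is present
        1 if iac == 1 else -1,               # variable 2: correlated vs. uncorrelated
        round(pitch_salience(iters), 1),     # variable 3: linear pitch-salience term
        1 if iters > 0 else -1,              # variable 4: stepwise presence of RIN
    ])

salience_levels = sorted({row[2] for row in design})
for (iters, iac), row in zip(conditions, design):
    print(f"iterations={iters:2d} IAC={iac} -> {row}")
```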
To explore the question about a common neural basis for the sensitivity to temporal acoustic structure, we first considered the patterns of activation related to pitch and spatial width by defining four feature-related contrasts: the increasing response to temporal regularity (pitch salience) (variable 3), the additional nonlinear stepwise response to the presence of temporal pitch (variable 4), the response to correlated noise (compact source) (variable 2), and the response to uncorrelated noise (split, broad source) (defined by the negative sign of variable 2). Although these analyses were performed for each individual data set, we primarily consider the overall significance of the stimulus-related activation for the group of 18 listeners, determined using a second level of analysis that dealt with the between-subject variance. These group analyses were computed in SPM99 by performing a one-sample t-test on the individual summary outputs from the four feature-related contrasts. Activation in the auditory cortex was reported only if it exceeded a voxel-level significance threshold of P < 0.001 (uncorrected for multiple comparisons). As well as reporting the location of activation using the standard three-dimensional coordinate system, we also describe it with respect to the gross anatomy of the superior temporal gyrus. Of particular interest is the precise location of activation around HG and so our descriptive scheme pays careful attention to whether the activation is located in lateral, central, or medial portions of HG and/or Heschl's sulcus. There is some anatomical support for partitioning HG in this way using reported differences in its laminar architecture (Morosan et al. 2001). Although the subdivisions cannot be determined from an MR image, the gyrus and sulci are clearly visible from the 64-slice reference image acquired for each listener (an example of which is shown in Fig. 2). Activation beyond the superior temporal region was reported only if it reached a corrected cluster-level threshold of P < 0.05. This more conservative statistical threshold is more appropriate when inferences are made in the absence of any planned prediction about activation being present in that region. In the present study, we had no strong expectations about finding activation beyond the superior temporal region that was related to pitch or spatial width.
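The second-level analysis described above amounts to a one-sample t-test at each voxel across the listeners' contrast values. The sketch below shows that computation for a single voxel; the contrast values are invented for illustration and are not data from the study, and `one_sample_t` is a hypothetical helper rather than an SPM99 function.

```python
import math
import statistics

def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation (n - 1)
    return (mean - mu) / (sd / math.sqrt(n)), n - 1

# Hypothetical contrast estimates at one voxel for 18 listeners (made-up numbers).
contrast_values = [0.8, 1.1, 0.3, 0.9, 1.4, 0.2, 0.7, 1.0, 0.5,
                   1.2, 0.6, 0.9, 0.4, 1.3, 0.8, 0.1, 1.0, 0.7]

t_value, df = one_sample_t(contrast_values)
print(round(t_value, 2), df)
```

With 18 listeners the test has 17 degrees of freedom; the resulting t value would then be compared against the chosen voxel-level or cluster-level threshold.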
| RESULTS |
|---|
|
|
|---|
First, we will discuss the spatial location of those voxels that displayed a growth in their response as a function of increasing temporal regularity (pitch strength) (Table 1). The result from the group analysis is presented (in blue) for five slices through the auditory cortex in Fig. 3A. There was widespread activation across the auditory cortex in both left and right hemispheres. Most of the peaks of activation were focused around the primary auditory cortex (medial HG), but activation extended out to the lateral convexity as well as anteriorly and posteriorly along the length of the superior temporal gyrus to include parts of the surrounding nonprimary auditory cortex on the PT and planum polare. Conversely, we inverted the linear growth contrast to identify candidate regions that decreased in their response as the pitch strength increased. No activation was found to reach significance.
Next, we will discuss the spatial location of those voxels showing an additional stepwise response to the presence of temporal regularity (pitch). The result of the group analysis is presented (in cyan) in Fig. 3A. These response regions were primarily located in nonprimary auditory cortex (i.e., the lateral portion of HG) and extended below the surface of the superior temporal gyrus. In these voxels the pitch-related activity had both linear and nonlinear components. The additional nonlinear response was more limited than the area showing a purely linear response to pitch salience and some asymmetry was observed (right hemisphere activity greater than left). When the individual data were examined, the distribution of suprathreshold pitch-related activation was found to vary somewhat from listener to listener; in seven listeners it was bilateral, in seven it was purely right-sided, and in four it was absent. Nevertheless, when present, its location was moderately consistent across listeners because up to six listeners had activation at the same voxel location (x 58, y 6, z 2 mm), which lies just underneath the lateral portion of HG. The range of the overlap across listeners for the additional nonlinear response to pitch is shown in Fig. 3B (second row).
Although the response to both temporal pitch contrasts occurred most prominently in the lateral portion of HG, the extensive sensitivity to pitch salience leads us to conclude that pitch processing probably engages multiple auditory regions that extend beyond HG. The strong growth in activity in medial HG as a function of temporal regularity could be interpreted as a physical analysis of periodic structure or a perceptual analysis of pitch because the two attributes co-vary. The spread of pitch-related activation across the planum polare is consistent with a role in processing pitch changes, including melodic pitch sequences, in the sound sequence (Griffiths et al. 1998; Patterson et al. 2002). Although planum polare was activated in both hemispheres, these effects were typically greater in the right hemisphere, providing another parallel to the data reported by Patterson et al. (2002).
Sensitivity related to spatial width
We sought to identify regions that were more sensitive to a compact sound source (correlated noise) than to a broad, split sound source (uncorrelated noise) and vice versa. Sensitivity to a compact sound source involved bilateral regions of the inferior parietal lobe and the lateral convexity of the right superior temporal gyrus, just below the lateral HG. No other brain region reached significance in the group analysis. The activity in the right auditory cortex is shown (in green) in Fig. 3A and its peak was only 6 mm away from the peak identified in the data reported by Budd et al. (2003) (as shown in Fig. 1). The group activity pattern was not well replicated because when the thresholded individual maps were superimposed on one another, no two listeners overlapped at the same voxel (Fig. 3B, third row). However, the individual activation maps revealed several listeners to have significant activation (P < 0.001) in the anterior insula, which extended across 41 voxels in the left and 42 voxels in the right hemisphere and which overlapped in two listeners. The centroids of these clusters were x 38, y 2, z 14 mm and x 34, y 6, z 10 mm, but the activity can also be seen on the overlap images at z = 16 and 8 mm. The functional role of the anterior insula is unclear.
Sensitivity to the broad, split sound source revealed moderate clusters of activation in bilateral auditory cortex, which predominantly had a medial focus just behind and below HG (Fig. 3A, in yellow). The activation encompassed both primary and nonprimary regions of the auditory cortex on HG and PT, respectively. Although five of the individual listeners had no significant auditory activation in response to the broad, split sound source, the individual activation patterns for 13 listeners showed a distribution that was broadly similar to that of the group (Fig. 3B, bottom row). In six listeners it was bilateral, but in five it was present only on the left side and in two listeners it was present only on the right side. There was moderate consistency in the location of the response to the broad, split sound because eight of those listeners activated precisely the same left-sided medial location (x 50, y 26, z 4 mm), immediately behind HG. This site was also close to the peak voxel for the group. The posterior insula activation that was obtained from the group analysis reached threshold in only one of the individual listeners.
In summary, several auditory cortical regions were sensitive to the spatial width of the sound source. However, these regions were many times smaller than the region that was sensitive to pitch, and they were more variable across listeners. These results illustrate that the pattern of significant activation for the group not only reflects the suprathreshold activation in the individual listeners, but is also influenced by the individuals' subthreshold responses. It is interesting to note that the location of the most reliable response to the broad, split sound source was the same voxel in the PT that also showed the most statistically reliable response to pitch strength (x 50, y 26, z 4 mm). In the following section, we quantify the degree to which regions of the auditory cortex are sensitive to both types of temporal acoustic structure.
Co-localization of responses to pitch and spatial width
By displaying the group results for the four feature-related contrasts on the same brain view (Fig. 3A), the overlap between the different activation maps can be appreciated. The schematic diagram in Fig. 4 quantifies the number of voxels that are coactivated by more than one of the contrasts:
More than 90% (213/230) of the voxels that responded to the presence of pitch were also sensitive to the rise in temporal regularity. Their location corresponds to the lateral portion of HG. In lateral HG, the stepwise response to the manipulation of temporal regularity lends further support to the idea of a perceptual center that plays a special role in representing both the presence of pitch and its salience.
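Counting coactivated voxels is a set-intersection problem. The sketch below uses two tiny, invented sets of voxel coordinates to show the computation, then evaluates the reported ratio of 213 out of 230 voxels. The coordinates and the helper name `overlap_fraction` are hypothetical.

```python
def overlap_fraction(map_a, map_b):
    """Fraction of suprathreshold voxels in map_a that are also in map_b."""
    return len(map_a & map_b) / len(map_a)

# Invented voxel coordinates (x, y, z in mm) for two thresholded contrast maps.
pitch_presence = {(58, -6, 2), (60, -6, 2), (58, -8, 2), (56, -6, 4)}
pitch_salience = {(58, -6, 2), (60, -6, 2), (58, -8, 2), (50, -26, 4)}

toy_fraction = overlap_fraction(pitch_presence, pitch_salience)

# The ratio reported in the text for the real maps.
reported_fraction = 213 / 230

print(toy_fraction, round(100 * reported_fraction, 1))
```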
Interaction between the two sound features
The three- x two-way interaction between pitch salience and spatial width reached significance in a bilateral auditory region located just behind HG, in Heschl's sulcus and PT (see Fig. 5). The separate two-way interactions are informative about the form of the interaction and their results indicate that the region is particularly responsive to the effect of the uncorrelated noise when the sound also has a strong pitch compared with when it has no pitch. To confirm the shape of the interaction, the MR signal (adjusted for the constant term relating to the mean image value) was extracted for the left and right peak voxel coordinates for every listener and the group mean value for each of the six sound conditions was computed. As the graphs in Fig. 5 reveal, sensitivity to the broad, split sound source was greatest when the sound also had a strong temporal pitch. Indeed, as Table 1 reveals, the peak coordinates of the interaction lie very close to those voxels showing a significant sensitivity to the broad, split sound source. The shape of the interaction shown by the peak voxels was the same as that in the rest of the region because plots of the mean MR signal for the entire cluster followed a similar pattern.
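The form of the interaction described here is a difference of differences: the effect of uncorrelated versus correlated noise at strong pitch, minus the same effect at no pitch. The cell means below are invented purely to illustrate the arithmetic and are not values from Fig. 5.

```python
# Hypothetical mean MR signal (arbitrary units) for four of the six conditions,
# keyed by (pitch level, IAC). These numbers are made up for illustration.
cell_means = {
    ("none", 1): 1.0,      # no pitch, correlated
    ("none", 0): 1.0,      # no pitch, uncorrelated
    ("strong", 1): 1.5,    # strong pitch, correlated
    ("strong", 0): 2.5,    # strong pitch, uncorrelated
}

def width_effect(pitch_level):
    """Response to uncorrelated minus correlated noise at one pitch level."""
    return cell_means[(pitch_level, 0)] - cell_means[(pitch_level, 1)]

# A positive value means the broad, split source drives a larger response
# when the sound also has a strong pitch (a superadditive pattern).
interaction = width_effect("strong") - width_effect("none")
print(interaction)
```

If the two factors were purely additive, the width effect would be the same at every pitch level and this contrast would be zero.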
The interaction result was unexpected and no clear explanation seemed immediately apparent. When creating the stimuli, we had made the assumption that because the manipulations of temporal iteration and IAC operated on different scales of temporal structure, their subsequent effects on perception and brain activation might be additive, but not interactive. Because our fMRI results revealed a superadditive response for uncorrelated noise and strong pitch, would there also be an interaction for perception? To test this prediction, we recruited a new set of 20 naïve psychophysical listeners because the original set of listeners was no longer available. Listeners were simply asked to describe their experience of a set of sound stimuli. Our procedure followed that of Blauert and Lindemann (1986), in which listeners were asked to draw a map of each auditory event using schematic sections through the head's vertical and horizontal planes. This method brings the listener's attention to the position and extent of the auditory percept. For this test, we presented four examples of a binaural 8-s sound sequence, taken directly from each of the "no pitch" and "strong pitch" fMRI conditions. Thus the sequences were 1) random noise, IAC = 0, 2) random noise, IAC = 1, 3) RIN = 16 add-and-delay iterations, IAC = 0, and 4) RIN = 16 add-and-delay iterations, IAC = 1. Four monaural conditions were also presented to provide baseline measures. These were left- and right-ear versions of the "no pitch" and "strong pitch" conditions.
Figure 6 shows the distribution of the auditory images that were mapped for the monaural and binaural stimuli. Monaural noise bursts were reliably perceived to have a single sound source to the left or right, determined by the ear of presentation. Lateralization judgments were not influenced by whether the noise had a pitch. A majority of listeners (16/20) perceived both sequences of correlated noise bursts to have a single compact source around a central, midline location. Again judgments about lateralization were not strongly affected by any additional pitch information. In contrast, there was a good deal of variability in the perceived position of the binaurally uncorrelated stimuli. From our sample of 20 listeners, five listeners described the two sequences of uncorrelated noise as having two separate sources to the left and right, whereas three always described a single, more diffuse source. The remaining listeners described two lateralized sources on one occasion and a diffuse source on another. One striking observation was that seven of the listeners also stated that the source of the uncorrelated noise appeared to shift from burst to burst for the RIN sequence in which pitch varied from burst to burst. The perceived direction of the location shift was not systematic across listeners. It was either lateralized or crossed the midline and either oscillating or circular. In contrast, the uncorrelated "no pitch" noise sequence was never perceived to be spatially varying. Note that the pitch sequences contained a random melody, yet changing the pitch is itself insufficient to induce a sensation of spatial shifts because listeners always described both the correlated noise with pitch and the monaural noise with pitch as being static. The perceptual data therefore also reveal an interaction between temporal regularity and interaural correlation that neatly mirrors the same interaction in the fMRI data.
The response to the uncorrelated "strong pitch" noise appears to be most different from the other stimulus conditions, in terms of both the way it is perceived and its pattern of cortical activation.
| DISCUSSION |
|---|
|
|
|---|
Overlap in the sensitivity to monaural temporal regularity and IAC occurred in several regions of the nonprimary auditory cortex (close to the lateral part of HG and in PT) providing the first evidence that these two sound features engage the same auditory regions. The strongest interpretation of this result would be to conclude that the coactivation reflects the common neural computation of temporal structure. However, two observations from the present study are incompatible with this conclusion. Most notably, our data clearly revealed two separate regions of IAC sensitivity determined by whether the signal was correlated or uncorrelated across the two ears. Across-frequency integration is a step that ought to be applied equally to both types of binaural signal because it is only the output of this process that differs according to the IAC value, not the process itself. Therefore this stage of the model would not predict correlated and uncorrelated noise to have separate neural substrates. Second, the group results demonstrate that the patterns of activation for pitch and spatial width were typically different to a greater degree than they were similar. Lateral HG was highly responsive to pitch, whereas its sensitivity to correlated noise reached significance only in the right hemisphere and even here the effect size (Z value) was reduced. This asymmetry suggests that any common neural computation for periodicity and fine structure is unlikely to be instantiated in lateral HG, but probably at an earlier stage in the auditory pathway. If cortical activation reflects a later operation then it would be performed on the output of the frequency integration process, possibly associated with the perceptual qualities of the sounds. We therefore infer that the cortical activation is a neural correlate of pitch and spatial width rather than monaural temporal regularity and IAC. 
Although fMRI cannot reveal the precise neural code in lateral HG, we expect the cortical code to be an abstract high-level one. For example, as a result of the general loss of rapid temporal synchrony, the cortex is proposed to operate on a transformed representation of the temporal information and not on the original rapid timing code (Eggermont 2001; Palmer 1995; Wang 2003). An abstract cortical code for pitch has some empirical support from human listeners because lateral HG is responsive to pitch irrespective of whether it is generated by the temporal pattern in noise (Griffiths et al. 1998; Patterson et al. 2002) or is a complex tone containing resolved or unresolved harmonics (Penagos et al. 2004).
The relevance of individual differences
We observed that the pitch-related activation was highly consistent across all the listeners, whereas that associated with different spatial widths was both less distinct and more highly variable across listeners. We view these individual differences to be informative because they support the poor association between the neural correlates of pitch and spatial width processing in human auditory cortex. Individual differences in spatial width activation may represent distinctions between the ease of perceiving that feature in the sound. Although somewhat anecdotal, listeners were more readily able to distinguish between the sound sequences along the dimension of pitch than along the dimension of spatial width. Uncorrelated noise is not commonly heard in the environment and so might be difficult to judge perceptually and to assign a verbal label. Given their relative unfamiliarity with these synthesized stimuli, different listeners might adopt different strategies for listening to and describing the spatial attributes of the sound. By strategy, we mean to incorporate the idea of different listening modes. A well-known example is off-frequency listening, when listeners can use information in different frequency regions to improve their target discrimination performance in masking tasks (O'Loughlin and Moore 1981). An individual's listening experience can also influence the listening mode for uncorrelated noise because trained and naïve listeners are reported to differ in the way they interpret the spatial characteristics of broadband signals when the value of IAC is <1 (Blauert 1978). Whereas practiced psychoacoustic listeners can distinguish separate auditory objects, naïve listeners are more likely to report a single broadened object (see also Blauert and Lindemann 1986). Indeed, in our survey of 20 naïve listeners, we recorded a wide range of perceptual experiences for the uncorrelated noise, including reports of either one diffuse or two separate lateralized auditory objects. Some of the naïve listeners differed even in their perception of the correlated noise. Although we were unable to collect perceptual data from those who participated in the scanning study, the qualitative data from the new listeners strongly indicate that the perceptual ambiguity in the spatial characteristics of the uncorrelated noise might be sufficient to underlie the observed patterns of results, particularly the intersubject variability obtained in the fMRI data.2 We conclude from the data presented here that there is a strong need to relate individual patterns of cortical activation to each listener's own perceptual experience, as well as to the acoustical attributes of the sound.
The interaction between pitch- and spatial-related processing
We discuss two possible functional interpretations of the interaction effect found in PT: that PT is a site for either 1) processing sounds that vary in spatial location or 2) integrating object- and location-based sound information to segregate auditory objects. First, if we infer that the fMRI interaction is a correlate of the perceptual interaction, then PT might simply be implicated in the processing of auditory space. The role of PT in spatial analysis is well established. It is responsive to sound sources that move in space (Baumgart et al. 1999; Hart et al. 2004; Warren et al. 2002) or shift in their location from burst to burst (Krumbholz et al. 2005; Warren and Griffiths 2003). PT is also sensitive to multiple simultaneous sources that have a broad spatial distribution (Zatorre et al. 2002). In the present study, naïve listeners described the uncorrelated "pitch" noise as shifting in location from noise burst to noise burst. We speculate that this sensation could be induced because the uncorrelated noise was spatially ambiguous and because another (nonspatial) attribute of the sound was changing within the sequence (i.e., the frequency of the pitch). The key point here is the putative association between the reported sensation of shifts in spatial position and the greater PT activation by uncorrelated noise.
The second inference considers PT as a more general-purpose system implicated in higher-order sound analysis, including the representation and segregation of spectrotemporal sound patterns (Griffiths and Warren 2002). The interaction revealed by our fMRI data is consistent with a cortical process that integrates spectrotemporal features of a sound with spatially relevant information, an important process in sound segregation. Electrophysiological data from primates support the claim for a distinct region of the nonprimary auditory cortex that is responsible for the integration of object-related sound features with information about the sound's location in space. A likely neural candidate for integration in the primate auditory cortex is the lateral auditory belt because here neurons are jointly sensitive to both the spatial position and the spectrotemporal features of a sound (Rauschecker and Tian 2000; Tian et al. 2001). For example, a neuron's response preference for a particular monkey call co-varies with that neuron's spatial selectivity, especially in the posterior portion of the lateral belt (areas CL and CM). The human homologue of the lateral belt is most probably part of PT (see APPENDIX A in Hackett 2003). In human listeners, PT activation has been reported to co-vary with the spatial distribution of a set of acoustically distinctive sounds (Zatorre et al. 2002). Thus PT appears to be particularly sensitive to spatial position when object-related features are also present in the sound. Again, in the present experiment PT responded most strongly to the uncorrelated noise that had a salient pitch. Because this sound condition was the one judged to contain several auditory objects, the increased computational resource required to derive and integrate spatial and pitch-related outputs could account for the high level of PT activity. The observation (illustrated in Fig. 5) that decreasing spectrotemporal distinctiveness (i.e., pitch salience) was associated with a decrease in PT activity supports an integrative processing account in which the magnitude of the response is contingent on both the spectrotemporal and spatial attributes of the stimulus.
The results from this study clearly rule out the hypothesis that lateral HG plays an equivalent role in representing the temporal acoustic structure for temporal pitch and binaurally correlated stimuli. The interaction effect in PT was an unexpected result whose interpretation cannot be resolved by the present data. The precise functions of PT are not well defined because it is responsive to many different types of complex sound (Griffiths and Warren 2002) and yet is not always jointly sensitive to pitch and location (Warren and Griffiths 2003). Future research should probe the role of PT in object formation by addressing the types of listening conditions that engage PT and by exploring evidence for functional subdivisions.
| GRANTS |
|---|
| ACKNOWLEDGMENTS |
|---|
Present address of Q. Summerfield: Department of Psychology, The University of York, Heslington, York, YO10 5DD, UK.
| FOOTNOTES |
|---|
1 The Supplementary Material for this article (text and audio examples) is available online at http://jn.physiology.org/cgi/content/full/00271.2005/DC1.
2 T-test comparisons are particularly sensitive to the degree of response variance in either of the pairwise conditions and so variability in the response to uncorrelated noise conditions would be sufficient to reduce the effect size for correlated noise conditions.
Address for reprint requests and other correspondence: D. A. Hall, MRC Institute of Hearing Research, University Park, Nottingham, NG7 2RD, UK (E-mail: d.hall{at}ihr.mrc.ac.uk)
| REFERENCES |
|---|
Blauert J. Some aspects of three-dimensional hearing in rooms. Proc 1st Int Conf on Fundam Approaches to Software Eng, Warsaw 3: 65–68, 1978.
Blauert J and Lindemann W. Spatial mapping of intracranial auditory events for various degrees of interaural coherence. J Acoust Soc Am 79: 806–813, 1986.
Bowtell R, Mansfield P, Coxon RJ, Harvey RJ, and Glover PM. High-resolution EPI at 3.0 T. Magn Reson Mater Phys Med Biol 2: 1–5, 1994.
Budd TW, Hall DA, Gonçalves MS, Akeroyd MA, Foster JR, Palmer AR, Head K, and Summerfield AQ. Binaural specialisation in human auditory cortex: an fMRI investigation of interaural correlation sensitivity. NeuroImage 20: 1783–1794, 2003.
Drullman R, Festen JM, and Plomp R. Effect of temporal envelope smearing on speech reception. J Acoust Soc Am 95: 1053–1064, 1994a.
Drullman R, Festen JM, and Plomp R. Effect of reducing slow temporal modulations on speech reception. J Acoust Soc Am 95: 2670–2680, 1994b.
Edmister WB, Talavage TM, Ledden PJ, and Weisskoff RM. Improved auditory cortex imaging using clustered volume acquisitions. Hum Brain Mapp 7: 89–97, 1999.
Eggermont JJ. Temporal modulation transfer functions for AM and FM stimuli in cat auditory cortex. Effects of carrier type, modulating waveform and intensity. Hear Res 74: 51–66, 1994.
Eggermont JJ. Between sound and perception: reviewing the search for a neural code. Hear Res 157: 1–42, 2001.
Elhilali M, Fritz JB, Klein DJ, Simon JZ, and Shamma SA. Dynamics of precise spike timing in primary auditory cortex. J Neurosci 24: 1159–1172, 2004.
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R, and Kleinschmidt A. Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84: 1588–1598, 2000.
Griffiths TD, Büchel C, Frackowiak RSJ, and Patterson RD. Analysis of temporal structure in sound by the human brain. Nat Neurosci 1: 422–427, 1998.
Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O, and Patterson RD. Encoding of the temporal regularity of sound in the human brainstem. Nat Neurosci 4: 633–637, 2001.
Griffiths TD and Warren JD. The planum temporale as a computational hub. Trends Neurosci 25: 348–353, 2002.
Hackett TA. The comparative anatomy of the primate auditory cortex. In: Primate Audition: Ethology and Neurobiology, edited by Ghazanfar AA. Boca Raton, FL: CRC Press, 2003, p. 199–225.
Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott MR, Gurney E, and Bowtell RW. Sparse temporal sampling in auditory fMRI. Hum Brain Mapp 7: 213–223, 1999.
Harms MP and Melcher JR. Sound repetition rate in the human auditory pathway: representations in the waveshape and amplitude of fMRI activation. J Neurophysiol 88: 1433–1450, 2002.
Harms MP and Melcher JR. Detection and quantification of a wide range of fMRI temporal responses using a physiologically-motivated basis set. Hum Brain Mapp 20: 168–183, 2003.
Hart HC, Palmer AR, and Hall DA. Amplitude- and frequency-modulated stimuli activate common regions of human auditory cortex. Cereb Cortex 13: 773–781, 2003.
Hart HC, Palmer AR, and Hall DA. Different areas of human non-primary auditory cortex are activated by sounds with spatial and nonspatial properties. Hum Brain Mapp 21: 178–190, 2004.
Krumbholz K, Patterson RD, Seither-Preisler A, Lammertmann C, and Lutkenhoner B. Neuromagnetic evidence for a pitch processing center in Heschl's gyrus. Cereb Cortex 13: 765–772, 2003.
Krumbholz K, Schönwiesner M, Yves von Cramon D, Rübsamen R, Shah NJ, Zilles K, and Fink GR. Representation of interaural temporal information from left and right auditory space in the human planum temporale and inferior parietal lobe. Cereb Cortex 15: 317–324, 2005.
Liégeois-Chauvel C, Lorenzi C, Trébuchon A, Régis J, and Chauvel P. Temporal envelope processing in the human left and right auditory cortices. Cereb Cortex 14: 731–740, 2004.
Morosan P, Rademacher J, Schleicher A, Amunts K, Schormann T, and Zilles K. Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage 13: 684–701, 2001.
O'Loughlin BJ and Moore BCJ. Off-frequency listening: effects on psychophysical tuning curves obtained in simultaneous and forward masking. J Acoust Soc Am 69: 1119–1125, 1981.
Palmer AR. Neural signal processing. In: Hearing, edited by Moore BCJ. San Diego, CA: Academic Press, 1995, p. 75–121.
Palmer AR, Bullock DC, and Chambers JD. A high-output, high-quality sound system for use in auditory fMRI. NeuroImage 7: S359, 1998.
Patterson RD, Handel S, Yost WA, and Datta AJ. The relative strength of the tone and the noise components in iterated rippled noise. J Acoust Soc Am 100: 3286–3294, 1996.
Patterson RD, Uppenkamp S, Johnsrude I, and Griffiths TD. The processing of temporal pitch and melody information in auditory cortex. Neuron 36: 767–776, 2002.
Penagos H, Melcher JR, and Oxenham AJ. A neural representation of pitch salience in nonprimary human auditory cortex revealed with functional magnetic resonance imaging. J Neurosci 24: 6810–6815, 2004.
Penhune VB, Zatorre RJ, Macdonald JD, and Evans AC. Interhemispheric anatomical differences in human primary auditory cortex: probabilistic mapping and volume measurement from magnetic resonance scans. Cereb Cortex 6: 661–672, 1996.
Picton TW, John MS, Dimitrijevic A, and Purcell D. Human auditory steady-state responses. Int J Audiol 42: 177–219, 2003.
Rauschecker JP and Tian B. Mechanisms and streams for processing of "what" and "where" in auditory cortex. Proc Natl Acad Sci USA 97: 11800–11806, 2000.
Roß B, Borgmann C, and Draganova R. A high-precision magnetoencephalographic study of human auditory steady-state responses to amplitude-modulated tones. J Acoust Soc Am 108: 679–691, 2000.
Schreiner CE and Urbas JV. Representation of amplitude-modulation in the auditory-cortex of the cat. 1. The anterior auditory field (AAF). Hear Res 21: 227–241, 1986.
Schreiner CE and Urbas JV. Representation of amplitude-modulation in the auditory-cortex of the cat. 2. Comparison between cortical fields. Hear Res 32: 49–64, 1988.
Seifritz E, Esposito F, Hennel F, Mustovic H, Neuhoff JG, Bilecen D, Tedeschi G, Scheffler K, and Di Salle F. Spatiotemporal pattern of neural processing in the human auditory cortex. Science 297: 1706–1708, 2002.
Shackleton TM, Meddis R, and Hewitt MJ. Across frequency integration in a model of lateralization. J Acoust Soc Am 91: 2276–2279, 1992.
Shannon RV, Zeng FG, Kamath V, Wygonski J, and Ekelid M. Speech recognition with primarily temporal cues. Science 270: 303–304, 1995.
Stern RM and Trahiotis C. Models of binaural interaction. In: Hearing, edited by Moore BCJ. San Diego, CA: Academic Press, 1995, p. 347–386.
Talavage TM, Sereno MI, Melcher JR, Ledden PJ, Rosen BR, and Dale AM. Tonotopic organization in human auditory cortex revealed by progressions of frequency sensitivity. J Neurophysiol 91: 1282–1296, 2004.
Tian B, Reser D, Durham A, Kustov A, and Rauschecker JP. Functional specialization in rhesus monkey auditory cortex. Science 292: 290–293, 2001.
Wang X, Lu T, and Liang L. Cortical processing of temporal modulations. Speech Commun 41: 107–121, 2003.
Warren JD and Griffiths TD. Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain. J Neurosci 23: 5799–5804, 2003.
Warren JD, Zielinski BA, Green GGR, Rauschecker JP, and Griffiths TD. Perception of sound-source motion by the human brain. Neuron 34: 139–148, 2002.