We examined the mechanics of online experience-dependent auditory plasticity by assessing the influence of prior context on the frequency-following responses (FFRs), which reflect phase-locked responses from neural ensembles within the subcortical auditory system. FFRs were elicited to a Cantonese falling lexical pitch pattern from 24 native speakers of Cantonese in three contexts: a variable context, wherein the falling pitch pattern randomly occurred in the context of two other linguistic pitch patterns; a patterned context, wherein the falling pitch pattern was presented in a predictable sequence along with two other pitch patterns; and a repetitive context, wherein the falling pitch pattern was presented with 100% probability. We found that neural tracking of the stimulus pitch contour was most faithful and accurate when the listening context was patterned and least faithful when the listening context was variable. The patterned context elicited more robust pitch tracking relative to the repetitive context, suggesting that context-dependent plasticity is most robust when the context is predictable but not repetitive. Our study demonstrates a robust influence of prior listening context that works to enhance online neural encoding of linguistic pitch patterns. We interpret these results as indicative of an interplay between contextual processes that are responsive to predictability as well as novelty in the presentation context.
NEW & NOTEWORTHY Human auditory perception in dynamic listening environments requires fine-tuning of sensory signals based on behaviorally relevant regularities in the listening context, i.e., online experience-dependent plasticity. Our findings suggest that online experience-dependent plasticity is partly underlain by interplaying contextual processes in the subcortical auditory system that are responsive to both predictability and novelty in the listening context. These findings add to the literature that looks to establish the neurophysiological bases of auditory system plasticity, a central issue in auditory neuroscience.
- suprasegmental processing
- subcortical auditory plasticity
- frequency-following response
- predictive tuning
Human auditory perception often occurs in dynamic and complex listening environments. For the most part, humans demonstrate remarkably successful and consistent perceptual abilities in everyday communication. Accurate perception requires that the auditory system discovers and extracts behaviorally relevant regularities from a nonstationary soundscape, and fine-tunes and reorganizes sensory signals on the fly (Large and Jones 1999; Winkler et al. 2009). The ability of the auditory system to perform these processes online in the ambient auditory environment has been extensively studied as a form of online experience-dependent auditory plasticity (Chandrasekaran et al. 2014; Skoe et al. 2014). At least two fundamental neural mechanisms have been hypothesized to underlie online auditory plasticity: predictive coding, a process that computes and extracts statistical relationships of objects in stimuli, from which expectancies of future sounds can be built and continuously tested (Baldeweg 2006; Chandrasekaran et al. 2014; Lupyan and Clark 2015), and stimulus-specific adaptation (SSA), a process that attenuates repetitive sensory presentation to enhance the processing of novel stimuli (Natan et al. 2015).
Prior work on animal models has shown that online subcortical encoding of auditory signals is dynamically modified by top-down cortical feedback (Suga 2008). This top-down mechanism is executed via corticofugal pathways, which are feedback loops that back-project from auditory cortical regions onto subcortical structures like the inferior colliculus (IC) (Winer et al. 1998). Also, direct neuron recordings from anesthetized rats have indicated that IC neurons demonstrate SSA to commonly recurring auditory stimuli (Pérez-González et al. 2005; Malmierca et al. 2009). Although the exact neuronal mechanism underlying SSA is still under debate, a neuronal cooling study has shown that SSA at the level of the IC was largely unaffected by the cortex (Anderson and Malmierca 2013). An emerging view is that SSA at the level of the IC is largely generated by a mechanism local to the IC that is sensitive to stimulus novelty [i.e., the difference of a stimulus in one or more dimensions compared with previously occurring stimuli (Näätänen and Picton 1987; Wang et al. 2010)]. Repetitive presentation, in which a stimulus is always identical to its prior stimulus given any position, may lead to greater synaptic depression, leading to more robust detection of stimuli that are more novel (i.e., less repetitive) in other presentation contexts.
In humans, a recent functional magnetic resonance imaging (fMRI) study has shown that the processing of unexpected auditory stimuli in an oddball paradigm yielded activation in the left IC, thus converging with evidence from animal models on the role of the IC in online subcortical plasticity (Cacciaglia et al. 2015). However, electrophysiological studies looking at wave V of the auditory brainstem response (ABR), which is thought to be generated in ensembles within the IC (Picton 2010), failed to find such an unexpected deviance-related effect (Slabu et al. 2010; Althen et al. 2011). One interpretation of this lack of effect is that wave V may be generated at the ascending lemniscal portions of the IC, which are not sensitive to novelty (Escera et al. 2014).
The scalp-recorded frequency-following responses (FFRs), on the other hand, reflect phase-locked responses from neural ensembles within the auditory brainstem and midbrain following the phasic ABR (Chandrasekaran and Kraus 2010). Prior work in humans has extensively characterized the FFR as an index of long-term and short-term subcortical auditory plasticity. Although FFRs, unlike cortical responses, cannot be evoked based on top-down online auditory regularities (e.g., evoked to a suddenly omitted stimulus in a highly regular sequence) (Lehmann et al. 2016), previous studies have shown that the presumably bottom-up FFRs can at least be modulated by online auditory regularities. Representations of stimulus features in the FFR are enhanced when a stimulus is highly predictable, such as within a repetitive presentation context (Chandrasekaran et al. 2009; Parbery-Clark et al. 2011; Strait et al. 2011) or in musical patterns (Skoe et al. 2013). Extrapolating from these studies, subcortical auditory processing, as indexed by the FFRs, is less robust when the stimulus presentation is not easily predictable, such as when presented as an oddball (Slabu et al. 2012; Skoe et al. 2014) or in a random context (Chandrasekaran et al. 2009; Parbery-Clark et al. 2011; Strait et al. 2011; Skoe et al. 2013).
These FFR studies have provided critical evidence that stimulus predictability modulates online subcortical plasticity. Such predictability-related plasticity may be driven by top-down predictive coding via corticocollicular feedback loops (Chandrasekaran et al. 2014). Yet, the extent to which stimulus predictability interacts with stimulus novelty in mediating neural plasticity in the same participants, to our knowledge, has not been examined. In previous study designs, stimulus novelty in the listening context covaried with transitional probability (i.e., the probability of the target stimulus transitioning from any stimulus in a single step) (Chandrasekaran et al. 2009; Strait et al. 2011; Parbery-Clark et al. 2011; Slabu et al. 2012; Skoe et al. 2014). Therefore, disentangling potential local novelty-dependent plasticity from top-down predictability-dependent plasticity in previous human FFR studies is challenging. As such, despite evidence of local SSA demonstrated in the IC in animal models, compelling evidence of the interaction between local novelty-related mechanisms and top-down predictability-related mechanisms in mediating human subcortical auditory plasticity is still lacking.
To address this research gap, the current study investigated the extent to which stimulus novelty modulated subcortical auditory encoding in addition to stimulus predictability in the same participants. Here, we examined the neural encoding of dynamic lexical pitch patterns in Cantonese (Fig. 1), a tone language, in native speakers of Cantonese. Participants listened to the same falling pitch pattern in three different contexts: 1) a variable context, wherein the tone occurred randomly with two other tones at a 33% probability; 2) a repetitive context, wherein the tone was presented at a 100% probability; and 3) a patterned context, wherein the tone was presented nonrepetitively in a predictable pattern with two other tones (100% transitional probability). Because prior evidence has shown both novelty-dependent (Cacciaglia et al. 2015) and predictability-dependent forms of subcortical plasticity in humans (Chandrasekaran et al. 2009; Slabu et al. 2012; Strait et al. 2011), we predicted that both stimulus predictability and stimulus novelty would modulate Cantonese speakers' subcortical representation of the target falling lexical pitch pattern. Given that the target pitch pattern was more novel in the variable context (lower stimulus probability) compared with the repetitive context, if effects of stimulus novelty were more robust than stimulus predictability, then we would expect subcortical representations to be enhanced in the variable context relative to the repetitive context. However, in line with prior studies examining segmental speech information (Chandrasekaran et al. 2009; Slabu et al. 2012; Strait et al. 2011), we predicted enhanced representation of the lexical pitch pattern in the repetitive context, relative to the variable context.
Such results would not only extend the finding of context-dependent subcortical processing to the domain of suprasegmental information in speech signals but also suggest that top-down processes like predictive coding may be more robust in mediating plasticity than stimulus novelty-related local processes. Crucially, on top of this, we predicted the most robust encoding in the patterned context, which would confirm the effect of stimulus novelty-related modulation independent of stimulus predictability-related effects. Our basis for this prediction was that since transitional probability was 100% in both patterned and repetitive contexts (hence identical stimulus predictability), top-down processes would be equally active in extracting regularities in the signal. However, relative to the repetitive context, local processes such as synaptic depression may be minimized in the patterned context in which Tone 4 presentations were more novel (i.e., less repetitive) due to their prior patterned presentation with the two other tones.
MATERIALS AND METHODS
Twenty-four participants (12 male; age: M = 23.8 yr, SD = 5.1) recruited by advertisements through the mass e-mail services at the Chinese University of Hong Kong were selected for the current study. All participants were native speakers of Hong Kong Cantonese who reported no neurological/psychiatric impairments. Participants self-reported normal hearing in both ears and demonstrated pure-tone air conduction thresholds of 25 dB or better at frequencies of 500, 1,000, 2,000, and 4,000 Hz. Informed consent approved by The Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee was obtained from each participant before any experimental procedure. Electrophysiological testing took place in the Laboratory for Language, Learning, and the Brain at the Chinese University of Hong Kong. All participants were compensated monetarily for their participation.
Speech stimuli used for electrophysiological testing consisted of three Cantonese lexical tones, namely Tone 1 (T1, high-level pitch pattern), Tone 2 (T2, high-rising pitch pattern), and Tone 4 (T4, low-falling pitch pattern). The three tones were carried by the same syllable /ji/, which, in combination with the lexical tones, yields three different Cantonese words: /ji1/ (T1, “doctor”), /ji2/ (T2, “chair”), and /ji4/ (T4, “son”). The stimuli were identical to ones used in a previous study (Liu et al. 2014). The stimuli were produced by a male native speaker of Cantonese. The stimuli were normalized for duration (175 ms) and intensity (74 dB SPL). As such, the f0 (fundamental frequency) contour is the main acoustic feature that differs across the stimuli: the f0 contours for T1, T2, and T4 range from 141 to 143, 105 to 132, and 86 to 99 Hz, respectively (Fig. 1). Native speakers of Cantonese (the first and second author) confirmed the stimuli to be natural exemplars of their respective lexical tone categories.
These three Cantonese tones were chosen because their phonemic distinctions have been reported to be stable in the language, i.e., these distinctions do not collapse under diachronic effects like sound change (Mok et al. 2013) or synchronic processing mechanisms like talker normalization (Wong and Diehl 2003) across the population.
Among the three tones, a priori, we chose to focus on FFRs elicited by T4 to avoid a ceiling effect on pitch tracking metrics, because FFR pitch tracking is relatively weaker for the falling tone T4 compared with T1 and T2 (Liu et al. 2014).
We presented the T4 syllable in three contexts, namely a variable context, a repetitive context, and a patterned context (Fig. 2). In the variable context condition, 1,980 sweeps of T4 were presented randomly in the context of T1 (1,980 sweeps) and T2 (2,040 sweeps) at a probability of 33% (Fig. 2A, top). In the repetitive context condition, 6,000 sweeps of T4 were presented with a probability of 100% (Fig. 2A, middle). In the patterned context, we presented T4 nonrepetitively along with T1 and T2 in a fixed sequence. In this patterned context condition, 2,000 sweeps of T4 were presented, and each sweep was preceded by a fixed sequence of one trial of T1 followed by one trial of T2 (Fig. 2A, bottom). The transitional probability of occurrence of a T4 trial was therefore controlled at 100% in both patterned context (T4 after a T2) and repetitive context (T4 after a T4) conditions (Fig. 2B, middle and bottom) but was much lower for the variable context at 33% (Fig. 2B, top). To control for the relative location of the T4 trials within the stream of all stimuli across conditions, we conducted separate event-matching (Chandrasekaran et al. 2009) to control for trial order between the variable and repetitive conditions (Fig. 2A, top and middle), as well as the patterned and repetitive conditions (Fig. 2A, middle and bottom).
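The three presentation contexts and their transitional probabilities can be illustrated with a minimal sketch. The sweep counts are taken from the description above; the function names, the fixed random seed, and the choice to estimate transitional probability from adjacent pairs in the stream are assumptions made for illustration and are not the presentation software's actual logic.

```python
import random
from collections import Counter

TARGET = "T4"

def variable_context(seed=0):
    # Sweep counts from the text: 1,980 T4, 1,980 T1, 2,040 T2, randomly ordered.
    pool = ["T4"] * 1980 + ["T1"] * 1980 + ["T2"] * 2040
    random.Random(seed).shuffle(pool)  # fixed seed is for illustration only
    return pool

def repetitive_context():
    # 6,000 sweeps of the target tone only.
    return ["T4"] * 6000

def patterned_context():
    # Fixed T1 -> T2 -> T4 cycle, giving 2,000 sweeps of T4.
    return ["T1", "T2", "T4"] * 2000

def transitional_probability(sequence, target=TARGET):
    """Return {previous tone: P(next tone == target | previous tone)}."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    prev_counts = Counter(sequence[:-1])
    return {prev: pair_counts[(prev, target)] / n
            for prev, n in prev_counts.items()}

tp_variable = transitional_probability(variable_context())
tp_repetitive = transitional_probability(repetitive_context())
tp_patterned = transitional_probability(patterned_context())
```

In this sketch, T4 follows T2 with certainty in the patterned stream and follows itself with certainty in the repetitive stream, whereas in the shuffled stream every preceding tone predicts T4 at roughly one in three.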
Electrophysiological Recording Procedures
Electrophysiological recording took place in an acoustically and electromagnetically shielded booth. During recording, participants were told to ignore the stimuli and to rest or sleep in a reclining chair, consistent with prior FFR recording protocols (Krishnan et al. 2004; Skoe and Kraus 2010). Stimuli were presented in a single polarity to the participant's right ear through electromagnetically shielded insert earphones (ER-3A; Etymotic Research, Elk Grove Village, IL) at 80 dB SPL. A 10-min six-talker babble noise in Cantonese was looped continuously in the background at a signal-to-noise ratio level of 0 dB to avoid potential ceiling effects on FFR metrics observed in pilot experiments that were conducted without noise. Stimuli in all conditions were presented with a 250-ms stimulus-onset asynchrony (SOA). Stimuli were presented via the presentation software Neuroscan Stim2 (Compumedics, El Paso, TX). For 20 participants (out of 24) who chose to complete the whole set of electrophysiological recording on a single day, the order of the context conditions was counterbalanced across participants. For the other four participants, electrophysiological recordings for the repetitive and variable context conditions occurred on a single day, and the order of these two conditions was counterbalanced. Their recording session for the patterned context condition took place on a later separate date. The recording of each condition lasted roughly 25 min. The total duration of testing, including preparation time, was ∼90 min for each participant.
Electrophysiological responses were recorded using a SynAmps2 Neuroscan system (Compumedics) with Ag-AgCl scalp electrodes, and digitized at a sampling rate of 20,000 Hz using CURRY Scan 7 Neuroimaging Suite (Compumedics). We used a vertical electrode montage (Skoe and Kraus 2010) that differentially recorded electrophysiological responses from the vertex (Cz, active) to bilateral linked mastoids (M1 + M2, references), with the forehead as ground. Contact impedance was <2 kΩ for all electrodes.
Filtering, artifact rejection, and averaging were performed offline using CURRY 7 (Compumedics). Responses were bandpass filtered from 80 to 2,500 Hz (12 dB/octave) to isolate subcortical activity from cortical contamination and mimic the phase-locking limit of the subcortical auditory system (Skoe and Kraus 2010). Trials with activity exceeding ±35 μV were considered artifacts and rejected. Responses to the T4 stimulus were averaged with a 275-ms epoching window encompassing 50 ms before stimulus onset, the 175 ms of the stimulus, and 50 ms after stimulus offset. Responses in the repetitive context condition were averaged according to their occurrence relative to the order of presentation in the variable context condition and, separately, to the order of presentation in the patterned context condition. The average number of trials was ∼1,800 trials per condition.
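The rejection, averaging, and event-matching steps described above can be sketched as follows. This is an illustrative reading rather than the CURRY 7 pipeline: the function names are hypothetical, epochs are assumed to be lists of already-filtered sample values in μV, and event-matching is assumed to mean selecting repetitive-context trials at the stream positions where T4 occurred in the reference (variable or patterned) stream.

```python
def average_clean_epochs(epochs, threshold_uv=35.0):
    """Drop epochs whose absolute peak exceeds the threshold, then average.

    Returns (point-by-point average, number of epochs kept).
    """
    kept = [ep for ep in epochs if max(abs(v) for v in ep) <= threshold_uv]
    if not kept:
        raise ValueError("all epochs were rejected as artifacts")
    n_kept = len(kept)
    average = [sum(samples) / n_kept for samples in zip(*kept)]
    return average, n_kept

def event_matched_positions(reference_sequence, target="T4"):
    """Stream positions at which the target tone occurs in the reference stream."""
    return [i for i, tone in enumerate(reference_sequence) if tone == target]

def event_matched_average(repetitive_epochs, reference_sequence, target="T4"):
    """Average only the repetitive-context epochs at the matched positions."""
    positions = event_matched_positions(reference_sequence, target)
    selected = [repetitive_epochs[i] for i in positions]
    return average_clean_epochs(selected)

# Toy data (two samples per epoch), not recordings from the study.
toy_reference = ["T1", "T4", "T2", "T4"]
toy_repetitive = [[0.0, 0.0], [2.0, 4.0], [100.0, 0.0], [4.0, 6.0]]
toy_average, toy_n = event_matched_average(toy_repetitive, toy_reference)
```

Matching on stream position keeps the compared sub-averages at comparable points in the recording, so that differences between conditions are less likely to reflect where in the session the trials fell.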
FFR data were analyzed using customized MATLAB (The Mathworks, Natick, MA) scripts adapted from the Brainstem Toolbox (Skoe and Kraus 2010). Before analysis, the stimulus was down-sampled to 20,000 Hz to match the sampling rate of the response. For each FFR, we first calculated its estimated onset delay relative to the stimulus presentation time (neural lag) due to neural conduction of the auditory pathway. This neural lag value was computed using a cross-correlation technique that slid the response waveform (the portion of the FFR waveform from 0 to 175 ms) and the stimulus waveform in time with respect to one another (Liu et al. 2014). The neural lag value (in ms) was taken as the time point at which maximum positive correlation was achieved between 6 and 12 ms, the expected latency of the onset component of the auditory brainstem response, with the transmission delay of the ear inserts also taken into account (Bidelman et al. 2011; Strait et al. 2012). Then, the f0 (fundamental frequency) contour from the averaged FFR was derived using a sliding window autocorrelation-based procedure (Krishnan et al. 2004; Wong et al. 2007). To estimate how f0 values changed through the waveform, the portion of the waveform that corresponds to the vowel portion of the stimulus (25–175 ms of stimulus, cf. Fig. 1, shifted by neural lag in the FFR) was divided into 100 bins, each 50 ms long (49-ms overlap between adjacent time bins). Each of the 100 time bins was time shifted in 1-ms steps with a delayed version of itself, and a Pearson's r was calculated at each 1-ms interval. The time lag to achieve maximum correlation within each bin was recorded. The reciprocal of this time lag represented an estimate of f0 of that bin. The resulting f0 values formed a 100-point f0 contour. The f0 contour of the stimulus was also derived separately using the same procedure, but the 25- to 175-ms analysis window of the waveform was not shifted by the neural lag.
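The neural lag estimate and the sliding-window autocorrelation can be sketched in a few lines. This is not the MATLAB code used in the study: the function names are hypothetical, the lag search is assumed to run at one-sample resolution over an assumed 70 to 200 Hz range, the response is assumed to start at stimulus onset, and the ear-insert transmission delay is not modeled.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def neural_lag_ms(stimulus, response, fs, lo_ms=6.0, hi_ms=12.0):
    """Lag (ms) within [lo_ms, hi_ms] that maximizes stimulus-response correlation.

    Assumes response[0] coincides with stimulus onset and equal lengths.
    """
    n = len(stimulus)
    lo, hi = round(lo_ms * fs / 1000), round(hi_ms * fs / 1000)
    best = max(range(lo, hi + 1),
               key=lambda lag: pearson_r(stimulus[:n - lag], response[lag:n]))
    return 1000.0 * best / fs

def f0_contour(wave, fs, start_s=0.025, n_bins=100, bin_s=0.050,
               step_s=0.001, fmin=70.0, fmax=200.0):
    """One f0 estimate (Hz) per overlapping bin, from the best autocorrelation lag.

    For a response, start_s would be shifted by the neural lag.
    """
    start, step, width = round(start_s * fs), round(step_s * fs), round(bin_s * fs)
    if start + (n_bins - 1) * step + width > len(wave):
        raise ValueError("waveform too short for the requested bins")
    min_lag, max_lag = int(fs / fmax), int(fs / fmin)  # assumed search range
    contour = []
    for k in range(n_bins):
        seg = wave[start + k * step: start + k * step + width]
        best_lag = max(range(min_lag, max_lag + 1),
                       key=lambda lag: pearson_r(seg[:-lag], seg[lag:]))
        contour.append(fs / best_lag)  # reciprocal of the lag, in Hz
    return contour

# Demo on a synthetic 100-Hz tone (not FFR data), sampled at 4 kHz to keep it fast.
fs = 4000
tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(800)]
delayed = [0.0] * 32 + tone[:-32]  # copy of the tone delayed by 32 samples (8 ms)
lag_ms = neural_lag_ms(tone, delayed, fs)
contour = f0_contour(tone, fs, n_bins=10)
```

Restricting the lag search to a plausible f0 range keeps the estimator from locking onto multiples of the true period, which correlate just as strongly as the period itself.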
Subsequent analyses focused on whether and how neural pitch tracking varied as a function of online stimulus predictability and novelty.
We derived two main metrics previously used to define the fidelity of the neural responses to linguistic pitch patterns (Wong et al. 2007; Song et al. 2008; Skoe et al. 2014; Liu et al. 2014): 1) Stimulus-to-response correlation, and 2) f0 error. Stimulus-to-response correlation (values between −1 and 1) is the Pearson's correlation coefficient (r) between the stimulus and response f0 contours. It indicates the similarity between the stimulus and response f0 contours in terms of the strength and direction of their linear relationship (Wong et al. 2007; Liu et al. 2014). F0 error (in Hz) is the mean absolute Euclidean distance between the stimulus and response f0 contours across the 100 bins in the autocorrelation analysis. This metric represents the pitch encoding accuracy of the FFR by reflecting how many Hz the FFR f0 contour deviates from the stimulus f0 contour on average (Song et al. 2008; Skoe et al. 2014).
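The two pitch tracking metrics capture different aspects of fidelity, which a small sketch makes concrete. The function names and the toy contours are hypothetical; real contours would contain the 100 bin-wise f0 estimates described above.

```python
import math

def stimulus_to_response_correlation(stim_f0, resp_f0):
    """Pearson's r between the stimulus and response f0 contours."""
    n = len(stim_f0)
    ms, mr = sum(stim_f0) / n, sum(resp_f0) / n
    cov = sum((s - ms) * (r - mr) for s, r in zip(stim_f0, resp_f0))
    var_s = sum((s - ms) ** 2 for s in stim_f0)
    var_r = sum((r - mr) ** 2 for r in resp_f0)
    return cov / math.sqrt(var_s * var_r)

def f0_error(stim_f0, resp_f0):
    """Mean absolute difference (Hz) between the two contours across bins."""
    return sum(abs(s - r) for s, r in zip(stim_f0, resp_f0)) / len(stim_f0)

# Toy contours (Hz), not data from the study: the response follows the falling
# shape of the stimulus but sits 2 Hz too high in every bin.
stim = [99.0, 95.0, 91.0, 86.0]
resp = [101.0, 97.0, 93.0, 88.0]
r = stimulus_to_response_correlation(stim, resp)
err = f0_error(stim, resp)
```

In this toy case the correlation is essentially perfect while the f0 error is 2 Hz, illustrating why a contour can match the stimulus in shape yet still be offset in absolute frequency, and why both metrics are reported.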
In addition, the signal-to-noise ratio (SNR) of each FFR was also derived to assess whether the overall magnitude of neural activation over the entire FFR period (relative to prestimulus baseline) (Russo et al. 2004) varied as a function of stimulus context. To derive the SNR of each FFR, the root mean square (RMS) amplitudes (the mean absolute value of all sample points of the waveform within the respective time windows, in μV) of the FFR period (neural lag to neural lag +175 ms) and the prestimulus baseline period (−50 to neural lag) of the waveform were first recorded. The quotient of the FFR RMS amplitude and the prestimulus RMS amplitude was taken as the SNR value (Russo et al. 2004).
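The SNR computation can be sketched as below. The function names are hypothetical, the epoch is assumed to begin 50 ms before stimulus onset, and the sketch assumes the conventional RMS definition (square root of the mean of the squared sample values).

```python
import math

def rms(samples):
    """Root mean square amplitude (conventional definition, assumed here)."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def ffr_snr(epoch, fs, neural_lag_ms, pre_ms=50.0, stim_ms=175.0):
    """Ratio of response-period RMS to prestimulus-baseline RMS.

    epoch[0] is assumed to fall pre_ms before stimulus onset.
    Baseline window: epoch start up to the neural lag.
    Response window: neural lag up to neural lag + stim_ms.
    """
    response_start = round((pre_ms + neural_lag_ms) * fs / 1000)
    response_end = response_start + round(stim_ms * fs / 1000)
    baseline = epoch[:response_start]
    response = epoch[response_start:response_end]
    return rms(response) / rms(baseline)

# Toy 275-ms epoch at 1 kHz (not FFR data): +/-1 uV before the response,
# +/-3 uV during it, silence afterwards.
toy_fs = 1000
toy_epoch = [1.0, -1.0] * 29 + [3.0, -3.0] * 87 + [3.0] + [0.0] * 42
toy_snr = ffr_snr(toy_epoch, toy_fs, neural_lag_ms=8.0)
```

An SNR near 1 would indicate that the response period is no larger than the baseline noise, so values well above 1 are needed before the pitch tracking metrics can be interpreted.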
Before subsequent parametric statistical analyses, stimulus-to-response correlation values were first converted into Fisher's z' scores (Wong et al. 2007), as Pearson's correlation coefficients are not normally distributed. To directly examine the extent to which FFR pitch encoding and phase locking varied as a function of the three types of stimulus context (variable, repetitive, and patterned contexts) overall, we conducted one-way repeated-measures ANOVAs on the FFR metrics (stimulus-to-response correlation, f0 error, and SNR). We note that directly comparing encoding across the three context conditions (comparing the event-matched variable and repetitive conditions with the non-event-matched patterned condition) is limited by a potential confound of presentation order. To account for this confound, we conducted a second analysis that compared the T4 FFRs in the variable condition to the event-matched T4 FFRs in the repetitive condition. A third analysis was conducted to compare the T4 FFRs in the patterned condition to the event-matched T4 FFRs in the repetitive condition. Two sets of separate paired-sample t-tests were used to compare the mean stimulus-to-response correlation, f0 error, and SNR between variable context and repetitive context conditions and, separately, between patterned context and its separately event-matched repetitive context conditions.
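The Fisher transformation and the paired comparison can be sketched with the standard formulas. The function names and the toy values are hypothetical, and the sketch stops at the t statistic and degrees of freedom because the Python standard library has no t-distribution for the P value.

```python
import math

def fisher_z(r):
    """Fisher's z' transform of a Pearson correlation: atanh(r)."""
    return math.atanh(r)

def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for a minus b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1

# Toy per-participant correlations (not data from the study).
toy_r_patterned = [0.80, 0.65, 0.90, 0.70]
toy_r_repetitive = [0.60, 0.55, 0.85, 0.50]
z_patterned = [fisher_z(r) for r in toy_r_patterned]
z_repetitive = [fisher_z(r) for r in toy_r_repetitive]
t_stat, df = paired_t(z_patterned, z_repetitive)
```

Transforming before testing matters most for correlations near the ends of the range, where the raw scale compresses differences that the z' scale spreads out.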
Direct Comparison Between Variable, Repetitive, and Patterned Context Conditions
One-way repeated measures ANOVA on stimulus-to-response correlation with the Greenhouse-Geisser correction revealed significant differences between the three context conditions [F(1.519, 34.947) = 10.607, P = 0.001]. Planned comparisons revealed that stimulus-to-response correlation of the repetitive condition was significantly higher than that of the variable condition [F(1, 23) = 5.007, P = 0.035], and that stimulus-to-response correlation of the patterned condition was significantly higher than that of the repetitive condition [F(1, 23) = 9.684, P = 0.005].
The one-way repeated measures ANOVA on f0 error with the Greenhouse-Geisser correction also revealed significant differences between the three context conditions [F(1.482, 34.095) = 8.973, P = 0.002]. Planned comparisons revealed that f0 error of the repetitive condition was significantly lower than that of the variable condition [F(1, 23) = 8.338, P = 0.008] and that f0 error of the patterned condition was marginally lower than that of the repetitive condition [F(1, 23) = 3.713, P = 0.066]. The one-way repeated measures ANOVA on SNR with the Greenhouse-Geisser correction was not significant [F(1.894, 43.569) = 1.001, P = 0.372]. Planned comparisons on the SNR metric were not significant, either between the repetitive and variable conditions [F(1, 23) = 0.389, P = 0.539] or between the repetitive and patterned conditions [F(1, 23) = 0.753, P = 0.395].
Event-Matched Comparison Between Variable and Repetitive Context Conditions
Figure 3, A and B, shows the grand averaged waveforms of event-matched FFRs to Cantonese T4 /ji4/ syllable in variable context and repetitive context conditions, and the corresponding spectrograms.
We observed context-dependent effects of online stimulus predictability, the factor isolated by this comparison, on both of our pitch tracking metrics (Fig. 3, C and D). We found a higher stimulus-to-response correlation [t(23) = −2.697, P = 0.013] and lower f0 error [t(23) = 3.005, P = 0.006] in the repetitive context condition relative to the variable context condition, indicating that the encoding of a dynamic pitch pattern was more faithful when online stimulus context was predictable. No significant context-dependent effect was found for the SNR measure [t(23) = 0.615, P = 0.545].
Event-Matched Comparison Between Patterned and Repetitive Context Conditions
Figure 4, A and B, shows the grand averaged waveforms of separately event-matched FFRs to Cantonese T4 /ji4/ syllable in patterned context and repetitive context conditions, and the corresponding spectrograms.
From our metrics, a context-dependent online stimulus novelty effect on pitch encoding was observed (Fig. 4, C and D). We found that pitch encoding was more faithful when stimulus context was patterned rather than repetitive, indicated by a higher stimulus-to-response correlation [t(23) = 2.876, P = 0.009] and lower f0 error [t(23) = −3.057, P = 0.006] in the patterned context condition relative to the repetitive context condition. No significant context-dependent effect was found for SNR [t(23) = −0.788, P = 0.439].
Our results demonstrate a clear influence of context on the representation of lexical pitch patterns in human FFRs. We found that the pitch tracking in FFRs to a falling lexical pitch pattern (T4) was more faithful (higher stimulus-to-response correlation) and accurate (lower f0 error) when the pitch pattern was presented in a repetitive context relative to a variable context. This is consistent with prior work using segmental speech stimuli (Chandrasekaran et al. 2009; Strait et al. 2011). Interestingly, we found that when the transitional probability of occurrence (hence stimulus predictability) was controlled, FFR pitch tracking was more faithful and accurate when the stimulus was patterned but not repetitive.
Prior research has extensively characterized the FFR as an index of subcortical auditory plasticity (Chandrasekaran and Kraus 2010). However, the extent to which cortical contributions to FFRs can be ruled out has recently attracted attention. For example, a recent magnetoencephalography (MEG) study has shown a right-asymmetric contribution of the auditory cortex to FFRs (Coffey et al. 2016). Nevertheless, external evidence has at least suggested that the dominant source of FFRs is subcortical. Human FFRs have an upper limit of ∼1,000 Hz (Chandrasekaran and Kraus 2010). Animal models in other mammalian species have demonstrated a similar range of phase-locking abilities in neuronal populations within the IC (Liu et al. 2006), while units at the auditory cortex only demonstrate phase-locking up to ∼250 Hz (Wallace et al. 2005). Also, FFRs, compared with cortical auditory-evoked potentials, are smaller in amplitude; FFRs also demonstrate much lower latency variability and earlier maturation (Chandrasekaran and Kraus 2010). A recent study using source dipole modeling and three-channel Lissajous analysis on high-density multichannel-recorded FFRs has also suggested the midbrain to be the putative generator of speech FFRs (Bidelman 2015). Because all of this external evidence suggests that the dominant source of FFRs is subcortical, we interpret our current findings as a demonstration of the subcortical auditory system being sensitive to both stimulus novelty and predictability. As such, there may be multiple neural mechanisms that interactively influence online subcortical encoding.
Top-Down and Local Processes and Their Interplay
Per animal models, at least two interactive mechanisms are likely to be involved in context-dependent online subcortical plasticity: a top-down corticofugal mechanism that automatically fine-tunes the representation of stimulus features that match top-down expectation, and a local mechanism that enhances representation of novel information (Chandrasekaran et al. 2014). We posit that more faithful and accurate neural pitch tracking in the repetitive context relative to the variable context may be driven by the higher transitional probability in the repetitive condition relative to the variable condition, despite the target stimulus being more novel in the latter condition. The high transitional probability in the repetitive condition may thus result in greater top-down predictive coding, the effects of which may have overridden those of local novelty enhancement, thereby enhancing subcortical pitch representation. In this experiment, we collected FFRs using a passive listening paradigm wherein participants did not pay overt attention to the stimulus stream. Our findings of a context-related effect are therefore likely to reflect a fundamental process of auditory processing wherein highly automatic processes are operative even without overt attention or explicit goal-directed behavior. Future studies can systematically test the extent to which overt attention to the stimulus pattern may modulate the magnitude of top-down modulation. For example, testing context-dependent subcortical encoding in different states of arousal (e.g., awake vs. asleep) or while requiring participants to explicitly track the stimulus pitch patterns may be informative.
The poorer encoding in the repetitive condition relative to the patterned condition, wherein the transitional probability is controlled (hence equally robust predictive coding), is likely a result of reduced local responsivity to the repetitive stimulus in the subcortical auditory system, as a result of SSA. Prior studies have discussed the possibility of SSA in modulating subcortical encoding online (Slabu et al. 2012; Skoe et al. 2014). These studies have used auditory oddball paradigms to elicit FFRs in the standard (high-probability) and deviant (low-probability) conditions, respectively. FFR studies using passive oddball paradigms have found that FFRs are more robust for the highly repetitive standard stimulus, which may either suggest that SSA is not reflected in the FFR, which is sensitive to neural phase-locking (Skoe et al. 2014), or that effects of SSA cannot be disambiguated from predictive coding with an oddball paradigm in which stimulus novelty and probability covary (Slabu et al. 2012). Here, we controlled for transitional probability while manipulating stimulus novelty in our study by employing an event-matched comparison between patterned and repetitive contexts. Our results demonstrate a novelty-related enhancement effect on FFRs (patterned > repetitive), suggesting that predictability and novelty both drive context-dependent auditory plasticity.
An intriguing possibility is that the context-dependent effects found in this study represent an interaction between online auditory plasticity and speech processing in the presence of background noise. The use of babble noise in our stimulus presentation was intended to avoid a ceiling effect on pitch tracking observed during pilot experiments where speech was presented in quiet. However, it is possible that the presentation of speech-in-noise could invoke a greater interaction between top-down modulation and local adaptation mechanisms than quiet presentation. For example, voice pitch is an important cue in tagging auditory streams (Snyder and Alain 2007). Hence, a constant repetitive pitch in the repetitive condition may be tagged more easily as a separate stream from the background noise relative to the variable condition. In other words, the randomly presented stimuli in the variable condition may have resulted in greater neural inhibition because the stimuli were tagged as noise. This neural habituation may have disproportionately impacted subcortical encoding in the variable condition in addition to the low transitional probability. This interpretation would imply an intricate interaction not just between predictive coding and SSA but also a generalized noise-related neural habituation process that resulted in context-dependent modulation. Future studies could introduce different SNR levels of background noise and/or change pitch cues that result in a change in talker, but not tone identity, as a factor in their experimental design to test this possibility.
Animal studies have shown that subcortical auditory encoding can be modulated by online stimulus statistics (Dean et al. 2005; Pérez-González et al. 2005; Malmierca et al. 2009) and behaviorally relevant auditory experience (Suga 2008) due to an interaction of local adaptation and top-down modulation through corticofugal pathways (Malmierca et al. 2009). Human studies have shown that subcortical encoding of speech and music sounds is modulated by prior listening contexts that vary in the statistical features of the auditory input (Chandrasekaran et al. 2009; Strait et al. 2011; Parbery-Clark et al. 2011; Slabu et al. 2012; Skoe et al. 2013, 2014). A growing body of literature has also demonstrated that subcortical auditory encoding is enhanced if the signals are behaviorally relevant, such as when they serve linguistic purposes (Krishnan and Gandour 2009) and are ecologically valid (Xu et al. 2006; Krishnan et al. 2009). Together, these findings are shaping an emerging view that subcortical structures are active processors that can be modulated by online listening contexts, among other factors such as long-term and short-term auditory experience (Chandrasekaran and Kraus 2010), to achieve subcortical auditory plasticity (Chandrasekaran et al. 2014). However, how online listening context interacts with these other types of auditory experience to shape subcortical auditory plasticity is still an open question. Prior studies using repetitive stimulus presentation have demonstrated that native tone-language speakers exhibit superior subcortical pitch encoding of lexical tones, presumably because of their lifelong native tone-language experience (Krishnan et al. 2005, 2009, 2010; Bidelman et al. 2011; Krishnan et al. 2016).
Skoe et al. (2014), on the other hand, investigated the extent to which subcortical encoding of linguistic pitch patterns (i.e., lexical tones) was modulated by listening context in native English speakers (with no prior tone-language experience) and how this context-dependent encoding changed after participants underwent an extensive sound-to-meaning auditory training program that rendered the tones behaviorally relevant. Before training, they found that subcortical pitch tracking of lexical tones was enhanced when the tones were presented with a higher probability in the context, relative to when they were presented with a lower probability. Interestingly, after training, there was no probability-dependent enhancement effect on tone encoding. They argued that this loss of probability-dependent enhancement was due to the stimuli becoming less novel after the tones' linguistic relevance was acquired. However, in the current study we found context-dependent enhancement in native tone-language speakers, demonstrating that linguistic relevance is unlikely to result in a loss of context-dependent plasticity. Instead, we posit that the lack of probability-dependent enhancement after training seen in the study by Skoe et al. (2014) may reflect less efficient top-down modulation in the nonnative learners, who had only very limited exposure to the lexical tones.
In summary, the current study shows a robust influence of prior listening context that enhances online subcortical encoding of a dynamic, time-varying linguistic pitch pattern. Encoding is more robust when a sound is both predictable and novel in a listening context. These findings demonstrate a complex interplay between top-down predictive coding and local SSA processes at the subcortical level that tunes the sensory signal online based on stimulus history. These two processes are likely driven by at least two general neurobiological mechanisms: predictive coding, which enhances predictable sensory input, and SSA, which reduces responsivity to repetitive sensory input. We interpret this context-dependent encoding as indicative of an interaction between online and long-term auditory experience that shapes neural plasticity in the subcortical auditory system.
This work was supported by the National Institute on Deafness and Other Communication Disorders Grant 1R01-DC-013315 (to B. Chandrasekaran), the Global Parent Child Resource Centre Limited (to P. C. M. Wong), and Dr. Stanley Ho Medical Development Foundation (to P. C. M. Wong).
No conflicts of interest, financial or otherwise, are declared by the author(s).
J.C.Y.L. and P.C.M.W. performed experiments; J.C.Y.L., P.C.M.W., and B.C. analyzed data; J.C.Y.L., P.C.M.W., and B.C. interpreted results of experiments; J.C.Y.L. prepared figures; J.C.Y.L. and B.C. drafted manuscript; J.C.Y.L., P.C.M.W., and B.C. edited and revised manuscript; J.C.Y.L., P.C.M.W., and B.C. approved final version of manuscript; P.C.M.W. and B.C. conceived and designed research.
We thank Zilong Xie for comments on drafts of this manuscript and Hilda Chan, Jason Ho, Yinyin Liang, Christine Liu, and Grace Pan for assistance with data collection. We also thank Oliver Bones and Fang Liu for help on MATLAB coding.
1 For the four participants whose electrophysiological recording for the patterned context condition took place on a separate day, a Wilcoxon signed-rank test confirmed that their FFR SNRs in the patterned context (M = 1.64, SD = 0.32) were not statistically different (Z = −0.365, P = 0.715) from those of the remaining participants, who completed all recordings on a single day (M = 1.66, SD = 0.47). A separate Wilcoxon signed-rank test confirmed that the four participants' FFR SNRs in the event-matched patterned (M = 1.64, SD = 0.32) and repetitive (M = 1.62, SD = 0.11) conditions were not statistically different either (Z = 0.00, P = 1). These results suggest that the four participants' electrophysiological recordings were consistent across the 2 days of the experiment and with those of participants who completed the experiment on a single day. (For how the SNR metric was derived, see materials and methods.)
- Copyright © 2017 the American Physiological Society