Center for Neural Science, New York University, New York, New York
Submitted 14 November 2006; accepted in final form 3 July 2007
| ABSTRACT |
|---|
|
|
|---|
20 Hz. Physiological studies, however, have typically emphasized the upper limits of modulation encoding. Responses to sinusoidal AM (SAM) are generally summarized by modulation transfer functions (MTFs), which emphasize tuning to modulation frequency rather than the representation of the instantaneous stimulus amplitude. Unfortunately, MTFs fail to capture important but nonlinear aspects of amplitude coding in the central auditory system. We focus on an alternative data representation, the modulation period histogram (MPH), which depicts the spike train folded on the modulation period of the SAM stimulus. At low modulation frequencies, the fluctuations of stimulus amplitude in decibels are robustly encoded by the cycle-by-cycle response dynamics evident in the MPH. We show that all of the parameters that define a SAM stimulus (carrier frequency, carrier level, modulation frequency, and modulation depth) are reflected in the shape of cortical MPHs. In many neurons that are nonmonotonically tuned for sound amplitude, the representation of modulation frequency is typically sacrificed to preserve the mapping between the instantaneous discharge rate and the instantaneous stimulus amplitude, resulting in two response modes per modulation cycle. This behavior, as well as the relatively poor tuning of cortical MTFs, suggests that auditory cortical neurons are not well suited for operating as a "modulation filterbank." Instead, our results suggest that <20 Hz, the processing of modulated signals is better described as envelope shape discrimination rather than modulation frequency extraction.
| INTRODUCTION |
|---|
|
|
|---|
The most commonly studied form of modulation, sinusoidal AM (SAM), has often been used to characterize the temporal aspects of the responses of central auditory neurons (Bieser and Muller-Preuss 1996; Creutzfeldt et al. 1980; Eggermont 1991, 1994; Frisina et al. 1990; Gaese and Ostwald 1995; Krishna and Semple 2000; Langner 1992; Langner and Schreiner 1988; Rees and Moller 1983, 1987; Schreiner and Urbas 1988). Four parameters are needed to specify a SAM stimulus: carrier frequency, carrier level, modulation frequency, and modulation depth. Historically, however, many studies of cortical responses to SAM have focused almost exclusively on modulation frequency because that is what cortical neurons were thought to encode. For example, it has been shown in awake macaque monkeys (Malone et al. 2000) and marmosets (Liang et al. 2002) that the modulation frequencies eliciting the strongest responses in primary auditory cortex (AI) are correlated when responses to SAM and sinusoidal FM (SFM) are compared. On this basis, Liang et al. (2002) concluded that "...it is the temporal modulation, and not the amplitude or FM per se that most auditory cortical neurons appear to extract from a complex acoustic environment" (p. 2257). While we agree that the correlation suggests that common temporal constraints influence how individual AI neurons respond to SAM and SFM, the broader conclusion that AI neurons respond to an abstracted "temporal modulation" is unwarranted.
Rees and Moller (1987) emphasized that neurons of the inferior colliculus "do not function as an array of stimulus invariant modulation frequency detectors" of the sort appropriate to a modulation filterbank, but rather "carry a selectively emphasized version of the input signal's amplitude envelope which is modified by the prevailing stimulus conditions." This notion gained support from the observation of Krishna and Semple (2000) that the best modulation frequency (BMF) of many neurons in the inferior colliculus (IC) changed substantially when the carrier level was varied over more than a 20-dB range. In contrast, Liang et al. (2002) reported that BMF was relatively invariant when carrier level and modulation depth were varied in AI. These competing views could be reconciled by evidence that successive levels in the auditory pathway are increasingly unaffected by changes in SAM parameters such as carrier level or modulation depth. Nevertheless, our results indicate that responses of central auditory neurons remain sensitive to SAM parameters other than modulation frequency, and are inadequately characterized by the modulation transfer function (MTF) describing how average firing rate and response synchrony vary with modulation frequency, and by derivative summary measures such as the BMF.
The most obvious difference between a neural code for modulation frequency and a neural code for changes in sound amplitude pertains to how cortical neurons represent changes in SAM stimuli that affect the stimulus amplitude but do not affect the modulation frequency, such as changes in carrier level or modulation depth. If cortical neurons code for stimulus amplitude, changes in such parameters should interact with an individual neuron's tuning for sound amplitude because that tuning is the basis for coding amplitude changes. For example, a unit's nonmonotonic tuning for sound pressure level (SPL) should be reflected in its response to SAM, even if such tuning introduces response components at modulation frequencies that are not present in the acoustic signal. Thus amplitude coding would compromise the modulation frequency code.
To reveal what AI neurons actually encode, we will concentrate on a data representation called the modulation period histogram (MPH), which shows the occurrence of action potentials relative to the period of the modulating stimulus waveform. It is common to collapse this distribution into summary measures such as vector strength and mean phase, which are plotted against modulation frequency in the modulation transfer function (MTF). By using detailed MPH examples from individual neurons, we will show that the responses of many cortical neurons cannot be adequately captured in this manner, particularly at low modulation frequencies. In this modulation range, it is reasonable to define the "instantaneous" SPL of the SAM stimulus, because the modulation period is long relative to the carrier period. Calculation of the "SPL profile" of the SAM stimulus allows for direct comparison of the amplitude envelope to the cycle-by-cycle response profile depicted by the MPH. Examination of the data in this format reveals that cortical responses to modulation frequencies in the range most important for communication sounds unequivocally encode changes in sound amplitude and are robustly sensitive to all parameters defining the SAM signal.
| METHODS |
|---|
|
|
|---|
Two adult male monkeys (Macaca mulatta, designated X and Z) participated in these experiments. All procedures pertaining to animal use and welfare in this study were reviewed and approved by the New York University Institutional Animal Care and Use Committee. Before implant surgery, anesthesia was induced with ketamine and sodium thiopental, and a surgical plane was maintained with isoflurane. This first implant was a head-holder that mated to a specially designed primate chair (Crist Instruments, Hagerstown, MD). After behavioral training, a recording chamber (CalTech Engineering Services, Pasadena, CA) was implanted above the auditory cortex in the left hemisphere of each animal. The initial placement of the recording chamber on monkey Z was slightly rostral to allow recordings across the rostral (R) and rostrotemporal (RT) fields (Hackett et al. 1998). The back of the initial chamber and the front of the chamber in its second placement straddled the low-frequency portion of primary AI. On completion of the mapping of the left hemisphere, the recording chamber was removed, and the skull was permitted to regrow under a protective layer of acrylic (Palacos). Meanwhile, a new recording chamber was implanted above the putative location of field R on the right hemisphere, which allowed for limited access to AI caudally. The initial implant for animal X was centered over AI in the left hemisphere, and allowed for a complete mapping of AI and portions of the surrounding auditory cortex. When this site was completed and covered, a new recording chamber was centered on the putative low-frequency border of AI/R in the right hemisphere.
All penetrations were made vertically with respect to the cylinder implants and thus roughly parallel to the stereotaxic vertical plane. Animal Z is still involved in experiments, so assignment of recording locations to cortical fields is based on physiological criteria, such as the tonotopic progression in AI and the distribution of response latencies (Scott et al. 2000). Subsequent histology and postmortem magnetic resonance imaging in animal X confirmed the recording locations to be within primary auditory cortex. We also assigned a relative cortical depth to the neurons in our sample by normalizing the recording depth with respect to the first and last points in each penetration where audible "hash" responses could be detected (n = 270). Expressed in quintiles from the shallowest to deepest depths, we obtained the following distribution: 19, 24, 27, 20, and 10%. Although we cannot unequivocally assign our recordings to particular laminae, it is likely that the neurons in our sample came predominantly from the middle and upper layers.
Both animals were extensively trained on binaural lateralization tasks. During recordings, blocks of psychophysical trials alternated with passive listening, when the SAM stimuli described in this report were presented. Behavioral and recording sessions were all conducted in a double-walled sound attenuated chamber (Industrial Acoustics) while the animals were continuously monitored using closed circuit television. Single-unit activity was recorded with tungsten microelectrodes (FHC, Bowdoin, MA) advanced into the brain through a stepping motor microdrive (CalTech Engineering Services, Pasadena, CA). Recording location was referenced to a stereotaxic positioning system that mounted directly on the implant. Depths of all recordings were referenced to entry into the brain. Entry into the superior temporal plane was typically marked by a sudden increase in activity after a long silent interval and the first appearance of auditory responsiveness.
Stimulus generation and data acquisition
Stimulus waveforms were generated by digital synthesizers and custom hardware (MALab, Kaiser Instruments). Stimulus characteristics were specified in software running on the host computer (Macintosh), which communicated with a dedicated microprocessor (MALab) using an IEEE-488 interface. After digital attenuation and D/A conversion, the signal was transduced by electrostatic earphones (STAX Lambda) in custom housings (Custom Sound Systems) fitted to ear inserts. Before each experiment, the SPL expressed in decibels (re: 20 µPa) at each ear was calibrated under computer control for level and phase from 40 Hz to 30 kHz, using a previously calibrated probe tube and condenser microphone (Brüel and Kjær 4134).
Electrical signals from the brain were amplified (variable gain), filtered (typically from 0.25 to 10 kHz), and passed to oscilloscopes, an audio speaker, and an event timer (MALab, Kaiser Instruments). The occurrence of discriminated action potentials and stimulus synchronization events were logged with a resolution of 1 µs and stored by the host computer for analysis and display.
Stimulus protocols
All stimuli described in this report were gated on and off by a cosine-squared ramp (10 ms). Responsive AI neurons were initially characterized with a battery of pure tone stimuli of relatively short duration (typically 100 ms, but occasionally 200 ms). These tonal stimuli were used to determine the frequency tuning function at the best sound level (dB SPL), and the rate-level function at the neuron's best frequency. SAM stimuli were typically presented at best frequency and level. In cases where there was no clear best level because of saturating responses in the range of moderate SPLs, 60 dB SPL was used, which provided a consistent carrier level for the generation of the composite modulation period histograms (Fig. 15). This was the most common carrier level used (particularly in animal X) and was used in 178 (49%) cells. SAM stimuli used in this study were typically presented in two consecutive trials of 10 s, separated by a 2-s interstimulus interval. Long stimulus durations were chosen to minimize the effects of onset responses while maximizing the number of modulation periods. This choice was crucial for the low modulation frequencies emphasized in this study. We verified that the effects of onset responses were negligible by recalculating spike counts and spike timing metrics for a subset of the data sample (n = 124). The common practice of excluding responses during the first 100 ms (1%) of the stimulus duration does not significantly impact the distributions of the spike timing metrics described below (Wilcoxon rank sum, P > 0.7). In fact, this correction, which affected 7 of 333 spikes on average, cannot be resolved when the distributions of the spike counts themselves are compared (P > 0.4). Because stimulus runs generally included an unmodulated control tone of similar duration (10 s), we were also able to compare "sustained" responses elicited by modulated stimuli to the adaptation characteristics of each neuron. In general, cortical neurons continued to respond robustly throughout the duration of SAM stimuli, such that firing rates showed average decreases of roughly 10% from the first to second half of the stimuli (0–5 vs. 5–10 s) and average decreases of roughly 20% from the first to fourth quarter (0–2.5 vs. 7.5–10 s). The distributions of temporal measures such as vector strength and trial similarity were not significantly altered from the first to second halves of the stimuli (Wilcoxon rank sum, P > 0.2), nor from the first to fourth quarters (P > 0.1).
1,000 Hz were presented in steps of 100 Hz. In some cases, intermediate values (e.g., 3 Hz) were also chosen to get more precise estimates of the slopes of the MTF. In many cases, tuning to modulation depth was explored at a range of depths (typically, 0–100% in 10 or 20% steps). Because of the limited recording time available, variations in all four SAM parameters could not feasibly be presented in all neurons. Consequently, runs that varied carrier level and frequency were performed somewhat less frequently, generally in those cells where the isolation was particularly stable and the responses were particularly robust. To reference the SAM stimulus to the decibel scale used to measure the rate-level function for each neuron, we computed the "instantaneous" relative amplitude of SAM signals in decibels by taking the logarithm of the envelope: $20 \cdot \log_{10}[1 + m \cdot \sin(2\pi f_m t)]$, where $m$ is the modulation depth and $f_m$ is the modulation frequency. A family of curves describing the instantaneous amplitude of the SAM signal relative to an unmodulated carrier signal at various modulation depths is shown in Fig. 2A. With increasing modulation depth, these curves become less sinusoidal, and the falling and rising phases of the envelope become more prominent. There is also an asymmetry in the increases and decreases of SPL within each modulation cycle. When m = 0.1, the SAM signal changes by +0.83 dB (0°) and –0.92 dB (180°) relative to the carrier. For m = 0.9, these values are 5.5 and –20 dB, respectively. Thus the changes in sound level (dB) for large modulation depths are dominated by the rapid fall and rise of the envelope within a relatively small portion of the modulation cycle centered on 180°. For the low modulation frequencies considered in this study, where the envelope is well defined, it is possible to generate estimates of the instantaneous SPL of SAM signals by adding the relative amplitude to the carrier SPL, as shown for 100% modulated signals at various carrier SPLs in Fig. 3B. For example, a 60-dB SPL SAM signal presented at a modulation depth of 90% (m = 0.9) varies from 65.5 dB SPL to 40 dB SPL during each modulation cycle.
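The envelope-to-decibel conversion described above can be illustrated with a short Python sketch. It is not the original analysis code: the function names are hypothetical, the logarithm is assumed to be base 10 (as the decibel values quoted in the text imply), and phase is expressed in the sine-phase convention, in which the envelope peaks at 90° and reaches its minimum at 270°.

```python
import math

def relative_level_db(m, phase_deg):
    """Level of a SAM envelope relative to the unmodulated carrier, in dB.

    m is the modulation depth (0 to 1). phase_deg is the modulation phase in
    degrees in sine phase, so the envelope peaks at 90 and dips at 270.
    """
    envelope = 1.0 + m * math.sin(math.radians(phase_deg))
    if envelope <= 0.0:
        # At 100% depth the envelope reaches zero; the level is unbounded below.
        return float("-inf")
    return 20.0 * math.log10(envelope)

def spl_profile(carrier_db_spl, m, n_points=360):
    """Approximate instantaneous SPL at n_points evenly spaced phases."""
    return [carrier_db_spl + relative_level_db(m, 360.0 * k / n_points)
            for k in range(n_points)]

# Hypothetical usage: a shallow modulation (m = 0.1) and a 60 dB SPL carrier
# modulated at 90% depth.
peak_db = relative_level_db(0.1, 90.0)
trough_db = relative_level_db(0.1, 270.0)
profile_60 = spl_profile(60.0, 0.9)
```

Adding the relative level to the carrier SPL is only meaningful when the modulation period is long relative to the carrier period, which is the regime considered here.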
Data analysis
The dominant representation of cortical responses to SAM in this paper is the MPH, which shows the distribution of spike counts for the different phases of the modulation cycle. The MPH is constructed by folding the peristimulus response histogram (PSTH; Fig. 1B) around the modulation period. The modulation period is inversely related to the modulation frequency (e.g., fm of 2 Hz results in a modulation period of 500 ms). Both the carrier and modulation waveforms were presented in sine phase, resulting in the MPH shown in Fig. 1C. To facilitate interpretation of the shapes of the MPHs, however, the responses were shifted by 90° (Fig. 1D), so that responses to the most dramatic changes in stimulus amplitude (instantaneous SPL) are centered in the MPH representation.
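As a minimal sketch of the folding operation, the following Python function bins spike times by modulation phase. The function name, the default of 52 bins, and the sign of the 90° shift (chosen so that the envelope minimum at 270° in sine phase falls in the middle of the histogram) are assumptions made for illustration, not a description of the original software.

```python
def modulation_period_histogram(spike_times_s, fm_hz, n_bins=52, shift_deg=90.0):
    """Fold spike times on the modulation period and count spikes per phase bin.

    spike_times_s are spike times in seconds relative to stimulus onset.
    The stimulus phase of each spike is reduced by shift_deg before binning,
    so a spike at stimulus phase 270 degrees lands at 180 degrees on the
    histogram axis (assumed display convention).
    """
    counts = [0] * n_bins
    for t in spike_times_s:
        phase_deg = (360.0 * fm_hz * t - shift_deg) % 360.0
        # The trailing modulo guards against a phase that rounds up to 360.
        counts[int(phase_deg / 360.0 * n_bins) % n_bins] += 1
    return counts

# Hypothetical usage: three spikes at the envelope minimum of a 2-Hz SAM tone.
example_mph = modulation_period_histogram([0.375, 0.875, 1.375], 2.0, n_bins=4)
```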
Calculations of spike rate were based on the entire stimulus duration. To evaluate the significance of differences in average firing rate, responses were averaged across repeated trials and binned in 1-s epochs. The average firing rate was calculated for each epoch. Calculation of spontaneous rates was based on firing rates for 1-s epochs drawn from all interstimulus intervals in a given stimulus run. Significance was assigned for all comparisons according to the outcome of a heteroscedastic t-test (P < 0.01).
To allow for comparison with other studies, response synchronization at the modulation frequency was quantified in terms of vector strength (VS) (Goldberg and Brown 1969). Each spike is treated as a unit vector whose angle corresponds to the phase at which it occurred in the modulation cycle. These unit vectors are summed to produce a resultant vector whose length corresponds to the magnitude of the Fourier component of the response at the modulation frequency. Normalizing the resultant vector by the total number of spikes (n) results in the VS, which is bounded from 0 (e.g., spike counts are equal at all phases) to 1 (all spikes occur at the same phase). The direction of the resultant vector indicates the mean phase of the MPH. Computationally, VS is calculated from the MPH as follows: $\mathrm{VS} = \frac{1}{n}\left[\left(\sum_i r_i \cos\theta_i\right)^2 + \left(\sum_i r_i \sin\theta_i\right)^2\right]^{0.5}$, where $r_i$ is the spike count in the $i$th bin of the MPH and $\theta_i$ is the phase of that bin. To assess the statistical significance of the VS, the Rayleigh statistic ($2 \cdot \mathrm{VS}^2 \cdot n$) was computed, and values >13.8 (Mardia and Jupp 2000) were considered to be significant (P < 0.001). The synchrony cut-off was considered to be the highest tested modulation frequency that resulted in a Rayleigh statistic >13.8. Additional details concerning the problems with applying the VS metric to cortical responses are presented in RESULTS.
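The VS and Rayleigh calculations can be sketched in Python as follows. The function name is hypothetical, and the use of bin-center phases is an assumption; the text does not state which phase is assigned to each bin.

```python
import math

def vector_strength(mph_counts):
    """Return (VS, mean phase in degrees, Rayleigh statistic) for an MPH.

    Each bin contributes its spike count as a vector pointing at the bin's
    center phase (assumed). VS is the resultant length divided by the total
    spike count n, and the Rayleigh statistic is 2 * VS**2 * n.
    """
    n_bins = len(mph_counts)
    n_spikes = sum(mph_counts)
    if n_spikes == 0:
        return 0.0, 0.0, 0.0
    x = sum(r * math.cos(2.0 * math.pi * (i + 0.5) / n_bins)
            for i, r in enumerate(mph_counts))
    y = sum(r * math.sin(2.0 * math.pi * (i + 0.5) / n_bins)
            for i, r in enumerate(mph_counts))
    vs = math.hypot(x, y) / n_spikes
    mean_phase_deg = math.degrees(math.atan2(y, x)) % 360.0
    rayleigh = 2.0 * vs ** 2 * n_spikes
    return vs, mean_phase_deg, rayleigh

# Hypothetical usage: all spikes in one bin versus spikes spread evenly.
locked = vector_strength([0, 10, 0, 0])
flat = vector_strength([5, 5, 5, 5])
is_significant = locked[2] > 13.8  # criterion quoted in the text
```

A response that is distributed across much of the modulation cycle yields a low VS even when its shape is highly reproducible, which is the limitation the TS index is meant to address.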
As an alternative to VS, we introduce a spike timing index, trial similarity (TS), based on the correlation (i.e., the Pearson's correlation, or product-moment coefficient of correlation) between MPHs constructed independently from trial 1 and trial 2 of the SAM stimulus presentation. All spikes occurring within the stimulus duration were included in the analysis. Although useful alternatives to VS have been proposed (Joris et al. 2006; Kajikawa and Hackett 2005), we have chosen to analyze TS because of its simplicity in the context of data collected in two long duration trials. Nevertheless, one could in principle correlate MPHs generated from data distributed across any number of trials, provided that the total stimulus duration was comparable and the conditions for generating the MPHs are properly met. For example, one could generate the first MPH from odd numbered trials and the second from even numbered trials.
Unlike the VS metric, which measures how densely spikes are clustered around a single phase of the MPH, the TS index depends only on the reproducibility of the MPH shapes across trials. Whereas VS measures the synchrony of the neural response, TS measures its fidelity. To calculate the correlation between the MPHs obtained for each stimulus trial, it is first necessary to choose the numbers of bins that comprise the MPHs, and the value of TS will depend on the number of bins used for the correlation. Empirically, MPHs based on 52 bins adequately capture the temporal features of the neural responses, and MTFs based on TS show similar high-frequency cut-offs to those based on VS. The significance criterion for the TS metric is the likelihood that a given correlation coefficient could have been produced by chance (i.e., for 2 random spike trains). We created significance criteria by simulating thousands of pairs of random spike trains and calculating TS across a range of binwidths and spike counts. For the 52 bin MPHs used in this study, TS values of 0.4 and 0.6 correspond conservatively to P values of 0.001 and 0.0001, respectively. Note that it is possible for TS to be negative. Because we never observed a case where TS was significantly negative by the criteria above, however, negative values were simply set to zero.
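A minimal Python sketch of the TS index follows, assuming the two MPHs have already been computed with the same number of bins. The function names are hypothetical, and returning zero when either MPH is perfectly flat (so that the correlation is undefined) is a choice made here for illustration.

```python
import math

def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumption: treat an undefined correlation as zero
    return cov / math.sqrt(var_a * var_b)

def trial_similarity(mph_trial1, mph_trial2):
    """Correlation between the MPHs of two trials, with negatives set to zero."""
    return max(0.0, pearson_r(mph_trial1, mph_trial2))

# Hypothetical usage with toy four-bin MPHs.
ts_same = trial_similarity([1, 2, 3, 4], [2, 4, 6, 8])
ts_opposite = trial_similarity([1, 2, 3, 4], [4, 3, 2, 1])
```

Because the correlation is insensitive to scaling, TS depends on the shape of the response profile rather than on the overall spike count in either trial.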
Stimulus estimation and spike train classification
To determine how much information cortical spike trains provided about stimulus identity, we used a PSTH-based pattern classifier to estimate the stimulus on the basis of 1 s of data (for complete details of the method, see Foffani and Moxon 2004). Note that only stimuli whose modulation frequency was an integral multiple of 1 Hz were analyzed in this way. For each stimulus in a given set (i.e., the stimuli comprising a modulation depth function, modulation transfer function, or carrier level function), a "template," representing the average response to that stimulus, was formed by folding the response at 1-s intervals and binning the responses into a bin-dimensional vector. The average spike count per bin was obtained by dividing by the number of seconds of data (i.e., 20), unless the template contained the data epoch to be matched (the "test"). In such a case, the test was subtracted from the appropriate PSTH before binning and calculating the average spike count per bin (i.e., dividing by 19). This form of classification is referred to as "complete cross-validation" (Foffani and Moxon 2004). The test was binned similarly.
Each of the 20 tests per stimulus was matched to the template that minimizes the Euclidean distance between the test vector and template vectors. The results of this matching process are stored in a confusion matrix whose columns represent the stimulus that was actually present and whose rows represent the estimate of stimulus identity produced by the classifier. If every test epoch is correctly associated with the stimulus that elicited it, all values of the confusion matrix along the diagonal will be 20 and all off-diagonal entries will be 0. Percent correct for a given stimulus set is obtained by summing along the diagonal and dividing by the product of data epochs (20, in every case we analyzed) and the number of stimuli in the set (e.g., 8, for our typical MTF consisting of responses to 1, 2, 5, 10, 20, 50, 100, and 200 Hz SAM).
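The matching procedure can be sketched in Python as below. This is an illustrative reading of the description above rather than the implementation of Foffani and Moxon (2004): the function names and the nested-list data layout are hypothetical, each stimulus is assumed to have at least two epochs, and ties are resolved in favor of the first stimulus in the set.

```python
import math

def classify_epochs(responses):
    """Template-matching classifier with leave-one-out templates.

    responses[s][e] is the binned spike-count vector for epoch e of stimulus s.
    Returns confusion[estimate][actual], i.e., rows are estimates and columns
    are the stimuli actually presented.
    """
    n_stim = len(responses)
    # Per-bin spike totals across all epochs of each stimulus.
    sums = [[sum(col) for col in zip(*epochs)] for epochs in responses]
    confusion = [[0] * n_stim for _ in range(n_stim)]
    for actual, epochs in enumerate(responses):
        for test in epochs:
            best, best_dist = 0, float("inf")
            for s in range(n_stim):
                n = len(responses[s])
                if s == actual:
                    # Remove the test epoch from its own template.
                    template = [(tot - x) / (n - 1) for tot, x in zip(sums[s], test)]
                else:
                    template = [tot / n for tot in sums[s]]
                dist = math.dist(test, template)  # Euclidean distance
                if dist < best_dist:
                    best, best_dist = s, dist
            confusion[best][actual] += 1
    return confusion

def percent_correct(confusion):
    """Diagonal sum divided by the total number of test epochs, in percent."""
    total = sum(sum(row) for row in confusion)
    hits = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * hits / total

# Hypothetical toy data: two stimuli, three epochs each, two bins per epoch.
toy = [[[5, 0], [4, 1], [5, 1]],
       [[0, 5], [1, 4], [1, 5]]]
toy_confusion = classify_epochs(toy)
```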
The method described above also allows us to parse the contributions of spike timing and spike rate information to the performance of the classifier. For example, the size of the binning applied to the tests and templates will impact the performance of the classifier because it determines the amount of temporal detail available to it. To capture this aspect of its performance, we generated complete confusion matrices for bins 1, 2, 4, 8, 10, 20, 40, and 1,000 ms wide for each stimulus set. Given the use of 1-s data epochs and PSTHs, the inclusion of a single, 1,000-ms-wide bin in the analysis allows us to determine how successfully the stimulus can be estimated based on the spike rate alone. Conversely, it is also possible to eliminate information pertaining to the distribution of firing rates across stimuli by normalizing both the tests and the templates by their respective vector norms. Geometrically, this corresponds to mapping all tests and templates to a hypersurface located at a unit distance from the origin. Subsequent to this normalization, the only information retained by the test and template vectors is the relative distribution of spike probability within a 1-s window. Because spike phase information is retained, we refer to this as the "phase only" classifier. Normalization by the total spike count, rather than the vector norm, produces essentially identical results: the correlation coefficients for classifier performance, in percent correct, across the different normalization schemes were 0.96, 0.99, and 0.97 for modulation depth, modulation frequency, and carrier level, respectively.
To assess whether the classifier performance was significantly better than would be expected by chance, we simulated confusion matrices based on random draws from a given stimulus set over many (10,000) iterations and generated a distribution of percentage correct based on the bootstrap results. If classifier performance exceeded all bootstrapped values, it was considered to be significant (P < 0.0001). Because the typical number of stimulus set elements was not constant across stimulus type (e.g., 8 for modulation frequency vs. 11 for modulation depth), it is not possible to compare classifier performance across different stimulus types directly because the baseline for chance performance varies inversely with stimulus set size. To circumvent this limitation, we standardized classifier performance as a z-score with respect to the appropriate bootstrap distribution by dividing the difference between the actual classifier result and the bootstrap mean by the bootstrap SD.
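One way to simulate the chance distribution and the standardized score is sketched below. The text does not specify how the random draws were made, so uniform random estimates over the stimulus set are assumed here; the function names and the fixed seed are likewise illustrative.

```python
import random
import statistics

def chance_percent_correct(n_stimuli, n_epochs, n_iter=10000, seed=0):
    """Distribution of percent correct when every estimate is a random draw.

    Assumes each test epoch is assigned a stimulus chosen uniformly at random
    from the set, independently of the stimulus actually presented.
    """
    rng = random.Random(seed)
    n_tests = n_stimuli * n_epochs
    distribution = []
    for _ in range(n_iter):
        hits = sum(1 for k in range(n_tests)
                   if rng.randrange(n_stimuli) == k % n_stimuli)
        distribution.append(100.0 * hits / n_tests)
    return distribution

def standardized_performance(observed_pct, chance_distribution):
    """z-score of the observed percent correct against the chance distribution."""
    mean = statistics.fmean(chance_distribution)
    sd = statistics.stdev(chance_distribution)
    return (observed_pct - mean) / sd

def exceeds_all(observed_pct, chance_distribution):
    """True if the observed value is larger than every simulated value."""
    return observed_pct > max(chance_distribution)

# Hypothetical usage: 8 modulation frequencies, 20 epochs each, fewer
# iterations than in the text to keep the example fast.
chance = chance_percent_correct(8, 20, n_iter=2000)
z_score = standardized_performance(60.0, chance)
```

Expressing performance as a z-score removes the dependence of the chance baseline on the number of stimuli in the set, which is what permits comparison across stimulus types.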
Although similar classifiers are often applied to stimulus sets that vary categorically (e.g., a set of vocalizations), SAM parameters such as modulation depth varied monotonically. Percentage correct is insensitive to the relative quality of the classifier estimate—for a SAM stimulus modulated at 40% depth, an estimate of 30% is the same as an estimate of 0 or 100%, i.e., a miss. We accounted for the relative quality of the classifier estimates by assigning a cost to each estimate, so that values along the diagonal of the confusion matrix equal zero, and off-diagonal values are multiplied by the distance to the diagonal in each column. Thus for a 40% depth modulation in an 11 by 11 confusion matrix spanning 0 to 100% modulation in 10% intervals, an estimate of 30% entails a cost of 1, whereas 0% has a cost of 4, and 100% has a cost of 6. Significance is assessed by the bootstrap method described above, except the distribution simulated was based on total cost, summed over the confusion matrix, rather than percentage correct. For population comparisons, the total cost for classifier performance on a given stimulus set was normalized by the theoretical maximum cost for a confusion matrix of equivalent size, producing a cost index from 0 (perfect) to 1. In practice, the additional sensitivity provided by the cost index, relative to percentage correct, proved unnecessary for modulation frequency and carrier level because classifier performance was particularly strong in such cases.
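The cost weighting can be illustrated with the following Python sketch. The function names are hypothetical, and the "theoretical maximum cost" is taken here to be the cost incurred if every test in each column were assigned to the row farthest from the diagonal, which is an assumption about a detail the text does not spell out.

```python
def total_cost(confusion):
    """Sum of confusion[estimate][actual] weighted by distance from the diagonal."""
    n = len(confusion)
    return sum(confusion[est][act] * abs(est - act)
               for est in range(n) for act in range(n))

def cost_index(confusion):
    """Total cost normalized by the assumed maximum cost, from 0 (perfect) to 1."""
    n = len(confusion)
    max_cost = 0
    for act in range(n):
        column_total = sum(confusion[est][act] for est in range(n))
        # Worst case for this column: every test lands on the farthest row.
        max_cost += column_total * max(act, n - 1 - act)
    return total_cost(confusion) / max_cost if max_cost else 0.0

def single_estimate(actual, estimate, n=11):
    """Hypothetical helper: an n-by-n confusion matrix holding one test."""
    matrix = [[0] * n for _ in range(n)]
    matrix[estimate][actual] = 1
    return matrix

# Index 4 corresponds to 40% depth when depths run from 0 to 100% in 10% steps.
cost_near = total_cost(single_estimate(4, 3))
cost_zero_depth = total_cost(single_estimate(4, 0))
cost_full_depth = total_cost(single_estimate(4, 10))
```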
| RESULTS |
|---|
|
|
|---|
We will describe data obtained from the responses of 361 neurons tested with SAM stimuli. These data represent a subset of an extensive physiological survey of auditory cortex focused on AI, but perhaps including a few neurons on the borders of adjacent fields. To allow for response class categorization, only recordings that included responses to the unmodulated control tone were included in the data sample. Because we could not detect any obvious differences in the responses from either animal (X or Z) or hemisphere (left or right), we combined the data from four hemispheres in two animal subjects (see METHODS).
Carrier frequencies varied from 0.1 to 32 kHz. At least 30 cells were characterized for each octave with respect to 0.5 kHz (i.e., <0.5, 0.5–1, 1–2 kHz, etc.). The SAM stimulus was presented binaurally in most cases because it elicited more robust responses than monaural stimulation (binaural summation was much more common than suppression). Modulation transfer functions were typically generated at the neuron's best frequency and level. If the rate-level function exhibited a plateau of similar responses that included 60 dB SPL, we used that value as the carrier level. Carrier levels ranged from –10 to 90 dB SPL, but roughly one half of the neurons were tested with a carrier level of 60 dB SPL.
SAM stimuli were very effective for AI neurons in awake rhesus macaques. A neuron was considered to be responsive to SAM if it exhibited either a significantly synchronized response to at least one modulation frequency or a significantly different firing rate from the response to the unmodulated control tone for at least one modulation frequency (see METHODS). By this criterion, only 6 of 361 (0.6%) neurons were considered to be unresponsive to 100% modulated SAM signals presented at the cell's best carrier frequency and level (or 60 dB SPL). We typically did not record SAM responses for cells that had been deemed generally unresponsive during initial testing with tonal stimuli, so this result likely overestimates the prevalence of SAM responsiveness in AI. Nevertheless, we attempted to obtain an MTF for all neurons exhibiting robust responses to pure tones, so the data sample can be considered representative of such neurons.
In addition to characterizing cortical responses to tones of short duration (typically 100 ms), we also measured responses to an unmodulated tone of the same duration as the modulated stimuli (10 s). The response to a pure tone of the same carrier frequency and level as the SAM stimuli served as a reference for the responses to modulated tones (Fig. 1A). A striking aspect of cortical responses in awake animals is the fact that roughly one third of neurons (113/361; 31%) had significantly elevated firing rates relative to the spontaneous rate when calculated over the duration (10 s) of the control tone. In an additional 12% (42/361) of neurons, the firing rate was significantly suppressed over the duration of the control tone. Thus the generally accepted notion, derived from studies of anesthetized animals, that cortical neurons do not give sustained responses to pure tone stimuli of long duration (see Middlebrooks 2005) does not hold for nearly one half (43%) of our data sample (Malone et al. 2002; Wang et al. 2005). It should be noted that if mechanisms of adaptation act on long time scales (Malone et al. 2002; Ulanovsky et al. 2004), the very long duration (10 s) of the control tones makes our estimate of the prevalence of sustained responses more conservative than a response classification based on shorter stimuli would likely be.
Construction of the MPH
The MPH represents the occurrence of action potentials relative to the phase of the modulation cycle. Figure 1 shows the construction of the MPH and the conventions for its display. The periodic modulation of spike rate for a 1-Hz SAM stimulus (modulation depth = 100%, or m = 1) is clearly evident in the PSTH representation shown in Fig. 1B. By folding the responses on the modulation period (1 s), the distribution of discharge rates within the modulation period is more easily seen in the MPH representation (Fig. 1C). The stimulus envelope was presented in sine phase, but in Fig. 1D (and in all subsequent MPHs), we shifted the responses by 90° (cosine phase) to facilitate interpretation of the shape of MPH with respect to the instantaneous stimulus amplitude. In this representation, the instantaneous amplitude minimum (270°) of the stimulus occurs in the middle of the MPH, and the maximum (90°) occurs at its lateral extremes, which occur at neighboring phases because the modulation phase axis is cyclic. Inspection of Fig. 1D revealed that, for this neuron, the MPH response profile differs from the sinusoidal amplitude envelope suggested by the cartoon of the carrier waveform shown above it. Within each cycle, the instantaneous probability of discharge declines rather slowly as the instantaneous level declines, but the response rises abruptly when the amplitude increases from its minimum. This behavior was typical of neurons that exhibited sustained responses to the unmodulated control tone (Fig. 1A).
Because the neuron shown in Fig. 1 responds throughout most of the modulation period, its VS is relatively low (0.22). Nevertheless, the shape of the MPH response profile was extremely robust. The shapes of the MPHs obtained for separate trials were strongly correlated (TS = 0.83). This suggests that, despite the low VS, the fidelity of the cortical response is quite good because the response profile is a consistent, albeit transformed, representation of the modulated signal.
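The contrast between a low VS and a reproducible profile can be illustrated with a short sketch. VS is computed here in the standard way, as the length of the mean resultant vector of spike phases. The exact TS computation is specified in METHODS, which is not reproduced in this section, so the sketch assumes that TS is a Pearson correlation between MPHs built from separate sets of trials. The histograms are synthetic and all names are hypothetical.

```python
import math


def vector_strength(phases_rad, weights=None):
    """Length of the mean resultant of spike phases (0 = uniform, 1 = perfect locking)."""
    if weights is None:
        weights = [1.0] * len(phases_rad)
    total = sum(weights)
    if total == 0:
        return 0.0
    c = sum(w * math.cos(p) for p, w in zip(phases_rad, weights))
    s = sum(w * math.sin(p) for p, w in zip(phases_rad, weights))
    return math.hypot(c, s) / total


def pearson_r(x, y):
    """Pearson correlation; used here as an assumed stand-in for TS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


# Synthetic MPHs: a broad profile (response present at every phase) from
# one set of trials, and the same shape plus small bin-to-bin noise from another.
n_bins = 36
centers = [2.0 * math.pi * (k + 0.5) / n_bins for k in range(n_bins)]
mph_a = [10.0 + 4.0 * math.cos(p) for p in centers]
mph_b = [v + 0.5 * (-1) ** k for k, v in enumerate(mph_a)]

vs = vector_strength(centers, mph_a)   # bin centers weighted by spike count
ts = pearson_r(mph_a, mph_b)
print(round(vs, 2), round(ts, 2))      # about 0.2 and about 0.98
```

In this toy case the spikes are spread over the whole cycle, so VS is low, yet the shape of the profile is nearly identical across the two sets of trials.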
Changes in modulation depth are reflected in the MPH response profile
The nature of the cortical representation of slow amplitude changes will be addressed in a number of examples that compare the MPH response profile to the instantaneous amplitude of a SAM stimulus, calculated by taking the ratio of the SAM stimulus to the unmodulated control and converting this ratio into decibels (see METHODS). By adding this value to the carrier level, one can construct what we term the "SPL profile" of the SAM stimulus, which specifies the approximate sound pressure level (in dB re: 20 µPa) of the modulated waveform at each phase of the modulation period. This conversion facilitates comparisons between the SPL profile of the stimulus, the MPH response profile, and the neuron's rate-level tuning function, which was also measured in terms of SPL.
Although the stimulus envelope of SAM is sinusoidal by definition, the log-transformed SPL profiles depicted in Fig. 2A are not. Deviations from a sinusoidal profile in decibels increase with increasing modulation depth, as shown. Not only are the increases and decreases in SPL asymmetric about the nominal carrier SPL, but the range of relative amplitudes spanned by SAM also increases nonlinearly with increasing modulation depth. Although small modulation depths sample a relatively narrow range of actual stimulus levels, the response depicted in Fig. 2 is clearly modulated for m = 0.2 (+1.58/–1.93 dB), and increasing depths result in increasingly robust modulation. Note that in all cases this neuron responds to the decrease in amplitude (see Fig. 2B stimulus icons, in gray) with an increase in firing rate, as would be predicted from the fact that the carrier level (60 dB SPL) is higher than this strongly nonmonotonic unit's best level (20 dB SPL; Fig. 2C, inset). For m = 0.9, a second response mode corresponding to the rising phase of the stimulus envelope begins to emerge, as the SPL profile encompasses progressively lower SPLs and the slope of its rising phase sharpens. The emergence of an "onset" peak at large modulation depths is consistent with the onset response this unit displayed for short-duration pure tones (Fig. 2C). This type of change in the response profile as modulation depth increased to the top of its range (0.8 to 1) was common in our sample. When the modulation is of sufficient depth, the SPL profile sweeps through the neuron's preferred range of sound levels twice: once during the falling phase and again during the rising phase. Because these features of the SPL profile occur at neighboring phases of the modulation cycle, the resultant peaks are closely apposed, as shown in Fig. 2B. 
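The SPL profile and its asymmetry in decibels can be reproduced with a brief sketch. It assumes the usual SAM envelope, 1 + m·sin(phase), relative to the unmodulated carrier; the function name and sampling density are illustrative choices rather than details taken from METHODS.

```python
import math


def spl_profile(carrier_db_spl, depth, n_points=360):
    """Instantaneous level (dB SPL) at evenly spaced phases of one cycle.

    Assumes an envelope of 1 + depth*sin(phase) relative to the
    unmodulated carrier. At 100% depth the envelope reaches zero, which
    is returned as -inf (the signal is momentarily "off").
    """
    profile = []
    for k in range(n_points):
        phase = 2.0 * math.pi * k / n_points
        env = 1.0 + depth * math.sin(phase)
        if env > 0.0:
            profile.append(carrier_db_spl + 20.0 * math.log10(env))
        else:
            profile.append(float("-inf"))
    return profile


carrier = 60.0
for m in (0.2, 0.5, 0.9):
    p = spl_profile(carrier, m)
    print(m, round(max(p) - carrier, 2), round(min(p) - carrier, 2))
```

For m = 0.2 the excursions come out at roughly +1.6 and −1.9 dB, in line with the values quoted above, and the downward excursion grows much faster than the upward one as the depth approaches 1.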
Although such changes in the shape of the MPH reduce the VS, the TS metric continues to increase, indicating that this feature of the neuron's response was highly reliable (Fig. 2D). The relationship between the response profile and the SPL profile for this cell suggests that the rate-level function (RLF) can sometimes serve as a useful heuristic for predicting the neuron's responses to changes in amplitude. Although the presence of two modes in the MPH accurately conveys information about changes in SPL, it necessarily confounds a simple periodicity code for modulation frequency.
Changes in carrier level profoundly affect cortical response profiles
If cortical neurons are primarily sensitive to the amplitude of SAM signals, we would expect that changes in carrier level would profoundly change the shapes of the MPH, particularly in neurons that are sharply tuned for SPL. For example, nonmonotonic tuning for SPL was common in AI in these animals, with 38% of neurons exhibiting decreases in firing rate >50% (relative to best level) for increasing SPLs, 15% exhibiting milder (<50%) decreases, 7% exhibiting a saturating plateau, and 31% responding strictly monotonically (Scott 2004). The range of SPLs spanned by a SAM stimulus is jointly determined by the carrier level and the modulation depth. In the case of 100% modulated (m = 1) signals, however, the carrier level effectively determines only the instantaneous SPL maximum, because all signals are briefly "off" for a portion of the modulation cycle (180°). Figure 3A depicts the way that changes in carrier level cause vertical displacements of the signal on the SPL axis.
Figure 3C indicates that the responses of an example neuron to brief duration tones were strongly nonmonotonic, with a best level of 20 dB SPL. As the carrier level for 1-Hz modulation was increased from 10 to 70 dB, the shapes of the MPHs changed profoundly. From 20 to 40 dB, the MPH trough corresponding to the lowest instantaneous SPLs progressively narrowed, as a greater proportion of the modulation cycle consisted of moderate SPLs. As the carrier level increased further, the cell responded to decreases in SPL with increases in firing rate, resulting in a response peak centered between 90 and 180°. These features of the MPHs are compatible with the fact that this neuron responded in a sustained fashion to the long duration control tone at 50 dB and showed offset responses for tone pips at higher SPLs.
The set of response profiles in Fig. 3B suggests that the cell fired robustly when the instantaneous SPL fell within the range of its preferred SPLs. However, the SAM-derived RLF (Fig. 3C) deviates substantially from the RLF defined with 100-ms tone pips. The SAM-derived RLF is shifted to higher SPLs and is substantially less nonmonotonic. Apparently, the two response peaks elicited by rapid transitions through the neuron's preferred SPL range offset the fact that high carrier level stimuli spend a smaller fraction of the modulation period in that range. Because the shape of the SPL profile is effectively constant across carrier level, barring the vertical displacement on the dB SPL axis, response profile changes as carrier level is varied can only be explained by the interaction of the SPL profile and the neuron's tuning for SPL. The divergence of the SAM-derived RLF and tone pip RLF indicates that the neuron is also sensitive to the dynamics of the amplitude changes.
Here again, the appearance of novel response features in the response profiles at high carrier levels shows that this neuron is capable of coding more than the modulation frequency of the SAM stimulus. These response features degrade synchrony to the modulation frequency (Fig. 3C). The progressive narrowing of the trough in the response profile from 20 to 50 dB causes a progressive reduction in VS. From the perspective of SPL coding, however, the narrowing of the trough in the response profile is readily explained in terms of the neuron's level tuning, and implies that the fidelity of the amplitude representation is maintained. Inspection of the TS curve, which increases with carrier level, confirms this impression, indicating that the neural representation of the SPL profile is more rather than less robustly coded at high carrier levels.
Changes in carrier frequency are captured by changes in cortical response profiles
Unlike the carrier level, modulation depth, and modulation frequency, the carrier frequency of the SAM stimulus is not directly tied to the SPL profile. For this reason, we presented SAM stimuli at each neuron's best frequency and rarely varied this parameter systematically. Nevertheless, the fact that a neuron's response area is a joint function of frequency and level implies that the choice of carrier frequency should impact the shape of the MPH for SAM. Anecdotally, we have observed that changes in the responses to tone pips at different frequencies (e.g., a change from predominantly onset to offset responses) are typically mirrored in analogous changes in the MPH response profiles, including predictable changes in the phase of the dominant response peak.
Figure 4 shows how changes in carrier frequency (6, 10, and 16 kHz) were reflected in the responses of an example neuron. At all carrier frequencies, the neuron fired most strongly at the SPL profile minima (Fig. 4A; note that as the modulation period shortens, a fixed neural latency will cause increasing apparent lag in the response peak relative to the SPL minimum). Overall, the 6-kHz carrier elicited the highest average firing rates, and the firing rate varied relatively little with modulation frequency (Fig. 4B). Increasing the carrier frequency resulted in more variable rMTFs, including suppression relative to the control tone for the 16-kHz carrier at modulation rates >50 Hz. The impression conveyed by the tMTFs based on VS (Fig. 4B) is somewhat more complicated, because the VS tMTF appears band-pass at 10 kHz, but not at 6 or 16 kHz. Study of the MPHs revealed that responses to the 10-kHz carrier are weakly bimodal, which explains the reduction in VS below 20 Hz. Although the differences between the MPH shapes for the 10- and 16-kHz carriers are subtle, they are highly reproducible, as indicated by the uniformly high TS values from 2 to 20 Hz. Thus MPH response profiles contain information that can be used to distinguish the carrier frequencies of SAM stimuli.
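The parenthetical point about latency is simple arithmetic: a fixed delay occupies a larger fraction of a shorter period. The sketch below converts a latency into an apparent phase lag. The 20-ms latency is a hypothetical value chosen for illustration, not a measurement from this neuron.

```python
def apparent_phase_lag_deg(latency_s, mod_freq_hz):
    """Phase lag (degrees) produced by a fixed latency at a given modulation frequency."""
    return 360.0 * latency_s * mod_freq_hz


assumed_latency_s = 0.020  # hypothetical latency, for illustration only
for fm in (2.0, 5.0, 10.0, 20.0):
    print(fm, round(apparent_phase_lag_deg(assumed_latency_s, fm), 1))
```

Under this assumption the lag grows from about 14° at 2 Hz to 144° at 20 Hz, even though the underlying delay is unchanged.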
Changes in modulation frequency are captured by changes in cortical response profiles
As modulation frequency is increased, the changes in amplitude indicated by the SPL profile occur more and more rapidly. The resultant changes in the response profile reveal temporal constraints on the coding of those changes operant in the recorded cell and its input pathway. Figure 5B depicts response profiles for 100% modulated stimuli (m = 1) spanning a range from 1 to 50 Hz. These data were obtained from the same cell featured in Figs. 2 and 3. Comparison of the response profiles obtained with 2-Hz modulation at 100% depth in this and previous figures indicates that the features of the SAM stimulus were robustly encoded in the moment-by-moment discharge rate of this neuron.
Summary measures derived from the MTF fail to convey important aspects of the way this neuron encodes the SPL profile of the SAM stimulus. Figure 5B shows the typical data representation for SAM responses: MTFs for average firing rate (rMTF; black line) and synchrony, measured as VS (tMTF; dashed line). Based on these functions, the best modulation frequencies for rate (rBMF) and temporal synchrony (tBMF) are 10 and 20 Hz, respectively. The substantial increase in response synchrony at 10 and 20 Hz occurs because the neuron is no longer capable of producing separate response peaks associated with the rising and falling phases of the SPL profile, as it does for modulation frequencies <5 Hz. The fact that the tMTF is band-pass is merely an artifact of the VS calculation, because the separate peaks in the response profiles for slow modulations distribute the spike times more evenly in the modulation period. It would be inaccurate to infer from the tMTF that the temporal fidelity of this neuron's representation of modulated stimuli is best at 20 Hz or that the neuron is tuned for that modulation frequency. At low modulation frequencies, the shape of the response profile captures the direct rate code for amplitude used by AI neurons. Accordingly, the TS curve is effectively flat over this range. In contrast, the shape of the tMTF captures only what the VS metric encodes—the precision of phase-locking for a presumed unimodal distribution of spike phases. This is reflected in the sharp increase in the VS from 5 to 10 Hz, where the MPH changes from a bimodal to a unimodal shape (Fig. 5).
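The dependence of VS on the number of response modes can be made explicit with an idealized calculation, offered here as an illustration rather than as part of the original analysis. Suppose a fraction p of the spikes falls at phase θ₁ and the remainder at phase θ₂, with no jitter, and let Δ = θ₁ − θ₂. Then:

```latex
\begin{aligned}
\mathrm{VS} &= \left| \, p\,e^{i\theta_1} + (1-p)\,e^{i\theta_2} \right|
             = \sqrt{p^2 + (1-p)^2 + 2p(1-p)\cos\Delta}, \\
\mathrm{VS}\big|_{p=1/2} &= \left| \cos\!\left(\tfrac{\Delta}{2}\right) \right| .
\end{aligned}
```

Two equally populated peaks separated by 90° therefore give VS ≈ 0.71, and peaks separated by 180° give VS = 0, even though each peak is perfectly timed. Merging the two peaks into one (Δ → 0) raises VS to 1 without any improvement in temporal precision.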
Inspection of the response profiles at the lowest modulation frequencies also indicates that the response profiles have higher frequency components than are present in the sinusoidal envelope of the stimulus. The logarithmic transform that generates the SPL profile from the amplitude envelope introduces increasingly higher frequency components at larger modulation depths, caused by the steep falling and rising phases of the SPL profiles. Nevertheless, the response profiles in Fig. 5 are clearly not faithful replicas of either the amplitude envelope or the SPL profile. The nature of this transformation appears to be related, at least in part, to the neuron's strongly nonmonotonic rate-level function. At higher modulation frequencies, however, this relationship becomes obscured because the way amplitude changes can be encoded is limited by the neurons' maximal discharge rates.
Number of spikes per modulation cycle determines the resolution of a discharge rate code for amplitude
The distinction between a synchrony code for modulation frequency and a discharge rate code for amplitude can only be made when there are sufficient spikes within each modulation cycle to distinguish them. The average spike count per modulation cycle impacts a synchrony code and a rate code very differently. Firing rates affect the synchrony estimate because VS is maximal (i.e., 1) when all spikes fall in the same phase of the modulation period. Thus increasing the firing rate above one spike per modulation cycle will tend to increase the dispersion of spike times within the modulation period, lowering the VS. If cortical neurons serve to extract the modulation frequency, a synchrony code based on perfect phase-locking to a single and arbitrary point on the stimulus envelope would suffice. This code is achievable with one or fewer spikes per modulation cycle. In contrast, the resolution of an amplitude code based on instantaneous discharge rate is crucially dependent on the number of discharges that a neuron can fire within each modulation period. Whereas a synchrony code for modulation frequency marks a point in the MPH, subject to intrinsic variability in response phase, a discharge rate code for amplitude changes must describe a function through the period, subject to intrinsic variability in response rate. In effect, the average number of spikes per modulation cycle limits the resolution of a rate code for sound amplitude.
Figure 6 shows the distributions of average spikes per cycle across all tested modulation frequencies. On a plot with logarithmic axes, the average number of spikes per modulation cycle is well described by a power function that crosses an average of one spike per cycle between 10 and 20 Hz. Averaged across all neurons, mean spikes per modulation period falls from 18.8 at 1 Hz to 1.05 at 20 Hz. Our data showed that the multipeaked response profiles evident at very low (<5 Hz) modulation frequencies typically become unimodal at or above 10 Hz (e.g., Fig. 5). Multipeaked response profiles (e.g., Figs. 2, 3, and 5) were typically associated with large depth, low frequency modulations presented well above the best level of a strongly nonmonotonic neuron exhibiting sustained responses to the unmodulated control tone. Because not all cells were tested at the requisite high carrier levels, we cannot assess the absolute prevalence of the phenomenon. However, in a subpopulation of neurons tested across a wide range of carrier levels with fully modulated (100% depth) SAM signals <20 Hz, 13 of 25 neurons exhibited multipeaked discharges, indicating that such responses are not uncommon under the appropriate stimulus conditions. Nevertheless, the modulation frequencies that resulted in a clearly multimodal MPH never exceeded 20 Hz in our sample. In effect, cortical neurons default to a synchrony code for SAM above this critical range for direct amplitude coding. We will consider evidence that cortical neurons use a nonsynchronized rate code for modulation frequency in a subsequent section.
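As a rough check on the power-function description, one can ask what exponent is implied by the two population means quoted above (18.8 spikes per cycle at 1 Hz and 1.05 at 20 Hz). The sketch below passes a power law through just those two points. It is not the fit shown in Fig. 6, and the function name is hypothetical.

```python
import math


def power_law_through(f1, y1, f2, y2):
    """Return (a, b) such that y = a * f**b passes through both points."""
    b = math.log(y2 / y1) / math.log(f2 / f1)
    a = y1 / f1 ** b
    return a, b


# Population means quoted in the text: spikes per cycle at 1 and 20 Hz.
a, b = power_law_through(1.0, 18.8, 20.0, 1.05)
print(round(a, 1), round(b, 3))

# Spikes per cycle times modulation frequency gives the mean rate in spikes/s.
print(round(18.8 * 1.0, 1), round(1.05 * 20.0, 1))
```

The implied exponent is close to −1, which is what would be expected if the mean firing rate (about 19 to 21 spikes/s at these two frequencies) changes little while the modulation period shrinks.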
Response profiles of cortical neurons uniquely encode different SAM stimuli
The cortical response profiles appear to "multiplex" the parameters of SAM stimuli because those parameters jointly define the instantaneous amplitude of the signal. The foregoing examples showed that changes in modulation depth, carrier level, and modulation frequency are each captured by changes in the shapes of MPH response profiles at low modulation frequencies. Figure 7 shows the results of four slices through the space of possible SAM stimuli. For a given carrier frequency, the stimulus space of a SAM stimulus has three dimensions, as indicated by the axes of carrier SPL (x), modulation depth (y), and modulation frequency (z). It is clear from the SPL profiles that graded changes in a given stimulus parameter are represented by graded changes in the associated response profiles. As a result, the shapes of the MPHs suffice to identify the details of the stimulus envelope, which is jointly defined by these three parameters. In contrast, the summary measures based on rate, synchronization, and trial correlation appearing in the insets do little to reveal precisely how the changes in amplitude are encoded by this neuron.
Increases in modulation frequency for a 20-dB SPL carrier (aligned diagonally in black) produce a progressive rounding of the peak associated with the SPL maximum (changes in the mean phase of the response reflect the group delay). Increases in carrier level for a 2-Hz modulation (aligned horizontally in gray) also produce striking changes in the phase of the response, but these changes are attributable solely to the neuron's tuning for SPL, because the modulation frequency is constant (latency differences related to changes in SPL are unlikely to have a significant impact when the modulation period is very long, as it is here). At 40 dB SPL and above, the cell responds to decreases in SPL near the falling phase of the stimulus envelope, consistent with its nonmonotonic pure tone RLF and preferred SPL of 25 dB SPL (data not shown; note, however, that the SAM-derived RLF in b is flat). Comparison of the response profiles at 20 and 70 dB SPL shows that the response phases are effectively inverted. Analogously, this neuron responded to short (100 ms) 20-dB SPL tones throughout the tones' duration, whereas 70-dB SPL tones elicited only a phasic discharge at sound offset (data not shown).
This example further shows that the RLF can be very useful for predicting qualitative features of cortical responses to SAM at low modulation frequencies (particularly in cells with sustained responses to long duration tones). It is noteworthy that the reversal in response phase at 20 and 60 dB SPL is apparent even at the lowest modulation depths, where the change in actual stimulus levels is quite small (less than ±2 dB for m = 0.2). For this reason, it is more accurate to say that the neuron generally responded when the stimulus approached its preferred SPL, because it clearly did not actually achieve that value for moderate depths. Thus it would be overly simplistic to suggest that a static measure of SPL tuning such as the RLF could simply be used as a lookup table for predicting cortical responses to those same SPLs in a dynamic context. A more detailed model relating the shape of the RLF to the shape of the MPH, taking the additional temporal factors that shape cortical responses (e.g., spike frequency adaptation or synaptic depression) into account, is beyond the scope of this paper. Nevertheless, the heuristic value of the RLF in predicting SAM responses in many neurons is evidence for rate-based coding of sound level at low modulation frequencies.
Like Fig. 7, Fig. 8 includes a matrix of response profiles for SAM signals varying in depth, carrier level, and modulation frequency. In this neuron, however, complete modulation transfer functions (0.7 to 200 Hz) were obtained at 100% depth for carrier levels spanning a 40-dB SPL range. This neuron responded nonmonotonically to tone pips (Fig. 8A). Here we show results with SAM whose carrier level was near the peak (20 dB SPL), on the slope (40 dB SPL), and near its nadir (60 dB SPL). Similar reversals of response phase at very low modulation depths are evident in the responses to 20- and 60-dB SPL carriers (gray histograms on the diagonal), as are graded changes in the response profiles with increasing modulation depth, including the appearance of a narrow peak near the rising phase of the envelope for 100% modulation at 60 dB SPL. Again, stimulation at a low carrier SPL elicited responses coincident with the highest SPLs within the modulation period, with significant synchrony evident even at 200 Hz (data not shown). Responses to 40 dB SPL SAM were similar to those at 20 dB SPL, although the average firing rates are higher at modulation frequencies >10 Hz, as shown in the rMTF (Fig. 8B). At 60 dB SPL, however, responses to both the falling and rising phases of the envelope are evident up to 20 Hz, although the earlier peak becomes substantially attenuated when the latter begins to dominate at 5 Hz and above. The rMTF at 60 dB SPL drops rapidly with increasing modulation frequency because of the diminished response to the falling phase of the envelope, which comprises the majority of the spikes at 1 and 2 Hz. Thus the rBMF for a 60-dB SPL carrier is 1 Hz compared with 5 and 10 Hz for the 20- and 40-dB SPL carrier levels, respectively.
The relative ordering of the tMTFs in Fig. 8B indicates a progressive loss of synchrony as the carrier level is increased. At low modulation frequencies, the cortical reductions in VS at high carrier levels typically reflect increases in firing rate during the rapid falling and rising phases of the SPL profile, which distributes spikes more widely throughout the modulation cycle. For example, the sharp peak that occurs during the rising phase of the SPL profile is not captured by the tMTF at 60 dB SPL. The effect on the VS is diluted by spikes occurring at distant phases, despite the obvious temporal precision of the neuron's responses. Given the shape of the RLF, this neuron might be expected to respond poorly to 60 dB SPL SAM because the stimulus spends relatively little time in the range of the neuron's preferred SPLs. The rMTFs and tMTFs seem to confirm this expectation. The TS values did not vary significantly across carrier level (Wilcoxon rank sum; P > 0.1). On the other hand, the changes in TS across modulation frequency were highly correlated for the different carrier levels (r² > 0.7; Fig. 8B), which suggests the existence of shared temporal limits on the neuron's ability to encode SPL changes across the tested range.
The foregoing examples suggest that the shapes of the MPHs can, in many cases, be uniquely associated with the SAM stimuli that elicited them. We explicitly tested this notion by applying a PSTH-based response classifier to spike trains elicited by SAM stimuli that varied along a single parameter axis: carrier level, modulation depth, or modulation frequency (see METHODS). Figure 9 shows the confusion matrices (A–C) obtained when applying the classifier to the MPHs depicted in Figs. 2, 3, and 5, respectively. In all three cases, classifier performance was substantially better than chance, as indicated by the distribution of correct estimates of stimulus identity along the diagonal. Although the elimination of firing rate information by response normalization (see METHODS) did reduce performance in these examples, the "phase only" confusion matrices in the central column evidence a fairly modest reduction in classifier performance. In contrast, the "rate only" confusion matrices to the right indicate poorer performance when the basis for the estimate is limited to average spike rate.
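The logic of the three confusion matrices can be sketched with a generic classifier. The classifier actually used is specified in METHODS, which is not reproduced in this section, so the sketch below substitutes an assumed scheme: leave-one-out, nearest-template classification with Euclidean distance. Every function name and the toy data are hypothetical. The "phase" mode normalizes each histogram to unit sum (removing rate information), and the "rate" mode keeps only the total spike count.

```python
import math


def features(hist, mode):
    """Map a single-trial histogram to the feature vector used for matching."""
    total = sum(hist)
    if mode == "phase":                       # shape only, rate removed
        return [x / total for x in hist] if total > 0 else list(hist)
    if mode == "rate":                        # spike count only, shape removed
        return [total]
    return list(hist)                         # "full": rate and shape together


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def confusion_matrix(trials, labels, mode="full"):
    """Leave-one-out nearest-template classification (an assumed scheme).

    trials maps each stimulus label to a list of single-trial histograms
    (at least two per label). Returns matrix[actual][predicted] counts.
    """
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for actual in labels:
        for i, held_out in enumerate(trials[actual]):
            templates = {}
            for lab in labels:
                pool = [t for j, t in enumerate(trials[lab])
                        if not (lab == actual and j == i)]
                templates[lab] = [sum(col) / len(pool) for col in zip(*pool)]
            x = features(held_out, mode)
            predicted = min(labels,
                            key=lambda lab: distance(x, features(templates[lab], mode)))
            matrix[actual][predicted] += 1
    return matrix


# Toy data: two stimuli evoking the same spike count but different MPH shapes.
toy_trials = {
    "A": [[4, 1, 0, 1], [5, 1, 0, 0], [4, 0, 1, 1]],
    "B": [[0, 1, 4, 1], [0, 1, 5, 0], [1, 0, 4, 1]],
}
for mode in ("full", "phase"):
    print(mode, confusion_matrix(toy_trials, ["A", "B"], mode))
```

Because the two toy stimuli evoke equal spike counts, the "rate" mode has nothing to discriminate on here, whereas the "full" and "phase" modes place every trial on the diagonal. Real data would fall between these extremes.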
To explore these issues at the population level, we identified a subset of neurons for which we had obtained MPHs for a set of SAM parameter values. We were able to identify 25 neurons that had been tested across a large range (mean = 52 dB) of carrier levels at low (<20 Hz) modulation frequencies. We also identified a subset of neurons (n = 145) where we had obtained both MTFs and modulation depth functions (MDFs) at the best modulation frequency. After discarding data based on non