Amplitude modulation (AM) is a crucial feature of many communication signals, including speech. Whereas average discharge rates in the auditory midbrain correlate with behavioral AM sensitivity in rabbits, the neural bases of AM sensitivity in species with human-like behavioral acuity are unexplored. Here, we used parallel behavioral and neurophysiological experiments to explore the neural (midbrain) bases of AM perception in an avian speech mimic, the budgerigar (Melopsittacus undulatus). Behavioral AM sensitivity was quantified using operant conditioning procedures. Neural AM sensitivity was studied using chronically implanted microelectrodes in awake, unrestrained birds. Average discharge rates of multiunit recording sites in the budgerigar midbrain were insufficient to explain behavioral sensitivity to modulation frequencies <100 Hz for both tone- and noise-carrier stimuli, even with optimal pooling of information across recording sites. Neural envelope synchrony, in contrast, could explain behavioral performance for both carrier types across the full range of modulation frequencies studied (16–512 Hz). The results suggest that envelope synchrony in the budgerigar midbrain may underlie behavioral sensitivity to AM. Behavioral AM sensitivity based on synchrony in the budgerigar, which contrasts with rate-correlated behavioral performance in rabbits, raises the possibility that envelope synchrony, rather than average discharge rate, might also underlie AM perception in other species with sensitive AM detection abilities, including humans. These results highlight the importance of synchrony coding of envelope structure in the inferior colliculus. Furthermore, they underscore potential benefits of devices (e.g., midbrain implants) that evoke robust neural synchrony.
- amplitude modulation
- auditory midbrain
- envelope synchrony
- inferior colliculus
Amplitude modulation (AM) is a critically important acoustic feature of speech and many nonhuman animal vocalizations (Beckers and ten Cate 2001; Rosen 1992; Shannon et al. 1995). Humans exhibit remarkable sensitivity to AM, with the ability to detect modulation depths as low as 3–6% (i.e., 25–30 dB below full modulation) at modulation frequencies less than a few hundred hertz (Carney et al. 2014; Kohlrausch et al. 2000; Viemeister 1979). Currently, the neural mechanisms underlying behavioral sensitivity to this crucial acoustic feature of communication signals are poorly understood.
The inferior colliculus (IC) is a large auditory nucleus in the vertebrate midbrain that is an almost-mandatory processing station in the ascending auditory pathway (Aitkin and Phillips 1984). The IC has emerged as an important location for studying AM sensitivity, because it is the first stage of auditory processing to show neural rate coding of AM (Joris et al. 2004). Whereas more peripheral nuclei encode AM through envelope synchrony (i.e., temporal variation in discharge rate at the modulation frequency of the stimulus) in both mammals (Joris and Yin 1992; Rhode and Greenberg 1994; Sayles et al. 2013) and birds (Gleich and Klump 1995), the IC and other more central nuclei encode AM through both envelope synchrony and substantial changes in average discharge rate compared with responses from unmodulated stimuli [for IC, see Langner and Schreiner (1988) and Woolley and Casseday (2005); for a summary of conserved physiological response properties in the midbrain of mammals and birds, see Woolley and Portfors (2013); for thalamus, see Bartlett and Wang (2007); for cortex, see Johnson et al. (2012) and Rosen et al. (2010)]. Effects of AM on IC response rate can be excitatory or suppressive (Krishna and Semple 2000; Nelson and Carney 2007) and are most pronounced at a neuron's best modulation frequency (BMF). The emergence of two neural codes for AM at the level of the IC raises the question of how synchrony and average rate thresholds compare with behavioral AM detection thresholds.
The relationship between behavioral and IC sensitivity to AM in the same animal model has been examined previously in a single species—the rabbit—which has limited behavioral sensitivity to AM (Carney et al. 2014). The rabbit study used receiver-operating characteristic (ROC) analyses (Egan 1975) to calculate thresholds for AM detection (i.e., minimum detectable modulation depths) across a population of IC neurons, based on both average discharge rate and envelope synchrony, for comparison with behavioral thresholds obtained with operant conditioning procedures. The most sensitive rate thresholds in the rabbit IC were approximately as sensitive as behavioral thresholds. Synchrony thresholds, in contrast, were considerably more sensitive than the behaving animal, falling instead within the range of human AM performance. These results suggest that rate coding in the IC, rather than envelope synchrony, may underlie behavioral detection of AM in rabbits. However, the mechanisms underlying AM perception in species with more sensitive auditory perceptual abilities are unexplored. Human listeners may use envelope synchrony of IC responses to achieve greater AM sensitivity than rabbits (Carney et al. 2014; Nelson and Carney 2007). Alternatively, rate-based AM thresholds in the human IC may be more sensitive than in the rabbit and therefore able to support superior behavioral performance. The relationship between average rate or envelope synchrony and human-like behavioral AM acuity is unknown.
Whereas rate- and synchrony-based neural AM thresholds cannot be measured in the human IC for comparison with behavioral performance, they can be quantified in other species with sensitive AM detection abilities. The budgerigar (Melopsittacus undulatus) is an avian, lifelong vocal learner with complex, temporally modulated vocalizations (Farabaugh et al. 1994; Tu et al. 2011) and the capacity to mimic human speech (Dooling et al. 2000). Previous research in this small Australian parrot suggests that AM detection abilities are similar between the budgerigar and human listeners for noise-carrier stimuli (Dooling and Searcy 1981) and tones with low modulation frequencies (Carney et al. 2013). Compared with mammals of similar size, the sensory epithelium of the avian cochlea is shorter (∼2.5 mm in budgerigar) (Manley et al. 1993) and restricted in sensitivity to lower acoustic frequencies. Behavioral absolute thresholds of the budgerigar are <30 dB sound pressure level (SPL) from 0.35 to 6 kHz and range from 0 to 10 dB SPL at frequencies from 2 to 3 kHz (Brittan-Powell et al. 2002; Dooling and Saunders 1975) (Fig. 1A). Comparable ranges of hearing sensitivity in the domestic mouse and rabbit are 4–64 kHz (Koay et al. 2002) and 0.5–40 kHz (Heffner 1980), respectively. The frequency selectivity of auditory nerve-fiber responses to tones has not been studied in the budgerigar, but in other avian species with similar cochlear anatomy to budgerigar, frequency tuning is similar to or slightly sharper than in typical mammalian auditory nerve fibers (Manley et al. 1985; Sachs et al. 1974).
The present study quantified neural and behavioral sensitivity to AM in the budgerigar to gain insight into the relationships between neural rate and synchrony thresholds and AM behavioral thresholds. Neural recordings were made from multiunit neural clusters in the central nucleus of the IC (also known as nucleus mesencephalicus lateralis, pars dorsalis) (Covey and Carr 2005) using chronically implanted electrodes in awake, unrestrained birds. Behavioral AM sensitivity was assessed with operant conditioning procedures. The results suggest that envelope synchrony rather than average rate information in the budgerigar IC underlies human-like behavioral sensitivity to AM in this species.
MATERIALS AND METHODS
All physiological and behavioral experiments included in this study were conducted in English-variety budgerigars and performed under a protocol approved by the University Committee on Animal Resources at the University of Rochester. English budgerigars are bred for larger size (40–65 g) and calmer deportment compared with examples of the species commonly found in pet stores (parakeets). Neurophysiological data were collected from the central nucleus of the IC during 68 multiunit neural recording sessions conducted in three awake, unrestrained birds (1 female, implanted twice; 2 males). Behavioral data were obtained using operant conditioning procedures in four birds (all male). This new behavioral dataset was not part of a previously published study focused on detection of low modulation frequencies (Carney et al. 2013). Behavioral and physiological data were collected from different birds.
Microelectrode implantation procedure.
Tungsten microelectrodes (3–5 MΩ; MicroProbes, Gaithersburg, MD) were implanted into the IC of anesthetized birds using a stereotaxic, surgical procedure to allow recording of sound-evoked neural activity during subsequent daily recording sessions. Electrodes were epoxied to a miniature, head-mounted microdrive (“nDrive”; NeuroNexus, Ann Arbor, MI), which allowed postimplantation adjustments of recording depth. Implant assemblies consisted of the nDrive, electrode, protective cap, and miniature connector (#A79000-001; Omnetics, Minneapolis, MN). Electrodes extended 8 mm below the base of the nDrive in its retracted state, with the capacity to extend to a maximum depth of 11 mm through adjustment of a manual control screw.
Birds were anesthetized with ketamine (3–5 mg/kg sc) and dexmedetomidine (0.08–0.1 mg/kg sc) for the implantation surgery and wrapped loosely in a towel. They were then placed into a stereotaxic frame that held the head in an upright position, with the nares ∼5 mm above the horizontal plane of the interaural axis and the beak tip ∼5 mm below. Supplemental warmth was provided by a disposable heating pack (∼50°C; ComfortTec International, Carlsbad, CA) placed under the body. Body temperature was not measured during implantation surgeries. The top of the head was trimmed of feathers and disinfected with Betadine and alcohol. The scalp was numbed with Lidocaine (0.05 ml sc), and a 15 × 15-mm area of the skull was exposed with an incision to accommodate the dimensions of the implant assembly. An ∼1-mm diameter craniotomy was made using a surgical drill centered 4 mm lateral of the midline and 1 mm posterior to the vertical plane of the interaural axis. Finally, an electrical ground was established on the animal by advancing a self-tapping, stainless-steel screw (#000; ×3/32”) through the skull near the midline and just posterior to the base of the implant assembly.
The implant assembly, with electrodes oriented vertically, was placed into a micromanipulator mounted on the stereotaxic frame. The manipulator was adjusted to center the electrode tip over the craniotomy. A wire was connected to the ground screw with electrically conductive epoxy (#8331-14G; MG Chemicals, Burlington, Ontario, Canada). The dura was then pierced with an ophthalmic scalpel and the implant assembly slowly lowered into the brain using a hydraulic microdrive (David Kopf Instruments, Tujunga, CA) until the base of the nDrive contacted the skull (electrode depth = 8 mm). The nDrive control screw was then used to advance the electrodes toward the IC during stimulation with 75 dB SPL Gaussian noise or tone bursts. Following the emergence of robust, sound-evoked neural activity, which occasionally required retraction of the implant assembly and minor adjustment of the electrode trajectory, a bead of Kwik-Sil adhesive (World Precision Instruments, Sarasota, FL) was placed over the craniotomy, and the nDrive was cemented to the skull using light-cured dental composite material (Vertise Flow; Kerr, Orange, CA). Finally, light-cured composite material was used to cement the protective cap over the implant assembly and the electrical connector to the cap.
Anesthesia was maintained throughout the 2- to 3-h implantation procedure by slow infusion with a syringe pump of a solution combining ketamine (6–10 mg·kg⁻¹·h⁻¹), dexmedetomidine (0.16–0.27 mg·kg⁻¹·h⁻¹), and lactated Ringer's solution (30–50 ml·kg⁻¹·h⁻¹) through a subcutaneous catheter. Following completion of the surgery, animals were removed from the stereotaxic frame and placed in a heated recovery chamber (#912-000; Lyon Technologies, Chula Vista, CA). Antisedan (0.5 mg/kg sc) was given to speed recovery from anesthesia. The analgesic carprofen (1 mg/kg sc) was given on the day of surgery and once daily for 1–2 days thereafter to minimize pain and inflammation. Metoclopramide (1.75 mg/kg) and supplemental fluids (lactated Ringer's solution 1 ml sc) were given once or twice daily following the procedure until normal appetite and droppings were observed (typically 1–2 days).
Neurophysiological recording sessions.
Sound-evoked neural activity was recorded in awake, unrestrained birds for 14–26 daily test sessions beginning 1 wk after the microelectrode implantation procedure. Daily test sessions were 2 h in duration. The position of the recording site was held constant throughout each session until its end, at which point, the electrode was advanced by 35 μm. This distance reliably produced an increase in the characteristic frequency (CF; the frequency of maximum sensitivity to tones) of the recording site. All of the recording tracks included in this study showed robust spiking activity (described further below) and a clear increase in CF with increasing recording depth (e.g., Fig. 1A). This tonotopic gradient is consistent with localization of recording sites in the central nucleus of the IC rather than other nearby auditory nuclei (e.g., thalamus) (Bigalke-Kunz et al. 1987). CFs increase dorso-ventrally in the central nucleus of the IC in mammals (Baumann et al. 2011; Langner et al. 2002; Langner and Schreiner 1988) and all other bird species studied to date [Calford et al. (1985); Knudsen and Konishi (1978); Woolley and Casseday (2004); reviewed for both taxa in Woolley and Portfors (2013)].
Recordings were made in a double-walled, sound-isolation booth (inside dimensions: 2.13 m long, 1.98 m wide, 1.96 m tall; Industrial Acoustics, Bronx, NY), lined with 7.6 cm sound-absorbing foam (Pinta Acoustic, Minneapolis, MN). During recording sessions, birds perched in a wire-mesh cage (0.2 m length, width, and height; 6.4 mm wire spacing) that was centered in the chamber and separated by 45 cm from a single, free-field loudspeaker (#PS180-8; Dayton Audio, Springboro, OH) mounted in the same horizontal plane as the bird. A video-monitoring system was used to ensure that birds remained perched, which was typical behavior in all birds, and facing the loudspeaker throughout the session.
Stimulus generation and response acquisition were coordinated using a data acquisition board (#PCI-6251; National Instruments, Austin, TX) controlled by custom MATLAB programs (MathWorks, Natick, MA). Stimulus waveforms were generated in MATLAB (sampling frequency = 50 kHz) and converted to analog signals on the National Instruments board before passing to a power amplifier (Tascam PA-20 MK II) that drove the loudspeaker. Stimuli were calibrated in MATLAB before analog conversion using a 4,000-point finite impulse response (FIR) filter that compensated for the frequency response of the system, which was determined from the output of a calibrated microphone (Type 4134; Brüel and Kjaer, Marlborough, MA; sampled at 50 kHz by the National Instruments board) placed inside the cage at the location of the animal's head. Tones were presented during calibration at 249 log-spaced frequencies from 0.050 to 15.1 kHz.
Electrode signals were buffered using a miniature headstage that clipped to the implant connector before passing out of the sound booth through thin, flexible wires to a custom-built amplifier designed for extracellular signals. The amplifier filtered signals from 0.3 to 8 kHz and amplified signals by a factor of 1,000–10,000. Amplified signals were digitized on the National Instruments board (16-bit resolution) at a sampling frequency of 31,250 Hz and written to the hard drive of the personal computer.
Responses to 200 ms pure tones of variable frequency (0.2–8 kHz, 7 steps/octave) and level (15–65 dB SPL, 10 dB steps) were recorded at the beginning of each session to generate a frequency-response map for determination of CF and 10 dB bandwidth of frequency tuning at each recording site. Each frequency-level combination was presented three times. Tones were presented in random sequence (within each presentation group) with 10 ms cos² onset and offset ramps and 500 ms of silence between tones. Responses were used to calculate pure-tone tuning curves, plotting the threshold stimulus level necessary to evoke a criterion discharge rate as a function of stimulus frequency (Fig. 1A). The criterion discharge rate was set at either 3 SD above the mean spontaneous rate or, for recording sites with an excitatory rate response at 15 dB SPL, the maximum discharge rate evoked across all 15 dB SPL stimuli.
Responses to AM tone- and noise-carrier stimuli with varying modulation frequency and depth were then recorded. AM tones were generated with the carrier frequency equal to best frequency (BF), and the same “frozen” Gaussian noise waveform was used as a carrier for all noise signals (bandwidth: 0.1–10 kHz). First, a modulation transfer function (MTF) was obtained by presenting stimuli with 100% modulation depth at 25 log-spaced modulation frequencies ranging from 4 to 1,024 Hz. An unmodulated stimulus was also included in the set. MTF stimuli had a duration of 1 s and were presented in random order for 10 repetitions. Second, for determination of neural AM thresholds, modulation depth functions were obtained at 2–5 modulation frequencies (typically 3; modulation frequency range: 16–512 Hz) by presenting stimuli with modulation depths ranging from −30 to 0 dB in 5 dB steps. Modulation depth in decibels is calculated as 20 log₁₀(m), where m is the modulation index. A value of m = 1 corresponds to 100% or 0 dB depth, whereas m = 0.1 corresponds to 10% or −20 dB depth. An unmodulated carrier signal was also included in each stimulus set. Stimuli had a duration of 500 ms and were presented in random order for 20 repetitions. For both MTFs and modulation depth functions, stimuli were presented at 65 dB SPL with 50 ms cos² onset and offset ramps and 500 ms of silence between stimuli.
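The decibel convention for modulation depth and the gated AM tone described above can be illustrated with a minimal Python sketch. The sinusoidal modulator, its starting phase, the unit carrier amplitude, and the function names `depth_db_to_index` and `sam_tone` are assumptions made for illustration; the study's own stimuli were generated in MATLAB, and level calibration is not modeled here.

```python
import math


def depth_db_to_index(depth_db):
    """Convert modulation depth in dB, defined as 20*log10(m), to modulation index m."""
    return 10.0 ** (depth_db / 20.0)


def sam_tone(fc_hz, fm_hz, depth_db, dur_s=0.5, fs_hz=50000, ramp_s=0.05):
    """Sketch of an AM tone with cos^2 onset/offset ramps (assumed sinusoidal modulator)."""
    m = depth_db_to_index(depth_db)
    n = int(round(dur_s * fs_hz))
    n_ramp = int(round(ramp_s * fs_hz))
    samples = []
    for i in range(n):
        t = i / fs_hz
        envelope = 1.0 + m * math.sin(2.0 * math.pi * fm_hz * t)
        carrier = math.sin(2.0 * math.pi * fc_hz * t)
        # Raised-cosine (cos^2-shaped) gate: rises from 0 to 1 over the ramp duration.
        if i < n_ramp:
            gate = math.sin(0.5 * math.pi * i / n_ramp) ** 2
        elif i >= n - n_ramp:
            gate = math.sin(0.5 * math.pi * (n - 1 - i) / n_ramp) ** 2
        else:
            gate = 1.0
        samples.append(gate * envelope * carrier)
    return samples


# Example: a 4-kHz carrier modulated at 64 Hz with a depth of -20 dB (m = 0.1).
tone = sam_tone(4000.0, 64.0, -20.0)
```

With these defaults, the waveform has the 500-ms duration, 50-kHz sampling frequency, and 50-ms ramps used for the modulation depth functions.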
Spikes were detected in multiunit neurophysiological recordings (Fig. 2) after first high-pass filtering to minimize the local field potential (500 point FIR; 1 kHz cutoff frequency) and application of a nonlinear energy operator (Kim and Kim 2000). The nonlinear energy operator gives an output proportional to the product of the instantaneous frequency and amplitude of the input signal and hence, accentuates spike peaks for detection based on an amplitude threshold. For the discrete time sequence x(n), the output of the nonlinear energy operator is first calculated as x²(n) − x(n + 1)x(n − 1). This output is then smoothed with a six-point Bartlett window to eliminate spurious peaks in the transformed signal due to cross-terms and background noise (Kim and Kim 2000). The amplitude threshold for spike detection was set once per recording session at approximately one-half of the peak amplitude of the largest spikes in the transformed neurophysiological recording (i.e., well above the level of the noise). Spike times were calculated as the time of the peak deflection expressed relative to stimulus onset.
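The spike-detection steps can be sketched in Python as follows. This is an illustrative reading of the procedure, not the analysis code used in the study: the high-pass filter is omitted, the window normalization and edge handling are assumptions, spike times are taken at local maxima of the smoothed operator output, and all function names are hypothetical.

```python
def nonlinear_energy_operator(x):
    """psi(n) = x(n)^2 - x(n+1)*x(n-1); the two end points are set to zero."""
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n + 1] * x[n - 1]
    return psi


def bartlett_window(length):
    """Triangular (Bartlett) window with zero-valued end points."""
    half = (length - 1) / 2.0
    return [1.0 - abs(i - half) / half for i in range(length)]


def smooth(signal, window):
    """Same-length smoothing with the window normalized to unit sum (assumption)."""
    total = sum(window)
    weights = [w / total for w in window]
    offset = len(weights) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, wk in enumerate(weights):
            j = n + k - offset
            if 0 <= j < len(signal):
                acc += wk * signal[j]
        out.append(acc)
    return out


def detect_spikes(x, fs_hz, threshold):
    """Return times (s) of local maxima of the smoothed operator output above threshold."""
    energy = smooth(nonlinear_energy_operator(x), bartlett_window(6))
    times = []
    for n in range(1, len(energy) - 1):
        if energy[n] > threshold and energy[n] >= energy[n - 1] and energy[n] > energy[n + 1]:
            times.append(n / fs_hz)
    return times


# Example: one triangular "spike" centered on sample 50 of an otherwise silent trace.
fs_hz = 31250
trace = [0.0] * 100
trace[49], trace[50], trace[51] = 0.5, 1.0, 0.5
energy = smooth(nonlinear_energy_operator(trace), bartlett_window(6))
spike_times = detect_spikes(trace, fs_hz, threshold=0.5 * max(energy))
```

The threshold in the example follows the rule of roughly one-half of the largest peak in the transformed signal.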
Isolation of single-unit responses using a template-matching procedure was not successful, apparently because the action potentials of neurons near the recording site were too similar in shape for discrimination. Nonetheless, multiunit recordings from groups of spatially clustered neurons are expected to provide valuable insight into the functional response properties of the budgerigar IC based on previous reports of robust anatomical gradients in spectral and temporal response properties in the midbrain of mammals [Baumann et al. (2011); Chen et al. (2012); Langner et al. (2002); Langner and Schreiner (1988); but see Seshagiri and Delgutte (2007)] and other bird species (Calford et al. 1985; Woolley and Casseday 2004).
Rate- and synchrony-based AM thresholds of individual neural recording sites.
Neural thresholds for AM detection were calculated from modulation depth functions based on both the average discharge rate and envelope synchrony of neural responses. Rate-based AM thresholds were estimated using ROC analysis (Egan 1975), which computes classification performance (i.e., percent correct, modulated vs. unmodulated) from the distributions of discharge rates observed in response to each modulation depth compared with the unmodulated stimulus condition. Linear interpolation was applied to the function plotting classification performance vs. stimulus modulation depth. The rate threshold was calculated as the minimum modulation depth above which classification performance consistently exceeded 70.7% correct. This value corresponds to average behavioral performance at threshold for the operant task and tracking procedure used in behavioral experiments (see below).
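A minimal Python sketch of this threshold calculation is given below. It assumes equal prior probabilities for the two stimulus classes, a single decision criterion placed optimally on the response axis, and a decision rule that may be reversed for suppressive responses; these choices and the function names are illustrative assumptions rather than details stated in the text.

```python
def percent_correct(modulated, unmodulated):
    """Best percent correct achievable with one criterion, assuming equal priors."""
    best = 50.0
    for criterion in sorted(set(modulated) | set(unmodulated)):
        hit_rate = sum(v >= criterion for v in modulated) / len(modulated)
        reject_rate = sum(v < criterion for v in unmodulated) / len(unmodulated)
        pc = 50.0 * (hit_rate + reject_rate)
        # 100 - pc is the performance of the reversed rule (AM lowers the response).
        best = max(best, pc, 100.0 - pc)
    return best


def am_threshold(depths_db, pc_by_depth, criterion=70.7):
    """Lowest depth above which performance stays above criterion (depths ascending).

    Returns None when performance at the largest depth does not exceed criterion.
    """
    if pc_by_depth[-1] <= criterion:
        return None
    i = len(pc_by_depth) - 1
    while i > 0 and pc_by_depth[i - 1] > criterion:
        i -= 1
    if i == 0:
        return depths_db[0]  # above criterion at every depth tested
    # Linear interpolation between the last sub-criterion point and the next point.
    frac = (criterion - pc_by_depth[i - 1]) / (pc_by_depth[i] - pc_by_depth[i - 1])
    return depths_db[i - 1] + frac * (depths_db[i] - depths_db[i - 1])


# Example with made-up performance values at the depths used in the study.
depths = [-30, -25, -20, -15, -10, -5, 0]
performance = [50.0, 55.0, 60.0, 80.0, 90.0, 100.0, 100.0]
threshold_db = am_threshold(depths, performance)
```

In the example, performance crosses 70.7% between −20 and −15 dB, so the interpolated threshold falls between those two depths.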
AM thresholds based on envelope synchrony were estimated using ROC analysis and a variation of the classical vector strength (VS) metric of neural synchrony. Classical VS (Goldberg and Brown 1969) is calculated from neural spike times pooled across stimulus repetitions as $\mathrm{VS} = \frac{1}{n}\sqrt{\left[\sum_{i=1}^{n}\cos(2\pi f_m t_i)\right]^2 + \left[\sum_{i=1}^{n}\sin(2\pi f_m t_i)\right]^2}$, where $n$ is the total number of spikes, $f_m$ is the modulation frequency of the stimulus, and $t_i$ is the time of the ith spike. In the analysis used here, response synchrony was calculated on a repetition-by-repetition basis using phase-projected VS (VSPP), which is calculated as VSR·cos(φR − φP), where VSR is the VS of spikes associated with an individual stimulus repetition, φR is the mean phase of spikes associated with that repetition, and φP is the mean phase of spikes pooled across repetitions (Johnson et al. 2012; Sayles et al. 2013; Yin et al. 2011). VSPP reduces single-repetition estimates of envelope synchrony, which can be highly variable at low spike counts, when they are out of phase with the pooled response. VSPP thresholds were estimated using the same ROC analysis approach used for rate thresholds. The distribution of VSPP estimates observed at each modulation depth was compared with the distribution associated with the unmodulated stimulus to determine classification performance given optimal threshold placement. Linear interpolation was applied to the function plotting classification performance by stimulus modulation depth, and the VSPP threshold was calculated as the minimum modulation depth at which classification performance surpassed 70.7%.
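The VS and VSPP calculations can be sketched in Python using the complex-exponential form of vector strength, which is equivalent to the cosine and sine sums. The handling of repetitions with no spikes (VSPP set to zero) and the function names are assumptions made for illustration.

```python
import cmath
import math


def vector_strength(spike_times_s, fm_hz):
    """Return (VS, mean phase in radians) for spike times relative to the modulation cycle."""
    if not spike_times_s:
        return 0.0, 0.0  # assumption: empty responses contribute zero synchrony
    z = sum(cmath.exp(2j * math.pi * fm_hz * t) for t in spike_times_s) / len(spike_times_s)
    return abs(z), cmath.phase(z)


def phase_projected_vs(repetitions, fm_hz):
    """Per-repetition VSPP = VS_R * cos(phi_R - phi_P), with phi_P from the pooled spikes."""
    pooled = [t for rep in repetitions for t in rep]
    _, phi_pooled = vector_strength(pooled, fm_hz)
    values = []
    for rep in repetitions:
        vs_rep, phi_rep = vector_strength(rep, fm_hz)
        values.append(vs_rep * math.cos(phi_rep - phi_pooled))
    return values


# Example at 10 Hz: two repetitions locked to phase 0 and one spike half a cycle later.
reps = [[0.0, 0.1, 0.2], [0.0, 0.1], [0.05]]
vspp = phase_projected_vs(reps, 10.0)
```

In the example, the first two repetitions yield VSPP near 1, whereas the third, which is perfectly synchronized but in antiphase with the pooled response, yields VSPP near −1.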
The analysis time window used for calculation of response rate extended from 50 ms after stimulus onset to stimulus offset. The analysis window used for calculation of VSPP began 50 ms after stimulus onset and extended for the maximum integer number of stimulus modulation periods possible before stimulus offset. This analysis window effectively excluded the neural response to the first cycle of AM for stimuli with modulation frequencies >25 Hz [the majority of neural AM thresholds was estimated with modulation frequencies ≥32 Hz (see Fig. 7)].
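The synchrony analysis window can be illustrated with a short Python sketch. The function name and the small tolerance added before rounding down are assumptions; the 50-ms start and 500-ms stimulus duration are taken from the text.

```python
import math


def synchrony_window(fm_hz, stim_dur_s=0.5, start_s=0.05):
    """Return (start, end, n_cycles) for a window holding a whole number of modulation periods."""
    # The tolerance guards against floating-point error when the product is an exact integer.
    n_cycles = math.floor((stim_dur_s - start_s) * fm_hz + 1e-9)
    return start_s, start_s + n_cycles / fm_hz, n_cycles


# Example: at 16 Hz, seven full periods fit between 50 ms and stimulus offset.
start_s, end_s, n_cycles = synchrony_window(16.0)
```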
Neural AM thresholds obtained by pooling rate information across recording sites.
AM detection thresholds of the pooled neural population were estimated for both carrier types using a maximum likelihood-based pattern decoder (Day and Delgutte 2013; Jazayeri and Movshon 2006). This analysis calculated, separately for each modulation frequency and at each modulation depth, the percentage of stimuli that could be classified correctly as modulated or unmodulated based on single, optimally weighted population responses (i.e., spike counts sampled across the population) and an assumption of independent, Poisson-distributed spike counts. Percent correct was calculated by first randomly drawing 1,000 population responses from each of the two stimulus conditions (i.e., modulated and unmodulated). For each population draw, the logarithm of the likelihood of the two conditions was calculated as in Jazayeri and Movshon (2006), where ni is the randomly drawn spike count of the ith site, N is the total number of sites, and fi(θ) is the average spike count of site i for stimulus condition θ
$$\log L(\theta) = \sum_{i=1}^{N} n_i \log f_i(\theta) - \sum_{i=1}^{N} f_i(\theta) - \sum_{i=1}^{N} \log(n_i!)$$
The first term is an optimally weighted sum of spike counts across the population, whereas the second term is the sum of average counts. The last term can be ignored, because it is independent of θ. Note that for each random population draw, selected spike counts ni were removed from the dataset before calculation of fi(θ) to avoid overfitting of the model. Percent correct was calculated as the percentage of population draws for which the log-likelihood of the correct stimulus condition was greater than that of the incorrect condition. The population threshold was calculated as the modulation depth at which the performance of the decoder model exceeded 70.7% correct.
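The decoding procedure can be sketched in Python as follows. This is an illustrative reading under stated assumptions: each site contributes a list of spike counts (one per repetition, at least two repetitions), the drawn count is left out only when estimating the mean for the condition it was drawn from, mean counts are floored at a small positive value to keep the logarithm finite, and ties are scored as incorrect. The function names and data layout are hypothetical.

```python
import math
import random


def log_likelihood(counts, mean_counts):
    """Poisson log-likelihood without the theta-independent log(n_i!) term."""
    return sum(n * math.log(f) - f for n, f in zip(counts, mean_counts))


def decoder_percent_correct(modulated, unmodulated, n_draws=1000, seed=0, floor=1e-3):
    """Percent of random population draws assigned to the correct stimulus condition.

    modulated, unmodulated: one list of spike counts (across repetitions) per site.
    """
    rng = random.Random(seed)
    conditions = (modulated, unmodulated)
    n_correct = 0
    n_total = 0
    for true_idx in (0, 1):
        for _ in range(n_draws):
            # Draw one repetition per site from the true condition.
            picks = [rng.randrange(len(site)) for site in conditions[true_idx]]
            counts = [site[k] for site, k in zip(conditions[true_idx], picks)]
            scores = []
            for idx, cond in enumerate(conditions):
                means = []
                for site, k in zip(cond, picks):
                    # Leave the drawn count out of its own condition's mean.
                    reps = site[:k] + site[k + 1:] if idx == true_idx else site
                    means.append(max(sum(reps) / len(reps), floor))
                scores.append(log_likelihood(counts, means))
            n_correct += scores[true_idx] > scores[1 - true_idx]
            n_total += 1
    return 100.0 * n_correct / n_total


# Example with two made-up sites whose counts differ clearly between conditions.
mod_counts = [[20, 21, 19, 20], [30, 29, 31, 30]]
unmod_counts = [[5, 6, 4, 5], [10, 9, 11, 10]]
pc = decoder_percent_correct(mod_counts, unmod_counts, n_draws=200)
```

Repeating the calculation at each modulation depth and interpolating to 70.7% correct would give a population threshold in the manner described above.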
Recording sites with no significant variation in discharge rate with modulation depth theoretically make no contribution to decoder performance. In practice, however, spike counts drawn from these sites carry small weights, due to sampling errors in the estimation of fi(θ), and hence, can decrease performance. We therefore excluded these sites from pooling analyses, which typically improved performance by 1–2 dB compared with analyses conducted with the full sample of sites. A unity relationship was observed across all recording sites between log-transformed mean spike count, calculated for each stimulus, and log-transformed variance, consistent with Poisson distribution of spike counts.
Behavioral AM thresholds.
Thresholds for behavioral AM detection of a 4-kHz tone-carrier signal were estimated at modulation frequencies of 16, 32, 64, 128, 256, and 512 Hz using operant conditioning. Thresholds were estimated four to six times/day in each bird during two, ∼20 min test periods. Behavioral sessions were conducted in a sound-isolation chamber (inside dimensions: 61 cm long, 81 cm wide, 61 cm tall; Industrial Acoustics), lined with 6.7 cm sound-attenuating foam (Pinta Acoustic). Birds perched in a wire-mesh, stainless-steel cage (0.2 m length, width, and height; 6.4 mm wire spacing) located centrally on the floor of the chamber. The wire cage contained three horizontally placed response switches and the delivery tube of a customized seed-dispensing system (ENV-203 Mini; Med Associates, St. Albans, VT). A single overhead loudspeaker (MC60; Polk Audio, Baltimore, MD) was mounted above the cage for presentation of acoustic stimuli. Stimulus generation and behavioral response acquisition were coordinated using a National Instruments data acquisition board (#PCI-6251) controlled by custom MATLAB programs. Stimulus waveforms (sampling frequency = 50 kHz) were converted to analog signals on the National Instruments board before passing to a power amplifier (D-75A; Crown Audio, Elkhart, IN) that drove the loudspeaker in the booth. Stimuli were calibrated using the same filtering procedure applied in the neurophysiology setup.
Birds were trained to perform a single-interval, two-alternative, nonforced-choice task during behavioral test sessions. Birds started each trial by pecking the center switch, which initiated presentation of a single stimulus. The stimulus was either a standard, unmodulated tone or target AM tone, presented for a maximum duration of 500 ms at 65 dB SPL with 50 ms cos² onset and offset ramps. Birds were trained to peck the left switch in response to unmodulated stimuli (i.e., correct rejections) and the right switch in response to modulated stimuli (i.e., hits). Responses resulted in immediate termination of the stimulus. Correct responses (i.e., hits and correct rejections) were reinforced by delivery of a millet seed, whereas incorrect responses (misses and false alarms) were followed by a 5-s timeout, during which all lights in the chamber were turned off. The pecking of any of the switches during a timeout resulted in extension of the timeout by 5 s. In rare instances in which the bird did not respond left or right within 3 s of stimulus onset, a short, 2-s timeout was imposed before the next trial could begin. Every block of 10 trials contained a random sequence of 5 target and 5 standard stimuli.
For each modulation frequency tested, birds spent the first few sessions discriminating fully modulated test stimuli from the standard stimulus. Following mastery of this task, AM thresholds were estimated repeatedly at the same modulation frequency using a two-down, one-up adaptive-tracking procedure (Levitt 1970). With this procedure, the modulation depth of the target stimulus was systematically varied within a track until AM was just barely detectable. Each pair of consecutive hits at the same modulation depth was followed by a reduction in depth, whereas each miss was followed by an increase in depth (up to the maximum depth of 0 dB). Steps in modulation depth were equal to 6/n dB, where n is the number of steps accumulated since the beginning of the track (Levitt 1970; Robbins and Monro 1951). The value of n was reset to 1 in rare cases when the track returned to 0 dB depth. Behavioral tracks continued for a minimum of 15 depth reversals and until 2 stability criteria were met: 1) the absolute difference in mean modulation depth between the last 4 reversals of the track and the 4 preceding reversals was required to be <2 dB, and 2) the SD of the modulation depth of the last 8 reversals was required to be <2 dB. The threshold of the track was calculated as the mean modulation depth of the last eight reversals. Response bias was monitored and controlled by varying the percentage of two-seed reinforcements for correct responses on the side opposite the bias. Thresholds from tracks in which bias exceeded 0.3 were excluded from further analysis.
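The tracking rule and the track-level stability criteria can be sketched in Python as follows. The sketch considers target (AM) trials only and makes several assumptions that are not specified in the text: the step counter n starts at 1 and increments after every step, a reversal is logged at the depth where the track changes direction, and the SD is the sample standard deviation. The class and function names are hypothetical.

```python
import statistics


class TwoDownOneUpTrack:
    """Two-down, one-up track on modulation depth (dB) with steps of 6/n dB."""

    def __init__(self, start_db=0.0, base_step_db=6.0, max_db=0.0):
        self.depth_db = start_db
        self.base_step_db = base_step_db
        self.max_db = max_db
        self.n = 1               # step counter (assumed to start at 1)
        self.hits_in_row = 0
        self.last_direction = 0  # -1 = depth decreasing, +1 = depth increasing
        self.reversals = []

    def update(self, hit):
        """Register the outcome of one target trial and return the next depth."""
        if hit:
            self.hits_in_row += 1
            if self.hits_in_row < 2:
                return self.depth_db  # wait for a second consecutive hit
            direction = -1
        else:
            direction = 1
        self.hits_in_row = 0
        if self.last_direction and direction != self.last_direction:
            self.reversals.append(self.depth_db)
        self.last_direction = direction
        step_db = self.base_step_db / self.n
        self.depth_db = min(self.depth_db + direction * step_db, self.max_db)
        self.n += 1
        if self.depth_db >= self.max_db:
            self.n = 1  # reset when the track returns to 0 dB depth
        return self.depth_db


def track_threshold(reversals_db, min_reversals=15, tol_db=2.0):
    """Mean of the last 8 reversals if both stability criteria are met, else None."""
    if len(reversals_db) < min_reversals:
        return None
    last8 = reversals_db[-8:]
    drift = abs(statistics.mean(last8[4:]) - statistics.mean(last8[:4]))
    if drift < tol_db and statistics.stdev(last8) < tol_db:
        return statistics.mean(last8)
    return None


# Example: three pairs of hits followed by a miss.
track = TwoDownOneUpTrack()
depths = [track.update(hit) for hit in (True, True, True, True, True, True, False)]
```

In the example, the depth steps from 0 to −6, −9, and −11 dB and then back up to −9.5 dB after the miss, with the first reversal logged at −11 dB.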
Thresholds were estimated repeatedly at the same modulation frequency a minimum of 13 times until 2 stability criteria were met: 1) the absolute difference in mean threshold among the last 3 thresholds and the preceding 3 thresholds was required to be <2 dB, and 2) the SD of the last 6 thresholds was required to be <2 dB. The overall threshold was calculated as the mean of the last six thresholds. Other modulation frequencies were then tested in different orders across birds. Following estimation of thresholds at all six modulation frequencies, AM thresholds were estimated again, at least two more times at each modulation frequency, until the overall threshold changed by <2 dB. Final thresholds were computed as the average of the last two overall thresholds (where each overall threshold was the average of 6 session thresholds).
RESULTS
Behavioral AM thresholds in the budgerigar.
Behavioral thresholds for AM detection of a 4-kHz tone-carrier stimulus were estimated in four budgerigars over ∼400 test sessions in each bird. Sessions were conducted four to six times per day and consisted of 100–200 trials each. Behavioral AM thresholds in the budgerigar improved by ∼5 dB with increasing modulation frequency from 16 to 64 Hz (Fig. 3; n = 4 birds). Thresholds remained constant at −20 to −25 dB for modulation frequencies from 64 Hz up to the highest modulation frequency tested, 512 Hz.
Behavioral sensitivity to AM tones has been studied previously in humans and rabbits (Carney et al. 2014) using similar stimuli (5 kHz carrier, 500 ms duration, 50 dB SPL) and the same single-interval behavioral discrimination task and threshold-tracking procedure used here in the budgerigar (Fig. 3). Compared with humans, behavioral AM thresholds of budgerigars were similar at modulation frequencies from 16 to 128 Hz and slightly more sensitive at 256 Hz. Compared with rabbit, budgerigar thresholds were ∼10 dB more sensitive across the full range of modulation frequencies studied.
Neural pure-tone tuning curves.
Neurophysiological data were collected from the IC of three awake, unrestrained budgerigars over a total of 68, 2-h recording sessions. Individual neural recording sites in the budgerigar IC typically showed excitatory rate responses to tone stimuli with sharp frequency tuning (Fig. 1, A and B). CF increased from ∼400 Hz at dorsal recording sites to 5 kHz at the most ventral sites. The 10-dB bandwidth of pure-tone tuning increased with increasing CF (Fig. 1B), whereas mean Q10 (CF divided by 10 dB bandwidth) increased from 2.0 at CF of 1 kHz to 3.5 at CF of 4 kHz. These values are similar to Q10 values reported over the same range of CFs in the auditory periphery of the starling (Manley et al. 1985) and both the periphery and IC of cats (Miller et al. 1997; Ramachandran et al. 1999). Excitatory rate responses were frequently observed at sound levels as low as 15 dB SPL (the lowest stimulus level tested), particularly at recording sites with CFs from 1.5 to 3.5 kHz.
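The statement that both the 10-dB bandwidth and Q10 increase with CF can be checked with a line of arithmetic, because Q10 is CF divided by the 10-dB bandwidth. The sketch below back-calculates the bandwidth implied by the mean Q10 values quoted above. This is only approximate, since a mean of ratios need not equal the ratio of means, and the function name is illustrative.

```python
def bandwidth_from_q10(cf_hz, q10):
    """10-dB bandwidth (Hz) implied by a characteristic frequency and a Q10 value."""
    return cf_hz / q10


bw_at_1_khz = bandwidth_from_q10(1000.0, 2.0)   # 500 Hz
bw_at_4_khz = bandwidth_from_q10(4000.0, 3.5)   # about 1143 Hz
```

A fourfold increase in CF is accompanied by a little more than a doubling of bandwidth, so tuning is broader in hertz but sharper relative to CF at the higher frequency.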
Individual neural recording sites in the budgerigar IC exhibited robust responses to fully modulated tone- and noise-carrier stimuli (tone-carrier frequency was set equal to BF, the frequency of the maximal rate response to pure tones). Representative neural responses are shown in Fig. 4A. For tone-carrier stimuli, MTFs plotting average discharge rate by stimulus modulation frequency usually showed band-enhanced AM tuning, i.e., an excitatory rate response to AM relative to the unmodulated stimulus response that spanned a limited range of modulation frequencies (58/68 recording sites; e.g., Fig. 4B). BMFs varied from 50.8 to 512 Hz within recording sites with band-enhanced modulation tuning [median = 203.2 Hz, interquartile range (IQR) = 128–322.5 Hz; Fig. 5A] and showed no apparent association with BF (Pearson correlation between log-transformed variables, r = 0.081, P = 0.55). Remaining rate MTFs obtained with AM tones were high pass (9/68 sites) or all pass in shape (i.e., flat; 1/68 sites).
Rate MTFs obtained with AM noise were often band enhanced in shape as well (51/68 sites; Fig. 4B) but were limited to lower BMFs (median = 101.6 Hz, IQR = 80.6–153 Hz; Wilcoxon sign rank Z = 5.177, P < 0.0001; Fig. 5A) and showed less enhancement of discharge rate at the MTF peak than MTFs obtained with AM tones (noise-carrier median = 25.0%; tone-carrier median = 200.9%; Wilcoxon sign rank Z = 7.161, P < 0.0001). Noise-carrier MTFs not showing band-enhanced rate responses to AM were invariably all pass in shape (17/68 sites).
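The two MTF descriptors used above, BMF and percent rate enhancement, can be illustrated with a minimal sketch. The definition of enhancement as the peak rate relative to the unmodulated-stimulus rate is an assumption based on the description of band-enhanced tuning, and the discharge rates are invented for the example.

```python
def describe_mtf(mod_freqs_hz, rates, unmodulated_rate):
    """Return (BMF, percent enhancement) for a rate MTF.

    BMF is the modulation frequency with the highest average rate;
    enhancement is the peak rate relative to the unmodulated response.
    """
    peak_rate = max(rates)
    bmf = mod_freqs_hz[rates.index(peak_rate)]
    enhancement = 100.0 * (peak_rate - unmodulated_rate) / unmodulated_rate
    return bmf, enhancement


# Hypothetical band-enhanced site (rates in spikes/s, not data from the study).
mod_freqs = [16.0, 32.0, 64.0, 128.0, 203.2, 322.5, 512.0]
tone_rates = [22.0, 25.0, 34.0, 48.0, 60.0, 45.0, 30.0]
bmf, enhancement = describe_mtf(mod_freqs, tone_rates, unmodulated_rate=20.0)
```

For this invented site the BMF is 203.2 Hz and the peak rate is 200% above the unmodulated response, values chosen to resemble the tone-carrier medians reported above.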
Neural responses to AM tones and noise were generally well synchronized to the AM envelope of the stimulus. Envelope synchrony is evident in poststimulus time histograms (Fig. 4A) and period histograms (Fig. 6A) as temporal fluctuations in instantaneous discharge rate at the modulation frequency of the stimulus. Envelope synchrony peaked at modulation frequencies less than a few hundred hertz at most recording sites (tone-carrier median = 80.6 Hz; noise-carrier median = 101.6 Hz; Wilcoxon sign rank Z = −1.421, P = 0.155; Figs. 4B and 5B) and declined with both increasing and decreasing modulation frequency. Maximum synchrony was greater for tone-carrier (median = 0.75, IQR = 0.68–0.79) than for noise-carrier (median = 0.62, IQR = 0.55–0.67; Wilcoxon sign rank Z = 7.106, P < 0.0001; Fig. 5B) stimuli.
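Envelope synchrony of this kind is commonly quantified with vector strength, the length of the mean resultant of spike phases taken relative to the modulation cycle. The sketch below shows the classic vector-strength calculation only. The synchrony-based thresholds in this study used a phase-projected variant (VSPP), which is not reproduced here, and the function name is illustrative.

```python
import cmath
import math


def vector_strength(spike_times_s, mod_freq_hz):
    """Classic vector strength: 1 = perfect phase locking, 0 = no locking."""
    if not spike_times_s:
        return 0.0
    # Each spike contributes a unit vector at its phase within the modulation cycle.
    resultant = sum(cmath.exp(2j * math.pi * mod_freq_hz * t) for t in spike_times_s)
    return abs(resultant) / len(spike_times_s)


# One spike per cycle at a fixed phase of a 64-Hz envelope: perfect synchrony.
locked = [k / 64.0 for k in range(50)]
# Eight spikes spread evenly across one cycle: no net synchrony.
spread = [k / 512.0 for k in range(8)]

vs_locked = vector_strength(locked, 64.0)
vs_spread = vector_strength(spread, 64.0)
```

On this scale, the median maximum synchrony values of 0.75 (tone carrier) and 0.62 (noise carrier) reported above indicate spikes concentrated within a limited portion of the modulation cycle.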
Rate-based neural thresholds fail to explain behavioral sensitivity to low modulation frequencies.
Neural thresholds for AM detection of tone-carrier stimuli could be calculated based on ROC analysis of average rate responses (e.g., Fig. 6B) from 90% of modulation depth functions (220/244 functions), with every recording site contributing at least one measurable threshold (68/68 sites). Across individual neural recording sites, rate-based AM thresholds improved with increasing modulation frequency (Fig. 7A; n = 220 thresholds, 68 sites). The median rate threshold improved by ∼15 dB with increasing modulation frequency from 32 to 512 Hz. This pattern contrasts with behavioral sensitivity to AM tones in this species, which varied by <5 dB over the same range of modulation frequencies. Rate-based AM thresholds were not sensitive enough to explain behavioral AM sensitivity at modulation frequencies <128 Hz. At higher modulation frequencies, rate thresholds were frequently as sensitive as or more sensitive than behavioral thresholds.
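The ROC-based threshold calculation can be sketched as follows. The area under the ROC curve is computed from spike counts on modulated and unmodulated trials, and threshold is taken as the depth at which the area first reaches a criterion. The 0.707 criterion (the convergence point of a two-down, one-up track) and the linear interpolation between tested depths are assumptions made for illustration, and the sketch handles only rate increases with depth.

```python
def roc_area(standard_counts, target_counts):
    """Probability that a random target-trial count exceeds a random
    standard-trial count, with ties counted as one half."""
    wins = 0.0
    for t in target_counts:
        for s in standard_counts:
            if t > s:
                wins += 1.0
            elif t == s:
                wins += 0.5
    return wins / (len(target_counts) * len(standard_counts))


def neural_threshold(depths_db, areas, criterion=0.707):
    """Depth (dB) at which ROC area first reaches the criterion.

    depths_db must be in ascending order. Returns None if the criterion
    is never reached, i.e., no threshold is measurable.
    """
    if areas[0] >= criterion:
        return depths_db[0]
    for i in range(1, len(areas)):
        if areas[i] >= criterion:
            frac = (criterion - areas[i - 1]) / (areas[i] - areas[i - 1])
            return depths_db[i - 1] + frac * (depths_db[i] - depths_db[i - 1])
    return None


# Hypothetical depth function: ROC area grows with modulation depth.
example_threshold = neural_threshold([-30.0, -20.0, -10.0, 0.0], [0.5, 0.6, 0.8, 1.0])
```

A depth function whose ROC area never reaches the criterion returns `None`, corresponding to the modulation depth functions for which no threshold could be calculated.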
Rate-based thresholds for AM detection of noise-carrier stimuli could be calculated in many cases as well (i.e., 109/216 depth functions, 56/68 recording sites). Compared with rate thresholds for tone-carrier stimuli, rate thresholds for AM noise were generally less sensitive and exhibited peak sensitivity at lower modulation frequencies (64–256 Hz; Fig. 7A; n = 109 thresholds, 56 sites). Compared with previously published behavioral AM thresholds obtained in this species with noise-carrier stimuli (Dooling and Searcy 1981), IC rate thresholds were not sensitive enough to explain behavioral AM thresholds at modulation frequencies <128 Hz. At higher modulation frequencies, the best rate thresholds observed within the neural population approached behavioral performance.
Perhaps not surprisingly, individual recording sites with more strongly peaked rate MTFs (i.e., stronger rate-based modulation tuning) tended to have more sensitive rate-based AM thresholds than recording sites with flatter rate MTFs. This conclusion is supported by the results of correlation analyses conducted in a subgroup of rate-based AM thresholds measured near the BMF of the recording site (i.e., within ±0.75 octaves; Fig. 7A; tone carrier: n = 82 thresholds, 57 sites; noise carrier: n = 49 thresholds, 44 sites). Inverse relationships were observed between log-transformed, percent-rate enhancement of the MTF and normalized rate threshold (observed threshold minus the mean threshold of the population at the test modulation frequency) for both tone-carrier (r = −0.406, P = 0.0002) and noise-carrier (r = −0.710, P < 0.0001; Pearson correlations) stimuli.
Rate-based AM thresholds of the pooled neural population were approximately as sensitive as the best thresholds of individual recording sites (Fig. 7B) and hence, were also insufficient to explain behavioral sensitivity to low AM frequencies. Pooling was conducted with a maximum likelihood-based decoder analysis that optimally combines information across neural recording sites based on their individual discharge statistics (Day and Delgutte 2013; Jazayeri and Movshon 2006), thus estimating the best performance of the system.
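The idea behind likelihood-based pooling can be illustrated with a minimal two-alternative decoder. It assumes independent Poisson spike counts at each site, a simplifying assumption for this sketch that may differ from the discharge statistics used in the actual analysis. Under that assumption, the log-likelihood ratio for "modulated" vs. "unmodulated" is a weighted sum of spike counts in which each site's weight is the log ratio of its mean counts under the two stimuli. All names and numbers below are hypothetical.

```python
import math


def log_likelihood_ratio(counts, mean_target, mean_standard):
    """Log-likelihood ratio (target vs. standard) for independent Poisson sites.

    counts: observed spike count at each site on one trial.
    mean_target, mean_standard: mean count at each site for the modulated and
    unmodulated stimulus (all means must be > 0).
    """
    llr = 0.0
    for n, f_t, f_s in zip(counts, mean_target, mean_standard):
        llr += n * math.log(f_t / f_s) - (f_t - f_s)
    return llr


def decode_is_modulated(counts, mean_target, mean_standard):
    """Report 'modulated' when the pooled evidence favors the target stimulus."""
    return log_likelihood_ratio(counts, mean_target, mean_standard) > 0.0


# Two hypothetical sites: site 1 increases its rate with AM, site 2 does not.
mean_standard = [10.0, 10.0]
mean_target = [15.0, 10.0]
```

Sites whose mean rate does not change with AM receive zero weight, so adding them neither helps nor hurts the pooled decision. A population dominated by a few informative sites can therefore perform about as well as its best members.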
Neural thresholds based on envelope synchrony explain behavioral AM sensitivity across stimulus conditions.
Neural response synchrony to the stimulus AM envelope typically increased with increasing modulation depth for both tone- and noise-carrier signals (Fig. 6B). Neural thresholds for AM detection based on ROC analysis of envelope synchrony could be calculated from nearly every modulation depth function recorded (tone carrier: 240/244 functions, 68/68 sites; noise carrier: 211/216 functions, 68/68 sites). Synchrony-based thresholds for AM detection of tone- and noise-carrier signals were generally most sensitive at modulation frequencies from 64 to 256 Hz (Fig. 7C; tone carrier: n = 240 thresholds, 68 sites; noise carrier: n = 211 thresholds, 68 sites). Notably, envelope synchrony thresholds were sufficiently sensitive to explain behavioral AM thresholds for both carrier types across the full range of modulation frequencies studied. Synchrony thresholds for tone-carrier stimuli were more sensitive than for noise at modulation frequencies >256 Hz, in agreement with behavioral data obtained with these two different carrier signals. Envelope synchrony thresholds were substantially more sensitive than rate thresholds at modulation frequencies up to 128 Hz for both carrier types. Envelope synchrony thresholds at higher modulation frequencies were similar to rate thresholds for noise-carrier stimuli and less sensitive than rate thresholds for tone-carrier stimuli.
Midbrain thresholds for AM detection are similar between budgerigar and rabbit.
The finding that behavioral thresholds for AM detection are ∼10 dB more sensitive in the budgerigar than in the rabbit raises the question of how thresholds of the IC compare between these two species. Neural AM thresholds in the rabbit were recalculated from an existing dataset of IC single-unit and multiunit neural recordings (Carney et al. 2014) using the same rate- and synchrony-based (VSPP) analyses used here. In general, rate and synchrony thresholds were similar between budgerigar and rabbits for both carrier types, up to the maximum modulation frequency studied in the rabbit, 256 Hz (Fig. 8). Whereas subtle differences exist between species (e.g., slightly more sensitive rate-based thresholds of rabbit single-unit responses to noise-carrier stimuli; Table 1), they are too minor to explain the differences in behavioral sensitivity to AM between budgerigar and rabbit.
The present study compared neural and behavioral sensitivity with AM stimuli in the budgerigar to gain insight into the midbrain processing mechanisms underlying behavioral detection abilities. Behavioral thresholds for AM detection of tone-carrier stimuli in the budgerigar were similar to human AM thresholds, as shown previously for AM noise (Dooling and Searcy 1981) and tone-carrier stimuli with lower modulation frequencies (Carney et al. 2013). Budgerigar IC thresholds based on average discharge rate were not sensitive enough to explain behavioral AM thresholds at modulation frequencies <128 Hz for both tone and noise stimuli. In contrast, thresholds based on envelope synchrony could explain behavioral performance for both carrier types across the full range of modulation frequencies studied (16–512 Hz).
The results show that behavioral thresholds for AM detection in the budgerigar are best explained by envelope synchrony in the IC rather than average discharge rate. This conclusion is based on the observation that synchrony thresholds in the budgerigar IC frequently exceeded the sensitivity of behavioral AM thresholds. The best rate-based thresholds in the neural population, in contrast, were considerably less sensitive than behavioral thresholds at low modulation frequencies (e.g., by 10–15 dB at 32 Hz). Whereas rate thresholds in the budgerigar IC were theoretically sensitive enough to support behavioral AM sensitivity at modulation frequencies >128 Hz, reliance on envelope synchrony may still be advantageous at these frequencies, given that envelope synchrony remains a more reliable indicator of AM structure than average rate in the presence of competing acoustic features. For example, average discharge rates in the IC vary with SPL (Ramachandran et al. 1999) and sound-source location (Calford et al. 1985; Day and Delgutte 2013; Kuwada et al. 2006), in addition to modulation frequency and depth.
Average rate responses in the budgerigar IC remained insufficient to explain behavioral AM sensitivity at low modulation frequencies even after optimal pooling of information across recording sites. Indeed, AM thresholds of the pooled population were approximately as sensitive as the best thresholds of individual recording sites, consistent with previous pooling analyses of AM responses in the macaque auditory cortex (Johnson et al. 2012). Information pooling was conducted with a maximum likelihood-based pattern decoder that scales the contribution of individual population elements based on the reliability of their discharge statistics and hence estimates the upper limit of the performance of the system (Jazayeri and Movshon 2006). This pooling model has been shown to decode sound-source location from population rate responses in the rabbit IC (Day and Delgutte 2013, 2016) and the caudolateral region of the macaque cortex (Miller and Recanzone 2009) with performance similar to behaving animals.
The conclusion that envelope synchrony in the budgerigar IC provides a stronger neural correlate of behavioral AM thresholds than average discharge rate contrasts with previous findings in the rabbit, a relatively nonvocal species with limited behavioral AM detection abilities. The results of the rabbit study point to rate coding in the IC rather than envelope synchrony as the primary determinant of behavioral AM sensitivity (Carney et al. 2014). The best neural rate thresholds in the rabbit IC are approximately as sensitive as behavioral thresholds. Best synchrony thresholds, in contrast, are substantially more sensitive than the behaving animal and appear sufficient even to explain AM detection abilities in budgerigars and humans (Carney et al. 2014; Nelson and Carney 2007). A similar discrepancy between behavioral and synchrony-based estimates of AM sensitivity has been observed in macaque monkeys at the level of the primary auditory cortex (Johnson et al. 2012; Niwa et al. 2012). Macaques, which, like rabbits, struggle to detect low modulation frequencies (O'Connor et al. 2011), fail to match the best synchrony thresholds of cortical neurons during behavioral detection of AM, performing more closely to the best rate-based thresholds of the cortical population. Rate- and synchrony-based thresholds for AM detection have not been studied in the IC of nonhuman primates.
Greater behavioral sensitivity to AM in the budgerigar compared with the rabbit appears to reflect an improvement in the ability of more central auditory processing stages to decode envelope synchrony in IC responses rather than heightened sensitivity of IC responses per se. Indeed, both rate- and synchrony-based estimates of neural AM sensitivity were similar between the budgerigar and rabbit IC at modulation frequencies up to at least 256 Hz. The important distinction between species is that the budgerigar appears to make effective use of envelope synchrony in IC responses during behavioral AM detection, whereas the rabbit does not. An alternative explanation is that rabbits were simply not sufficiently trained at the AM detection task, but this possibility seems unlikely considering that individual rabbits maintained stable behavioral performance for ∼48,000 trials during testing at near-threshold modulation depths with the Bayesian procedure (Carney et al. 2014).
Human-like behavioral sensitivity to AM in the budgerigar, observed here for tone-carrier stimuli with modulation frequencies from 16 to 256 Hz, has been demonstrated previously at lower modulation frequencies, from 4 to 8 Hz (Carney et al. 2013). Whereas budgerigar AM thresholds were relatively stable from 64 to 512 Hz, human AM thresholds typically show a decline in sensitivity above 100–130 Hz, followed by improvement beyond 300–500 Hz (dependent on carrier frequency), as subjects begin to rely on spectral resolution of the stimulus sidebands for AM detection (Kohlrausch et al. 2000). It is unclear whether budgerigars lack this dip in AM sensitivity or whether a similar pattern might emerge with finer sampling of higher modulation frequencies, for example, closer to the minimum bandwidth of frequency tuning in IC neurons with CFs near 4 kHz (800–900 Hz; see Fig. 1B). Budgerigar AM thresholds for noise-carrier stimuli have been shown to decline steadily in sensitivity with increasing modulation frequency, from 5 to 1,280 Hz (Dooling and Searcy 1981). Budgerigar thresholds for AM noise are similar to thresholds of humans (Viemeister 1979), as well as European starlings and barn owls (Dent et al. 2002; Klump and Okanoya 1991), across this entire range of modulation frequencies.
Neural coding of AM in the budgerigar IC was broadly similar to coding in the midbrain of other bird species (Keller and Takahashi 2000; Woolley and Casseday 2005) and mammal species studied to date (Carney et al. 2014; Krishna and Semple 2000; Langner and Schreiner 1988; Nelson and Carney 2007; Rees and Møller 1987; Rees and Palmer 1989). Band-enhanced AM tuning, as observed at most budgerigar IC recording sites, is a common response property of IC neurons across taxa, although some mammalian species may possess a greater proportion of neurons with other rate-based MTF types (e.g., band suppressed and low pass) (Joris et al. 2004). The present results add to an emerging pattern of conserved IC physiological response properties in birds and mammals that in addition to similar AM response properties, includes shared tonotopic organization, diversity of frequency tuning curve shapes, and nonlinear spectral interactions [reviewed in Woolley and Portfors (2013)]. The combination of conserved physiological response properties and human-like perceptual capabilities, including temporal gap detection, modulation detection of linear-rippled noise, and detection of inharmonicity [reviewed in Dooling et al. (2000)], makes the budgerigar an interesting animal model for studying the neural bases of complex signal perception.
BMFs of rate-based MTFs in the budgerigar IC typically ranged from 130 to 320 Hz for tone-carrier stimuli presented at BF. Although these BMFs are similar to rate BMFs in some mammalian species [chinchilla: Langner et al. (2002); squirrel monkey: Müller-Preuss et al. (1994)] and higher than in others [gerbil: Krishna and Semple (2000); cat: Langner and Schreiner (1988); rabbit: Nelson and Carney (2007)], the extent to which these patterns reflect true species differences vs. differences in recording methodology is unclear. Multiunit neural recordings, as used here in the budgerigar IC, may shift the distribution of observed rate BMFs to higher modulation frequencies, either because they are less influenced by recording biases (i.e., neurons with higher BMFs may be more difficult to isolate as single units) or because they contain responses from lemniscal input fibers as well as IC neurons (Langner and Schreiner 1988). Whereas multiunit recordings could potentially yield different AM detection thresholds than single-unit recordings, similarity between multi- and single-unit AM thresholds in the rabbit IC (e.g., Fig. 8) suggests that this possibility is unlikely. Furthermore, we estimate that only a small number of neurons contributed to budgerigar IC recordings, based on the moderate maximum discharge rates (<150–250 spikes/s) observed in response to Gaussian noise stimuli, which evoked robust neural activity at all recording sites, and on our use of relatively high-impedance electrodes (3–5 MΩ) with small recording areas.
Between carrier types, budgerigar IC thresholds, based on both envelope synchrony and average rate, were more sensitive for tone-carrier stimuli than for AM noise. These differences, which correlate with differences in behavioral AM sensitivity, can ultimately be attributed to the stochastic nature of noise-carrier signals, which, in addition to any imposed sinusoidal AM, contain inherent temporal fluctuations in envelope amplitude related to narrow-band cochlear filtering of a wideband acoustic signal. Inherent envelope fluctuations, which are absent for AM tone stimuli, can interfere with neural representation of the target AM signal and ultimately mask behavioral detection. Because inherent envelope fluctuations can drive neurons even when the stimulus modulation frequency is remote from the BMF, they can also explain the flatter rate MTFs observed with noise-carrier stimuli compared with AM tones at many recording sites.
Sensitive AM processing abilities in the budgerigar may play an important role in the vocal communication behavior of this gregarious species. Budgerigars of both sexes produce a repertoire of temporally modulated contact calls used for coordination of group activity. Budgerigar vocalizations contain modulation frequencies ranging from 100 to 740 Hz (Lavenex 1999). Contact calls are learned through auditory feedback throughout life and when new social bonds form, undergo a transformation, during which the calls of individual birds converge on a shared call structure (Farabaugh et al. 1994). Male budgerigars also produce longer, multisyllabic warble songs that incorporate novel elements (e.g., mimicked environmental and interspecific sounds) and are directed at females (Tu et al. 2011). Warbles can go on for several minutes and play an important role in mating, courtship, and pair-bond maintenance. Similar behavioral AM sensitivity among the budgerigar, starling, and barn owl (Dent et al. 2002; Dooling and Searcy 1981; Klump and Okanoya 1991) suggests that this trait might be broadly shared across birds, rather than a specialization in the budgerigar.
In conclusion, the present study in the budgerigar demonstrates a clear correlation between thresholds for envelope synchrony in IC neurons and human-like behavioral AM sensitivity. Average rate responses, in contrast, which can explain behavioral performance in a species with more-limited abilities (rabbit), are unable to account for behavioral AM sensitivity in this vocal specialist. These new results highlight both the continued significance of envelope synchrony in the IC and the possible contribution of envelope synchrony to behavioral performance in other species with sensitive AM detection abilities, including humans (Nelson and Carney 2007). The importance of robust envelope synchrony in the IC should be a key design consideration in the development of stimulation strategies for central auditory prostheses (i.e., midbrain implants), which currently provide inadequate AM envelope cues for robust speech perception (Lim and Lenarz 2015).
Support for this work was provided by the National Institute on Deafness and Other Communication Disorders (Grants R01-DC001641 to L. H. Carney and K99-DC013792 to K. S. Henry).
No conflicts of interest, financial or otherwise, are declared by the authors.
Author contributions: K.S.H., F.I., and L.H.C. conception and design of research; K.S.H., E.G.N., and K.S.A. performed experiments; K.S.H. analyzed data; K.S.H., F.I., and L.H.C. interpreted results of experiments; K.S.H. prepared figures; K.S.H. drafted manuscript; K.S.H., E.G.N., K.S.A., F.I., and L.H.C. edited and revised manuscript; K.S.H., E.G.N., K.S.A., F.I., and L.H.C. approved final version of manuscript.
Mitchell L. Day provided the analysis code for calculation of rate-based population thresholds.
- Copyright © 2016 the American Physiological Society