Department of Psychology and Helen Wills Neuroscience Institute, University of California–Berkeley, Berkeley, California
Submitted 19 November 2007; accepted in final form 20 February 2008
ABSTRACT
INTRODUCTION
The hypothesis that higher-level sensory neurons are sensitive to unexpected features in the stimulus can also be framed in the context of redundancy reduction, which has been proposed as a general coding principle for sensory systems (Barlow 1961). Natural stimuli have prominent correlations and thus nonwhite power spectra: low temporal, spatial, and spectral modulation frequency components carry more power than higher-frequency components (Dong and Atick 1995; Field 1987; Singh and Theunissen 2003). To maximize the information carried by lower-level sensory neurons, it has been argued that neuronal stimulus–response functions should account for expected stimulus correlations by attenuating the lower frequencies and accentuating the higher frequencies (Atick 1992; van Hateren 1992b). This operation would yield neural responses to natural stimuli that have equal power at all frequencies; for this reason, such filters are called whitening filters. Indeed, this theoretical prediction appears to hold in both the visual and auditory systems (Attias and Schreiner 1998; Dan et al. 1996; Escabi et al. 2003; van Hateren 1992a). For natural stimuli that have 1/f amplitude spectra (or 1/f² power spectra), a filter that takes the derivative (spatial or temporal) of the stimulus will effectively whiten it, removing all second-order correlations (Srinivasan et al. 1982), because differentiation multiplies the amplitude of each Fourier component by its frequency and so cancels the 1/f falloff.
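A short numerical check of the whitening argument, under stated assumptions: a synthetic signal built with an exactly 1/f amplitude spectrum and random phases stands in for a natural stimulus, and the derivative is taken in the frequency domain (multiplication by i2πf) rather than with a finite difference, which only approximates it at high frequencies. The helper names are illustrative, not taken from the study.

```python
import numpy as np

def one_over_f_signal(n, rng):
    """Real signal whose amplitude spectrum is exactly 1/f (power spectrum 1/f^2)."""
    freqs = np.fft.rfftfreq(n)                    # cycles/sample, 0 ... 0.5
    amp = np.zeros_like(freqs)
    amp[1:] = 1.0 / freqs[1:]                     # 1/f amplitude; DC left at zero
    phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
    return np.fft.irfft(amp * np.exp(1j * phases), n)

def spectral_derivative(x):
    """Temporal derivative taken in the frequency domain (multiply by i*2*pi*f)."""
    freqs = np.fft.rfftfreq(x.size)
    return np.fft.irfft(1j * 2.0 * np.pi * freqs * np.fft.rfft(x), x.size)

def log_log_slope(x):
    """Least-squares slope of log amplitude vs. log frequency, skipping DC and Nyquist."""
    freqs = np.fft.rfftfreq(x.size)[1:-1]
    amp = np.abs(np.fft.rfft(x))[1:-1]
    slope, _intercept = np.polyfit(np.log(freqs), np.log(amp), 1)
    return slope

rng = np.random.default_rng(0)
x = one_over_f_signal(4096, rng)
dx = spectral_derivative(x)
print(f"stimulus slope:   {log_log_slope(x):+.3f}")    # close to -1: 1/f amplitude
print(f"derivative slope: {log_log_slope(dx):+.3f}")   # close to 0: white
```

A slope of −1 on log–log axes is the 1/f signature; after differentiation every component has the same amplitude (2π in these units), which is what "white" means here.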
However, the power spectra of natural stimuli do not capture all of the redundancy present in them. We therefore hypothesized that higher-order sensory areas represent natural stimuli using a neural code that performs redundancy reduction beyond whitening; for natural stimuli with 1/f amplitude spectra, such a code would have stimulus–response functions that reduce redundancy beyond what a derivative achieves. If so, the neurons would be sensitive not only to differences in stimulus intensity, but also to how predictable these otherwise equal changes in intensity are. For example, zebra finch song typically contains harmonic stacks with a smoothly changing fundamental frequency. Modulating the fundamental of a harmonic stack produces widespread changes in stimulus intensity at all frequencies. However, intensity changes within these harmonic stacks are more predictable than the intensity changes associated with a song syllable onset, even though the magnitudes of the two are similar. In such cases, a surprise-STRF performing full redundancy reduction would be more sensitive to the surprising changes at the syllable onset than to the predictable changes during the frequency sweep of the harmonic stack.
We hypothesize that the surprise-STRF should be a good model of neural responses in brain regions where redundancy reduction extends beyond simply whitening the power spectrum of natural stimuli. To test this hypothesis, we compared the predictive power of the surprise-STRF with that of a classical-STRF and of an STRF based on the split absolute derivative of stimulus intensity. Although the classical-STRF can perform a derivative, the underlying model assumes symmetric sensitivity to positive and negative intensity changes. We designed the split absolute derivative STRF so that it could treat positive and negative changes separately; for short, we call STRFs based on the split absolute temporal derivative of stimulus intensity derivative-STRFs. To the extent that the surprise-STRF outperforms the classical- and derivative-STRFs, we can conclude that sensory spikes signify stimulus surprise rather than stimulus intensity changes.
We estimated classical-, derivative-, and surprise-STRFs for single neurons at three levels of the zebra finch auditory system: the midbrain nucleus mesencephalicus lateralis dorsalis (MLd; analogous to the inferior colliculus in mammals); the primary auditory forebrain area Field L (analogous to primary auditory cortex in mammals); and a secondary auditory region, the caudal lateral mesopallium (CLM). Caudal mesopallium has been implicated in the processing of learned and behaviorally relevant sounds (Gentner and Margoliash 2003). CLM may therefore have stimulus–response properties that are particularly efficient at representing birdsong. Songbirds are a model system for studying both vocal learning and the processing of complex vocalizations (Zeigler and Marler 2004). In the songbird auditory system, natural sounds are processed preferentially: compared with synthetic sounds, they elicit higher spiking rates in the forebrain (Grace et al. 2003), higher information rates in the midbrain and forebrain (Hsu et al. 2004b), and higher neural synchronization in the midbrain (Woolley et al. 2006). Moreover, classical-STRFs of these neurons show tuning for the informative features of natural sounds (Woolley et al. 2005). Nonetheless, the classical-STRF cannot fully describe the observed spiking patterns, and the discrepancy between model prediction and actual response increases from MLd to Field L to CLM (Gill et al. 2006; Sen et al. 2001). We tested whether we could improve our model predictions by using surprise-STRFs or derivative-STRFs.
METHODS
The surprise of any specific sound element depends on what is expected. To determine how surprising each stimulus element is, we based our expectations on an ensemble of conspecific birdsongs, because of the relevance of this class of sounds for the survival of the animal (Gil and Gahr 2002) and because higher auditory and vocal areas in songbirds might be specialized for processing behaviorally relevant sounds (Theunissen et al. 2004). We then calculated stimulus probability based on the typical behavior of birdsong and on a window of stimulus history that we call the domain D. To find the best D, we first ran an analysis showing that neurons in MLd, Field L, and CLM are largely insensitive to the context of zebra finch song on long timescales; specifically, their responses do not appear to be sensitive to expectations based on song-motif structure (see also RESULTS). The domain D could therefore be limited to the past few hundred milliseconds (the length of one motif). The D that led to the best predictions was a rectangle of the spectrogram covering the recent history of the stimulus between 4 and 7 ms before the stimulus whose surprise is predicted, and nearby frequencies within 0.625 kHz of the frequency being predicted (see Fig. 1B for an illustration of D and RESULTS for an account of how the optimal D was found). We quantified surprise as follows:
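$$\mathrm{surprise}(t, f) = -\log P\left[S(t, f) \mid D(t, f)\right] \qquad (1)$$
where S(t, f) is the spectrogram intensity at time t and frequency f, and D(t, f) is the domain of recent stimulus history that precedes it.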
|
The surprise-STRF is mathematically similar to the classical-STRF in the sense that both models assume that firing rates are linearly related to a nonlinear transformation of the sound pressure waveform. However, the nature of the nonlinear stimulus transformations is very different. Spikes represent specific patterns of surprise when modeled by surprise-STRFs, and specific patterns of stimulus intensity when modeled by classical-STRFs, although mathematically the former is not more nonlinear than the latter.
Our surprise-STRF is also formally similar to the linear probabilistic receptive field (LPRF; Averbeck and Romanski 2006) in that both assume stimulus probability is encoded. It differs in that the LPRFs in that study made firing rates proportional to the probability of the stimulus belonging to a given class of macaque vocalizations, whereas our surprise-STRFs make firing rates proportional to surprise in local sound intensity features.
Why log (P)?
There are several reasons to expect the log of stimulus probability to be linearly related to firing rates. With the log function, the summed surprise of two independent events a and b equals the surprise of their conjunction: the probability of both a and b occurring is P(a) × P(b), and log [P(a) × P(b)] = log [P(a)] + log [P(b)]. The log function also has special relevance to information theory (Shannon and Weaver 1963). Equation 1 implies that surprise is linearly related to the number of bits needed to encode the signal given knowledge of D and P(S|D). Thus, if bit rates and spike rates are roughly proportional, as has been reported (reviewed in Borst and Theunissen 1999), then we would expect –log [P(S|D)] to be proportional to the number of spikes a neuron fires.
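A small worked example of the additivity argument, using base-2 logarithms so that surprise is expressed in bits (the base only rescales surprise by a constant). The probabilities are arbitrary illustrative values, and the function name is hypothetical.

```python
import math

def surprise_bits(p):
    """Surprise of an event with probability p, in bits: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

p_a, p_b = 0.5, 0.125     # two independent events (illustrative values)
joint = p_a * p_b         # P(a and b) for independent events

print(surprise_bits(p_a) + surprise_bits(p_b))   # 4.0  (1 bit + 3 bits)
print(surprise_bits(joint))                      # 4.0  (same total)
```

Summing the surprise of the parts gives the surprise of the whole, which is the property that makes a spike count proportional to −log P behave like a bit count.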
Calculating P(S|D)
To sample our domain we used a time bin of 1 ms and a frequency bin of 125 Hz (the same sampling as for our spectrograms; see Fig. 1A). Since the domain D has 3 different latencies and 9 different frequencies, it has a total of 27 free parameters. For the estimation of surprise using Eq. 1, we need to compute the conditional probability P(S|D), which without modification would be a 28-dimensional object (one for the stimulus intensity S in addition to the domain parameters). Since it is more difficult to estimate the conditional probabilities of high-dimensional objects and since there is redundancy in the 27 adjacent spectrogram bins of any zebra finch song, we reduced D to a 9-dimensional object using principal component analysis (PCA). Nine PCs were the maximum computationally practical number of PCs to consider and they captured >97% of the variance of D. We then estimated the 10-dimensional P(S|D) using a corpus of 59 adult male zebra finch songs chosen for their diversity and high recording quality. We made an independent estimate of P(S|D) for each frequency band, since the statistics of spectrograms of zebra finch song depend on frequency. Frequency bands for which D overlapped the edge of the spectrogram were discarded.
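The dimensionality reduction step can be sketched as follows, assuming 1-ms by 125-Hz spectrogram bins, a domain made of 3 latency bins (4, 5, and 6 bins before the target, one reading of the 4–7 ms window) by 9 frequency bins, and principal components computed by singular value decomposition. Smoothed random noise stands in for the 59-song corpus, all frequency bands are pooled (the study estimated each band separately), and the variance captured here is not the >97% reported for song. All names are hypothetical.

```python
import numpy as np

N_LAT = 3     # latency bins in the domain (assumed 1-ms bins)
F_HALF = 4    # frequency bins on each side of the target -> 9 frequency bins
N_PC = 9      # principal components kept

def domain_vectors(spec, min_lag=4):
    """Collect (D, S) pairs from a spectrogram of shape (n_freq, n_time).

    D: the 9 x 3 patch at lags min_lag ... min_lag + N_LAT - 1 bins before the
       target, flattened to 27 values; S: the target bin itself.
    Frequency rows whose patch would cross the spectrogram edge are skipped.
    """
    n_freq, n_time = spec.shape
    max_lag = min_lag + N_LAT - 1
    domains, targets = [], []
    for f in range(F_HALF, n_freq - F_HALF):
        for t in range(max_lag, n_time):
            patch = spec[f - F_HALF:f + F_HALF + 1, t - max_lag:t - min_lag + 1]
            domains.append(patch.ravel())
            targets.append(spec[f, t])
    return np.array(domains), np.array(targets)

def pca_reduce(X, k):
    """Project rows of X onto their top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return Xc @ vt[:k].T, explained

# Synthetic stand-in for a song spectrogram: white noise smoothed along both
# axes so that neighboring bins are correlated (not real zebra finch song).
rng = np.random.default_rng(1)
kernel = np.ones(5) / 5.0
spec = rng.standard_normal((40, 400))
spec = np.apply_along_axis(np.convolve, 1, spec, kernel, mode="same")
spec = np.apply_along_axis(np.convolve, 0, spec, kernel, mode="same")

D, S = domain_vectors(spec)
D9, frac = pca_reduce(D, N_PC)
print(D.shape[1], D9.shape[1], f"{frac:.1%} of domain variance in {N_PC} PCs")
```

Each row of `D9` together with the matching entry of `S` is one point of the 10-dimensional (S, D) space in which P(S|D) is then estimated.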
We used a discretized Gaussian kernel method to estimate P(S|D), as follows. We created a 10-dimensional lattice array to keep track of the joint probability P(S, D) of the domain and its stimulus. For each (S, D) pair in the corpus, we incremented the probability of each vertex of the 10-dimensional lattice hypercube containing (S, D) (i.e., the closest lattice points in all directions) by an amount proportional to exp(−d²), where d is the Euclidean distance from (S, D) to that vertex and the spacing between adjacent lattice points is normalized to 1. To make the computation feasible, our method differed from traditional kernel density estimation in two ways: the discretization was relatively coarse, and we incremented only the closest lattice points in all directions. We used more lattice bins to describe S and the stronger PCs of D than to describe the weakest PCs of D. We divided P(S, D) by P(D) to obtain P(S|D) (the definition of conditional probability). After computing P(S|D), we quantified surprise at every point of each presented stimulus by linearly interpolating P(S|D) from the closest lattice points.
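A reduced-dimension sketch of the lattice estimator just described: 2 dimensions (S and a single domain PC) instead of 10, a uniform 17 × 17 lattice, exp(−d²) weights on the 2^d surrounding vertices, division by the marginal P(D), and multilinear interpolation at query points. The bounds, bin counts, synthetic data, function names, and the probability floor applied before taking the log are assumptions for illustration; the study used more bins for S and the stronger PCs than for the weakest PCs.

```python
import itertools
import numpy as np

def _lattice_coords(points, lo, hi, n_bins):
    """Map points to lattice units (vertex spacing 1); return base vertex and offset."""
    u = (points - lo) / (hi - lo) * (n_bins - 1)
    u = np.clip(u, 0.0, n_bins - 1 - 1e-9)   # clamp out-of-range points to the lattice edge
    base = np.floor(u).astype(int)
    return base, u - base

def lattice_joint(points, lo, hi, n_bins):
    """Discretized Gaussian-kernel estimate of a joint distribution on a lattice.

    Each sample adds exp(-dist**2) to each of the 2**d vertices of the lattice
    hypercube that contains it; dist is measured in lattice units.
    """
    grid = np.zeros(tuple(n_bins))
    base, frac = _lattice_coords(points, lo, hi, n_bins)
    for corner in itertools.product((0, 1), repeat=points.shape[1]):
        corner = np.array(corner)
        dist2 = ((frac - corner) ** 2).sum(axis=1)
        np.add.at(grid, tuple((base + corner).T), np.exp(-dist2))
    return grid / grid.sum()

def conditional_on_domain(joint):
    """P(S | D) with S on axis 0: divide the joint by the marginal P(D)."""
    p_d = joint.sum(axis=0, keepdims=True)
    return np.divide(joint, p_d, out=np.zeros_like(joint), where=p_d > 0)

def interpolate(grid, lo, hi, query):
    """Multilinear interpolation of grid values at the query points."""
    base, frac = _lattice_coords(query, lo, hi, np.array(grid.shape))
    out = np.zeros(len(query))
    for corner in itertools.product((0, 1), repeat=grid.ndim):
        corner = np.array(corner)
        weight = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=1)
        out += weight * grid[tuple((base + corner).T)]
    return out

# Toy 2-D example (S, one domain PC) instead of the 10-D lattice; S tracks D.
rng = np.random.default_rng(2)
d_pc = rng.standard_normal(20000)
s = d_pc + 0.3 * rng.standard_normal(20000)
samples = np.column_stack([s, d_pc])              # column 0 = S, column 1 = D
lo, hi, n_bins = np.array([-4.0, -4.0]), np.array([4.0, 4.0]), np.array([17, 17])

joint = lattice_joint(samples, lo, hi, n_bins)
p_s_given_d = conditional_on_domain(joint)
queries = np.array([[0.0, 0.0],      # S close to what D predicts
                    [2.5, -2.0]])    # S far from what D predicts
surprise = -np.log(np.maximum(interpolate(p_s_given_d, lo, hi, queries), 1e-12))
print(surprise)                       # second value is much larger than the first
```

Because the lattice is coarse and only neighboring vertices are updated, the cost grows with the number of samples times 2^d rather than with the full lattice size, which is what makes a 10-dimensional version tractable.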
Our method for estimating surprise is not necessarily unique or definitive. For example, relatively large changes in the latency, time window, and frequency range of D led to only small differences in performance (see RESULTS). We believe the key features of a good stimulus representation for secondary forebrain areas are 1) representing the stimulus not in terms of its intensity but in terms of how unexpected it is; and 2) using the log function, which gives the representation a theoretical grounding in information theory and compresses large discrepancies between different estimates of P into small discrepancies in log (P).
Stimulus intensity derivatives
Our surprise representation has two potentially confounding advantages over traditional spectrograms: it has twice as many free parameters, which allows onsets and offsets to be modeled differently (i.e., onsets are not constrained to have the opposite effect of offsets), and it highlights temporal changes, which are known to drive higher auditory areas. To control for both effects, we also estimated split absolute derivative STRFs from a split time derivative of the spectrogram, in which increasingly loud features (crescendo) are separated from increasingly quiet features (decrescendo):
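$$S'_{+}(t, f) = \max\left[\frac{\partial S(t, f)}{\partial t},\, 0\right], \qquad S'_{-}(t, f) = \max\left[-\frac{\partial S(t, f)}{\partial t},\, 0\right] \qquad (2)$$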
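A minimal sketch of the split derivative, assuming a spectrogram sampled on a regular time grid and a first-order backward difference for ∂S/∂t (the first time bin is assigned a derivative of 0). The function name and the difference scheme are illustrative assumptions, not the study's code.

```python
import numpy as np

def split_derivative(spec, dt_ms=1.0):
    """Split temporal derivative of a spectrogram of shape (n_freq, n_time).

    Returns (crescendo, decrescendo): the positive part of dS/dt and the
    magnitude of its negative part. Both are non-negative, and
    crescendo - decrescendo recovers dS/dt.
    """
    ds = np.diff(spec, axis=1, prepend=spec[:, :1]) / dt_ms   # backward difference
    crescendo = np.maximum(ds, 0.0)      # increasingly loud features
    decrescendo = np.maximum(-ds, 0.0)   # increasingly quiet features
    return crescendo, decrescendo

spec = np.array([[0.0, 1.0, 3.0, 2.0, 2.0]])   # one frequency band, five time bins
up, down = split_derivative(spec)
print(up)
print(down)
```

Here the rises (by 1 and then 2) land in the crescendo channel and the single fall (by 1) in the decrescendo channel, so each channel can receive its own STRF, just as the two surprise channels do.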
Receptive field estimation
For all our stimulus representations, we calculated STRFs using STRFPAK version 5.2 (http://strfpak.berkeley.edu), which incorporates a general linear regression algorithm, regularization techniques, validation techniques, and metrics described in detail previously (Hsu et al. 2004a; Theunissen et al. 2001; Woolley et al. 2006). Version 5.2 also includes a double jackknife, which ensures that data used to find the best regularization parameters are not also used to evaluate the final goodness of fit. The goodness of fit of the STRF was quantified by the correlation coefficient and the mutual information between the predicted and actual poststimulus time histograms (PSTHs). The predicted information is a function of the integral over all frequencies of the coherence between the actual and predicted responses (Hsu et al. 2004a). Predicted information has an advantage over correlation coefficients: the experimenter is not forced to choose a timescale of relevant PSTH features. Correctly predicted, temporally precise features are rewarded by predicted information, whereas correlation coefficient methods usually smooth out fine PSTH features like those shown in Fig. 1I. In this study we use predicted information to quantify most of our results, but also report correlation coefficients for comparison. The correlation coefficients were calculated after smoothing the PSTH with an 11-ms Hanning window and correcting for noise in the PSTH estimation (Hsu et al. 2004a).
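A minimal sketch of the two goodness-of-fit measures, assuming that the squared (magnitude-squared) coherence between predicted and actual PSTHs has already been estimated on a frequency grid. The information formula used here, I = −∫ log2(1 − γ²(f)) df, is the standard Gaussian-channel bound associated with coherence-based information estimates; STRFPAK's exact estimator and bias corrections are not reproduced, the noise correction of the correlation coefficient is omitted, and the function names are hypothetical.

```python
import numpy as np

def info_from_coherence(freqs_hz, coherence):
    """Information rate in bits/s from the squared coherence between predicted and
    actual PSTHs: I = -integral of log2(1 - coherence) df, by the trapezoid rule."""
    c = np.clip(np.asarray(coherence, dtype=float), 0.0, 1.0 - 1e-12)
    integrand = -np.log2(1.0 - c)
    f = np.asarray(freqs_hz, dtype=float)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(f)))

def smoothed_cc(predicted, actual, window_bins=11):
    """Correlation between two PSTHs after smoothing both with a Hanning window
    (11 bins = 11 ms at 1-ms resolution). No correction for PSTH estimation noise."""
    w = np.hanning(window_bins)
    w /= w.sum()
    a = np.convolve(predicted, w, mode="same")
    b = np.convolve(actual, w, mode="same")
    return float(np.corrcoef(a, b)[0, 1])

freqs = np.linspace(0.0, 100.0, 101)
# A constant squared coherence of 0.5 contributes 1 bit/s per Hz, about 100 bits/s here.
print(info_from_coherence(freqs, np.full(freqs.size, 0.5)))
```

The contrast drawn in the text is visible in the signatures: the information measure needs no smoothing parameter, whereas the correlation coefficient depends on `window_bins`.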
Neurophysiological recordings
Neural data were obtained from 46 adult male zebra finches. All subjects were reared in a colony in natural family groups and were not exposed to any of the songs used as stimuli before the neurophysiological recording sessions. Single-unit responses were obtained with extracellular tungsten electrodes in urethane-anesthetized birds. Recording locations were verified with standard histological techniques: CLM, n = 37; Field L, n = 189; and MLd, n = 142. Each subject underwent simultaneous recording either in both Field L and MLd (n = 29) or in both CLM and Field L (n = 17). Sounds were played from a loudspeaker placed 15 cm in front of the animal at a peak intensity of 70 dB SPL. All neurons in CLM were in the lateral subdivision. Neurons in Field L were sampled from all subregions (L1, L2a, L2b, and L3). Data from 23 of these birds were also used in previously published work, and additional information on stimulus design, neurophysiological recordings, and histological techniques can be found in previous studies (Hsu et al. 2004b; Woolley et al. 2005, 2006). All experimental procedures were approved by the Animal Care and Use Committee of UC Berkeley.
RESULTS
We calculated and validated classical-, derivative-, and surprise-STRFs for all the neurons in our data set. Figure 1F shows the classical-STRF (the STRF based on stimulus intensity shown in the spectrogram) of a CLM neuron, whereas Fig. 1G shows the surprise-STRF of the same neuron. Figure 1G (top) shows the filter to be convolved with louder-than-expected events and Fig. 1G (bottom) shows the filter for quieter-than-expected events.
From a purely linear standpoint, the classical-STRF model predicts that this neuron increases its firing at onsets and decreases its firing below mean levels at offsets. However, the surprise-STRF indicates that this neuron is not inhibited by quieter-than-expected activity; if anything, firing rates may increase slightly in response to unexpected quietness.
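For readers who want the mechanics of a two-channel surprise-STRF, the sketch below shows how fitted filters would map an input onto a predicted rate: each channel (louder-than-expected and quieter-than-expected surprise) is convolved in time with its own filter, and the results are summed over frequency and channel. The function name, the lag convention (lag 0 = current bin), and the absence of an output nonlinearity are assumptions; the filters themselves were estimated with STRFPAK in this study, which this sketch does not attempt.

```python
import numpy as np

def strf_predict(channels, filters, r0=0.0):
    """Linear STRF prediction summed over input channels.

    channels: list of (n_freq, n_time) inputs, e.g. louder- and quieter-than-expected surprise
    filters:  list of (n_freq, n_lags) STRFs, one per channel; lag 0 = current time bin
    Returns r(t) = r0 + sum_c sum_f sum_tau h_c[f, tau] * x_c[f, t - tau].
    """
    n_time = channels[0].shape[1]
    rate = np.full(n_time, r0, dtype=float)
    for x, h in zip(channels, filters):
        for tau in range(h.shape[1]):
            # weight each frequency by h[:, tau] and shift the result tau bins later
            rate[tau:] += h[:, tau] @ x[:, : n_time - tau]
    return rate

# Hypothetical single-frequency example: one louder-than-expected event at t = 2,
# one quieter-than-expected event at t = 4, and made-up filter weights.
louder = np.array([[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])
quieter = np.array([[0.0, 0.0, 0.0, 0.0, 1.0, 0.0]])
h_louder = np.array([[0.0, 1.0, 0.5]])
h_quieter = np.array([[0.0, 0.2, 0.0]])
rate = strf_predict([louder, quieter], [h_louder, h_quieter])
print(rate)
```

In this picture, the CLM neuron of Fig. 1G corresponds to an excitatory louder-than-expected filter paired with a quieter-than-expected filter that is not inhibitory, which a single classical-STRF acting on intensity cannot express.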
The prediction improvement of the surprise-STRF shown in Fig. 1 is typical of area CLM. Figure 2 compares performances of classical-, derivative-, and surprise-STRFs in CLM. As shown in Fig. 2A, every CLM neuron that can be reasonably modeled with linear filters (those with a prediction score >1 bit/s) is described better by a surprise-STRF than by a derivative-STRF. Moreover, the preference for surprise-STRFs was evident in most subjects. For 16 of the 17 zebra finches that had CLM recordings, CLM neurons are described better by surprise-STRFs than by derivative-STRFs, and the one counterexample is a subject with only one CLM recording site. The surprise-STRF also outperforms the classical-STRF in CLM by a larger average margin, as shown in Fig. 2B. Figure 2C shows that the derivative-STRF outperforms the classical-STRF by a smaller margin.
Field L, but not MLd, showed some evidence of surprise coding. On average, PSTHs in Field L are described only 24% better by surprise-STRFs than by classical-STRFs (P = 8 × 10⁻⁵). The surprise-STRF also outperforms the derivative-STRF in Field L by 14% (P = 2 × 10⁻⁷), and the derivative-STRF outperforms the classical-STRF by 9% (P = 0.01). The effect size of the surprise improvement over the derivative-STRF is smaller in Field L than in CLM (P = 7 × 10⁻⁵, Wilcoxon rank-sum test on the differences between surprise-STRF performance and derivative-STRF performance in CLM and Field L), and MLd shows no significant improvement of surprise-STRFs over either derivative-STRFs or classical-STRFs. Field L, physically between MLd and CLM along the ascending auditory pathway, also shows a functionally intermediate degree of tuning for surprise.
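The Field L versus CLM comparison uses a Wilcoxon rank-sum test on per-neuron performance differences. A self-contained sketch using the large-sample normal approximation (average ranks for ties, no tie or continuity correction), which should match the convention of `scipy.stats.ranksums`; the study's exact implementation is not specified here, and the function names are hypothetical.

```python
import math
import numpy as np

def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    smaller = (x[:, None] > x[None, :]).sum(axis=1)   # how many values are smaller
    equal = (x[:, None] == x[None, :]).sum(axis=1)    # ties, including the value itself
    return smaller + (equal + 1) / 2.0

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p); z > 0 means values in x tend to be larger than values in y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    ranks = average_ranks(np.concatenate([x, y]))
    w = ranks[:n1].sum()                               # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal tail
    return z, p

# Hypothetical usage with per-neuron differences (surprise minus derivative):
# z, p = rank_sum_test(diff_clm, diff_field_l)
```

A rank-based test is a natural fit here because it compares the two areas without assuming that the per-neuron improvements are normally distributed.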
The advantages of using the surprise-STRF were not limited to better prediction of the neural response to onsets and offsets. In fact, surprise-STRFs offer the least improvement in MLd (see Fig. 3), the area with the most onset detection (see Fig. 4C). Moreover, surprising syllable features other than onsets, like the spectral change within a syllable (pointed out by the red braces above A, B, D, and E, in Fig. 1), could elicit neural activity that was well captured by a surprise-STRF and not as well captured by the other STRF models (as shown in Fig. 1, H and I). Conversely, nonsurprising acoustic changes often led to predicted spikes in classical- and derivative-STRFs that were not observed in the actual data.
In the search for the correct domain (see METHODS), only domains with a relatively short history were examined, because the auditory areas studied did not appear to be sensitive to long-range context (>500 ms) and because computational and data limitations prevented us from obtaining good estimates of longer-range probabilistic relationships in zebra finch song. Zebra finch song typically has repeated motifs consisting of the same syllables in the same order (Fig. 4, A and B). If neurons encode surprise and are sensitive to long-term context, such as syllable order within a recently heard motif, they should be less surprised by the second repetition of a motif, and firing rates in response to the second repetition should be systematically lower. Figure 4, C–E shows the mean PSTHs in MLd, Field L, and CLM for two repetitions of the same motif in one stimulus. These mean PSTHs are not substantially different, suggesting that, on average, the neurons' responses are not sensitive to motif-long context.
We therefore focused on investigating the optimal domain on shorter timescales. For computational reasons, we tested domains restricted to the immediate past: domain widths of 2 to 6 ms, minimum latencies of 3 to 6 ms, and spectral half-widths of 325 to 750 Hz. These timescales allowed us to capture expectations within a zebra finch syllable but not expectations that would result from reproducible sequences of syllables in motifs. The spatiotemporal extent of domain D was found through gradient ascent of PSTH predictability by changing the latency range and the frequency width parameters of D. The optimal D had latency from 4 to 7 ms and spectral width of 625 Hz. Our initial guess of D was half the size of the optimal D and yet predicted <1% worse than our final choice, suggesting that the details of how surprise is quantified do not appreciably affect model performance. Correlations in conspecific song on these spectral and temporal scales are so strong that all of the domains we tried contained nearly identical information, and thus our estimate of surprise does not depend on the exact extent of D. Conversely, because all the domains we tested performed similarly, we do not claim that the shape of D has a biological analogue. Moreover, a larger domain that captures relationships in syllable sequence could further improve predictions, although a mathematically more complicated D would be needed to make searching for these dependencies computationally feasible.
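The search over domain parameters can be sketched as a greedy coordinate search on a discrete grid, a discrete stand-in for gradient ascent: each parameter is nudged up or down by one step and the change is kept if the score improves. In the study, each score evaluation would require re-estimating P(S|D), recomputing surprise, and refitting and validating STRFs; here a hypothetical quadratic score with a known peak stands in for that, so the returned optimum says nothing about the real domain. All names are hypothetical.

```python
def hill_climb(score, start, bounds, steps):
    """Greedy coordinate search on a discrete parameter grid.

    score:  callable mapping a parameter tuple to a number to maximize
    start:  initial parameter tuple
    bounds: (low, high) per parameter, inclusive
    steps:  step size per parameter
    """
    current = tuple(start)
    best = score(current)
    improved = True
    while improved:                      # stop when no single step helps
        improved = False
        for i, step in enumerate(steps):
            for sign in (-1, 1):
                candidate = list(current)
                candidate[i] += sign * step
                low, high = bounds[i]
                if not low <= candidate[i] <= high:
                    continue             # stay inside the tested parameter range
                candidate = tuple(candidate)
                value = score(candidate)
                if value > best:
                    current, best, improved = candidate, value, True
    return current, best

# Hypothetical stand-in for cross-validated predicted information, peaking at
# (min latency 4, width 3, half-width 5) in arbitrary grid units.
def toy_score(params):
    lat, width, half = params
    return -((lat - 4) ** 2) - (width - 3) ** 2 - (half - 5) ** 2

best_params, best_value = hill_climb(toy_score, start=(3, 2, 3),
                                     bounds=((3, 6), (2, 6), (2, 6)), steps=(1, 1, 1))
print(best_params, best_value)   # (4, 3, 5) 0
```

Greedy search only guarantees a local optimum; the flat performance landscape reported above (an initial guess within 1% of the final choice) is what makes that limitation tolerable here.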
No stimulus adaptation found
Aside from characterizing the stimulus–response function of CLM neurons, we asked whether these neurons exhibited plasticity during the course of the recording sessions. If CLM neurons change their expectations on a timescale ranging from minutes to hours, then systematic reductions in firing rates to repetitions of the same stimuli should be observed. We did not find any evidence of habituation in CLM, Field L, or MLd neurons, as is presented in the remainder of this section.
To assess habituation in firing rates, we ran two statistical tests on PSTHs. First, we performed a linear regression between the total number of spikes elicited in a given trial and the trial number (most stimuli were repeated 10 times), pooling all data from all CLM neurons: the "same neuron, later trial" test. The slope of this regression was not significantly different from 0; it corresponded to a nonsignificant increase in mean firing rate of 0.4 spikes per stimulus per trial (two-sided F-test, P = 0.3). Second, we performed a linear regression between the mean number of spikes a CLM neuron fired per stimulus presentation and the number of times the (1-h-long) stimulus protocol had been played to the animal being tested: the "same subject, later neuron" test. This regression yielded a nonsignificant decrease in mean firing rate of 0.2 spikes per stimulus per previous presentation (P = 0.9). Thus the CLM neurons we recorded from did not systematically decrease their firing rates in response to repetitions of initially unfamiliar stimuli. Neither Field L nor MLd showed significant systematic adaptation either (uncorrected P values for Field L are 0.3 and 0.8, and for MLd 0.8 and 0.1, for the "same neuron, later trial" and "same subject, later neuron" tests, respectively; the smallest Bonferroni-corrected P value is 0.7). Therefore systematic stimulus-specific adaptation is not relevant to understanding MLd, Field L, and CLM under these experimental conditions, although firing rates might be plastic over the course of longer habituation, under different anesthesia, or when a reward is paired with a particular stimulus. It should be noted, however, that during our recording sessions we stopped acquiring data from specific sites if firing rates dropped to zero long before the presentation protocol was finished. These neurons were not analyzed, and it is possible that they constitute a separate population of highly adapting neurons. Alternatively, these neurons may have been damaged by the electrode, or the neural signal may have been lost because of other experimental artifacts such as brain movement.
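The "same neuron, later trial" test regresses spike counts on trial number. A minimal numpy sketch of the slope and its F statistic (1 and n − 2 degrees of freedom); a p-value could then be obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, 1, n - 2)`. The spike counts below are hypothetical, not data from this study, and the function name is illustrative.

```python
import numpy as np

def slope_f_test(x, y):
    """Ordinary least-squares slope of y on x and the F statistic for slope != 0.

    Under the null hypothesis of zero slope, F follows an F(1, n - 2) distribution.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_reg = np.sum((fitted - y.mean()) ** 2)   # variation explained by the line
    ss_res = np.sum((y - fitted) ** 2)          # residual variation
    f_stat = ss_reg / (ss_res / (n - 2))
    return slope, f_stat

# Hypothetical spike counts over 10 repetitions of one stimulus (not recorded data).
trials = np.arange(1, 11)
spikes = 5.0 + 0.4 * trials + np.array([0.1, -0.1] * 5)
slope, f_stat = slope_f_test(trials, spikes)
print(f"slope = {slope:.3f} spikes/trial, F(1, {trials.size - 2}) = {f_stat:.1f}")
```

Habituation would appear as a reliably negative slope; the study's pooled CLM slope was small and positive, with P = 0.3.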
Offset cells and auditory complex cells
The use of the surprise-STRF revealed two functional classes of forebrain auditory neurons that could not be well characterized by classical-STRFs: offset-only neurons and auditory complex cells. The former are modeled as firing in response to offsets, whereas the latter are modeled as firing whenever there is any surprising change in intensity (whether louder or quieter) within the latency and frequency range of the neuron. The classical- and surprise-STRFs of an offset-detecting neuron from Field L are shown in Fig. 5A.
Auditory complex neurons, found primarily in CLM, are general change detectors, firing to any unexpected stimulus pattern happening within a window of frequency and latency, regardless of the direction of that change (i.e., louder- or quieter-than-expected features). Figure 5, B–D shows three CLM complex neurons with different tuning characteristics. Figure 5B shows a neuron that fires when any surprising change happened between about 5 and 25 ms ago, in the frequency range of about 3 to 5 kHz. More broadband complex neurons are shown in Fig. 5, C and D.
Nine of the 37 CLM neurons we examined seemed to be auditory complex cells. Auditory complex cells and offset cells can be fit to some degree by derivative-STRFs, which is why in CLM and Field L derivative-STRFs outperformed classical-STRFs (see Fig. 3). However, since surprise-STRFs outperform derivative-STRFs by 40% (P = 5 × 10⁻⁷), the probability-related aspect of surprise is still crucial for understanding CLM.
Random stimuli and unmet expectations
Next, we checked whether the surprise-STRF model yields similar increases in performance when we predict responses to modulation-limited (ML) noise, a type of random stimulus whose temporal (0–50 Hz) and spectral (0–2 cycles/kHz) modulations are band limited (Hsu et al. 2004b). Unlike white noise, ML noise drives high-level auditory neurons at firing rates similar to those driven by behaviorally relevant complex sounds, with spike patterns that are reliably in phase with acoustical structure in the sound. ML noise has therefore been used to extract basic response parameters, including STRFs, from cortical auditory neurons (Escabi and Schreiner 2002; Klein et al. 2000). Figure 6A shows a spectrogram of a sample of ML noise.
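To make the band limits concrete, the sketch below generates a spectrogram-like pattern whose modulation spectrum is confined to |temporal modulation| ≤ 50 Hz and |spectral modulation| ≤ 2 cycles/kHz by masking white noise in the two-dimensional Fourier (modulation) domain. This is a swapped-in construction for illustration, not necessarily how the ML noise of Hsu et al. (2004b) was synthesized; the grid (125-Hz by 1-ms bins) and the function name are assumptions, and the output is an unscaled log-amplitude envelope, not a sound waveform.

```python
import numpy as np

def ml_noise_spectrogram(n_freq=64, n_time=1000, df_khz=0.125, dt_s=0.001,
                         max_wt_hz=50.0, max_wf_cyc_per_khz=2.0, seed=0):
    """Random spectrogram-like pattern with band-limited modulations.

    White Gaussian noise on an (n_freq, n_time) grid is transformed to the 2-D
    modulation domain, every component with |temporal modulation| > max_wt_hz or
    |spectral modulation| > max_wf_cyc_per_khz is zeroed, and the result is
    transformed back.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n_freq, n_time))
    wf = np.fft.fftfreq(n_freq, d=df_khz)        # spectral modulation, cycles/kHz
    wt = np.fft.fftfreq(n_time, d=dt_s)          # temporal modulation, Hz
    keep = (np.abs(wf)[:, None] <= max_wf_cyc_per_khz) & (np.abs(wt)[None, :] <= max_wt_hz)
    # The mask is symmetric in +/- modulation, so the inverse transform is real
    # up to rounding error; np.real drops that residue.
    return np.real(np.fft.ifft2(np.fft.fft2(white) * keep))

pattern = ml_noise_spectrogram()
print(pattern.shape)   # (64, 1000): 8 kHz by 1 s at 125-Hz and 1-ms resolution
```

Because every modulation component inside the band is equally likely, a pattern like this carries none of the song-specific structure that a song-trained P(S|D) expects, which is what makes it a useful contrast for surprise-STRFs.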
We calculated three types of STRF to ML noise: a classical-STRF, a surprise-STRF expecting ML noise, and a surprise-STRF expecting song. Figure 7 gives all pairwise performance comparisons of these three STRF types in CLM. Figure 7, A and B show that surprise-STRFs based on song expectations in general outperform both classical-STRFs and surprise-STRFs based on expectations of ML noise.
DISCUSSION
An alternative interpretation to our main result (that coding in CLM is well characterized by surprise-STRFs) is that CLM neurons do not so much encode surprise as they implement some nonlinearities that are similar to our surprise formulation (Eq. 1). Our theoretical methodology is still useful for two reasons. First, surprise led us to a model for CLM neurons with unprecedented predictive power, which may prove to be an important step toward even better models of secondary forebrain neural functions. Second, the concept that higher sensory neurons encode surprise is theoretically rich, providing not just a functional description of CLM but a theoretical grounding for the type of stimulus representation found there.
The fact that surprise has more predictive power in CLM than in Field L suggests two possibilities. First, Field L might not complete the computation of what is surprising. Some computation is required to silence unsurprising features, and in Field L, auditory information may not have passed through enough synapses to perform the needed surprise calculation. Second, the neural representation of surprising features in Field L might depend on expectations derived from any sound stimulus and not just conspecific song. This second hypothesis could be tested in a study where surprise is estimated from expectations based on a variety of natural sounds, not just zebra finch song.
When determining the optimal domain for the surprise estimation, we selected a relatively small domain (with latency from 4 to 7 ms and spectral width of 625 Hz) that did not capture the expectations in song found at the level of syllable sequence and repeated motifs. What advantage could there be for the zebra finch auditory code in MLd, Field L, and CLM in effectively ignoring the large redundancies of motif repetition shown in Fig. 4? One possible explanation is that having the same neural representation for the two motifs makes it easier for a song grammar detector downstream of CLM to notice whether the repetition of motifs was exact. Note the remarkable similarity between the first and second motif repetitions in Fig. 4, A and B. The ability to produce exactly the same motif might be used by female zebra finches as a proxy for general fitness during mate selection, and detecting precise motif repetition may be easier if the neural representation of each motif is identical. Another explanation for the absence of grammar-sensitive coding is that the neural circuitry needed to store uncompressed, detailed auditory information may be unwieldy. If one of the listener's tasks is to learn a song efficiently or to notice repeated motifs, it is most efficient to first remove redundancy from the song's acoustic representation so that fewer spikes need to be recalled. The surprise-detecting mechanism found in CLM is ideal for this preprocessing step, since it provides an efficient and complete representation of zebra finch song.
Because of the short timescale we use, our use of the term "surprise" can appear significantly different from its use in the cognitive neuroscience literature, in which the domain is longer in time and stimulus semantics are more relevant. However, these are differences of scale and level, and surprise might be represented at all levels of forebrain processing. Surprise is used here to mean the sensory-model mismatch left over after extensive redundancy reduction (Barlow 2001). The type of redundancy reduction at the level of single-neuron responses that we observed in CLM (also referred to as lifetime sparseness) might work in conjunction with redundancy reduction across neurons (population sparseness). In the mammalian auditory system, population redundancy has been shown to decrease along the ascending auditory pathway (Chechik et al. 2006).
Another interpretation consistent with our results is that CLM acts less as a specialized auditory encoder than as a mediator of bottom-up attention. In a study estimating surprise in natural movies (Itti and Baldi 2005), observers' gaze consistently shifted toward areas of the movie that had more surprise (quantified using a mechanism similar to that of Eq. 1; see METHODS). A corollary of our central hypothesis (that CLM encodes birdsong well by using a surprise-based coding strategy) is that CLM can signal to other neurons when parts of a song have changed enough to warrant a resampling of recent auditory history.
Neural codes that use few spikes to represent surprising features of stimuli are efficient for both metabolic and computational reasons (Olshausen and Field 2004). Although estimates of metabolic cost are somewhat controversial given the physiological complexity of the problem, there is a consensus that spiking and synaptic transmission account for most of the brain's energy expenditure (Attwell and Laughlin 2001; Lennie 2003). In some calculations, the cost of spiking is estimated to be so high that only a very small fraction of neurons could be active at the same time, necessitating a very efficient representation (Lennie 2003). Among sparse representations, encoding surprising features is desirable because the surprise transformation is invertible (see METHODS), meaning that the entire stimulus is encoded.
Independent of metabolic constraints, representing the stimulus in terms of surprise might be important for computational reasons. For example, sparse codes have also been shown to be beneficial in memory systems: new associations between features are easier to form when their representations are sparse (Schweighofer et al. 2001). Moreover, the complicated task of visual object recognition is facilitated by combining prior knowledge about images with the processing of current image features (Kersten et al. 2004). In a Bayesian framework, prior knowledge and sensory information are combined by multiplying probability distributions. If the neural representation of the stimulus is in a form where spikes correspond to log probabilities, posteriors could be calculated (up to normalization) through addition rather than multiplication of spikes, since adding log probabilities is equivalent to multiplying raw probabilities. Because summing inputs is a simpler operation than multiplying them for most dendrites, representing the stimulus in terms of its log probability is an ideal preprocessing step for Bayesian object recognition.
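The equivalence between multiplying probabilities and adding log probabilities can be checked numerically. The sketch below computes a posterior over a small set of hypothetical objects in both ways. It is a generic illustration of Bayes' rule, not a model of CLM, and the prior and likelihood values are invented for the example. The final normalization step (a log-sum-exp) is not a plain sum, so the additive shortcut applies to the unnormalized posterior.

```python
import math
from typing import List, Sequence


def posterior_by_multiplication(prior: Sequence[float], likelihood: Sequence[float]) -> List[float]:
    """Bayes' rule on raw probabilities: multiply, then normalize."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_by_log_addition(log_prior: Sequence[float], log_likelihood: Sequence[float]) -> List[float]:
    """The same posterior computed by adding log probabilities.

    Only the final normalization (log-sum-exp) is not a plain sum.
    """
    log_unnormalized = [lp + ll for lp, ll in zip(log_prior, log_likelihood)]
    peak = max(log_unnormalized)  # subtract the max before exponentiating for numerical stability
    log_total = peak + math.log(sum(math.exp(v - peak) for v in log_unnormalized))
    return [math.exp(v - log_total) for v in log_unnormalized]
```

With prior [0.5, 0.3, 0.2] and likelihood [0.1, 0.6, 0.3], both functions return the same posterior, in which the second hypothesis is the most probable (0.18/0.29, about 0.62).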
In the context of memory storage, it should be noted that CM has been implicated in storing songs used in learning tasks (reviewed in Bolhuis and Gahr 2006). Firing rates in CMM (the medial portion of CM) are higher in response to familiar conspecific song motifs used in two-alternative choice and go/no-go tasks than in response to unfamiliar conspecific song motifs (Gentner and Margoliash 2003). In the song system, responses to the bird's own song (which is very familiar) are maximal (reviewed in Theunissen et al. 2004). Higher firing rates to familiar songs apparently contradict the idea that firing rates are proportional to stimulus surprise. However, it is possible that surprise is used to represent songs efficiently in the naïve state and that this representation is then modified in behavioral tasks that engage motivational systems and learning. Task relevance should also influence the neural code: task-relevant features should have a stronger representation than irrelevant features, a principle that our surprise model (Eq. 1) does not yet incorporate. Training or vocal learning would enhance the relevance of certain stimuli and thus the corresponding spike rates. Task relevance is also a likely explanation for why random (and thus intrinsically surprising) artificial stimuli such as white noise drive CLM less effectively than familiar but relevant stimuli such as conspecific song. Furthermore, higher responses to a familiar song might still reflect the surprising features of that song, given the statistics of all recently heard songs. It should also be noted that anesthetic state (urethane vs. awake), species (zebra finch vs. starling), and region (CLM vs. CMM) differ between this study and that of Gentner and Margoliash (2003). Moreover, the surprise-STRF model did not capture all the nonlinear encoding observed in the response. What acoustical features CLM neurons represent in behavioral tasks and how this representation changes with learning remain open questions.
We also note that the surprise formulation is relevant to the stimulus-specific adaptation observed in auditory cortical areas (Ulanovsky et al. 2003) as well as in another secondary auditory region in the songbird, the medial caudal neostriatum (NCM) (Chew et al. 1995; Phan et al. 2006; Stripling et al. 1997). In both the mammalian and avian systems, repeated presentations of the same stimulus lead to long-term adaptation and, as a consequence, a priming of responses to novel (or deviant) stimuli. These responses could be modeled as surprise, as we have done here, but with expectations updated according to recent experience. Our current formulation assumes expectations that we labeled as naïve, meaning that they are either innate or acquired through the course of normal development. To model areas such as NCM, recently learned expectations could be added to the model to capture memory effects. More specifically, P(S|D) could be made experience dependent, thereby capturing the effect that repeated songs become unsurprising relative to novel songs. In other words, the stimulus-specific adaptation observed in these higher auditory areas could reflect how the stimulus prior is incorporated in the neural circuitry.
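One hypothetical way to make the stimulus probability experience dependent is to estimate it from counts of previously heard stimuli, so that surprise (-log2 of that probability) falls for a repeated standard and stays high for a rare deviant, as in an oddball paradigm. The sketch below is an assumption for illustration: the class name, the add-alpha smoothing, and the treatment of stimuli as discrete labels are not part of the model described here.

```python
import math
from collections import Counter
from typing import Iterable


class ExperienceDependentPrior:
    """Hypothetical stimulus prior estimated from counts of previously heard stimuli.

    Add-alpha (Laplace) smoothing keeps unheard stimuli at non-zero probability.
    """

    def __init__(self, vocabulary: Iterable[str], alpha: float = 1.0) -> None:
        self.vocabulary = set(vocabulary)
        self.alpha = alpha
        self.counts: Counter = Counter()

    def probability(self, stimulus: str) -> float:
        if stimulus not in self.vocabulary:
            raise KeyError(stimulus)
        total = sum(self.counts.values())
        return (self.counts[stimulus] + self.alpha) / (total + self.alpha * len(self.vocabulary))

    def surprise_bits(self, stimulus: str) -> float:
        return -math.log2(self.probability(stimulus))

    def hear(self, stimulus: str) -> float:
        """Return the surprise of this presentation, then update the prior with it."""
        s = self.surprise_bits(stimulus)
        self.counts[stimulus] += 1
        return s
```

Presenting a hypothetical "song_A" ten times and then "song_B" once (vocabulary {"song_A", "song_B"}, alpha = 1) yields surprises that fall from 1.0 bit to log2(11/10), about 0.14 bit, for the repeated song, while the deviant "song_B" carries log2(12), about 3.58 bits.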
Our analysis was also useful for characterizing functional properties at the single-neuron level. Specifically, we found some neurons in CLM that were sensitive to surprising features irrespective of whether those features were surprisingly soft or surprisingly loud. We labeled such neurons auditory complex cells because their properties are analogous to those of visual complex cells in V1 (Skottun et al. 1991). They are also reminiscent of the onset–offset or phasic neurons observed in the mammalian auditory cortex (Chimoto et al. 2002; Recanzone 2000; Wang et al. 2005). We also found offset-only neurons in both Field L and CLM. Similar response properties have been described in Field L of awake birds using high-intensity, temporally shaped noise stimuli (Nagel and Doupe 2006).
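The difference between neurons that respond to only one sign of surprise and auditory complex cells can be sketched with a signed surprise signal. Here, as an assumption, the magnitude of surprise is -log2 of the two-sided Gaussian tail probability of a deviation from an expected intensity, and its sign records whether the stimulus was louder or softer than expected. Treating one-sided units as onset-like or offset-like is a simplification, and all function names are hypothetical.

```python
import math
import sys
from typing import List, Sequence


def signed_surprise(x: float, expected: float, sd: float) -> float:
    """Hypothetical signed surprise in bits.

    Magnitude: -log2 of the two-sided Gaussian tail probability P(|Z| >= |z|).
    Sign: + when x is louder than expected, - when it is softer.
    """
    z = (x - expected) / sd
    p_tail = math.erfc(abs(z) / math.sqrt(2.0))
    p_tail = max(p_tail, sys.float_info.min)  # avoid log2(0) for extreme deviations
    return math.copysign(-math.log2(p_tail), z)


def onset_like(s: Sequence[float]) -> List[float]:
    """Responds only to surprisingly loud features (positive half-wave rectification)."""
    return [max(v, 0.0) for v in s]


def offset_like(s: Sequence[float]) -> List[float]:
    """Responds only to surprisingly soft features (negative half-wave rectification)."""
    return [max(-v, 0.0) for v in s]


def complex_like(s: Sequence[float]) -> List[float]:
    """Responds to surprise of either sign, as described for auditory complex cells."""
    return [abs(v) for v in s]
```

For an expected intensity of 10 with a standard deviation of 1, intensities of 12 and 8 produce surprises of equal magnitude and opposite sign. The complex-like unit responds equally to both, while each one-sided unit responds to only one of them.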
In summary, our results suggest that the firing rates of high-level sensory neurons depend more on the probability of natural stimulus features than on intensity or intensity changes. Thus expectations and natural statistics form a key part of the neural code. Moreover, using our approach, we were able to describe the responses of single neurons that were poorly described by the classical-STRF and the derivative-STRF. Our technique of using log probabilities is motivated by information theory and not by any considerations unique to zebra finch CLM (or to audition in general). If neurons in other forebrain areas also encode stimulus surprise, our methodology will lead to a better understanding of sensory coding, not only by improving the prediction of spike trains but also by revealing a new operational principle: spikes in higher sensory areas indicate stimulus surprise, not intensity changes.
GRANTS |
|---|
ACKNOWLEDGMENTS |
|---|
Present addresses: S.M.N. Woolley: Department of Psychology, Columbia University, 406 Schermerhorn Hall, 1190 Amsterdam Ave., New York, NY 10027; T. Fremouw: Department of Psychology, University of Maine, 301 Little Hall, Orono, ME 04469-5742.
FOOTNOTES |
|---|
Address for reprint requests and other correspondence: F. Theunissen, University of California, Berkeley, Department of Psychology, 3210 Tolman Hall, Berkeley, CA 94720-1650 (E-mail: Theunissen@berkeley.edu)
REFERENCES |
|---|
Atick J. Could information theory provide an ecological theory of sensory processing? Network 3: 213–251, 1992.
Attias H, Schreiner CE. Coding of naturalistic stimuli by auditory midbrain neurons. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1998, vol. 10, p. 103–109.
Attwell D, Laughlin SB. An energy budget for signaling in the grey matter of the brain. J Cereb Blood Flow Metab 21: 1133–1145, 2001.
Averbeck BB, Romanski LM. Probabilistic encoding of vocalizations in macaque ventral lateral prefrontal cortex. J Neurosci 26: 11023–11033, 2006.
Barlow H. Redundancy reduction revisited. Network 12: 241–253, 2001.
Barlow HB. Possible principles underlying the transformation of sensory messages. In: Sensory Communication, edited by Rosenblith WA. Cambridge, MA: MIT Press, 1961, p. 217–234.
Bolhuis JJ, Gahr M. Neural mechanisms of birdsong memory. Nat Rev Neurosci 7: 347–357, 2006.
Borst A, Theunissen FE. Information theory and neural coding. Nat Neurosci 2: 947–957, 1999.
Chechik G, Anderson MJ, Bar-Yosef O, Young ED, Tishby N, Nelken I. Reduction of information redundancy in the ascending auditory pathway. Neuron 51: 359–368, 2006.
Chew SJ, Mello C, Nottebohm F, Jarvis E, Vicario DS. Decrements in auditory responses to a repeated conspecific song are long-lasting and require two periods of protein synthesis in the songbird forebrain. Proc Natl Acad Sci USA 92: 3406–3410, 1995.
Chimoto S, Kitama T, Qin L, Sakayori S, Sato Y. Tonal response patterns of primary auditory cortex neurons in alert cats. Brain Res 934: 34–42, 2002.
Cohen YE, Theunissen FE, Russ BE, Gill P. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. J Neurophysiol 97: 1470–1484, 2007.
Dan Y, Atick JJ, Reid RC. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. J Neurosci 16: 3351–3362, 1996.
David SV, Vinje WE, Gallant JL. Natural stimulus statistics alter the receptive field structure of V1 neurons. J Neurosci 24: 6991–7006, 2004.
DeAngelis GC, Ohzawa I, Freeman RD. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J Neurophysiol 69: 1091–1117, 1993.
Dong DW, Atick JJ. Statistics of natural time-varying images. Network Comput Neural Syst 6: 345–358, 1995.
Escabi MA, Miller LM, Read HL, Schreiner CE. Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J Neurosci 23: 11489–11504, 2003.
Escabi MA, Schreiner CE. Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain. J Neurosci 22: 4114–4131, 2002.
Field DJ. Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am A 4: 2379–2394, 1987.
Gentner TQ, Margoliash D. Neuronal populations and single cells representing learned auditory objects. Nature 424: 669–674, 2003.
Gil D, Gahr M. The honesty of bird song: multiple constraints for multiple traits. Trends Ecol Evol 17: 133–141, 2002.
Gill P, Zhang J, Woolley SM, Fremouw T, Theunissen FE. Sound representation methods for spectro-temporal receptive field estimation. J Comput Neurosci 21: 5–20, 2006.
Goldstein A, Spencer KM, Donchin E. The influence of stimulus deviance and novelty on the P300 and novelty P3. Psychophysiology 39: 781–790, 2002.
Grace JA, Amin N, Singh NC, Theunissen FE. Selectivity for conspecific song in the zebra finch auditory forebrain. J Neurophysiol 89: 472–487, 2003.
Hsu A, Borst A, Theunissen FE. Quantifying variability in neural responses and its application for the validation of model predictions. Network 15: 91–109, 2004a.
Hsu A, Woolley SM, Fremouw TE, Theunissen FE. Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. J Neurosci 24: 9201–9211, 2004b.
Itti L, Baldi P. A principled approach to detecting surprising events in video. Proc IEEE Conf Comput Vision Pattern Recogn 1: 631–637, 2005.
Kersten D, Mamassian P, Yuille A. Object perception as Bayesian inference. Annu Rev Psychol 55: 271–304, 2004.
Kiehl KA, Stevens MC, Laurens KR, Pearlson G, Calhoun VD, Liddle PF. An adaptive reflexive processing model of neurocognitive function: supporting evidence from a large scale (n = 100) fMRI study of an auditory oddball task. Neuroimage 25: 899–915, 2005.
Klein DJ, Depireux DA, Simon JZ, Shamma SA. Robust spectro-temporal reverse correlation for the auditory system: optimizing stimulus design. J Comput Neurosci 9: 85–111, 2000.
Lennie P. The cost of cortical computation. Curr Biol 13: 493–497, 2003.
Machens CK, Wehr MS, Zador AM. Linearity of cortical receptive fields measured with natural sounds. J Neurosci 24: 1089–1100, 2004.
Nagel KI, Doupe AJ. Temporal processing and adaptation in the songbird auditory forebrain. Neuron 51: 845–859, 2006.
Olshausen BA, Field DJ. Sparse coding of sensory inputs. Curr Opin Neurobiol 14: 481–487, 2004.
Phan ML, Pytte CL, Vicario DS. Early auditory experience generates long-lasting memories that may subserve vocal learning in songbirds. Proc Natl Acad Sci USA 103: 1088–1093, 2006.
Prenger R, Wu MC, David SV, Gallant JL. Nonlinear V1 responses to natural scenes revealed by neural network analysis. Neural Networks 17: 663–679, 2004.
Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving macaque monkeys. Hear Res 150: 104–118, 2000.
Schweighofer N, Doya K, Lay F. Unsupervised learning of granule cell sparse codes enhances cerebellar adaptive control. Neuroscience 103: 35–50, 2001.
Sen K, Theunissen FE, Doupe AJ. Feature analysis of natural sounds in the songbird auditory forebrain. J Neurophysiol 86: 1445–1458, 2001.
Shannon CE, Weaver W. The Mathematical Theory of Communication. Urbana, IL: Univ. of Illinois Press, 1963.
Singh NC, Theunissen FE. Modulation spectra of natural sounds and ethological theories of auditory processing. J Acoust Soc Am 114: 3394–3411, 2003.
Skottun BC, DeValois RL, Grosof DH, Movshon JA, Albrecht DG, Bonds AB. Classifying simple and complex cells on the basis of response modulation. Vision Res 31: 1079–1086, 1991.
Srinivasan MV, Laughlin SB, Dubs A. Predictive coding: a fresh view of inhibition in the retina. Proc R Soc Lond B Biol Sci 216: 427–459, 1982.
Stripling R, Volman S, Clayton D. Response modulation in the zebra finch caudal neostriatum: relationship to nuclear gene regulation. J Neurosci 17: 3883–3893, 1997.
Theunissen FE, Amin N, Shaevitz SS, Woolley SM, Fremouw T, Hauber ME. Song selectivity in the song system and in the auditory forebrain. Ann NY Acad Sci 1016: 222–245, 2004.
Theunissen FE, David SV, Singh NC, Hsu A, Vinje W, Gallant JL. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network Comput Neural Syst 12: 1–28, 2001.
Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci 20: 2315–2331, 2000.
Ulanovsky N, Las L, Nelken I. Processing of low-probability sounds by cortical neurons. Nat Neurosci 6: 391–398, 2003.
van Hateren JH. Theoretical predictions of spatiotemporal receptive fields of fly LMCs, and experimental validation. J Comp Physiol A Sens Neural Behav Physiol 171: 157–170, 1992a.
van Hateren JH. A theory of maximizing sensory information. Biol Cybern 68: 23–29, 1992b.
Wang X, Lu T, Snider RK, Liang L. Sustained firing in auditory cortex evoked by preferred stimuli. Nature 435: 341–346, 2005.
Woolley SM, Fremouw TE, Hsu A, Theunissen FE. Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds. Nat Neurosci 8: 1371–1379, 2005.
Woolley SM, Gill PR, Theunissen FE. Stimulus-dependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain. J Neurosci 26: 2499–2512, 2006.
Zeigler HP, Marler P. Behavioral neurobiology of birdsong. Ann NY Acad Sci 1016: 724–735, 2004.
This article has been cited by other articles:
R. Kurtz, M. Egelhaaf, H. G. Meyer, and R. Kern. Adaptation accentuates responses of fly motion-sensitive visual neurons to sudden stimulus changes. Proc R Soc B, October 22, 2009; 276(1673): 3711–3719.
T. Q. Gentner. Surprising Twist on Auditory Representation. Focus on: "What's That Sound? Auditory Area CLM Encodes Stimulus Surprise, Not Intensity or Intensity Changes." J Neurophysiol, June 1, 2008; 99(6): 2755–2756.