J Neurophysiol 99: 2809-2820, 2008. First published February 20, 2008; doi:10.1152/jn.01270.2007
0022-3077/08 $8.00

What's That Sound? Auditory Area CLM Encodes Stimulus Surprise, Not Intensity or Intensity Changes

Patrick Gill, Sarah M. N. Woolley, Thane Fremouw and Frédéric E. Theunissen

Department of Psychology and Helen Wills Neuroscience Institute, University of California–Berkeley, Berkeley, California

Submitted 19 November 2007; accepted in final form 20 February 2008


 ABSTRACT
 
 
High-level sensory neurons encoding natural stimuli are not well described by linear models operating on the time-varying stimulus intensity. Here we show that firing rates of neurons in a secondary sensory forebrain area can be better modeled by linear functions of how surprising the stimulus is. We modeled auditory neurons in the caudal lateral mesopallium (CLM) of adult male zebra finches under urethane anesthesia with linear filters convolved not with stimulus intensity, but with stimulus surprise. Surprise was quantified as the negative logarithm of the probability of the stimulus given the local recent stimulus history and expectations based on conspecific song. Using our surprise method, the predictions of neural responses to conspecific song improved by 67% relative to those obtained using stimulus intensity. Similar prediction improvements cannot be replicated by assuming CLM performs derivative detection. The explanatory power of surprise increased from the midbrain through the primary forebrain to CLM. When the stimulus presented was a random synthetic ripple noise, CLM neurons (but not neurons in lower auditory areas) were best described as if they were expecting conspecific song, finding the inconsistencies between birdsong and noise surprising. In summary, spikes in CLM neurons indicate stimulus surprise more than they indicate stimulus intensity features. The concept of stimulus surprise may be useful for modeling neural responses in other higher-order sensory areas whose functions have been poorly understood.


 INTRODUCTION
 
 
The response properties of neurons in higher-level sensory areas have been difficult to describe in functional terms, hampering our understanding of the computations performed in these brain areas. One systematic approach has been to describe the stimulus–response relation in terms of the spatiotemporal receptive field (STRF) in vision (DeAngelis et al. 1993) or the spectrotemporal receptive field (also STRF) in audition (Aertsen and Johannesma 1981). In higher sensory areas, however, this classical-STRF often fails to predict the observed responses to natural stimuli well (Cohen et al. 2007; David et al. 2004; Machens et al. 2004; Theunissen et al. 2000). This failure may be related to the consistent finding that sensory cortex responds robustly to unexpected events (Goldstein et al. 2002; Kiehl et al. 2005; Ulanovsky et al. 2003). If the firing rates of forebrain sensory neurons are proportional to surprise, it should be possible, given an appropriate probabilistic model of the stimulus, to find linear filters relating firing rates to how surprising each stimulus element is. To differentiate these filters from classical-STRFs, we call them surprise-STRFs. Inasmuch as surprise-STRFs fit neural data well, we can say that action potentials in the sensory forebrain indicate the degree of unexpectedness in the stimulus.

The hypothesis that higher-level sensory neurons are sensitive to unexpected features in the stimulus can also be framed in the context of redundancy reduction, which has been proposed as a general coding principle for sensory systems (Barlow 1961). Natural stimuli have prominent correlations and thus nonwhite power spectra: low temporal, spatial, and spectral modulation frequencies carry more power than higher ones (Dong and Atick 1995; Field 1987; Singh and Theunissen 2003). It has been argued that, to maximize the information carried by lower-level sensory neurons, neuronal stimulus–response functions should account for these expected correlations by attenuating the lower frequencies and accentuating the higher frequencies (Atick 1992; van Hateren 1992b). Neural responses to natural stimuli would then have equal power at all frequencies; for this reason, such filters are called whitening filters. This theoretical prediction appears to hold in both the visual and auditory systems (Attias and Schreiner 1998; Dan et al. 1996; Escabi et al. 2003; van Hateren 1992a). For natural stimuli that have 1/f amplitude spectra (or 1/f² power spectra), a filter that takes the derivative (spatial or temporal) of the stimulus will effectively whiten it, removing all second-order correlations (Srinivasan et al. 1982).
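The whitening claim can be checked numerically. The sketch below is illustrative only: it assumes a synthetic periodic signal with an exact 1/f amplitude spectrum and uses a circular first difference as a stand-in for the temporal derivative; the helper `one_over_f_signal` is hypothetical and nothing here comes from the study's data.

```python
import numpy as np

def one_over_f_signal(n, seed=0):
    """Periodic test signal with an exact 1/f amplitude spectrum and random phases."""
    rng = np.random.default_rng(seed)
    k = np.arange(n // 2 + 1)                 # rfft frequency indices
    amplitude = np.zeros(k.size)
    amplitude[1:] = 1.0 / k[1:]               # 1/f amplitude, no DC component
    phase = rng.uniform(0.0, 2.0 * np.pi, k.size)
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=n)

n = 4096
x = one_over_f_signal(n)
dx = x - np.roll(x, 1)                        # circular first difference ~ temporal derivative

band = slice(1, n // 4)                       # skip DC; stay well below Nyquist
raw = np.abs(np.fft.rfft(x))[band]
der = np.abs(np.fft.rfft(dx))[band]
print(raw.max() / raw.min())                  # about 1000: strongly colored spectrum
print(der.max() / der.min())                  # about 1.1: nearly flat (white) spectrum
```

The first difference multiplies the amplitude at frequency index k by 2|sin(πk/n)|, which is nearly proportional to k at low frequencies and so cancels the 1/f fall-off; near the Nyquist frequency the approximation degrades, which is why the check is restricted to the lower band.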

However, the power spectra of natural stimuli do not capture all of the redundancy present in them. We therefore hypothesized that higher-order sensory areas represent natural stimuli with a neural code that reduces redundancy beyond whitening; for natural stimuli with 1/f amplitude spectra, such a code would have stimulus–response functions that remove more redundancy than a derivative does. If so, the neurons would be sensitive not only to changes in stimulus intensity, but also to how predictable otherwise equal changes in intensity are. For example, zebra finch song typically contains harmonic stacks with a smoothly changing fundamental frequency. Modulating the fundamental of a harmonic stack produces widespread changes in stimulus intensity at all frequencies. However, intensity changes within these harmonic stacks are more predictable than the intensity changes at a song syllable onset, even though the two are of similar magnitude. In such cases, a surprise-STRF performing full redundancy reduction would be more sensitive to the surprising changes at the syllable onset than to the predictable changes during the frequency sweep of the harmonic stack.

We hypothesize that the surprise-STRF should be a good model of neural responses in brain regions where redundancy reduction extends beyond simply whitening the power spectrum of natural stimuli. To test this hypothesis, we compared the surprise-STRF's predictive power with that of both a classical-STRF and a STRF based on the split absolute derivative of the stimulus intensity. Although a classical-STRF can implement a derivative, the underlying linear model assumes symmetric sensitivity to positive and negative intensity changes. We designed the split absolute derivative STRF so that it could treat positive and negative changes separately, and we refer to STRFs estimated from the split absolute temporal derivative of stimulus intensity as derivative-STRFs for short. To the extent that the surprise-STRF outperforms the classical- and derivative-STRFs, we can say that sensory spikes signify stimulus surprise, not stimulus intensity changes.

We estimated classical-, derivative-, and surprise-STRFs for single neurons at three levels of the zebra finch auditory system: the midbrain nucleus mesencephalicus lateralis dorsalis (MLd; analogous to the inferior colliculus in mammals); the primary auditory forebrain area Field L (analogous to primary auditory cortex in mammals); and a secondary auditory region, the caudal lateral mesopallium (CLM). Caudal mesopallium has been implicated in the processing of learned and behaviorally relevant sounds (Gentner and Margoliash 2003). CLM may therefore have stimulus–response properties that are particularly efficient at representing birdsong. Songbirds are a model system studied to understand both vocal learning and processing of complex vocalizations (Zeigler and Marler 2004). In the songbird auditory system, natural sounds are processed preferentially, eliciting higher spiking rates in the forebrain (Grace et al. 2003), higher information rates in the midbrain and forebrain (Hsu et al. 2004b), and higher neural synchronization in the midbrain (Woolley et al. 2006) than do synthetic sounds. Moreover, classical-STRFs of these neurons show tuning for the informative features of natural sounds (Woolley et al. 2005). Nonetheless, the classical-STRF cannot fully describe the observed spiking patterns, and the discrepancy between model prediction and actual response increases from MLd to Field L to CLM (Gill et al. 2006; Sen et al. 2001). We tested whether we could improve our model predictions by using surprise-STRFs or derivative-STRFs.


 METHODS
 
 
The surprise metric

The surprise of any specific sound element depends on what is expected. To determine how surprising each stimulus element is, we based our expectations on an ensemble of conspecific birdsongs, because of the relevance of this class of sounds for the survival of the animal (Gil and Gahr 2002) and because higher auditory and vocal areas in songbirds might be specialized for processing behaviorally relevant sounds (Theunissen et al. 2004). We then calculated stimulus probability based on the typical structure of birdsong and on a window of stimulus history we call the domain D. To find the best D, we first ran an analysis showing that neurons in MLd, Field L, and CLM are largely insensitive to the context of zebra finch song on long timescales; specifically, their responses do not appear to be sensitive to expectations based on song-motif structure (see also RESULTS). The domain D could therefore be limited to less than the past few hundred milliseconds (the length of one motif). The D that led to the best predictions was a rectangle of the spectrogram covering the recent stimulus history between 4 and 7 ms before the stimulus element whose surprise is being quantified, and nearby frequencies within 0.625 kHz of that element's frequency (see Fig. 1B for an illustration of D and RESULTS for an account of how the optimal D was found). We quantified surprise as follows

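$$
\mathrm{Surprise} =
\begin{bmatrix}
\mathrm{Surprise}_{\mathrm{loud}}\\
\mathrm{Surprise}_{\mathrm{quiet}}
\end{bmatrix},
\qquad
\mathrm{Surprise}_{\mathrm{loud}} =
\begin{cases}
-\log\dfrac{P(S\mid D)}{P(S_{ML}\mid D)} & S > S_{ML}\\[4pt]
0 & S \le S_{ML}
\end{cases},
\qquad
\mathrm{Surprise}_{\mathrm{quiet}} =
\begin{cases}
-\log\dfrac{P(S\mid D)}{P(S_{ML}\mid D)} & S < S_{ML}\\[4pt]
0 & S \ge S_{ML}
\end{cases}
\tag{1}
$$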
Here D is the domain, S is the stimulus intensity whose surprise is being considered, S_ML is the most likely stimulus given D, and P(S|D) is the conditional probability of S given D, which depends on the statistical dependencies in the sounds of a representative corpus of unfamiliar conspecific songs. The S_ML term ensures that Surprise is continuous at S = S_ML. Surprise, S_ML, S, and D are all functions of frequency and time. Surprise is an array with twice as many frequency rows as the original spectrogram: its top half contains louder-than-expected features and its bottom half contains quieter-than-expected features. Treating louder-than-expected and quieter-than-expected features identically would destroy the sign of the surprise: an event so loud that it was only 2% as likely as the most probable event would have the same representation as a quieter-than-expected event with equally low probability. Because the two halves are kept separate, and as long as P(S|D) is unimodal (with mode S_ML), the surprise representation can be inverted to recover S given D. Surprise is thus a complete stimulus representation: any neural code that efficiently conveys stimulus surprise also conveys all details of the stimulus intensity (and not just the surprising parts of the stimulus). Note also that for certain domains, a stimulus change is more probable than a constant stimulus; in those cases, an observed constant stimulus would be more surprising than the expected change.
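Equation 1 can be sketched for a single time-frequency bin as follows. This is a minimal illustration under stated assumptions, not the study's code: P(S|D) is given as a hypothetical unimodal distribution on a grid of intensities, log P is interpolated linearly between grid points, and the helper names `split_surprise` and `invert_surprise` are ours. The inverse shows why separating the loud and quiet halves keeps the representation invertible: on each side of the mode, surprise is monotonic in S.

```python
import numpy as np

def split_surprise(s, levels, p_given_d):
    """Surprise of intensity s for one spectrogram bin, split into (loud, quiet).

    levels    -- increasing grid of candidate intensities S
    p_given_d -- P(S|D) evaluated on `levels`; assumed unimodal and strictly positive
    """
    i_ml = int(np.argmax(p_given_d))
    s_ml = levels[i_ml]                                      # most likely stimulus S_ML
    log_ratio = np.log(p_given_d[i_ml]) - np.log(p_given_d)  # -log[P(S|D)/P(S_ML|D)] on the grid
    value = float(np.interp(s, levels, log_ratio))           # linear interpolation in log P (assumption)
    loud = value if s > s_ml else 0.0                        # top half: louder than expected
    quiet = value if s < s_ml else 0.0                       # bottom half: quieter than expected
    return loud, quiet

def invert_surprise(loud, quiet, levels, p_given_d):
    """Recover s from (loud, quiet): each side of the mode is monotonic in S."""
    i_ml = int(np.argmax(p_given_d))
    log_ratio = np.log(p_given_d[i_ml]) - np.log(p_given_d)
    if loud > 0:   # above the mode, surprise increases with S
        return float(np.interp(loud, log_ratio[i_ml:], levels[i_ml:]))
    if quiet > 0:  # below the mode, surprise decreases with S; reverse so x is increasing
        return float(np.interp(quiet, log_ratio[:i_ml + 1][::-1], levels[:i_ml + 1][::-1]))
    return float(levels[i_ml])

# Hypothetical unimodal P(S|D) on a grid of intensities (not estimated from song)
levels = np.linspace(0.0, 10.0, 101)
p = np.exp(-0.5 * ((levels - 4.0) / 1.5) ** 2)
p /= p.sum()
loud, quiet = split_surprise(6.0, levels, p)   # louder than the mode at 4.0
```

Inversion needs D (and hence P(S|D)) to be known; in a full decoder, D would come from the earlier, already recovered part of the spectrogram.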


Figure 1

 
FIG. 1. The classical-STRF (spatio- or spectrotemporal receptive field) and the surprise-STRF. A and B: spectrogram segments of a zebra finch song. B: detail from A. Rectangles highlight a stimulus element S and the domain D used to calculate how surprising S is. C: illustration of Eq. 1. The quantified surprise of S is –log [P(S|D)] (see METHODS and Eq. 1). D and E: surprisingly loud (top) and quiet (bottom) elements of the zebra finch song shown in A and B. E: detail from D. F: the classical-STRF obtained from reverse correlation between the poststimulus time histogram (PSTH) and a spectrogram of the stimulus. G: the surprise-STRF, obtained from reverse correlation using the quantified surprise of the stimulus represented as in D. H and I: comparison of the predicted PSTHs to the actual PSTHs using: the surprise-STRF, the classical-STRF, and the derivative-STRF (see Eq. 2 and METHODS). I: detail from H. Legend: PSTH is the raw PSTH smoothed with a 3-ms Hanning window, and S-STRF, D-STRF, and C-STRF are jackknife predictions made by the surprise-STRF, derivative-STRF, and classical-STRF, respectively. Red braces with arrows highlight areas in A and D where there is a surprising change within a syllable. This intrasyllable change, like the syllable onsets, elicits a response, which is predictable using the surprise-STRF, as seen in H and I.

 
The surprise metric is similar to that used in another study (Itti and Baldi 2005), where the surprise in natural video was defined as the Kullback–Leibler (K–L) divergence between expectations about the stimulus (the Bayesian prior) and the stimulus representation once the stimulus had been perceived (the Bayesian posterior). This K–L divergence is proportional to our surprise metric if there is a small degree of uncertainty in the Bayesian posterior and if this uncertainty is constant.
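A short derivation sketch of the last claim, in our own notation rather than Itti and Baldi's: let p(s) = P(s|D) be the prior, and let the posterior q(s) be a narrow, symmetric distribution of fixed shape and width σ centred on the observed intensity S. Then

$$
\begin{aligned}
D_{\mathrm{KL}}(q\,\|\,p) &= \int q(s)\,\log q(s)\,ds \;-\; \int q(s)\,\log p(s)\,ds \\
&= -H[q] \;-\; \log p(S) \;+\; O(\sigma^{2}),
\end{aligned}
$$

where H[q] is the entropy of q, and the O(σ²) term comes from a Taylor expansion of log p around S (the first-order term vanishes because q is symmetric). If σ is the same for every stimulus element, H[q] is a constant, so the K–L divergence tracks –log P(S|D) up to an additive constant, which a linear model with an offset term absorbs.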

The surprise-STRF is mathematically similar to the classical-STRF in the sense that both models assume that firing rates are linearly related to a nonlinear transformation of the sound pressure waveform. However, the nature of the nonlinear stimulus transformations is very different. Spikes represent specific patterns of surprise when modeled by surprise-STRFs, and specific patterns of stimulus intensity when modeled by classical-STRFs, although mathematically the former is not more nonlinear than the latter.
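Both models share the same linear form: a predicted rate is a sum over channels and time lags of filter weights times a stimulus representation. A minimal sketch, assuming a causal filter stored as a (channels × lags) array and no output nonlinearity; `strf_predict` is a hypothetical helper, not STRFPAK's API. Only the representation passed in changes between models: a spectrogram for the classical-STRF, or the two-halves surprise array (twice as many channels) for the surprise-STRF.

```python
import numpy as np

def strf_predict(strf, rep, mean_rate=0.0):
    """Linear STRF prediction r(t) = r0 + sum_f sum_tau h(f, tau) * x(f, t - tau).

    strf -- array (n_channels, n_lags); column tau holds the weights at lag tau bins
    rep  -- array (n_channels, n_times): spectrogram, split derivative, or split surprise
    """
    n_ch, n_lags = strf.shape
    if rep.shape[0] != n_ch:
        raise ValueError("STRF and stimulus representation need the same channels")
    n_t = rep.shape[1]
    rate = np.full(n_t, mean_rate, dtype=float)
    for tau in range(n_lags):
        # contribution of the stimulus tau bins in the past (causal filter)
        rate[tau:] += strf[:, tau] @ rep[:, :n_t - tau]
    return rate

# Usage with placeholder arrays: a surprise representation has 2 * n_freq channels
rng = np.random.default_rng(0)
n_freq, n_lags, n_t = 8, 20, 500
surprise_rep = np.abs(rng.normal(size=(2 * n_freq, n_t)))           # stand-in for Eq. 1 output
surprise_strf = rng.normal(scale=0.1, size=(2 * n_freq, n_lags))    # stand-in for fitted weights
rate = strf_predict(surprise_strf, surprise_rep, mean_rate=5.0)
```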

Our surprise-STRF is also formally similar to the linear probabilistic receptive field (LPRF; Averbeck and Romanski 2006) in that both assume stimulus probability is encoded, but differs in that the LPRFs in that study made firing rates proportional to the probability of the stimulus belonging to a given class of macaque vocalizations, whereas our surprise-STRFs make firing rates proportional to surprise in local sound intensity features.

Why log (P)?

There are several reasons to expect the log of stimulus probability to be linearly related to firing rates. With the log function, the summed surprise of two independent events a and b equals the surprise of their conjunction, since the probability of both a and b occurring is P(a) × P(b) and log [P(a) × P(b)] = log [P(a)] + log [P(b)]. The log function also has special relevance to information theory (Shannon and Weaver 1963): Eq. 1 implies that surprise is linearly related to the number of bits needed to encode the signal given knowledge of D and P(S|D). Thus if bit rates and spike rates are roughly proportional, as has been reported (reviewed in Borst and Theunissen 1999), we would expect –log [P(S|D)] to be proportional to the number of spikes a neuron fires.
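The additivity and bit-counting arguments can be checked with exact powers of two. In this small sketch the probabilities are hypothetical, and `surprise_bits` is an illustrative helper:

```python
import math

def surprise_bits(p):
    """Surprise -log2(p), in bits, of an event with probability p."""
    return -math.log2(p)

p_a, p_b = 0.25, 0.125      # hypothetical independent events
p_ab = p_a * p_b            # probability that both occur
print(surprise_bits(p_a), surprise_bits(p_b), surprise_bits(p_ab))  # 2.0 3.0 5.0

nats = -math.log(p_ab)      # same surprise in nats: bits * ln(2), a constant rescaling
```

Changing the base of the logarithm only rescales surprise by a constant factor, which a linear STRF can absorb into its weights.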

Calculating P(S|D)

To sample our domain we used a time bin of 1 ms and a frequency bin of 125 Hz (the same sampling as for our spectrograms; see Fig. 1A). Since the domain D spans 3 latencies and 9 frequencies, it has 27 dimensions. To estimate surprise with Eq. 1, we need the conditional probability P(S|D), which without modification would be a 28-dimensional object (one dimension for the stimulus intensity S in addition to the 27 domain dimensions). Because conditional probabilities of high-dimensional objects are hard to estimate, and because the 27 adjacent spectrogram bins of any zebra finch song are highly redundant, we reduced D to a 9-dimensional object using principal component analysis (PCA). Nine PCs were the maximum number that was computationally practical to consider, and they captured >97% of the variance of D. We then estimated the 10-dimensional P(S|D) using a corpus of 59 adult male zebra finch songs chosen for their diversity and high recording quality. We made an independent estimate of P(S|D) for each frequency band, since the statistics of spectrograms of zebra finch song depend on frequency. Frequency bands for which D overlapped the edge of the spectrogram were discarded.
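The dimensionality reduction step can be sketched with an SVD-based PCA. Assumptions: each row is one domain D flattened from 3 latencies × 9 frequency bins, and synthetic low-rank data stand in for the song corpus, which is not reproduced here; `pca_reduce` is a hypothetical helper.

```python
import numpy as np

def pca_reduce(domains, n_components=9):
    """Project flattened domains (n_samples, 27) onto their leading principal components.

    Returns projected coordinates, the component matrix, the mean, and the
    fraction of variance captured by the retained components.
    """
    mean = domains.mean(axis=0)
    centered = domains - mean
    # SVD of centered data: rows of vt are principal axes, ordered by variance
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    var = sing ** 2
    frac = var[:n_components].sum() / var.sum()
    comps = vt[:n_components]
    return centered @ comps.T, comps, mean, frac

# Synthetic stand-in for corpus domains: strongly correlated bins plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(5000, 4))
mixing = rng.normal(size=(4, 27))
domains = latent @ mixing + 0.05 * rng.normal(size=(5000, 27))
coords, comps, mean, frac = pca_reduce(domains, n_components=9)
```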

We used a discretized Gaussian kernel method to estimate P(S|D), as follows. We created a 10-dimensional lattice array to keep track of the joint probability P(S, D) of the domain and its stimulus. For each (S, D) pair in the corpus, we incremented the probability at each vertex of the 10-dimensional hypercube containing (S, D) (i.e., the closest lattice points in all directions) by an amount proportional to e^(–d²), where d is the Euclidean distance from (S, D) to that vertex, with the spacing between adjacent lattice points normalized to 1. To make the computation feasible, our method differed from traditional kernel density estimation in two ways: the discretization was relatively coarse, and we incremented only the closest lattice points in all directions. We used more lattice bins to describe S and the stronger PCs of D than to describe the weakest PCs of D. We divided P(S, D) by P(D) to obtain P(S|D) (the definition of conditional probability). After computing P(S|D), we quantified surprise at every point of each stimulus used for presentation by linearly interpolating P(S|D) from the closest lattice points.
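A two-dimensional toy version of the discretized kernel estimate is sketched below. It is not the study's implementation: it uses S plus a single hypothetical PC of D (instead of nine), equal lattice resolution on both axes, and synthetic data in which S tends to follow its domain; `lattice_joint`, `conditional_on_domain`, and `interp_lattice` are illustrative names. Coordinates are assumed to be pre-scaled so that adjacent lattice points are 1 unit apart.

```python
import itertools
import numpy as np

def lattice_joint(points, shape):
    """Discretized kernel estimate of a joint density P(S, D) on a lattice.

    points -- (n, d) array; column 0 is S, the rest are PCs of D (lattice units)
    shape  -- number of lattice points along each of the d axes
    Each point adds exp(-dist**2) to every vertex of the unit hypercube containing it.
    """
    joint = np.zeros(shape)
    upper = np.array(shape) - 2                    # keep the whole hypercube inside
    for x in points:
        base = np.clip(np.floor(x).astype(int), 0, upper)
        for offset in itertools.product((0, 1), repeat=len(x)):
            vertex = base + np.array(offset)
            dist2 = np.sum((x - vertex) ** 2)      # squared Euclidean distance
            joint[tuple(vertex)] += np.exp(-dist2)
    return joint / joint.sum()

def conditional_on_domain(joint):
    """P(S|D) = P(S, D) / P(D), normalizing along the S axis (axis 0)."""
    p_d = joint.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p_d > 0, joint / p_d, 0.0)

def interp_lattice(values, x):
    """Multilinear interpolation of a lattice array at continuous coordinates x."""
    upper = np.array(values.shape) - 2
    base = np.clip(np.floor(x).astype(int), 0, upper)
    frac = x - base
    total = 0.0
    for offset in itertools.product((0, 1), repeat=len(x)):
        off = np.array(offset)
        weight = np.prod(np.where(off == 1, frac, 1.0 - frac))
        total += weight * values[tuple(base + off)]
    return total

# Toy data: one hypothetical PC of D, and an S that tends to follow it
rng = np.random.default_rng(0)
d_pc = rng.uniform(1.0, 8.0, size=5000)
s = np.clip(d_pc + rng.normal(0.0, 0.7, size=d_pc.size), 0.0, 9.0)
joint = lattice_joint(np.column_stack([s, d_pc]), shape=(10, 10))
cond = conditional_on_domain(joint)
expected = -np.log(interp_lattice(cond, np.array([4.0, 4.0])))    # S near its domain
unexpected = -np.log(interp_lattice(cond, np.array([6.0, 4.0])))  # S well above it
```

The element whose intensity is far from what its domain predicts receives the larger –log P(S|D), which is the behavior Eq. 1 relies on.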

Our method for estimating surprise is not necessarily unique or definitive; for example, relatively large changes in the latency, time window, and frequency range of D led to only small differences in performance (see RESULTS). We believe the key features of a good stimulus representation for secondary forebrain areas are 1) representing the stimulus not in terms of its intensity, but in terms of how unexpected it is; and 2) using the log function, which grounds the representation in information theory and compresses large discrepancies between different estimates of P into small discrepancies in log (P).

Stimulus intensity derivatives

Our surprise representation has two confounding potential advantages over traditional spectrograms: twice as many free parameters allow onsets and offsets to be modeled differently (i.e., onsets are not constrained to have the opposite effect of offsets) and temporal changes (which are known to drive higher auditory areas) are highlighted. To control for both of these effects we also estimated split absolute derivative STRFs using a split time derivative of the spectrogram, where increasingly loud features (crescendo) are separated from increasingly quiet features (decrescendo) as in

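$$
\mathrm{Der} =
\begin{bmatrix}
\mathrm{Der}_{\mathrm{crescendo}}\\
\mathrm{Der}_{\mathrm{decrescendo}}
\end{bmatrix},
\qquad
\mathrm{Der}_{\mathrm{crescendo}} = \max\!\left(\frac{\partial S}{\partial t},\,0\right),
\qquad
\mathrm{Der}_{\mathrm{decrescendo}} = \max\!\left(-\frac{\partial S}{\partial t},\,0\right)
\tag{2}
$$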
Here S is the spectrogram intensity, t is time, and Der is the split derivative used as a stimulus representation for prediction. If the role of neurons is to report all changes in stimulus intensity equally (as would be expected of a whitening code applied to stimuli with a 1/f² power spectrum), then this representation should be ideal.
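Equation 2 can be sketched on a discrete spectrogram. This is an illustration under assumptions: a forward difference with bin width `dt` approximates ∂S/∂t (so the output has one fewer time bin), and `split_derivative` is a hypothetical helper name.

```python
import numpy as np

def split_derivative(spec, dt=1.0):
    """Split temporal derivative of a spectrogram S(f, t).

    spec -- array (n_freq, n_times) of spectrogram intensity
    Returns (2 * n_freq, n_times - 1): crescendo rows on top, decrescendo rows
    (as positive magnitudes) below.
    """
    ds = np.diff(spec, axis=1) / dt          # forward-difference estimate of dS/dt
    crescendo = np.maximum(ds, 0.0)          # increasingly loud features
    decrescendo = np.maximum(-ds, 0.0)       # increasingly quiet features, as |dS/dt|
    return np.vstack([crescendo, decrescendo])

# One frequency channel that rises, holds, then falls
spec = np.array([[0.0, 2.0, 5.0, 5.0, 1.0]])
der = split_derivative(spec)
```

Because crescendo and decrescendo rows get separate STRF weights, a derivative-STRF can, for example, respond to onsets without being forced to respond with the opposite sign to offsets of the same size.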

Receptive field estimation

For all our stimulus representations, we calculated STRFs using STRFPAK version 5.2 (http://strfpak.berkeley.edu), which incorporates a general linear regression algorithm, regularization techniques, validation techniques, and metrics described in detail previously (Hsu et al. 2004a; Theunissen et al. 2001; Woolley et al. 2006). Version 5.2 also includes a double jackknife, which ensures that data used to find the best regularization parameters are not also used to evaluate the final goodness of fit. The goodness of fit of the STRF was quantified by estimating the correlation coefficient and the mutual information between the predicted and actual poststimulus time histograms (PSTHs). The predicted information is a function of the integral of the coherence over all frequencies between the actual and predicted responses (Hsu et al. 2004a). Predicted information has the advantage over correlation coefficients in that the experimenter is not forced to choose a timescale of relevant PSTH features. Correctly predicted temporally precise features are rewarded by predicted information, in contrast with correlation coefficient methods that usually smooth out fine PSTH features like those shown in Fig. 1I. In this study we use predicted information to quantify most of our results, but also report correlation coefficient measures for comparison. The correlation coefficients were calculated after smoothing the PSTH with an 11-ms Hanning window and correcting for noise in the PSTH estimation (Hsu et al. 2004a).
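The two goodness-of-fit measures can be sketched with NumPy alone. This is not STRFPAK's implementation: the coherence is a plain Welch-style average over non-overlapping Hann-windowed segments, the information estimate assumes the common Gaussian-channel form –∫ log2[1 – γ²(f)] df, the noise corrections of Hsu et al. (2004a) are omitted, and all function names are hypothetical. Responses are assumed to be sampled in 1-ms bins.

```python
import numpy as np

def smoothed_corrcoef(psth, pred, win_bins=11):
    """Correlation coefficient after smoothing both responses with a Hanning window
    (11 bins = 11 ms at 1-ms resolution). No noise correction is applied."""
    w = np.hanning(win_bins)
    w /= w.sum()                                   # unit-area smoothing kernel
    a = np.convolve(psth, w, mode="same")
    b = np.convolve(pred, w, mode="same")
    return float(np.corrcoef(a, b)[0, 1])

def coherence(x, y, seg_len=256, dt=1e-3):
    """Magnitude-squared coherence averaged over non-overlapping Hann-windowed segments."""
    win = np.hanning(seg_len)
    sxx = syy = 0.0
    sxy = 0.0j
    for k in range(len(x) // seg_len):
        sl = slice(k * seg_len, (k + 1) * seg_len)
        fx = np.fft.rfft(win * (x[sl] - x[sl].mean()))
        fy = np.fft.rfft(win * (y[sl] - y[sl].mean()))
        sxx = sxx + np.abs(fx) ** 2               # auto-spectra
        syy = syy + np.abs(fy) ** 2
        sxy = sxy + fx * np.conj(fy)               # cross-spectrum
    freqs = np.fft.rfftfreq(seg_len, d=dt)         # Hz
    return freqs, np.abs(sxy) ** 2 / (sxx * syy)

def coherence_info(x, y, seg_len=256, dt=1e-3):
    """Information estimate -sum log2(1 - coherence) * df, in bits/s (assumed form)."""
    freqs, coh = coherence(x, y, seg_len, dt)
    coh = np.clip(coh[1:], 0.0, 1.0 - 1e-12)       # drop the DC bin; avoid log2(0)
    df = freqs[1] - freqs[0]
    return float(-np.sum(np.log2(1.0 - coh)) * df)

# Hypothetical responses: a close prediction and an unrelated one
rng = np.random.default_rng(0)
actual = rng.normal(size=256 * 40)
good_pred = actual + 0.3 * rng.normal(size=actual.size)
poor_pred = rng.normal(size=actual.size)
```

Coherence estimated from a single segment is identically 1, so the information estimate is only meaningful when many segments are averaged.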

Neurophysiological recordings

Neural data were obtained from 46 adult male zebra finches. All subjects were reared in a colony in natural family groups and had not been exposed to any of the songs used as stimuli before the neurophysiological recording session. Single-unit responses were obtained with extracellular tungsten electrodes in urethane-anesthetized birds. The locations of the recordings were verified with standard histological techniques: for CLM, n = 37; Field L, n = 189; and MLd, n = 142. Each subject underwent simultaneous recordings either in both Field L and MLd (n = 29) or in both CLM and Field L (n = 17). Sounds were played from a loudspeaker placed 15 cm in front of the animal at a peak intensity of 70 dB SPL. All neurons in CLM were in the lateral subdivision. Neurons in Field L were sampled from all subregions (L1, L2a, L2b, and L3). Data from 23 of these birds were also used in previously published work, and additional information on stimulus design, neurophysiological recordings, and histological techniques can be found in previous studies (Hsu et al. 2004b; Woolley et al. 2005, 2006). All experimental procedures were approved by the Animal Care and Use Committee of UC Berkeley.


 RESULTS
 
 
The quantified surprise of the stimulus was estimated by computing the negative log of the probability of the stimulus intensity at all frequencies and time points, given a recent history of the stimulus, i.e., its domain D (see METHODS). These conditional probabilities were obtained from a corpus of 59 zebra finch songs (see METHODS and Fig. 1, A–E). To avoid discarding stimulus information and to keep our representation invertible, surprisingly quiet and surprisingly loud stimulus elements were separated.

We calculated and validated classical-, derivative-, and surprise-STRFs for all the neurons in our data set. Figure 1F shows the classical-STRF (the STRF based on stimulus intensity shown in the spectrogram) of a CLM neuron, whereas Fig. 1G shows the surprise-STRF of the same neuron. Figure 1G (top) shows the filter to be convolved with louder-than-expected events and Fig. 1G (bottom) shows the filter for quieter-than-expected events.

From a purely linear standpoint, the classical-STRF model predicts that this neuron increases its firing at onsets and decreases its firing below mean levels at offsets. However, the surprise-STRF indicates that this neuron is not inhibited by quieter-than-expected activity; if anything, firing rates may increase slightly in response to unexpected quietness.

The prediction improvement of the surprise-STRF shown in Fig. 1 is typical of area CLM. Figure 2 compares performances of classical-, derivative-, and surprise-STRFs in CLM. As shown in Fig. 2A, every CLM neuron that can be reasonably modeled with linear filters (those with a prediction score >1 bit/s) is described better by a surprise-STRF than by a derivative-STRF. Moreover, the preference for surprise-STRFs was evident in most subjects. For 16 of the 17 zebra finches that had CLM recordings, CLM neurons are described better by surprise-STRFs than by derivative-STRFs, and the one counterexample is a subject with only one CLM recording site. The surprise-STRF also outperforms the classical-STRF in CLM by a larger average margin, as shown in Fig. 2B. Figure 2C shows that the derivative-STRF outperforms the classical-STRF by a smaller margin.


Figure 2

 
FIG. 2. Pairwise STRF comparisons in caudal lateral mesopallium (CLM). A: pairwise comparison between the performance of surprise-STRFs and derivative-STRFs for all neurons in CLM. B and C: scatterplots of surprise- and classical-STRFs, and of derivative- and classical-STRFs, respectively. All units are in predicted information (see METHODS). One neuron (with a surprise-STRF performance of 84 bits/s, a derivative-STRF performance of 70 bits/s, and a classical-STRF performance of 59 bits/s) was omitted from this figure to keep the x and y axes on an appropriate scale for viewing the other data, but this neuron is included in all other analyses.

 
In more peripheral areas, surprise-STRFs do not improve predictions to the same extent as in CLM. Figure 3A shows the mean prediction scores of classical-, derivative-, and surprise-STRFs for CLM, Field L, and MLd. The error bars show 1 SE; however, since the data are non-Gaussian (see Fig. 2), the SE is not a definitive indicator of variability. Surprise-STRFs consistently outperform classical-STRFs and derivative-STRFs in CLM (as already shown in Fig. 2, B and A) and in Field L, as can be seen in Fig. 3B, which shows the improvement of surprise-STRFs over classical- and derivative-STRFs. Numbers above the bars in Fig. 3B are P values from paired two-tailed Wilcoxon signed-rank tests. In CLM, the mean surprise-STRF prediction score is 67% larger than the mean classical-STRF score (P = 2 × 10⁻⁶). The improvement can also be seen when correlation coefficients (r) are used to quantify the goodness of fit: the mean r is 0.38 for classical-STRFs, 0.39 for derivative-STRFs, and 0.48 for surprise-STRFs. The improvement of the surprise-STRF over the classical-STRF in terms of correlation coefficients is 26% (P = 4 × 10⁻⁶). The correlation coefficient shows a smaller improvement because, compared with the information calculation, it is less sensitive to the gains obtained in predicting faster transients. [In predicting responses to natural stimuli, improvements on the order of 10% are considered noteworthy (e.g., see Prenger et al. 2004).]
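The paired tests and percentages reported here can be approximated in a few lines. The sketch below is a stand-in for a statistics package (for example SciPy's `scipy.stats.wilcoxon`), with stated simplifications: it uses the large-sample normal approximation to the signed-rank statistic, discards zero differences, and applies no tie or continuity correction. The helper names are hypothetical, and the percent-improvement helper simply compares means, as in the text.

```python
import math
import numpy as np

def average_ranks(values):
    """Ranks 1..n of `values`; tied values share the mean of the ranks they span."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1     # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def signed_rank_test(a, b):
    """Paired, two-tailed Wilcoxon signed-rank test via the normal approximation
    (zero differences dropped; no tie or continuity correction). Returns (W+, p)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[d != 0]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks(np.abs(d))
    w_plus = float(ranks[d > 0].sum())              # sum of ranks of positive differences
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p value

def percent_improvement(new_scores, old_scores):
    """Relative improvement of the mean score, e.g. surprise- vs classical-STRF."""
    return (np.mean(new_scores) - np.mean(old_scores)) / np.mean(old_scores)

# Mean correlation coefficients reported in the text: (0.48 - 0.38) / 0.38, about 26%
r_gain = percent_improvement([0.48], [0.38])
```

Ratios of means also compose multiplicatively: the 19% gain of derivative- over classical-STRFs and the 40% gain of surprise- over derivative-STRFs reported for CLM give 1.19 × 1.40 ≈ 1.67, consistent with the 67% overall improvement.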


Figure 3

 
FIG. 3. Evaluating stimulus representations. A: bar graph of mean performance of different stimulus representations in different areas. Key: "Specs," spectrograms; "Derivative," split temporal derivative of the spectrogram (see Eq. 2); "Surprise," quantified surprise (see Eq. 1). Error bars show 1 SE, although the spread of the data was not Gaussian, so these error bars are not sufficient to characterize performance distributions. B: pairwise improvements from using surprise-STRFs instead of classical-STRFs and derivative-STRFs. Error bars show 1 SE, although the spread of the data was not Gaussian. P values, measured with a 2-tailed Wilcoxon signed-rank test, are shown above each comparison.

 
The prediction improvement of surprise-STRFs in CLM cannot be explained by the ability of surprise-STRFs to react to intensity derivatives. Although the derivative-STRF performs better than the classical-STRF (by 19%, P = 8 × 10⁻⁴; see Figs. 2C and 3A), the surprise-STRF outperforms the derivative-STRF by 40% (P = 5 × 10⁻⁷; see Figs. 2A and 3B). Although, as illustrated in Fig. 1, there is room for prediction improvement, our current best model for CLM is that it encodes stimulus surprise, not intensity changes.

Field L, but not MLd, showed some evidence of surprise coding. On average, PSTHs in Field L are described only 24% better by surprise-STRFs than by classical-STRFs (P = 8 × 10⁻⁵). The surprise-STRF also outperforms the derivative-STRF in Field L by 14% (P = 2 × 10⁻⁷), and the derivative-STRF outperforms the classical-STRF by 9% (P = 0.01). The effect size of the surprise improvement over the derivative-STRF is smaller in Field L than in CLM (P = 7 × 10⁻⁵, Wilcoxon rank-sum test on the differences between surprise-STRF and derivative-STRF performance in CLM and Field L), and MLd shows no significant improvement of surprise-STRFs over either derivative-STRFs or classical-STRFs. Field L, physically between MLd and CLM along the ascending auditory pathway, thus also shows a functionally intermediate degree of tuning for surprise.

The advantages of using the surprise-STRF were not limited to better prediction of the neural response to onsets and offsets. In fact, surprise-STRFs offer the least improvement in MLd (see Fig. 3), the area with the most onset detection (see Fig. 4C). Moreover, surprising syllable features other than onsets, like the spectral change within a syllable (pointed out by the red braces above A, B, D, and E, in Fig. 1), could elicit neural activity that was well captured by a surprise-STRF and not as well captured by the other STRF models (as shown in Fig. 1, H and I). Conversely, nonsurprising acoustic changes often led to predicted spikes in classical- and derivative-STRFs that were not observed in the actual data.


Figure 4

 
FIG. 4. Comparisons in midbrain nucleus mesencephalicus lateralis dorsalis (MLd), primary auditory forebrain area (Field L), and CLM of ensemble PSTHs to repetitions of a motif. A: spectrogram of the first instance of the motif in a song. The first motif lasts from 818 to 1,529 ms (relative to the song's beginning); the second lasts from 1,724 to 2,438 ms. B: spectrogram of the second instance of the motif in the song. C–E: ensemble PSTHs (units are spikes per second per trial per neuron) for MLd, Field L, and CLM, respectively. We observe no large difference between the PSTHs to the first and second motif repeats, suggesting that these areas are not less surprised by the second repetition than by the first.

 
Finding the domain D

In the search for the best domain (see METHODS), only domains with a relatively short history were examined, because the auditory areas studied did not appear to be sensitive to long-range context (>500 ms) and because computational and data limitations prevented us from obtaining good estimates of longer-range probabilistic relationships in zebra finch song. Zebra finch song typically has repeated motifs consisting of the same syllables in the same order (Fig. 4, A and B). If neurons encode surprise and are sensitive to long-term context, such as syllable order within a recently heard motif, they should be less surprised by the second repetition of a motif, and firing rates in response to the second repetition should be systematically lower. Figure 4, C–E, shows the mean PSTHs for MLd, Field L, and CLM for two repetitions of the same motif in one stimulus. These mean PSTHs are not substantially different, suggesting that, on average, the neurons' responses are not sensitive to motif-long context.

We therefore focused on investigating the optimal domain on shorter timescales. For computational reasons, we tested only domains restricted to the immediate past: domain widths of 2 to 6 ms, minimum latencies of 3 to 6 ms, and spectral half-widths of 325 to 750 Hz. These timescales allowed us to capture expectations within a zebra finch syllable but not expectations arising from reproducible sequences of syllables in motifs. The spatiotemporal extent of domain D was found by gradient ascent on PSTH predictability, varying the latency range and the frequency width parameters of D. The optimal D had a latency from 4 to 7 ms and a spectral width of 625 Hz. Our initial guess of D was half the size of the optimal D and yet predicted <1% worse than our final choice, suggesting that the details of how surprise is quantified do not appreciably affect model performance. Correlations in conspecific song on these spectral and temporal scales are so strong that all of the domains we tried contained nearly identical information, and thus our estimate of surprise does not depend on the exact extent of D. Conversely, because all the D options we tested performed similarly, we do not claim that the shape of D has a biological analogue. It remains possible that a larger domain capturing relationships in syllable sequence could further improve predictions, although a mathematically more complicated D would be needed to make searching for these dependencies computationally feasible.
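To make the domain parameters concrete, the sketch below lists the spectrogram bins that make up D for a given target bin, using the latency range of the optimal D. The spectrogram resolution (1-ms time bins, 125-Hz frequency bins), the helper names `domain_offsets` and `domain_values`, the edge handling, and the treatment of 625 Hz as a half-width are all assumptions made for illustration.

```python
import numpy as np

def domain_offsets(lat_min_ms, lat_max_ms, half_width_hz, dt_ms=1.0, df_hz=125.0):
    """(lag, freq_offset) pairs, in spectrogram bins, that make up the domain D.

    dt_ms and df_hz (the spectrogram resolution) are hypothetical defaults.
    Only past bins (lag >= lat_min) are included, so D is causal.
    """
    lags = range(int(round(lat_min_ms / dt_ms)), int(round(lat_max_ms / dt_ms)) + 1)
    k = int(half_width_hz // df_hz)
    return [(lag, df) for lag in lags for df in range(-k, k + 1)]

def domain_values(spec, f, t, offsets):
    """Spectrogram values in D for the target bin (f, t); spec is freq x time.

    Bins that fall outside the spectrogram are skipped (an assumption).
    """
    n_f, n_t = spec.shape
    return np.array([spec[f + df, t - lag] for lag, df in offsets
                     if 0 <= f + df < n_f and 0 <= t - lag < n_t])

# Latency range of the optimal D from the text (4-7 ms); passing 625 Hz as
# the half-width is purely illustrative.
offsets = domain_offsets(4, 7, 625)
print(len(offsets))  # 44 = 4 lags x 11 frequency offsets
```

Halving the domain, as in the initial guess described above, amounts to shrinking these index ranges; the text reports that this changed prediction quality by <1%.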

No stimulus adaptation found

Aside from characterizing the stimulus–response function of CLM neurons, we asked whether these neurons exhibited plasticity during the course of the recording sessions. If CLM neurons change their expectations on a timescale ranging from minutes to hours, then systematic reductions in firing rates to repetitions of the same stimuli should be observed. We did not find any evidence of habituation in CLM, Field L, or MLd neurons, as is presented in the remainder of this section.

To assess habituation in firing rates, we ran two statistical tests on PSTHs. First, in the "same neuron, later trial" test, we performed a linear regression between the total number of spikes elicited in a given trial and the trial number (most stimuli were repeated 10 times), pooling all data from all CLM neurons. The slope of this regression was not significantly different from 0; it corresponded to a nonsignificant increase in mean firing rate of 0.4 spikes per stimulus per trial (two-sided F-test, P = 0.3). Second, in the "same subject, later neuron" test, we performed a linear regression between the mean number of spikes a CLM neuron fired per stimulus presentation and the number of times the (~1-h-long) stimulus protocol had been played to the animal being tested. This regression yielded a nonsignificant decrease in mean firing rate of 0.2 spikes per stimulus per previous presentation (P = 0.9). Thus the CLM neurons we recorded from did not systematically decrease their firing rates in response to repetitions of initially unfamiliar stimuli. Neither Field L nor MLd showed significant systematic adaptation either (uncorrected P values for Field L are 0.3 and 0.8, and for MLd are 0.8 and 0.1, for the "same neuron, later trial" and the "same subject, later neuron" tests, respectively; the smallest Bonferroni-corrected P value is 0.7). Therefore systematic stimulus-specific adaptation is not relevant to understanding MLd, Field L, and CLM under these experimental conditions, although firing rates might be plastic over the course of longer habituation, under different anesthesia, or when a reward is paired with a particular stimulus. It should be noted, however, that during our recording sessions we stopped acquiring data from specific sites if firing rates dropped to zero long before the presentation protocol was finished. These neurons were not analyzed, and it is possible that they constitute a separate population of highly adapting neurons. Alternatively, these neurons could have been damaged by the electrode, or the neural signal could have been lost because of other experimental artifacts such as brain movement.
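The "same neuron, later trial" test amounts to regressing spike counts on trial number and asking whether the slope differs from 0. A minimal sketch, assuming NumPy and hypothetical Poisson spike counts; note that it substitutes a permutation test on the slope for the two-sided F-test reported above, so its P values will not exactly match an F-test.

```python
import numpy as np

def slope_permutation_test(x, y, n_perm=5000, seed=0):
    """Least-squares slope of y on x, with a two-sided permutation P value.

    The text reports a two-sided F-test; a permutation test is swapped in
    here so the sketch needs only NumPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    denom = xc @ xc
    slope = xc @ (y - y.mean()) / denom
    rng = np.random.default_rng(seed)
    # Shuffling y breaks any x-y relationship; xc sums to 0, so y's mean can be skipped.
    null = np.array([xc @ rng.permutation(y) for _ in range(n_perm)]) / denom
    # Add-one correction keeps the P value strictly positive.
    p = (np.sum(np.abs(null) >= abs(slope)) + 1) / (n_perm + 1)
    return slope, p

# "Same neuron, later trial" layout (hypothetical data): x is the trial number
# (1-10) and y the spike count on that trial, pooled over 37 neurons.
trial = np.tile(np.arange(1, 11), 37)
counts = np.random.default_rng(1).poisson(20, size=trial.size)
slope, p = slope_permutation_test(trial, counts)
print(f"slope = {slope:+.2f} spikes/trial, permutation P = {p:.2f}")
```

The "same subject, later neuron" test has the same form, with x the number of previous protocol presentations and y each neuron's mean spike count per stimulus.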

Offset cells and auditory complex cells

The use of the surprise-STRF revealed two functional classes of forebrain auditory neurons that could not be well characterized by classical-STRFs: offset-only neurons and auditory complex cells. The former are modeled as firing in response to offsets, whereas the latter are modeled as firing whenever there is any surprising change in intensity (whether louder or quieter) within the latency and frequency range of the neuron. The classical- and surprise-STRFs of an offset-detecting neuron from Field L are shown in Fig. 5A.


Figure 5

 
FIG. 5. Classical- and surprise-STRFs for 4 neurons. A: the first column shows the classical-STRF of a Field L neuron as estimated by normalized reverse correlation to a spectrogram. The middle and right columns together show the results of the surprise-STRF model: the firing rate is modeled as the middle filter convolved with surprisingly loud events plus the right filter convolved with surprisingly quiet events. B–D: analogous to A, but showing surprise-STRFs for CLM neurons with an auditory complex character, in that they fire equally to surprisingly loud and surprisingly quiet stimulus changes within the same window of latencies and frequencies.

 
From its classical-STRF alone, the neuron in Fig. 5A appears to be both an onset detector at high latency (15 ms) and an offset detector at low latency (5 ms). However, the surprise-STRF makes clear that the best model for this neuron is one that largely ignores louder-than-expected features (Fig. 5A, surprise-STRF, louder) but is excited by quieter-than-expected features between about 0.5 and 5 kHz with a latency of about 8 ms (Fig. 5A, surprise-STRF, quieter).

Auditory complex neurons, found primarily in CLM, are general change detectors, firing to any unexpected stimulus pattern occurring within a window of frequency and latency, regardless of the direction of that change (i.e., louder- or quieter-than-expected features). Figure 5, B–D, shows three CLM complex neurons with different tuning characteristics. Figure 5B shows a neuron that fires when any surprising change occurred between about 5 and 25 ms earlier, in the frequency range of about 3 to 5 kHz. More broadband complex neurons are shown in Fig. 5, C and D.

Nine of the 37 CLM neurons we examined appeared to be auditory complex cells. Auditory complex cells and offset cells can be fit to some degree by derivative-STRFs, which is why derivative-STRFs outperformed classical-STRFs in CLM and Field L (see Fig. 3). However, because surprise-STRFs outperform derivative-STRFs by 40% (P = 5 × 10^−7), the probability-related aspect of surprise is still crucial for understanding CLM.
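The loud/quiet decomposition behind Fig. 5 can be illustrated with a small sketch: signed surprise is split into a louder-than-expected channel and a quieter-than-expected channel, each convolved causally with its own filter. An auditory complex cell then corresponds to identical filters on both channels, and an offset-only cell to a filter on the quieter channel alone. The filters, array sizes, and function names here are hypothetical, and this is not the fitting procedure used to estimate the surprise-STRFs.

```python
import numpy as np

def split_surprise(signed):
    """Split signed surprise (+ louder, - quieter than expected) into two
    nonnegative channels, one per surprise-STRF filter."""
    return np.maximum(signed, 0.0), np.maximum(-signed, 0.0)

def surprise_strf_predict(s_loud, s_quiet, h_loud, h_quiet, baseline=0.0):
    """rate[t] = baseline + sum over f, lag of
    h_loud[f, lag] * s_loud[f, t - lag] + h_quiet[f, lag] * s_quiet[f, t - lag].

    s_*: (n_freq, n_time) surprise channels; h_*: (n_freq, n_lags) filters.
    """
    n_time = s_loud.shape[1]
    rate = np.full(n_time, baseline, dtype=float)
    for lag in range(min(h_loud.shape[1], n_time)):
        # Causal: the stimulus at t - lag contributes to the rate at t.
        rate[lag:] += h_loud[:, lag] @ s_loud[:, :n_time - lag]
        rate[lag:] += h_quiet[:, lag] @ s_quiet[:, :n_time - lag]
    return rate

# Hypothetical filters: one frequency channel, 10 lags, excitation at lag 3.
h = np.zeros((1, 10))
h[0, 3] = 1.0
complex_cell = (h, h)                 # same filter for louder and quieter events
offset_cell = (np.zeros_like(h), h)   # responds only to quieter-than-expected events

signed = np.zeros((1, 20))
signed[0, 5] = 2.0    # surprisingly loud event at t = 5
signed[0, 12] = -2.0  # surprisingly quiet event at t = 12
loud, quiet = split_surprise(signed)
r_complex = surprise_strf_predict(loud, quiet, *complex_cell)
r_offset = surprise_strf_predict(loud, quiet, *offset_cell)
print(r_complex[[8, 15]], r_offset[[8, 15]])
```

The complex cell responds 3 bins after both events, whereas the offset cell responds only after the quieter-than-expected one.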

Random stimuli and unmet expectations

Next, we checked whether the surprise-STRF model yields similar increases in performance when predicting responses to modulation-limited (ML) noise, a type of random stimulus whose temporal (0–50 Hz) and spectral (0–2 cycles/kHz) modulations are band-limited (Hsu et al. 2004b). Unlike white noise, ML noise drives high-level auditory neurons at firing rates similar to those elicited by behaviorally relevant complex sounds, with spike patterns that are reliably in phase with the acoustical structure of the sound. ML noise has therefore been used to extract basic response parameters, including STRFs, from cortical auditory neurons (Escabi and Schreiner 2002; Klein et al. 2000). Figure 6A shows a spectrogram of a sample of ML noise.
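One simple way to produce a spectrogram with the modulation limits quoted above is to low-pass filter white noise in the two-dimensional modulation domain. The sketch below does this with NumPy FFTs; it is a filtered-noise construction offered only for illustration, not necessarily how the ML noise of Hsu et al. (2004b) was synthesized, and the spectrogram resolution and the 10-dB modulation depth (`depth_db`) are hypothetical.

```python
import numpy as np

def ml_noise_spectrogram(n_freq=64, n_time=2000, df_khz=0.125, dt_s=0.001,
                         max_temporal_hz=50.0, max_spectral_cyc_per_khz=2.0,
                         depth_db=10.0, seed=0):
    """Random log-amplitude spectrogram (freq x time, dB around its mean)
    whose modulation spectrum is confined to |temporal| <= max_temporal_hz
    and |spectral| <= max_spectral_cyc_per_khz."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n_freq, n_time))
    wf = np.fft.fftfreq(n_freq, d=df_khz)  # spectral modulation, cycles/kHz
    wt = np.fft.fftfreq(n_time, d=dt_s)    # temporal modulation, Hz
    mask = ((np.abs(wf)[:, None] <= max_spectral_cyc_per_khz)
            & (np.abs(wt)[None, :] <= max_temporal_hz))
    # Zero every modulation outside the band; the mask is symmetric in +/-
    # frequency, so the inverse transform is real up to rounding error.
    limited = np.fft.ifft2(np.fft.fft2(white) * mask).real
    limited = (limited - limited.mean()) / limited.std()
    return depth_db * limited

spec = ml_noise_spectrogram()
print(spec.shape)  # (frequency bins, time bins)
```

Filtering in the modulation domain makes the band limits explicit: every component of the two-dimensional Fourier transform of the result lies inside the stated temporal and spectral ranges.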


Figure 6

 
FIG. 6. Three representations of modulation-limited (ML) noise. A: spectrogram of a segment of ML noise. B: quantified surprise of this segment of ML noise using a probability model based on ML noise. C: quantified surprise of this segment of ML noise using a probability model based on the same corpus of zebra finch song used to estimate song surprise. Since song has greater spectral and temporal correlations, the offsets quickly following onsets of ML noise chunks are particularly surprising, as are the sharp spectral edges in ML noise. Also, due to a generally higher degree of mismatch between stimulus and expectations, ML noise is more surprising when song is expected than when ML noise is expected, as can be seen comparing the scale of the color bars in B and C.

 
As with song, ML noise can be represented in terms of surprise rather than intensity. We quantified the surprise associated with ML noise in two ways: first, by assuming knowledge of the statistics of ML noise (see Fig. 6B), and second, by assuming that song was expected (see Fig. 6C). Expectations were controlled by estimating P(S|D) (see Eq. 1) from a corpus of zebra finch song (for the song-expecting surprise, as in Fig. 6C) or from a corpus of ML noise (for the ML-noise–expecting surprise, as in Fig. 6B). As the color scale bars in Fig. 6, B and C, show, ML noise has more surprising features when song is expected than when ML noise is expected. (The maximum surprise value in Fig. 6C is 7.5, meaning that this stimulus feature is only e^−7.5, or 0.06%, as likely as the most likely stimulus intensity. The maximum surprise value in Fig. 6B is 3.9, meaning that this stimulus feature is e^−3.9, or 2%, as likely as the most likely stimulus intensity.) Note also that when P(S|D) is based on song, different features are unexpected. Because the stimulus model P(S|D) expects the slow broadband features common in zebra finch song, the rapid offsets occurring immediately after onsets in ML noise are quite surprising, as are the sharp spectral edges in ML noise.
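The dependence of surprise on the training corpus can be sketched with a deliberately simplified stand-in for P(S|D): a linear-Gaussian predictor of the current intensity from the intensities at 4–7 ms latency, fit to one corpus and then applied to a probe stimulus. Under a Gaussian model, 0.5·z² equals the log ratio between the probability of the most likely value and that of the observed value, which matches the reading of the surprise values in the parenthetical above. The Gaussian model, the single frequency channel, the 1-ms bins, and the toy corpora (smoothed noise standing in for song) are all assumptions; Eq. 1 in METHODS is not reproduced here.

```python
import math
import numpy as np

def lagged_design(x, lags):
    """Regressors [x[t - l] for l in lags] plus an intercept, for every t with full history."""
    start = max(lags)
    cols = [x[start - l: len(x) - l] for l in lags]
    return np.column_stack(cols + [np.ones(len(x) - start)]), x[start:]

def fit_expectation(corpus, lags):
    """Linear-Gaussian stand-in for P(S|D): weights and residual spread."""
    X, y = lagged_design(corpus, lags)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, np.std(y - X @ w)

def signed_surprise(x, lags, w, sigma):
    """log(p(most likely value) / p(observed value)) under the Gaussian model,
    signed + for louder and - for quieter than expected."""
    X, y = lagged_design(x, lags)
    z = (y - X @ w) / sigma
    return np.sign(z) * 0.5 * z ** 2

rng = np.random.default_rng(0)
lags = [4, 5, 6, 7]  # latencies of the optimal D, in 1-ms bins (assumed resolution)
# Toy corpora (hypothetical): smoothed noise stands in for slowly varying song;
# unsmoothed noise stands in for a rapidly varying random stimulus.
song_like = np.convolve(rng.standard_normal(50_000), np.ones(20) / 20, mode="valid")
song_like = (song_like - song_like.mean()) / song_like.std()
noise = rng.standard_normal(50_000)
probe = rng.standard_normal(5_000)

w_song, sd_song = fit_expectation(song_like, lags)
w_noise, sd_noise = fit_expectation(noise, lags)
s_song = signed_surprise(probe, lags, w_song, sd_song)
s_noise = signed_surprise(probe, lags, w_noise, sd_noise)
# The same probe is more surprising when smooth, song-like input is expected.
print(np.abs(s_song).mean(), np.abs(s_noise).mean())

# Surprise -> relative likelihood, as for the Fig. 6 values:
print(math.exp(-7.5), math.exp(-3.9))  # about 0.00055 (0.06%) and 0.020 (2%)
```

Both corpora are standardized to unit variance, so the difference in surprise comes from the temporal correlations each model expects rather than from overall level.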

We calculated three types of STRF to ML noise: a classical-STRF, a surprise-STRF expecting ML noise, and a surprise-STRF expecting song. Figure 7 gives all pairwise performance comparisons of these three STRF types in CLM. Figure 7, A and B, shows that surprise-STRFs based on song expectations generally outperform both classical-STRFs and surprise-STRFs based on expectations of ML noise.


Figure 7

 
FIG. 7. Pairwise STRF comparisons in CLM with ML noise as the stimulus. A: pairwise comparison between the performance of surprise-STRFs expecting song and expecting ML noise for all neurons in CLM. Key: "Classical Fit," fit using the classical-STRF operating on the spectrogram (see Fig. 6A); "S, MLN Fit," fit using surprise-STRF expecting ML noise (see Fig. 6B); "S, Song Fit," fit using the surprise-STRF expecting song (see Fig. 6C). Most points are above the line, indicating that in CLM surprise-STRFs based on the expectation of song outperform those based on the expectation of ML noise even when ML noise is the stimulus. B and C: scatterplots of the performance of these 2 surprise-STRFs compared with the performance of a classical-STRF. All units are in predicted information (see METHODS). One neuron (with an S, Song Fit of 36 bits/s, an S, MLN Fit of 34 bits/s, and a Classical Fit of 20 bits/s) was omitted from this figure to keep the x and y axes on an appropriate scale for viewing the other data, but this neuron is included in all other analyses.

 
Once again, surprise-STRFs performed differently in the three brain regions examined here. Mean and relative prediction scores are plotted in Fig. 8.


Figure 8

 
FIG. 8. Evaluating ML noise representations. A: bar graph of mean performance of different stimulus representations and different areas. Key: "Specs," spectrograms; "Surp, MLN," surprise expecting ML noise (see Fig. 6B); "Surp, Song," surprise expecting song (see Fig. 6C). Error bars show 1SE, although the spread of the data was not Gaussian, so these error bars are not sufficient to characterize performance distributions. B: pairwise improvements from using surprise-STRFs expecting song instead of classical-STRFs and surprise-STRFs expecting ML noise. Error bars show 1SE, although the spread of the data was not Gaussian. P values, measured with a 2-tailed Wilcoxon signed-rank test, are shown above each comparison. The MLd comparison between the 2 surprise-STRFs is within 1SE of being zero, but for 88 of the 142 MLd neurons the surprise-STRF expecting ML noise outperforms the surprise-STRF expecting song, and thus the Wilcoxon test is significant.

 
Figure 8A shows the mean prediction quality of the three ML noise STRF types for the three brain areas investigated. The most striking result from ML noise STRFs is the improved performance of the surprise-STRF in CLM when song is expected, as shown in Fig. 7A. Not only does the song-expecting surprise-STRF outperform the classical-STRF (by 62%, P = 3 × 10^−5; see Figs. 7B and 8B), but estimating surprise using expectations of song also predicts 58% better (P = 6 × 10^−5; see Figs. 7A and 8B) than expecting ML noise. In other words, responses to ML noise are best described if we assume that CLM uses a relatively fixed coding strategy: its neurons spike when they encounter an acoustical feature that is unexpected given an expectation of song, even when the stimulus is not song. This song expectation is not found in Field L or MLd, as shown in Fig. 8B. However, MLd neurons tended to be described better by either surprise-STRF than by a classical-STRF.
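Pairwise comparisons such as those in Fig. 8B combine a relative improvement with a two-tailed Wilcoxon signed-rank test on paired per-neuron scores. A dependency-free sketch follows; it uses the normal approximation rather than exact P values, computing the improvement as a ratio of means is an assumption about how the percentages were obtained, and the paired scores are hypothetical.

```python
import math
import numpy as np

def relative_improvement(a, b):
    """(mean(a) - mean(b)) / mean(b); ratio of means is an assumption."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return (a.mean() - b.mean()) / b.mean()

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test for paired scores (normal approximation).

    Zero differences are dropped and tied |differences| get average ranks;
    no tie correction is applied to the variance.
    """
    d = np.asarray(a, float) - np.asarray(b, float)
    d = d[d != 0]
    n = d.size
    abs_d = np.abs(d)
    ranks = np.empty(n)
    ranks[np.argsort(abs_d)] = np.arange(1, n + 1)
    for v in np.unique(abs_d):
        tied = abs_d == v
        ranks[tied] = ranks[tied].mean()
    w_plus = ranks[d > 0].sum()
    mu = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical paired scores (bits/s) for 20 neurons: model A beats model B by 50%.
b = np.arange(1.0, 21.0)
a = 1.5 * b
print(relative_improvement(a, b), wilcoxon_signed_rank(a, b))
```

Because the test uses ranks, it can be significant even when the mean difference is small relative to its standard error, as for the MLd comparison described in the Fig. 8 legend.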


 DISCUSSION
 
 
Using the surprise-STRF model, we were better able to predict the responses of forebrain auditory neurons to both natural and synthetic sounds, especially in the secondary auditory forebrain area, CLM. Two main points underscore the importance of surprise in the zebra finch higher auditory forebrain. First, surprise-STRFs in CLM outperform classical-STRFs by 67% and derivative-STRFs by 40%. The evidence that CLM performs an operation more sophisticated than derivative detection is conclusive: the effect is large (40%) and highly significant (P = 5 × 10^−7; see Fig. 2A). In comparison, in CLM the derivative-STRF outperforms the classical-STRF by only 19% (P = 8 × 10^−4). Therefore the explanatory power of surprise-STRFs arises mostly from capturing the higher-order structure of the stimulus correlations, and not just from the emphasis on temporal stimulus changes or from an increased number of model parameters. Second, when the stimulus is not song, the best description of CLM behavior is still a surprise-STRF. When processing ML noise, the improvement over other models of CLM activity is clear: the song-based surprise-STRF performs 62% better (P = 3 × 10^−5) than the classical-STRF and 58% better (P = 6 × 10^−5) than a surprise-STRF expecting ML noise. Although the surprise-STRF does not capture all of the deterministic behavior of CLM neurons, we propose that it is a better foundation than stimulus-intensity–based models on which to build more sophisticated nonlinear models, including surprise-STRF models with longer memory.
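If the reported percentages are ratios of mean predicted information (an assumption; the aggregation is defined in METHODS), the three CLM figures are mutually consistent, because the surprise-over-derivative gain should then equal the ratio of the two gains over the classical-STRF. Writing Ī for mean predicted information (notation introduced here):

```latex
\frac{\bar{I}_{\mathrm{surprise}}}{\bar{I}_{\mathrm{derivative}}}
  = \frac{\bar{I}_{\mathrm{surprise}} / \bar{I}_{\mathrm{classical}}}
         {\bar{I}_{\mathrm{derivative}} / \bar{I}_{\mathrm{classical}}}
  = \frac{1.67}{1.19} \approx 1.40
```

which agrees with the reported 40% advantage of surprise-STRFs over derivative-STRFs.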

An alternative interpretation to our main result (that coding in CLM is well characterized by surprise-STRFs) is that CLM neurons do not so much encode surprise as they implement some nonlinearities that are similar to our surprise formulation (Eq. 1). Our theoretical methodology is still useful for two reasons. First, surprise led us to a model for CLM neurons with unprecedented predictive power, which may prove to be an important step toward even better models of secondary forebrain neural functions. Second, the concept that higher sensory neurons encode surprise is theoretically rich, providing not just a functional description of CLM but a theoretical grounding for the type of stimulus representation found there.

The fact that surprise has more predictive power in CLM than in Field L suggests two possibilities. First, Field L might not complete the computation of what is surprising: some computation is required to silence unsurprising features, and in Field L, auditory information may not yet have passed through enough synapses to perform the needed surprise calculation. Second, the neural representation of surprising features in Field L might depend on expectations from any sound stimulus and not just conspecific song. This second hypothesis could be tested in a study in which surprise is estimated with the expectation of a variety of different natural sounds, not just zebra finch song.

When determining the optimal domain for the surprise estimation, we selected a relatively small domain (with latency from 4 to 7 ms and spectral width of 625 Hz) that did not capture the expectations in song found at the level of syllable sequence and repeated motifs. What advantage could the zebra finch auditory code in MLd, Field L, and CLM gain by effectively ignoring the large redundancies of motif repetition shown in Fig. 4? One possible explanation is that having the same neural representation for the two motifs makes it easier for a song grammar detector downstream of CLM to notice whether the repetition of motifs was exact. Note the remarkable similarity between the first and second motif repetitions in Fig. 4, A and B. Female zebra finches might use the ability to produce exactly the same motif as a proxy for general fitness during mate selection, and detecting precise motif repetition may be easier if the neural representation of each motif is identical. Another explanation for the absence of grammar-sensitive coding is that the neural circuitry needed to store uncompressed, detailed auditory information may be unwieldy. If one of the listener's tasks is to learn a song efficiently or to notice repeated motifs, it is most efficient to first remove redundancy from the song's acoustic representation so that fewer spikes need to be recalled. The surprise-detecting mechanism found in CLM is ideal for this preprocessing step, since it provides an efficient and complete representation of zebra finch song.

Because of the short timescale we use, our use of the term "surprise" can appear significantly different from its use in the cognitive neuroscience literature, in which the domain is longer in time and stimulus semantics are more relevant. However, these are differences of scale and level, and surprise might be represented at all levels of forebrain processing. Surprise is used here to mean the sensory-model mismatch left over after extensive redundancy reduction (Barlow 2001). The type of redundancy reduction at the level of single-neuron responses that we observed in CLM (also referred to as lifetime sparseness) might work in conjunction with redundancy reduction across neurons (population sparseness). In the mammalian auditory system, population redundancy has been shown to decrease along the ascending auditory pathway (Chechik et al. 2006).
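Lifetime sparseness is commonly quantified with the normalized Treves-Rolls index, computed over one neuron's responses to a set of stimuli. The sketch below is offered only to make the term concrete; the paper does not report this index, and the firing rates are hypothetical.

```python
import numpy as np

def lifetime_sparseness(rates):
    """Normalized Treves-Rolls index: 0 when the responses to all stimuli are
    equal, 1 when the neuron responds to a single stimulus only."""
    r = np.asarray(rates, dtype=float)
    n = r.size
    a = r.mean() ** 2 / np.mean(r ** 2)
    return (1.0 - a) / (1.0 - 1.0 / n)

print(lifetime_sparseness([5, 5, 5, 5]), lifetime_sparseness([0, 0, 12, 0]))  # 0.0 1.0
```

A neuron that fires mainly to surprising features, which are rare in natural song, would be expected to fall toward the sparse end of this scale.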

Another interpretation consistent with our results is that CLM acts less as a specialized auditory encoder than as a mediator of bottom-up attention. In a study estimating surprise in natural movies (Itti and Baldi 2005), observers' gaze consistently shifted toward regions of the movie with more surprise (quantified using a mechanism similar to that of Eq. 1; see METHODS). A corollary of our central hypothesis (that CLM encodes birdsong well by using a surprise-based coding strategy) is that CLM can signal to other neurons when parts of a song have changed enough to warrant a resampling of recent auditory history.

Neural codes that use few spikes to represent surprising features of stimuli are efficient for both metabolic and computational reasons (Olshausen and Field 2004). Although estimates of metabolic cost are somewhat controversial given the physiological complexity of the problem, there is a consensus that spiking and synaptic transmission account for most of the brain's energy expenditure (Attwell and Laughlin 2001; Lennie 2003). In some calculations, the cost of spiking is estimated to be so high that only a very small fraction of neurons could be active at the same time, necessitating a very efficient representation (Lennie 2003). Among sparse representations, encoding surprising features is desirable because the surprise transformation is invertible (see METHODS), meaning that the entire stimulus is encoded.
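The invertibility mentioned above can be illustrated with a simplified linear-Gaussian expectation: given the model and the first few samples, a decoder can rebuild the stimulus exactly from signed surprise. The sign matters, because surprise magnitude alone does not say whether a feature was louder or quieter than expected, which is why the surprise-STRF keeps separate loud and quiet channels. The model, weights, and function names are hypothetical and are not the formulation in METHODS.

```python
import numpy as np

def encode(x, w, sigma, lags):
    """Signed surprise of x[t] for t >= max(lags) under a hypothetical
    linear-Gaussian expectation: mean = w[-1] + sum_j w[j] * x[t - lags[j]]."""
    out = []
    for t in range(max(lags), len(x)):
        mu = w[-1] + sum(w[j] * x[t - l] for j, l in enumerate(lags))
        z = (x[t] - mu) / sigma
        out.append(np.sign(z) * 0.5 * z * z)  # + louder, - quieter than expected
    return np.array(out)

def decode(surprise, history, w, sigma, lags):
    """Rebuild the stimulus causally from its initial history and signed surprise."""
    x = list(history)
    for s in surprise:
        t = len(x)
        mu = w[-1] + sum(w[j] * x[t - l] for j, l in enumerate(lags))
        x.append(mu + np.sign(s) * sigma * np.sqrt(2.0 * abs(s)))
    return np.array(x)

lags = [4, 5, 6, 7]
w = np.array([0.5, 0.2, 0.1, 0.05, 0.0])  # hypothetical weights, intercept last
sigma = 0.3
x = np.random.default_rng(0).standard_normal(200)
s = encode(x, w, sigma, lags)
x_rec = decode(s, x[:max(lags)], w, sigma, lags)
print(np.max(np.abs(x_rec - x)))  # rounding-level reconstruction error
```

The decoder only needs what the encoder's expectation used: the model and the recent past. Each new sample is the expected value plus the signed deviation implied by the surprise.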

Independent of metabolic constraints, representing the stimulus in terms of surprise might be important for computational reasons. For example, sparse codes have also been shown to be beneficial in memory systems: it is easier to make new associations between features when their representations are sparse (Schweighofer et al. 2001). Moreover, the complicated task of visual object recognition is facilitated by combining prior knowledge of images with current image features (Kersten et al. 2004). In a Bayesian framework, prior knowledge and sensory information are combined by multiplying probability distributions. If the neural representation of the stimulus is in a form where spikes correspond to log probabilities, posteriors could be calculated through addition (not multiplication) of spikes, since adding log probabilities is equivalent to multiplying raw probabilities. In most dendrites, addition is a simpler operation than multiplication, so representing the stimulus in terms of its log probability is an ideal preprocessing step for Bayesian object recognition.
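The point about adding log probabilities can be checked numerically: adding the log prior and the log likelihood and then normalizing gives the same posterior as multiplying the raw probabilities. The three hypotheses and their probabilities are hypothetical.

```python
import numpy as np

def posterior_from_logs(log_prior, log_likelihood):
    """Bayesian combination by adding log probabilities, then normalizing
    with a numerically stable log-sum-exp."""
    log_post = np.asarray(log_prior, dtype=float) + np.asarray(log_likelihood, dtype=float)
    m = log_post.max()
    log_norm = m + np.log(np.sum(np.exp(log_post - m)))
    return np.exp(log_post - log_norm)

prior = np.array([0.5, 0.3, 0.2])       # hypothetical prior over 3 objects
likelihood = np.array([0.1, 0.6, 0.3])  # hypothetical P(features | object)
post = posterior_from_logs(np.log(prior), np.log(likelihood))
print(post)  # same as prior * likelihood / sum(prior * likelihood)
```

Only the final normalization requires anything beyond addition; the combination of evidence itself is a sum.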

In the context of memory storage, it should be noted that CM has been implicated in storing songs used in learning tasks (reviewed in Bolhuis and Gahr 2006). Firing rates in CMM (the medial portion of CM) are higher in response to familiar conspecific song motifs used in two-alternative choice tasks and go/no-go tasks than in response to unfamiliar conspecific song motifs (Gentner and Margoliash 2003). In the song system, responses to the bird's own song (which is very familiar) are maximal (reviewed in Theunissen et al. 2004). Higher firing rates to familiar songs apparently contradict the idea that firing rates are proportional to stimulus surprise. However, it is possible that surprise is used to represent songs efficiently in the naïve state and that this representation is then modified in behavioral tasks that engage motivational systems and learning. Task relevance should also influence the neural code: task-relevant features should have a stronger representation than irrelevant features, although our surprise model (Eq. 1) does not yet reflect this principle. Training or vocal learning would enhance the relevance of certain stimuli and thus the corresponding spike rates. Task relevance also likely explains why random (and thus intrinsically surprising) artificial stimuli such as white noise do not drive CLM as well as familiar but relevant stimuli such as conspecific song. Furthermore, higher responses to a familiar song might still reflect the surprising features of that song, given the statistics of all recently heard songs. It should also be noted that there are anesthetic (urethane vs. awake), species (zebra finch vs. starling), and region (CLM vs. CMM) differences between this study and that of Gentner and Margoliash (2003). Moreover, the surprise-STRF model did not capture all the nonlinear encoding observed in the response. What acoustical features CLM neurons represent in behavioral tasks, and how this representation changes with learning, remain open questions.

We also note that the surprise formulation is relevant to the stimulus-specific adaptation observed in auditory cortical areas (Ulanovsky et al. 2003), as well as in another secondary auditory region of the songbird, the medial caudal neostriatum (NCM) (Chew et al. 1995; Phan et al. 2006; Stripling et al. 1997). In both the mammalian and avian systems, repeated presentations of the same stimulus lead to long-term adaptation and, consequently, a priming of responses to novel (or deviant) stimuli. These responses could be modeled as surprise, as we have done here, but with expectations updated according to recent experience. Our current formulation assumes expectations that we labeled as naïve, meaning that they are either innate or acquired through the course of normal development. To model areas such as NCM, recently learned expectations could be added to the model to capture memory effects. More specifically, P(S|D) could be made experience dependent, thereby capturing the effect that repeated songs become unsurprising relative to novel songs. In other words, the stimulus-specific adaptation observed in these higher auditory areas could reflect how the stimulus prior is incorporated in the neural circuitry.
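As a toy illustration of an experience-dependent P(S|D), the sketch below keeps smoothed counts of previously heard stimulus identities, so that surprise (−log P) for a repeated standard falls with each presentation while a rare deviant stays surprising, qualitatively like stimulus-specific adaptation. It tracks whole-stimulus identity rather than spectrotemporal features, and the class name `AdaptiveExpectation` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter

class AdaptiveExpectation:
    """Hypothetical experience-dependent expectation over stimulus identities:
    P(s) from add-alpha smoothed counts of previously heard stimuli."""

    def __init__(self, stimuli, alpha=1.0):
        self.k = len(set(stimuli))
        self.alpha = alpha
        self.counts = Counter()
        self.total = 0

    def surprise(self, s):
        p = (self.counts[s] + self.alpha) / (self.total + self.alpha * self.k)
        return -math.log(p)

    def observe(self, s):
        self.counts[s] += 1
        self.total += 1

# Oddball sequence: a standard song "A" repeated nine times, then a deviant "B".
model = AdaptiveExpectation(["A", "B"])
surprises = []
for stim in ["A"] * 9 + ["B"]:
    surprises.append(model.surprise(stim))  # predicted response before updating
    model.observe(stim)
print([round(v, 2) for v in surprises])
```

Replacing the fixed, naïve expectation with an updated one is the only change needed; the surprise computation itself stays the same.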

Our analysis was also useful for characterizing functional properties at the single-neuron level. Specifically, we found some neurons in CLM that were sensitive to surprising features irrespective of whether they were surprisingly soft or loud. We labeled such neurons auditory complex cells because their properties are analogous to those of visual complex cells in V1 (Skottun et al. 1991). They are also reminiscent of the onset–offset or phasic neurons observed in the mammalian auditory cortex (Chimoto et al. 2002; Recanzone 2000; Wang et al. 2005). We also found offset-only neurons in both Field L and CLM. Similar response properties have been described in Field L of awake birds stimulated with high-intensity, temporally shaped noise (Nagel and Doupe 2006).

In summary, our results suggest that the firing rates of high-level sensory neurons depend more on the probability of natural stimulus features than on intensity or intensity changes. Thus expectations and natural statistics form a key part of the neural code. Moreover, using our approach, we were able to describe the responses of single neurons that were poorly described with the classical-STRF and the derivative-STRF. Our technique of using log probabilities is motivated by information theory, and not by any considerations unique to zebra finch CLM (or to audition in general). If neurons in other forebrain areas also encode stimulus surprise, our methodology will lead to a better understanding of sensory coding not only in terms of being better able to predict spike trains, but also by revealing a new operational principle: spikes in higher sensory areas indicate stimulus surprise, not intensity changes.


 GRANTS
 
 
This work was supported by National Institutes of Health Grants DC-007293, MH-66990, and MH-59189 to F. E. Theunissen.


 ACKNOWLEDGMENTS
 
 
We thank Y.M. for outstanding animal husbandry; B.R., research assistant, for invaluable histological assistance; T. Elliott for thoughtful comments on the manuscript; and the anonymous reviewers for helpful comments and suggestions.

Present addresses: S.M.N. Woolley: Department of Psychology, Columbia University, 406 Schermerhorn Hall, 1190 Amsterdam Ave., New York, NY 10027; T. Fremouw: Department of Psychology, University of Maine, 301 Little Hall, Orono, ME 04469-5742.


 FOOTNOTES
 
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Address for reprint requests and other correspondence: F. Theunissen, University of California, Berkeley, Department of Psychology, 3210 Tolman Hall, Berkeley, CA 94720-1650 (E-mail: Theunissen@berkeley.edu)


 REFERENCES
 
 
Aertsen AM, Johannesma PI. The spectro-temporal receptive field. A functional characteristic of auditory neurons. Biol Cybern 42: 133–143, 1981.[CrossRef][Web of Science][Medline]

Atick J. Could information theory provide an ecological theory of sensory processing? Network 3: 213–251, 1992.[Web of Science]

Attias H, Schreiner CE. Coding of naturalistic stimuli by auditory midbrain neurons. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1998, vol. 10, p. 103–109.

Attwell D, Laughlin SB. An energy budget for signaling in the grey matter of the brain. J Cereb Blood Flow Metab 21: 1133–1145, 2001.[Web of Science][Medline]

Averbeck BB, Romanski LM. Probabilistic encoding of vocalizations in macaque ventral lateral prefrontal cortex. J Neurosci 26: 11023–11033, 2006.[Abstract/Free Full Text]

Barlow H. Redundancy reduction revisited. Network 12: 241–253, 2001.[Web of Science][Medline]

Barlow HB. Possible principles underlying the transformation of sensory messages. In: Sensory Communication, edited by Rosenbluth WA. Cambridge, MA: MIT Press, 1961, p. 217–234.

Bolhuis JJ, Gahr M. Neural mechanisms of birdsong memory. Nat Rev Neurosci 7: 347–357, 2006.[CrossRef][Web of Science][Medline]

Borst A, Theunissen FE. Information theory and neural coding. Nat Neurosci 2: 947–957, 1999.[CrossRef][Web of Science][Medline]

Chechik G, Anderson MJ, Bar-Yosef O, Young ED, Tishby N, Nelken I. Reduction of information redundancy in the ascending auditory pathway. Neuron 51: 359–368, 2006.[CrossRef][Web of Science][Medline]

Chew SJ, Mello C, Nottebohm F, Jarvis E, Vicario DS. Decrements in auditory responses to a repeated conspecific song are long-lasting and require two periods of protein synthesis in the songbird forebrain. Proc Natl Acad Sci USA 92: 3406–3410, 1995.[Abstract/Free Full Text]

Chimoto S, Kitama T, Qin L, Sakayori S, Sato Y. Tonal response patterns of primary auditory cortex neurons in alert cats. Brain Res 934: 34–42, 2002.

Cohen YE, Theunissen FE, Russ BE, Gill P. Acoustic features of rhesus vocalizations and their representation in the ventrolateral prefrontal cortex. J Neurophysiol 97: 1470–1484, 2007.

Dan Y, Atick JJ, Reid RC. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. J Neurosci 16: 3351–3362, 1996.

David SV, Vinje WE, Gallant JL. Natural stimulus statistics alter the receptive field structure of V1 neurons. J Neurosci 24: 6991–7006, 2004.

DeAngelis GC, Ohzawa I, Freeman RD. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J Neurophysiol 69: 1091–1117, 1993.

Dong DW, Atick JJ. Statistics of natural time-varying images. Network Comput Neural Syst 6: 345–358, 1995.

Escabi MA, Miller LM, Read HL, Schreiner CE. Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. J Neurosci 23: 11489–11504, 2003.

Escabi MA, Schreiner CE. Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain. J Neurosci 22: 4114–4131, 2002.

Field DJ. Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am A 4: 2379–2394, 1987.

Gentner TQ, Margoliash D. Neuronal populations and single cells representing learned auditory objects. Nature 424: 669–674, 2003.

Gil D, Gahr M. The honesty of bird song: multiple constraints for multiple traits. Trends Ecol Evol 17: 133–141, 2002.

Gill P, Zhang J, Woolley SM, Fremouw T, Theunissen FE. Sound representation methods for spectro-temporal receptive field estimation. J Comput Neurosci 21: 5–20, 2006.

Goldstein A, Spencer KM, Donchin E. The influence of stimulus deviance and novelty on the P300 and novelty P3. Psychophysiology 39: 781–790, 2002.

Grace JA, Amin N, Singh NC, Theunissen FE. Selectivity for conspecific song in the zebra finch auditory forebrain. J Neurophysiol 89: 472–487, 2003.

Hsu A, Borst A, Theunissen FE. Quantifying variability in neural responses and its application for the validation of model predictions. Network 15: 91–109, 2004a.

Hsu A, Woolley SM, Fremouw TE, Theunissen FE. Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. J Neurosci 24: 9201–9211, 2004b.

Itti L, Baldi P. A principled approach to detecting surprising events in video. Proc IEEE Conf Comput Vision Pattern Recogn 1: 631–637, 2005.

Kersten D, Mamassian P, Yuille A. Object perception as Bayesian inference. Annu Rev Psychol 55: 271–304, 2004.

Kiehl KA, Stevens MC, Laurens KR, Pearlson G, Calhoun VD, Liddle PF. An adaptive reflexive processing model of neurocognitive function: supporting evidence from a large scale (n = 100) fMRI study of an auditory oddball task. Neuroimage 25: 899–915, 2005.

Klein DJ, Depireux DA, Simon JZ, Shamma SA. Robust spectro-temporal reverse correlation for the auditory system: optimizing stimulus design. J Comput Neurosci 9: 85–111, 2000.

Lennie P. The cost of cortical computation. Curr Biol 13: 493–497, 2003.

Machens CK, Wehr MS, Zador AM. Linearity of cortical receptive fields measured with natural sounds. J Neurosci 24: 1089–1100, 2004.

Nagel KI, Doupe AJ. Temporal processing and adaptation in the songbird auditory forebrain. Neuron 51: 845–859, 2006.

Olshausen BA, Field DJ. Sparse coding of sensory inputs. Curr Opin Neurobiol 14: 481–487, 2004.

Phan ML, Pytte CL, Vicario DS. Early auditory experience generates long-lasting memories that may subserve vocal learning in songbirds. Proc Natl Acad Sci USA 103: 1088–1093, 2006.

Prenger R, Wu MC, David SV, Gallant JL. Nonlinear V1 responses to natural scenes revealed by neural network analysis. Neural Networks 17: 663–679, 2004.

Recanzone GH. Response profiles of auditory cortical neurons to tones and noise in behaving macaque monkeys. Hear Res 150: 104–118, 2000.

Schweighofer N, Doya K, Lay F. Unsupervised learning of granule cell sparse codes enhances cerebellar adaptive control. Neuroscience 103: 35–50, 2001.

Sen K, Theunissen FE, Doupe AJ. Feature analysis of natural sounds in the songbird auditory forebrain. J Neurophysiol 86: 1445–1458, 2001.

Shannon CE, Weaver W. The Mathematical Theory of Communication. Urbana, IL: Univ. of Illinois Press, 1963.

Singh NC, Theunissen FE. Modulation spectra of natural sounds and ethological theories of auditory processing. J Acoust Soc Am 114: 3394–3411, 2003.

Skottun BC, DeValois RL, Grosof DH, Movshon JA, Albrecht DG, Bonds AB. Classifying simple and complex cells on the basis of response modulation. Vision Res 31: 1079–1086, 1991.

Srinivasan MV, Laughlin SB, Dubs A. Predictive coding: a fresh view of inhibition in the retina. Proc R Soc Lond B Biol Sci 216: 427–459, 1982.

Stripling R, Volman S, Clayton D. Response modulation in the zebra finch caudal neostriatum: relationship to nuclear gene regulation. J Neurosci 17: 3883–3893, 1997.

Theunissen FE, Amin N, Shaevitz SS, Woolley SM, Fremouw T, Hauber ME. Song selectivity in the song system and in the auditory forebrain. Ann NY Acad Sci 1016: 222–245, 2004.

Theunissen FE, David SV, Singh NC, Hsu A, Vinje W, Gallant JL. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network Comput Neural Syst 12: 1–28, 2001.

Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci 20: 2315–2331, 2000.

Ulanovsky N, Las L, Nelken I. Processing of low-probability sounds by cortical neurons. Nat Neurosci 6: 391–398, 2003.

van Hateren JH. Theoretical predictions of spatiotemporal receptive fields of fly LMCs, and experimental validation. J Comp Physiol A Sens Neural Behav Physiol 171: 157–170, 1992a.

van Hateren JH. A theory of maximizing sensory information. Biol Cybern 68: 23–29, 1992b.

Wang X, Lu T, Snider RK, Liang L. Sustained firing in auditory cortex evoked by preferred stimuli. Nature 435: 341–346, 2005.

Woolley SM, Fremouw TE, Hsu A, Theunissen FE. Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds. Nat Neurosci 8: 1371–1379, 2005.

Woolley SM, Gill PR, Theunissen FE. Stimulus-dependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain. J Neurosci 26: 2499–2512, 2006.

Zeigler HP, Marler P. Behavioral neurobiology of birdsong. Ann NY Acad Sci 1016: 724–735, 2004.




This article has been cited by other articles:

R. Kurtz, M. Egelhaaf, H. G. Meyer, and R. Kern
Adaptation accentuates responses of fly motion-sensitive visual neurons to sudden stimulus changes
Proc R Soc B, October 22, 2009; 276(1673): 3711–3719.

T. Q. Gentner
Surprising Twist on Auditory Representation. Focus on: "What's That Sound? Auditory Area CLM Encodes Stimulus Surprise, Not Intensity or Intensity Changes"
J Neurophysiol, June 1, 2008; 99(6): 2755–2756.


Copyright © 2008 by the American Physiological Society.