Abstract
A central theme of systems neuroscience is to characterize the tuning of neural responses to sensory stimuli or the production of movement. Statistically, we often want to estimate the parameters of the tuning curve, such as preferred direction, as well as the associated degree of uncertainty, characterized by error bars. Here we present a new samplingbased, Bayesian method that allows the estimation of tuningcurve parameters, the estimation of error bars, and hypothesis testing. This method also provides a useful way of visualizing which tuning curves are compatible with the recorded data. We demonstrate the utility of this approach using recordings of orientation and direction tuning in primary visual cortex, direction of motion tuning in primary motor cortex, and simulated data.
INTRODUCTION
A primary goal of many neuroscience experiments is to understand the relationship between the firing properties of a neuron and a single external variable. For sensory neurons, this external variable is typically a property of the stimulus, such as the orientation of a bar or a grating (cf. Hubel and Wiesel 1959, 1962). For motor neurons, this external variable typically refers to a property of the executed movement, for example the direction of hand movement (e.g., Georgopoulos et al. 1986). The relationship between the external variable and the firing rate, the tuning curve (see Fig. 1A), is one of the most frequently used tools to describe the response properties of neurons.
In these experiments, researchers often wish to answer questions such as the following. (1) Is the recorded neuron selective for the stimulus property in question? For example, a number of studies have asked what percentage of neurons in visual cortex are tuned to orientation (e.g., Maldonado et al. 1997; Ohki et al. 2005). 2) If the neuron is selective, how can its selectivity be described quantitatively? For example, many studies describe how well a neuron is tuned to the orientation of a stimulus (Carandini and Ferster 2000; Ringach et al. 2002). 3) How much uncertainty remains in this quantification after all of the evidence from the data has been taken into account? Often studies aim at giving error bounds on estimated neuronal properties. For example, it has long been a debate if neurons have sharper orientation tuning late in response than earlier in the response (Gillespie et al. 2001; Ringach et al. 1997a). Answering such questions demands reliable margins of certainty (Schummers et al. 2007). 4) Given two or more hypotheses concerning the functional or qualitative form of the neuron's selectivity, for which of these hypotheses does the data provide the most evidence (Amirikian and Georgopulos 2000; Swindale 1998)? For example, it has been asked if orientationselective cells in monkey striate cortex are tuned in the shape of a socalled Mexican hat (Ringach et al. 1997a). 5) Do the neuron's selectivity properties change after an experimental manipulation, either qualitatively or quantitatively? For example, several studies have asked whether adaptation, pharmacological intervention, or attention affect the tuning of neural responses to visual stimuli (Dragoi et al. 2001; McAdams and Maunsell 1999; Nelson et al. 1994; Sharma et al. 2003; Somers et al. 2001), while others have posed similar questions in the motor domain (Gandolfo et al. 2000; Li et al. 2001).
In addition to well controlled experiments, answering the questions in the preceding text requires statistical tools. Generally, there are two classes of methods for statistical analysis of tuning curves: parametric methods, which assume an explicit tuning function with several parameters, and nonparametric methods, which allow for essentially complete freedom in the form the tuning curve can take. Here we focus on parametric methods, such as circular Gaussian models for the orientation selectivity of simple cells in primary visual cortex (Hubel and Wiesel 1959, 1962) or cosine tuning models for direction of movement in primary motor cortex. These parametric models are particularly powerful when we know the form of tuning curves from past research—until the form is known nonparametric analysis such as spiketriggered averaging may be more appropriate. From a statistical point of view, we wish to estimate the parameter values given the data, estimate confidence intervals for the parameters, and select the best model from a set of possible models.
There are a number of wellestablished techniques in the statistical literature for solving these problems such as maximum likelihood estimation (MLE) (e.g., Swindale 1998), reversecorrelation (Simoncelli et al. 2004), bootstrapping (Stark and Abeles 2005), model comparison methods, and many others (Kass et al. 2005). These approaches have provided principled answers to scientific questions across a wide range of species, brain areas, and experimental manipulations. In this paper, we introduce a set of methods based on Bayesian statistics, which complement and extend these previous approaches for tuningcurve analysis.
Bayesian models start by formalizing how the variables we are interested in depend on one another and how uncertainty arises. In neuroscience, this implies specifying the assumptions we have about how neural responses are generated from stimuli or movements. We first describe the framework for making these assumptions (a Bayesian hierarchical modeling framework). Second, we present a Markov chain Monte Carlo (MCMC) sampling process for performing the statistical inference procedures necessary for estimation of parameters and their credibility intervals, as well as, hypothesis testing. The central difference between the Bayesian approach presented here and more commonly used methods is that we attempt to describe the entire distribution over tuning curves (the posterior probability distribution) rather than a single tuningcurve estimate (such as the maximum likelihood estimate). Each point in this posterior probability distribution describes how compatible a given tuning curve is with the measured data. Samples from this distribution allow for straightforward estimation of parameters and error bars, as well as, hypothesis testing and model selection.
For many analyses, Bayesian methods yield the same results one would obtain when using more traditional ones. However, there are two cases where Bayesian methods can be particularly helpful: when there is limited data and when there are many parameters. Both cases are typical of problems in neuroscience where recording times are limited and neurons exhibit complicated tuning properties. Moreover, in both cases, estimating the parameter of tuning curves is difficult. Without considering distributions over parameters or using priors most models may overfit the data. In the descriptions that follow, we focus on illustrating the utility of Bayesian methods for limited data and assume noninformative (flat) priors over the parameters. When there are many parameters, informative priors are often useful, but the application of these priors may require specific knowledge about the problem. However, even without using informative priors, many problems may benefit from Bayesian approaches, given the fact that electrophysiological experiments obtain datasets of limited size.
The statistical methods described here have been successfully applied in a range of scientific fields (Gelman et al. 2004; Liu 2002; MacKay 2003; Metropolis et al. 1953; Robert and Casella 2004). In electrophysiology, MCMC sampling has been used for fitting peristimulus time histograms (Dimatteo et al. 2001), fitting nonparametric tuning curves (Kaufman et al. 2005), spike sorting (Nguyen et al. 2003; Wood et al. 2006), and neural decoding. However, these methods, and Bayesian methods in general, have not yet been adopted by neurophsyiologists on a large scale. One major barrier to their adoption has been the difficulty in implementing MCMC sampling routines that are both appropriate and computationally efficient. To alleviate this obstacle, we make available a software toolbox that will allow researchers to easily and routinely apply all of the methods discussed here in the analysis of their own data.
METHODS
The spiking patterns of neurons will depend both on the properties of a stimulus and on the tuning properties of the neuron. In a typical experiment, we observe signals from neurons (e.g., spikes), and we want to ask which tuning properties are exhibited by the recorded neuron. In Bayesian terms, we want to ask which tuning properties are most consistent with the measured data given assumptions about their functional form (e.g., Gaussian or sigmoid). These Bayesian models assume an explicit generative process that defines how the observed firing patterns are (statistically) related to both the stimuli and the underlying tuning properties of the recorded neurons. We then use numerical methods to optimally estimate the values of the tuning properties as well as the uncertainty which remains (i.e., the error bars).
Hierarchical model
The key feature of our analysis method is that it describes a (hypothesized) probabilistic relationship between the parameters of a chosen tuningcurve (TC) function, any external variables (e.g., stimuli), and the neuronal responses collected in an experiment (see Fig. 1, B and C).
Given experimental data, such as spike counts, knowledge of the stimuli presented over the course of the experiment, and an assumption about the functional form of the TC, we want to know how likely each set of tuningcurve parameters is. Example tuningcurve functions, including the circular Gaussian and the sigmoid function, are described in the supplemental materials.^{1}
In statistical terms, this means that we need to estimate the parameters of the tuning function p_{1} through p_{K} (such as tuning width and preferred orientation) from the observed neural responses x and stimuli S (both vectors of length N, the number of experimental trials). We also assume that we may have some prior knowledge [Pr(p_{j}φ_{j})] about the properties of these parameters, as characterized by a hyperparameter φ.
We can use Bayes' rule to estimate the posterior distribution of the TC parameter variables
R_{TC} is the value which results from evaluating the TC function for the particular stimulus value S_{i} (i.e., the stimulus presented on trial i), given parameter values p_{1} through p_{K}. The term on the lefthand side of the equation is the joint posterior distribution of the parameter values, whereas the two terms on the righthand side are the data likelihood and the prior distributions of the parameters.
This model can be adapted to suit a particular experimental context in two main ways. First the TC function should be chosen to reflect as accurately as possible the hypothesized selectivity properties of the neuron in question. For example, for neurons in primary visual cortex, a sigmoid function might be chosen when the stimulus contrast is being manipulated (e.g., Sclar et al. 1989), while a circular Gaussian (CG) function might be chosen when stimulus orientation is varied (e.g., McAdams and Maunsell 1999). Second, an appropriate probability model should be chosen to describe the relationship between the TC model output and the observed data. For example, a Poisson probability model might be appropriate when the data consists of spike counts, whereas a normal (i.e., Gaussian) model might be preferred when the data consist of continuous values such as fluorescence readings from confocal or twophoton microscopy (see the supplemental materials for more details).
MCMC sampling
Because the TC functions commonly used to model neural responses can impose complex nonlinear relationships between the parameters of interest, it is usually impossible to compute the joint posterior in Eq. 1 analytically. Sampling methods such as MCMC offer a powerful and flexible set of tools by which to overcome these limitations (Liu 2002; Metropolis et al. 1953; Robert and Casella 2004). These methods can be used to compute an approximation of the joint posterior parameter distribution.
Sampling methods operate by performing a search through parameter space. In this regard, they are similar to optimization techniques such as leastsquares fitting methods. Unlike standard optimization algorithms, which only move “uphill” from less optimal solutions to better ones, sampling methods move both up and downhill to explore the parameter space. The resulting random walk through parameter space is biased such that the amount of time spent in any region of the space is proportional to the probability density of the parameter values corresponding to that region. As a result, MCMC sampling returns an approximation of the joint probability distribution of the model parameters, given the data, any prior information, and assumptions about the form of the model (see Fig. 2A).
Sampling for each parameter p_{i} proceeds by starting with the current sampled value, p_{i}^{(t)}, and proposing a new value p i. While this proposal can in principle be generated in a number of ways, we will use normally distributed proposals the mean of which is the current sample value, i.e., pi ∼ N(p_{i}^{(t)}, σ_{0}). Next, the probability in Eq. 1 is computed for the two different values of p_{i}, yielding qi = Pr(p_{1}^{(t)}… p_{i}… p_{K}^{(t)}x, S, φ_{1}… φ_{K}) and q_{i}^{(t)} = Pr(p_{1}^{(t)}… p_{K}^{(t)}x, S, φ_{1}… φ_{K}). If qi is greater than q_{i}^{(t)}, then the proposal is accepted, and p_{i}^{(t+1)} ← pi. If qi is less than q_{i}^{(t)} (i.e., the proposal represents a “downhill” move), then it might still be accepted with a probability equal to the ratio of the two posterior probabilities, qiq_{i}^{(t)}; otherwise, it is rejected. If the proposal is rejected, then the previous sample value is maintained, i.e., p_{i}^{(t+1)} ← p_{i}^{(t)}.
It can be shown that sampling according to this procedure will result in parameter samples the values of which approximate the joint posterior distribution of the parameters, given the data (Metropolis et al. 1953). The samples can therefore be used as a proxy for the complete joint parameter distribution, and questions about this true distribution can be answered by referring to the samples. Figure 2B shows the results of sampling using a circular Gaussian tuningcurve function on simulated data, for which ground truth is known.
It should noted that optimization techniques, such as those that use the mean squared error (MSE), also make (implicit) assumptions about the joint probability distribution of the model parameters. In particular, such methods assume that the parameters are normally distributed, an assumption which may not be appropriate in many cases. Figure 2B also shows the fit obtained with such a method.
In the discussion that follows, we assume that we have applied an MCMC procedure like the one just described to obtain M samples for each of the K tuningcurve model parameters: {(p̃_{1}^{(1)}… p̃_{1}^{(M)}}… (p̃_{K}^{(1)}… p̃_{K}^{M})}.
Bayesian model selection
Once we use Bayesian sampling methods to estimate tuning functions, we can straightforwardly use these methods to also ask if one model is better at explaining the data than another model. To compare different models in this way, it is necessary to compare how well each of them can explain the data. In a Bayesian setting, this implies integrating over the specific parameter values (in Eq. 1). Bayesian model selection relies on the ratio of the probabilities of each model holding true, given the experimentally observed data (MacKay 2003)
Here, M_{1} and M_{2} are two models to be compared, and x is the observed data, as before. The first term on the righthand side is the prior preference for one model over the other. In all cases, we assume that the two models being compared are, a priori, equally probably; under this assumption, this term is equal to 1. The second term, called the Bayes factor, is the ratio of the likelihood of the observed data, given each model (i.e., with all model parameters integrated out). This Bayes factor measures how much support the data provides for one model relative to the other. For example, if this quantity is 10, then it would be reasonable to say that the data provides evidence that model 2 is 10 times as likely to explain the data as model 1.
In general, it can be very difficult to actually perform the necessary integration to obtain Pr(xM). For models of the complexity described here, however, very reasonable approximations of this integral can be obtained directly from the MCMC samples. One reasonable approximation of this quantity is simply the harmonic mean (Kass and Raftery 1995) of the sampled probabilities
This quantity is easy to compute from the output of the MCMC sampling routing, and this procedure provides good estimates of how preferable one model is relative to another (especially for lowdimensional models). For higher dimensional models, other methods can be more stable (Chib and Jeliazkov 2001).
Simulations
Figs. 4 and 7 are based on the model of orientation/direction tuned simple cells used throughout the text. We use a circular Gaussian tuning curve (b = 1, a = 4, μ = 90°, σ = 20°) (Fig. 4A and Fig. 7, rows 2 and 3) or a directionselective (asymmetric) circular Gaussian tuning curve (b = 1, a_{1} = 4, a_{2} = 2, μ = 90°, σ = 20°; Figs. 4B and 7, bottom). Figure 7 (top) uses an untuned neuron with b = 1. All models use Poisson noise, and stimulus directions were selected uniformly between 0 and 180° or 0 and 360°. One trial corresponds to a single random sample from a Poisson distribution with the mean given by the tuningcurve model at the randomly selected stimulus orientation. After 10,000 burnin samples, 20,000 samples from the posterior were drawn. To reduce correlations between the samples only 400 samples (every 50th) were used for later estimation and model selection. The likelihood ratios in Fig. 7 were computed using crossvalidated data from the same underlying tuning functions.
Bootstrap and nonlinear optimization
The bootstrap method is a general and powerful approach to statistical inference that relies on the basic operation of creating surrogate data sets by resampling from the observed data (Efron and Tibshirani 1986). Given a set of (presumed independent) observations x = {x_{i}… x_{n}}, a single bootstrap sample x^{(b)} is created by sampling, with replacement, from the original collection of data until the sample contains the same number of data points as the original set. For example, x^{(1)} = {x_{1},x_{1},x_{3}, x_{4},x_{4}} is a valid bootstrap sample when n = 5.
The bootstrap can be used to compute the uncertainty (i.e., error bars) about any summary of the observed data, T(x). The general bootstrap procedure is as follows: 1) construct B bootstrap samples from the original observations, as described in the preceding text. 2) For each sample b, compute T(x^{(b)}); 3) construct confidence intervals from the sample summaries.
Our complete method for assessing the confidence intervals on the tuningcurve parameters combines nonlinear optimization for curve fitting (to compute the parameter estimates for each bootstrap sample), and the bootstrap for assessing the degree of uncertainty about those estimates. 1) Draw B bootstrap samples from the observed data. 2) For each bootstrap sample, use constrained nonlinear optimization with a squarederror objective function to compute the tuningcurve parameter values that best fit the data. 3) For each tuningcurve parameter of interest, use the estimate provided by the optimization routine on each bootstrap sample to construct confidence intervals.
This procedure results in a set of parameter samples that can be compared directly to those samples obtained from the Metropolis procedure.
RESULTS
To estimate tuning curves, several assumptions need to be made. We need to assume a dimension along which firing rates may vary (e.g., direction of movement). We need to assume a functional form of the tuning curve (e.g., cosine). Last, we need to assume how the tuning curve affects neural firing (e.g., assuming Poisson noise). To estimate the parameters of a tuning curve, we then use MCMC sampling techniques in the context of a hierarchical generative model that describes our prior knowledge about tuning curves; details are provided in methods and the supplemental methods section.
To give a more concrete example, a common generative model for data recorded from simple cells in mammalian primary visual cortex is a circular Gaussian tuning curve, characterized by the baseline value, amplitude, preferred orientation, and tuning width. If spike count data are collected, then assuming that measurements vary with Poisson noise might be appropriate (see Fig. 1). Together the tuningcurve function and the noise model describe how likely the observed neural signals are given the parameters of the tuning curve (preferred direction, baseline firing rate, etc). Applying Bayes' rule, this likelihood can then be combined with prior assumptions to calculate the posterior probability distribution over the parameters. We can thus calculate how likely each combination of parameters is given observations from a neuron.
Ideally we would take every possible combination of parameters and calculate how likely the measured spikes are given the assumed combination. However, in many cases this is simply impossible: if we want to consider 100 values for each parameter and have five parameters we would have 10^{10} combinations of parameter values. MCMC is a welldefined way of having many of the advantages of considering all possible values without requiring the associated run times. Our algorithm generates samples from the posterior using the Metropolis method, a particular example of an MCMC technique. This method produces parameter samples by performing a random walk through parameter space, proposing new parameter values as random perturbations of previously accepted values. By accepting and rejecting these samples on the basis of their jointposterior probability, the algorithm effectively builds a picture of this distribution (Fig. 2A). By generating more samples, the distribution can be approximated with arbitrary precision. At the conclusion of the sampling procedure, the collection of parameter samples obtained by the walk can be used to estimate the most probable parameter values, as well as the residual uncertainty—the error bars (Fig. 2B).
In the following sections, we demonstrate how these general methods can be applied by experimentalists to answer a range of scientific questions that frequently arise in the analysis of tuningcurve data. We use data obtained from orientationtuned neurons in the visual system (V1), from directionselective neurons in the motor system (M1) as well as several simulations to provide specific examples. For illustration purposes, these examples use circular Gaussian tuningcurve functions or cosine tuning functions and a Poisson noise model with noninformative priors on all parameters. These forms of tuning curves are frequently used for the study of primary visual cortex and motor cortex, respectively. The assumption that neurons spike according to Poisson statistics with a changing intensity function is the basis of many recent studies of neural systems. The general approach of using hierarchical generative models with MCMC for inference applies to a wide range of tuningcurve and neural models and can easily be extended to higher dimensional problems (cf. Schummers et al. 2007). Additional tuningcurve functions and neural models are discussed in the supplemental materials and the supplied toolbox supports the use of each of these models.
Estimating parameter values and credibility intervals
In many experiments, one of the first challenges confronting an experimentalist is to quantify the tuning properties of each recorded neuron. It is important to compute not only the most probable parameters but also the range of plausible parameters (the error bars or credibility intervals). We can pose this question more precisely by asking, for each parameter of the chosen tuningcurve function, what range of parameter values is consistent with the set of observed neuronal responses? For example, what is the plausible range of preferred orientations for an orientation selective neuron in V1?
Questions of this type can be answered directly from a set of MCMC samples because they approximate the joint posterior distribution over the parameter values. Parameter values that are sampled more often correspond to those that are more likely to account for the data. To obtain, say, a 95% credibility interval for parameter p_{i}, the samples {p̃_{i}^{(1)}… p̃_{i}^{(M)}} are first sorted from lowest to highest to obtain the rank order 〈p̃_{i}^{[1]}… p̃_{i}^{[M]}〉. Next, the lowest and highest 2.5% of the samples are discarded. For example, if 1,000 samples were collected, then the lowest and highest 25 would be discarded. The remaining extreme values define the desired error margin (see Fig. 3, A and B). In the 1,000sample example, these would be the 26th and 975th sample from the ranked list. Importantly, this method does not need to make the assumption that the posterior is Gaussian. Depending on the model, the distribution may even been asymmetric or multimodal. As such, this method can deal with many of the potentially nonGaussian distributions occurring in neuroscience.
In addition to credibility intervals, several other statistical estimates can be calculated from the samples. We can find a maximum a posteriori (MAP) estimate as well as the median or mean values of each parameter (Fig. 2B). When the prior over parameters is noninformative, the MAP estimate is equivalent to a MLE. However, in many cases, the mean or median posterior is a more robust estimate of the parameters. These estimators are optimal in that the posterior mean minimizes the squared error between the estimated parameter and its true value while the posterior median minimizes linear loss (Berger 1985). Like the maximum likelihood estimate, these estimators are asymptotically unbiased and efficient in most cases. The difference between MLE and Bayes estimators is generally apparent only for small sample sizes and disappears as more data are collected. In simulations of orientationtuned neurons, for instance, the MLE and Bayes tuningcurve estimates are the same after ∼40 trials (Fig. 4). For small sample sizes, using Bayes estimators, which take the distribution over parameters into account, can improve estimation.
Visualization of potential tuning curves
The presented approach also allows for simple visualization of the set of potential tuning curves that are compatible with the observed data. This visualization can be a useful tool for understanding the quantitative power that is provided by the data from a single cell (as used in Figs. 3 and 5). Because the posterior distribution over potential tuning curves may not be Gaussian and the parameters may not always be linearly independent, error bars on each parameter provide only part of the picture. For instance, the distribution of potential tuning curves may be skewed or multimodal (Fig. 5, B and E). This is one advantage of representing an entire distribution over the tuning curves described by a certain model.
Assaying quantitative changes in response properties
Quantifying tuning properties using the methods in the preceding text is often merely the first step in answering scientific questions about physiological data. Frequently, we wish to know whether a particular manipulation has resulted in a significant change in tuning properties. For example, it might be asked whether visual neurons change their orientation selectivity properties in response to attentional influences, sensory pattern adaptation, or perceptual learning (Dragoi et al. 2000, 2001; McAdams and Maunsell 1999) or if the preferred directions of neurons in motor cortex change in a force field (Rokni et al. 2007). In these kinds of experiments, neuronal responses would be measured in both the control and test conditions, and MCMC samples would be generated separately for these two sets of data. To determine whether the data supports a conclusion of significant changes in tuning, credibility intervals are computed for each of the two conditions as in the preceding text. To ascertain whether a significant shift has occurred, it is sufficient to observe whether these two intervals overlap. If there is no overlap (i.e., the intervals are disjoint), then the data support the conclusion that the corresponding response property has changed between the two conditions (see Fig. 3, C and D). If the two intervals do overlap, then this conclusion is not supported. This negative result is consistent with two possibilities. First, the underlying response property might really be unaffected by the experimental manipulation, which is a true negative result. Second, the tuning property has indeed changed but not enough data has been collected to reduce uncertainty to the point where this change can be reliably detected. In neither of the cases will this method wrongly report a change.
Assessing selectivity to the stimulus feature using model selection
In many cases, it is of primary scientific interest to determine whether a neuron is actually selective for the stimulus property that is being varied in the experiment. For example, many but not all neurons in striate cortex are selective for particular stimulus orientations (Maldonado et al. 1997), and ∼20% of neurons in primary motor cortex are not cosine tuned (e.g., Georgopoulos et al. 1986). Before drawing conclusions about the quantitative details of each neuron's orientationdependent response, it is necessary to assess whether the cell is selective in the first place.
Bayesian model selection (BMS) can be used to determine how much evidence is provided by the data for one response model over another (Kass and Raftery 1995; MacKay 2003). To determine whether a cell is indeed selective for a stimulus property, BMS can be employed to compare models with two different tuningcurve functions: one in which the response varies with the stimulus value, and another in which the response is assumed to be insensitive to the particular stimulus (see Fig. 5, A–C).
BMS is similar to traditional hypothesis testing methods such as the likelihood ratio test in that we compare the probability of observing the data under each model. However, unlike the likelihood ratio test, BMS uses Bayes factors—the ratio of probability assigned to the data by each model, integrating over all possible parameter values. In general, it is difficult to compute Bayes factors because this integration can be nonlinear and high dimensional. For models with relatively small numbers of parameters, however, such as those used here, approximate Bayes factors can be computed directly from the MCMC samples (see Eq. 2). Furthermore, the quality of these estimates can be improved by increasing the number of samples computed. We have found that excellent results can be obtained for the models discussed here with very reasonable computational resources (<1 min/cell on a desktop machine).
A key property of BMS is that it appropriately penalizes models with more degrees of freedom. This “regularization” ensures that models with extra parameters are not favored merely because they are more expressive, which is a wellknown complication in model comparison procedures (Buckland et al. 1997; MacKay 2003; Pitt and Myung 2002). This penalty is in some ways similar to the Akaike information criterion (AIC) or Bayesian information criterion (BIC). However, unlike these two methods, the Bayes factor applies to any priors over parameters or model types.
Comparing different selectivity models
A number of scientific questions can be posed in terms of BMS. For example, some neurons in primary visual cortex of the cat are direction selective, meaning that their responses to a grating of a particular orientation depend on the direction of motion of the grating. Other cells are not direction selective, and their responses are not modulated by the direction of motion of the grating. BMS can be used to distinguish between these two types of cells. In this case, two different tuningcurve functions would be used, each of which corresponds to one of the two hypotheses. The first is a circular Gaussian function with a periodicity of 180°; this function represents a nondirectionselective response because its values are the same for stimuli of opposite direction. The second TC function is a circular Gaussian with a periodicity of 360°, which represents responses for only one direction of motion (see Fig. 5, D and E). In the case of neurons in motor cortex, one might compare a cosine tuning function and a constant tuning function (Fig. 6). When BMS is applied in this setting, using the methods described above, its output indicates the strength of evidence provided by the data that a particular cell is tuned for direction of motion.
In real data, the “true” tuning curve is generally unknown. However, we can illustrate some of the features of BMS using simulated data. To this end, we performed four simulations where the true tuningcurve and noise model are known (Fig. 7). We simulated spikes from an untuned neuron (Fig. 7, top), an orientationtuned neuron (Fig. 7, 2nd row), a nondirectionselective neuron (Fig. 7, 3rd row), and a directionselective neuron (Fig. 7, bottom). For each of these sets of simulated data, we then generated MCMC samples from two models: untuned and tuned models for the first two cases (Fig. 7A) and nondirectionselective and directionselective models for the second two cases (B). A Bayes factor >1 (or a logratio >0) suggests that model 1 is preferred, while a Bayes factor <1 (logratio <0) suggests that model 2 is preferred. Similar to crossvalidated likelihood ratio tests, the Bayes factor selects the correct underlying tuningcurve model after a relatively small number of trials (Fig. 7, right).
Comparison with the bootstrap
One established alternative to the methods presented here is bootstrapping. Bootstrap methods are a very general set of techniques by which to estimate the residual uncertainty of arbitrary statistical summaries, including credibility intervals for model parameters (Efron and Tibshirani 1997). These techniques have reached fairly wide use in neuroscience and solve some of the same problems that the techniques presented here address, particularly the estimation of errorbars (Brown et al. 2004; Kass et al. 2005; Paninski et al. 2003; Ringach et al. 2003; Schwartz et al. 2006; Stark and Abeles 2005; Ventura et al. 2005). Briefly, the bootstrap involves generating a series of surrogate data sets (“bootstrap samples”) by sampling with replacement from the observed data; estimates of the uncertainty—error bars—are then computed by computing separate estimates from each of these samples. As such, bootstrap methods are a natural point of comparison for the hierarchical modeling and sampling methods described here. We have performed a series of comparisons between the two methods, using simulated orientation data for which the “true” tuning curve was known. A complete description of the bootstrap procedure used for these comparisons is provided in methods.
Figure 8 shows the results of several comparisons between the two methods. While the bootstrap performs similarly to posterior sampling when a large amount of data are available (Fig. 8A), the advantage of sampling becomes clear when less data are available. In this situation, the fully Bayesian approach is better able to recover the underlying tuning curve. This effect has been noted previously (Kass et al. 2005) and can generally by avoided by collecting more data, but it points to a more fundamental difference in the way these methods test tuningcurve models.
In addition to robustness to small sample sizes, the Bayesian method has several advantages over the bootstrap. Bootstrapping generally requires a nonlinear optimization, which can be subject to local minima in the error function. Manual intervention or more complicated methods, such as double bootstrap, may be necessary to correct for biases and dependencies in bootstrap estimates (Davison and Hinkley 1997). The Bayesian approach, on the other hand, does not include any optimization. Assuming appropriate mixing, the MCMC samples approximate the full posterior distribution over the parameters. Moreover, these techniques can easily be adapted to a wide range of tuningcurve functions, noise models, and priors.
DISCUSSION
Here we have shown how methods based on MCMC can help answer frequently occurring questions in the realm of electrophysiology. We have shown that the same method can be used to estimate properties of neural tuning for several kinds of neurons, obtain bounds on their values, examine if they change from one condition to the next and ask which model best explains the data. With the provided toolbox these methods are easy to implement and have the potential to significantly improve practical data analysis.
Comparison with other methods
The primary difference between the Bayesian sampling approach and more traditional approaches such as reversecorrelation, maximum likelihood estimation, and bootstrapping is that we attempt to consider all the tuning curves that would be compatible with the data. This analysis can reproduce the results from MLE by searching for the parameter values that are associated with the highest sample density and a noninformative prior. It obtains the same results as bootstrapping when there is enough data. However, for small sample sizes, Bayes estimators, such as the posterior median, and Bayesian credibility intervals are often more robust than these alternatives.
In hypothesis testing, Bayesian sampling methods converge to the results from likelihood ratio tests as well as AIC/BIC, under certain assumptions. For nonlinear or hierarchical models, the Bayes factor offers a more principled approach to model comparison. As experimental methods advance and stimulusresponse models become more sophisticated, accurate model selection will become more and more important. By looking at the distribution over potential tuning curves and not just the most likely tuning curve, physiologists may be able to gain insight into scientific questions as well as the statistical properties of tuningcurve models.
Another key feature of the sampling approach is its flexibility. Most analyses are based on an implicit assumption of Gaussian noise (Amirikian and Georgopulos 2000; Swindale 1998), an assumption that is inappropriate for some common forms of neuronal data (e.g., spike counts). The hierarchical model described here can incorporate a range of probability models, such as the Poisson and negative binomial distributions in addition to Gaussian. This modeling flexibility also means that the framework described here can be directly applied to data other than spike counts, including fluorescence measurements from confocal and twophoton microscopy. There are a wide range of easily applicable tuning functions, including sigmoids, bilinear, von Mises functions, rectified cosines, and so on (see the supplemental methods for a full description of the tuning functions and likelihood models).
Although the methods we present here are flexible in the choice of tuning functions and noise models, they are generally much more constrained than nonparametric methods. In addition to MCMC for nonparametric tuningcurve estimation (Dimatteo et al. 2001; Kaufman et al. 2005), a large body of work exists on reversecorrelation approaches. These approaches aim to directly map stimulus properties to neural responses (Dayan and Abbott 2001; Eggermont et al. 1983; Marmarelis and Marmarelis 1978; Ringach et al. 1997b; Simoncelli et al. 2004). A wide range of toolboxes support the analysis of neural data using such nonsampling based methods (for useful software packages, see: http://chronux.org/ http://pillowlab.cps.utexas.edu/code.html, http://find.bccn.unifreiburg.de/). More recently, techniques have been developed that constrain reversecorrelation estimates using specific noise models and priors (Paninski 2004; Sahani and Linden 2003; Smith and Lewicki 2006). In many ways, these two strategies, nonparametric estimation of response functions with priors and fully Bayesian parametric estimation of response functions, represent a continuum of modeling frameworks that extends from minimally constrained modeling to very constrained modeling. This continuum allows experimentalists to decide how to model their data depending how much is known about the neural system or experimental manipulation in question. For instance, in a system where the exact parametric form of a tuning curve is unknown, experimentalists may want to use nonparametric or semiparametric models rather than assuming a poorly fitting fully parameterized tuning function.
Inference assumptions and prior knowledge
All statistical inference methods, including nonparametric methods and bootstrapping, require assumptions. For example, to compute bootstrap estimates that were comparable to the results of the Bayesian methods, it was necessary to perform nonlinear leastsquares optimization to estimate the parameter values for each bootstrap sample (Swindale 1998). All optimization methods make use of an error function; the squarederror function that is used by this method is equivalent to an assumption of Gaussian noise in the data. In short, there is no way to estimate parameter values, or their residual uncertainty, or to compare different models with respect to their ability to explain the data without making assumptions. Bayesian methods generally make all such assumptions explicit.
One specific kind of inferential assumption that bears special attention is the prior distribution placed on the tuningcurve parameters. In most cases (including all of those used in the examples), a noninformative prior such as a uniform distribution over an appropriate range is sufficient. In this case, the posterior is equivalent to the likelihood, and one can easily find the maximum likelihood sample or compute a likelihood ratio between two models. However, in some cases, such as when there is very limited data or tuning functions are complicated, it may be necessary to include a more informative prior on one or more of the parameters. In these cases, it is incumbent on the experimenter to justify the use of a prior, such as by appealing to past measurements, literature, or to biophysical properties of the neurons under study. While such prior information may be controversial, we note that Bayesian methods provide the opportunity—though not the requirement—to introduce such information in a coherent and statistically principled way.
Extensions to more complex models
The general approach of using MCMC sampling to draw from the posterior distribution of model parameters can be applied to models that are more complex than those described here. These methods have been used previously to infer the parameter distribution of a model that described the orientation tuning dynamics of neurons in cat primary visual cortex in response to oriented reversecorrelation stimuli (Schummers et al. 2007). Nonparametric approaches to tuningcurve fitting using MCMC and splines have also been developed (Kaufman et al. 2005). These analyses use the same principles as those described here, illustrating that these techniques are applicable to a wide range of neurophysiological data analysis problems. We note, however, that more complex models (i.e., models containing more parameters than those described here) do require more care in the selection of inference methods. In particular, special attention must be paid to the proposal distribution that is used to generate candidate parameter samples; additional detail can be found in Schummers et al. (2007). This study demonstrates that, once an appropriate proposal distribution has been identified, sampling methods scale quite well, and can be successfully applied to models with ≤100 parameters or more.
Many if not most statistical estimation techniques are currently viewed as approximations to optimal Bayesian inference. The approach we used here has the benefit that even for small amounts of data, in the limit of sufficient computer power (large number of samples), it will provably converge to the correct answers. These tools should thus be useful for the general analysis of tuning curves, which is a central statistical objective in neuroscience.
GRANTS
B. Cronin and M. Sur were supported by National Eye Institute Grant EY07023.
ACKNOWLEDGMENTS
The authors thank C. Runyan, J. Schummers, and R. Mao for supplying the data used in the V1 examples and A. Cherian and L. Miller for the data used in the M1 examples.
Footnotes

↵1 The online version of this article contains supplemental data.
 Copyright © 2010 The American Physiological Society