## Abstract

On the basis of accumulating behavioral and neural evidence, it has recently been proposed that the neural circuits of human and animal brains are equipped with several specific properties that ensure that the perceptual decision making implemented by these circuits can be nearly optimal in terms of Bayesian inference. Here, I introduce the basic ideas of this proposal and discuss its implications from the standpoint of biophysical modeling developed in the framework of dynamical systems.

There is an increasing amount of psychological evidence indicating that humans and animals can often make decisions that are nearly optimal in terms of Bayesian inference. Given that animals must have been somehow optimized through Darwinian evolution, it is reasonable to infer that our brains are equipped with the ability to execute the computations necessary for making optimal decisions. The emerging question is how computations such as Bayesian inference can be implemented by biological substrates in the brain's neural circuits. Indeed, this is one of the central issues faced in modern computational and systems neuroscience (Doya et al. 2007). Herein, I introduce a recent computational modeling study (Beck et al. 2008) regarding neural circuits for perceptual decision making (Churchland et al. 2008; Roitman and Shadlen 2002), along with the underlying Bayesian framework (Ma et al. 2006), and discuss its implications compared with another major approach to studying the neural mechanisms of decision making—that is, the biophysical modeling of neural circuits developed in the framework of dynamical systems (Furman and Wang 2008; Wang 2002, 2008).

The model by Beck et al. (2008) is based on the concepts of the probabilistic population code and the neural implementation of Bayesian inference (Deneve et al. 1999; Ma et al. 2006; Pouget et al. 2003), and thus I first briefly introduce these concepts. Let us consider a neuron having a tuning curve *f*(*s*) for a certain stimulus parameter *s*. For example, consider that the neuron has a Gaussian tuning for the direction of the motion of visual stimuli and, as such, the stimulus parameter *s* takes a value −180° ≤ *s* < 180° (Fig. 1*A*, *top*). Now, assume that in a single presentation of a stimulus with *s*_{0} (e.g., visual motion toward *s*_{0} = 45°) for a certain fixed time period (e.g., 100 ms), the neuron generates *r* spikes (*r* is called the “spike count”). The question here concerns whether and how it is possible for those who observe this spike count (i.e., an experimenter recording this neuron, or another neuron that receives inputs from this neuron) to infer (estimate) the parameter value *s* of the presented stimulus (i.e., in our example, *s*_{0} = 45°). Mathematically, the probability of the “cause” *s* given the “result” *r*, called the *posterior probability*, can be calculated according to Bayes' theorem

$$P(s \mid r) = \frac{P(r \mid s)\,P(s)}{P(r)} \propto P(r \mid s)\,P(s) \tag{1}$$

where *P*(*s*) is the probability of the occurrence of *s* regardless of the value of *r* (i.e., before the observation of *r*)—known as the *prior probability*—and the proportionality (∝) holds with respect to different values of *s* because the denominator is the same for any *s*. This posterior probability *P*(*s*|*r*) represents the probability that the true (actually occurred) value of the stimulus parameter was *s*, over different values of *s*, given the observation of the spike count *r*; obtaining this *P*(*s*|*r*) is referred to as *Bayesian inference* or *Bayesian estimation*.
If the prior probability distribution *P*(*s*) is a uniform distribution (called a *flat prior*), meaning that there is no prior information about the value of *s* before the occurrence (observation) of the neuronal firings, *Eq. 1* becomes

$$P(s \mid r) \propto P(r \mid s) \tag{2}$$

where the proportionality again holds over different values of *s* and the right-hand side *P*(*r*|*s*) is called the *likelihood* [or the *likelihood function*, as a function of *s*; intuitively, *P*(*r*|*s*) represents how likely the result *r* is to occur given a cause *s*]. As seen in *Eq. 2*, when the prior is flat, the posterior probability is proportional to the likelihood (Fig. 1*A* illustrates this flat-prior case). How does Bayesian inference actually operate? Although it would be straightforward for an experimenter who is recording this neuron to calculate *P*(*s*|*r*) or *P*(*r*|*s*), whether and how these probabilities can actually be represented by neurons in the brain and be read out by other neurons so as to drive or bias an animal's decision behavior is an entirely separate issue. This issue has been extensively studied (Deneve et al. 1999; Ma et al. 2006; Pouget et al. 2003), but for our purposes, let us simply assume, on the basis of these studies, that these probabilities can be represented by and processed through the activity of neural populations.
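To make *Eqs. 1* and *2* concrete, here is a minimal sketch of computing the posterior over motion direction from a single observed spike count, assuming Poisson spike statistics (as the article later assumes for cortical neurons) and a flat prior. The Gaussian tuning parameters (peak at 45°, width 30°, gain 10) and the count *r* = 10 are illustrative choices, not values from the studies discussed.

```python
import numpy as np

def gaussian_tuning(s, pref=45.0, width=30.0, gain=10.0):
    """Mean spike count f(s); the tuning parameters are illustrative."""
    return gain * np.exp(-0.5 * ((s - pref) / width) ** 2)

def posterior_flat_prior(r, s_grid, tuning):
    """P(s|r) for a Poisson spike count r under a flat prior: with the
    prior uniform, the posterior is proportional to the likelihood P(r|s)."""
    f = tuning(s_grid)
    # Poisson log-likelihood r*log f(s) - f(s); the log(r!) term does not
    # depend on s and drops out after normalization
    log_like = r * np.log(f) - f
    post = np.exp(log_like - log_like.max())
    return post / post.sum()

s_grid = np.linspace(-180.0, 180.0, 721)   # candidate directions, 0.5 deg steps
post = posterior_flat_prior(r=10, s_grid=s_grid, tuning=gaussian_tuning)
print("posterior peak at s =", s_grid[np.argmax(post)])
```

Because the count equals the peak mean rate here, the posterior peaks at the preferred direction; an experimenter and a downstream neuron face the same estimation problem over this distribution.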

Let us consider the situation where there are two neurons—neuron A and neuron B—with the tuning curves *f*_{A}(*s*) and *f*_{B}(*s*) for the stimulus with parameter *s*, respectively, and that they generate spikes independently of each other. Then, denoting the spike counts of neurons A and B by *r*_{1} and *r*_{2}, the posterior probability can be calculated according to Bayes' theorem, in a similar manner to that discussed earlier

$$P(s \mid r_1, r_2) = \frac{P(r_1, r_2 \mid s)\,P(s)}{P(r_1, r_2)} \propto P(r_1, r_2 \mid s)\,P(s) \tag{3}$$

Let us assume a flat prior [i.e., assume that *P*(*s*) is a uniform distribution]. Then, *Eq. 3* becomes

$$P(s \mid r_1, r_2) \propto P(r_1, r_2 \mid s) \tag{4}$$

Here, given the assumed independence of the firing of neurons A and B, the likelihood (the right-hand side of *Eq. 4*) can be rewritten as

$$P(r_1, r_2 \mid s) = P(r_1 \mid s)\,P(r_2 \mid s) \tag{5}$$

and therefore *Eq. 4* becomes

$$P(s \mid r_1, r_2) \propto P(r_1 \mid s)\,P(r_2 \mid s) \tag{6}$$

Now let us assume that both neurons A and B generate spikes according to Poisson statistics; specifically, the spike counts of these neurons obey independent Poisson distributions (Fig. 1*A*, *top*, orange histograms)

$$P(r_1 \mid s) = e^{-f_A(s)}\,\frac{f_A(s)^{r_1}}{r_1!} \tag{7}$$

and

$$P(r_2 \mid s) = e^{-f_B(s)}\,\frac{f_B(s)^{r_2}}{r_2!} \tag{8}$$

The biological plausibility of this assumption lies in experimental findings that the spike count of cortical neurons recorded in vivo can be approximated by a Poisson distribution (see the references in Ma et al. 2006; in fact, that article demonstrates that a wider class of distributions, termed Poisson-like distributions, can be dealt with in a similar way). The Poisson distribution has the prominent characteristic that the mean, denoted by *E*[·], is equal to the variance *V*[·], which is the square of the SD σ[·]

$$E[r_1] = V[r_1] = \sigma[r_1]^2 = f_A(s), \qquad E[r_2] = V[r_2] = \sigma[r_2]^2 = f_B(s) \tag{9}$$

in *Eqs. 7* and *8*. Because of this characteristic, there is an interesting relationship that is theoretically useful and might actually be used in the brain (Ma et al. 2006; Seung and Sompolinsky 1993). To explain this relationship, let us assume that the tuning curves of neurons A and B have the same shape [*f*(*s*)] but different gains (*g*_{A} and *g*_{B}, with *g*_{A} < *g*_{B}; the *top* and *bottom* portions of Fig. 1*A* illustrate the cases with *g*_{A} = 1 and *g*_{B} = 2, respectively)

$$f_A(s) = g_A f(s), \qquad f_B(s) = g_B f(s) \tag{10}$$

Then, because the SD is equal to the square root of the mean in the Poisson distribution (as is clear from *Eq. 9*), the degree of variability of the spike count relative to the mean (i.e., the SD divided by the mean) decreases as the mean increases; thus when a stimulus *s* is presented, the spike count of neuron B is less variable relative to its mean than the spike count of neuron A

$$\frac{\sigma[r_2]}{E[r_2]} = \frac{1}{\sqrt{g_B f(s)}} < \frac{1}{\sqrt{g_A f(s)}} = \frac{\sigma[r_1]}{E[r_1]} \tag{11}$$

In other words, the relative variability of the spike count of neurons sharing the same tuning profile (tuning curve shape) decreases as the gain of the tuning curve increases, in proportion to the square root of the gain. It therefore holds, in turn, that given the observed spike count of neuron B, the observer (experimenter or other neuron) can estimate the stimulus *s* more accurately than when a spike count of neuron A is observed, by a factor of $\sqrt{g_B/g_A}$ (i.e., the square root of the gain of the tuning curve represents the accuracy of the stimulus estimation); this is reflected in the posterior probability being sharper in the case of neuron B (Fig. 1*A*, *bottom*, blue line) than in the case of neuron A (Fig. 1*A*, *top*, blue line) (see also Fig. 1 of Ma et al. 2006). As mentioned earlier, it has been suggested that this relationship is used in the brain, specifically for Bayesian optimal cue integration by a population of neurons (Ma et al. 2006).
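The relative-variability relationship in *Eqs. 9*–*11* is easy to check numerically. The sketch below draws Poisson spike counts at two gains and compares SD/mean with the predicted 1/√(*g f*(*s*)); the value *f*(*s*) = 5 and the gains 1 and 4 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
f_s = 5.0           # shared tuning profile f(s) evaluated at the presented s
n_trials = 200_000  # number of simulated stimulus presentations

ratios = {}
for g in (1.0, 4.0):  # tuning-curve gains (a low-gain and a high-gain neuron)
    counts = rng.poisson(g * f_s, size=n_trials)  # Poisson spike counts
    ratios[g] = counts.std() / counts.mean()      # relative variability
    # Eq. 11: SD/mean = 1/sqrt(g f(s)), so quadrupling the gain halves it
    print(f"g = {g}: SD/mean = {ratios[g]:.4f}, "
          f"predicted {1 / np.sqrt(g * f_s):.4f}")
```

Quadrupling the gain halves the relative variability, which is the square-root improvement in estimation accuracy described in the text.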

Now let us revert to the problem of obtaining the posterior probability of the stimulus parameter from the spike counts of neurons A and B (*Eq. 6*). Substituting *Eqs. 7*, *8*, and *10* into *Eq. 6* results in

$$P(s \mid r_1, r_2) \propto e^{-g_A f(s)}\,\frac{\left(g_A f(s)\right)^{r_1}}{r_1!}\; e^{-g_B f(s)}\,\frac{\left(g_B f(s)\right)^{r_2}}{r_2!} \propto e^{-(g_A + g_B) f(s)}\, f(s)^{r_1 + r_2} \tag{12}$$

Notably, the multiplication (*f*(*s*)^{*r*_{1}} *f*(*s*)^{*r*_{2}}), which originated from the assumed independence of the firings of the two neurons (*Eq. 5*), has been rewritten as a summation of exponents (*f*(*s*)^{*r*_{1}+*r*_{2}}), by virtue of the particular form of the assumed Poisson distribution (*Eqs. 7* and *8*); consequently, the last formula (rightmost side of *Eq. 12*) looks somewhat similar to the defining equation of the Poisson distribution (*Eqs. 7* and *8*) with *r*_{1} or *r*_{2} substituted by *r*_{1} + *r*_{2}. To obtain a clearer idea of what this similarity can mean, if we denote *h*(*s*) ≡ (*g*_{A} + *g*_{B})*f*(*s*), *Eq. 12* becomes

$$P(s \mid r_1, r_2) \propto e^{-h(s)}\,\frac{h(s)^{r_1 + r_2}}{(r_1 + r_2)!} \tag{13}$$

Here, the rightmost term of *Eq. 13* takes exactly the same form as the definition of the Poisson distribution (*Eqs. 7* and *8*). Thus if there is another neuron that has the tuning curve *h*(*s*) ≡ (*g*_{A} + *g*_{B})*f*(*s*) and generates spikes again according to Poisson statistics, then the last formula of *Eq. 13* can be interpreted as the probability that such a neuron (for convenience, let us refer to it as neuron V) generates *r*_{1} + *r*_{2} spikes on the presentation of a stimulus with the parameter value *s*, i.e.

$$P(s \mid r_1, r_2) \propto P(r_V = r_1 + r_2 \mid s) \tag{14}$$

where *r*_{V} represents the spike count of neuron V. The problem can then be stated as such: Can such a neuron as neuron V exist—i.e., can its defining response property described earlier be shaped, in relation to neurons A and B—and, if so, in what manner? Further, what does the relationship in *Eq. 14* mean?
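The algebra of *Eqs. 12*–*14* can be verified numerically: the posterior computed from the pair of counts (*r*_{1}, *r*_{2}) coincides, point by point, with the posterior read off a single Poisson neuron with tuning *h*(*s*) = (*g*_{A} + *g*_{B})*f*(*s*) that fires *r*_{1} + *r*_{2} spikes. In the sketch below, the Gaussian tuning shape, the gains, and the spike counts are illustrative assumptions.

```python
import numpy as np

def f(s):
    """Shared tuning profile (assumed Gaussian, peak 45 deg, width 30 deg)."""
    return np.exp(-0.5 * ((s - 45.0) / 30.0) ** 2)

def normalize_log(lp):
    """Turn a log-posterior (up to a constant) into a normalized posterior."""
    p = np.exp(lp - lp.max())
    return p / p.sum()

s = np.linspace(-180.0, 180.0, 721)
gA, gB = 4.0, 8.0   # gains of neurons A and B (assumed values)
r1, r2 = 3, 7       # observed spike counts (assumed values)

# Posterior from both neurons (Eq. 6 with Poisson likelihoods, flat prior)
logp_pair = (r1 * np.log(gA * f(s)) - gA * f(s)
             + r2 * np.log(gB * f(s)) - gB * f(s))
post_pair = normalize_log(logp_pair)

# Posterior from the single "virtual" neuron V with tuning h(s) = (gA+gB) f(s)
# observing r1 + r2 spikes (Eqs. 13-14)
h = (gA + gB) * f(s)
post_v = normalize_log((r1 + r2) * np.log(h) - h)

print("max abs difference:", np.abs(post_pair - post_v).max())
```

The difference is at floating-point level: the gain factors that distinguish the two expressions are constant in *s* and vanish under normalization, exactly as in the derivation.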

Now let us assume that there exists a neuron (neuron T) that receives inputs from both neurons A and B and, in response, generates spikes in a specific manner such that neuron T always generates *r*_{1} + *r*_{2} spikes during a time period in which neurons A and B generate *r*_{1} and *r*_{2} spikes, respectively (for arbitrary *r*_{1} and *r*_{2}). This could be expressed as neuron T “linearly integrating” the inputs from neurons A and B. Then, notably, since neurons A and B are assumed to generate spikes according to Poisson statistics (*Eqs. 7* and *8*), the spike generation of neuron T will also automatically obey Poisson statistics, because of a prominent characteristic of the Poisson distribution: the sum of independent Poisson variables is also a Poisson variable. Moreover, since neurons A and B are assumed to have the tuning curves *g*_{A}*f*(*s*) and *g*_{B}*f*(*s*) (*Eq. 10*), neuron T, which “linearly integrates” neurons A and B, should have the tuning curve *g*_{A}*f*(*s*) + *g*_{B}*f*(*s*) = *h*(*s*). Therefore this neuron T exactly satisfies the requirements for neuron V considered earlier, and thus from *Eq. 14* (and *Eq. 6*) we can obtain the following relationship

$$P(s \mid r_1, r_2) \propto P(r_T \mid s) \tag{15}$$

where *r*_{T} denotes the spike count of neuron T. This equation indicates that Bayesian estimation is now possible from the spike count of a single neuron (neuron T) alone, instead of from the spike counts of two neurons (neurons A and B). Thus, in summary, sensory evidence encoded by two neurons, which have the same tuning profile and independently generate spikes according to Poisson statistics, can be combined optimally, in terms of Bayesian inference with a flat prior, by a single neuron that linearly integrates the spike counts of these two neurons.

What is this good for? To address this question, consider, as an example, that the Poissonian neurons A and B encode the location of the same object through different sensory modalities, such as the visual and auditory pathways, respectively, and that the “multimodal” neuron T receives and linearly integrates inputs from these two sensory neurons. Then, *Eq. 15* indicates that reading out the activity of neuron T alone will ensure the same performance in estimating or decoding the object location as that obtained by reading out the activities of both neurons A and B. Thus Ma et al. (2006) suggested that this kind of organization, if it exists in the brain, can realize optimal sensory evidence accumulation (although we have so far made several assumptions, Ma and colleagues demonstrated that the assumptions can be relaxed at least to a certain extent; for example, aside from the Poisson distribution, a wider class of distributions, named Poisson-like distributions, works similarly, as mentioned earlier). In turn, it is thus possible that the brain has actually acquired such an organization through evolution.

On the basis of this hypothesis, Beck et al. (2008) proposed a model of perceptual discrimination of the motion direction of random-dot stimuli (Churchland et al. 2008; Roitman and Shadlen 2002; Shadlen and Newsome 2001). The central idea is that instead of assuming the linear integration of the activity of different neurons with the same tuning profile, as considered earlier (Fig. 1*B*, *left*), Beck et al. (2008) assumed the (nearly) linear integration of the same neuron's activity at different epochs (Fig. 1*B*, *right*). Specifically, they considered that whereas a neuron selective for a particular motion direction in the middle temporal (MT) area generates *r*_{1} and *r*_{2} spikes during the first and second time intervals, respectively, another neuron in the lateral intraparietal (LIP) area with a similar direction selectivity nearly linearly integrates these MT spike counts, via a presumed long time constant, so as to generate approximately *r*_{1} and *r*_{1} + *r*_{2} spikes in the first and second time intervals, respectively. They showed that the behavior of the model matched experimental observations (Churchland et al. 2008; Roitman and Shadlen 2002), suggesting that such an organization actually exists in the brain (Beck et al. 2008).
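The same algebra applied across time rather than across neurons (the heart of the idea just described) can be illustrated as follows: the posterior read off the accumulated count after *k* epochs, with effective tuning *k·f*(*s*), sharpens as evidence accumulates. The MT tuning curve, its peak rate, and the number of epochs in this sketch are illustrative assumptions, not parameters of the Beck et al. (2008) model.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(s):
    """Assumed MT direction tuning: Gaussian, peak 10 spikes/epoch at 45 deg."""
    return 10.0 * np.exp(-0.5 * ((s - 45.0) / 30.0) ** 2)

s = np.linspace(-180.0, 180.0, 721)
n_epochs = 5
# MT spike counts r_1..r_n for a stimulus at the preferred direction
counts = rng.poisson(f(45.0), size=n_epochs)

sds = []
for k in range(1, n_epochs + 1):
    R = counts[:k].sum()
    # Posterior from the accumulated count R with effective tuning k*f(s),
    # i.e., an ideal linear integrator of the first k epochs (flat prior)
    lp = R * np.log(k * f(s)) - k * f(s)
    p = np.exp(lp - lp.max())
    p /= p.sum()
    mean = (p * s).sum()
    sds.append(float(np.sqrt((p * (s - mean) ** 2).sum())))
    print(f"after epoch {k}: posterior SD = {sds[-1]:.1f} deg")
```

A perfectly linear integrator realizes this sharpening exactly; a leaky integrator with a finite (e.g., 1-s) time constant only approximates it, which is why the length of the LIP time constant matters below.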

The model by Beck et al. (2008) can be said to sit at an intermediate level between the bottom-up biophysical approach, as represented by the model of Furman and Wang (2008) for the same decision-making processes appearing in the same journal issue, and the top-down computational theory, as represented by the diffusion models (Palmer et al. 2005; Ratcliff and McKoon 2008). Specifically, although Beck and colleagues set several a priori assumptions that are ideal for Bayesian optimal evidence accumulation—in particular, the nearly Poisson-like statistics of the neuronal firings and the long decay time constant of the LIP neurons—they implemented these assumptions in a circuit model together with known anatomical relationships suggested by electrophysiological studies. In doing so, their study has illuminated issues that need to be addressed in future work to further link these two levels. In particular, from the standpoint of biophysical modeling, it is interesting to examine whether, how, or to what extent the specific assumptions made by Beck et al. can be satisfied by combining known physiological features of neurons and synapses. It is conceivable that detailed biophysical modeling will predict that these assumptions cannot be satisfied exactly or, more specifically, that they are accompanied by particular types of deviations; if this is the case, it would then be intriguing to test how such deviations degrade optimality in terms of Bayesian inference and whether the result matches actual behavior. In what follows, I specifically discuss the two important assumptions of the Beck et al. (2008) model, i.e., nearly Poisson-like spike statistics and the long decay time constant of the LIP neurons.

Related to the Poisson-like firings, a considerable number of modeling studies, as well as in vitro experimental works, have attempted to reproduce or explain the observed statistics of in vivo neuronal firings, which can often be approximated by Poisson processes. Several important indications have been obtained, in particular the potential importance of correlated or synchronized presynaptic firings (Harsch and Robinson 2000; Stevens and Zador 1998), balanced excitatory and inhibitory inputs (Amit and Brunel 1997; Shu et al. 2003; van Vreeswijk and Sompolinsky 1996), and postsynaptic *N*-methyl-d-aspartate (NMDA) receptor conductance (Harsch and Robinson 2000) for the high variability of spike timings. It remains elusive, however, exactly how these distinct mechanisms contribute to spike variability, and how precisely the spike variability resembles Poisson-like distributions. In the experiment using the random-dot direction-of-motion discrimination task, it was shown that the variance of the MT neuronal firings was not significantly influenced by the trial-to-trial fluctuations in the visual stimuli actually presented on the display and was thus likely to have a neural origin (Britten et al. 1993). This may suggest that, although it appears stochastic, a substantial part of the variability might actually result from the chaotic dynamics of the neuronal membrane (Aihara et al. 1984) and/or the recurrent neural activity (van Vreeswijk and Sompolinsky 1996). If this were the case, then it might be necessary to explicitly model such mechanisms at the level of neurons and synapses to confirm the validity of assuming Poisson-like spike statistics independent of the dynamics of the neurons constituting the circuit.
In any case, from the viewpoint of the computational theory, it would be intriguing, as pointed out by Beck and colleagues (2008) themselves, to quantitatively examine how the behavior deviates from optimal Bayesian inference when the spike statistics varies from the Poisson-like.

Regarding the decay time constant of the LIP neurons, from the computational viewpoint its length is critical, because it determines how precisely the linear integration of sensory evidence at different time epochs—a factor that plays a key role in the hypothesis of Bayesian optimal cue integration (Ma et al. 2006), as we have seen above (Fig. 1*B*)—can be implemented; an infinite time constant theoretically realizes perfectly linear integration, and thus the 1-s time constant assumed by Beck et al. (2008) implements nearly, but not exactly, linear integration. It might be interesting to systematically vary the time constant in the model and determine which value gives the best fit to actual animal behavior. From the perspective of biophysical modeling, on the other hand, it is of great interest to find out how such a long time constant can be implemented. In fact, the 1-s decay time constant of the LIP neurons assumed by Beck et al. is much longer than the time constant of most neuronal and synaptic events, i.e., the membrane time constant of neurons (up to ∼20 ms) as well as the decay time constants of fast glutamatergic [α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptor channel-mediated] and GABAergic [γ-aminobutyric acid type A (GABA_{A}) receptor channel-mediated] synaptic conductances (several ms). A mechanism that seems likely to be able to implement such a long time constant is a combination of the NMDA receptor-mediated synaptic current, whose time constant is around 100 ms, and the recurrent excitatory connections between the LIP neurons (Wang 2002). Approximating Bayesian optimal evidence integration by using this mechanism, however, would not be very straightforward, for two reasons. First, recurrent excitation between neurons having different selectivity (even though they are not very distant) can potentially degrade optimality. Although the model by Beck et al. (2008) already includes recurrent excitation between the LIP neurons, a substantial amount of additional recurrent excitation would be required to explicitly model the long decay time constant. Second, approximating linear integration itself is not easy. In fact, it has been suggested (Brody et al. 2003; Goldman et al. 2003; Koulakov et al. 2002; Seung 1996) that to approximate linear integration through recurrent excitation, its strength should be finely tuned so that it precisely balances the activity decay due to the membrane leakage (in simple terms, in the equation

$$\frac{dx}{dt} = -x + ax + I_{\mathrm{external}}$$

*a* should be finely tuned to around 1 so that *x* neither exponentially grows nor decays but temporally integrates the input, i.e., *x* = ∫*I*_{external} d*t*). The problem is whether and how such fine tuning can be achieved in the brain. Several possible mechanisms, including the bistability of individual neuronal activity (Goldman et al. 2003; Koulakov et al. 2002) and homeostatic plasticity (proposed not for linear integration but for another, mathematically related, problem) (Renart et al. 2003), have been put forward, but they still need to be experimentally tested. Alternatively, it is also possible that the long time constant is realized by mechanisms other than finely tuned recurrent excitation. Recently, it was proposed (Goldman 2009) that an apparently recurrent neuronal network can nevertheless effectively operate as a feedforward network, in which transient activation propagates through an embedded feedforward cascade of activity patterns for a duration proportional to the network size (which can be long if the network is large), so that the total activity can approximate the linear summation of the inputs applied to the network, provided the connections between neurons are appropriately set. This mechanism was shown (Goldman 2009) to be robust against mistuning of the connection strengths, compared with the mechanism based on recurrent excitation, and also to reproduce well the time-varying sustained activity that has been observed in many experiments, although whether and how the specific connectivity pattern required for the network to functionally operate as a feedforward network can be formed via existing plasticity mechanisms remains for future studies. Biophysical modeling studies (Furman and Wang 2008; Lo and Wang 2006; Wang 2002, 2008; Wong and Wang 2006) have suggested that recurrent excitation is crucial for the maintenance of neural activity corresponding to a categorical decision after the disappearance of the sensory inputs (Shadlen and Newsome 2001). Recurrent excitation also enables these models (Wang 2008; Wong et al. 2007) to adequately explain the longer reaction times for error trials than for correct trials (Roitman and Shadlen 2002), as well as the violation of time-shift invariance (TSI) such that later pulses (brief sensory stimulations) have weaker effects on the behavior (Huk and Shadlen 2005). It remains to be seen whether these features can also be explained to the same extent by other biophysical mechanisms that do not rely on recurrent excitation-mediated attractor dynamics. Future studies are expected to address how the recurrent activity dynamics of the local circuit is influenced by inputs from other circuits (Edin et al. 2009; Furman and Wang 2008; Ganguli et al. 2008; Wang 2008), as well as by other factors such as neuromodulation or short-term synaptic plasticity.
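The fine-tuning problem in the equation d*x*/d*t* = −*x* + *ax* + *I*_{external} can be illustrated with a few lines of simulation. In this sketch, time is measured in units of the effective time constant, and the input profile, duration, and gain values are arbitrary illustrative choices: only *a* = 1 yields the running integral of the input, while a smaller or larger *a* leaks or amplifies it.

```python
import numpy as np

dt, T = 1e-3, 5.0                 # Euler step and duration (time-constant units)
t = np.arange(0.0, T, dt)
I = np.where(t < 2.5, 1.0, 0.0)   # external input: on for the first half

results = {}
for a in (0.8, 1.0, 1.2):         # recurrent gain around the critical value 1
    x = 0.0
    for I_t in I:                 # forward-Euler step of dx/dt = -x + a*x + I
        x += dt * (-x + a * x + I_t)
    results[a] = x
    # perfect integration would give x(T) = integral of I dt = 2.5
    print(f"a = {a}: x(T) = {x:.3f}")
```

With *a* = 0.8 the activity decays back toward baseline after the input ends, and with *a* = 1.2 it grows exponentially; only the finely tuned *a* = 1 holds the integrated value, which is the tuning problem the proposed bistability, homeostatic, and effectively feedforward mechanisms aim to solve or sidestep.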

As introduced earlier, the study of Beck et al. (2008) has suggested that neurons and the neural circuit are equipped with several specific properties that are (nearly) ideal for Bayesian optimal decision making. In a situation where animals have to repeat similar types of decision making many times, optimality can be defined in terms of maximization of the overall outcome (reward), which would depend on the relative value of speed and accuracy under a given condition. A trade-off between speed and accuracy has been studied in many psychological experiments and it has been shown that changes in one of the two main parameters of the diffusion model (Ratcliff and McKoon 2008), that is, changes in the bound (threshold) can well reproduce experimentally observed changes in the reaction time and the error rate (Palmer et al. 2005). In the framework of biophysical modeling, Lo and Wang (2006) have shown that changes in the level of activation of the striatum neurons by the cortical (LIP) neurons cause changes in the strength of the inhibitory gating of the downstream motor circuitry and can thus effectively change the level of the LIP activity necessary for initiating motor responses, thereby explaining the behavioral variation along with the speed–accuracy trade-off. A functional MRI study consistent with this prediction was recently reported (Forstmann et al. 2008). Moreover, the recent study by Furman and Wang (2008) (which appeared in the same journal issue as did the study by Beck et al. 2008, as mentioned earlier) has suggested that there would exist additional mechanisms that can modulate the speed–accuracy trade-off. 
Specifically, they have shown that inputs from other brain regions to the LIP, in particular, control signals from the prefrontal cortex can change the speed and the accuracy so as to nonmonotonically change the total outcome over multiple decision trials, and thus the maximization of the total outcome can in turn be achieved by tuning such top-down control signals (Furman and Wang 2008). Beck et al. (2008) have shown that changing the threshold (bound) of the LIP spiking activity can explain the speed–accuracy trade-off in the context of two- and four-choice tasks, whereas the maximization of the total outcome over multiple decision trials has not yet been addressed. It would be interesting to explore whether neurons or neural circuits are equipped with some additional ideally designed properties, of the kind that were suggested by Beck et al. (i.e., the Poisson-like spike statistics and the long decay time constant), for optimal multiple decisions.
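The bound-mediated speed–accuracy trade-off of the diffusion model discussed above can be sketched with a small first-passage simulation. The drift, noise, and bound values here are arbitrary illustrative choices, not parameters from any of the studies cited: raising the bound lengthens the mean reaction time and raises accuracy.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(bound, drift=0.5, sigma=1.0, dt=1e-3, n_trials=2000):
    """Drift-diffusion trials: evidence x accumulates drift*dt plus Gaussian
    noise until it first crosses +bound (correct) or -bound (error)."""
    x = np.zeros(n_trials)
    rt = np.zeros(n_trials)
    active = np.ones(n_trials, dtype=bool)
    hit_upper = np.zeros(n_trials, dtype=bool)
    while active.any():
        k = int(active.sum())
        x[active] += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(k)
        rt[active] += dt
        done = active & (np.abs(x) >= bound)
        hit_upper[done] = x[done] > 0.0
        active &= ~done
    return rt.mean(), hit_upper.mean()

results = {}
for bound in (0.5, 1.0, 2.0):  # decision threshold on the accumulated evidence
    results[bound] = simulate(bound)
    print(f"bound = {bound}: mean RT = {results[bound][0]:.2f} s, "
          f"accuracy = {results[bound][1]:.2f}")
```

Varying a single parameter, the bound, thus trades speed against accuracy, which is the abstract counterpart of changing the LIP firing threshold in the circuit models.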

The Bayesian model (Beck et al. 2008) and the attractor models (Furman and Wang 2008; Lo and Wang 2006; Machens et al. 2005; Miller et al. 2003; Wang 2002, 2008; Wong and Wang 2006) of decision making reflect the statistical (Rieke et al. 1999) and dynamical (Wilson 1999) aspects of neural systems, respectively, and it is to be hoped (Salinas 2008) that both of these approaches will be further elaborated so as to jointly clarify the mechanism of decision making, which must be one and the same regardless of which approach one prefers.

- Copyright © 2009 the American Physiological Society

## REFERENCES

- Aihara et al. 1984.↵
- Amit and Brunel 1997.↵
- Beck et al. 2008.↵
- Britten et al. 1993.↵
- Brody et al. 2003.↵
- Churchland et al. 2008.↵
- Deneve et al. 1999.↵
- Doya et al. 2007.↵
- Edin et al. 2009.↵
- Forstmann et al. 2008.↵
- Furman and Wang 2008.↵
- Ganguli et al. 2008.↵
- Goldman 2009.↵
- Goldman et al. 2003.↵
- Harsch and Robinson 2000.↵
- Huk and Shadlen 2005.↵
- Koulakov et al. 2002.↵
- Lo and Wang 2006.↵
- Ma et al. 2006.↵
- Machens et al. 2005.↵
- Miller et al. 2003.↵
- Palmer et al. 2005.↵
- Pouget et al. 2003.↵
- Ratcliff and McKoon 2008.↵
- Renart et al. 2003.↵
- Rieke et al. 1999.↵
- Roitman and Shadlen 2002.↵
- Salinas 2008.↵
- Seung 1996.↵
- Seung and Sompolinsky 1993.↵
- Shadlen and Newsome 2001.↵
- Shu et al. 2003.↵
- Stevens and Zador 1998.↵
- van Vreeswijk and Sompolinsky 1996.↵
- Wang 2002.↵
- Wang 2008.↵
- Wilson 1999.↵
- Wong et al. 2007.↵
- Wong and Wang 2006.↵