# Efficient Estimation of Detailed Single-Neuron Models

## Abstract

Biophysically accurate multicompartmental models of individual neurons have significantly advanced our understanding of the input–output function of single cells. These models depend on a large number of parameters that are difficult to estimate. In practice, they are often hand-tuned to match measured physiological behaviors, thus raising questions of identifiability and interpretability. We propose a statistical approach to the automatic estimation of various biologically relevant parameters, including *1*) the distribution of channel densities, *2*) the spatiotemporal pattern of synaptic input, and *3*) axial resistances across extended dendrites. Recent experimental advances, notably in voltage-sensitive imaging, motivate us to assume access to: *i*) the spatiotemporal voltage signal in the dendrite and *ii*) an approximate description of the channel kinetics of interest. We show here that, given *i* and *ii*, parameters 1–3 can be inferred simultaneously by nonnegative linear regression; that this optimization problem possesses a unique solution and is guaranteed to converge despite the large number of parameters and their complex nonlinear interaction; and that standard optimization algorithms efficiently reach this optimum with modest computational and data requirements. We demonstrate that the method leads to accurate estimations on a wide variety of challenging model data sets that include up to about 10^{4} parameters (roughly two orders of magnitude more than previously feasible) and describe how the method gives insights into the functional interaction of groups of channels.

## INTRODUCTION

Recent electrophysiological investigations have yielded an ever deeper insight into dendritic computation and dynamics, shedding light on fundamental issues such as the interaction of synaptic inputs in active dendrites (Roth and Häusser 2001), the role of shunting inhibition (Borg-Graham et al. 1998), backpropagating action potentials (Larkum et al. 1999; Roth and Häusser 2001; Stuart and Sakmann 1994), and active dendritic channels (Frick et al. 2001; Magee 1998; Magee and Johnston 1995; Reyes 2001). These experimental results have been supplemented by accurate, detailed biophysical models of single neurons that have allowed us to probe the computational purposes of such features and to address specific questions about the neural input–output (IO) function (Fellous et al. 2003; Koch 1999; Koch and Segev 2000; London and Häusser 2005; Poirazi et al. 2003b; Wolfart et al. 2005). Here, we attempt to capitalize on the advances in voltage-dye imaging (Baker et al. 2005; Djurisic and Zecevic 2005) and propose a simple statistical method for automatically building large, detailed compartmental models of single neurons, circumventing some of the traditional complexities in building these models “by hand.”

Our knowledge about the input to individual cells and about the relevant part of the output of the cell is at present still insufficient to allow direct inference of the neural IO function. However, the knowledge that passive properties of neurons are accurately described by the cable equation (Baldi et al. 1998; Bell and Craciun 2003; Bhalla and Bower 1993; Dayan and Abbott 2001; Jack et al. 1975) represents very rich information that can significantly constrain our search for the true neural IO function. Our approach to estimating neural properties from data will therefore consist in postulating a model class (a large, detailed compartmental model) that approximates the cable equations to arbitrary precision and then seeking to constrain the parameters in the model by data.

Typically, there is a major trade-off between realism and tractability when constructing large compartmental models: the more biophysically accurate and interpretable the model, the harder the computational task of setting the model’s parameters becomes, as the number of (nonlinearly interacting) parameters increases [into the thousands in biophysically accurate compartmental models, although this number is dramatically reduced using experimentally supported heuristics (Poirazi et al. 2003a; Schaefer et al. 2003b)]. The challenging nature of this high-dimensional, simultaneous parameter estimation problem is well known (e.g., Prinz et al. 2003) and to a large extent arises from highly nonlinear objective functions [e.g., the percentage of correctly predicted spike times (Bhalla and Bower 1993; Jolivet et al. 2004; Keren et al. 2005)] and the abundance of nonglobal optima in the large parameter space (Goldman et al. 2001; Vanier and Bower 1999).

Here we present a simple approach to estimating single neuron properties that is both computationally tractable and biophysically detailed. Our goal is to simultaneously infer, for each compartment of a large multicompartmental model, *1*) the concentration of membrane channels; *2*) intercompartmental conductances; *3*) the time-varying synaptic conductances; and *4*) other parameters such as the channel reversal potential, the membrane capacitance, and the noise level. To achieve this demanding goal we must make several assumptions; in particular, we assume *1*) observability of the spatiotemporal voltage signal at several (many) positions on the dendritic tree [e.g., by voltage-sensitive imaging methods (Antic et al. 1999; Baker et al. 2005; Djurisic and Zecevic 2005)] and *2*) a good understanding of the kinetics of the channels for which we seek to determine the densities.

The key insight of the proposed method is the *linear* relationship between dynamic functions of the observed voltage (such as the channel open probabilities) and the transmembrane currents, both of which are readily computed once the transmembrane voltage is known (see also Morse et al. 2001; Wood et al. 2004). The estimation of the parameters of proportionality between these can therefore be recast into a simple nonnegative linear regression problem. The linear relationship is of great value. First, it implies a unique, global optimum. Second, finding this optimum is an extremely well studied problem for which powerful computational tools are readily available (Boyd and Vanderberghe 2004; Press et al. 1992). In this paper we will give examples of the method’s performance on an extensive set of model data and plan to apply this method to in vitro recordings in the future.

Several subtleties, both of a computational and physiological nature, are worth noting. First, because of the large numbers of parameters, the optimization problem, although convex, is nevertheless very high dimensional (we will here infer about 10^{4} parameters simultaneously, which is roughly two orders of magnitude more than previously feasible); fortunately, certain decomposition methods apply that allow us to break the problem into many smaller, tractable subproblems (Platt 1998). In addition, it turns out that the problem of estimating the time-varying synaptic input to a given compartment is underconstrained: because we will be inferring several temporal series of conductance values (one for each type of synapse impinging on any particular compartment), given a voltage trace of the same length, the ratio of data to unknown variables will be <1. Similar issues arise when attempting to determine the relative densities of channels with very similar kinetics and reversal potentials. We discuss regularization strategies that have proven effective for controlling these problems (including methods for providing confidence intervals around our estimates). We will also show how the present method naturally reveals information not only about individual, but also about groups of channels and their *joint* effect on a neuron’s behavior.

The aim of the present extended version is to *1*) report the methods in intuitive detail to allow efficient implementation; *2*) extend the scope and scale of both the methods and the simulations; and *3*) provide an in-depth analysis of their performance. We previously reported the main ideas in abbreviated form (Ahrens et al. 2006).

## METHODS

### Basic setup

Biophysically accurate models of individual neurons are typically formulated as compartmental models—a set of first-order coupled differential equations that form an arbitrarily accurate spatially discrete approximation to the standard cable equations (Dayan and Abbott 2001). Modeling the cell under investigation in this discretized manner, the voltage *V*_{x}(*t*) in compartment *x* can be described by (1) Here the products *a*_{i,x}*J*_{i,x}(*t*) represent the time-varying membrane and axial currents, whereas *I*_{x}(*t*) is an externally injected (electrode) current and *N*_{x,t} is current noise scaled by σ_{x}. Dropping the subscript *x* for notational clarity when possible, the terms *a*_{i}*J*_{i}(*t*) will come to represent three types of currents in each compartment (see also Fig. 1)

Axial currents arising from voltage mismatch in adjacent compartments

Currents arising from synaptic input (ligand-gated channels) at time

*t*′Transmembrane currents arising from active (voltage-dependent) or passive channels

where *E*_{s} and *E*_{c} are the synapse and channel reversal potentials, respectively^{1} ; *g*_{s}(*t*, *t*′) is the conductance *time course* that synapse *s* would have contributed if it had been activated at time *t*′; and similarly *g*_{c}(*t*) is the time course of channel *c*’s conductance. The main insight in this paper is that the total transmembrane current *C*(d*V*/d*t*) is linear in the “current shapes” *J*_{i}(*t*) and that the proportionality constants *a*_{i} can be inferred by linear regression if we have access to the *J*_{i}(*t*). For each of the three current types mentioned above, the proportionality constants represent biophysically highly relevant variables whose estimation has been the object of extensive research. Among others, the parameters we will infer will include

the intercompartmental conductances

*f*_{x,y}the overall strength of the presynaptic input at time

*t*,*w*_{s}(*t*)the channel densities ḡ

_{c}

The second key point in this paper is that knowledge of the voltage trace [*V*(*t*)] and the channel kinetics [*g*_{s}(*t*, *t*′) and *g*_{c}(*t*)] together also grant us access to both the current shapes *J*_{i}(*t*) and the total transmembrane currents *C*(d*V*/d*t*), and thus permit us to accurately estimate the parameters we are interested in.

Clearly, any recording that gives us access to *V*_{x}(*t*) gives us direct access to d*V*_{x}/d*t* by a differentiation. Such scenarios include voltage-dye recordings for large, electrotonically extensive cells and whole cell patch-clamp recordings for electrotonically compact cells. In electrotonically compact cells, voltage-clamp recordings grant us direct access to the transmembrane current, relieving us from the need to differentiate the data *V*(*t*). Let us now show that voltage recordings also give us access to the *J*_{i}(*t*) (assuming knowledge of the kinetics) and allow us to efficiently estimate the parameters *a*_{i}.

### Special case: single-compartment, passive neuron

We illustrate the procedure for a single compartment with only a leak channel^{2} , which is described by (2)

Assuming also that we know *V*(*t*) (and momentarily also *E*_{L} and *C*, although this is readily relaxed; cf. appendix A), we can evaluate both the terms *J*_{L}(*t*) = [*E*_{L} − *V*(*t*)], which here is just the driving force, and the total transmembrane current *C*(d*V*/d*t*). *g*_{L} is now the unknown scaling factor that relates *J*_{L}(*t*) to the total transmembrane current. We seek to set it such that the difference between the total transmembrane current observed and the sum of the currents on the right-hand side (RHS) of *Eq. 2* match as closely as possible where arg min_{z} *h*(*z*) stands for the value of *z* that achieves the minimum of some function *h*(*z*). In the present case, this is unique. We note that this choice of ĝ* _{L}* corresponds to the maximum-likelihood (ML) estimate under Gaussian white noise

*N*(

*t*). For notational clarity, without loss of generality, we are neglecting the capacitance in vectorized formulations (cf. appendix A).

### Definition of current shapes; vector parameterization

More generally, knowledge of kinetics gives us access to *J*(*t*) when we condition on (i.e., assume knowledge of) the voltage trace. We will illustrate this for active conductances, in which the kinetics are voltage dependent, and for synapses, in which the kinetics are voltage independent but where the parameterization is somewhat more complex.

##### VOLTAGE-GATED CONDUCTANCES.

For the active conductances, knowledge of the membrane voltage and channel *c*’s kinetics gives direct access to the current shape *J*_{c}(*t*) of that channel. The shape of a channel’s current contribution, the “current shape,” is given by the product of that channel’s open fraction *g*_{c}(*t*) (which is a function of past voltage and time only) and the driving force *E*_{c} − *V*(*t*) The open fraction *g*_{c}(*t*) is computed straightforwardly once the voltage and the kinetics are known. Assume for example a Hodgkin–Huxley (HH) Na^{+} channel whose open fraction is *g*_{c}(*t*) = *m*(*t*)^{3}*h*(*t*) (Hodgkin and Huxley 1952). Both *m*(*t*) and *h*(*t*) are given by equations of the type (3) where *V*(*t*) enters as a forcing term through *m*_{∞}(*V*), and through τ_{m} (*V*), which together completely define the kinetics of the channel. Given *V*(*t*) and knowledge of *m*_{∞} and τ_{m} (the channel kinetics), we can directly compute *m*(*t*) and *h*(*t*), and thus *g*_{c}(*t*) and therefore the current shape *J*_{c}(*t*) that channel may contribute. [Note that in typical modeling studies *V*(*t*) and the channel variables *m*(*t*) and *h*(*t*) are evolved together as a set of coupled differential equations, but that knowledge of *V*(*t*) here decouples these equations.] The total discretized channel current is given by summing over the *N*_{c} distinct channels present in the membrane (4) and we can again identify each column of the channel shape matrix **J**_{c} with the current shape *J*_{c}(*t*) that a particular channel can contribute. Each component of **ḡ** is the corresponding proportionality constant ḡ_{c}, the membrane density of channel *c*.

Figure 2 illustrates this setting. The *top panel* in gray shows the voltage trace *V*(*t*), the only observed data here. From it we derive, in a deterministic fashion, the current shapes on the *left* and the derivative of the voltage on the *right* (in gray boxes). d*V*/d*t* now has to be matched by a weighted sum of the current shapes. The weights correspond to the parameters being inferred (in dashed black box) and are constrained to be positive. The case for synaptic inputs *w*(*t*) will be elucidated shortly.

Figure 3, on the other hand, illustrates the case when a voltage is imposed by voltage clamp. We then have experimental access to both the voltage and the transmembrane current. The voltage is still used to derive the current shapes deterministically. These are then weighted and summed to reproduce the negative of the current injected through the electrode, rather than the derivative of the voltage.

##### LIGAND-GATED CONDUCTANCES.

For the synaptic conductances, assuming e.g., α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA)–like synaptic kinetics, i.e., instantaneous rise and fast, exponential decay, turns the time-varying synaptic conductance *g*_{s}(*t*) into a convolution of the synaptic input *w*_{s}(*t*) (5) More generally, for synapses with slower rise kinetics [e.g., *N*-methyl-d-aspartate (NMDA)], we would convolve the input signal *w*_{s}(*t*) with an elementary conductance waveform possessing a correspondingly slower rise time, such as an alpha function (cf. voltage-dependent synaptic input: nmda).

By discretizing the time series, the convolution in *Eq. 5* can be rewritten as a multiplication with a convolution matrix which makes explicit the parameterization we use: every point in time *t*′ at which a synapse could be active is ascribed its own, independent parameter *w*_{s}(*t*′). The conductance time course (for all times *t*) corresponding to activating a synapse at time *t*′ is in this case given by *g*_{s}(*t*, *t*′) = Θ(*t* − *t*′)*e*^{−(t−t′)/τs}, where Θ(*t* − *t*′) is the Heaviside function and zeroes out anything before time *t*′. The respective current shape is then given by

Synaptic current shapes for inputs at various times are illustrated in Fig. 2. The total current shape from one synapse is the weighted sum of the current shapes arising from activations of the synapse at different times *t*′ (6) where diag (·) stands for a diagonal matrix with the vector *E*_{s} − **V** on the diagonal and implements an elementwise product of the vectors (*E*_{s} − **V**) and **Kw**_{s}. We can identify the *i*th column of synaptic shape matrix **J**_{syn} with the current shape *J*_{i}(*t*) as corresponding to the *i*th component of the vector **w**_{s}. A synaptic input at time *t*′ and with weight *w*_{s}(*t*′) thus contributes a current of the shape of column *t*′ of **J** to the RHS of *Eq. 1*. To sum up, once again, we may write *I*_{syn} as a weighted sum of known terms, with the weights *w*_{s}(*t*) exactly the parameters we seek to estimate. For each spike time, the current shape is, as before, a function only of the data but not of the parameters we seek to estimate.

At a discretization level that leads to a voltage vector of length *T*, there are *T* parameters to estimate for each type of synaptic kinetics included. Thus if there is one synapse per compartment, there are as many parameters to infer as data points are available. For more synapses per compartment, the data/parameter ratio is <1.

The gray box in Fig. 2 also illustrates how the total synaptic current is made up of the individual current shapes in a single compartment. In the bottom two plots, current shapes for an excitatory and an inhibitory synapse are shown. Curves in shades of gray correspond to different times *t*′ of synaptic activity. Thus the first curve simply looks like an exponential because the driving force at the times of nonzero conductance right after such an early synaptic input is constant. However, for an input spike right before the action potential, the current shape bears the effect of the change in driving force during the action potential. Although here we have considered only voltage-independent synapses, a combination of the channel and synaptic cases can be used for voltage-dependent (e.g., NMDA) synapses (see also voltage-dependent synaptic input: nmda).

### Inference

Given *Eqs. 6* and *4* and a similar equation for the intercompartmental conductances, we can infer *a* = {ḡ_{c}, *w*_{s}, *f*} by linear regression. To see this, we concatenate all the shape matrices and the parameter vectors and write (7) where **V̇** is a vector of length *T* with elements V̇_{t} = [*V*(*t* + d*t*) − *V*(*t*)]/d*t*, **a** is a vector containing all the parameters {ḡ_{c}, *w*_{s}, *f*} we want to infer, and *N*_{t} = ε, where ε is unit variance, independent Gaussian noise. A solution to this linear equation can be written as a constrained optimization (8) where V̇ the inequality constraints stem from the nonnegative nature of the parameters (note importantly that all the parameters we infer are directly biophysically interpretable). The Hessian **H** = **J**^{T}**J**, whereas f = 2**J**^{T}**V̇**.

The optimization in *Eq. 8* is jointly quadratic in all the parameters except σ, to which it is indifferent. There are no nonglobal optima (the Hessian is positive semidefinite) and we can use well-analyzed quadratic programming methods to find the optimum under the nonnegativity constraints, which act as linear constraints on **a** here. Furthermore, performing the quadratic minimization in *Eq. 8* is equivalent to assuming that the noise *N*(*t*) is Gaussian and white and maximizing the likelihood of the data given a setting of the parameters, i.e., **â**_{ML} is the maximum likelihood estimate (MLE) of the parameters under a Gaussian noise model.

Given {ḡ_{c}, *g*_{s}, *f*}, the usual MLE for σ in turn is readily (and analytically) computed: σ̂_{ML} is the root-mean-square error in the predicted current, σ̂_{ML}^{2} = (1/*T*) ∑_{t} [V̇(*t*) ∑_{i} *â*_{i}*J*_{i}(*t*)]^{2}, where the numerator gives the sum-squared error of our estimate of V̇(*t*) and the denominator normalizes by the length *T* of the observed data trace.

### Simulations

All simulations were carried out using MATLAB. Sample code for some of the simulations is available at http://www.gatsby.ucl.ac.uk/∼qhuys/code.html.

## RESULTS: APPLICATIONS TO MODEL DATA

In the following we explore inference of several combinations of these parameters to test the validity, efficiency, and accuracy of the proposed method. Initially, we will infer maximal channel conductances from a single compartment or multiple compartments. Then, we will analyze inference of synaptic inputs in a passive membrane and finally combine inference of synaptic and channel conductances.

### Channel densities ḡ_{c}

For illustrative purposes, let us first analyze the inference of channel densities in a single compartment when we know the exact channel descriptions. We will then relax this assumption to the case in which we do not know the identity of the channels present. Finally, we will infer channel densities along a multicompartmental structure.

##### EXACT KNOWLEDGE OF CHANNEL IDENTITIES AND KINETICS.

Suppose we are given the noiseless voltage trace *V*(*t*) from a single compartment’s response to a current input *I*(*t*) and know both the dynamics of all ion channels present in that compartment and the input *I*(*t*) applied to it. We now seek to infer the channel densities ḡ_{c} of these different channels from thevoltage trace. The equation describing this case is (9) where *E*_{c} is a channel’s (known) reversal potential and *C* = 1 μF/cm^{2}. Given *V*(*t*), we know the open fractions *g*_{c}(*V*, *t*) because these are only a function of *t* and the past voltage history *V*(*t*′) ∀*t*′ ≤ *t*; e.g., *g*_{c}(*V*, *t*) = *m*(*V*, *t*)*h*^{3}(*V*, *t*) with both gates *m* and *h* given by solutions to equations of the form of *Eq. 3*. The only unknowns in *Eq. 9* are the ḡ_{c} and the capacitance *C*. **J** is now a matrix that has entries *J*_{t,c} = *g*_{c}(*V*, *t*)[*V*(*t*) − *E*_{c}] and *J*_{t,I} = *I*(*t*) and **g**_{c} is a vector with entries ḡ_{c}/*C* and 1/*C*. Dropping the subscript *c*, the regression is then formulated as follows (10) In this noiseless case the regression provides highly accurate estimates and correctly recovers the true channel concentrations in the spiking model compartment (*C* = 1 μF/cm^{2}, *g*_{Na} = 120 mS/cm^{2}, *g*_{K} = 36 mS/cm^{2}, and *g*_{leak} = 3 mS/cm^{2}; data not shown).

##### UNCERTAINTY IN CHANNEL IDENTITIES AND KINETICS.

We now relax the requirement of exact knowledge of both channel kinetics and identities because we will not usually have knowledge of the exact kinetics of all channels present in each compartment. Rather than using only the true kinetics of the compartment’s channels as in the preceding section, we fit a more powerful model containing many different channel kinetics (including the true) in the hope that only those channels truly present in the membrane will be assigned nonzero conductances and that the data will constrain even this more powerful model. The true channel kinetics included were as before of Hodgkin–Huxley type. To test the selectivity of the estimation procedure, we fitted additional candidate channels from Poirazi et al. (2003a), Mainen and Sejnowski (1996), and Safronov et al. (2000) to the same data as used in the previous section. Figure 4*A* shows the data and Fig. 4*B* the voltage derivative to be matched by the summed, weighted current shapes. The inferred densities are shown to match the true ones in Fig. 4*C*. The actual nonzero weights by which the current shapes were multiplied were *a*_{input current} = 1/*C* = 1 cm^{2}/μF; *a*_{Na} = *g*_{Na}/*C* = 120 mS/μF; *a*_{K} = *g*_{K}/*C* = 36 mS/μF; and *a*_{leak} = *g*_{leak}/*C* = 3 mS/μF, from which the densities are easily derived. Zero or nonzero densities were inferred for the channels not present during the generation of the voltage trace (Fig. 4).

The solution to our problem formulation is given by the weighted sum of current shapes that most closely reproduces the observed current. Because we assume that channel densities are constant over time, we can expect to collect enough data to constrain even the more powerful model: in the example just mentioned, the same data that constrained the inference with exact knowledge of the channels sufficed to constrain the more powerful model with an extended channel library. However, this is only the case when the channel kinetics lead to sufficiently different current shapes. Had the channels we were trying to distinguish between been much more similar, we would not have expected such a simple answer. In general, the closer the kinetic schemes we try to distinguish, the more data will be needed.

More data, however, will not distinguish between channels whose linear combination jointly well accounts for the data. To see this more clearly, consider an artificial case where we are trying to simultaneously determine the densities of two channels *g*_{1} and *g*_{2}, which happen to have identical kinetics. Clearly, only the sum of the conductances assigned to these two instances is of importance, whereas the difference |ḡ_{1} − ḡ_{2}| is irrelevant. Figure 5 illustrates this scenario. Whenever there is such functional equivalence between kinetic schemes, there is irreducible uncertainty with respect to different combinations of channels and the data cannot discriminate between them. Looking back at the formulation of the problem in *Eq. 8*, this type of uncertainty corresponds to zero (or more generally, near-zero) eigenvalues of the Hessian **H**. If we have two very similar channels, two columns of the matrix **J** will approximately be equal (or proportional) and **H** will be near singular. The smallest eigenvalue of **H** will correspond to the longest axis of the now very elongated quadratic bowl **a**^{T}**Ha**, i.e., to the direction in parameter space along which the data least constrain the parameters.

Because the eigenspectrum of the Hessian **H** will be determined both by the similarity of the kinetic schemes and by the appropriateness of the data in differentiating between them, it will be a good indicator from which to derive error bars as measures of certainty. Nota bene, however, that the eigenvalues will be informative about certainty along eigenvectors of **H**, and that these eigenvectors will not generally line up with the axes along which only a single channel density changes. Eigenvalues will measure how well *combinations* of channels are constrained by the data, rather than individual channel densities. Figure 6 illustrates this point. Changes along certain relative channel densities have a considerable effect on the behavior of the neuron and are well constrained by the data. For example, in Fig. 6 we recover the intuitive result that changing the balance of Na^{+} to K^{+} channels has a substantial effect on the neuron’s behavior, whereas changing the relative concentrations of very similar channels has very little impact on either the objective function **a**^{T}**Ha** or on the behavior of the cell. Thus this method allows us to ascertain the channel combinations that are most and least constrained by the data, and therefore relevant or irrelevant to the measured neural behavior, respectively.

Finally, we seek a measure of confidence in the estimates of densities of individual channels. Assuming as we have so far, that the noise is Gaussian and white, we can use Monte Carlo methods such as importance sampling (Press et al. 1992) (or for low-dimensional problems as in Fig. 4 even straightforward rejection sampling) to obtain estimates of the error bars for individual channels (see appendix B for details).

##### INFERRING CHANNEL CONDUCTANCES IN A MULTICOMPARTMENTAL MODEL.

Next, we extend the method to a multicompartmental model, described by (11) which is the same as before, but now including current flow between connected compartments. The intercompartmental conductances are *f*_{x,y} and we minimize a similar vectorized equation as above (see *Eq. 8*). Based on biophysical constraints, we require *f*_{x,y} = *f*_{y,x} because the resistance to current flowing in one direction should be equal to the resistance to the current flowing in the opposite direction if the compartments are of equal size. Instead of constraining the optimization (which would entail an additional set of linear constraints on **a**), we define a new set of parameters, consisting only of the nonzero *f*_{x,y} for which *x* > *y*, and optimize over them instead.

Because of the small mutual influence between compartments that are spatially well separated, the inference is amenable to a computational decomposition speed-up method described in appendix C, and is thus quite easily extensible to even larger spatial structures, including thousands of compartments.^{3} Thus this method could potentially permit efficient inference of channel distributions across dendritic trees.

To illustrate the performance of the method, we randomly generated cells of 50 or 1,000 compartments, as shown in Fig. 7. A known squared sinusoidal current [*I*(*t*) ∝ sin^{2} (γ*t*)] was injected in the soma and the voltage from all compartments was recorded. When the true transmembrane current is known, as would be the case if each compartment’s voltage had been observed at an arbitrarily high sampling rate, all parameters are accurately recovered with only 10 ms of data, even in a nonspiking model (Fig. 8 shows this for the 1,000-compartment neuron in Fig. 7). Figure 8*E* shows the entire trace from 100 compartments.

##### SENSITIVITY TO CURRENT NOISE.

Figure 8 lays out the performance of the inference in the presence of negligible noise. However, here we are mainly interested in experimental situations in which only the noisy voltage is recorded, and the transmembrane current has to be recovered from the voltage trace. Taking the derivative of a noisy quantity increases the noise and may hurt the performance of the inference. Figures 9 and 10 show aggregate statistics for the effect of noise and temporal discretization, respectively. The data are drawn from 11 × 200 simulations of randomly generated 50-compartment, electrotonically extensive neurons (*L* = 16 μm) with 2,000 time points each (including spikes) for each compartment (e.g., 100 ms at 20 kHz). The current was inferred from the voltage in the simplest possible manner by taking the finite difference between adjacent data points *Î*_{total}(*t* + 0.5Δ) = [*V*(*t* + Δ) − *V*(*t*)]/2Δ [more sophisticated methods, such as derivative Gaussian processes (Solak et al. 2003) or intermittent Kalman filters (Doucet et al. 2001; Voss et al. 2004; Huys and Paninski, unpublished observations) are available; also see discussion].

Figure 9 shows that the inference of channel densities is less sensitive to noise than is inference of the intercompartmental conductances. Indeed, for very extended cells (*f*_{xy} < 5 mS/cm^{2} or *L* > 150 μm; data not shown), two sets of parameters emerge as the noise variance σ^{2} is increased. For one cluster, inference deteriorates. These are compartments in which there was no adequate voltage deflection to inform parameter choice at that particular level of noise. The currents in those compartments are well accounted for by noise and there is thus a trade-off between the noise σ^{2} and the required electrotonical compactness of the neuron.^{4}

When decreasing the temporal resolution of the recording, a negative bias is mostly observed in the active conductances. This was to be expected because inferring the current by a finite difference results in a systematic underestimation of the peak current during the action potential, this underestimate being directly dependent on Δ*t* and vanishing as Δ*t* → 0.

##### SPATIAL SAMPLING—THE LINEAR DENDRITE.

A likely current experimental situation is a recording from a contiguous subsection of the entire dendrite, for example along just one branch. Although in that situation we cannot estimate the parameters in the outermost compartments (because current from their unobserved neighbor compartments cannot be accounted for), estimation for the parameters of all the other compartments is unchanged if we treat these compartments as current sources.

A linear piece of dendrite also provides a good test of the robustness of our method to spatial subsampling: In the previous section we generated data from a multicompartmental model and used the same compartmentalization to infer the parameters. In general, however, the compartmentalization will be given by the spatial sampling locations without any knowledge of an accurate underlying compartmental model. Once there is subsampling, there is no longer a “true” set of parameters to which we can compare inference. A compartmental model with compartments at each sampling location will be just a spatial approximation to the true cell, and the parameters inferred are the best parameters for the particular compartmental model assumed by the discretization.

To demonstrate the robustness properties of our method to spatial subsampling, we generated a linear dendrite of 200 compartments and subsampled by factors *k* = {2, 3, 4, 5}, i.e., generated the data from the full dendrite but during parameter inference assumed access only to every *k*th compartment. The results are displayed in Fig. 11. The distribution of channel densities (Fig. 11, *A*–*C*) is centered around the true value in the subsampled case. The variance of the distribution shrinks rapidly as the cells become more electrotonically compact. For realistic initial compartmental lengths of *L* = 7 μm (corresponding to *f*_{xy} = 2,000 mS/cm^{2}) the error bars nearly vanish. The procedure is thus seen to be very robust to subsampling. Note that the inferred intercompartmental conductance decreases with increasing subsampling (Fig. 11*D*); this effect is expected, and exactly predicted by linear cable theory for which *f*_{xy} = *a*/(2*r*_{L}*L*^{2}), where *r*_{L} is the axial resistivity (in kΩ · mm^{2}) and *a* is the dendrite’s radius (in μm) (Dayan and Abbott 2001). The solid line shows a fit of this relationship to the data.

### Synaptic inputs w_{s}(t) to a passive membrane

Finally, we approach inference of synaptic conductance, i.e., presynaptic input. For clarity, let us first analyze a passive membrane patch that receives voltage-independent synaptic inputs (e.g., AMPA and GABA_{A}) (12) (13) where *s* indexes the *S* synapse types impinging on this membrane patch, *g*_{s}(*t*) is the synaptic conductance time course, *w*_{s}(*t*) is the strength of the synaptic input at time *t*, and τ_{s} is the synaptic time constant. The response of a passive membrane to one such synaptic input is shown in Fig. 12. Assuming knowledge of the synaptic time constants, our aim is to infer the *w*_{s}(*t*), the temporal pattern of presynaptic input. [Note that the synaptic input strength *w*_{s}(*t*) is constrained to be positive only, not to be binary.] We represent all the time series [*V*(*t*), d*V*/d*t*] as vectors **V**, **V̇**, and so forth and write the maximum likelihood (ML) **w** as (14) where all variables indexed by *s* have been concatenated, i.e., **w** is a concatenation of all discretized *w*_{s}(*t*) and for a recording of length *T* has length *TS*. If **K** is the convolution matrix corresponding to *Eq. 13*, i.e., **g** = **Kw**, then **J**_{syn} = diag (*E*_{s} − **V**)**K** as in *Eq. 6*. As before, the positive nature of the synaptic entries turns this minimization into a standard nonnegative quadratic minimization problem. However, the complexity of solving quadratic programs typically scales as 𝒪(*N*^{3}), where *N* is the number of parameters, which in the synaptic case is *T* · *S*. Except for small problems it is not feasible to solve this directly, but because the minimization is jointly convex in all the parameters, a number of decomposition techniques apply, e.g., we can simply iteratively optimize over a subset of parameters. appendix C (synaptic inputs only) details the procedure, which is quite efficient for this simple passive membrane. Figure 12*D* shows that this simple approach gives rather good results, but points to one issue that arises directly from the large number of free parameters: the inferred presynaptic activity *w*_{s}(*t*) is not sparse. That is, our synapse is kept active all the time (albeit at a low level) to explain the small random perturbations caused by the noise term in *Eq. 12*—a classical example of overfitting. Overfitting was to be expected as we have, for a single synapse on one compartment, as many parameters as data points (*N* = *TS* = *T*).

##### REGULARIZATION.

Figure 13*A* shows the performance of the ML estimator when there are three synapses, two excitatory (glutamatergic) and one inhibitory (GABAergic). We see that the inferred presynaptic activity is even less sparse than in the case with a single synapse: constant small synaptic inputs are used to explain the noise. Two factors explain why this case is worse than the case with one synapse: First, we now use each data (time) point to infer *several* (three) parameters. Second, because we have both excitatory and inhibitory synapses, an overshoot resulting from overactivity in a synapse can be explained away with too much activity in a synapse of the opposite sign with similar dynamics, which would not be possible if all synapses were of the same sign.

Because the data do not provide enough information to constrain the parameters, we have to use regularization techniques. We take a Bayesian approach: given that we chose to infer *w*_{s}(*t*)—a real nonnegative number—we can readily impose a prior distribution on *w*_{s}(*t*). Because we expect that for a single synapse, *w*_{s}(*t*) is really a sparse set of discrete events in time, an appropriate prior is one that *1*) does not enforce smoothness across time and *2*) penalizes nonzero values of *w*_{s}(*t*) and thereby enforces sparseness. A simple shrinkage prior that factorizes over time and penalizes large values of *w*_{s}(*t*) satisfies both these requirements. In vectorized form, we write the maximum a posteriori (MAP) estimate for *w*_{s}(*t*) as (15) where **n** is a vector of ones and λ determines the relative importance we give to matching the data (the first term in the equation above) and to sparsification (the second term). The larger the value of λ, the sparser the value of the estimated ŵ_{s}(*t*). In probabilistic terms, λ parameterizes an independent exponential prior distribution over *w*_{s}(*t*), that is Figures 12*E* and 13*B* show the effect of including the prior. Most of the small, constant synaptic activity is effectively suppressed without greatly affecting the larger spikes. The effect is pronounced in both the single- and three-synapse cases, but of more importance in the latter, for the reasons discussed above.

The value of λ also needs to be inferred. Its ML value is the a priori unknown presynaptic firing rate. There are several ways to proceed: The one adopted here is based on an empirical prior, choosing λ that maximizes the posterior probability, which corresponds to maximizing the posterior with respect to λ after discretization. Alternatively, λ̂ can be chosen to maximize the marginal likelihood (16) an integral that can be evaluated by Markov chain Monte Carlo methods (MacKay 2003).

During relatively input-free times regularization as introduced here will suppress the inferred synaptic activity used (mistakenly) to explain the voltage evolution noise. However, during periods of high-input firing rates, this regularization might also suppress legitimate spikes. Two more flexible schemes could be applied if we possessed information about some variable that explained much of the fluctuations in the input to our cell. First, it is reasonable on morphological grounds to allow the regularization parameter to vary as a function of location on the dendritic tree; for example, close to the soma of a pyramidal cell, we might decrease the regularization parameter λ_{inh} corresponding to inhibitory synapses, but increase the regularization parameter corresponding to excitatory synapses, thus reflecting our prior knowledge of the distribution of synaptic input as a function of distance from the soma. Second, it may be sensible to let λ vary as a function of time λ(*t*), where λ(*t*) may be modeled as some function of the stimulus condition at time *t*. A simplified version of this idea is explored further in stimulus-dependent synaptic input to an active membrane. To estimate the error bars on the synaptic input parameters, it will be necessary to adopt methods from the sequential sampling literature (Doucet et al. 2001; Huys and Paninski, unpublished observations) because the inference space here is too large for the simple importance sampler in appendix B.

##### TIME-VARYING SYNAPTIC STRENGTH.

Synaptic strength changes over time, at both fast (because of synaptic dynamics such as facilitation and depression) and slow (synaptic plasticity such as long-term potentiation and depression) timescales. Treating the weight of every spike by any individual synapse as an independent variable has the strong advantage that it not only allows inference of precise input spike times, but also naturally allows inferring the time-varying strength of the synaptic input. As an example, consider Fig. 14. Here three potentiating synapses^{5} impinged on one passive compartment, two of which were excitatory (Fig. 14*A*) and one inhibitory (Fig. 14*C*). Figure 14*D* plots the true synaptic strength for each spike versus the inferred strength. Because there were both excitatory and inhibitory synapses, regularization was necessary, but it introduced a minor negative bias only in the estimates of synaptic strengths.

##### VOLTAGE-DEPENDENT SYNAPTIC INPUT: NMDA.

The synapses discussed so far had simple, voltage-independent kinetics. NMDA synapses have a more complex voltage-dependent form, contributing a conductance *g*_{N} (17) where *b*(*V*) is some suitable (known) sigmoidal function of the voltage, representing magnesium block, whereas *o*(*t*) is the time-dependent component similar to the AMPA synapse *g*_{s}. As in the discussion of active channels, given the voltage *V*(*t*), we can compute both *o*(*t*) and *b*[*V*(*t*)] and thus again infer the channel density ḡ. The variation of the synaptic strength with time should present no problems as we infer the full time course *w*_{s}(*t*). Thus this fits directly into the present framework.

### Synaptic inputs to an active membrane

As mentioned above, there is a jointly global optimum for both channel densities and presynaptic input (the optimization is jointly convex in both). To illustrate this, we infer channel density and presynaptic input simultaneously in a single-compartment cell, writing (18) where *g*_{c}(*t*) is the conductance state of a channel *c* and ḡ_{c} is that channel’s maximal conductance. We fit the same channel kinetics *g*_{c}(*t*) as in the multicompartmental case and infer the 𝒞 parameters ḡ_{c} plus the *T* · 𝒮 parameters *w*_{s}(*t*).

Proceeding as before, we discretize *Eq. 18* and rewrite it in vector form (19) **J**_{ch} stems from the channel conductances (*Eq. 9*) and **J**_{syn} from the vectorized synaptic input (*Eq. 12*). **J**_{j} is thus *T* by 𝒞 + *T*𝒮, which as in the previous section is typically very large. In the *N*-compartment case this becomes *T* by *N*𝒞 + *NT*𝒮 + *N* (the last addition being for the intercompartmental conductances). **g** is the concatenation of all the parameters we try to estimate (**g**_{c} and **w**_{s}) and we obtain our estimate by **ĝ**= arg min** _{g}** ‖

**V̇**−

**J**

_{j}

**g**‖2/σ

^{2}+

**n**

^{T}

**g**, where

**n**contains the regularization terms (we regularize only the synaptic terms).

As in the previous section, the problem has to be decomposed because of its size. However, we now have two types of parameters: channel parameters, which are assumed stationary throughout the recording, and synaptic parameters. Two issues deserve mentioning: First, the stationary nature of the channel parameters means that the entire recording is informative about any one channel density parameter. On the other hand, information about any one synaptic parameter is strictly local. This permits sparse storage techniques to save memory (see appendix D). Second, this very powerful model is prone to overfitting in both manners discussed above: there are (possibly) too many channels and there are both excitatory and inhibitory synaptic conductances. Of course, these degrees of freedom can additionally interact; synaptic input, for example, could explain away part of the currents arising from the channels if proper regularization techniques are not used.

Indeed, this interaction between channel and synaptic parameters slows convergence of the decomposition method discussed so far [see *Synaptic inputs w*_{s}*(t) to a passive membrane* and appendix C, synaptic inputs only, slightly modified such as to always optimize over the channel parameters but iterating through the synaptic parameters] and this effect is more dominant for short than for long segments. We found that a combination of multiplicative updates and coordinate ascent yielded fastest results on the data with both channels and synapses (see synaptic and channel currents in appendix C for details). Figure 15 shows results on the joint inference of synaptic input and channel densities for a recording over 2 s at 0.1-ms resolution. There were thus 4 × 10^{4} + 7 = 40,007 parameters to estimate—a large inference problem. Using the decomposition detailed in synaptic and channel currents in appendix C, the inference took 22 min on a 64-bit 1.6-GHz AMD processor. In the enlargement in Fig. 15*C*, no excitatory spikes are missed and the prior effectively suppresses noisy activity between the true input spikes. Excitatory spikes are usually missed only during action potentials. Inhibitory spikes on the other hand are more easily missed as the voltage frequently approaches the reversal potential and inhibitory input is then undetectable because it does not contribute any current.

##### STIMULUS-DEPENDENT SYNAPTIC INPUT TO AN ACTIVE MEMBRANE.

In experiments one often records from a sensory cell while driving the cell with a relevant stimulus and there is a huge body of work on the relationship between such evoked activity and the stimulus. The present framework can also be extended to this setting. Let us assume we record intracellularly from, say, an electrotonically compact midbrain auditory neuron while presenting an auditory stimulus. Here it makes sense to represent the overall synaptic input to the cell in a more global manner as a function of the stimulus. In the spirit of many successful stimulus encoding models (Simoncelli et al. 2004), we write the entire input to any compartment as a linearly filtered version of the (*d*-dimensional) stimulus **s**(*t*) (20) and infer the (unconstrained) filter **k** at the same time as we infer the channel densities. *Equation 20* is of the same form as *Eq. 7*, with **g** now containing both the channel densities |Az**ḡ**_{c} and the filter **k** and the matrix **J** containing the current shapes for the channels and each of the stimulus dimensions. Figure 16 shows that this works well for a 10-d white noise stimulus driving a cell with HH channels. As previously, we fitted more channels than were present.

## DISCUSSION

We have developed a probabilistic regression framework for estimation of biophysical single neuron properties and synaptic input. It leads directly to algorithms for determining these parameters that are both efficient and are guaranteed to converge to the unique optimum. Previous applications of automatic parameter searches (Bhalla and Bower 1993; Prinz et al. 2003; Vanier and Bower 1999) possessed no such guarantee because they either followed a local gradient that may lead to local optima or used sampling techniques that may miss narrow peaks in the cost functions. The globally convex properties imply that only part of the state space has to be searched, permitting fits of models with free parameters on the order of 10^{4}. The framework leads to well-founded methods for analyzing the uncertainty of the estimates, which may be used to guide experiments aimed at better constraining particular parameters (Paninski 2005).

The method is very data efficient: throughout this paper, tens to hundreds of milliseconds of data sufficed to ascertain the parameters that are extensive in the data (channel and morphological parameters), dispensing with the need to average over cells (Golowasch et al. 2002). We believe this is a key first step toward applying these techniques in detailed, quantitative studies of dendritic input and processing in vitro and in vivo. For example, the present method could potentially allow mapping complex channel densities along dendritic trees [and thus the identification of dendritic “hot spots” of activity (Frick et al. 2001)], tracking synaptic channel distributions through development, mapping synaptic strength along the dendrite (London and Segev 2001; Magee and Cook 2000, 2001), and inferring synaptic input from many cells during sensory stimulation and their changing synaptic weights over time. The method naturally exposes the *joint* functioning of channels—not readily available previously—and may reveal insights into functional properties of channel combinations (cf. e.g., Slee et al. 2005). However, a number of important caveats relating to the data assumptions—and directions for necessary future work—should be emphasized.

### Assumptions about the data

We have assumed concurrent submillisecond resolution monitoring of the dendritic voltage throughout a subsection of the dendritic tree. Current voltage-dye recording methods still suffer from relatively low spatiotemporal sampling rates and allow only concurrent monitoring of the voltage trace at submillisecond resolution at a few tens of locations in the dendritic tree [albeit arbitrarily chosen ones (Bullen et al. 1997; Djurisic and Zecevic 2005; Iyer et al. 2006), even terminal dendrites (Baker et al. 2005; Djurisic et al. 2004) and individual spines (Nuriya et al. 2006)], and Fisher et al. (2005) recently began to combine voltage-dye with two-photon imaging techniques, which may further increase spatial resolution. Millisecond resolution is important to ensure accurate estimates of the total transmembrane currents. Given the extremely rapid advances of imaging methods in the past decade, we expect significant improvement in the near future. Until then, techniques from the intermittent Kalman filter literature could be applied for large cells (Doucet et al. 2000; Huys and Paninski, unpublished observations) and the method proposed here is already fully applicable to intracellular recordings from electrotonically compact cells.

We have assumed noiseless observation of the voltage time course *V*(*t*). This is reasonable for electrophysiological recordings, but does not yet apply to imaging data. Djurisic and Zecevic (2005) reported a 1–6% fluorescence change relative to background, but point out that this is ascribed to technical issues rather than physical limits set, e.g., by photon shot noise, and that there is still ample room for improvement. Voltage noise enters the main equation partially through its derivative d*V*/d*t*, the significance of which was indicated in inferring channel conductances in a multicompartmental model. It will be important to relax the noiseless-observation assumption by adapting standard techniques, for example, from the Kalman filter signal processing literature (Doucet et al. 2000).

In the presence of significant noise, observation of significant voltage deflections provides the crucial evidence to constrain, e.g., the Na^{+} channel densities. In distant dendrites, the backpropagating action potential may be insufficient to cause a large enough voltage deflection and provide weak constraints on active parameters (Fig. 9). For a parameter to be well constrained, *some* cellular behavior in which the parameter is of relevance has to be observed, whether it be in response to a backpropagating action potential or to synaptic input. If behavior affected by a particular parameter is never observed, that parameter may be of small relevance (smaller than the noise) to the cell’s overall behavior.

Recordings with low spatial resolution correspond to a small compartmental model. Although very small compartmental models (e.g., Mainen and Sejnowski 1996; Vetter et al. 2001) can reproduce qualitatively complex behavior of neurons, the exact identity of each compartment becomes crucial, and omission of a particular compartment may lead to a fundamental alteration of the model cell’s behavior. Although it may thus be that too small a compartmental model will be unable to reproduce the observed cellular behavior, conservation of certain dendritic statistics (Mainen and Sejnowski 1996; Vetter et al. 2001) should help ensure that even very low compartmental reconstructions provide satisfactory models of the cell under investigation. Furthermore, the parameters inferred for a small compartmental model will still identify the best possible model of the data at that level of simplicity. Until large cells can be sampled at high spatial frequency, the present method is directly applicable to any contiguous subpart of a dendrite.

The method does not require any morphological data, but if they are available, it is possible not only to reduce the number of intercompartmental conductances (see sensitivity to current noise), but also to include extracellular data (Gold et al. 2006), which is readily available in vitro. Because extracellularly recorded potentials are linear sums of transmembrane currents contributed by the entire dendritic tree, each extracellular recording provides one additional constraint on the relative contribution of longitudinal versus transmembrane currents. This may further improve accuracy when intercompartmental conductances are weakly constrained.

Discrepancies between model and data are penalized by a squared error, giving rise to a well-defined probabilistic interpretation unlike noise-free, deterministic formulations that assign hard zero probability to models that do not fit the data exactly (Baldi et al. 1998). Because the cost function measures performance on the entire voltage trace, the model is forced to account for all the cellular behavior evident in the data. Rather than relying on a few hand-selected behaviors (Bhalla and Bower 1993; Jolivet et al. 2004; Keren et al. 2005), this approach thus benefits from data that explore as vast a range of behaviors as possible. Although the main evolution *Eq. 1* includes only Gaussian additive (current) noise and multiplicative (conductance) noise is not explicitly present in the formulation, letting synaptic input vary on a continuous scale makes allowance for conductance noise [which may be relevant to replicating true in vivo states (Fellous et al. 2003; Rudolph and Destexhe 2003)] and dependency of the noise scale σ on *V*(*t*) is easily incorporated in our analysis. Even though the assumption of Gaussian noise does remain to be tested, our observations on a variety of cost functions corresponding to different noise assumptions (data not shown) have always led to very similar results, indicating robustness with respect to the noise model.

Channels may be modulated by variables other than voltage (e.g., Ca^{2+}- or K^{+}-dependent channels). In its present form, the method applies to data in which these currents are absent or have been blocked pharmacologically. It extends directly to cases where these variables are observed; more generally, a semirealistic model of calcium dynamics, for example, can be developed. Calcium can then be treated as a hidden variable over which to average. A related issue is the stationarity assumption, which experimentally is satisfied for times that, although short, are long enough given the method’s data efficiency. Repeated inference at different times may thus be used to monitor nonstationarities.

### Assumptions about voltage-gated channels

Several techniques have been applied to the mapping of channel distributions across the dendritic tree. Histological techniques, such as immunogold imaging, although being highly quantitative, allow the determination of only relative channel densities and preclude any analysis of the channel kinetics themselves (Häusser 2003). Neurophysiological approaches include excised or cell-attached patch-clamp recording techniques and dendrosomes (Bekkers 2000; Hoffman et al. 1997; Magee and Johnston 1995), which are known to have low reliability arising from small currents, variability in patch area, number of included channels, and the nonisopotential nature of extended dendritic structures (Häusser 2003). Recently, techniques have been used to transform some of the weaknesses of neurophysiological techniques into strengths. Thus it is possible to use the large fluctuations arising from small channel numbers to extract channel densities in spines (Sabatini and Svoboda 2000). Schaefer et al. (2003a) used detailed models of the passive structure of neurons and an insightful assumption about channels with zero densities at rest to correct for the lack of space clamp and infer channel kinetics and densities at each patch-clamp position (in fact, they use the local nature of the voltage clamp to estimate these quantities at the different locations in the dendrite). Despite its elegance and accuracy, this approach still requires pharmacological manipulations, multiple patchings, and stimulations of each neuron (at each site of interest), followed by a detailed morphological reconstruction and it cannot be used to simultaneously infer the density distributions of many (regenerative) channels. The method presented in this paper also relies on many local measurements, but these are exactly of the kind promised by voltage-dye imaging methods: we do not require voltage-clamp measurements (dispensing with the need for multiple patchings) and demand no such pharmacology or indeed anatomical reconstruction more detailed than that available from voltage-dye recordings (remember that the number of recording sites directly determines the morphological approximation).

An issue that does deserve further exploration is whether the present method allows automatic inference of channel kinetics in addition to assigning the channels densities. One approach was illustrated earlier in uncertainty in channel identities and kinetics and inferring channel conductances in a multicompartmental model: include a large number of kinetics schemes (e.g., systematic parametric variations of a particular channel type) in the channel library and choose those that are assigned nonzero densities. This amounts to an automatic model selection scheme. However, uncertainty in channel identities and kinetics also shows that it applies only to selecting among sufficiently differing channel combinations and will yield satisfactory results only if kinetic schemes close enough (and this is hard to quantify) to the true ones are included in the channel library. It may be possible to concatenate the present approach with previous approaches, selecting among significantly differing channel kinetics in a first coarse step using the present method and then refining the selected channel kinetics by doing a *local* gradient descent on those kinetic parameters. It may also be possible to use methods developed for single channels (Colquhoun and Hawkes 1977) to infer continuous-time Markov chain descriptions of single channels at the same time as their densities.

### Assumptions about synaptic input

We have assumed known synaptic kinetics. Four issues need to be addressed. First, although evidence suggests that the changes in the postsynaptic potentials of a particular synaptic type observed along the dendritic tree and in learning are mostly attributable to changes in channel density, rather than to changes in synaptic receptor kinetics or kinetic changes arising from alterations of spine morphology (Andrásfalvy and Magee 2001; Eder et al. 2003; Koch and Zador 1993; Magee and Cook 2000; Nuriya et al. 2006; Spruston et al. 1995; Williams and Stuart 2003), the synaptic kinetics may vary across the dendritic tree. Second, even if the kinetics do not vary across a tree, selection of one kinetic scheme among many is hard because of the paucity of data. Classically, inferring synaptic conductance shapes is nontrivial but possible (Cox 2004; Häusser and Roth 1997; MJE Richardson and G Silberberg, unpublished results), and it may be necessary to rely on further progress in in vitro technology, which will prove helpful in inferring conductance time courses of synapses, even distant from the soma (Boucsein et al. 2005; Häusser and Roth 1997). Third, in in vivo–like high conductance states, massive synaptic bombardment (Fellous et al. 2003) reduces the effect of individual synaptic inputs on the postsynaptic membrane (Destexhe et al. 2003). The current noise, however, will be reduced equally, and thus the performance of our method in high-conductance states should not deteriorate, although this has not been assessed. Fourth, even with known synaptic kinetics, the performance on model data indicates that a reduction of the dimensionality of the synaptic inference problem may be desirable for applications to full-scale neurons. This will be of particular importance if many synapses of varying type impinge on any one individual compartment, as the ratio of data points to parameters rapidly becomes too small.

Previous approaches to estimating synaptic input have concentrated on amalgamated statistics of the synaptic input, rather than on the exact time course of the presynaptic spike train, partially because of limitations emanating directly from the data considered (Rall 1967). One possibility, used both in vitro and in vivo, is to measure the voltage excursions in response to injection of short current pulses (Pei et al. 1991). However, the frequent pulse injections necessary to obtain a conductance time course disturb the voltage trajectory. A popular approach that avoids this problem is to clamp a cell’s voltage at various holding potentials and infer an *I*–*V* curve, the slope of which gives the overall synaptic conductance at any one point in time (Borg-Graham et al. 1998). Although this approach does recover a time course of synaptic conductance, it recovers only the summed, total input as seen at the soma, not the full *spatio*temporal input signal as attempted by our method. Also, Borg-Graham et al. (1998) worked in regimes where the contribution by active channels is negligible, whereas the present method explicitly works with both simultaneously. Other work has concentrated on estimating statistical descriptions of the overall synaptic input. Fellous et al. (2003) inferred the mean and variance of an Ornstein–Uhlenbeck process, which is a valid statistical description of the summed input from thousands of presynaptic cells, but which again does not attempt to recover the precise presynaptic spike times. The stringent data requirements of our method are a direct result of the more ambitious goal.

## APPENDIX

### A: Inferring reversal potentials and capacitance

It is also possible to estimate the reversal potentials of the channels and synapses. The effect of a channel on the membrane current d*V*/d*t* = −ḡ*g*(*t*)[*V*(*t*) − *E*] can be rewritten as (A1) where ℱ = ḡ*E*. We can treat these two terms as *two* pseudocurrents instead of one, and estimate both ḡ and ℱ in the way described above and finally deduce the reversal potential from the relation *E* = ℱ/ḡ. This works well on the model data used in this paper, but in general ḡ may be zero when ℱ is nonzero, and there may be need to regularize particularly when ḡ is small. This regularization may be performed, as above, by introducing an exponential or Gaussian prior on *E*. It is also possible to write where *E* is now a guessed reversal potential and *b* is the deviation from it, multiplied by ḡ. Regularizing *b* will punish large deviations from the guessed reversal potentials.

The capacitance is the proportionality constant between recorded and injected currents and thus 1/*C* and *a*_{i}/*C* may be inferred using a straightforward modification of the techniques developed here, and an estimate for *C* may be obtained by inverting this estimate of 1/*C*.

### B: Monte Carlo error bars by importance sampling

Because of the nonnegativity constraints that are often enforced, *ĝ* will be far from the mean of the unconstrained posterior. Thus the variance around the unconstrained posterior mean is not a good indicator for certainty. The second moment around *ĝ* arguably coincides most closely with classical measures of confidence. However, its evaluation poses further challenges. Although the posterior distribution is over all parameters jointly, the error bars for individual parameters are related to the width of the marginal distribution, i.e., we have to integrate out all parameters but the one of interest—an integral over potentially many dimensions. Because this is hard to evaluate analytically, we opted for a Monte Carlo approach.

The second moment around the estimated values *ĝ*_{ML} or *ĝ*_{MAP} is given by where the right side of the equation is a sum over samples **g*** _{p}^{i}* generated from

*p*(

**g**|

**V**). As long as

**g**is not too high dimensional, it is possible to sample from an approximate distribution

*q*(

**g**) and reweigh the samples instead of sampling directly from

*p*(

**g**|

**V**). To see this, write the average of interest The first equality holds by simple expansion if

*q*(

**g**) is nonzero wherever

*p*(

**g**|

**V**) is nonzero. The second expresses the central idea of importance sampling (MacKay 2003). The average is now written as an average over

*q*(

**g**), not over

*p*(

**g**|

**V**). That is, we can now generate

*N*samples {

**g**

*}*

_{q}^{i}_{i=1}

^{N}from the approximate distribution

*q*(

**g**) and

*reweigh*each of the samples by a factor

*w*(

**g**), which corrects for the fact that we sampled from the approximate distribution

*q*(

**g**), although we should have sampled from

*p*(

**g**|

**V**)

In the present case, samples from the approximate distribution are generated by adding independent, truncated Gaussian noise to *ĝ*_{ML} or *ĝ*_{MAP}. Finally, evaluating *p*(**g**|**V**) is not straightforward either because it involves another complex high-dimensional integral. Instead of using *p*(**g**|**V**) in the computation of *w*(**g**), it is possible to use *p**(**g**|**V**), where *Z* is the (unknown) normalization constant and incorporate a further correction in *w*(**g**) More general methods, such as Metropolis–Hastings (MacKay 2003), may also be used.

### C: Quadratic programming

We explored a variety of approaches to quadratic programming. The simplest instances of the inference problems addressed here—the pure channel and the channel + linear filter problems in a single compartment—are amenable to direct inference with standard quadratic programming tools (e.g., quadprog.m in Matlab). However, the larger-scale multicompartmental and/or temporal synaptic input problems lead to very high-dimensional optimization problems, which must be decomposed in various ways before we may solve them efficiently. For multicompartmental problems without synapses and pure synaptic problems (with no active channels), we found that simply breaking up the problem into smaller problems—each of which would be solved by a standard quadratic programming tool—was most efficient. On the other hand, for the joint inference of channel and synapses, we found a combination of multiplicative updates followed by a sequential minimal optimization (SMO)–like (Platt 1998) coordinate ascent to be fastest.

##### SYNAPTIC INPUTS ONLY.

*Equation 15* is in a form that can be solved by standard quadratic programming (QP) tools, but the dimensionality is too big and leads to prohibitively long computation time. Rather than solving for the entire **w** (dropping the subscript *s*) at once, we note that we can iteratively solve for small chunks **w**_{k} of **w**. This is explained by the fact that the surface on which we move is quadratic and has no nonglobal minima, and therefore we can descend the jointly quadratic surface by taking steps changing only parts of the vector **w**, keeping the other parts fixed. When solving the large problem at once, we take steps straight toward the maximum. Optimizing iteratively over the smaller subspaces corresponds to taking steps within subspaces (black lines in Fig. C1 if the subspaces are individual parameters). Overall the path to the maximum is longer (cf. Fig. C1, black vs. red lines), but because QP problems typically are of 𝒪(*n*^{3}) computational complexity, we expect this method to be faster for longer recordings. We first multiply out and rewrite *Eq. 15* as a nonnegative quadratic minimization problem (C1) where **H** = **J*** _{syn}^{T}*J

_{syn}/σ

^{2},

**f**= λ

**n**− 2

**J**

_{syn}^{T}**V̇**/σ

^{2}, and

*w*

_{t}≥ 0 ∀

*t*, we can now write smaller subproblems, optimizing only over

**w**

_{k}(C2) where

*c*is independent of

**w**and

*k*indexes the parameters updated at that iteration, whereas k̄ indexes the parameters not updated at that iteration. We have found that this converges very rapidly (approximately two iterations over all parameters) because the overall problem really does more or less decompose into small independent problems resulting from the locality of the synaptic conductances in time (or, in the spatial case, resulting from the locality of the channels in separate compartments). Adding a line search along the overall direction (this is just a one-dimensional quadratic program; see Fig. C1) did not significantly speed up convergence in this case.

##### SYNAPTIC AND CHANNEL CURRENTS.

In practice, the approach in the previous section proved rather slow for the joint inference of channels and synapses, resulting from the coupling between temporally distant synaptic weights *w*_{s}(*t*) induced by the dependency of the optimal estimate for the channel parameters on the fully inferred synaptic time course. Multiplicative updates (Sha et al. 2003) were more efficient at approaching the region containing the solution in this case. Both multiplicative updates and the chunking (Platt 1998) approach to QP presented in the previous section were slower than a very fast and efficient formulation of coordinate ascent at actually reaching the minimum. Although it is known that multiplicative updates slow down inappropriately around the minimum, the slowness of the chunked QP is likely attributable to overhead from function calls.

#### Multiplicative updates.

Letting the multiplicative updates take a very simple form (Sha et al. 2003)

#### Coordinate ascent^{6} .

Minimizing *Eq. C1* with respect to only one parameter allows derivation of a closed-form analytic solution and circumvents the need for calling a QP function (C3) *Equation C3* can be written in a very efficient update form such that ∑_{j}_{≠}* _{i}* 2

*A*

_{ij}

*g*

_{j}does not have to be reevaluated every time. Let

**g**be the vector of parameters from the previous iteration and

*g*

_{i}the new value. Then (C4) and we update both

*g*

_{i}and the vector

**m*** =

**m**+ (

*g*

_{i}−

*g*

_{i})

**H**

_{i}(where

**H**

_{i}represents the

*i*th column of

**H**) iff

*g*

_{i}≠

*g*

_{i}. Conditioning on a change in

*g*

_{i}has a similar effect to the heuristics used in SMO (Platt 1998), whereby that Lagrangian is updated that most violates the Kahrush–Kuhn–Tucker conditions. Here all the

*g*

_{i}values that violate the conditions are updated. Searching for those that contribute the largest violations is computationally less efficient than iterating through in this simple manner.

### D: Memory requirements

For large problems, the memory requirements of constructing the matrices **J** and **H** become prohibitive. However, the size of **J** mostly arises from the synaptic contributions, which are only locally nonzero (a synaptic input has a relatively short impact on the voltage trace). The sparse.m function in MATLAB allows highly efficient use of this property. In all the larger problems presented in this paper, all elements in both **J** and **H** of absolute size smaller than some ε were set to zero. This did not affect performance but significantly improved speed and allowed large problems to be solved. We found ε = 0.001 to be useful.

## GRANTS

This work was supported by Gatsby Charitable Foundation grants to L. Paninski and M. Ahrens, a Royal Society International Fellowship to L. Paninski, the Bremen Institute of Industrial Technology and Applied Work Science consortium, and a University College London School of Medicine grant to Q. Huys.

## Acknowledgments

We are indebted to P. Dayan, A. Destexhe, C. Gold, M. Häusser, M. London, F. Perez-Cruz, A. Roth, and S. Roweis for very helpful and interesting discussions and to E. Marder and anonymous reviewers for helpful suggestions. We are thankful to S. Slee for detailed comments on a draft of this paper and G. Gorny for the photograph in Fig. 1.

## Footnotes

↵* Q.J.M. Huys and M. B. Ahrens contributed equally to this work.

↵1 For clarity, we will present the technique assuming the reversal potentials

*E*and the membrane capacitance*C*are known, but it is possible to relax these assumptions and infer both*E*and, if current is injected intracellularly,*C*(see appendix A).↵2 We will henceforth use the term “channels” to refer to “channel types,” not to individual channels.

↵3 If the connectivity between the compartments is known (which is the case for a voltage-dye experiment and is assumed here), the number of intercompartmental parameters to infer grows slightly faster than linearly. If the connectivity is unknown, this number grows quadratically.

↵4 It is reasonable to reduce the model by setting

*f*_{xy}=*f*_{yx}=*rA*_{xy}, where*A*_{xy}is the observed cross-sectional area of the dendrite between compartments*x*and*y*and*r*is the unknown resistivity parameter, which is assumed to be constant throughout the cell. Then the parameter*r*may be inferred by quadratic programming methods similar to those described here. This has the obvious advantage of reducing the dimensionality of the model to be estimated; however, optical measurements of the cross-sectional diameter might be unreliable or susceptible to noise arising from the small spatial scale, in which case it may be more robust to fit the multicompartmental model (Eq. 11) directly.↵5 As potentiation dynamics we used

*w*_{s}(*t*) = Σ_{k}δ(*t*−*t*_{k}) ∫d*t*′ exp[−(*t*−*t*′/τ_{p})]*w*_{s}(*t*′), where {*t*_{k}} represents all the spikes (indexed by*k*) that a synapse emits (Gerstner and Kistler 2002).↵6 We thank F. Perez-Cruz for this formulation.

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “

*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2006 by the American Physiological Society