## Abstract

Multiple factors simultaneously affect the spiking activity of individual neurons. Determining the effects and relative importance of these factors is a challenging problem in neurophysiology. We propose a statistical framework based on the point process likelihood function to relate a neuron's spiking probability to three typical covariates: the neuron's own spiking history, concurrent ensemble activity, and extrinsic covariates such as stimuli or behavior. The framework uses parametric models of the conditional intensity function to define a neuron's spiking probability in terms of the covariates. The discrete time likelihood function for point processes is used to carry out model fitting and model analysis. We show that, by modeling the logarithm of the conditional intensity function as a linear combination of functions of the covariates, the discrete time point process likelihood function is readily analyzed in the generalized linear model (GLM) framework. We illustrate our approach for both GLM and non-GLM likelihood functions using simulated data and multivariate single-unit activity data simultaneously recorded from the motor cortex of a monkey performing a visuomotor pursuit-tracking task. The point process framework provides a flexible, computationally efficient approach for maximum likelihood estimation, goodness-of-fit assessment, residual analysis, model selection, and neural decoding. The framework thus allows for the formulation and analysis of point process models of neural spiking activity that readily capture the simultaneous effects of multiple covariates and enables the assessment of their relative importance.

## INTRODUCTION

Understanding what makes a neuron spike is a challenging problem, whose solution is critical for deciphering the nature of computation in single cells and neural ensembles. Multiple factors simultaneously affect the spiking activity of single neurons, and assessing the effects and relative importance of each factor is what makes the problem challenging. Neural activity is often studied in relation to 3 types of covariates. First, spiking activity is associated with extrinsic covariates such as sensory stimuli and behavior. For example, the spiking activity of neurons in the rat hippocampus is associated with the animal's position in its environment, the theta rhythm, theta phase precession, and the animal's running velocity (Frank et al. 2002; Mehta et al. 1997, 2000; O'Keefe and Dostrovsky 1971; O'Keefe and Recce 1993). Retinal neurons respond to light intensity and light contrast, and V1 neurons are influenced by the spatiotemporal structure outside their classical receptive fields (Knierim and Van Essen 1992; Sillito et al. 1995; Vinje and Gallant 2000). The spiking activity of neurons in the arm region of the primary motor cortex (MI) is strongly associated with several covariates of motor behavior such as hand position, velocity, acceleration, and generated forces (Ashe and Georgopoulos 1994; Fu et al. 1995; Scott 2003). Second, the current spiking activity of a neuron is also related to its past activity, reflecting biophysical properties such as refractoriness and rebound excitation or inhibition (Hille 2001; Keat et al. 2001; Wilson 1999).

Third, current capabilities to record the simultaneous activity of multiple single neurons (Csicsvari et al. 2003; Donoghue 2002; Nicolelis et al. 2003; Wilson and McNaughton 1993) make it possible to study the extent to which spiking activity in a given neuron is related to concurrent ensemble spiking activity (Grammont and Riehle 1999, 2003; Hatsopoulos et al. 1998, 2003; Jackson et al. 2003; Maynard et al. 1999; Sanes and Truccolo 2003). Therefore, a statistical modeling framework that allows the analysis of the simultaneous effects of extrinsic covariates, spiking history, and concurrent neural ensemble activity would be highly desirable.

Current studies investigating the relation between spiking activity and these 3 covariate types have used primarily linear (reverse correlation) or nonlinear regression methods (e.g., Ashe and Georgopoulos 1994; Fu et al. 1995; Luczak et al. 2004). Although these methods have played an important role in characterizing the spiking properties in many neural systems, 3 important shortcomings have not been fully addressed. First, neural spike trains form a sequence of discrete events or point process time series (Brillinger 1988). Standard linear or nonlinear regression methods are designed for the analysis of continuous-valued data and not point process observations. To model spike trains with conventional regression methods the data are frequently smoothed or binned, a preprocessing step that can alter their stochastic structure and, as a consequence, the inferences made from their analysis. Second, although it is accepted that extrinsic covariates, spiking history, and neural ensemble activity affect neural spiking, current approaches make separate assessments of these effects, thereby making it difficult to establish their relative importance. Third, model goodness-of-fit assessments as well as the analysis of neural ensemble representation based on decoding should be carried out using methods appropriate for the point process nature of neural spike trains.

To address these issues, we present a point process likelihood framework to analyze the simultaneous effects and relative importance of spiking history, neural ensemble, and extrinsic covariates. We show that this likelihood analysis can be efficiently conducted by representing the logarithm of the point process conditional intensity function in terms of linear combinations of general functions of the covariates and then using the discrete time point process likelihood function to fit the model to spike train data in the generalized linear model (GLM) framework. Because the discrete time point process likelihood function is general, we also show how it may be used to relate covariates to neural spike trains in a non-GLM model. We illustrate the methods in the analysis of a simulated data example and an example in which multiple single neurons are recorded from MI in a monkey performing a visuomotor pursuit-tracking task.

## METHODS

In this section we present the statistical theory underlying our approach. First, we define the conditional intensity function for a point process. Second, we present a discrete time approximation to the continuous time point process likelihood function, expressed in terms of the conditional intensity function. Third, we show that when the logarithm of the conditional intensity is a linear combination of functions of the covariates, the discrete time point process likelihood function is equivalent to the likelihood of a GLM under a Poisson distribution and log link function. Alternatively, if the point process is represented as a conditionally independent Bernoulli process and the probability of the events is modeled by a logistic function, then the likelihood function is equivalent to the likelihood of a GLM under a Bernoulli distribution and a logistic link function. Fourth, we present several forms of conditional intensity models for representing spiking history, neural ensemble, and extrinsic covariate effects. Finally, we define our approach to maximum likelihood estimation, goodness-of-fit assessment, model comparison, residuals analysis, and decoding from point process observations by combining the GLM framework with analysis methods for point processes.

A *point process* is a set of discrete events that occur in continuous time. For a neural spike train this would be the set of individual spike times. Given an observation interval (0, *T*], a sequence of *J* spike times 0 < *u*_{1} < … < *u*_{j} < … < *u*_{J} ≤ *T* constitutes a point process. Let *N*(*t*) denote the number of spikes counted in the time interval (0, *t*] for *t* ∈ (0, *T*]. We define a single realization of the point process during the time interval (0, *t*] as *N*_{0:t} = {0 < *u*_{1} < *u*_{2} < … < *u*_{j} ≤ *t* ∩ *N*(*t*) = *j*} for *j* ≤ *J*.

### Conditional intensity function

A stochastic neural point process can be completely characterized by its conditional intensity function λ(*t* | *H*(*t*)) (Daley and Vere-Jones 2003), defined as

(1) λ(*t* | *H*(*t*)) = lim_{Δ→0} *P*[*N*(*t* + Δ) − *N*(*t*) = 1 | *H*(*t*)]/Δ

where *P*[· | ·] is a conditional probability and *H*(*t*) includes the neuron's spiking history up to time *t* and other relevant covariates. The conditional intensity is a strictly positive function that gives a history-dependent generalization of the rate function of a Poisson process. From *Eq. 1* we have that, for small Δ, λ(*t* | *H*(*t*))Δ gives approximately the neuron's spiking probability in the time interval (*t, t* + Δ]. Because defining the conditional intensity function completely defines the point process, to model the neural spike train in terms of a point process it suffices to define its conditional intensity function. We use parametric models to express the conditional intensity as a function of covariates of interest, thereby relating the neuron's spiking probability to the covariates. We use λ(*t* | θ, *H*(*t*)) to denote the parametric representation of the conditional intensity function in *Eq. 1*, where θ denotes an unknown parameter to be estimated. The dimension of θ depends on the form of the model used to define the conditional intensity function.

A discrete time representation of the point process will facilitate the definition of the point process likelihood function and the construction of our estimation algorithms. To obtain this representation, we choose a large integer *K* and partition the observation interval (0, *T*] into *K* subintervals {(*t*_{k−1}, *t*_{k}]}_{k=1}^{K}, each of length Δ = *TK*^{−1}. We choose *K* large so that there is at most one spike per subinterval. The discrete time versions of the continuous time variables are now denoted as *N*_{k} = *N*(*t*_{k}), *N*_{1:k} = *N*_{0:t_{k}}, and *H*_{k} = *H*(*t*_{k}). Because we chose *K* large, the differences Δ*N*_{k} = *N*_{k} − *N*_{k−1} define the spike train as a binary time series of zeros and ones. In discrete time, the parametric form of the conditional intensity function becomes λ(*t*_{k} | θ, *H*_{k}).
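As a concrete illustration of this discretization step, the sketch below bins a list of spike times into the binary series Δ*N*_{k}. This is our own illustrative code, not the authors' implementation; the function name and the 1-ms default bin width are assumptions for the example.

```python
import numpy as np

def bin_spike_train(spike_times, T, delta=0.001):
    """Discretize spike times on (0, T] into a binary series dN_k.

    Illustrative sketch: the bin width `delta` (here 1 ms) must be small
    enough that each subinterval (t_{k-1}, t_k] contains at most one spike.
    """
    K = int(round(T / delta))
    dN = np.zeros(K, dtype=int)
    # A spike in (t_{k-1}, t_k] falls in bin k, i.e. 0-based index k-1.
    idx = np.ceil(np.asarray(spike_times) / delta).astype(int) - 1
    if np.any(np.bincount(idx, minlength=K) > 1):
        raise ValueError("bin width too large: more than one spike per bin")
    dN[idx] = 1
    return dN
```

With Δ = 1 ms, as used in the RESULTS section, a 1-s trial yields K = 1000 bins.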

### Point process likelihood and GLM framework

Because of its several optimality properties, we choose a likelihood approach (Pawitan 2001) for fitting and analyzing the parametric models of the conditional intensity function. As in all likelihood analyses, the likelihood function for a continuous time point process is formulated by deriving the joint probability density of the spike train, which is the joint probability density of the *J* spike times 0 < *u*_{1} < *u*_{2} < … < *u*_{J} ≤ *T* in (0, *T*]. For any point process model satisfying *Eq. 1*, this probability density can be expressed in terms of the conditional intensity function (Daley and Vere-Jones 2003). Similarly, in the discrete time representation, this joint probability density can be expressed in terms of the joint probability mass function of the discretized spike train (see appendix, *Eqs. A1* and *A2*) and is written here as a product of conditionally independent Bernoulli events (Andersen et al. 1992; Berman and Turner 1992; Brillinger 1988; Brown et al. 2003)

(2) *P*(*N*_{1:K} | θ) = ∏_{k=1}^{K} [λ(*t*_{k} | θ, *H*_{k})Δ]^{Δ*N*_{k}}[1 − λ(*t*_{k} | θ, *H*_{k})Δ]^{1−Δ*N*_{k}} + *o*(Δ^{J})

where the term *o*(Δ^{J}) relates to the probability of observing a spike train with 2 or more spikes in any subinterval (*t*_{k−1}, *t*_{k}]. From *Eqs. A3*–*A5* in the appendix, it follows that *Eq. 2* can be reexpressed as

(3) *P*(*N*_{1:K} | θ) = exp{∑_{k=1}^{K} log [λ(*t*_{k} | θ, *H*_{k})]Δ*N*_{k} − ∑_{k=1}^{K} λ(*t*_{k} | θ, *H*_{k})Δ}

If we view *Eq. 3* as a function of θ, given the spike train observations *N*_{1:K}, then this probability mass function defines our discrete time point process likelihood function, which we denote as *L*(θ | *H*_{K}) = *P*(*N*_{1:K} | θ). From *Eq. A6* in the appendix, it can be seen that *Eq. 3* is a discrete time approximation to the joint probability density of a continuous time point process.

To develop a computationally tractable and efficient approach to estimating θ, we note that for any subinterval (*t*_{k−1}, *t*_{k}] the conditional intensity function is approximately constant, so that, by *Eq. 3*, *P*(Δ*N*_{k}) = exp{log [λ(*t*_{k} | θ, *H*_{k})]Δ*N*_{k} − λ(*t*_{k} | θ, *H*_{k})Δ} is given by the Poisson probability mass function. Because Δ is small, this is equivalent to the Bernoulli probability *P*(Δ*N*_{k}) = [λ(*t*_{k} | θ, *H*_{k})Δ]^{Δ*N*_{k}}[1 − λ(*t*_{k} | θ, *H*_{k})Δ]^{1−Δ*N*_{k}} in *Eq. 2*. If we now express the logarithm of the conditional intensity function as a linear combination of general functions of the covariates

(4) log λ(*t*_{k} | θ, *H*_{k}) = ∑_{i=1}^{q} θ_{i}*g*_{i}(*v*_{i}(*t*_{k}))

where *g*_{i} is a general function of a covariate *v*_{i}(*t*_{k}), possibly taken at different time lags τ, and *q* is the dimension of the estimated parameter θ, then *Eq. 3* has the same form as the likelihood function for a GLM under a Poisson probability model and a log link function (see appendix, *Eqs. A7*–*A8*). Thus, maximum likelihood estimation of model parameters and likelihood analyses can be carried out using the Poisson–GLM framework. Alternatively, if we extend the results in Brillinger (1988) and model the spiking probability with a logistic function

(5) log {λ(*t*_{k} | θ, *H*_{k})Δ/[1 − λ(*t*_{k} | θ, *H*_{k})Δ]} = ∑_{i=1}^{q} θ_{i}*g*_{i}(*v*_{i}(*t*_{k}))

then *Eq. 2* has the same form as the likelihood function for a GLM under a Bernoulli probability distribution and a logistic link function (*Eqs. A9* and *A10*). Thus, maximum likelihood estimation of model parameters and likelihood analyses can also be carried out using the Bernoulli–GLM framework (see also Kass and Ventura 2001). In other words, for Δ sufficiently small (i.e., at most one spike per time subinterval), likelihood analyses performed with either the Bernoulli or the Poisson model are equivalent. However, because we are interested in modeling the conditional intensity function directly, rather than the probability of events in our discrete time likelihoods, we used the Poisson–GLM framework in our analyses.
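The discrete-time log-likelihood implied by Eq. 3 under a log-linear intensity model is straightforward to evaluate numerically. The following sketch (our own illustration; matrix and function names are assumptions) computes it for a design matrix whose rows hold the covariate functions of Eq. 4:

```python
import numpy as np

def log_likelihood(theta, G, dN, delta=0.001):
    """Discrete-time point process log-likelihood (cf. Eq. 3) for a
    log-linear conditional intensity: log lambda_k = G_k . theta.

    G  : (K, q) matrix; row k holds the covariate functions g_i(v_i(t_k))
    dN : (K,) binary spike indicator series
    """
    log_lam = G @ theta                  # log lambda(t_k | theta, H_k)
    lam = np.exp(log_lam)
    # sum_k log[lambda_k] dN_k  -  sum_k lambda_k * Delta
    return float(dN @ log_lam - delta * lam.sum())
```

For a constant-rate model (a single column of ones), this reduces to the familiar Poisson log-likelihood *J* log λ − λ*T*.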

Therefore, we can take advantage of the computational efficiency and robustness of the GLM framework together with all of the analysis tools from the point process theory: goodness-of-fit based on the time rescaling theorem, residual analysis, model selection, and stochastic decoding based on point process observations. We refer to this combined framework as the *point process–GLM framework*. This framework covers a very large class of models because *Eq. 4* allows for general functions of the covariates and of interaction terms consisting of combinations of the covariates. An application of GLM analysis to spike train data, without the support of the derived relations between the point process and GLM likelihood functions, would remain purely heuristic in nature.

Finally, *Eqs. 2* and *3* are generally applicable discrete time approximations for the point process likelihood function. Thus, when a parametric model of the conditional intensity function cannot be expressed in terms of either *Eq. 4* or *Eq. 5*, the GLM framework may be replaced with standard algorithms for computing maximum likelihood estimates (Pawitan 2001).

### Models for the conditional intensity function

We formulate specific models for the conditional intensity function that incorporate the effects of spiking history, ensemble, and extrinsic covariates. For the exposition in the remainder of this section, we extend our notation to include the neural ensemble activity. Consider an observation time interval *t* ∈ (0, *T*] with corresponding sequences of *J*^{c} spike times 0 < *u*_{1}^{c} < … < *u*_{j}^{c} < … < *u*_{J^{c}}^{c} ≤ *T*, for *c* = 1, …, *C* recorded neurons. Let *N*_{1:K}^{1:C} = ∪_{c=1}^{C} *N*_{1:K}^{c} denote the sample path for the entire ensemble.

##### CONDITIONAL INTENSITY MODELS IN THE POINT PROCESS–GLM FRAMEWORK.

The general form for the conditional intensity function we use to model a single cell's spiking activity is

(6) λ(*t*_{k} | θ, *H*_{k}) = λ_{I}(*t*_{k} | *N*_{1:k}, θ_{I})λ_{E}(*t*_{k} | *N*_{1:K}^{1:C}, θ_{E})λ_{X}(*t*_{k} | **x**_{k+τ}, θ_{X})

where θ = {θ_{X}, θ_{E}, θ_{I}}, λ_{I}(*t*_{k} | *N*_{1:k}, θ_{I}) is the component of the intensity function conditioned on the spiking history *N*_{1:k} of the neuron whose intensity is being modeled, λ_{E}(*t*_{k} | *N*_{1:K}^{1:C}, θ_{E}) is the component related to the ensemble history contribution, and λ_{X}(*t*_{k} | **x**_{k+τ}, θ_{X}) is the component related to an extrinsic covariate **x**_{k+τ}, where τ is an integer time shift. Note that the term *H*_{k}, used in the previous section, is now replaced by more specific information according to the model.

We consider the following specific models for each of these 3 covariate types. We begin with a model incorporating the spiking history component.

The spiking history component is modeled as

(7) log λ_{I}(*t*_{k} | *N*_{1:k}, θ_{I}) = γ_{0} + ∑_{n=1}^{Q} γ_{n}Δ*N*_{k−n}

where *Q* is the order of the autoregressive process, the γ_{n} are the autoregressive coefficients, and γ_{0} relates to a background level of activity. This model is henceforth referred to as the autoregressive spiking history model. We apply Akaike's information criterion (AIC, see *Eq. 16* below) to estimate the parameter *Q*. We expect this autoregressive spiking history model to capture mostly refractory effects, recovery periods, and oscillatory properties of the neuron.
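Fitting the autoregressive spiking history model amounts to regressing each bin on the preceding *Q* bins of the same spike train. The sketch below (our own illustration, not the paper's code) builds the corresponding design matrix, with a leading column of ones for the background term γ_{0}:

```python
import numpy as np

def history_design(dN, Q):
    """Design matrix for the autoregressive spiking-history model (cf. Eq. 7).

    Column 0 is the constant (background) regressor; column n (n = 1..Q)
    holds the lagged spike indicator dN_{k-n}.  Rows start at k = Q so that
    every lag exists.  Illustrative sketch.
    """
    K = len(dN)
    X = np.ones((K - Q, Q + 1))
    for n in range(1, Q + 1):
        X[:, n] = dN[Q - n:K - n]   # dN_{k-n} for k = Q .. K-1 (0-based)
    return X
```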

The contributions from the ensemble are expressed in terms of a regression model of order *R*

(8) log λ_{E}(*t*_{k} | *N*_{1:K}^{1:C}, θ_{E}) = β_{0} + ∑_{c′≠c} ∑_{r=1}^{R} β_{r}^{c′}Δ*N*_{k−r}^{c′}

where the first summation is over the ensemble of cells with the exception of the cell whose conditional intensity function is being modeled. Thus the above model contains *R* × (*C* − 1) parameters plus one additional parameter for the background level. Note that the coefficients in the ensemble model capture spike effects at 1-ms time resolution and in this way may reflect lagged synchrony between spikes of the modeled cell and other cells in the ensemble. Alternatively, to investigate ensemble effects at lower time precision, we consider the ensemble rates model

(9) log λ_{E}(*t*_{k} | *N*_{1:K}^{1:C}, θ_{E}) = β_{0} + ∑_{c′≠c} ∑_{r=1}^{R} β_{r}^{c′}(*N*_{k−(r−1)W}^{c′} − *N*_{k−rW}^{c′})

where the term *N*_{k−(r−1)W}^{c′} − *N*_{k−rW}^{c′} is the spike count in a time window of length *W* covering the time interval (*t*_{k−rW}, *t*_{k−(r−1)W}]. The coefficients in this model may reflect spike covariances on slower time scales.
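The windowed spike counts of the ensemble rates model are just differences of the cumulative counting process. A minimal sketch (our own, with assumed names) for one other cell:

```python
import numpy as np

def ensemble_rate_covariates(dN_other, k, R, W):
    """Spike counts of another cell in R non-overlapping windows of W bins
    preceding bin k (cf. the ensemble rates model, Eq. 9):
    count_r = N_{k-(r-1)W} - N_{k-rW}, r = 1..R.  Requires k >= R*W.
    Illustrative sketch.
    """
    # N[j] = number of spikes in the first j bins (counting process)
    N = np.concatenate(([0], np.cumsum(dN_other)))
    return np.array([N[k - (r - 1) * W] - N[k - r * W] for r in range(1, R + 1)])
```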

In our application to MI data, the extrinsic covariate **x**_{k+τ} will specify the hand velocity. To model this component we employ a variation of the Moran and Schwartz (1999) model, henceforth referred to as the velocity model (10) where |*V*_{k+τ}| and φ_{k+τ} are, respectively, the magnitude and angle of the 2-D hand velocity vector in polar coordinates at time *t*_{k+τ}. In this model **x**_{k+τ}=[|*V*_{k+τ}|, φ_{k+τ}]′. For illustration purposes, we have considered only a single, fixed-time shift τ in the above model. Based on previous results (Paninski et al. 2004) we set τ = 150 ms. A much more generic model form including linear or nonlinear functions of covariates at many different time lags could be easily formulated.

The most complex conditional intensity function models we investigate are the autoregressive spiking history plus velocity and ensemble activity, and the autoregressive spiking history plus velocity and ensemble rates models. For the former, the full conditional intensity function model is given by (11) where μ relates to the background activity.

It should be noticed that although these models are in the “generalized linear” model class, the relation between the conditional intensity function and spiking history, ensemble, and extrinsic covariates can be highly nonlinear. These models are linear only after the transformation of the natural parameter (here the conditional intensity function) by the log link function and only with respect to the model parameters being estimated. As seen in *Eq. 4*, general functions (e.g., quadratic, cubic, etc.) of the actual measured covariates can be used.

##### NON-GLM CONDITIONAL INTENSITY FUNCTION MODEL.

To illustrate the generality of the proposed point process framework, we construct and analyze a non-GLM conditional intensity function model that also incorporates effects of spiking history, neural ensemble, and extrinsic covariates. Additionally, this example demonstrates a procedure for obtaining a conditional intensity function by first modeling the interspike interval (ISI) conditional probability density function. The conditional intensity is obtained from the ISI probability density model using the relation (Brown et al. 2003)

(12) λ(*t*_{k} | θ, *H*_{k}) = *p*(*t*_{e} | θ, *H*_{k})/[1 − ∫_{0}^{t_{e}} *p*(*u* | θ, *H*_{k})d*u*]

where *t*_{e} = *t*_{k} − *u*_{N_{k−1}} is the elapsed time since the most recent spike of the modeled cell before time *t*_{k} and *p*(*t*_{e} | θ, *H*_{k}) is the ISI probability density, specified here by the inhomogeneous inverse Gaussian (Barbieri et al. 2001). This probability density is given in *Eq. A11* in the appendix. The density is specified by a time-varying scaling parameter *s*(*t*_{k} | · ) that, in our application to MI spiking data, captures the velocity and ensemble rates covariate effects (13) and a location parameter ψ. The set of parameters defining the inhomogeneous inverse Gaussian density, and therefore the conditional intensity function, is denoted θ = {θ_{X}, θ_{E}, ψ}. This model (*Eqs. 12*, *13*, and *A11*) is henceforth referred to as the inhomogeneous inverse Gaussian plus velocity and ensemble rates model. The history dependence in this model extends back to the time of the previous, most recent spike.
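The hazard-rate relation of Eq. 12 is easy to evaluate for any ISI density with a closed-form CDF. The sketch below uses a plain (homogeneous) inverse Gaussian density with fixed mean μ and shape λ; this is our own simplification for illustration, the paper's time-varying scaling parameter *s*(*t*_{k} | · ) is omitted, and the parameterization here is the standard statistical one rather than Eq. A11's:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def invgauss_pdf(t, mu, lam):
    """Inverse Gaussian density with mean mu and shape lam, t > 0."""
    return math.sqrt(lam / (2 * math.pi * t ** 3)) * \
        math.exp(-lam * (t - mu) ** 2 / (2 * mu ** 2 * t))

def invgauss_cdf(t, mu, lam):
    """Closed-form inverse Gaussian CDF."""
    a = math.sqrt(lam / t)
    return norm_cdf(a * (t / mu - 1)) + \
        math.exp(2 * lam / mu) * norm_cdf(-a * (t / mu + 1))

def conditional_intensity(t_e, mu, lam):
    """Hazard form of Eq. 12: lambda = p(t_e) / (1 - P(t_e)),
    with t_e the elapsed time since the last spike."""
    return invgauss_pdf(t_e, mu, lam) / (1.0 - invgauss_cdf(t_e, mu, lam))
```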

### Maximum likelihood parameter estimation

Maximum likelihood parameter estimates for the models in the point process–GLM framework were efficiently computed using the iteratively reweighted least squares (IRLS) algorithm. This method is the standard choice for maximum likelihood estimation of GLMs because of its computational simplicity, efficiency, and robustness. IRLS applies the Newton–Raphson method to the reweighted least squares problem (McCullagh and Nelder 1989). Given the conditional intensity model in *Eq. 4*, the log-likelihood function is strictly concave; therefore, if the maximum of the log-likelihood exists, it is unique (Santner and Duffy 1989). Confidence intervals and *p*-values were obtained following standard computations based on the observed Fisher information matrix (Pawitan 2001). Statistically nonsignificant parameters (e.g., *P* ≥ 0.001) were set to zero for all of the models. In the non-GLM case, the inhomogeneous inverse Gaussian model was fit by direct maximization of the likelihood function using a quasi-Newton method (IMSL, C function *min_uncon_multivar*, from Visual Numerics, 2001). For the data sets used here, the most intensive computations involved operations on large matrices of size about 10^{6} × 200. Algorithms were coded in C and run on dual-processor 3.9-GHz IBM machines with 2 GB of RAM. Standard GLM estimation using IRLS is also available in several statistical packages (S-Plus, SPSS, and the Matlab Statistics toolbox).
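For the Poisson GLM with log link, IRLS coincides with Newton–Raphson on the log-likelihood. A compact plain-numpy sketch of that iteration (our own illustration, not the authors' C code; because the log-likelihood is concave, the iteration converges to the unique maximum when it exists):

```python
import numpy as np

def fit_poisson_glm(X, dN, delta=0.001, n_iter=25):
    """Maximum likelihood fit of log lambda(t_k) = X_k . theta by
    Newton-Raphson, the iteration underlying IRLS for a Poisson GLM
    with log link.  Illustrative sketch.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ theta) * delta        # expected spike count per bin
        grad = X.T @ (dN - mu)                # score vector
        H = X.T @ (X * mu[:, None])           # observed Fisher information
        theta += np.linalg.solve(H, grad)
    return theta
```

For the constant-rate model this recovers the closed-form MLE, the log of (number of spikes)/(total time).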

### Goodness-of-fit, point process residual analyses and model comparison

##### KOLMOGOROV–SMIRNOV (K-S) TEST ON TIME RESCALED ISIS.

Before making an inference from a statistical model, it is crucial to assess the extent to which the model describes the data. Measuring quantitatively the agreement between a proposed model and a spike train data series is a more challenging problem than for models of continuous-valued processes. Standard distance measures applied in continuous data analyses, such as the average sum of squared errors, are not designed for point process data. One solution to this problem is to apply the time-rescaling theorem (Brown et al. 2002; Ogata 1988; Papangelou 1972) to transform point processes like spike trains into continuous measures appropriate for goodness-of-fit assessment. Once a conditional intensity function model has been fit to a spike train data series, we can compute rescaled times *z*_{j} from the estimated conditional intensity and the spike times as

(14) *z*_{j} = 1 − exp{−∫_{u_{j}}^{u_{j+1}} λ(*t* | θ̂, *H*(*t*))d*t*}

for *j* = 1, …, *J* − 1, where θ̂ is the maximum likelihood estimate of the parameters. The *z*_{j} values will be independent, uniformly distributed random variables on the interval [0, 1) if and only if the conditional intensity function model corresponds to the true conditional intensity of the process. Because the transformation in *Eq. 14* is one to one, any statistical assessment that measures the agreement between the *z*_{j} values and a uniform distribution directly evaluates how well the original model agrees with the spike train data. To construct the K-S test, we order the *z*_{j} values from smallest to largest, denoting the ordered values as *z*_{(j)}, and then plot the values of the cumulative distribution function of the uniform density, defined as *b*_{j} = (*j* − 1/2)/*J* for *j* = 1, …, *J*, against the *z*_{(j)}. We term these plots *K-S plots*. If the model is correct, then the points should lie on a 45° line.
Confidence bounds for the degree of agreement between a model and the data may be constructed using the distribution of the Kolmogorov–Smirnov statistic (Johnson and Kotz 1970). For moderate to large sample sizes the 95% confidence bounds are well approximated by *b*_{j}±1.36·*J*^{−1/2} (Johnson and Kotz 1970).
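In discrete time, the integral in Eq. 14 can be approximated by summing λΔ between successive spikes. A sketch of the resulting K-S statistic computation (our own illustration; the Riemann-sum approximation and variable names are assumptions):

```python
import numpy as np

def ks_statistic(lam, dN, delta=0.001):
    """Time-rescaling goodness-of-fit (cf. Eq. 14): rescale each ISI by the
    integrated intensity, z_j = 1 - exp(-int lambda dt), then compare the
    ordered z_(j) against the uniform CDF values b_j = (j - 1/2)/J.
    The integral is approximated by a Riemann sum over the bins.
    """
    spike_bins = np.flatnonzero(dN)
    cum = np.concatenate(([0.0], np.cumsum(lam * delta)))
    # Integrated intensity between successive spikes
    Lam = np.diff(cum[spike_bins + 1])
    z = 1.0 - np.exp(-Lam)
    z.sort()
    J = len(z)
    b = (np.arange(1, J + 1) - 0.5) / J
    return np.abs(z - b).max(), z
```

The returned statistic can then be compared with the 95% bound 1.36·*J*^{−1/2}.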

To assess how well a model performs in terms of the original ISIs (*ISI*_{j} = *u*_{j} − *u*_{j−1}), we relate the ISIs to the computed *z*_{j} values in the following manner. First, the empirical probability density of the *z*_{j} values is computed, and the ratio of the empirical to the expected (uniform) density is calculated for each bin in the density. Second, the ISI values in the data are rounded to integer milliseconds and collected into bins. For these ISIs, all the corresponding *z*_{j} values, as well as the ratios of empirical to expected densities in the related bins, are obtained. This correspondence between ISIs and *z*_{j} values follows directly from *Eq. 14*. Third, we compute the mean ratio *R* (i.e., the mean of all the ratios for this particular ISI value). A mean ratio *R* > 1 (*R* < 1) implies that there are more (fewer) rescaled ISIs of length *z*_{j} than expected and that the intensity is, on average, underestimated (overestimated) at this particular ISI value.

If the model is correct, the *z*_{j} values should be not only uniformly distributed, but also independent. Thus, even when the K-S statistic is small, we still need to show that the rescaled times are independent. Here we assess independence up to 2nd-order temporal correlations by computing the autocorrelation function of the transformed rescaled times. As a visualization aid, we plot *z*_{j+1} against *z*_{j}.

##### POINT PROCESS RESIDUAL ANALYSIS.

A standard approach in goodness-of-fit analysis is to examine structure in the data that is not described by the model. For continuous-valued data, this is done by analyzing the residual error (i.e., the difference between the true and predicted values). For point process data, a different definition of residuals is needed to relate the conditional intensity function to the observed spike train data. The point process residual (Andersen et al. 1992) over nonoverlapping moving time windows of *B* bins is defined as

(15) *M*_{k} = ∑_{j=k−B}^{k} [Δ*N*_{j} − λ(*t*_{j} | θ̂, *H*_{j})Δ]

for *k* − *B* ≥ 1. In the application to MI data, we will look for relations between the point process residual and motor covariates (e.g., speed or direction) by computing their cross-correlation functions. The existence of correlations would imply that there is some structure left in the residuals that is not captured by the conditional intensity function model.
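In each window the residual is simply the observed spike count minus the integrated intensity. A minimal sketch (our own illustration, with assumed names):

```python
import numpy as np

def point_process_residuals(dN, lam, B, delta=0.001):
    """Point process residuals (cf. Eq. 15) over non-overlapping windows of
    B bins: observed spike count minus integrated intensity per window.
    Trailing bins that do not fill a window are dropped.  Illustrative sketch.
    """
    K = (len(dN) // B) * B
    obs = dN[:K].reshape(-1, B).sum(axis=1)          # spikes per window
    pred = (lam[:K] * delta).reshape(-1, B).sum(axis=1)  # integrated intensity
    return obs - pred
```

The resulting residual series can then be cross-correlated with, e.g., hand speed or direction.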

##### MODEL SELECTION.

An additional tool for comparing models comes from the statistical theory of model selection (Burnham and Anderson 2002). The idea consists of choosing the best models to approximate an underlying process generating the observed data, a process whose complexity can potentially be infinite dimensional. To achieve this goal, we adopt Akaike's information criterion (AIC) (Akaike 1973). This criterion also provides a way to rank different candidate models. The AIC was originally derived as an estimate of the expected relative Kullback–Leibler distance (Cover and Thomas 1991) between a distribution given by an approximating model and the distribution of the true underlying process generating the data. This criterion is formulated as

(16) AIC = −2 log *L*(θ̂ | *H*_{K}) + 2*q*

where *L*(θ̂ | *H*_{K}) is the likelihood function, *L*(θ̂ | *H*_{K}) = *P*(*N*_{1:K} | θ̂, *H*_{K}); θ̂ is the maximum likelihood estimate of the model parameters θ; and *q* is the total number of parameters in the model. By this criterion, the best model is the one with the smallest AIC, implying that the approximate distance between this model and the "true process" generating the data is smallest. The AIC is frequently interpreted as a measure of the trade-off between how well the model fits the data and the number of parameters required to achieve that fit, or of the desired trade-off between bias and variance (Burnham and Anderson 2002). An equivalence between AIC and cross-validation for the purpose of model selection has been established (Stone 1977). AIC can be applied to both nested and nonnested models, and to models with different distributions in their stochastic component. We compute AIC values for different models to guide our model comparison. Specifically, we report the difference between the AIC of each model and the AIC of the best model. We also use the AIC to estimate the order of the autoregressive spiking history component in *Eq. 7*.
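The AIC computation and the Δ-AIC ranking used for model comparison reduce to a few lines (our own illustrative sketch):

```python
def aic(log_likelihood, q):
    """Akaike's information criterion (cf. Eq. 16): AIC = -2 log L + 2q."""
    return -2.0 * log_likelihood + 2.0 * q

def delta_aic(aics):
    """Difference of each candidate model's AIC from the best (smallest) AIC;
    the best model has Delta-AIC = 0."""
    best = min(aics)
    return [a - best for a in aics]
```

To select the order *Q* of the spiking history component, one would fit the model for a range of *Q* values and pick the one minimizing the AIC.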

### Neural decoding analysis by state estimation with point process observations

Beyond assessing the goodness-of-fit of a single cell model with respect to its individual spike train data, we also analyze the ability of the model, over the entire cell population, to decode an *m*-dimensional extrinsic covariate **x**_{k+τ}. Such decoding will use the spike times of the entire ensemble of cells and the corresponding conditional intensity function for each of these cells. We thus perform a state estimation of **x**_{k} based on point process observations and thereby assess the ensemble coding properties of the cell population. The estimated extrinsic covariate will be given by the posterior mode after a Gaussian approximation to the Bayes–Chapman–Kolmogorov system (Eden et al. 2004).

For the particular type of hand kinematics data described above, we model **x**_{k} as a Gaussian autoregressive process of order 1, henceforth *AR*(1), given by

(17) **x**_{k} = μ_{x} + **Fx**_{k−1} + ε_{k}

where μ_{x} is an *m*-dimensional vector of mean parameters, **F** is an *m* × *m* state matrix, and ε_{k} is the noise term, given by a zero mean *m*-dimensional white Gaussian vector with *m* × *m* covariance matrix **W**_{ε}. The matrices **F** and **W**_{ε} are fitted by maximum likelihood.

The point process observation equation is expressed in terms of the modeled conditional intensity functions λ^{c}(*t*_{k} | · ) for each of the *C* cells entering the decoding. As an example, for intensity functions conditioned on a motor covariate **x**_{k+τ} and intrinsic spiking history *N*_{1:k}^{C}, we have the following recursive point process filter.

One-step prediction

(18) x̂_{k+τ|k+τ−1} = μ_{x} + **F**x̂_{k+τ−1|k+τ−1}

One-step prediction covariance

(19) **W**_{k+τ|k+τ−1} = **FW**_{k+τ−1|k+τ−1}**F**′ + **W**_{ε}

Posterior covariance

(20) (**W**_{k+τ|k+τ})^{−1} = (**W**_{k+τ|k+τ−1})^{−1} + ∑_{c=1}^{C} {∇ log λ^{c}[λ^{c}Δ]∇ log λ^{c}′ − ∇^{2} log λ^{c}[Δ*N*_{k}^{c} − λ^{c}Δ]}

Posterior mode

(21) x̂_{k+τ|k+τ} = x̂_{k+τ|k+τ−1} + **W**_{k+τ|k+τ} ∑_{c=1}^{C} ∇ log λ^{c}[Δ*N*_{k}^{c} − λ^{c}Δ]

where each λ^{c} and its derivatives are evaluated at the one-step prediction x̂_{k+τ|k+τ−1}. The term ∇(∇^{2}) denotes the *m*-dimensional vector (*m* × *m* matrix) of first (second) partial derivatives with respect to **x**_{k+τ}, and **W**_{k+τ|k+τ} is the posterior covariance matrix of **x**_{k+τ}. Similarly, decoding equations based on other models of the conditional intensity function can be obtained. The derivation of the recursive point process filter is based on the well-established (Mendel 1995; Kitagawa and Gersh 1996) relation between the posterior probability density and the Chapman–Kolmogorov (one-step prediction) probability density, and on a Gaussian approximation of the posterior density (for details see Eden et al. 2004). The Gaussian approximation results from a 2nd-order Taylor expansion of the density and is a standard first approach for approximating probability densities (Tanner 1996; Pawitan 2001). Nonetheless, the spiking activity enters into the computations in a very non-Gaussian way, through the point process model instantiated by the conditional intensity function.
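For log-linear intensities of the form log λ^{c} = α_{c} + β_{c}′**x**, the gradient of log λ^{c} is the constant vector β_{c} and its Hessian vanishes, which makes one filter step particularly simple. The sketch below is our own illustration following the form of the recursive filter in Eden et al. (2004), specialized to that case; all variable names and the example parameters are assumptions, and the time-shift subscripts are suppressed:

```python
import numpy as np

def pp_filter_step(x_prev, W_prev, dN_k, F, mu_x, W_eps,
                   alphas, betas, delta=0.001):
    """One step of a recursive point process filter for C cells with
    log-linear intensities log lambda^c = alpha_c + beta_c . x, so that
    grad log lambda^c = beta_c and the Hessian term vanishes.
    Illustrative sketch, not the paper's implementation.
    """
    # One-step prediction and its covariance
    x_pred = mu_x + F @ x_prev
    W_pred = F @ W_prev @ F.T + W_eps
    # Intensities evaluated at the predicted state
    lam = np.exp(alphas + betas @ x_pred)            # shape (C,)
    # Posterior covariance: prior information plus spiking information
    info = np.linalg.inv(W_pred) + betas.T @ (betas * (lam * delta)[:, None])
    W_post = np.linalg.inv(info)
    # Posterior mode: correct prediction by the spiking innovations
    x_post = x_pred + W_post @ (betas.T @ (dN_k - lam * delta))
    return x_post, W_post
```

Each observed spike pulls the state estimate toward the preferred direction β_{c} of the cell that fired, while the posterior covariance shrinks relative to the prediction covariance.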

The amount of uncertainty in the algorithm about the true state of the decoded parameter is related to the matrix **W**_{k+τ|k+τ}. Confidence regions and coverage probability for the decoding can thus be obtained as follows. At time *k*Δ*t* an approximate 0.95 confidence region for the true covariate **x**_{k+τ} may be constructed as (**x**_{k+τ} − **x**_{k+τ|k+τ})′**W**_{k+τ|k+τ}^{−1}(**x**_{k+τ} − **x**_{k+τ|k+τ}) ≤ χ_{0.95}^{2}(*m*) (22) for *k* = 1, 2, …, *K*, where χ_{0.95}^{2}(*m*) gives the 0.95 quantile of the χ^{2} distribution with degrees of freedom equal to the dimension *m* of the covariate. The coverage probability up to time *t*_{k} is given by *s*_{k}/*k*, where *s*_{k} is the number of times the true covariate falls within the confidence regions during the time interval (0, *k*Δ]. In the decoding analysis we compute the mean of the coverage probability over the entire decoding period. A Monte Carlo simulation is employed to obtain the confidence intervals and coverage probability for the covariate in polar coordinates. We first use the estimated posterior covariance matrix to generate 10^{4} Gaussian-distributed samples centered at the current covariate estimates in Cartesian coordinates. Second, these random samples are converted to polar coordinates. Finally, the confidence intervals are computed from the distribution of the random samples in polar coordinates.
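The Monte Carlo construction of polar-coordinate confidence intervals can be sketched as follows (a hypothetical helper; angles are centered on the current estimate to avoid wrap-around at ±π):

```python
import numpy as np

def polar_confidence_intervals(x_hat, W_post, n_samples=10_000, level=0.95, rng=None):
    """Monte Carlo CIs for speed and movement direction from a Gaussian
    posterior over Cartesian velocity (x_hat: length-2 mean, W_post: 2x2 cov)."""
    rng = rng or np.random.default_rng()
    samples = rng.multivariate_normal(x_hat, W_post, size=n_samples)
    speed = np.hypot(samples[:, 0], samples[:, 1])
    # center sampled angles on the estimated direction before taking quantiles
    theta_hat = np.arctan2(x_hat[1], x_hat[0])
    dtheta = np.angle(np.exp(1j * (np.arctan2(samples[:, 1], samples[:, 0]) - theta_hat)))
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    speed_ci = np.quantile(speed, [lo, hi])
    dir_ci = theta_hat + np.quantile(dtheta, [lo, hi])
    return speed_ci, dir_ci
```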

## RESULTS

The proposed point process framework is illustrated with 2 examples. The first one is applied to simulated neural spike data and the second to multiple single units simultaneously recorded from monkey primary motor cortex. For the discrete time representation of the neural point process we set Δ = 1 ms.

### Simulation study

The goal of the simulation study is 2-fold. First, we illustrate the main properties of the model in *Eq. 11* containing the autoregressive history, neural ensemble history, and motor covariate effects. Second, we demonstrate that the parameters of the simulated model are accurately recovered from relatively small spike data sets by maximum likelihood estimation implemented with the IRLS algorithm.

The conditional intensity functions of 6 neurons (A, B, C, D, E, F) were simulated using methods described in Ogata (1981). The intensity of 5 of them (B–F) was given by the velocity model (*Eq. 10*); that is, the neurons were modeled as inhomogeneous Poisson processes with mean background spiking rates of 17, 16, 9, 8, and 7 Hz, respectively, and inhomogeneity introduced by the modulating hand velocity signal. Different velocity tuning functions were used for the set of cells. The hand velocity signal was sampled from actual hand trajectories performed by a monkey (see *Application to MI spiking data*, below). The conditional intensity function for neuron A was given by the autoregressive spiking history plus ensemble and velocity model (*Eq. 11*). The background mean firing rate of this neuron was set to 10 Hz. The autoregressive spiking history component contained 120 coefficients covering 120 ms of spiking history (see Fig. 2*B*). The autoregressive coefficients mimicked the effects of refractory–recovery periods and rebound excitation. Of the 5 ensemble neurons, only 2 contributed to neuron A's intensity: excitatory effects from neuron B and inhibitory effects from neuron C, each at 3 time lags (−1, −2, and −3 ms).

The simulation scheme worked as follows. Starting with the initial simulation time step, first the conditional intensity functions were updated and then, at the same time step, the spiking activities for all of the cells were simulated. The simulation then moved to the next time step. The conditional intensity functions were updated based on the past intrinsic and ensemble spiking history (neuron A only) and on the current hand velocity state (all neurons).
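The per-bin simulation loop can be sketched as below, with a Bernoulli spike drawn in each 1-ms bin from the updated intensity. The log-linear form and the parameter shapes are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def simulate_ensemble(velocity_drive, beta_hist, beta_ens, dt=1e-3, rng=None):
    """Discrete time sketch of the simulation loop: at each 1-ms step, update
    each cell's conditional intensity from past spiking and the current
    velocity drive, then draw a Bernoulli spike.

    velocity_drive : (K, C) log-intensity contribution of velocity per cell
    beta_hist      : (C, H) intrinsic history coefficients (zeros -> Poisson cell)
    beta_ens       : (C, C, L) ensemble coupling coefficients at lags 1..L."""
    rng = rng or np.random.default_rng()
    K, C = velocity_drive.shape
    H, L = beta_hist.shape[1], beta_ens.shape[2]
    dN = np.zeros((K, C), dtype=int)
    for k in range(K):
        for c in range(C):
            log_lam = velocity_drive[k, c]
            # intrinsic history: own spikes at lags 1..H
            for j in range(1, min(H, k) + 1):
                log_lam += beta_hist[c, j - 1] * dN[k - j, c]
            # ensemble history: coupling to the ensemble's spikes at lags 1..L
            for j in range(1, min(L, k) + 1):
                log_lam += beta_ens[c, :, j - 1] @ dN[k - j]
            p = 1.0 - np.exp(-np.exp(log_lam) * dt)  # P(spike in this bin)
            dN[k, c] = rng.random() < p
    return dN
```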

The main features of the conditional intensity function model in *Eq. 11* can be observed in Fig. 1, where the simulated conditional intensity function of neuron A and its own spiking activity are plotted together with the activity of the other 5 neurons and the contribution of the velocity signal. The simulated conditional intensity function clearly shows the dependence on spiking history: after a spike, the intensity drops to almost zero and slowly recovers, reaching a period of higher than background spiking probability at about 20 ms after the spike. Fast excitatory and inhibitory effects follow the spikes of neurons B and C. Spiking history, neural ensemble, and velocity modulate each other's contributions in a multiplicative fashion.

From the simulated ensemble spike trains and from the velocity signal, we then estimated the conditional intensity function generating the spiking activity of neuron A. The data set entering the estimation algorithm thus consisted of 6 simulated spike trains, each 200 s long, and of the hand velocity time series in polar coordinates. The spike train for the modeled neuron A contained 2,867 spikes. The parametric model for the estimated conditional intensity function consisted of a background mean, 120 autoregressive coefficients, and 5 regressive coefficients for each of the other 5 neurons (i.e., 25 coefficients in total). Each set of 5 coefficients related to spiking activity at lags −1, −2, …, −5 ms. The IRLS algorithm converged in 12 iterations (tolerance was set to 10^{−6}). Statistically nonsignificant parameters (*P* > 0.001) were set to zero (see methods section). The true model parameters used in the simulation of neuron A were accurately recovered (Fig. 2, *B* and *C*), with the estimated model passing the K-S goodness-of-fit test (Fig. 2*D*). Parameter estimation on smaller data sets (about 50 s of data) led to similar successful fittings.
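A bare-bones IRLS loop for the Poisson GLM with log link, equivalent to what standard GLM packages perform, might look like this (function name ours; for 1-ms bins, *y* is the 0/1 spike indicator and the design matrix holds the constant, history, ensemble, and covariate terms):

```python
import numpy as np

def fit_poisson_glm(X, y, tol=1e-6, max_iter=50):
    """IRLS fit of a Poisson GLM with log link: E[y] = exp(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # working response and weights for the log link
        z = eta + (y - mu) / mu
        W = mu
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Each iteration solves a weighted least squares problem, so convergence in roughly a dozen iterations, as observed above, is typical.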

### Application to MI spiking data

Experimental data were obtained from the MI area of a behaving monkey. Details of the basic recording hardware and protocols are available elsewhere (Donoghue et al. 1998; Maynard et al. 1999). After task training, a Bionic Technologies LLC (BTL, Salt Lake City, UT) 100-electrode silicon array was implanted in the area of MI corresponding to the arm representation. One monkey (*M. mulatta*) was operantly conditioned to track a smoothly and randomly moving visual target. The monkey viewed a computer monitor and gripped a 2-link, low-friction manipulandum that constrained hand movement to a horizontal plane. The hand (*x, y*) position signal was digitized and resampled to 1 kHz. Low-pass–filtered finite differences of position data were used to obtain hand velocities. Some 130 trials (8–9 s each) were recorded. More details about the statistical properties of the distributions for hand position and velocity, spike sorting methods, and other task details can be found in Paninski et al. (2004).

Models including 1, 2, or all of the 3 types of covariates were analyzed. To start, we focus on the analysis of the velocity and the autoregressive spiking history plus velocity models. Later, we also compare these 2 models using neural decoding based on the observation of the entire ensemble of cells. For this reason, we analyzed these 2 models for each of the 20 cells in the ensemble. More detailed analysis involving K-S plots, point process residuals, and AIC model comparison will be illustrated for one typical cell.

##### K-S GOODNESS-OF-FIT ANALYSIS FOR THE VELOCITY AND THE AUTOREGRESSIVE SPIKING HISTORY PLUS VELOCITY MODELS.

The tuning functions obtained from the velocity model (*Eq. 10*) are shown in Fig. 3. This model was statistically significant for all of the cells. Preferred direction was diverse across cells, covering the range of possible directions. The corresponding K-S plots are shown in Fig. 4. The quantiles refer to the *z*_{(j)}s (*Eq. 14*) and the cumulative distribution function (CDF) refers to the expected uniform distribution for the case when the estimated conditional intensity model was equivalent to the true one. The velocity model tends to overestimate (underestimate) the conditional intensity at lower (middle) quantiles. Introduction of the autoregressive spiking history component (*Eq. 7*) into the velocity model greatly improved the explanation of the spiking process, almost completely eliminating both the over- and underestimation of the intensity. The maximum order of the autoregressive component was about 120 (i.e., the component incorporated history effects spanning 120 ms into the past). The most significant history effects extended to 60 ms in the past. For the majority of the cells, this component seemed to capture mostly 3 main history effects: refractory and recovery periods followed by an increase in the firing probability around 20 ms after a spike (see Fig. 5*B*). It should be noted that the autoregressive coefficients could also have reflected dynamical network properties of nonmeasured neural ensembles, such as networks of excitatory and inhibitory neurons in which the modeled cell is embedded, or nonmeasured fast extrinsic covariates. No significant differences in the K-S plots were observed between a pure autoregressive history model and the autoregressive history plus velocity model (not shown).
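The K-S construction based on the time-rescaling theorem can be sketched as follows, assuming a binned intensity estimate and a 0/1 spike train (function name ours). Under the true model the rescaled ISIs are Exp(1), so the *z*_{j} = 1 − exp(−τ_{j}) should be uniform on [0, 1):

```python
import numpy as np

def ks_time_rescaling(lam, spikes, dt=1e-3):
    """Kolmogorov-Smirnov statistic from the time-rescaling theorem.

    lam    : (K,) modeled conditional intensity per bin (Hz)
    spikes : (K,) 0/1 spike indicators.
    Returns the K-S statistic and the approximate 95% acceptance band."""
    Lambda = np.cumsum(lam * dt)              # integrated intensity
    spike_idx = np.flatnonzero(spikes)
    tau = np.diff(Lambda[spike_idx])          # rescaled interspike intervals
    z = np.sort(1.0 - np.exp(-tau))           # should be ~ Uniform(0, 1)
    n = len(z)
    b = (np.arange(1, n + 1) - 0.5) / n       # model quantiles
    ks_stat = np.max(np.abs(z - b))
    ci95 = 1.36 / np.sqrt(n)                  # approximate 95% band half-width
    return ks_stat, ci95
```

A model whose K-S plot stays inside the ±1.36/√n band, as for the autoregressive spiking history plus velocity model above, is not rejected at the 0.05 level.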

Figure 5, *C* and *D* summarize the above observations for a typical cell (cell 75a, 29,971 spikes over 130 trials, i.e., ∼1,040 s) in this example set and relate the fitting problems of the velocity model to the original nontime rescaled ISIs. In the velocity model, the intensity is overestimated (mean ratio *R* < 1) for ISIs below 10 ms and underestimated (mean ratio *R* > 1) for ISIs in the interval 10 to about 40 ms (Fig. 5*D*). The overestimation is likely a reflection of a refractory–recovery period (up to ∼10 ms) after the cell has spiked, which is not captured by the velocity model. The underestimation reflects a period of increased firing probability that follows the recovery period. These 2 different regimes are reasonably well captured by the coefficients of the autoregressive component (see Fig. 5*B*), thus resulting in the improved fit observed for the autoregressive spiking history plus velocity model. Introduction of the autoregressive component makes the observed density for the *z*_{j} values much closer to the expected uniform density (Fig. 5*C*).

The K-S statistic measures how close rescaled times are to being uniformly distributed on [0, 1). In addition, a good model should also generate independent and identically distributed rescaled times. To illustrate this point, we checked for temporal correlations at lag 1 in the time-rescaled ISIs (Fig. 6). As expected, some temporal structure remains in the case of the velocity model (*r*^{2} = 0.25, *P* = 10^{−6}), whereas this structure is effectively insignificant for the velocity plus autoregressive spiking history model (*r*^{2} = 0.002, *P* = 10^{−6}). The cross-correlation function computed over a broad range of lags was consistent with this result.
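The lag-1 independence check on the rescaled quantiles *z*_{j} can be computed directly (a hypothetical helper; under a correct model the *z*_{j} are i.i.d., so *r*^{2} should be near zero):

```python
import numpy as np

def lag1_correlation(z):
    """Squared lag-1 correlation of the time-rescaled ISI quantiles z_j."""
    r = np.corrcoef(z[:-1], z[1:])[0, 1]
    return r * r
```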

##### POINT PROCESS RESIDUAL ANALYSIS.

Even though the parameters for the velocity model were statistically significant, the K-S plot analysis showed that the velocity model fell short of explaining the entire statistical structure in the observed single-cell spiking activity. It thus remains to be seen how well this model captured the relationship between hand velocity and spiking activity. Besides neural decoding, another approach to address this problem is to measure the correlations among the point process residuals as defined in *Eq. 15* and the movement velocity. Existence of correlations would imply that there is some structure left in the residuals that is not captured by the velocity model. On the other hand, a decrease in the correlation level with respect to some other model would imply that the velocity model does capture some of the structure in the spiking activity related to hand velocity.

We computed the correlations for the residuals from the autoregressive spiking history model (*Eq. 7*) and compared them to the correlations for the residuals from the velocity and from the autoregressive spiking history plus velocity models. Residuals were computed for nonoverlapping 200-ms moving windows (Fig. 7). Cross-correlation functions were computed between the residuals and the mean of the kinematic variables. Mean (*x, y*) velocities were computed for each time window and were used to obtain, in polar coordinates, the respective mean movement speed and direction. In the autoregressive model case, peak cross-correlation values between the residuals and direction, speed, and velocities in *x* and *y* coordinates were 0.29, 0.10, −0.17, and 0.50, respectively. For the autoregressive spiking history and velocity model, the peak cross-correlation values for the same variables were 0.08, 0.06, −0.12, and 0.28. This suggests that, for this particular neuron, the velocity model captures a significant amount of information about hand velocity available in the spiking activity. Nonetheless, it is also clear that there is residual structure in the spiking activity that is statistically related to the hand velocity in Cartesian coordinates and that is not captured by the autoregressive spiking history plus velocity model. Furthermore, the cross-correlation functions for both the velocity and the autoregressive spiking history plus velocity models show no significant differences, which suggests that the autoregressive component does not carry additional statistical information about hand velocity.
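A sketch of the windowed point process residuals and their cross-correlation with a kinematic covariate (function names and window conventions are ours):

```python
import numpy as np

def windowed_residuals(spikes, lam, win=200, dt=1e-3):
    """Point process residuals over nonoverlapping windows: observed spike
    count minus integrated modeled intensity in each window."""
    K = (len(spikes) // win) * win
    n_obs = spikes[:K].reshape(-1, win).sum(axis=1)
    n_exp = (lam[:K] * dt).reshape(-1, win).sum(axis=1)
    return n_obs - n_exp

def peak_crosscorr(resid, covariate, max_lag=10):
    """Peak normalized cross-correlation between residuals and a windowed
    covariate over lags -max_lag..max_lag (in units of windows)."""
    r = (resid - resid.mean()) / resid.std()
    c = (covariate - covariate.mean()) / covariate.std()
    vals = [np.corrcoef(r[max(0, -l):len(r) - max(0, l)],
                        c[max(0, l):len(c) - max(0, -l)])[0, 1]
            for l in range(-max_lag, max_lag + 1)]
    return max(vals, key=abs)
```

An intensity model that fails to account for a covariate leaves residuals that track that covariate; the better model drives the peak cross-correlation toward zero.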

##### MODEL COMPARISON.

We compared partial and complete (i.e., 1, 2, or 3 covariate types) conditional intensity function models for cell 75a. The ensemble model included spiking activity of each cell at 4 time lags (−1, −2, −3, and −4 ms). The ensemble rates included the spike counts of each cell at 3 nonoverlapping and lagged time windows. The length of each of the time windows, specified by the parameter *W* in *Eq. 9*, was 50 ms. The K-S plots in Fig. 8 reveal that the autoregressive spiking history plus velocity and the autoregressive spiking history plus velocity and ensemble rates models provided the best fits among all of the models for this specific cell. The inhomogeneous inverse Gaussian plus velocity and ensemble rates model performed better than the velocity, ensemble, and ensemble rates models. Inspection of the coefficients for the ensemble and ensemble rates models showed that the dependencies were statistically significant for many of the cells in the ensemble. Individual cells contributed either positive or negative effects to the conditional intensity function and the effective ensemble contribution to the modulation of the conditional intensity function could reach tens of hertz.

In the above K-S plot comparisons, some of the models had K-S statistics far from the 95% confidence intervals or nearly identical to those from other models, making a clear comparison difficult. The AIC analysis was then used to provide a more detailed comparison, as well as to take the complexity of the model (i.e., number of parameters) into consideration in the model comparison. Figure 9 shows the ranked models in terms of their difference with respect to the AIC of the best model. In this context, models with lower AIC difference values are considered better models.

Overall, this criterion provided a fine model ranking and suggested that models containing the autoregressive spiking history component performed better in each instance. Among the alternative models for spiking history, the autoregressive spiking history model performed better than the conditional intensity model based on the inhomogeneous inverse Gaussian ISI distribution model (*Eqs. 12*, *13*, and *A11*), both in the AIC and K-S goodness-of-fit analyses. Also, the ensemble rates model did better than models containing only the velocity covariate or the ensemble covariate at fine temporal precision.
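Under the discrete time point process likelihood, AIC ranking of candidate intensity models can be sketched as follows (function names ours; the log-likelihood is the discrete time point process form, up to a constant):

```python
import numpy as np

def aic_point_process(lam, spikes, n_params, dt=1e-3):
    """AIC for a discrete time point process model:
    -2 log-likelihood + 2 x number of parameters."""
    eps = 1e-12  # guard against log(0) in silent bins
    loglik = np.sum(spikes * np.log(lam * dt + eps) - lam * dt)
    return -2.0 * loglik + 2.0 * n_params

def rank_models(models, spikes, dt=1e-3):
    """models: {name: (lam_array, n_params)}.
    Returns (name, AIC difference to the best model), sorted best first."""
    aics = {name: aic_point_process(lam, spikes, p, dt)
            for name, (lam, p) in models.items()}
    best = min(aics.values())
    return sorted(((n, a - best) for n, a in aics.items()), key=lambda t: t[1])
```

Because each candidate's intensity is evaluated on the same spike train, the AIC differences directly rank the models, with the parameter penalty discouraging overfitting.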

##### VELOCITY AND MOVEMENT DIRECTION DECODING ANALYSIS.

The velocity (*Eq. 10*) and the autoregressive spiking history plus velocity models were used in the neural decoding of hand velocity. Models were fit to a training data set (120 trials, about 8–9 s each) and applied to decoding on a different test data set (10 trials, again about 8–9 s each). The state matrix **F** for the *AR*(1) state process (*Eq. 17*) was estimated to be diagonal with nonzero terms approximately equal to 0.99, and the noise covariance matrix **W**_{ε} to be diagonal with nonzero entries equal to 0.01. Figure 10 shows the resulting decoding of movement direction and, in Cartesian coordinates, the estimated (*x, y*) velocities for a single test trial based on the velocity model. Overall, decoding of movement direction was remarkably good. Decoded (*x, y*) velocities captured mostly slower fluctuations. To compare the decoding performance of the 2 models, we computed the coverage probability and the decoding error. Table 1 gives the mean values (across time and trials) for the coverage probabilities of the bivariate estimate (velocity magnitude and movement direction) and the coverage probabilities of the univariate estimate (velocity magnitude or movement direction). Mean coverage probability for the movement direction estimate was 0.94 for the velocity model. For the same model, coverage probabilities for the bivariate estimate and velocity magnitude were much smaller, consistent with the observation that the estimated velocities in Cartesian coordinates captured mostly slow fluctuations. Mean coverage probability, mean and median decoding errors, and confidence intervals for the decoding errors were not significantly different between the 2 models.

## DISCUSSION

An important problem in neurophysiology is determining the factors that affect a neuron's spiking behavior. To address this question we have presented a point process statistical framework that allowed us to characterize simultaneously the effects of several covariates on the spiking activity of an individual neuron. The 3 types of covariates we considered were the neuron's spiking history, past neural ensemble activity, and extrinsic covariates such as stimuli or behavior. Because defining the conditional intensity function defines a point process model, the explanatory contributions of the covariates were assessed by constructing conditional intensity function models in terms of these covariates and by using a likelihood-based estimation approach to fit these models to data and to assess goodness-of-fit.

Analyses that measure the simultaneous effects of different covariates are crucial because the covariates modulate the neural spiking activity at the same time. Currently used analysis methods do not allow for the modeling of the simultaneous effects of spiking history, neural ensemble, and extrinsic covariates on the spike train treated as a point process. To evaluate the relation between spiking activity and covariates, the spike train data are frequently transformed to a rate function and the relation between the rate function and the covariate is then assessed using regression methods (e.g., Ashe and Georgopoulos 1994; Luczak et al. 2004). The relation between a neuron's spiking activity, spiking history, and concurrent neural activity is usually assessed using autocorrelation and pairwise cross-correlation analyses performed directly on the spike train (e.g., Hatsopoulos et al. 1998). The use of different methods to assess individually the importance of these covariates precludes an analysis of the neural point process in which the relative importance of all covariates is assessed and may also lead to a misleading estimate of the covariate effects. For example, spiking history effects can interfere with the accurate estimation of extrinsic covariate effects in spike triggered averages and reverse correlation methods (Aguera y Arcas and Fairhall 2003; Aguera y Arcas et al. 2003). Additionally, current analysis techniques also assess the contribution of ensemble covariates separately. For instance, pairwise cross-correlation analyses measure the statistical association of a single neuron to each member of the ensemble separately but not the association between the single neuron's activity and the entire observed ensemble.

The key to our likelihood approach is representing the conditional intensity function of a single neuron in terms of the covariates. In this way, the covariates are directly related to the probability that the neuron spikes. Although this formulation can be used generally to analyze the relation between covariates and neural spiking activity, it usually requires writing a new algorithm or function to carry out the maximum likelihood estimation with each new model formulation. To make our point process likelihood approach more broadly applicable, we showed that by representing the logarithm of the conditional intensity function as a linear combination of general functions of the covariates, the conditional intensity function model could be efficiently fit to neural spike train data using the GLM framework under a Poisson distribution and a log link function (*Eqs. 3*, *A6*, and *A8*). We also showed that, equivalently, if the point process is represented as a conditionally dependent Bernoulli process (*Eqs. 2* and *A2*) and the probability of the events is modeled by a logistic function, then the point process model could also be fit using the GLM framework under a Bernoulli distribution and a logistic link function (*Eqs. 5* and *A10*). Regarding the time discretization, we note that the 1-ms discretization interval we chose should not be interpreted as the absolute discretization interval for all applications of the point process–GLM framework. How fine the discretization interval should be will depend on the particular problem under study.

Our use of the Poisson distribution to fit point process models does not mean that we are assuming that our original data are Poisson. We are rather exploiting for computational purposes the fact that all point process likelihoods that admit a conditional intensity function, including those that are history dependent, have the same mathematical form given by *Eq. A6*. Our analysis makes explicit the relation between likelihood methods for point processes, conditional Bernoulli processes, and GLM model fitting with Poisson or Bernoulli distributions. This relation provides a justification for using this GLM framework to analyze spike trains as point process data.

The point process–GLM framework for analyzing neural spike train data has several important advantages. First, this framework allows us to formulate complex models to relate the spiking activity to covariates. The fact that *Eq. 4* is written in terms of general functions of the covariates provides the framework with a very large class of possible models. Second, the GLM framework is part of several standard mathematical and statistical packages (e.g., Splus, Matlab, SPSS), so that the approach is readily accessible to experimentalists analyzing their data. Even though likelihood-based methods are highly desirable because of several optimality properties, the biggest impediment to their widespread use is the lack of readily available software in which a flexible class of neural spike train models can be easily applied to data. The point process–GLM framework offers a practical, broadly applicable solution to the computational problem of fitting potentially complex point process models for neural spike trains by maximum likelihood.

Third, the point process–GLM framework makes it possible to apply a set of goodness-of-fit tools for point processes not available in the GLM. These are the point process residuals analysis, goodness-of-fit tests based on the time-rescaling theorem, and decoding from point process observations. Fourth, the model selection and goodness-of-fit methods available in GLM are extended to spike train data. Thus we have a set of complementary methods to assess the extent to which proposed models explain the structure in neural spike trains. Although we used the point process–GLM framework to carry out most of the analysis, we also illustrated with the conditional intensity function based on the inhomogeneous inverse Gaussian model how non-GLM point process likelihood models may be used to analyze neural spike train data. Finally, by analogy with the way in which linear regression methods are used to analyze the relation between a continuous dependent variable and a set of candidate explanatory variables, the ready availability of software to implement GLMs also makes it possible for neurophysiologists to quickly assess the relevance of a wide range of covariates before proceeding to construct more specific models that may require non-GLM algorithms to carry out the model fitting.

A key objective of the proposed framework is to provide tools for the assessment of the relative importance of the covariates on neural spiking activity. We showed how this objective is accomplished by analyzing the goodness-of-fit of each model we proposed. Because no method provides a complete assessment of goodness-of-fit, we used the standard statistical approach of using multiple complementary measures that evaluate different aspects of the model's agreement with the data. We applied 4 complementary types of goodness-of-fit techniques in our example data set: Kolmogorov–Smirnov tests based on the time-rescaling theorem, model selection by AIC, point process residual analysis, and neural spike train decoding based on point process observations.

First, the K-S analyses performed on the example data problems were useful in providing a sense of how close our models were to capturing the stochastic structure in the example data sets. The fact that some of our models captured most of the statistical structure in the spiking activity (Figs. 4 and 8) suggests that developing a parsimonious statistical model of MI activity is a realistic goal. The K-S plots also highlighted the importance of the spiking history. The fact that the coefficients had a structure that seemed to reflect mostly refractory and recovery periods together with rebound excitation in short time scales (effectively shorter than 60 ms) suggests that the autoregressive component successfully captured important history effects.

Second, the AIC analysis provided additional information for model comparison when the K-S plots did not distinguish between different models. By finely ranking the different models, the AIC analysis allowed for the assessment of the distinct effects and relative importance of the 3 types of covariates. Our choice of the AIC for model comparison was motivated by the fact that AIC was derived to be an estimate of the expected relative Kullback–Leibler distance between the distributions generated by the model and the distribution of the underlying stochastic process generating the data. The “true” underlying model does not need to be in the set of tested models for the AIC analysis to suggest the most appropriate model, a situation that we believe is more often the rule than the exception in biology. Because the AIC is penalized with increasing numbers of model parameters, its use is more appropriate than the use of the data likelihood itself in preventing overfitting of the data by the model. Additionally, an equivalence between AIC and cross-validation in model comparison problems has been established (Stone 1977), and model-based penalty methods (e.g., AIC) outperform cross-validation in important aspects when assessing prediction performance (Efron 2004). We also computed the Bayesian information criterion (BIC) (Schwarz 1978), an alternate, more conservative criterion function for each of our models and found that it yielded the same model rankings as did the AIC. When comparing complex models, another protection against overfitting comes from having large quantities of data as compared to the number of model parameters. In our example the >10^{6} data observations far outnumbered the maximum 200 or so parameters in our most complex models. 
Apparent inconsistencies between AIC and K-S analyses could occur in cases where, for example, the conditional intensity model is more likely to produce the observed spike train as a whole, but is less accurate in describing a specific aspect of the data structure, such as the regime of small ISIs. This might have been the case when comparing the AIC and K-S plots results for the autoregressive spiking history plus velocity and ensemble model and a simpler autoregressive spiking history plus velocity model (Figs. 8 and 9).

Third, we illustrated how the point process residual analysis can be used to assess the contribution of an extrinsic covariate to a single neuron's spiking activity. In the illustration example, the residual analysis showed that the introduction of the velocity covariate captured a significant amount of the statistical structure related to hand velocity available in the spiking activity of a single neuron. Yet, for the particular cell chosen in this study, the analysis was also able to reveal that there still was a significant amount of structure in the residuals that was correlated to hand velocity but that was not captured by this specific form of the velocity model. Cross-correlation analysis of the point process residuals and extrinsic covariates is thus an important tool for assessing whether a particular model has captured well the effects of the covariate on the spiking activity. The ideal model should produce a residual with no significant correlations to the modeled covariate. It should also be noted that, unlike the decoding analysis, the point process residual analysis is not dependent on the properties of a decoding algorithm.

Fourth, complementing the above 3 goodness-of-fit analysis tools, the spike train decoding allowed for the goodness-of-fit assessment at the neural ensemble level. In conjunction with understanding what makes a neuron spike, we are also interested in assessing how well a model captures the representation of an extrinsic covariate at the ensemble level. At present, decoding is the only technique we have for assessing goodness-of-fit at this level. The key elements for assessing goodness-of-fit in the decoding analysis were the predicted signal and its confidence intervals and coverage probability, and especially the estimation error and its confidence intervals. The confidence intervals and the coverage probability based on the estimated posterior covariance matrix provided an estimate of the amount of uncertainty in the decoding algorithm, whereas the decoding error and its distribution provided a measure of the algorithm's actual performance. For this reason, the mean coverage probability should be interpreted in conjunction with the mean decoding error. As suggested by the narrow distribution of the decoding error and approximately 0.95 mean coverage probability (Table 1), hand movement direction was remarkably well decoded. Velocity estimates in Cartesian coordinates captured reasonably well slow fluctuations in the measured hand velocity. We also illustrated how to assess the contribution of the autoregressive spiking history component to neural decoding. The autoregressive spiking history plus velocity and the velocity models performed similarly well. This preliminary result suggests that short time dynamics captured by the autoregressive component did not play a crucial role in decoding hand velocity or movement direction in this data set. Given that the models used in the decoding analysis did not include the ensemble or ensemble rates covariate, the ensemble decoding assumed independent encoder cells. 
Nonetheless, the framework also allows the assessment of the contribution of interaction patterns in the neural ensemble to decoding. That could be easily achieved by extending the conditional intensity models to include the ensemble covariates. This analysis is beyond the scope of this paper and will be addressed elsewhere.

In summary, the above 4 complementary goodness-of-fit and model selection analyses are an essential step for achieving our primary objective of assessing the effects and relative importance of the modeled covariates. The proposed point process framework provides a starting point for building and analyzing complex models of spiking activity including the 3 types of covariates discussed in this paper. In particular, the point process–GLM framework provides a systematic approach to neural spike train data analysis analogous to that for continuous-valued variables under the linear Gaussian regression framework.

We foresee several potential improvements and extensions of this framework. Although we were able to fit models with hundreds of parameters, much larger models will require the development of efficient algorithms for both GLM and non-GLM computations. Also, to analyze the relation between the simultaneous firing of an entire ensemble and its spiking history together with a set of extrinsic covariates, the framework can be extended with multivariate point process likelihoods (Chornoboy et al. 1988). A multivariate likelihood model will facilitate the study of independence, redundancy, and synergy in the ensemble representation. Finally, multielectrode devices (Csicsvari et al. 2003; Donoghue 2002; Nicolelis et al. 2003) now make it possible to record simultaneously from multiple single cells in many different brain areas. We foresee the proposed framework as a valuable tool for investigating how interacting brain regions represent, compute, and process information. We are currently applying this framework to the analysis of parietal and MI spiking activity in monkeys performing visuomotor tracking tasks and of hippocampal activity in rats performing a range of learning tasks.

## APPENDIX

### Continuous and discrete time point process likelihood function

The likelihood of a neural spike train, like that of any statistical model, is defined by finding the joint probability density of the data. We show below that the joint probability of any point process is easy to derive from the conditional intensity function. We show that the point process likelihood function in *Eqs. 2* and *3* gives a discrete time approximation of the likelihood function for a continuous time point process (*Eq. A6* below).

Let 0 < *u*_{1} < *u*_{2} < … < *u*_{J} < *T* be a set of neural spike train measurements. Using the discrete time representation given in the methods section, define the events (A1) for *k* = 1, …, *K*, where *A*_{k}^{c} is the complement of *A*_{k}. For simplicity, *H*_{k} includes only the intrinsic history of the process; it can easily be extended to incorporate neural ensemble activity and other extrinsic covariates. By construction of the partition of the interval (0, *T*] introduced in the methods section, we must have *u*_{j} ∈ (*t*_{kj−1}, *t*_{kj}], *j* = 1, …, *J*, for a subset of the intervals satisfying *k*_{1} < *k*_{2} < … < *k*_{J}. The remaining *K* − *J* intervals contain no spikes.

The probability of exactly *J* events occurring within the intervals (*t*_{kj−1}, *t*_{kj}]_{j=1}^{J} in (0, *T*] may then be computed as (A2) by the definition of *A*_{k} and *E*_{k} in *Eq. A1*.

The spike train thus forms a sequence of conditionally independent Bernoulli trials, with the probability of a spike in the *k*^{th} time interval given by *P*(*A*_{k}). In any interval (*t*_{k−1}, *t*_{k}] we have (A3). Substituting *Eq. A3* into *Eq. A2* yields (A4), which is *Eq. 2*. For small Δ, [1 − λ(*t*_{k})Δ] ≈ exp{−λ(*t*_{k})Δ} and log [λ(*t*_{k})Δ[1 − λ(*t*_{k})Δ]^{−1}] ≈ log (λ(*t*_{k})Δ); therefore we obtain (A5). The probability density of these *J* exact spike times in (0, *T*], given by *p*(*N*_{0:T}) = lim_{Δ→0} *P*(*N*_{1:K})/Δ^{J}, is then obtained as (A6), which is the joint probability density of the point process spike train in continuous time (Brown et al. 2003; Daley and Vere-Jones 2003). Note that we could have derived the likelihood for the continuous time point process (and therefore also *Eq. 3*) by a generalization of the continuous time Poisson process (Daley and Vere-Jones 2003), without representing the neural point process as a conditional Bernoulli process. We formulated the spike train joint probability in terms of *Eq. A2* only to show (see below) the equivalence between the Poisson and Bernoulli–GLMs when Δ is sufficiently small.
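The discrete time likelihood described above is straightforward to compute. The following sketch (an illustration, not code from the paper; the constant 20-Hz intensity, bin size, and function name are hypothetical) evaluates both the exact Bernoulli product and its small-Δ Poisson approximation for a simulated spike train:

```python
import numpy as np

def discrete_loglik(lam, dN, dt, form="bernoulli"):
    """Discrete time point process log-likelihood.

    lam : conditional intensity in each bin (Hz)
    dN  : 0/1 spike indicator per bin
    dt  : bin width Delta (s)
    """
    p = lam * dt  # P(spike in bin) ~ lambda * Delta for small Delta
    if form == "bernoulli":
        # exact conditionally independent Bernoulli product
        return np.sum(dN * np.log(p) + (1 - dN) * np.log(1 - p))
    # Poisson approximation: log intensity at spikes minus integrated intensity
    return np.sum(dN * np.log(p)) - np.sum(lam * dt)

# Hypothetical example: constant 20-Hz intensity, 1-ms bins, 10-s record
rng = np.random.default_rng(0)
dt, lam0, K = 0.001, 20.0, 10_000
dN = (rng.random(K) < lam0 * dt).astype(float)
lam = np.full(K, lam0)
ll_b = discrete_loglik(lam, dN, dt, "bernoulli")
ll_p = discrete_loglik(lam, dN, dt, "poisson")
print(ll_b, ll_p, abs(ll_b - ll_p))
```

For 1-ms bins the two log-likelihoods differ by well under 1%, consistent with the approximation [1 − λ(*t*_{k})Δ] ≈ exp{−λ(*t*_{k})Δ}.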

### The Poisson and Bernoulli–GLMs

We briefly define a generalized linear model and show that, for small enough Δ, the Bernoulli and Poisson–GLMs are equivalent for modeling spike train data.

Two main aspects characterize a generalized linear model of a random variable *y* (McCullagh and Nelder 1989). First, the modeled random variable *y* has a distribution in the exponential family, whose members include the Gaussian, Poisson, and Bernoulli distributions. The exponential family has the general form (A7), where *a*( · ), *b*( · ), and *c*( · ) are specific functions. If φ is known, this is an exponential-family model with canonical parameter **Θ**.

For the particular case of the Poisson distribution, **Θ** = log λ, *b*(**Θ**) = *e*^{**Θ**}, *a*(φ) = φ = 1, and *c*(*y*, φ) = −log (*y*!). The location and scale parameters are λ and φ, respectively. Thus the distribution in *Eq. A7* can be expressed as *f*(*y* | λ) = exp{*y* log λ − λ − log (*y*!)} = λ^{y}*e*^{−λ}/*y*!. Note that the canonical parameter **Θ** has, in the Poisson case, a natural representation in terms of the logarithm of the parameter λ. The joint probability distribution for an independently distributed data set **y** = {*y*_{k}}_{k=1}^{K} becomes (A8). If the rate function λ_{k} of this Poisson process is generalized to the conditional intensity function (*Eq. 1*) and *y*_{k} = Δ*N*_{k} with Δ = 1, then *Eq. A8* has the same form as the general likelihood function for any discrete time point process (*Eqs. 3* and *A5*).
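Because the Poisson log-likelihood with a log link is concave in θ, Newton's method (equivalently, iteratively reweighted least squares) maximizes it reliably. The sketch below is an illustration under assumed conditions, not the authors' code: the single covariate, the parameter values, and the function name `fit_poisson_glm` are hypothetical.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50):
    """Fit a log-link Poisson GLM by Newton's method (IRLS).

    Maximizes sum_k { y_k * (x_k' theta) - exp(x_k' theta) },
    i.e., the Poisson log-likelihood with lambda_k = exp(x_k' theta).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ theta)               # lambda_k = exp(x_k' theta)
        grad = X.T @ (y - mu)                # score vector
        fisher = X.T @ (X * mu[:, None])     # Fisher information matrix
        theta = theta + np.linalg.solve(fisher, grad)
    return theta

# Hypothetical simulation: binned counts driven by one covariate
rng = np.random.default_rng(1)
K = 5000
v = rng.standard_normal(K)
X = np.column_stack([np.ones(K), v])
theta_true = np.array([-2.0, 0.5])           # log lambda = -2 + 0.5 v
y = rng.poisson(np.exp(X @ theta_true))
theta_hat = fit_poisson_glm(X, y)
print(theta_hat)  # should lie close to theta_true
```

In practice, off-the-shelf GLM routines perform exactly this iteration; the point of the sketch is that the estimator follows directly from the likelihood form above.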

For the Bernoulli case, we let *y* ∈ {0, 1} with the probability of success denoted by *P*, and set **Θ** = log ([1 − *P*]^{−1}*P*), *b*(**Θ**) = log (1 + *e*^{**Θ**}), *a*(φ) = φ, *c*( · ) = 1, and φ = 1. Thus, for single realizations we have *p*(*y* | *P*) = *P*^{y}(1 − *P*)^{1−y}. Given an independently distributed data set **y** = {*y*_{k}}_{k=1}^{K}, the likelihood function under the Bernoulli distribution becomes (A9). By letting [*P*_{k}]^{y_{k}} = *P*(*A*_{k})^{ΔN_{k}} we obtain *Eq. A2*.

Second, the defining feature of a generalized linear model follows: the canonical parameter of the exponential family is expressed in linear form through a transformation given by a monotonic differentiable link function. In the Poisson case, if the canonical parameter is modeled as a linear combination of general functions of covariates **v** of interest (that is, **Θ** = log λ(θ, **v**) = Σ_{i=1}^{q} θ_{i}*g*_{i}(*v*_{i}) as in *Eq. 4*, or equivalently λ(θ, **v**) = exp{Σ_{i=1}^{q} θ_{i}*g*_{i}(*v*_{i})} as in *Eqs. 5*–*10*), then *f*(*y* | **Θ**, φ) ∝ exp{*y* log λ(θ, **v**) − λ(θ, **v**)} gives the distribution for a GLM under a Poisson distribution and a log link function. In the Bernoulli case, if **Θ** = log ([1 − *P*(*A*_{k} | θ, *H*_{k})]^{−1}*P*(*A*_{k} | θ, *H*_{k})) is modeled as a linear combination of general functions of the covariates, then *Eqs. 2* and *A2* give the likelihood function for a GLM under a Bernoulli distribution and a *logistic* link function.

Finally, we establish the relation between the Poisson and Bernoulli–GLMs in the context of neural point process models. After making explicit the parametric model of the conditional intensity function, the probability of a spike event in the time interval (*t*_{k−1}, *t*_{k}] is given by *P*(*A*_{k} | θ, *H*_{k}) = λ(*t*_{k} | θ, *H*_{k})Δ + *o*(Δ). For small Δ we have (A10). Therefore, for small enough Δ, the Bernoulli and Poisson–GLMs are equivalent when applied to the modeling of spike train data.
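The small-Δ equivalence can be checked numerically. In this hypothetical sketch the canonical parameter is set to Θ = log (λΔ), so the logistic link gives *P* = λΔ/(1 + λΔ), which converges to the Poisson probability λΔ as the bin width shrinks; the 50-Hz intensity is an arbitrary choice for illustration.

```python
import numpy as np

lam = 50.0  # hypothetical conditional intensity (Hz)
for dt in (1e-2, 1e-3, 1e-4):
    theta = np.log(lam * dt)                # canonical parameter log(lambda * Delta)
    p_bern = 1.0 / (1.0 + np.exp(-theta))   # logistic link: lambda*Delta / (1 + lambda*Delta)
    p_pois = lam * dt                       # Poisson spike probability per bin
    print(dt, p_bern, p_pois, abs(p_bern - p_pois) / p_pois)
```

The relative discrepancy is itself of order λΔ, so halving the bin width roughly halves the disagreement between the two GLMs.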

### The IIG model for the ISI probability density

For a particular cell, let *t*_{e}=*t*_{k}−*u*_{Nk−1} denote the time elapsed since the last spike *u*_{Nk−1}. The inhomogeneous inverse Gaussian ISI probability density function conditioned on the motor covariate and neural ensemble activity is defined as (A11) where *G* = {*u*_{Nk−1}, **x**_{k+τ}, *N*_{1:k}^{1:C}, θ}, ψ > 0 is the location parameter and *s*(*t*_{k} | · ) > 0 is the scaling parameter at time *t*_{k}, conditioned on the extrinsic and ensemble rates covariates as given in *Eq. 13*.
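For intuition about *Eq. A11*, the sketch below evaluates the standard (homogeneous) inverse Gaussian density on which the IIG model is built. This is an assumption-laden simplification: the time-varying scaling *s*(*t*_{k} | · ) is replaced by a fixed mean, and the 40-ms mean ISI and shape value are hypothetical.

```python
import numpy as np

def inverse_gaussian_pdf(t, mean, shape):
    """Standard inverse Gaussian (Wald) ISI density.

    f(t) = sqrt(shape / (2 pi t^3)) * exp(-shape (t - mean)^2 / (2 mean^2 t))
    In the IIG model the mean would be modulated over time by the
    extrinsic and ensemble rates covariates; here it is held fixed.
    """
    return np.sqrt(shape / (2 * np.pi * t**3)) * np.exp(
        -shape * (t - mean) ** 2 / (2 * mean**2 * t)
    )

# Hypothetical parameters: 40-ms mean ISI, shape parameter 0.1 s
t = np.linspace(1e-4, 1.0, 100_000)
f = inverse_gaussian_pdf(t, mean=0.040, shape=0.1)

# Trapezoidal rule: a proper density should integrate to ~1 over its support
mass = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))
print(mass)
```

The density is strongly right-skewed with essentially no mass near zero, which is what makes the inverse Gaussian a natural ISI model for a neuron with a refractory-like delay after each spike.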

## GRANTS

This work was supported in part by Burroughs Wellcome Fund, Keck Foundation, National Institute of Neurological Disorders and Stroke Neural Prosthesis Program, National Institutes of Health Grants NS-25074, MH-59733, MH-61637, MH-65018, and DA-015644, National Science Foundation Grant 0081548, and grants from Defense Advanced Research Projects Agency and Office of Naval Research.

## DISCLOSURE

J. P. Donoghue is a cofounder and shareholder in Cyberkinetics, Inc., a neurotechnology company that is developing neural prosthetic devices.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2005 by the American Physiological Society