JN Fuel your research with LabChart
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


J Neurophysiol 91: 2910-2928, 2004. First published January 28, 2004; doi:10.1152/jn.00227.2003
0022-3077/04 $5.00
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
91/6/2910    most recent
00227.2003v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via ISI Web of Science (8)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Pouzat, C.
Right arrow Articles by Diebolt, J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Pouzat, C.
Right arrow Articles by Diebolt, J.

Innovative Methodology

Improved Spike-Sorting By Modeling Firing Statistics and Burst-Dependent Spike Amplitude Attenuation: A Markov Chain Monte Carlo Approach

Christophe Pouzat1, Matthieu Delescluse1, Pascal Viot2 and Jean Diebolt3

1Laboratoire de Physiologie Cérébrale, Centre National de la Recherche Scientifique (CNRS) Unité Mixte de Recherche (UMR) 8118, Université René Descartes, 75006, Paris; 2Laboratoire de Physique Théorique des Liquides, CNRS UMR 7600, Université Pierre et Marie Curie, 75252 Paris Cedex 05; and 3Laboratoire d'Analyse et de Mathématiques Appliquées, CNRS UMR 8050, Université de Marne la Vallée, Batiment Copernic, Cité Descartes, 77454 Marne la Vallée Cedex, France

Submitted 10 March 2003; accepted in final form 14 January 2004


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 METHODS
 RESULTS
 DISCUSSION
 APPENDIX
 ACKNOWLEDGMENTS
 REFERENCES
 
Spike-sorting techniques attempt to classify a series of noisy electrical waveforms according to the identity of the neurons that generated them. Existing techniques perform this classification ignoring several properties of actual neurons that can ultimately improve classification performance. In this study, we propose a more realistic spike train generation model. It incorporates both a description of "nontrivial" (i.e., non-Poisson) neuronal discharge statistics and a description of spike waveform dynamics (e.g., the events amplitude decays for short interspike intervals). We show that this spike train generation model is analogous to a one-dimensional Potts spin-glass model. We can therefore tailor to our particular case the computational methods that have been developed in fields where Potts models are extensively used, including statistical physics and image restoration. These methods are based on the construction of a Markov chain in the space of model parameters and spike train configurations, where a configuration is defined by specifying a neuron of origin for each spike. This Markov chain is built such that its unique stationary density is the posterior density of model parameters and configurations given the observed data. A Monte Carlo simulation of the Markov chain is then used to estimate the posterior density. We illustrate the way to build the transition matrix of the Markov chain with a simple, but realistic, model for data generation. We use simulated data to illustrate the performance of the method and to show that this approach can easily cope with neurons firing doublets of spikes and/or generating spikes with highly dynamic waveforms. The method cannot automatically find the "correct" number of neurons in the data. User input is required for this important problem and we illustrate how this can be done. We finally discuss further developments of the method.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 METHODS
 RESULTS
 DISCUSSION
 APPENDIX
 ACKNOWLEDGMENTS
 REFERENCES
 
The study of neuronal populations activity is one of the main experimental challenges of contemporary neuroscience. Among the techniques that have been developed and used to monitor populations of neurons, the multielectrodes extracellular recordings, albeit one of the oldest, still stands as one of the methods of choice. It is relatively simple and inexpensive to implement (especially when compared to imaging techniques) and it can potentially give access to the activity of many neurons with a fine time resolution (below the ms). The full exploitation of this method nevertheless requires some basic problems to be solved, among which stands prominently the spike-sorting problem. The raw data collected by extracellular recordings are of course corrupted by some recording noise but, more important, are a mixture of activities from different neurons. This means that even in good situations where action potentials or spikes can be unambiguously distinguished from background noise, the experimentalist is left with the intricate problem of finding out how many neurons are in fact contributing to the recorded data and, for each spike, which is the neuron of origin. This in essence is the spike-sorting problem. Extracellular recordings being an old method, the problem has been recognized as such for a long time and there is already a fairly long history of proposed solutions (Lewicki 1998Go). Nevertheless it seems to us that none of the available methods makes full use of the information present in the data. They consider mainly or exclusively the information provided by the waveform of the individual spikes and they neglect that provided by their occurrence time. Yet, the importance of the spikes occurrence times shows up in two ways.

1) First, the sequence of spike occurrence times emitted by a neuron has well-known features like the presence of a refractory period (e.g., after a spike has been emitted no other spike will be emitted by the same neuron for a period whose duration varies from 2 to 10 ms). Other features include an interspike interval (ISI) probability density, which is often unimodal and skewed (it rises "fast" and decays "slowly"). Although some methods make use of the presence of a refractory period (Fee et al. 1996aGo; Harris et al. 2000Go), none makes use of the full ISI density. This amounts to throwing useful information away, for if event 101 has been attributed to neuron 3 we not only know that no other spike can arise from neuron 3 for, say, the next 5 ms, but we also know that the next spike of this neuron is likely to occur with an ISI between 10 and 20 ms. As pointed out by Lewicki (1994Go), any spike-sorting method that would include an estimate of the ISI density of the different neurons should produce better classification performances.

2) Second, the spike waveform generated by a given neuron often depends on the time elapsed since its last spike (Quirk et al. 1999). This nonstationarity of spike waveform will worsen the classification reliability of methods that assume stationary waveform (Lewicki 1994Go; Pouzat et al. 2002Go). Other methods (e.g., Gray et al. 1995Go for a "manual" method and Fee et al. 1996aGo for an automatic one) assume a continuity of the cluster of events originating from a given neuron when represented in a "feature space" [the peak amplitude, valley(s) amplitude(s), half-width are examples of such features] and make use of this property to classify nonstationary spikes. Although it is clear that the latter methods will in general outperform the former, we think that some cases could arise where none of them would work. In particular, doublet-firing cells could generate well-separated clusters in feature space and these clusters would not be put together by presently available methods.

These considerations motivated us to look for a new method that would have a built-in capability to take into account both the ISI density of the neurons and their spike waveform dynamics. However, there are two additional issues with spike-sorting that, we think, are worth considering.

1) "Hard" versus "soft" spike-sorting. Most spike-sorting methods, including the ones where users actually perform the clustering and some automatic methods like the one of Fee et al. (1996aGo), generate "hard" classification. That is, each recorded event is attributed "fully" to one of the K neurons contributing to the data. Methods based on probabilistic models (Lewicki 1994Go; Nguyen et al. 2003Go; Pouzat et al. 2002Go; Sahani 1999) deal with this issue differently: each event is given a probability to have been generated by each of the K neurons (e.g., one gets results like: event 100 was generated by neuron 1 with probability 0.75 and by neuron 2 with probability 0.25). We will refer to this kind of classification as "soft" classification. Users of the latter methods are often not aware of their soft classification aspect because what one does for subsequent processing like estimating the ISI density of a given neuron or computing the cross-correlogram of spike trains from 2 different neurons, is to force the soft classification into the most likely one (in the example above we would force the classification of event 100 into neuron 1). By doing so, however, the analyst introduces a bias in the estimates. It seems therefore interesting to find a way to keep the soft classification aspect of the model-based methods when computing estimates.

2) Confidence intervals on model parameters. Methods based on an explicit model for data generation could in principle generate confidence intervals for their model parameters, although this was never done. However, the values of model parameters do influence the classification and a source of bias could be reduced by including information about uncertainty of model parameters values.

Based on these considerations we have developed a semiautomatic method that takes spike waveform dynamics and ISI densities into account, which produces soft classification and allows the user to make full use of it, and which generates a full posterior density for the model parameters and confidence intervals. Our method uses a probabilistic model for data generation where the label (i.e., neuron of origin) of a given spike at a given time depends on the label of other spikes occurring just before or just after. We will call configuration the specification of a label for each spike in the train. It will become clear that with this model the spike-sorting problem is analogous to an image restoration problem where a picture (i.e., a set of pixel values) that has been generated by a real object and corrupted by some noise is given. The problem is to find out what the actual pixel value was, given the noise properties and the known correlation properties of noise free images. The case of spike sorting is equivalent to a one-dimensional "pixel sequence" where the pixel value is replaced by the label of the spike. More generally it is a special case of a Potts spin-glass model encountered in statistical physics (Newman and Barkema 1999Go; Wu 1982Go). Thanks to this analogy solutions developed to study Potts models (Landau and Binder 2000Go; Newman and Barkema 1999Go) and to solve the image restoration problem (Geman and Geman 1984Go) can be tailored to the spike-sorting problem. These solutions rely on a Markov chain that has the posterior density of model parameters and configurations given the observed data as its unique stationary density. This Markov Chain is stochastically simulated on the computer. This general class of methods has been used for 50 years by physicists (Metropolis et al. 1953Go) who call it Dynamic Monte Carlo and for 20 years by statisticians (Fishman 1996Go; Geman and Geman 1984Go; Liu 2001Go; Robert and Cassela 1999Go) who call it Markov Chain Monte Carlo (MCMC).

Our purpose in this paper is 2-fold: First to demonstrate that a particular MCMC method allows incorporation of more realistic data generation models and by doing so to perform reliable spike-sorting even in difficult cases (e.g., in the presence of neurons generating 2 well-separated clusters in feature space). Second, to explain how and why this method works, thereby allowing its users to judge the quality of the produced classification, to improve the algorithm implementing it, and to adapt it to their specific situations. Our method does not yet provide an automatic estimate of the number of neurons present in the data, although we illustrate how this critical problem can be addressed by the user.


    METHODS
 TOP
 ABSTRACT
 INTRODUCTION
 METHODS
 RESULTS
 DISCUSSION
 APPENDIX
 ACKNOWLEDGMENTS
 REFERENCES
 
Data generation model

In the present manuscript we will make the following assumptions about data generation.

  1. The firing statistics of each neuron are fully described by a time-independent interspike interval density. That is, the sequence of spike times from a given neuron is a realization of a homogeneous renewal point process (Johnson 1996Go).
  2. The spike amplitudes generated by each neuron depend on the elapsed time since the previous spike of this neuron.
  3. The measured spike amplitudes are corrupted by a Gaussian white noise, which sums linearly with the spikes and is statistically independent of them.

These assumptions have been chosen to show as simply as possible the implementation of our method and should not be taken as intrinsic limitations of this method. They constitute, moreover, a fairly good first approximation of real data in our experience. More sophisticated models where the next ISI of a neuron depends on the value of the former one (Johnson et al. 1986Go) could easily be included. In the same vein, models where the amplitude of a spike depends on several of the former ISIs could be considered. The Gaussian white noise assumption means that the analyst has "whitened" the noise before starting the spike-sorting procedure (Pouzat et al. 2002Go).

INTERSPIKE INTERVAL DENSITY. We will use in this paper a log-Normal density for the ISIs. This density looks like a good next guess, after the exponential density, when one tries to fit empirical ISI densities. It is unimodal, exhibits a refractory period, rises "fast," and decays "slowly." The version of the log-Normal density we are using depends on 2 parameters: a dimensionless shape parameter {sigma} and a scale parameter s measured in seconds. The probability density for a realization of a log-Normal random variable I with parameters {sigma} and s to have the value i is

(1)

In what follows we will use a shorter notation

meaning that i is a realization of a log-Normal random variable with parameters {sigma} and s. A discussion of the properties of the log-Normal distribution can be found in the e-Handbook of Statistical Methods.

SPIKE AMPLITUDE DYNAMICS. We will consider events described by their occurrence time and their peak amplitude measured on one or several recording sites. This makes the equations more compact, but full waveforms can be accommodated if necessary. Following Fee et al. (1996bGo) we will describe the dependence of the amplitude on the ISI by an exponential relaxation

(2)
where i is the ISI, {lambda} is the inverse of the relaxation time constant (measured in 1/s), P is the vector of the maximal amplitude of the event on each recording site (i.e., this is the amplitude observed when i >> {lambda}-1), and {delta} is the maximal modulation [i.e., for i << {lambda}-1 we have A(i) {approx} P(1 - {delta})]. It is clear from this equation that the amplitudes of events generated by a given neuron are modulated by the same relative amount on each recording site. This is one of the key features of modulation observed experimentally (Gray et al. 1995Go).

To keep equations as compact as possible we will assume that the amplitudes of the recorded events on each recording site have been normalized by the SD of the noise on the corresponding site. A and P in Eq. 2 are therefore dimensionless and the noise SD equals 1. Combining Eq. 2 with our third model hypothesis (independent Gaussian noise corruption) we get for the density of the amplitude vector a of a given neuron conditioned on the ISI value i

(3)
where ns is the number of recording sites and ||v|| stands for the Euclidean norm of v. We will sometimes express Eq. 3 in a more compact form

(4)
where Norm (m, v) stands for a normal distribution with mean m and whose covariance matrix is diagonal with diagonal values equal to v.

We get the density of the couple (i, a) by combining Eq. 1 with Eq. 3

(5)

NOTATIONS FOR THE COMPLETE MODEL AND THE DATA. We now have, for each neuron in the model, 2 parameters ({sigma} and s) to specify the ISI density and 2 + number of recording sites parameters (e.g., for tetrode recordings: P1, P2, P3, P4, {delta}, {lambda}) to specify the amplitude density conditioned on the ISI value (Eq. 3). That is, in the case of tetrode recording: 8 parameters per neuron. In the sequel we will use the symbol {theta} to refer to the complete set of parameters specifying our model. That is for a model with K neurons applied to tetrode recordings

(6)
We will assume that our data sample is of size N and that each element j of the sample is fully specified by its occurrence time tj and its peak amplitude(s), aj,1, aj,2, aj,3, aj,4 (for tetrode recording). We will use Y to refer to the full data sample, that is

(7)

Data augmentation, configurations, and likelihood function

DATA AUGMENTATION AND CONFIGURATIONS. Until now we have specified our model and our data representation but what we are really interested in is to find the neuron of origin of each individual spike, or more precisely, for a model with K neurons, the probability for each spike to have been generated by each of the K neurons (remember that we are doing soft classification). To do that we will associate with each spike j a latent variable, cj {1, 2,..., K}. When cj has value 2, that means that spike j has been "attributed" to neuron 2 of the model. We will use C to refer to the set of N variables c

(8)
For a data sample of size N and a model with K neurons we have KN different C values. We will call configuration a specific C value. That is, a configuration is a N-dimensional vector whose components take value in {1, 2,..., K} [e.g., (1,..., 1), (K,..., K), are 2 configurations]. In the statistical literature, this procedure of "making the data more informative" by adding a latent variable is called data augmentation (Liu 2001Go; Robert and Casella 1999Go).

LIKELIHOOD FUNCTION. Because C has been introduced it is pretty straightforward to compute the likelihood function of the "augmented" data, given specific values of {theta}: L(Y, C|{theta}). We remind the reader here that the likelihood function is proportional to the probability density of the (augmented) data given the parameters. Figure 1 illustrates how this is done on a simple case with 2 units in the model (K = 2) and a single recording site.



View larger version (11K):
[in this window]
[in a new window]
 
FIG. 1. Likelihood computation. A: snapshot of spikes 11 to 15 of a train. cj values are shown at the top. Spike amplitudes are given (aj) as well as the occurrence times (tj). B: spikes from units 1 are shown alone. C: spikes from units 2 are shown alone.

 
We first use the configuration specified by C to split the sample into K spike trains, one train from each neuron. (This is done in Fig. 1 when one goes from A to B and C.) Then because our model in its present form does not include interactions between neurons we just need to compute K distinct likelihood functions, one for each neuron specific spike train, and multiply these K terms to obtain the full likelihood

(9)
where Yl is the subsample of Y for which the components cj of C are equal to l. The L(Yl, C|{theta}) themselves are product of terms like Eq. 5 In the case illustrated on Fig. 1 we will have for the first train

(10)
where t1,former stands for the time of the former spike attributed to neuron 1 in configuration C and t1,next stands for the next spike attributed to neuron 1 in configuration C. If there are N1 spikes attributed to neuron 1 then, using periodic boundary conditions (see Parameter-specific transition kernels), there will be N1 terms in L(Y1, C|{theta}). For the second neuron of Fig. 1 we obtain

(11)
The reader can see that for any configuration Eq. 9 involves the computation of N terms.

Posterior and prior densities, a combinatorial explosion

POSTERIOR DENSITY. Now that we know how to compute the likelihood function we can write the expression of the posterior density of model parameters and configuration given the data using Bayes formula

(12)
where {pi}prior({theta}) is the prior density of the parameters and

(13)
where Z is the probability of the data. It is clear that if we manage to compute or estimate {pi}post({theta}, C|Y) we will have done our job, for then the soft classification is given by the marginal density of C

(14)
The reader will note that this posterior marginal density includes all the information available about {theta}, that is, the uncertainty on the parameters is automatically taken into account.

PRIOR DENSITY. We will assume here that several thousand spikes have been collected and that we perform the analysis by taking chunks of data, say the first 5,000 spikes, then spikes 5,001 to 1,000 and so on. For the first chunk we will assume we know "little" a priori and that the joint prior density {pi}prior({theta}) can be written as a product of the densities for each component of {theta}, that is

where we are assuming that 4 recording sites have been used. We will further assume that our signal to noise ratio is not better 20 (a rather optimistic value), that our spikes are positive, and therefore the {pi}(Pq,1· · · 4) are null below 0 and above +20 (remember we are working with normalized amplitudes). We will reflect our absence of prior knowledge about the amplitudes by taking a uniform distribution between 0 and +20. The {lambda} value reported by Fee et al. (1996bGo) is 45.5 s-1. {lambda} must, moreover, be smaller than {infty}, so we took a prior density uniform between 10 and 200 s-1. {delta} must be ≤1 (the amplitude of a spike from a given neuron on a given recording site does not change sign) and ≥0 (spikes do not become larger upon short ISI), so we used a uniform density between 0.1 and 0.9 for {delta}. An inspection of the effect of the shape parameter {sigma} on the ISI density is enough to convince an experienced neurophysiologist that empirical unimodal ISI densities from well-isolated neurons will have {sigma} [0.1, 2]. We therefore took a prior density uniform between 0.1 and 2 for {sigma}. The same considerations led us to take a uniform prior density between 0.005 and 0.5 for s.

When one analyzes the following data chunks, say spikes 5,001 to 1,000, the best strategy is to take as a prior density for {theta} the posterior density from the previous chunk

One has therefore a direct way to track parameters drifts during an experiment.

A COMBINATORIAL EXPLOSION. We have so far avoided considering the normalizing constant Z of Eq. 12, whose expression is given by Eq. 13. A close examination of this equation shows that it involves a multidimensional integration in a rather highly dimensional space (e.g., for a model with 10 neurons and data from tetrode recordings, we have 80 components in {theta}) and a summation over every possible configuration, that is a sum of KN terms. That is really a lot, to say the least, in any realistic case (say, N = 1,000 and K = 10)! One can wonder how we will manage to compute Z and therefore implement our approach. Fortunately, we do not really need to compute it, as will now be explained.

The Markov Chain Monte Carlo method

AN ANALOGY WITH THE POTTS MODEL IN STATISTICAL PHYSICS. Let us define

(15)
that is, e({theta}, C|Y) is an unnormalized version of {pi}post({theta}, C|Y), the posterior density, and

(16)
Then Eq. 12 can be rewritten as

(17)
with {beta} = (1/kT) = 1. If one interprets E as an energy, {beta} as an "inverse temperature," Z as a partition function, one sees that the posterior density, {pi}post({theta}, C|Y), is the canonical density (or canonical distribution) used in statistical physics (Landau and Binder 2000Go; Newman and Barkema 1999Go). A closer examination of Eqs. 10 and 11 shows that, from a statistical physics viewpoint, the (log of the) likelihood function describes "interactions" between spikes attributed to the same neuron. If one goes a step further and replaces "spikes" by "atoms on crystal nodes" and "neuron of origin" by "spin orientation," ones sees that our problem is analogous to what physicists call a one-dimensional Potts model. The typical Potts model is a system of interacting spins where each spin interacts only with its nearest neighbors and only if they have the same spin value (Wu 1982Go). In our case, spins (that is, spikes) interact with the former and next spins with the same value regardless of the distance (time interval) at which they are located. Moreover the "interaction" strength is a random variable that makes our system analogous to a spin glass (Binder and Young 1986Go; Landau and Binder 2000Go; Newman and Barkema 1999Go) and the amplitude contribution (Eq. 3) gives rise to an energy term analogous to an interaction between a spin and a random magnetic field (Newman and Barkema 1999Go). To be complete we should say that physicists consider that {theta} is given, then Eq. 17 gives the probability to find the "crystal" in a specific configuration C.

The point of this analogy will now become clear. Physicists are interested in estimating expected values (which are what they can experimentally measure) of functions that can be easily computed for each configuration. For instance, they want to compute the expected energy of the "crystal"

(18)
To do that they obviously have to deal with the combinatorial explosion we alluded to. The idea of their solution is to draw configurations: C(1), C(2),..., C(m), from the "target" distribution: {pi}post({theta}, C|Y). Then an estimate of is the empirical average

(19)
In other words, they use a Monte Carlo (MC) method (Landau and Binder 2000Go; Newman and Barkema 1999Go). The problem becomes therefore to generate the draws [C(1), C(2),..., C(m)].

A SOLUTION BASED ON A MARKOV CHAIN. For reasons discussed in detail by Newman and Barkema (1999Go) and Liu (2001Go) the draws cannot be generated in practice by "direct" methods like rejection sampling (Fishman 1996Go; Liu 2001Go; Robert and Casella 1999Go). By "direct" we mean here that the C(t) are independent and identically distributed draws from {pi}post({theta}, C|Y) for a given {theta} value. Instead we must have recourse to an artificial dynamics generating a sequence of correlated draws. This sequence of draws is in practice the realization of a Markov chain. That means that in the general case, where we are interested in getting both {theta} and C, we will use a transition kernel: T[{theta}(t), C(t)|{theta}(t-1), C(t-1)], and simulate the chain starting from [{theta}(0), C(0)], chosen such that {pi}post[{theta}(0), C(0)|Y] > 0. Moreover, we will build T such that the chain converges to a unique stationary distribution given by {pi}post({theta}, C|Y). The estimator Eq. 19 of the expected value Eq. 18 is then still correct even though the successive [{theta}(t), C(t)] are correlated; we just need to be cautious when we compute the variance of the estimator (Janke 2002Go; Sokal 1989; Empirical averages and SDs). By still correct we formally mean that the following equality holds

(20)
We give in The Metropolis–Hastings algorithm of the APPENDIX a general presentation and justification of the procedure used to build the transition kernel T: the Metropolis–Hastings (MH) algorithm. We then continue in Parameter-specific transition kernels with an account of the specific kernel we used for the analysis presented in this paper.

Empirical averages and SDs

As explained at the beginning of Slow relaxation and REM, once our Markov chain has reached equilibrium, or at least once its behavior is compatible with the equilibrium regime, we can estimate values of parameters of interest as well as errors on these estimates. The estimator of the probability for a given spike, say the 100th, to originate from a given neuron, say the second, is straightforward to obtain. If we assume that we discard the first ND = 15,000 on a total of NT = 20,000 iterations we have

(21)
where Iq is the indicator function defined by

(22)
In a similar way, the estimate of the expected value of the maximal peak amplitude of the first neuron on the first recording site is

To obtain the error on such estimators we have to keep in mind that the successive states [{theta}(t), C(t)] generated by the algorithm are correlated. Thus, we cannot use the empirical variance divided by the number of generated states to estimate the error (Janke 2002Go; Sokal 1989). As explained in detail by Janke (2002Go) we have to compute for each parameter {theta}i of the model the normalized autocorrelation function (ACF), {rho}norm(l; {theta}i), defined by

(23)
Then we compute the integrated autocorrelation time, {tau}autoco({theta}i)

(24)
where L is the lag at which {rho} starts oscillating around 0. Using an empirical variance, {sigma}2({theta}i) of parameter {theta}i, defined in the usual way

(25)
Our estimate of the variance, Var of becomes

(26)

The first consequence of the autocorrelation of the states of the chain is therefore to reduce the effective sample size by a factor of 2{tau}autoco({theta}i). This gives us a first quantitative element on which different algorithms can be compared (as explained in The Metropolis–Hastings algorithm of the APPENDIX, the MH algorithm does in fact give us a lot of freedom on the choice of proposal transition kernels). It is clear that the faster the autocorrelation functions of the parameters fall to zero, the greater the statistical efficiency of the algorithm. The other quantitative element we want to consider is the computational time, {tau}cpu, required to perform one MC step of the algorithm. One could for instance imagine that a new sophisticated proposal transition kernel allows us to reduce the largest {tau}autoco of our standard algorithm by a factor of 10, but at the expense of an increase of {tau}cpu by a factor of 100. Globally the new algorithm would be 10 times less efficient than the original one. What we want to keep as small as possible is therefore the product {tau}autoco{tau}cpu. With this efficiency criterion in mind, the replica exchange method described in Slow relaxation and REM becomes even more attractive.

Initial guess

MCMC methods are iterative methods that must be started from an initial guess. We chose randomly with a uniform probability 1/N as many actual events as neurons in the model (K). That gave us our initial guesses for the Pq,i. {delta} was set to {delta}min for each neuron. All the other parameters were randomly drawn from their prior distribution. The initial configuration was generated by labeling each individual spike with one of the K possible labels with a probability 1/K for each label (this is the {beta} = 0 initial condition used in statistical physics).

Data simulation

We have tried to make the simulated tetrode data set used in this paper a challenging one. First the noise corruption was generated from a Student's t density with 4 degrees of freedom instead of a Gaussian density as assumed by our model (Eq. 3). Second, among the 6 simulated neurons only 3 had an actual log-Normal ISI density. Among the 3 others, one had a gamma density

(27)
where s is a scale parameter and {sigma} is a shape parameter. One neuron was a doublet-generating neuron and the third one had a truncated log-Normal ISI density. The truncation of the latter came from the fact that 18% of the events it generated were below the "detection threshold," which would have been at 3 noise SD. The doublet-generating neuron was obtained with 2 log-Normal densities, one giving rise to short ISIs (10 ms) the other one to "long" ones (60 ms); the neuron was moreover switching from the short, respectively the long, ISI generating state to the long, respectively the short, ISI generating state after each spike, giving rise to a strongly bimodal ISI density (Fig. 9C). The mean ISI of these 6 neurons ranged from 10 to 210 ms (Table 1). A "noise" neuron was moreover added to mimic the collective effect of many neurons located far away from the recording sites, which would therefore not be detected most of the time. This "noise" neuron was obtained by generating first a sequence of time points exponentially distributed. A random amplitude given by the background noise density (t density with 4 degrees of freedom; see above) was associated to each point and the point was kept in the "data" sample only if its amplitude exceeded the detection threshold (3 noise SD) on at least one of the 4 recording sites. The initial exponential density for this "noise" neuron was set such that a preset fraction of events in the final data sample would arise from it (in that case 15%, or 758 events on a total of 5,058 events). Details on the pseudorandom number generators required to simulate the data can be found in Random number generators for the simulated data of the APPENDIX.



View larger version (40K):
[in this window]
[in a new window]
 
FIG. 9. Comparison between the exact ISI histograms and their estimates. Exact histograms are displayed in gray, their estimates from the first period with crosses and from the second period with circles. SD for the estimated values is always smaller than the size of the symbols.

 


View this table:
[in this window]
[in a new window]
 
TABLE 1. Parameters used to simulate the neurons

 
Implementation details

Codes were written in C and are freely available under the Gnu Public License at our web site: http://www.biomedicale.univ-paris5.fr/physcerv/Spike-0-Matic.html. The free software Scilab (http://www-rocq.inria.fr/scilab/) was used to generate output plots as well as the graphical user interface, which comes with our release of the routines. The GNU Scientific Library (GSL: http://sources.redhat.com/gs1/) was used for vector and matrix manipulation routines and (pseudo)random number generators (Uniform, Normal, log-Normal, Gamma). The GSL implementation of the MT19937 generator of Matsumoto and Nishimura (1998Go) was used. This generator has a period of 219,937 - 1. Because the code requires a lot of exponential and logarithm evaluations, the exponential and logarithm functions were tabulated and stored in memory, which reduced the computation time by 30%. Codes were compiled and run on a PC laptop computer (Pentium IV with CPU speed of 1.6 GHz, 256 MB RAM) running Linux. The gcc compiler (http://www.gnu.org/software/gcc/gcc.htm) version 3.2 was used.


    RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 METHODS
 RESULTS
 DISCUSSION
 APPENDIX
 ACKNOWLEDGMENTS
 REFERENCES
 
Data properties

The performances of our algorithm will be illustrated in this paper with a simulated data set. The data are supposed to come from tetrode recordings. Six neurons are present in the data together with a "noise" neuron supposed to mimic events rarely detected from neurons far away from the tetrode's recording sites. The data set has been made challenging for the algorithm by producing systematic deviations with respect to our model assumptions (Data simulation). This data set is supposed to come from 15 s of recording during which 5,058 events were detected. Wilson plots as they would appear to the user are shown on Fig. 2A. Figure 2B shows the same data with colors corresponding to the neuron of origin. The parameters used to simulate each individual neuron are given in Table 1. The reader can easily see that, whereas one cluster has very few points (purple cluster on Fig. 2B), others have a lot (brown, green, and blue clusters). Indeed, the smallest cluster contains 73 events, whereas the biggest contains 1,474 events. Some clusters are very elongated (e.g., green and red clusters) and one neuron, the doublet-generating neuron, even gives rise to 2 well-separated clusters (red). The "noise" neuron cluster (black) is much more spread out than the others. Two cluster pairs always exhibit an overlap (the yellow-black and the green-red pairs).



View larger version (26K):
[in this window]
[in a new window]
 
FIG. 2. Wilson plots showing about 25% of the sample (1,215 events). On each plot the scale bars meet at amplitude (0, 0) and are of size 2 (in units of noise SD). A: data as the analyst would see them before starting the spike-sorting procedure. B: same data sorted with the known neuron of origin encoded by the color: neuron 1 (blue), neuron 2 (green), neuron 3 (red), neuron 4 (brown), neuron 5 (purple), neuron 6 (yellow), noise events (black).

 
The deviation between the recording noise properties assumed by our model and the simulated noise is illustrated in Fig. 3. Figure 3A1 shows the peak amplitude versus the isi for the second (green) neuron on the second recording site together with the theoretical relation between these 2 parameters. The reader can therefore get an idea of what Eq. 2 means. Figure 3A2 shows the corresponding residuals (actual value - theoretical one). By looking at the distribution of these points the reader can see that the cluster shape of this neuron, in the 4-dimensional space whose projections are shown on the Wilson plots of Fig. 2, will be significantly skewed. Algorithms assuming a multivariate Gaussian shape of the clusters would therefore not perform well on these data. Figure 3B1 shows the histogram of these residuals together with the Student's t density that was used to generate them (Data simulation & Random number generators for the simulated data) and the Gaussian density assumed by our model. The simulated recording noise generates at the same time more events very close to the ideal event and more events very far from it than a Gaussian noise would do.



View larger version (22K):
[in this window]
[in a new window]
 
FIG. 3. Noise properties. A1: amplitudes of the events generated by neuron 2 (green on Fig. 2B) on the second recording site are plotted against the corresponding interspike interval (ISI). Ideal amplitude relaxation curve (Eq. 2, with P2 = 15, {delta} = 0.8, {lambda} = 100) is shown in gray. Abscissa scale in s. A2, residuals: actual amplitude - ideal amplitude. B1: histogram of the residual values (broken line), ideal density from which these residuals have been generated (gray, t density with an SD = 1 and 4 degrees of freedom) and the density assumed by our model (black, Gaussian density with an SD = 1). B2: t (gray) and Gaussian (black) densities on a log scale showing the heavier tails of the t density.

 
Early exploration and model choice

We want to illustrate here some basic features of our algorithm dynamics as well as how model comparison can be done. In its present form, our approach requires the user to decide what is the proper model (i.e., the proper number of neurons). This decision is based on the examination of "empirical" features associated with the different models. To obtain these empirical features, we have to perform the same computation on the different models, although the computation time increases with the number of events considered. It is therefore a good idea to start with a reduced sample that contains enough events for the model comparison to be done and little enough for the computation to run quickly. In this section we will work with the first 3 s of data (1,017 events), which represent approximately 20% of the total sample (5,058 events). We will focus here on a model with 7 neurons, but a similar study was performed with models containing from 4 to 10 neurons.

EVIDENCE FOR META-STABLE STATES. To explore a model with a given number of neurons we start by simulating 10 different and independent realizations of a Markov chain. The way random initial guesses for these different realizations are obtained is explained in Initial guess. For each realization, one Monte Carlo (MC) step consists in an update of the label of each of the 1,017 spikes (SPIKE LABEL TRANSITION MATRIX), an update of the amplitude parameters (P, {delta}, {lambda}) (AMPLITUDE PARAMETERS TRANSITION KERNELS) and of the 2 parameters of the ISI density, s and {sigma} (SCALE PARAMETER TRANSITION KERNEL & SHAPE PARAMETER TRANSITION KERNEL) for each neuron. As explained in AMPLITUDE PARAMETERS TRANSITION KERNELS, the amplitude parameters are generated with a piecewise linear approximation of the corresponding posterior conditional. This piecewise linear approximation requires a number of discrete sampling points to be specified. Because when we start our model exploration, we know little about the location of the corresponding posterior densities, we used during these first 2,000 MC steps, 100 sampling points regularly spaced on the corresponding parameter domain defined by the prior (PRIOR DENSITY). Figure 4A illustrates the energy evolution of the 10 different trials (realizations). The reader not familiar with our notion of energy (Eq. 16) can think of it as an indication of the quality of fit. The lower it is, the better the fit. Indeed, if we were working with a template-matching paradigm, the sum of the squared errors would be proportional to our energy. The striking feature on Fig. 4A is the presence of "meta-stable" states (a state is defined by a configuration and a value for each model parameter). The trials can spend many steps at an almost constant energy level before making discrete downward steps (like the one falling from 5,800 to 4,300 before the 500th step). If we look at the parameters values and configurations generated by the different trials (not shown), the ones that end above 5,000 "miss" the purple cluster on Fig. 2B. The time required to perform 2,000 MC steps with 7 neurons in the model was roughly 9 min, meaning that 1.5 h was required to obtain the data of Fig. 4A. This time (for 2,000 MC steps) grew from 8' for a model with 4 neurons to 10' for a model with 10 neurons.



View larger version (37K):
[in this window]
[in a new window]
 
FIG. 4. A: energy evolution during 2,000 steps with 10 different initial random guesses. Best trial is shown in gray. B: follow-up of the energy evolution of the best trial without (gray) and with (black) the replica exchange method (REM). Circles: mean energy computed from the last 5,000 Monte Carlo (MC) steps of 5 runs without REM. Triangles: mean energy computed from the last 1,000 MC steps of 5 runs with REM.

 
THE REPLICA EXCHANGE METHOD SPEEDS UP CONVERGENCE IN THE PRESENCE OF META-STABLE STATES. The presence of meta-stable states is indeed a severe problem of our Markov chain dynamics, as further illustrated on the gray trace of Fig. 4B. Here the Markov chain was restarted from the state it had at the end of the 2,000 steps of the best among the 10 initial trials (gray trace on Fig. 4A). The last 1,000 steps of this trial were moreover used to locate 13 posterior conditional sampling points for each amplitude parameter of each neuron (AMPLITUDE PARAMETERS TRANSITION KERNELS). Then 5 different runs with 32,000 steps were performed. The best one is shown on the gray trace of Fig. 4B and the final mean energy of each of the 5 is shown on the right end of the graph (circles). What the reader sees here is a (very) slow convergence of a Markov chain to its stationary density. It is problematic because, if we stop the chain too early, our results could reflect more the properties of our initial guess than the properties of the target density we want to sample from. To circumvent this slow relaxation problem we have implemented a procedure commonly used in spin-glass (Hukushima and Nemoto 1996Go) and biopolymer structure (Hansmann 1997Go; Mitsutake et al. 2001Go) simulations: the replica exchange method (REM). The details of the methods are given in Slow relaxation and REM of the APPENDIX. The REM consists in simulations of "noninteracting" replicas of the same system (model) at different "inverse temperatures" ({beta}) combined with exchanges between replicas states. These inverse temperatures are purely artificial and are defined by analogy with systems studied in statistical physics. The idea is that the system with a small {beta} value will experience little difficulty in crossing the energy barriers separating local minima (meta-stable states) and will therefore not suffer from the slow relaxation problem. The replica exchange dynamics will allow "cold" replicas (replicas with a larger {beta} value) to take profit of the "fast" state space exploration performed by the "hot" replica (small {beta}). The efficiency of the method is demonstrated by the black trace on Fig. 4B. Here we again took the last state of the best trial of Fig. 4A to restart the Markov chain. We used 8 different {beta} values: 1, 0.8, 0.6, 0.5, 0.45, 0.4, 0.35, and 0.3. Five different runs with 4,000 steps were performed and the energy trace at {beta} = 1 is shown in its integrity; this was the best of the 5. The final mean energy of the 5 trials is shown on the right side of the figure (triangles). The Markov chains simulated with the REM relax clearly faster than the ones simulated with the single replica method. The number of MC steps has been chosen so that the same computational time (32') was required for both runs, the long one (32,000 steps) with a single replica at {beta} = 1 (gray trace) and the short one (4,000 steps) with 8 replicas. The reader can remark that for these trials the time required to perform 2,000 MC steps for a single replica is 2', whereas it was previously much larger (9'). This is attributed to the reduction in the number of sampling points used for the posterior conditional, 13 instead of 100 (AMPLITUDE PARAMETERS TRANSITION KERNELS). A significant amount of the computational time is therefore spent generating new amplitude parameters values.

MODEL CHOICE. Figure 5 illustrates what we meant by "empirical features associated with different models" at the beginning of this section. We have now performed the same series of runs with 7 different models having between 4 and 10 neurons. One feature we can look at in the output of these runs is the most likely classification produced (Empirical averages and SDs, Eq. 21). By "most likely" we mean that the label of each spike has been forced to its most likely value. Figure 5A shows one of the Wilson plots with different neuron numbers, where the events have been colored according to their most likely label. The good behavior of the algorithm appears clearly. When asked to account for the data with 6 neurons, it mainly lumps together the "small" neuron (yellow on Fig. 2B, neuron 6 in Table 1) with the "noise" neuron (black on Fig. 2B, neuron 7 in Table 1). With 7 neurons, the actual number of neurons, it splits the previously lumped clusters into 2 (although it attributes too many spikes to the small neuron and too few to the noise one; see Long runs with full data set). With 8 neurons, it further splits the noise cluster. In fact it creates a "noise" cluster (black on the figure) whose peak amplitude on the fourth recording site is roughly 3 times the recording noise SD and another one (clear blue) whose peak amplitude on this same site is smaller than 3. We remind the reader that our detection threshold is here set at 3 noise SD. The algorithm does not split the doublet-firing neuron (red) into 2 neurons, although this neuron gives rise to two well-separated clusters. With 9 neurons, it splits even further the previous clear blue noise "neuron" into 2 neurons, one with a peak amplitude roughly equal to the detection threshold on the first recording site (pink cluster) and one whose peak amplitudes on sites 1 and 4 are smaller than 3. With 10 neurons (not shown) we end up with the actual noise neuron, resulting in 4 neurons with an average peak amplitude slightly above the detection threshold on one of the recording sites and very close to zero on the 3 others.



View larger version (19K):
[in this window]
[in a new window]
 
FIG. 5. Model choice. A: most likely configurations generated by 4 of the 7 models studied are displayed on a Wilson plot where the amplitude on site 4 is plotted against the amplitude on site 1. Scale bars as in Fig. 2. B: maximal peak amplitude plots showing that neuron identification from model to model is easy. Each semiaxis corresponds to the maximal peak amplitude (estimated from the algorithm output) for each of the 6 "important neurons." Vertical axis in the upper half-plan corresponds to the maximal peak amplitude on site 1. Horizontal axis in the right half-plan corresponds to the maximal peak amplitude on site 2. Vertical axis in the lower half-plan corresponds to the amplitude on site 3 and the horizontal axis in the left half-plan corresponds to the amplitude on site 4. For each semiaxis the amplitude value goes from 0 at the origin to 20 at the tip. Each neuron in each model corresponds to a frame. To make the figure more readable (by avoiding a too strong overlap of the frames) a random Gaussian noise (µ = 0, {sigma} = 0.1) has been added to the maximal peak amplitude values of each neuron.

 
Another feature we can look at is the average values of the model parameters. Figure 5B shows the maximal peak amplitudes for 6 of the neurons and for the 4 different models considered in Fig. 5A. These maximal peak amplitudes are constrained by our priors to take values between 0 and 20. Each of the 4 semiaxis represents therefore the maximal peak amplitude on each of the 4 recording sites (see legend) and each frame represents one neuron in one model. For instance the blue frames correspond to neuron 1 in Table 1. Its maximal peak amplitudes on sites 1, 2, 3, and 4 are 15, 10, 5, and 0, respectively. The frames corresponding to this neuron in the different models considered overlap perfectly. The same holds for each neuron, except the yellow and the brown ones. The frame change for the "yellow" neuron is normal and is attributed to the fact that when we consider a model with a total of 6 neurons, the small neuron and the noise neuron get lumped together (Fig. 5A). It is only when we consider models with 7 neurons or more that the small neuron becomes (roughly) properly identified. That explains the variability in the yellow frames on Fig. 5B. The case of the brown neuron is a bit more subtle. This neuron fires very fast with ISIs between 8 and 15 ms (Fig. 9D) and does not generate ISIs large enough to properly explore its potential amplitude dynamic range. Stated differently, the amplitude parameters of this neuron are not well defined by the data (see as well Fig. 7). A more relevant feature for this neuron is the "typical" peak amplitudes produced. By that we mean that if we have estimates of the scale s and shape {sigma} parameters of this neuron (10 ms and 0.2), we can compute the most likely ISI: s exp (-{sigma}2) = 9.6 ms. Then, the most likely peak amplitude is given by Eq. 2. It turns out that all the models considered gave the same and correct values for the 2 ISI density parameters (within error bars, not shown). The models with 6, 7, and 9 neurons gave for the amplitude parameters (P1, {delta}, {lambda}) of the brown neuron: 7.8, 0.45, 114 (the amplitudes on the different recording sites were the same within error bars and the parameters were the same, within error bars, across models, not shown), whereas the model with 8 neurons gave 9.9, 0.45, and 39. Then the most likely amplitude for the models with 6, 7, or 9 neurons was 6.7, whereas it was 6.8 for the model with 8 neurons. That explains the larger frame of the brown neuron on Fig. 5B. In practice it turns out that the observation of the parameters alone is sufficient to find which neuron in a given model corresponds to which neuron in another model. Based on the considerations exposed in these last 2 paragraphs it seems reasonable to keep working with a model with 7 neurons, acknowledging the fact that the model cannot account very well for the noise neuron. In the sequel we will therefore proceed with the full data set (5,058) spikes and consider only the model with 7 neurons.



View larger version (30K):
[in this window]
[in a new window]
 
FIG. 7. Marginal posterior density estimate for each parameter of the model computed from the last 2,000 MC steps performed.

 
Long runs with full data set

The analysis of the full data set (5,058 spikes or 15 s of data) was done in 2 stages. During the first 4,000 MC steps we fixed the model parameters at their mean values computed from the last 1,000 MC steps of the REM run with the reduced data set (Fig. 4B, black trace) and we updated only the configuration. The initial configuration was, as usual, randomly set. We used the last 1,000 MC steps of this first stage to take "measurements" and compare the algorithm's output with the actual values of different data statistics (Figs. 8 and 9). The "complete" algorithm was used during the 21,000 MC steps of the second stage. By complete algorithm we mean that at each MC step a new configuration was drawn as well as new values for the model parameters. The 13 sampling points of the posterior conditionals for the amplitude parameters between steps 4,001 and 6,000 were set using the last 1,000 MC steps of the REM run with the reduced data set. The last 2,000 MC steps of the second stage were used to take measurements. We use here a very long run to illustrate the behavior of our algorithm. Such long, and time-consuming, runs are not required for a daily use of our method. The time required to run the first stage (4,000 MC steps) was 3 h, whereas 26 h were required for the second (21,000 MC steps).



View larger version (17K):
[in this window]
[in a new window]
 
FIG. 8. Comparison between the actual configuration and the most likely configuration generated by the algorithm. Two measurements periods are considered, the first one between steps 3,001 and 4,000 (Fig. 6A) represented here in black and the second period (from step 23001 to step 25000) represented in gray. A: number of false negative for each neuron. B: number of false positive for each neuron. C: fraction of wrongly classified spikes for each neuron.

 
THE REM REQUIRES MORE INVERSE TEMPERATURES WHEN THE SAMPLE SIZE INCREASES. Figure 6A shows the evolution of the energies at the 15 inverse temperatures used. The energy overlap at adjacent temperatures is clear and is confirmed by the energy histograms of Fig. 6B. One of the shortcomings of the REM is that it requires more {beta} to be used (for a given range) as the number of spikes in the data set increases because the width of the energy histograms (Fig. 6B) is inversely proportional to the square root of the number N, of events in the sample (Hansmann 1997Go; Hukushima and Nemoto 1996Go; Iba 2001Go). The necessary number of {beta} grows therefore as (if we had used the same 8 inverse temperatures as during our model exploration with a reduced data set, we would have only every second histogram). The computation time per replica grows, moreover, linearly with N. We therefore end up with a computation time of the REM growing like N1.5. This justifies keeping the sample size small during model exploration. Figure 6C shows that the autocorrelation function of the energy at each temperature falls to zero rapidly (within 10 MC steps) which is an indication of the good performance of the algorithm from a statistical view point (Empirical averages and SDs). A pictorial way to think of the REM is to imagine that several "boxes" at different preset inverse temperatures are used and that there is one and only one replica per box. After each MC step, the algorithm proposes to exchange the replicas located in neighboring boxes (neighboring in the sense of their inverse temperatures) and this proposition can be accepted or rejected (Eq. A28). Then if the exchange dynamics works properly one should see each replica visit all the boxes during the MC run. More precisely, each replica should perform a random walk on the available boxes. Figure 6D shows the random walk performed by the replica which starts at {beta} = 1. The ordinate corresponds to the box index (see legend). Between steps 1 and 25,000, the replica travels several times through the entire inverse temperature range.



View larger version (48K):
[in this window]
[in a new window]
 
FIG. 6. A: energy evolution during a long run with a full data set (5,058 events) and a model with 7 neurons. Fifteen inverse temperatures used were: 1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.475, 0.45, 0.425, 0.4, 0.375, 0.35, 0.325, and 0.3. During the first 4,000 steps the algorithm updated only the configuration, not the parameters that were fixed at their average values obtained during the runs with the reduced data set. Black horizontal bar indicates the first measuring period (between steps 3,001 and 4,000), the gray one, the second measuring period (between steps 23,001 and 25,000). B: energy histograms obtained from the second measuring period. Left histogram corresponds to {beta} = 1, the right one to {beta} = 0.3. C: energy autocorrelation functions from the second measuring period. D: path of the first replica.

 
MODEL PARAMETERS EVOLUTION AND POSTERIOR ESTIMATES. We have until now mainly shown parameters, like the energy of the first replica path, which are likely to be unfamiliar to our reader but our algorithm does generate as well outputs that should be more directly meaningful. We can for instance look at the evolution of the 8 parameters associated with each given neuron. The fluctuations of the parameters values around their means gives us the uncertainty of the corresponding estimates. As explained in Empirical averages and SDs, the accuracy of our estimates for a given number of MC steps will be better if the successive values of the corresponding parameters exhibit short autocorrelation times. We can as well use the algorithm output to build the posterior marginal density of each model parameter as shown on Fig. 7. It can be seen that most posterior densities are fairly close to a Gaussian except for 3 neurons whose amplitude parameters are not well defined by the data: neurons 4, 5, and 7. The case of neuron 4, the brown neuron on Fig. 2B, has already been discussed in Early exploration and model choice. The one of neuron 5 is a bit analogous, except that instead of not generating long enough ISIs it does not generate short enough ones (Fig. 9E) to fully explore the amplitude dynamic range. Neuron 7 is the noise neuron and suffers basically from the same problem as the 2 others in addition to the more fundamental one that it is not well described by our model.

A more compact way to present the information of Fig. 7 is to summarize the distributions that are approximately Gaussian by their means and autocorrelation corrected SDs. The other distributions can be summarized by a 95% confidence interval, the left boundary being such that 2.5% of the generated parameters values were below it and the right boundary such that 2.5% of the generated values were above it, as shown in Table 2 (compare with the actual values of Table 1).


View this table:
[in this window]
[in a new window]
 
TABLE 2. Values of estimated parameters from the second measurement period

 
SPIKE CLASSIFICATION PERFORMANCES. We can now fully exploit the fact that we are working with simulated data and compare the classification produced by our algorithm with the actual one. To do that easily we first forced the soft classification generated by the algorithm, which gives for each spike the posterior probability to originate from each of the neurons in the model (Eq. 21) int