JN AJP: Renal Physiology
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


J Neurophysiol 87: 1659-1663, 2002;
0022-3077/02 $5.00
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Web of Science (4)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Sanger, T. D.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Sanger, T. D.

The Journal of Neurophysiology Vol. 87 No. 3 March 2002, pp. 1659-1663
Copyright ©2002 by the American Physiological Society

RAPID COMMUNICATION

Decoding Neural Spike Trains: Calculating the Probability That a Spike Train and an External Signal Are Related

Terence D. Sanger

Department of Child Neurology, Stanford University Medical Center, Stanford, California 94305


    ABSTRACT
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
REFERENCES

Sanger, Terence D.. Decoding Neural Spike Trains: Calculating the Probability That a Spike Train and an External Signal Are Related. J. Neurophysiol. 87: 1659-1663, 2002. Experimental and clinical applications of extracellular recordings of spiking cell activity frequently are used to relate the activity of a cell to externally measurable signals such as surface potentials, sensory stimuli, or movement measurements. When the external signal is time-varying, correlation methods have traditionally been used to quantify the degree of relation with the neural firing. However, in some circumstances correlation methods can give misleading results. A new algorithm is described that estimates the extent to which a spike train is related to a continuous time-varying signal. The technique calculates the probability of generating a spike train with Poisson statistics if the time-varying signal determines the Poisson rate. This is accomplished by successive division of the signal and the spike train into halves and recursive calculation of the probability of each half-signal. The performance of the new algorithm is compared with the performance of correlation methods on simulated data.


    INTRODUCTION
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
REFERENCES

When making extracellular recordings of spike trains for electrophysiology experiments, it is usually not possible to state precisely what is coded by the neuron or to what degree the neuron encodes any particular experimental variable. As a result, it is often more fruitful to compare several variables to different neural signals to determine which is most closely related to each neuron's activity. For example, extracellular recording in deep structures of the brain may be compared with the surface potentials recorded at many locations on the scalp to determine the topography of cortical projections. In such a case, it becomes important to determine which signals from scalp electrodes are most closely related to the signal from the deep electrode. Other applications commonly occur when using surface electromyographic (EMG) or quantitative movement kinematics measurements to determine the motor effects of a neuron, or when using complex visual or somatosensory stimuli to discover the pattern or modality sensitivity of a neuron.

Most current techniques to address such questions are based on computing the correlation between a smooth time-varying signal and a numerical representation of the neural spike train (Fu et al. 1995; Glaser and Ruckin 1976; Perkel et al. 1967a,b; Werner et al. 1997). The spike train may be represented by the number of spikes in a sequence of short time bins (the binned rate) or by the reciprocal of the interval between successive spikes (the instantaneous rate). In some cases, the numerical spike sequences are smoothed with a low-pass filter (Miller and Houk 1995; Miller et al. 1993). For short bin widths, the use of a low-pass filter applied to the binned rate is equivalent to assigning a smooth "bump" to the location of each spike (the kernel method). Correlation methods are the basis for attempts at real-time "decoding" of neural spike trains (Bialek et al. 1991), and when performed on sub-bands of the frequency decomposition of the signal lead to coherence analyses (Rosenberg et al. 1989). However, since the original spike train is a sequence of discrete events that occur at specific times, the correlation operation may not be appropriate (Moore et al. 1966; Shao and Chen 1987; Shao and Tsau 1996). By assigning the data to arbitrary bins or by low-pass filtering, the numerical representation may lose information in the signal.

A more important issue depends on the mathematical foundation of the correlation operation. Correlation normalizes the power in the two signals by dividing by the square root of the sum of the squared values at each time point (often called root-mean-square or rms normalization). Normalization compensates for arbitrary changes in the gain or amplitude of one signal compared with another. Normalization serves a very important purpose in the correlation operation, since it favors signals that are spread out in time. In other words, given two signals with equal integral, the more sharply peaked signal will have a higher mean-squared power and thus will be penalized in the correlation when divided by its power. Therefore one interpretation of the correlation between two signals is that it is maximized when 1) the two signals have high values at the same time, and 2) each signal is spread out as much as possible.

For a binned neural spike train, rms normalization may not be a meaningful operation, since the amplitude is set arbitrarily by the experimenter (by assigning a value of 1 to each spike). In this case, rms normalization divides the number of spikes in each bin by the square root of the total number of spikes. For small bin widths, this operation does not favor signals that are spread out in time. One possible effect of this is shown in Fig. 1. In this figure, two smooth signals are used to generate two random Poisson spike trains with time-varying rate (see METHODS below). Because all the spikes in Fig. 1A occur near the highest peak of the signal in Fig. 1B, the correlation between the smooth signal in B and the spike train in A (using 1-ms time bins) is higher than the correlation between the smooth signal in B and the spikes it generated itself. This effect occurs because the normalization of the spike train does not favor the more spread out spike train in Fig. 1B, since there is no penalty for placing all the spikes at the maximum of the signal. This problem also occurs when using the instantaneous spike rate (Carvell et al. 1996). It can be alleviated by using a larger bin width so that there is a large number of spikes in each bin, but in this case the operation loses temporal resolution and may have difficulty with rapidly changing signals. Performing the correlation in shorter time-windows (Schwartz and Adams 1995) worsens the problem. The alternative technique of using a peri-stimulus time histogram (PSTH) (Gerstein and Kiang 1960; Gerstein and Perkel 1969; Murphy et al. 1974) is only applicable for external signals related to discrete events which can be easily repeated for averaging.



View larger version (18K):
[in this window]
[in a new window]
 
Fig. 1. Illustration of a possible error using the correlation of binned spike rates to compare a spike train and a continuous signal. A and B: continuous signals and simulated spike trains generated from them as time-varying Poisson processes. The correlation between signal B and spike train B using 1-ms time bins is 0.32, but the correlation between signal B and spike train A is 0.43. Therefore in this case the correlation incorrectly suggests that signal B and spike train A are more closely related than signal B and the spike train that it in fact generated.

The new technique proposed here is based on probabilistic inference, rather than correlation. The approach is to determine which of several external signals is related to each possible spike train by calculating the conditional probability that each spike train could have been generated by a Poisson process with time-varying rate given by the external signal. This method is applicable to choosing between a set of candidate signals, but the absolute probability values are so small that they are difficult to interpret outside this context. To avoid the necessity of choosing arbitrary bin sizes, the probability is computed using a recursive multi-scale approach, with the time-window being successively divided in half and probabilities computed over smaller and smaller segments. Once the conditional probability has been calculated, choosing the external signal with the largest probability yields the maximum-likelihood (ML) estimate of the best match between continuous signals and spike trains.


    METHODS
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
REFERENCES

A neural spike train n(t) is modeled as a time-varying Poisson process, with the probability of firing n times in a small interval Delta t given by
<IT>P</IT>(<IT>n‖r</IT>)<IT>=</IT><IT>r<SUP>n</SUP>e<SUP>−r</SUP>/n</IT>!
where P(n|r) is the probability of n spikes given an average rate of r spikes per interval Delta t. The rate r(t) is itself time-varying, and we seek to calculate the probability P[n(t)|r(t)] of obtaining the complete spike train n(t) given a candidate rate function r(t), where t goes from 0 to a final time T. In this formulation, n(t) is the spike density at a set of discrete time bins of width Delta t indexed by t, so that the value of n(t) at any particular time is the number of spikes in the bin at that time. r(t) is a continuous signal representing the time-varying rate of the Poisson process that generates the spikes. Therefore r(t) is defined for any value of t (not just within bins) and represents the theoretical instantaneous spike rate. n(t) is a random variable with probability density determined by r(t). The value P[n(t)|r(t)] represents the probability that a particular set of spike counts n(t) will occur in each time bin, given the true generating rate process r(t).

To calculate the probability P[n(t)|r(t)], first adjust the baseline of r(t) by adding a constant so that r(t) is always >0 and normalize r(t) by dividing by its integral so that the total integral is 1. This must be done so that r(t) can be interpreted as a probability density. Then split n(t) and r(t) into left and right halves, so that nL(t) is the left half of n(t) from time 0 to T/2, nR(t) is the right half from time T/2 to T, and rL(t) and rR(t) are defined similarly. Count the number of spikes in nL(t) and nR(t) and call these nL and nR. Integrate the continuous signals rL(t) and rR(t) and call the integrals rL and rR. Because of the independent increment property of Poisson processes, the probability of obtaining exactly nL spikes over interval (0, T/2) is determined by the Poisson rate rL. To calculate the probability of "splitting" the total number of spikes into the two half-signals, note that since rL + rR = 1, the probability of the split is the probability of dividing n spikes into two bins (the left and the right halves) of exactly nL and nR spikes. This is given by the standard equation for the binomial distribution
<IT>P</IT>(<IT>n<SUB>L</SUB>, n<SUB>R</SUB>‖r<SUB>L</SUB>, r<SUB>R</SUB></IT>)<IT>=</IT><FENCE><FR><NU>(<IT>n<SUB>L</SUB></IT><IT>+</IT><IT>n<SUB>R</SUB></IT>)!</NU><DE><IT>n<SUB>L</SUB></IT>!<IT>n</IT><SUB><IT>R</IT></SUB>!</DE></FR></FENCE>(<IT>r</IT><SUP><IT>n<SUB>L</SUB></IT></SUP><SUB><IT>L</IT></SUB><IT>r</IT><SUP><IT>n<SUB>R</SUB></IT></SUP><SUB><IT>R</IT></SUB>)
This expression gives the probability of finding exactly nL and nR spikes in the right and left half-intervals, given total probabilities in each half-interval of rL and rR. (This equation is also equal to the joint probability of achieving exactly nL and nR spikes from two Poisson distributions with rates rL and rR.)

We now continue this process by normalizing rL(t) and rR(t) to each have integral 1, splitting each half-interval again in half (yielding quarter-intervals) and calculating the probability of the two new splits. A bin that contains only zero or one spikes is not split. The process continues for each interval until there are only zero or one spikes in each interval. The probability of the spike train n(t) given the rate r(t) is then the combined probability of making all the specific splits, and since the splits are independent this probability is given by the product of the probabilities of all the individual splits. The resulting number will be very small (the probability of any particular spike train will always be small), but it allows comparison of the relative probability that different rate functions r(t) might have led to the particular spike train n(t). Given a series of candidate rate functions ri(t), we perform this procedure on each and select the rate function with the largest probability P[n(t)|ri(t)]. This gives the ML estimate.

The computation of the probabilities is very rapid, but it can be made even more rapid by noting that, since the final result is a comparison, we do not need to calculate an accurate value of the probability so long as we are certain that the rankings of the candidate rate functions ri(t) will remain in order. Note that the term
<FENCE><FR><NU>(<IT>n<SUB>L</SUB></IT><IT>+</IT><IT>n<SUB>R</SUB></IT>)!</NU><DE><IT>n<SUB>L</SUB></IT>!<IT>n<SUB>R</SUB></IT>!</DE></FR></FENCE>
is independent of r(t) and so will be the same for all candidate rate functions. It can therefore be left out of the computation without affecting the relative rankings. The probability P[n(t)|r(t)] is the product of probabilities, each of the form
(<IT>r</IT><SUP><IT>n<SUB>L</SUB></IT></SUP><SUB><IT>L</IT></SUB><IT>r</IT><SUP><IT>n<SUB>R</SUB></IT></SUP><SUB><IT>R</IT></SUB>)
and this computation can therefore be simplified by taking logarithms, so that the comparison becomes a sum of terms of the form
<IT>n<SUB>L</SUB> </IT>log <IT>r<SUB>L</SUB></IT><IT>+</IT><IT>n<SUB>R</SUB> </IT>log <IT>r<SUB>R</SUB></IT>
The complete algorithm used in the simulations is the following:

Function prob [n(t), r(t)]:

1. If n(t) = 0 return

2. Normalize: r(t) right-arrow r(t)/int r(t)

3. Split n(t) and r(t) at T/2 into nL(t), nR(t), rL(t), and rR(t)

4. Count spikes and integrate the signals nL = Sigma  nL(t), nR = Sigma  nR(t), rL = int  rL(t), and rR = int  rR(t)

5. Calculate the log split probability factor P = nL log rL + nR log rR

6. Call recursively pL = prob [nL(t), rL(t)] and pR = prob [nR(t), rR(t)]

7. Return p + pL + pR

(Matlab code for the algorithm and a demonstration using simulated data can be downloaded from http://www.stanford.edu/~sanger/demo1.zip.)

For each simulation, 50 continuous rate functions r(t) were each generated by adding a series of five cosine functions with random amplitude and phase, and periods varying from one cycle per total time T to five cycles per total time T. A constant was added so that the minimum value of each r(t) was greater than zero. For each rate function r(t), a single Poisson spike train n(t) was generated by placing spikes pseudo-randomly within each time interval, with probability of firing proportional to the average value of the rate function in that interval. Thus for each comparison, 50 different 1-s sets of spike counts n(t) were generated from the 50 generating rate functions r(t), with bins sampled at 1000 Hz. Bin sizes are therefore 1 ms, for a total of 1000 bins. The probability algorithm was run on all 2500 possible comparisons [n(t), r(t)], and for each spike train n(t), the rate function r(t) with the largest resulting log probability was chosen as the best estimate. The best estimate is correct if n(t) was generated from the r(t) with highest log probability, and otherwise incorrect. The number of correct associations of n(t) and r(t) for each simulation is therefore between zero and 50. The entire procedure was performed 50 times, and the mean and SD of the number of correct associations were calculated.

For comparison, two correlation-based methods were tested on the same rate function and spike data. In the "bin correlation" method, each spike train n(t) is represented as a series of spike counts in each of the one thousand 1-ms time bins. Each rate function is then correlated against the spike train by calculating
&rgr;=<FENCE><FR><NU>1</NU><DE><RAD><RCD>&Sgr; <IT>n</IT><SUP><IT>2</IT></SUP>(<IT>t</IT>)<IT>r</IT><SUP><IT>2</IT></SUP>(<IT>t</IT>)</RCD></RAD></DE></FR></FENCE> <LIM><OP>∑</OP><LL><IT>t</IT><IT>=1</IT></LL><UL><IT>1000</IT></UL></LIM> <IT>n</IT>(<IT>t</IT>)<IT>r</IT>(<IT>t</IT>)
In the "interval correlation" method, the interval between each pair of spikes is calculated, and the inverse of this interval is assigned as the value at each 1-ms time bin between the two spikes. Thus if there is a spike at t1 and t2, then all points between t1 and t2 are assigned the value 1/(t2 - t1) and the interval sequence is then correlated against the rate function as shown above. In each case, the spike sequence with the largest correlation value was chosen as the best estimate. The SPSS software package (SPSS, Inc.) was used for data analysis.


    RESULTS
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
REFERENCES

Figure 2A shows a comparison between the probability method and the two different correlation-based methods for estimating the correct spike train associated with a given smooth signal. In this case, the average spike rate is 20 Hz. The mean number correctly categorized by the probability method is 26 (SD 3.6); the mean for the bin correlation method is 22 (SD 4.0), and the mean for the interval correlation method is 21 (SD 3.3). The probability method result is significantly different from both correlation methods by paired-samples t-test (P < 0.001 in both cases). Although none of the methods was able to categorize more than about half the examples correctly, it should be noted that with an average spike rate of 20 Hz there are only about 20 spikes in the time-window, so that this is a very difficult categorization problem with very little data to distinguish between possibilities. Also note that all algorithms do much better than chance performance, for which the expected number correctly categorized is 1. 



View larger version (15K):
[in this window]
[in a new window]
 
Fig. 2. Simulation results comparing the probability method to two correlation-based methods. The vertical axis gives the number of continuous signals that are correctly matched to the appropriate spike train, with a maximum of 50 possible correct answers per trial. The dark horizontal bar is the mean number of correct categorizations over 50 trials, the vertical limits of the gray bars specify the 25th and 75th percentile, and the whiskers specify the extreme values. A: performance of the three algorithms for an average spike rate of 20 Hz. B: performance for an average spike rate of 100 Hz.

Figure 2B shows a comparison for an average spike rate of 100 Hz. In this case, the mean number correctly categorized by the probability method is 49 (SD 0.96), the mean for the bin correlation method is 48 (SD 1.2), and the mean for the interval correlation method is 48 (SD 1.1). The probability method is significantly different from both correlation methods by paired-samples t-test (P < 0.001 in both cases), although in this case the magnitude of the difference in performance is negligible. The two correlation methods are not significantly different from each other.


    DISCUSSION
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
REFERENCES

The probability technique is a new method for determining which of several smooth signals is most likely to be related to a train of neural spikes. It is based on the assumption that the spikes can be described by a Poisson process with rate given by the correct smooth signal. Under this assumption, an algorithm has been developed that recursively calculates the probability of producing a particular spike train, and the signal with maximum probability determines the maximum likelihood estimate. This is the best possible estimate under these assumptions and with no additional available information. There is no need to choose bin widths a priori, since the recursive splitting algorithm dynamically selects bin widths by splitting until there is no more than one spike per bin. If there is an unknown lag between the spike train n(t) and the continuous signal r(t), the probability algorithm can be used to compare the signal at varying lags to determine the maximum likelihood solution. This is illustrated in Fig. 3B, which shows that the peak of the probability as a function of time lag occurs near the simulated correct value of 500 ms. If multiple spike trains are tested at multiple lags, it is possible that different spike trains would match a particular continuous signal at different lags, and such a result would need to be interpreted with caution.



View larger version (20K):
[in this window]
[in a new window]
 
Fig. 3. Simulation of using the probability method to calculate an unknown time lag between a spike train and a continuous signal. A: 1-s long spike train generated by the continuous signal above it. B: log probability for time lags between 0 and 1000 ms. The correct lag is 500 ms.

As shown above, the probability algorithm outperforms the bin correlation method and the interval correlation method on simulated data that conforms to its underlying assumption. The difference is most noticeable for lower spike rates. It is not known whether the probability algorithm would perform better on real data, but this is likely to require verification on a case-by-case basis, since not all neural spike trains will be well described by time-varying Poisson processes. In practice, it may be necessary to verify that a spike train has nearly Poisson statistics (with variance in spike counts between bins equal to the mean spike rate) to be confident that this algorithm will apply.

The main disadvantage of the new technique is the relative complexity of the equation and the moderately increased computational requirements compared with the correlation method. Another disadvantage is that the algorithm cannot be run as a real-time filter, and it is therefore not useful for neural control applications. The algorithm analyzes only the information in the spike train that is carried by the spike rate, and it may therefore underestimate relations due to other coding strategies (Sherry and Klemm 1984). The rate function is adjusted to be positive and its integral is normalized, so information in the DC baseline or the signal magnitude is not used.

It is important to realize that the absolute magnitude of the calculated probabilities is a small number that may not have an easily interpretable meaning. The probability P[n(t)|r(t)] is therefore used to compare the relative probabilities between different signals r(t). For example, a spike train n(t) recorded from a single cell using an extracellular electrode can be compared with a set of surface potentials r(t) on the scalp to determine the spatial location on the scalp that is most closely related to the cell's firing rate. For each scalp electrode signal r(t), the value of P[n(t)|r(t)] is calculated, and the signal r(t) that maximizes this value can be interpreted as the most closely related scalp surface signal to the spike train n(t). A similar experiment could be performed to test which of several rectified and filtered surface EMG signals is related to an extracellular recording. Note that although the Poisson model treats r(t) as the "generator" of n(t), this does not imply causality. In other words, it is not possible to conclude that r(t) drives the spike train n(t), but only that the two are related.

There are many possible variations on the basic structure of the algorithm. Rather than splitting at one-half the time window, the splits could be placed at experimentally relevant time points. Alternatively, the split could be chosen to place an equal number of spikes on both sides, or each split could form a division of the smooth signal into two orthogonal wavelet-like components with a smooth boundary.

The utility and validity of this technique need to be tested in particular experimental situations. Nevertheless, under this model of spike generation it is possible to determine the probabilistically correct solution for associating a spike train with a continuous signal.


    ACKNOWLEDGMENTS

I thank S. Giszter and P. Cisek for helpful discussions and comments.

This work was supported in part by the Stanford University Department of Child Neurology.


    FOOTNOTES

Address for reprint requests: Dept. of Child Neurology, Stanford University Medical Center, 300 Pasteur Dr., MS:5235, Stanford, CA 94305.

Received 12 February 2001; accepted in final form 25 October 2001.


    REFERENCES
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
REFERENCES

0022-3077/02 $5.00 Copyright © 2002 The American Physiological Society



This article has been cited by other articles:


Home page
J. Neurophysiol.Home page
Z. Del Prete, S. P. Baker, and P. Grigg
Stretch Responses of Cutaneous RA Afferent Neurons in Mouse Hairy Skin
J Neurophysiol, March 1, 2003; 89(3): 1649 - 1659.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Web of Science (4)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Sanger, T. D.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Sanger, T. D.


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Visit Other APS Journals Online