## Abstract

The existence and role of fine-temporal structure in the spiking activity of central neurons is the subject of an enduring debate among physiologists. To a large extent, the problem is a statistical one: what inferences can be drawn from neurons monitored in the absence of full control over their presynaptic environments? In principle, properly crafted resampling methods can still produce statistically correct hypothesis tests. We focus on the approach to resampling known as jitter. We review a wide range of jitter techniques, illustrated by both simulation experiments and selected analyses of spike data from motor cortical neurons. We rely on an intuitive and rigorous statistical framework known as conditional modeling to reveal otherwise hidden assumptions and to support precise conclusions. Among other applications, we review statistical tests for exploring any proposed limit on the rate of change of spiking probabilities, exact tests for the significance of repeated fine-temporal patterns of spikes, and the construction of acceptance bands for testing any purported relationship between sensory or motor variables and synchrony or other fine-temporal events.

- spike timing
- neural coding
- jitter
- hypothesis testing
- conditional inference
- synchrony

### 1 Introduction

What is the temporal resolution of neuronal activity? What role does fine-temporal structure play in neuronal processing? Whereas millisecond resolution has been demonstrated in a multitude of special systems—the retina of several vertebrate species (Meister and Barry 1999), the mammalian medial superior olivary nucleus (Yin et al. 1990), the honeybee olfactory system (Wehr and Laurent 1996), the bat sonar system (Ferragamo et al. 1998), and the cat lateral geniculate nucleus (Alonso et al. 1996; Dan et al. 1998), to name a few—the significance of these observations to cortical processing in general is controversial (Gray 1999; Riesenhuber and Poggio 1999; Shadlen and Movshon 1999). Are they the exception or the rule? The advent of multielectrode technologies and the feasibility of stable simultaneous recordings from multiple neurons in awake and performing animals now offer hope for meaningfully addressing these fundamental issues.

Many authors have argued that the useful information transmitted from one neuron to another, in the general cortical circuitry, is well described by the number of spikes, as measured over a 10- or 20-ms interval, rather than the timing of the spikes, as might be measured at a 1- or 2-ms resolution. The issue is unresolved. On one hand, we are able to associate firing rates of select cortical neurons (as measured on a 10 or greater millisecond scale) with numerous perceptual and motor states, such as a face in the visual field or the direction of limb movement. On the other hand, there are indeed a multitude of systems, albeit highly specialized, which demonstrate (and almost certainly require) millisecond accuracy in spike timing. What is more, there is the often-repeated argument that neural representation must accommodate the relational structure among the parts of an action, a thought, a perceived object, or a scene, and that fine-temporal structure might offer a natural way to capture this kind of information (Abeles 1991; Bienenstock 1995; Geman 2006; Roelfsema et al. 1996; Singer 1999; von der Malsburg 1994).

The analysis of neuronal data in an awake animal is made especially challenging by inevitable changes, from instant to instant and trial to trial, of the statistical nature of the recorded signal (Amarasingham et al. 2006; Brody 1998; Churchland et al. 2010; Cohen and Kohn 2011). The status of tens of thousands of presynaptic neurons, not directly influenced by the experimental paradigm, is largely out of the control of the experimenter; the environments of individual neurons are in constant flux. Thus time-honored assumptions about “repeated” samples (having similar or identical distributions) become untenable in cortical recordings from awake and performing animals. The unique character of variability in a neurophysiological data set raises, in statistical terms, concerns of robustness: the conclusion of a conventional statistical analysis of a spike train may be contaminated by the fact that trial-to-trial variability causes its underlying probabilistic model to be misspecified.

These observations highlight a fundamental challenge to gathering statistical evidence for a lack of precision in the cortical microcircuitry (Amarasingham et al. 2006). Imprecision could arise from nonrepeatability in the dynamics of individual neurons or, equally well, from a failure to reproduce a neuron's trial-to-trial input. On the other hand, these observations do not preclude collecting statistical evidence against any given limitation on precision or repeatability.

We will review a collection of statistical approaches to temporal resolution collectively known as jitter methods. We will formulate a definition of temporal resolution and review the application of jitter methods for testing the hypothesis that neurons operate at or below any specified resolution. These procedures are based on observations of precisely timed events in the spike record, such as counts of synchronous events across neurons or of more general spiking motifs, but their statistical significance is computed without estimating instant-to-instant firing rates or trial-to-trial variability of the participating neurons.

One way to motivate jitter is through a thought experiment, suggested by our colleague Prof. Elie Bienenstock of Brown University. Bienenstock imagines a pharmacological agent with the peculiar property that it modestly advances or retards every spike of every neuron, independently, for the duration of the drug's action. How large would the “jitter window” have to be to disrupt cortical processing? Would one feel the effect, or, indeed, would the brain even function, under a 5-ms random change in the timing of every spike? How about 10 ms? In addition to taking a step toward a definition of temporal resolution, the thought experiment suggests a statistical approach to testing any given hypothesized limit on resolution. In particular, beginning with an original data set of simultaneously recorded neurons, one could create a surrogate data set by randomly jittering each spike within a 10-ms window. How do statistics computed from the original data set compare with statistical features computed from an ensemble of such surrogate data sets, created by repeatedly jittering? This is the beginning of a statistical definition of temporal resolution and of a statistical test. Complications introduced by trial-to-trial variability are avoided because the individual trial structure is preserved; indeed, the approach applies just as sensibly to single-trial data and to cases where there is no clear notion of a trial at all.

As it stands, this type of test is crude in that it does not accommodate even slow variations in an underlying firing rate, much less bursting or absolute and relative refractory periods. We will formulate jitter more rigorously as an example of conditional modeling and exploit this framework to develop a suite of tools for measuring and testing temporal resolution, despite the time-varying nature of neuronal excitability, the nonrepeatability of trials, and the various phenomena, such as bursting and refractoriness, that statistically tie together individual spikes.

In various guises, jitter and related spike resampling methods are in common use (Abeles and Gat 2001; Amarasingham 2004; Baker and Lemon 2000; Butts et al. 2007; Date et al. 1999; Dragoi and Buzsáki 2006; Fujisawa et al. 2008; Furukawa and Middlebrooks 2002; Gerstein 2004; Grün 2009; Grün et al. 2009; Gütig et al. 2002; Harrison 2005; Harrison and Geman 2009; Hatsopoulos et al. 2003; Ikegaya et al. 2004; Jones et al. 2004; Lu and Wang 2003; Maldonado et al. 2009; Nádasdy et al. 1999; Oram et al. 1999; Pazienti et al. 2007, 2008; Pipa et al. 2007; Renart et al. 2010; Rieke et al. 1997; Rokem et al. 2006; Shmiel et al. 2005, 2006; Smith and Kohn 2008; Stark and Abeles 2009; see also Amarasingham et al. 2011).^{1} The merits and limitations of these techniques, as well as their proper application and interpretation, are best viewed through rigorous statistical formulation. We provide an elementary overview of jitter-based resampling and a statistical framework that supports a precise statement of assumptions and a precise interpretation of conclusions. We provide a step-by-step development of jitter-based methods, for investigating temporal resolution, “excess” synchrony, and other measures of precision, and we will explore related formulas for making exact hypothesis tests about fine-temporal structure. Both simulated and real spike data are used for illustration. The discussion is largely nontechnical but includes frequent pointers to the mathematical appendix provided in a parallel technical report (Amarasingham et al. 2011), which also contains the details of the analyses and computer experiments used in the main text. Together, the text and appendix provide an essentially self-contained and complete formal statistical treatment.

In the following section (Section 2), taking the analysis of synchrony and the cross-correlogram as a pedagogical example, we introduce the basic idea of spike jitter and its utility for exploring temporal resolution. We make comparisons with trial shuffling and provide a framework, conditional modeling, for articulating and evaluating the statistical assumptions under which jitter and other resampling methods are valid. In Section 3, we demonstrate the scope of the conditional modeling framework by reviewing broader classes of jitter techniques, such as accommodate slow variations in underlying firing rate and structured firing patterns. Drawing from this development and in the way of further illustration, in Section 4 we analyze the evidence for fine-temporal structure in the recordings of three simultaneously recorded primate motor cortical neurons. We conclude in Section 5 with further discussion and summary.

#### 2 Spike Re-Sampling, Spike Jitter, And Conditional Inference

##### 2.1 Trial shuffles and interval jitter.

Imagine making a 1-s recording from each of two neurons for each of 100 trials. Possibly, the two neurons have identical inhomogeneous (i.e., time varying) firing rates, and possibly, the firing rate profile changes from trial to trial. In such a circumstance, even slow variations in firing rate will induce a peak in the cross-correlation histogram at zero, and the peak will appear to be significant when measured, for example, against a corresponding cross-correlation histogram value produced by a “shuffle” (permutation) of trials of one or the other of the neurons. These effects have been widely discussed (Baker and Gerstein 2001; Brody 1998; Ventura et al. 2005b). A peak at and around zero in a shuffle-corrected cross-correlation histogram is not, in and of itself, an artifact; common variation in firing rates across two neurons induces a statistical correlation in spike timings. The issue is one of interpretation and, in particular, one of time scale. Statistically, these coupled but nonetheless slow temporal effects violate the assumption of independence, and violate it in the direction of increased synchrony, so significance with respect to the shuffling procedure is not surprising. It is not, however, due to finely timed spikes, per se. When the goal is to isolate effects at a particular time scale, independence across neurons is the wrong null hypothesis.

Is there a way to test for millisecond precision, per se, despite the evident correlation induced by shared trial-to-trial variation in firing rate? It is tempting to try to first correct for trial-to-trial variation and then to look for residual high-resolution correlation, and in fact many methods along these lines have been proposed. As already mentioned in Section 1, we have some doubts about the feasibility of estimating parameters, much less rate functions, for individual trials. Here we focus instead on a jitter-based approach that avoids direct estimation of firing rates, whether common to trials or corrected for trial-to-trial variation.

A simple computer experiment illustrates the idea. An artificial data set has two simultaneously recorded neurons with a zero-lag peak in the shuffle-corrected cross-correlation histogram (Fig. 1). By construction, the peak comes from shared trial-to-trial variability in the Poisson intensity functions operating on relatively slow (50 ms) time scales (Fig. 1*A*), as evidenced in the cross-correlogram (Fig. 1*C*), and not from any additional correlations in spike timing. The peak is significant with respect to the shuffling procedure (Fig. 1*D*), but how are we to know and quantitatively report that it is the result of the relatively slow, 50-ms undulations in firing rate that are common between the two neurons in a given trial, but different from trial to trial? Suppose that instead of shuffling trials we were to generate surrogate spike trains by perturbing individual spikes, independently, within a small window, say 20 ms, mimicking the action of the fictional drug described in Section 1. Does the apparent synchrony survive? If so, then we could reasonably interpret the peak in the cross-correlation histogram as arising from temporal effects acting on a time scale no finer than about 20 ms. In particular, we would conclude that the apparently significant zero-lag peak in the cross-correlation histogram is consistent with events operating on a slower time scale (in this case, the 50-ms-wide peaks in the shared intensity functions).

The perturbation experiment was performed and is illustrated in Fig. 2. The time axis was first partitioned, independently of the observed spikes, into disjoint and contiguous intervals of length 20 ms. Each spike of each neuron was then independently moved to a new location, chosen from the uniform distribution on the 20-ms interval to which it belonged in the original data. Repeating the process produces a sequence of surrogate spike trains (Fig. 2*A*). Application to the synthetic data from Fig. 1 produces a sequence of surrogate data sets, each consisting of 100 trials of a 1-s spike train for each of the two neurons. The number of synchronous spike pairs (“synchronies”) across the two neurons, as defined by occurring within +1 or −1 ms of each other, was recorded for each of 10,000 surrogate data sets. Figure 2*B* shows the cross-correlation histogram with jitter-derived acceptance bands (compare to Fig. 1*C*), and Fig. 2*C* shows the distribution of synchronies (compare to Fig. 1*D*). More than 21% of the randomly generated synchrony counts exceed the unperturbed count: the interval jitter procedure suggests that the peak in the cross-correlation histogram in Fig. 1 is due to effects occurring on a time scale coarser than 20 ms.

##### 2.2 Hypothesis testing and conditional inference.

Interval jitter (Amarasingham 2004; Date et al. 1999; Harrison 2005; Harrison and Geman 2009) tests hypotheses about local (within interval) distributions on the placements of spikes. Given that there is a spike in a particular 10- or 20-ms interval, are all placements of the spike essentially equally probable? Temporal resolution is about precision in the placement of spikes. Questions about the probability of having a spike in a given interval, or having any given number of spikes in a given interval, are interesting in their own right, but are different and off the point. The focus is on timing and to the extent possible we would prefer a statistical approach that avoids assumptions about anything but the issue at hand.

Interval jitter is an example of conditional modeling. Statistical models are specified via conditional distributions on the data. Inference is conditioned on a temporal coarsening of the spiking processes, as specified by the numbers of spikes in the jitter intervals. No assumptions are made about the process, random or otherwise, that determines the existence and numbers of spikes found in the jitter intervals. The assumptions that are made, and therefore tested, address the question being asked: how precise is the timing of spikes? By conditioning on the numbers of spikes in each interval, we sidestep entirely the problem of specifying the process through which these numbers arise. If the intervals are, say, 10 ms wide, and if we were to assume that all arrangements in a given interval that preserve the observed number of spikes are equally likely, then, informally, we are testing the hypothesis that there is no temporal structure at a scale finer than 10 ms. Rejecting the null would provide evidence for faster inter- and intraneuronal mechanisms. More formally, we would be rejecting a compound null hypothesis that includes every point process for which the conditional placements of spikes within intervals are uniform, given the interval counts. And indeed, the proportion of surrogate spike trains for which synchrony counts are greater than that of the original train essentially provides an exact *P* value for this null hypothesis. (See Amarasingham et al. 2011 for a precise formal description of the test and a demonstration of its validity.)

At the risk of belaboring the point, there is another (“generative”) way to understand the null hypothesis: imagine that you are in charge of generating the spike data that are recorded in an experiment. You first partition time into intervals (say 10 ms). You then assign spike counts to every interval of every neuron in every trial. These assignments can follow any rule whatsoever, deterministic or stochastic, with arbitrary dependencies between intervals and across neurons and trials, or no dependencies at all, or anything in between. Hidden variables, like the states of 10 billion unobserved neurons, can be determining or irrelevant to your assignments, as you wish. You then assign the actual locations of the spikes within intervals according to a distribution that includes the hypothesized limits on the precision of spike timing. (Here this limit is that all intrainterval locations are equally likely; in Sections 3.1 and 3.2, we describe generalizations of such limitations. The statistical formulation, via conditioning, remains the same.)

Trial shuffling is another, familiar, example of conditional inference. Given two neurons observed over *N* trials, inference is conditioned on the two sets of spike-time recordings, each set with *N* records representing the respective observations from the two neurons. The assumption, which again defines a compound null hypothesis, is on the conditional distribution: every one of the *N!* pairings of records is equally likely (as would be the case, for example, if the trials were statistically repeatable and the two neurons' responses statistically independent). Notice that the null hypothesis says nothing about temporal resolution. There is nothing wrong with using a tally of 1-ms synchronies to test the null hypothesis, so long as we keep in mind that an excess of synchronies in the original pairing, relative to the trial-shuffled pairings, rejects the hypothesis that all pairings are equally likely, not a hypothesized lack of precision of spike timing.

##### 2.3 Synchrony as a test statistic.

Jitter methods compare an original data set with an artificially generated ensemble of jittered data sets, but this comparison must be based on some feature of the data, such as the synchrony count, the shape of the cross-correlation histogram (Section 2.7), the number of repeating spiking motifs (Section 4.1), or an estimate of the mutual information between spiking and stimulus (see also Section 4.2). In the terminology of hypothesis testing, this feature of the data is called the test statistic. Jitter methods can be applied with any test statistic. Our emphasis here in Section 2 is jitter, not the test statistic. We use synchrony count as the test statistic primarily for pedagogical purposes. In Section 4, we illustrate jitter with other test statistics.

##### 2.4 Basic jitter.

In the experiment of Section 2.1, the intervals across which spikes were (uniformly) perturbed were chosen independently of the neurons' observed spike times. Here, by way of motivation, we compare this former procedure with the intuitive and common approach of simply jittering individual spikes using jitter intervals that are centered at the original spike locations. We refer to this familiar, heuristic procedure as “basic jitter”: to form surrogate spike trains, simply jitter the spikes in an interval centered around their original locations. Whereas the “*P* values” that result from such perturbations are indeed (one sided) tail probabilities under the jitter distribution on synchrony counts, they are not, strictly speaking, rejection probabilities in a well-formulated hypothesis test. No “null hypothesis” has been articulated, and in point of fact, none could be. The implicit assumption, when comparing observed synchrony counts with a distribution of counts derived from a spike resampling algorithm, is that the original locations of spikes are “statistically indistinguishable” from the resampled locations, provided that the null hypothesis is true. Statistically indistinguishable is called “exchangeable.” A formal definition is included in the appendix of Amarasingham et al. (2011), but the idea is simple enough: an observer who believed in the null hypothesis could not pick out the real spike train from a lineup that also included the simulated spike trains. The trouble with basic jitter is that the original spikes stand out; they can always be found at or near the center of their resampled surrogates.

In typical practice and for simple test statistics, like synchrony counts, there may not always be a big difference between basic jitter and interval jitter. Indeed, the two procedures are quite similar, and, unsurprisingly, the results of the experiments in Sections 2.1 are qualitatively unchanged by using basic jitter, rather than interval jitter, with 20-ms jitter windows. However, an important distinction is that, as described in Section 2.2, interval jitter does make for an exact hypothesis test of a large class of null hypotheses: those spike-generating processes for which the locations of spikes, conditioned on the interval spike counts (i.e., the numbers of spikes found in each interval), are all equally likely. (See further discussion in Amarasingham et al. 2011.) Basic jitter is only one example of many heuristic spike resampling methods that have appeared in the literature. Although we recognize that these heuristic methods are useful exploratory tools, we recommend some caution in their use, and we recommend against their misuse as hypothesis tests. Whenever possible, heuristic methods, like basic jitter, should be converted into proper statistical methods, like interval jitter, which use well-formulated statistical models that facilitate communication and interpretation of results.

##### 2.5 Accidental synchrony.

One way to experiment with fine-temporal structure is to modify the simulated spike trains depicted in Fig. 1 by injecting randomly placed pairs of highly synchronous spikes, as might occur, for example, through common inputs or some other physiological process. To what extent, if any, will the presence of these pairs alter the conclusion that the fine-temporal structure is adequately explained by a slow modulation in firing rates? After all, each injection contributes real fine-temporal structure, but how is this to be separated from the “accidental” synchrony that results from correlated but slow changes in firing rate?

Three additional data sets were produced with progressively more injected synchronous spikes (Fig. 3). For each of the three data sets, a distribution of synchrony was generated by jittering spikes, exactly as in the experiment reported in Fig. 2. Figure 3*B* shows the resulting jitter distributions, along with the number of observed (±1 ms) synchronies and the *P* value of the observed synchronies relative to the jitter distribution. Notice that a total of 55 injected synchronies (Fig. 3*B*, *third* row, injection rate 0.5 Hz) in 100 s of recording containing ∼5,000 spikes in each neuron and almost 600 accidental synchronies is already detected as significant (*P* = 0.015).

Returning to the issue of time scale, and keeping in mind that the jitter window in these experiments is 20 ms wide, it is not unreasonable to think of the jitter distribution on synchronies as a distribution on accidental synchronies. These are synchronies that occur by chance, given the numbers and approximate locations of spikes and given that rate changes occur on a coarse time scale, greater than 20 or so milliseconds. Correlations at finer time scales would be disrupted, and indeed the observed numbers of synchronies in the experiments with injected pairs move further into the tails of the jitter distributions as more pairs are injected. In fact, as shown in Fig. 3*B*, we can make a very rough estimate of the number of injected synchronies by subtracting the mean of the jitter distribution from the observed number of synchronies. It is always a good idea to examine physiologically interpretable measures, like this rough estimate of injected synchrony rate, to verify the scientific, in addition to the statistical, significance of *P* values. Here we caution that although the hypothesis testing conclusions drawn from interval jitter are precise statistical conclusions, this rough “estimate” of injected synchrony rate is heuristic and should be interpreted as such. Proper statistical estimation of injected synchrony would require the development of precise models of injected synchrony, which are outside of the scope of this article.

##### 2.6 Temporal resolution.

What if repeated trials really are “repeated,” in the statistical sense, and the spike trains are independent of one another? For example, suppose that the response of two neurons is characterized by firing rates that do not vary from trial to trial. There might then appear little to argue against trial shuffling to explore the significance of fine-temporal structure, such as synchrony, above and beyond what might be explained by firing rates. However, if the two fixed firing rates for the neurons were themselves rapidly changing and correlated in time, then this might represent a possible mechanism, or at least a model, for fine-temporal structure. Yet, this kind of fine-temporal structure would be invisible to trial shuffling exactly because the trials are truly repeatable and the spike processes are dictated solely by their respective firing rates. By contrast, whether or not the assumption of statistical repeatability is valid, temporal processes operating at fast time scales relative to a given jitter window would be disrupted by jitter and thus in principle detectable by comparing statistics of the unperturbed spike train with those of the jittered spike train.

An extreme example is illustrated in Fig. 4. Pairs of model neurons are assumed to share inhomogeneous Poisson firing rates, and the same rate functions are repeated from trial to trial. By varying the bandwidths of the inhomogeneous rate functions, the sensitivity of jitter can be tested as a function of bandwidth and its relation to the size of the jitter window (20 ms in these experiments). As expected, jitter disrupted temporal structure at scales near or smaller than the jitter window. A similar experiment (not shown) fixing bandwidth while systematically increasing the size of the jitter window gives similar results: a more or less monotonic rise in significance (drop in *P* value).

##### 2.7 The jitter-corrected cross-correlation histogram.

A traditional way to look for synchrony is to look for peaks in the cross-correlation histogram in and around *lag* 0. The other bins, corresponding to other spike-to-spike intervals, provide analogous evidence for other fine-temporal relationships, such as those that might arise from direct (monosynaptic) connections between two recorded neurons. As described in Fig. 1, *C* and *E*, and further illustrated in Figs. 3*A* and 4*B*, one way to explore peaks in the cross-correlation histogram is through surrogate data generated by trial shuffle. An alternative approach, suggested by the apparent superiority of jitter methods in differentiating the temporal scales underlying observed coincidences, is to use jitter in place of shuffle to generate the surrogate data. The construction, then, is the same as the one described in Fig. 1*E*, except that 10,000 jitter-generated pairs of spike trains are used instead of 10,000 shuffle-generated pairs of spike trains. Figures 2*D*, 3*B*, and 4*C* revisit the experiments of Section 2, comparing shuffle-corrected with jitter-corrected cross-correlation histograms. The results are consistent with the observations already made about the potential hazards of trial shuffling and the potential benefits of jitter-based resampling. In that neither approach imposes a computational barrier, we recommend jitter over trial shuffling for producing corrected cross-correlation histograms and acceptance bands when the goal is to isolate fine-temporal correlations.

#### 3 Variations on the Jitter Theme

Jitter methods are an instance of conditional modeling. Null hypotheses of temporal resolution are specified via conditional distributions on the data. The key idea is to condition on the spike counts within intervals. The temporal extent of the intervals then specifies the temporal resolution that is hypothesized. Once the conditioning framework is in place, many extensions and generalizations follow naturally. In this section, we review a selection of examples.

##### 3.1 Rate of change of intensity functions.

Hypothesis tests, perhaps because of their simplicity as statistical decisions, can make extremely efficient use of data. This phenomenon is already apparent in Section 2.5, where very small quantities of injected synchronies (e.g., 55 out of ∼5,000 spikes and 600 accidental synchronies) can be picked out by the jitter procedure. Such sensitivity, a virtue, has a subtle corollary that calls for caution: a hypothesis test is equally capable of picking out a very fine misspecification of the null hypothesis. Examining this idea closely, the jitter null hypothesis, that having fixed the number of spikes in a jitter window every configuration of those spikes has precisely the same probability, might be criticized as an extreme interpretation of the absence of temporal precision. In particular, it is important to understand how firing rate variations within a jitter window affect jitter-derived conclusions. If, for example, we were to assume that the spike sequences in two neurons were generated by Poisson processes with time-varying rate functions (as in the artificially generated data of Section 2), how might we test the hypothesis that firing rates do not vary by more than, say, 25% within any 20-ms window? Interval jitter can accommodate exact tests for this and related hypotheses (Amarasingham 2004; Harrison 2005) through the use of nonuniform jitter distributions, chosen, for example, to maximally promote synchronies. See Amarasingham et al. (2011) and its appendix for details.

How does the *P* value, as computed using the interval jitter procedure introduced in Section 2.1, hold up under the more conservative and exact test provided by interval jitter with nonuniform relocation probabilities? Referring to the experiments in Section 2.5, we found that in the case of 0.75-Hz injected synchrony (Fig. 3, *bottom* row) we could allow for as much as an 81% change in firing rate (i.e., maximum rate/minimum rate ≤ 1.81) over each 20-ms jitter interval and still achieve a *P* value below 0.1. (Of course, it is easier to reject stricter limits on the change in firing rate; smaller *P* values go with smaller percentages.) Staying with the Poisson example (but the null hypothesis here is far more general; see Section 2.2 and Amarasingham et al. 2011), “we reject (at *P* = 0.1) the hypothesis that the data was generated from an inhomogeneous Poisson process with rate function piecewise linear on 20-ms jitter intervals and changing no faster than 81% in any one interval.” In plain words, the evidence favors a fast (fine-temporal) mechanism (as well it should given that we injected pairs of millisecond-synchronous spikes). For 0.5-Hz injected synchrony (Fig. 3, *third* row), *P* remains below 0.1 with up to 41% change in each 20-ms interval. In short, interval jitter continues to flag the presence of small numbers of synchronous events even when recast into the framework of a well-specified, exact, and conservative hypothesis test.

##### 3.2 Resampling patterns.

Relative and absolute refractory periods, and the patterned sequences of neuronal spikes known as bursting, in and of themselves constitute a form of fine-temporal structure. A neuron that fires in highly prototypical bursts, with well-preserved interspike intervals, is indeed operating with high precision. In that these phenomena restrict the placements of spikes relative to neighboring spikes, they already violate the hypothesis under which interval jitter calibrates the likelihood of an observed count of synchrony. How then is the practitioner to interpret observations of “excess” synchrony? Are the alignments an epiphenomenon, secondary to the structuring effects of refractory periods (Baker and Lemon 2000; Oram et al. 1999) and bursting (Abeles and Gat 2001), or are they due, instead, to less local properties such as common input, network oscillations, or poststimulus transients?

Pattern jitter is a method for exploring fine-temporal structure in spike trains that cannot be explained by short-range temporal dependencies among spikes of single neurons. The idea is to construct a randomization technique that not only preserves spike counts but also preserves the local statistics of neighboring spikes within single neurons. To this end, pattern jitter is a spike resampling method that rigidly moves local patterns of successive spikes. The short-range statistics within individual neurons are preserved, exactly, through restrictions on the allowable translations. Like interval jitter, pattern jitter corresponds to a proper statistical hypothesis test, as will be transparent from the conditional modeling perspective.

A “pattern” is a sequence of successive spikes from a single neuron satisfying two constraints: on one hand, the distances between neighboring spikes in the pattern are small, but on the other hand, the pattern is well separated from surrounding patterns. Precisely, given a “history” parameter, *R*, successive spikes in a pattern are separated by no more than *R* ms, and there are no surrounding spikes from the same neuron that are within *R* ms of the pattern spikes. The actual number of spikes in a pattern is variable. The smallest pattern is made of a single spike, but larger patterns will be typical when *R* is large. For the synchrony analysis at the beginning of Section 4, we used *R* = 100 ms; for the triplet analysis in Section 4.1, we explored a variety of values from 1 to 16 ms. Notice that the parameter *R* defines a unique partitioning of a spike train into patterns.

Rather than jittering individual spikes, we jitter patterns. If we designate the location (time) of a pattern to be the location of its first spike, then pattern jitter is just a generalization of interval jitter: *R* = 0, which generates single-spike patterns, is replaced by *R* > 0. Windows are defined a priori, as with interval jitter (Section 2.1), but patterns (rather than spikes) are randomly relocated in such a way that the first spike of each pattern (i.e., the pattern location) remains in its original interval. The only complication is that to preserve the statistics of local spike arrangements (as defined by the history parameter *R*), we must prevent the appearance of new patterns in the new (jittered) spike sequence. Hence, patterns must remain separated by more than *R* ms. Fortunately, an ensemble of pattern-preserving spike trains can be efficiently generated (Harrison and Geman 2009; see http://www.dam.brown.edu/ptg/jitter/ to download a library of pattern jitter tools).

Figure 5 illustrates pattern jitter using two jitter intervals (20 and 64 ms) at three different values of *R* (0, 25, and 100 ms). The point is to produce a set of surrogate records that are statistically indistinguishable (exchangeable) from each other and from the recorded spike train, under a null hypothesis that is consistent with physiological interactions between spikes having a range of up to *R* ms within a neuron. The null hypothesis focuses on the issue of temporal resolution. There are no unnecessary assumptions. One way to appreciate this is to interpret the null hypothesis from the generative point of view, as in Section 2.2: one produces spike records by choosing sequences of spike patterns that accommodate absolute and relative refractory periods, bursting, or any other interspike structure up to the specified history length of *R* ms (i.e., every 2 successive spikes in a pattern are separated by no more than *R* ms). There are no constraints on the choice of these patterns; they can be large or small, and they can depend on each other or not. The only constraint is on the locations of the patterns, and it is the one implied by their definition: they cannot be positioned so closely as to destroy the chosen pattern structure, i.e., they need to be spaced by more than *R* ms (distance between last spike of one pattern and first spike of the following pattern exceeds *R* ms). If we think of a pattern as located in the interval that contains its first spike, then the rest of the story is the same as for spike jitter. Hypotheses about temporal resolution are translated into intrainterval location likelihoods. A variety of statistics (e.g., synchrony counts, as in Section 2.1, or repetitions of precisely timed spike sequences, as in Section 4.1) can then be used to construct exact tests for the validity of the null hypothesis. (See Amarasingham et al. 2011 for further simulation experiments illustrating the uses of pattern jitter and the role of the choice of *R*).

#### 4 Jitter Analysis of Three Motorcortical Neurons

Three neurons (designated N1, N2, and N3) were selected from simultaneous multineuronal records extracted from the measurements of a chronically implanted, 100-electrode array (1.0-mm electrode length; 400-μm interelectrode separation) in primary motor cortex (MI) of the monkey (Hatsopoulos et al. 2007). The monkey was operantly trained to move a cursor to targets projected onto a horizontal, reflective surface in front of the monkey. At any one time, a single target appeared at a random location in the workspace, and the monkey was required to move to it. As soon as the cursor reached the target, the target disappeared and a new target appeared in a new, random location (see Amarasingham et al. 2011 for more details on spike sorting and experimental protocol).

There were 391 successfully completed trials, each with a sequence of several targets. Spike times were discretized to 1/30 ms and were analyzed for the duration of the first second for each neuron of each successful trial. The three neurons, N1, N2, and N3, were recorded from separate electrodes. All three had relatively low firing rates: 5.39, 5.12, and 4.28 Hz, respectively. Spike-time rasters and poststimulus time histograms (PSTH) are shown in Fig. 6*A*. The three shuffle-corrected cross-correlation histograms (N1/N2, N1/N3, and N2/N3), along with shuffle-derived 95% acceptance bands (as described in Fig. 1), are shown in Fig. 6*B*, *left*.

There is evidence in the cross-correlation histograms for fine-temporal correlations between N1 and N2 as well as between N1 and N3. Analyses with interval and pattern jitter essentially rule out explanations for the excess synchrony that are based on slow but correlated fluctuations in firing rates and trial-to-trial variation (Fig. 6*B*, *middle*), or “history effects,” such as relative or absolute refractory periods, or bursting (Fig. 6*B*, *right*). (Although these fine-temporal correlations cannot be reasonably explained as artifacts of trial-to-trial variation or statistical dependencies among spikes within single neurons, our purpose here is to demonstrate jitter techniques rather than to argue for the existence or role of fine-temporal structure. Thus we want to be clear that these three neurons were not chosen randomly from a library of recordings. To the contrary, they were selected for the specific purpose of illustration.)

##### 4.1 Beyond synchrony: other measures of precision.

In principle, an ensemble of jittered spike trains, constrained to reproduce length-*R* local spike statistics and coarse (trial dependent) time-varying firing rates, can be used to explore any measure of temporal precision, not just spike synchronies. To illustrate, we looked at the recurrences of specific, precisely timed sequences of three spike events (“triplets”) within neuron N2, throughout the 391 successfully completed trials (a total of 1,752 s of data containing 8,455 spikes). Abeles and colleagues offered repetitions of precisely timed spike sequences (Abeles and Gerstein 1988; Abeles et al. 1993) as evidence for a hypothesized collective neuronal dynamic known as a synfire chain. They reported that the numbers of repetitions of such spike sequences were excessive (Abeles et al. 1993), measured against what would be expected from Poisson-like statistics. On the other hand, Oram et al. (1999) and later Baker and Lemon (2000), both using spike resampling methods, found that the number of repetitions in their data, of triplets as well as other precisely timed sequences, was adequately explained by taking into account the distribution of nearest-neighbor interspike intervals and inhomogeneous firing rates.

There are many statistics involving repeating triplets that could be examined for evidence of fine-temporal structure (Abeles and Gat 2001; Baker and Lemon 2000; Oram et al. 1999) We chose to look at the maximum number of repetitions of any triplet (Date et al. 1999), reasoning that an extremal statistic is likely to be particularly sensitive to repeatable and precise influences on spike timings. The N2 recordings contained 15 triplets (*t*_{1}, *t*_{2}, *t*_{3}) with interspike times *t*_{2} − *t*_{1} = 23 ms and *t*_{3} − *t*_{2} = 27 ms, and this was the maximum number of repetitions of any triplet in the record. Pattern jitter points to significant rate changes occurring on a scale of about 64 ms (or finer), or significant interspike interactions that extend at the least 8 ms, or both (see Fig. 7).

Obviously, triplet repetitions could be substituted with other measures of precision, dictated by the experiment and the experimental focus. Spike sequences involving more than three spikes, or more than one neuron, and statistics such as the total number of repeating sequences (not just the maximum number of repeated triplets), are some of the choices, but really any function of the recorded data (i.e., any “statistic”) can be used to test for fine-temporal structure with the same resampling method. Naturally, different statistics have different power, depending on the hypothesis being tested. Repetitions of multispike sequences make a sensible statistic when looking for synfire chains; millisecond synchrony is sensible when looking for common input. In any case, the usual caution applies: it is unwise and poor practice to use a statistic suggested by the data itself, at least for any purpose beyond exploratory analysis. In fact, it is arguably good practice to use simple or at least clearly motivated statistics; an uninitiated reader may be more likely to suspect that an exotic, unmotivated statistic has been “fished out” of the data, possibly for good reason.

##### 4.2 Temporal resolution and the neural representation of behavior.

When do precisely timed events, such as millisecond synchronies or specific patterns of interspike intervals, carry information about sensory stimuli or motor actions? What role, if any, can jitter methods play in assessing an apparent relationship between a fine-temporal event (e.g., millisecond synchronies) and a response or stimulus variable (e.g., the direction of movement or the identity of a target object)? Although jitter-based hypothesis testing is not, per se, a tool for measuring information or quantifying dependence, it can still be useful in these contexts for evaluating whether a relationship is indeed the result of fine-temporal structure. To choose a few prominent examples, Jones et al. (2004) and Butts et al. (2007) used jitter techniques to assess the contribution of spike timing to the ability of a reconstruction model to predict stimuli from spike trains in the rat trigeminal system and cat lateral geniculate nucleus, respectively. Fujisawa et al. (2008) employed a jitter principle to identify and subsequently study the dynamics of putatively monosynaptic activity in a large population of simultaneously recorded rat medial prefrontal cortex neurons.

Here we reanalyze the data from neurons N1, N2, and N3 to illustrate the application of pattern jitter for exploring the relationship between synchrony events and direction of movement. For a given pair of neurons, let *S* = 1 indicate the occurrence of a synchrony and *D* = *d* indicate the instantaneous movement in the direction *d*. One way to quantify the relationship between synchrony and movement direction is through the ratio θ(*d*) = *P*(*D* = *d*, *S* = 1)/[*P*(*D* = *d*)*P*(*S* = 1)]. If θ(*d*) does not vary with *d*, then θ(*d*) is identically one and movement direction and synchrony are statistically independent. If θ(*d*) varies with *d*, then there is a statistical relationship between synchrony and movement direction such that directions *d* with large values of θ(*d*) are associated with high probabilities of synchrony. In this sense, θ(*d*) can be viewed as a synchrony-based tuning curve. [Amarasingham et al. 2011 contains further discussion of the interpretation of θ(*d*), which arises as a fundamental quantity in both statistics and information theory. Our point, however, is not to argue for θ(*d*) as a uniquely well-suited measure of the relationship; there are many others, some more model based, that explicitly attempt to separate, or identify, the relative contributions from firing rates and synchrony in the prediction of direction.]

Figure 8 shows estimates of θ(*d*) for synchronies defined by each of the three neuron pairs. In each case, the estimated θ(*d*) modulates strongly with *d*, indicating a relationship between precise millisecond synchrony and movement direction. Synchronous spikes between neurons N1 and N2 are more common when the monkey is moving in the downward and rightward direction, and more common between neurons N1 and N3 with movement in the upward direction. There are other notable features, and in fact each pair of neurons produces several peaks in the directional tuning curve. What should be made of these relationships? Are they evidence for a tight statistical, and perhaps even mechanistic, connection between the fine-temporal structure of spike patterns and movement direction? If so, then we would expect these relationships to be largely altered by jittering the original spikes. On the other hand, if the same relationships are largely preserved after jittering, then it seems likely that they are primarily a result of the coarse temporal dynamics of spiking and not additional precise spike timing: we are merely viewing these coarse dynamics through their effects on precise millisecond synchrony.

Consider the prominent peak at about *d* = 120° in the direction-tuning curve based on N1/N2 synchronies (Fig. 8). Our estimator of θ(120) is a function of the data, i.e., a statistic, and in this regard no different from, say, a synchrony count or maximum occurrence of precisely timed triplets. In fact, our estimate of the curve θ(*d*) across varying *d* is no different from a cross-correlation histogram across varying lags. We used these estimators as test statistics to evaluate the pattern-jitter null hypothesis (*R* = 100 ms with 20-ms jitter intervals) in exactly the same way that we used the cross-correlation histogram and the *lag 0* synchrony counts in Fig. 6*B*. [Strictly speaking, we are conditioning not only on the observed patterns and their interval locations but also on the sequence of directions *D*(*t*), because these too remain fixed in the subsequent jitter experiments.] Under this null hypothesis, we would expect the observed estimate of θ(*d*) to be typical of the ensemble of identical estimators derived from the ensemble of surrogate spike trains generated by pattern jitter. This is indeed the case: there is nothing notable about the original record, vis-à-vis the estimate of θ(*d*), compared with the ensemble of estimators derived under the null hypothesis, as indicated by the acceptance bands in Fig. 8. Jitter analysis indicates that the relationship between fine-temporal structure and movement direction is secondary to coarsely timed differences in the locations of spikes.

In Fig. 6*B* we used the observed number of precise millisecond synchronies as a test statistic to reject the null hypothesis of no fine-temporal structure. In Fig. 8, we used the estimated tuning curve, θ(*d*), as a test statistic for the same null hypothesis, this time failing to reject. In these data, there is strong evidence for fine-temporal structure that is not explained by the coarse placement of spikes, but there is little to suggest that fine-temporal structure adds to coarse structure for the purpose of predicting movement direction. Quantitative conclusions about why the data reveal fine-temporal structure using one test statistic but not another require statistical models incorporating varying degrees of precise timing. More generally, what is needed to go further is a formulation not only of the null hypothesis but also of its many alternatives.

#### 5 Summary and Discussion

It has by now been widely observed that the classical distinction between rate and temporal coding lies on a continuum and is more generally an issue of time scale (Dayan and Abbott 2001; Rieke et al. 1997). In vivo evidence regarding timing (Tiesinga et al. 2008) is commonly drawn from statistical analyses of correlations between environmental or behavioral signals and spike counts measured on various time scales. For example, many investigators model neurons with stationary or time-varying firing rates for each trial and assume that neurons are firing independently, given their firing rates (Frostig et al. 1990; Martignon et al. 1995; Prut et al. 1998; Villa et al. 1999). Estimated firing rates are then incorporated into a bootstrap hypothesis test of such models (Aertsen et al. 1989; Grün et al. 2002a, 2002b; Samonds et al. 2006; Ventura et al. 2005a; see also Ito and Tsuji 2000) wherein fine-temporal statistics such as synchrony counts (see also Abeles and Gerstein 1988; Abeles et al. 1993) are utilized as the test statistic. Another approach along similar lines is to compare observed synchrony counts with those expected if all trial pairings were equally likely, as in the shuffle predictor (Constantinidis et al. 2002; Hirabayashi and Miyashita 2005; Perkel et al. 1967). The key underlying assumption (null hypothesis) of these techniques is that neurons act independently of each other, given a probabilistic structure that is common to all trials.

However, many authors (Bair et al. 2001; Ben-Shaul et al. 2001; Brody 1998) have pointed out that the relevance of such an analysis to the issue of time scale is unclear. The basic, and quite plausible, counterexamples are forms of trial-to-trial variability, such as latency variations (Ventura 2004) or slowly varying common input, such as that due to internal variables not controlled by the experiment (Baker and Gerstein 2001; Czanner et al. 2008; Grün et al. 2003; Kass and Ventura 2006). One proposal for dealing with such concerns is to incorporate trial-varying firing rates into parametric models of the spiking process, which can again be tested, for example, by estimated trial-varying firing rates incorporated into bootstrap tests (Bair et al. 2001; Ben-Shaul et al. 2001; Brody 1999; Pauluis and Baker 2000; Ventura et al. 2005b). Of course, in any such model the number of parameters that must be estimated from the data is proportional to the number of trials, and for this reason, at least, standard statistical issues concerning model complexity become very delicate. Implicitly or explicitly, stronger modeling assumptions are required. This raises a question of interpretation. Does the rejection of such a model, using synchrony counts as a test statistic, for example, indicate that there is fine-temporal structure, or does it reflect, instead, the particulars of the model?

Jitter methods are a statistically rigorous alternative approach. In place of estimating firing rates, an ensemble of spike processes is defined through random perturbations of observed spike timings, with the intention of preserving coarse characteristics of the spiking process (e.g., fluctuations in firing rate) while disrupting fine-temporal structure. Conclusions about the existence of fine-temporal structure are then derived from a comparison of the original spiking process with those in the randomly generated ensemble.

Jitter is an application of conditional modeling. The task of modeling the structure and mechanisms of the spike-generating process are largely sidestepped by formulating hypotheses in terms of the conditional distribution on the locations of spikes or spike patterns, given their observed counts in coarsely defined bins (the jitter window). Local firing rates are preserved to a specified resolution, determined by the width of the window, and local spike dependencies are preserved to a specified range, determined by the history parameter *R*. How surprising are apparent coincidences of spike timing, such as a large number of synchronous spike pairs or multiple repetitions of precisely timed spike sequences? An ensemble of jittered spike trains produces a corresponding ensemble of coincidences. By comparing the original observation with the ensemble, hypotheses about precision are put to the test: do the observations in the recorded spike train depart from the range of values that could be reasonably expected under the hypothesis that spike times are unconstrained beyond slowly varying firing rates and local interspike interactions?

Jitter is a tool for exploring hypotheses about temporal resolution. It should not be expected to address questions about mechanism that go beyond, or are unrelated to, issues of timing. Thus, for example, statistical or physiological models for spike generation, such as Poisson-process models or more general point-process models (e.g., Ventura et al. 2005b), serve a different purpose. In our view, it is best not to confuse this type of model building and parameter estimation with testing hypotheses about temporal resolution or with exploring data for evidence of excess synchrony or surprising repetitions of spike sequences. Indeed, the two are nearly orthogonal and can be complementary. In Section 4.2 we estimated synchrony-based direction-tuning curves by using standard and time-honored statistical methods. Jitter played no role, except in the interpretation of these curves in terms of temporal resolution. In that it focuses only on the timing issue, jitter is a good way to evaluate the statistical significance of the estimated tuning curves or, for that matter, any other estimated relationship between fine-temporal structure and behavior.

As a final remark on conditional modeling, we note that this point of view serves to clarify the relationship between spike or pattern jitter and bootstrap methods. The two are superficially similar and easily confused. Bootstrap samples are generally interpreted as samples from the full (i.e., unconditional) distribution on the data, whereas jitter gives samples from a hypothesized conditional distribution. The interpretations are completely different. Bootstrap samples, for example, might be used to investigate the variability in observed firing rates. It would make no sense to use jitter samples for this purpose, since by design, slowly varying firing rates are preserved, almost exactly, from one surrogate spike train to another. (See Amarasingham et al. 2011 for additional remarks on the distinction between conditional and unconditional inference.) Although the primary difference is one of interpretation, we also note that proper use of bootstrap requires good estimates of the full data distribution. As we have pointed out, this is a strong requirement, especially in neurophysiology. Jitter avoids the estimation problem by narrowly focusing on scientifically relevant questions about conditional distributions.

There are many variations on the theme. The jitter distribution can be uniform in a window, or tilted by as much as a prespecified amount, to probe for lower bounds on the intrinsic rate of change of spike probabilities. A model that purports to predict a stimulus or a motor response from fine-temporal neuronal events, such as synchronies or precisely timed spike sequences, can be evaluated for its sensitivity to temporal precision. We have reviewed these and many other applications of the jitter method. In each instance, the statistical assumptions are revealed and the statistical correctness is established through the conditional modeling perspective.

## GRANTS

This work was supported in part by National Science Foundation (NSF) Grants DMS-0240019 (to M. T. Harrison) and DMS-1007593 (to M. T. Harrison and S. Geman), an NSF Postdoctoral Fellowship in Biological Informatics (to A. Amarasingham), National Institutes of Health Grants 2R01 MH064537 (to M. T. Harrison) and R01 NS45853 (to N. G. Hatsopoulos), Office of Naval Research Contract N000141010933 (to S. Geman), and Defense Advanced Research Projects Agency Contract FA8650-11-1-7151 (to S. Geman and M. T. Harrison).

## DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

## AUTHOR CONTRIBUTIONS

Author contributions: A.A., M.T.H., N.G.H., and S.G. conception and design of research; A.A., M.T.H., and S.G. analyzed data; A.A., M.T.H., N.G.H., and S.G. interpreted results of experiments; A.A., M.T.H., and S.G. drafted manuscript; A.A., M.T.H., N.G.H., and S.G. edited and revised manuscript; A.A., M.T.H., N.G.H., and S.G. approved final version of manuscript; M.T.H. prepared figures; N.G.H. performed experiments.

## ACKNOWLEDGMENTS

We are grateful to A. Date for insightful comments and suggestions, to B. Anderson and W. Truccolo for feedback while exercising the methods and the software on their data, and to G. Buzsáki, K. Diba, and A. Renart for comments on an earlier version of the manuscript.

## Footnotes

↵1 Some authors prefer alternative terms for jittering, such as “dithering” (Gerstein 2004; Grün 2009; Pazienti et al. 2007, Pazienti et al. 2008), “teetering” (Shmiel et al. 2005), or “artificial jitter” (Rokem et al. 2006), etc., presumably to distinguish from another use of the word “jitter” as the intrinsic temporal variability in individual spikes. For example, in some situations involving highly reliable (Billimoria et al. 2006) or simulated or modeled spike trains (Pazienti et al. 2007), individual spikes can unambiguously be placed in correspondence with one another across trials. In that case, the temporal variability in a spike's timing, under this correspondence, can be quantified directly and is commonly called the “spike jitter” (Billimoria et al. 2006; Victor and Purpura 1996). In this article, we continue to use “jitter” in its resampling sense, leaving these two uses of the term to be disambiguated by context.

- Copyright © 2012 the American Physiological Society