Journal of Neurophysiology

Inferring Spike Trains From Local Field Potentials

Malte J. Rasch, Arthur Gretton, Yusuke Murayama, Wolfgang Maass, Nikos K. Logothetis

Abstract

We investigated whether it is possible to infer spike trains solely on the basis of the underlying local field potentials (LFPs). Using support vector machines and linear regression models, we found that in the primary visual cortex (V1) of monkeys, spikes can indeed be inferred from LFPs, at least with moderate success. Although there is a considerable degree of variation across electrodes, the low-frequency structure in spike trains (in the 100-ms range) can be inferred with reasonable accuracy, whereas exact spike positions are not reliably predicted. Two kinds of features of the LFP are exploited for prediction: the frequency power of bands in the high γ-range (40–90 Hz) and information contained in low-frequency oscillations (<10 Hz), where both phase and power modulations are informative. Information analysis revealed that both features code (mainly) independent aspects of the spike-to-LFP relationship, with the low-frequency LFP phase coding for temporally clustered spiking activity. Although both features and prediction quality are similar during seminatural movie stimuli and spontaneous activity, prediction performance during spontaneous activity degrades much more slowly with increasing electrode distance. The general trend of data obtained with anesthetized animals is qualitatively mirrored in that of a more limited data set recorded in V1 of non-anesthetized monkeys. In contrast to the cortical field potentials, thalamic LFPs (e.g., LFPs derived from recordings in the dorsal lateral geniculate nucleus) hold no useful information for predicting spiking activity.

INTRODUCTION

In a typical electrophysiology experiment, the signal measured by an electrode placed at a neural site represents the mean extracellular field potential (mEFP) from the weighted sum of all current sinks and sources along multiple cells. If a microelectrode with a small tip is placed close to the soma or axon of a neuron, then the measured mEFP directly reports the spike traffic of that neuron and frequently that of its immediate neighbors as well. If the impedance of the microelectrode is sufficiently low and its exposed tip is a bit farther from a single large pyramidal cell, so that action potentials do not predominate the neural signal, then the electrode can monitor the totality of the potentials in that region. The EFPs recorded under these conditions are related both to integrative processes (dendritic events) and to spikes generated by several hundred neurons.

The two different signal types can be segregated by frequency-band separation. A high-pass filter cutoff of about 300–500 Hz is used in most recordings to obtain multiple-unit spiking activity (MUA), and a low-pass filter cutoff of about 300 Hz to obtain the so-called local field potentials (LFPs). Numerous experiments have presented data indicating that such a band separation does indeed underlie different neural events (for references see, e.g., Logothetis 2003).

In summary, depending on the recording site and the electrode properties, the MUA most likely represents a weighted sum of the extracellular action potentials of all neurons within a sphere whose radius is about 140 to 300 μm, with the electrode at its center (Henze et al. 2000). Spikes produced by the synchronous firings of many cells can, in principle, be enhanced by summation and thus detected over a larger distance (Arezzo et al. 1979; Huang and Buchwald 1977). In general, experiments have shown that large-amplitude signal variations in the MUA range reflect large-amplitude extracellular potentials and that small-amplitude fast activity is correlated with small ones (Buchwald and Grover 1970; Gasser and Grundfest 1939; Grover and Buchwald 1970; Hunt 1951; Nelson 1966).

The low-frequency range (i.e., the LFPs) of the mEFP signal, on the other hand, represents mostly slow events reflecting cooperative activity in neural populations. Initially these signals were thought to represent exclusively synaptic events (Ajmone-Marsan 1965; Buchwald et al. 1965; Fromm and Bond 1964, 1967). Evidence for their origin was often gathered from current-source density (CSD) analysis and combined field potential and intracellular recordings (Mitzdorf 1985; Nadasdy et al. 1998). Mitzdorf suggested that LFPs actually reflect a weighted average of synchronized dendrosomatic components of the synaptic signals of a neural population within 0.5–3 mm of the electrode tip (Juergens et al. 1999; Mitzdorf 1987). Later studies, however, provided evidence of the existence of other types of slow activity unrelated to synaptic events, including voltage-dependent membrane oscillations (e.g., Kamondi et al. 1998) and spike afterpotentials. Taken together, LFPs represent slow waveforms, including synaptic potentials, afterpotentials of somatodendritic spikes, and voltage-gated membrane oscillations, that reflect the input of a given cortical area as well as its local intracortical processing, including the activity of excitatory and inhibitory interneurons.

Given the different natures of LFPs and MUA, we felt that it would be interesting to address the question of whether one can infer spiking of neurons from the locally measured field potentials. Herein we address this question in a straightforward manner. We use methods derived from the field of machine learning in an attempt to infer exact spike timings from the underlying LFPs. We compare the accuracy of spikes trains predicted by supervised learning algorithms on a wide range of recordings from the primary visual cortex (V1) as well as from the lateral geniculate nucleus (LGN) of anesthetized and non-anesthetized macaques and investigate what kinds of features of the LFP are important for inferring spikes from LFP.

METHODS

Data acquisition

Electrophysiological data recorded from nine anesthetized and two non-anesthetized monkeys (Macaca mulatta) are included in the present study. All animal experiments were approved by the local authorities (Regierungspraesidium) and are in full compliance with the guidelines of the European Community (EUVD 86/609/EEC) for the care and use of laboratory animals. Surgical procedures are described elsewhere (Logothetis et al. 2002) together with hardware details of the recording setup.

To perform the neurophysiological recordings in anesthetized monkeys, the animals were anesthetized [remifentanil (typically 1 μg·kg−1·min−1)], intubated, and ventilated. Muscle relaxation was achieved with mivacurium (5 mg·kg−1·h−1). Body temperature was kept constant, and lactated Ringer solution was given at a rate of 10 ml·kg−1·h−1. During the entire experiment, the vital signs of the monkey and the depth of anesthesia were continuously monitored. Drops of 1% ophthalmic solution of anticholinergic cyclopentolate hydrochloride were instilled into each eye to achieve cycloplegia and mydriasis. Refractive errors were measured and contact lenses [hard poly(methyl methacrylate) lenses by Wöhlk Contactlinsen, Karlsruhe, Germany] with the appropriate dioptric power were used to bring the animal's eye into focus on the stimulus plane. Simultaneous recording of neural activities were made from the primary visual cortex (V1) using 8–16 electrodes configured in 4 × 4 or 2 × 8 matrices in a grid of 1–2 mm. Electrode tips were typically (but not always) positioned in the upper or middle cortical layers. The impedance of the electrode varied from 300 to 800 kΩ. In the case of simultaneous LGN recording an additional set of drives, usually consisting of two to four electrodes, was additionally positioned. The electronic interface, including drives, holder, and preamplifier, was custom designed to minimize cross talk of signals between electrodes (typically ∼1 ppm). The signals were amplified and filtered into a band of 1–8 kHz (Alpha Omega Engineering, Nazareth, Israel) and then digitized at 21 kHz with 16-bit resolution (National Instruments, Austin, TX), ensuring enough resolution for both local field and spiking activities. Binocular visual stimulation was provided through a two-fiber optic system (Avotec, Stuart, FL) after fine alignment to each of the animal's foveas by a modified retinoscope coupled with a stimulus projector holder.

In the case of the anesthetized animals we differentiate between two different conditions: spontaneous activity (“spo”) and movie-driven data (“stm”). In the former the input screen is blank for about 5 min; in the latter a 4- to 6-min segment of a commercially available movie is shown. Movie frames were synchronized with the refresh rate of the monitor (60 Hz, two synchronizations per movie frame) and covered 7–12° of the visual field. Most of the electrodes were confirmed to have a receptive field within the movie presentation area (see Fig. 1B for an example). Multiple trials of movie presentations and spontaneous activity are run within one recording session (intermingled with recordings of other stimuli not considered here). From these data we include 1,304 recorded time series in the present study, which we call trials throughout this article. The data set consisted of recordings from nine animals collected in 12 recording sessions. From each session we take five repeats of movie presentation and five repeats of spontaneous activity trials (with the exception of j97nm1, where only one movie trial is available). To avoid any subjective selection bias all measured electrode channels per session are included. This results in 670 trials for spontaneous activity and 634 for movie stimulus recorded using 134 electrode placements. Movies are identical within a session but may differ between sessions.

FIG. 1.

A: representative electrode recording from Session a98nm5 of an anesthetized monkey. Top plot: the instantaneous firing rate of an experiment during movie presentation from a primary visual cortex (V1) electrode in a small time region. Movie presentation starts after a blank period at 30 s. Recording time of 170-s duration starting 5 s after movie onset is used for prediction performance evaluation and is called a trial (see methods). Bottom plot: the local field potential (LFP) trace corresponding to the V1 electrode above. B: the arrangement of receptive fields relative to the movie presentation area for Session a98nm5, where simultaneous V1 and lateral geniculate nucleus (LGN) recordings are available. Other sessions have similar electrode arrangements.

In three sessions (two anesthetized monkeys) no more than four electrodes were simultaneously placed in the LGN. Thus stimuli reflected in these data (100 trials) are exactly identical to those in corresponding recordings of the V1 data. This data is separated again according to the conditions spontaneous activity (“spo(L)”) and movie-driven activity (“stm(L)”). Data from non-anesthetized monkeys are more limited and were included in the present study only to corroborate results for the data described earlier. They are recorded using one to four tetrodes from chronic implants penetrating V1 (for a detailed description of surgical methods and recording setup see Tolias et al. 2007) in a total of 56 trials. Unlike the data from anesthetized animals the stimulus conditions here are mixed, with spontaneous activity (no task) and a fixation task showing gratings of different orientations. This last data is labeled “awake” in the following. All data are processed in the same way as outlined in the next section.

PROCESSING.

The data preprocessing steps are as follows. Electrode signals were decimated to 7 kHz. Spiking activity is inferred from high frequencies of the resulting signal (see following text). The recording hardware introduces a high-pass filter with a cutoff <1 Hz; 1 Hz is thus the lowest frequency considered here.

The 7-kHz signal is low-pass filtered with a cutoff frequency of 250 Hz and resampled first to 500 Hz for computational convenience. The resulting signal is low-pass filtered at 90 Hz to derive local field potentials (LFPs). For low-pass filtering we use a custom finite impulse response filter (FIR): a Kaiser window FIR filter with 60-dB attenuation in the stopband, a 0.01-dB passband ripple, and a transition band of 1 Hz. To eliminate possible phase shifts, signals were filtered forward and backward (using the MatLab filtfilt function). The signal is then resampled to a final sampling rate of 200 Hz.

Good properties of FIR filters are won at the expense of large filter sizes (a few seconds). However, since we discard leading and trailing portions of >15 s of each trial, filter on- and offset artifacts are of no concern here.

SPIKE EXTRACTION.

Spike times are detected by applying a threshold to the high-pass filtered 7 kHz signal described earlier (fourth-order Butterworth, cutoff frequency 500 Hz). Since this MUA signal is usually asymmetric, the detection threshold is automatically applied to that side where spike waveforms exhibit greater deflection. To avoid dependence of the size of spikes the threshold is applied at 3.5SD (σ) of the “noise component” of the MUA signal. σ is estimated by calculating the SD of the signal, neglecting the 4.55% (>2σ) absolute highest values divided by the percentage of variance that is kept in general, when setting the probability of absolute values >2σ to zero.

Visual inspection confirms that spikes are well detected. If the assumption of a Gaussian “noise component” is correct, then the rate of wrongly detected spikes is about 1.6 Hz (for σ = 3.5 SD). Note that the resulting spike trains will most likely include spikes from multiple neurons (see discussion). Because most recordings were done with single-tip electrodes we do not use any kind of spike sorting.

Learning to infer spike trains

The learning algorithm has to learn to map from LFP waveforms (or other LFP features) to spikes. Ideally, the learning algorithm should output all predicted real-valued spike timings at once if the LFP time course is given as input, although this task requires too much data. Instead, we simplify the task by assuming that spikes are independent and that the spike-to-LFP relationship remains constant over time. With these assumptions one can use a binary classifier, which yields the prediction of a spike (or no spike) at time t. Concatenating the prediction for each t results in a predicted spike train for a given LFP. Note that the independence assumption does not imply that predicted spike trains are necessarily uncorrelated because temporal correlation can be induced by the underlying LFPs.

In supervised fashion the binary classifier is trained on a set of training examples and tested on a distinct test set. We train a binary classifier on LFP features summarized in the sample vectors xi, i = 1, …, L, to predict the label yi ∈ {1, −1}. The index i is the ith point of the discrete LFP time series with sampling frequency 1/Δ at recording time ti = iΔ + t0. Thus yi = 1 states that there occurs (at least) one spike within time bin ti and yi = −1 indicates no spike. In this framework prediction is temporally restricted to the sampling resolution of the LFPs, making it necessary to bin the spike timings. The sampling interval Δ is 5 ms, in accordance with the sampling frequency of the LFP signal (200 Hz).

LEARNING ALGORITHMS.

A support vector machine (SVM) was used (Vapnik 1999) as our learning algorithm. For a more detailed introduction to SVMs see, for example, Bishop (2006), Burges (1998), and Schölkopf and Smola (2002). SVMs perform binary classification in a supervised fashion.

Briefly, the model can be stated as follows (for details see Bishop 2006) Math(1) where one looks for the decision boundary or weight vector w; b is a bias term and Φ is a projection into a space of features. SVMs choose the hyperplane that has the widest margin between both classes rather than an arbitrary separating hyperplane. This is achieved by enforcing appropriate constraints in the optimization. For nonseparable problems, such as our real-world data, one introduces the concept of soft margins, i.e., in the optimization one now allows for incorrectly classified examples, where an additional parameter C regulates the penalty.

SVMs have the power to do nonlinear separation (seen from the perspective of input space) by choosing an appropriate kernel that implicitly defines the feature map Φ. Herein nonlinear radial basis function (RBF) kernels are used.

As a simple alternative to SVMs we used standard linear regression (with a constant bias term) on the label vector and the samples (see, e.g., Bishop 2006). Briefly, using a linear model hreg(xi) = wTxi + b we calculated the optimal weight vector w* by minimizing the mean squared error 〈[hreg(xi) − yi]2〉 on the training samples. Class labels on the test set were obtained by thresholding with the sign function, i.e., yj = sign [h*reg(xj)].

EXTRACTION OF LFP FEATURES.

An LFP feature could be any aspect of the LFP that one might deem helpful for inferring whether there is a spike at ti. In our analysis, we used the LFP at different lags with respect to ti, its power at different frequencies, and the phase of oscillations at particular frequencies (also at different lags). Multiple features are simply concatenated in the sample vector xi. Note that each dimension of the resulting samples {xi} is normalized to zero mean and unit SD. Features are extracted prior to dividing the samples into test and training sets.

If g(ti) represents the (normalized) voltage at sampling bin ti, a time feature may be defined as Tk(ti) := g(ti+k), where τ = kΔ represents the time lag (we neglect boundaries to simplify the description). Features Tk(ti) simply represent the LFP time course relative to sample time ti.

Another type of feature, which we denote Pf,k(ti), is the estimated power at (center) frequency f of the LFP time course at time ti+k. To obtain an estimate for the power at a given frequency and time, we calculated the spectrogram, using the multitaper approach introduced by Thomson (Jarvis and Mitra 2001; Percival and Walden 2002; Thomson 1982). Because spikes are single events on the timescale of Δ, we chose a high temporal resolution at the expense of frequency resolution. We set the moving window to 150 ms and the time–bandwidth product to 1.6, which implicitly sets the half-bandwidth to W = 10.67 Hz. Spectral estimation was averaged over two Slepian tapers. Because this window setting does not allow accurate power estimation <20 Hz we used larger windows for bands <20 Hz (500 ms) and <6 Hz (2 s). This reduced the half-bandwidth to 3.2 and 0.8 Hz for frequencies <20 and 6 Hz, respectively. We also tried Morlet wavelets with variable bandwidth per frequency, but this did not alter prediction performance.

To access phase information at particular frequencies of the LFP, we first band-pass filtered the recorded signal with the FIR Kaiser filter (described earlier) with a bandwidth of 2 (Fig. 6) or 4 Hz (Fig. 7), and then used the Hilbert transform to extract an instantaneous phase φf(ti) at frequency f. From these signals we defined phase features φf,k(ti) := cos [φf(ti+k)] having a lag of τ = kΔ. These features have phase information identical to that of the band-passed signals but are devoid of any amplitude modulation (AM). Additionally, we used φ̂f,k := sin [φf(ti+k)] in the feature analysis of Fig. 6 to help the classifier linearly extract phase locking at phases where the cosine would be near the zero crossing.

PERFORMANCE MEASURES.

The kappa measure was used as a measure of performance (Cohen 1960). Let pl,r be the fraction of samples having target label l ∈ {−1, 1} and predicted label r ∈ {−1, 1} and let ql and q̃r be the fraction of samples in the test set that have the label l in the target or the label r in the prediction, respectively. Then the chance level for classification is given by ρc = q11 + q11. If we define ρ0 = p−1,−1 + p1,1 to be the overall fraction of correctly classified samples (both positive and negative), then κ is given by Math(2) This measure is a normalized above-chance classification rate. It can be easily seen that κ equals zero if prediction is at chance level (i.e., ρ0 = ρc) and equals one if the predicted classification is perfect (i.e., ρ0 = 1).

Another performance measure is the Spearman rank correlation rσ between smoothed predicted spike trains and target spike trains. This might give a more intuitive picture of the prediction quality. If not stated otherwise, spike trains are smoothed by a Gaussian kernel of width 25 ms.

Yet another measure for prediction quality is the mutual information between class labels. The mutual information (MI) between target spikes S and the prediction outcome of a classifier C(F) using LFP features F and labels L := {−1, 1} is Math(3) where we take the probabilities defined earlier. This estimation of MI is different from nonparametric approaches in that it can access dependence that is only in reach of the classifier; thus one has to make sure that the classifier captures the main aspects of its dependence. Note that we use the naive estimator for mutual information [without bias correction (Panzeri et al. 2007)]. Since all MI value calculations involve an identical number of bins—that is, two, one for each class—we can nevertheless safely compare results even for classifications with different numbers of features. However, the absolute MI values might be biased.

To access redundancy, synergy, and independence of information (Pola et al. 2003; Schneidman et al. 2003) conveyed by two features F1 and F2 about the spiking activity S, we estimate mutual information using two classifiers trained on features F1 and F2 individually, yielding I(S; F1):= I[S; C(F1)] and I(S; F2) := I[S; C(F2)]. Then a third classifier is trained on both features jointly, yielding I(S; F1, F2) := I[S; C(F1, F2)]. If both features carried independent information from the perspective of the classifier, both features together would convey identical information as individual features; i.e., I(S; F1, F2) = I(S; F1) + I(S; F2). If both features were related by a one-to-one mapping (completely redundant information about the spikes), then all terms would be equal: I(S; F1) = I(S; F2) = I(S; F1, F2). If the two features did not carry information individually, i.e., I(S; F1) = I(S; F2) = 0, but carried information together, I(S; F1, F2) > 0, they would be termed (completely) synergistic. Thus we define a normalized degree of synergy of information about the spikes (as measured by the classification algorithm) as (Schneidman et al. 2003) Math(4) This measure ranges from −1 in the case of completely redundant information to 1 for completely synergic information. The measure syn (F1, F2|S) is zero if both features F1 and F2 convey independent information about the spikes S.

To analyze prediction accuracy on different timescales, we used spectral coherence (Jarvis and Mitra 2001). Spectra were again estimated via a multitaper approach designed for point events (Jarvis and Mitra 2001). Here the time–bandwidth product was set to TW = 3 using the average of K = 5 tapers, yielding a half-bandwidth of W = 0.001 Hz for T = 17 s.

PERFORMANCE EVALUATION.

We evaluated the prediction performance for each trial separately, using 10-fold cross-validation. We analyzed a 170-s region, avoiding the on- and offset of the movie stimulus. Spontaneous activity trials are also restricted to 170-s duration. In the case of tetrode recordings, performance is estimated as the average performance of the four wires of the tetrode.

Hyperparameters for the SVM algorithm were estimated as follows. The RBF kernel width ρ was selected by a heuristic procedure. We took ρ to be 1.77 (or 3.54) times the median distance of all Euclidean distances in the training set. For each trial we chose that C (and ρ) showing the best performance (averaged over 10 cross-validation runs on a logarithmic grid of 25 values from 0.25 to 400). We visually confirmed that this range is appropriate for our data (not shown). We used the libSVM library (http://www.csie.ntu.edu.tw/∼cjlin/libsvm/) for all SVM calculations.

Since the sample sizes were heavily biased toward the negative (nonspiking) class we randomly picked approximately the same number of samples of both classes from the training region. This effectively changes the loss function from equal loss to higher importance for spikes (about a fivefold increase, depending on the mean firing rate). We used 1,000 and 1,200 samples for spiking and nonspiking classes (or the maximum available in the training region with a constant class ratio) and empirically found this to be a good compromise between prediction quality and computational speed because >1,000 samples only marginally improved the results (not shown). Training the classifier on all possible samples was prohibitive due to the enormous sample size. We tried to use class biasing in the C parameters (Musicant et al. 2003), but this only increased computation time with little gain in prediction quality.

The test set was always a temporally contiguous region to avoid feature correlation of trained and tested samples that might lie nearby in time if a randomized set of samples were used.

FEATURE SELECTION ALGORITHM.

We now describe how to determine the usefulness of different features for spike prediction. Although features important for the SVM classifier are hard to interpret, in the case of linear regression (with squared loss) one can derive an analytical expression for the prediction error of a set of features involving only the STAs and correlation among features. Based on this prediction error, we derived an algorithm that forwardly selects a small subset of features out of a much larger pool of features. As subsequently explained, the selected subset will show minimal prediction error compared with other subsets with the same number of features. In that sense the subset of features selected by our algorithm represents the most useful features from a given pool. Because this feature selection can be efficiently done for huge feature pools, we restricted feature analyses to linear classification, rather than using the SVM classifier. This is not too restrictive in our case because linear classification achieves almost 90% of the performance of an SVM classifier (see results).

In linear regression we look for the weight vector w that has minimal error in a mean squared sense Math(5) where the brackets indicate averaging over all samples i. Minimizing the error is straightforward and results in the optimal weight vector1 w* of Math(6) provided that the estimated correlation matrix A: = 〈xixiT〉 has full rank. We note that we have a binary classification and thus yi ∈ {−1, 1}. Thus Math where m−1 and m1 are the class means for the nonspiking and spiking class, respectively, and c−1 and c1 are the number of samples in each class. With the condition that each feature is normalized to variance one and zero mean, Eq. 6 reduces to Math(7) with Math The minimal error is then given by Math(8) In other words, the error of the linear regression is dependent only on the STA m1 and the correlation among features. We note that if we restrict ourselves to the use of n features (the dimensions of xi) out of a pool of N > n features, the preceding equations remain valid if the correlation matrix and the mean are also restricted to these features only (that is, A has size n × n with rank n and m1 is n-dimensional).

To select a set of n important features we use the following iterative algorithm. We start with the feature that has the highest STA, i.e., f1 = arg maxj |(m1)j| (the variance of each feature being normalized to one). Assume now that n − 1 features are already selected. Then we search through all Nn + 1 remaining features and choose one that minimizes the error (Eq. 8), where the restriction of the correlation matrix is now enforced on the n features rather than on n − 1 (and analogously with m1). We stop this iteration when the desired number of features mN is selected.

This algorithm is highly efficient in finding a good set of features, since we need to calculate the correlation matrix only between the selected features and all other features (which costs much less effort than calculating it for all pairs).

RESULTS

This section is organized as follows. After showing the general spike-to-LFP relationships present in our data, we report the population performance for the task of predicting spike trains from LFP, focusing first on data from V1 of anesthetized monkeys (nine monkeys) collected during the presentation of 5 min of commercial movie stimuli and equally long periods of spontaneous activity. We then compare these results with a more limited set of sessions recorded from V1 of awake monkeys (two monkeys) and with results on data from LGN of anesthetized monkeys (two monkeys). Finally, we investigate which LFP features are important for the prediction task and which aspect of the spikes they code for.

Average spike to LFP-power and -phase relation

Figure 1 shows spiking activity and (normalized) LFPs of a representative electrode. Relationships between spiking activity and underlying LFPs are visualized in Fig. 2.

FIG. 2.

Spike–LFP relationships for one electrode in V1 of an anesthetized monkey during movie stimulation (AC) and across all recordings from V1 in anesthetized monkeys (D and E). A: spike-triggered average (STA) LFP. For significance levels interspike intervals (ISIs) are shuffled and the SD of the resulting STA is calculated. B: STA of the LFP spectrogram (see methods), with power series normalized to zero mean and unit SD. Power at high frequencies is clearly modulated by spiking activity, whereas power at lower frequencies shows only diffuse dependence on spikes. C: probability distribution of LFP phases at a particular spiking position. LFP phases are computed via Hilbert transform (1-Hz bands). Here all spikes over 30 repeats of movie-driven activity are included (same electrode as before). Note that the color map shows only a narrow range of probabilities and that values above or below the limits are truncated. Black dots indicate the preferred (i.e., mean) phase. No phase locking of spikes would result in a uniform distribution at 2% per bin. Although locking to low frequencies is strong, locking to high frequencies is only weak (but present). D: the average preferred phases for all electrodes across all anesthetized data individually for movie-driven activity (“movie”) and spontaneous activity (“spont”). Bars indicate SEs. The phase range containing half of the spikes around the preferred phase is indicated by the shaded area. E: the interpretation of phase, showing that spikes are locked at very low frequencies to the onset of a positive half-wave and at high frequencies to the valley.

In Fig. 2A the spike-triggered average (STA) of the LFPs of an example electrode during movie stimulus is plotted. Clearly, there is a linear relation between spikes and LFPs. One notes a sharp negativity at spike position at zero time lag and a prominent upswing for positive time lags, i.e., after spiking has occurred. Likewise, in the STA of the spectrogram (Fig. 2B) power is enhanced in the high-frequency range (40–90 Hz) during spiking activity. Enhancement of power at high LFP frequencies as a response to spikes is common among electrodes, stimulus conditions, and monkeys, as we will see in the next sections.

Figure 2C shows the probability of spiking activity at the oscillation phase of a particular LFP frequency for the same example electrode, but averaged >30 repeats of the movie presentation (∼120 min of recording time). One notes that the phases of all LFP frequencies are at least weakly related to spiking activity (Raleigh test of nonuniform angular distribution). Most strikingly, spikes are relatively tightly locked to phases of low frequencies (≤10 Hz). The generality of this behavior is illustrated in Fig. 2D, where the phase preferred by spikes is plotted as an average across all data from V1 (anesthetized animals). The average preferred phase shifts with frequency from the onset of a positive half-wave to the valley of the LFP oscillation (compare with Fig. 2E). This behavior is very consistent, regardless of whether activity is spontaneous or movie-driven. For some electrodes the preferred phase-frequency dependence is slightly different (as in the example of Fig. 2C for high frequencies). For a few electrodes the phase-to-spike relation seems to be mirrored at π (not shown).

The gray-shaded area in Fig. 2D shows the (average) phase range within which 50% of the spikes fall. It would be zero for perfect phase locking and π for no phase–spike relation. One notes that this range is somewhat smaller for low frequencies (0.85π), but approaches 0.98π for frequencies >20 Hz, indicating that the phase locking is far from perfect at all frequencies and is especially weak for high frequencies.

In summary, we have seen that there is indeed a consistent relationship between LFPs and spiking activity on average. In the next sections we ask to what degree it is possible to exploit these relations (and maybe other information available in the LFP) in a systematic way to infer spikes from the LFP.

Population prediction performance

In Fig. 3 a typical example of a predicted spike train is depicted together with the used LFP features. Figure 3, A and B shows 8 s of LFP spectrogram and the time course in the test region, respectively. Small vertical lines in Fig. 3B indicate spike times before binning to 5-ms resolution. Several interesting points can be noted. As expected from the LFP-to-spike relations discussed in the last section, spikes preferentially occur in the upswing and valleys of very low and medium LFP oscillations, as seen for instance at times 171 and 173.4 s in Fig. 3B. Additionally, the power of multiple frequencies is enhanced when a burst of spiking activity occurs, as suggested by Fig. 2B, but the frequency response to bursts is diffuse and variable (compare the burst at 173.4 s to that at 174.5 s). The clustering of spiking activity on a timescale of a few hundred milliseconds in this example is actually quite typical in our V1 data (see following text). For the single spikes in between the clusters no feature of the LFP is immediately predictive.

FIG. 3.

Example of spike prediction from LFP (anesthetized monkey, Session a98nm5, spontaneous activity). A: the (normalized) spectrogram of the 8 s of LFP activity. B: corresponding LFP time course and spiking activity. Spikes are indicated by marks before binning to the LFP resolution (5 ms). C: the binned target spikes and their spike density function (blue) together with the predicted spikes and their spike density function (red). The prediction is relatively good (κ = 0.40, r25 ms = 0.60) on this trial, but other trials show even better performance (compare with r25 ms values of other trials in Fig. 4B, “spo”). One notes that regions of high activity are well predicted, whereas the location of single spikes is less accurate. Classification is done with the support vector machine (SVM) radial basis function (RBF) classifier trained on the region 35–160 s using the same features as for the population analyses (Fig. 4).

Figure 3C shows target and predicted spiking activities. Prediction of spikes is made for individual sample times at a resolution of 5 ms using a set of LFP time course and frequency features (see following text). Concatenating the prediction over time yields a predicted spike train that is compared with the target spike train. One notes that the prediction captures, at least approximately, the overall structure of the spike train. The occurrence of bursts of spiking activity, which are associated with easily seen traces in the LFP time course and spectrogram, is well predicted. Nevertheless, the exact onsets and offsets of the bursts are somewhat inaccurate in the prediction. Even some smaller bursts and single spikes are closely predicted (172.5 s), although no clear mark in the LFP time course or spectrogram can be seen with the naked eye. However, their length (176.6 s) and exact position (176.3 s) sometimes seem inaccurate. There are also occasions where spikes are simply missed (172.9 s) or fabricated (173.9 s).

Prediction performance is evaluated in different ways. One measure is the κ performance, a measure defined on the samples in the test set, which is positive for above-chance classification; it equals one for perfect classification (see methods for definitions). In contrast, the correlation measure rσ (see methods) is defined as a local average in the time domain and is therefore less sensitive to small temporal inaccuracies. The performance measure κ of the predicted spike train in Fig. 3C has a value of κ = 0.40, which is relatively good (but not the best possible; see following text). Rank correlation is r25 ms = 0.60.

In the example of Fig. 3, the predicted spike train resembles the original to a certain degree. We ask whether this prediction quality carries over to LFPs recorded from different monkeys, electrodes, and stimulus conditions. For that we estimated prediction performance using a large data set (see methods). Inferences are made on the basis of a set of LFP features, with which we observed a dependence between spikes and LFPs in the previous section. In the population analysis we include as features the time course around each sample position (in a window of 100 ms before and 300 ms after spike position) and an estimate of the frequency content of LFPs at zero time lag [Pf,0(ti); see methods], resulting in a total of 116 features. This feature set generally produced good performance (with a reasonable computational speed) over a wide range of data. For the prediction itself a nonlinear support vector machine is used with radial basis functions such as kernel (SVM–RBF) and a linear classification is employed (for details see methods).

In Fig. 4 the prediction performance over all trials is evaluated (on 10 cross-validation runs) and averaged. The anesthetized V1 data set is labeled “spo” for spontaneous activity and “stm” for movie stimulus-driven activity. We shall focus on these data for the moment. The remaining conditions shown in this plot are discussed in the following text.

FIG. 4.

Population performance for spike prediction from LFP. A: average prediction performance κ for SVM and the linear regression classifier across conditions [anesthetized monkey V1, movie stimulus (“stm”), anesthetized monkey V1, spontaneous activity (“spo”), non-anesthetized monkey V1 with mixed stimuli (“awake”), and spontaneous activity or movie-stimulus driven activity in anesthetized monkeys from LGN, “spo (L)” and “stm (L),” respectively]. Prediction is above chance level for all conditions (see results for significance tests). B: prediction accuracy of the nonlinear classifier evaluated by rank correlation between target and predicted spikes train smoothed by a Gaussian kernel of width σ = 25 ms. Red horizontal lines indicate the average performance within each condition and its SE. Small black lines show the quality on individual trials. In some cases prediction yields very accurate results, with correlations as high as 0.8–0.9. Black curves on the sides of each condition are smoothed histograms over trials. Symbols indicate the average performance of individual sessions (i.e., 1 day of recording). Average session performance clusters near the overall mean, but variance for individual electrodes is high. C: cross-electrode prediction. LFPs are taken from one electrode and spikes from another. Relative prediction decreases much faster with increasing electrode distance in the “stm” condition than for the “spo” condition. A linear fit of the data points is also shown. Vertical lines indicate SEs. D: average prediction performance for simultaneous LGN and V1 recordings (3 sessions, 2 monkeys). Performance is compared for available cross-electrode prediction from different areas and with average performance when using the same electrode for spikes and for LFPs. X-axis labels indicate where LFP and spikes originate from.

Plot A shows the average performance κ for the SVM–RBF classifier and for linear classification. From the results we draw the following insights. First, since performance measure κ is >0 for above-chance prediction it can be said that both classifiers can exploit information in the LFP time course to predict spiking activity (all conditions highly significant; t-test, P < 10−6, Wilcoxon signed-rank test for zero median, P < 10−6). Second, prediction quality for the stimulus condition and for spontaneous activity differs only slightly: indeed, one cannot reject the hypothesis that the underlying distributions have identical means (two-sided unpaired t-test, P = 0.21; linear, P = 0.18). However, if one compares pairwise recordings during spontaneous activity and stimulus presentation done with identical electrodes, mean and median prediction performances on spontaneous activity are significantly better than those on stimulus-driven activity (one-sided paired t-test: P < 10−4; linear, P < 10−4; for the distribution-free Wilcoxon matched-pairs signed-ranks test: P < 10−4; linear, P < 10−3). Average prediction performance for spontaneous activity is κ = 0.211 ± 0.006 (linear κ = 0.185 ± 0.005) and κ = 0.201 ± 0.005 (linear κ = 0.175 ± 0.005) for stimulus-driven activity.

Third, nonlinear margin classification is consistently better than linear classification (one-sided paired t-test: “stm” P < 10−6, “spo” P < 10−6; Wilcoxon matched-pairs signed-ranks test, P < 10−6). It amounts to an increase, on average, of about 12% in performance. This suggests that the mapping from LFP features is nonlinear. However, since a simple linear regression classifier already achieves almost 90% of the accuracy of the nonlinear classifier, one could state that the LFP feature space exploited here seems expressive enough for this task.

We found that for individual trials performance varies widely. For selected trials prediction performance can reach κ = 0.65. Plot B of Fig. 4 shows the rank-correlation measure r25 ms of the SVM–RBF prediction. Each thin short line represents performance for an individual trial. Whereas the correlation for some trials is as high as 0.8–0.9 on this moderately small timescale (25 ms), it is almost zero in others. There are some trials where prediction fails completely in each of the conditions. The failing trials are not all from the same sessions since the session means (markers) tend to cluster around the overall mean.

There is not much variability in performance over time: the average SD for the κ performance of five repeats of 170-ms recordings for the same electrodes is 0.023 ± 0.002 for stimulus-driven activity, 0.026 ± 0.002 for spontaneous activity, and 0.045 ± 0.002 for both together. This is in contrast to the variance across electrodes recorded simultaneously. Here the average SD (in κ) is 0.113 ± 0.004 for stimulus-driven and 0.130 ± 0.005 for spontaneous activity. The roughly 25-fold increase in variance across electrodes compared with within-electrode variance suggests that prediction performance is a matter of which electrode is being observed, rather than stimulus condition or time. Electrode tips might be positioned in a region where the arrangement of current sources and sinks might differ (e.g., in deep or superficial layers), or where active neurons might be less well correlated with the bulk activity of the cortex. Since we cannot distinguish the layers from which electrodes record, we cannot pursue this further.

Up to now we have presented results for recordings only from V1 of anesthetized monkeys. We also have a limited amount of data available from V1 where monkeys were not anesthetized and free to behave. The stimuli for this data set (labeled “awake”) are mixed and include spontaneous activity and fixation tasks. Another pool of data consists of recordings from LGN of anesthetized monkeys (labeled “stm(L)” and “spo(L)”; see methods).

We see from Fig. 4A that prediction differs quite drastically for the different data types. Spike prediction for the anesthetized monkey data from V1 is more than fivefold better than that in the LGN, where performance is hardly above chance: on average, κ = 0.035 ± 0.005 (linear κ = 0.033 ± 0.005) for movie-driven activity and κ = 0.027 ± 0.003 (linear κ = 0.022 ± 0.003) for spontaneous activity.

As in V1, there is little difference between spontaneous and movie-driven activity in LGN, although there is a reversed tendency for spike prediction to be easier on movie-driven activity than that on spontaneous activity. This tendency is barely significant (one-sided paired t-test: P = 0.02; linear, P = 0.01; Wilcoxon matched-pairs signed-ranks test: P = 0.05; linear, P = 0.08).

We find that average prediction performance on awake data is κ = 0.063 ± 0.005 (linear κ = 0.046 ± 0.005). This is much worse than that on anesthetized V1 data (unpaired t-test, P < 10−5), but still significantly better than that on LGN data (all unpaired one-sided t-test, P < 0.05). Figure 4B reveals that individual trials have a correlation of target and prediction similar to that in anesthetized monkeys. There are trials with correlation up to r25 ms = 0.6, whereas in the case of the LGN, no trial exceeds 0.3 correlation.

Cross-electrode predictions

The volume of cortex that contributes to the generation of LFPs is different from that producing our spiking signal (see introduction). Thus it might be interesting to see how the relationship between the two signals changes with distance. Because recordings were done simultaneously with multiple electrodes (in the data set from anesthetized animal), we tried to infer spikes from LFPs collected with two different electrodes. In Fig. 4C the average performance is plotted against the (three-dimensional) distance of the electrode tips. To facilitate comparison, performance is evaluated relative to the average performance achieved using the spiking signal from the electrode from which the LFPs were taken.

One notes that prediction performance drops to about 40% when electrodes are 1 mm apart (the minimal distance in our recording setup). Interestingly, for stimulus-driven activity performance degrades significantly with distance (rank correlation between distance and relative kappa performance using all measurements: −0.20, P < 10−4), whereas for spontaneous activity no significant correlation with distance can be found for distances ≤1 cm (rank correlation 0.015, P = 0.2). Note that the number of samples becomes relatively small for distances >6 mm since rectangular electrode grids with 1-mm spacing are used for most sessions. However, we can safely compare spontaneous and stimulus-driven activity because the electrode placements do not change with the condition.

Because LGN data were collected while other electrodes simultaneously recorded from V1, we can investigate whether the LFPs of V1 can be predicted on the basis of spikes from LGN and vice versa. This is shown in Fig. 4D averaged over data from the three sessions recording simultaneous measurements from V1 und LGN (see methods). Performance is averaged either across electrode predictions (regardless of distances) or over all predictions using the same electrode for both signals. Although results are difficult to interpret because of the limited size of the data set, one notes that using LFPs from LGN and spikes from V1 results in performance above chance, whereas LFPs from V1 seem to hold no information about spikes in LGN (unpaired Wilcoxon signed-rank test for median performance different from zero, significance level 0.05).

Temporal accuracy of predicted spike trains

We found an average κ value of about 0.2, which is well above chance but nevertheless far from perfect prediction at κ = 1. On the other hand, in example Fig. 3C some features of the target spike trains seem to be well captured by the prediction, especially regions of high and low activity, which alternate on a timescale of about 0.5 s in this example. Thus one might ask at what timescale the predicted spiking activity most closely resembles the target spiking activity or at what timing accuracy the prediction fails.

To answer this we evaluated the coherence between target and predicted spike train (Fig. 5). Coherence is a correlation measure in the frequency domain. Coherence at a particular frequency makes a statement about the exactness of the prediction on a timescale of one over that frequency. We also estimated the temporal accuracy directly in the time domain (by varying the correlation kernel width) where one arrives at similar conclusions (not shown).

FIG. 5.

Coherence levels of predicted and target spike trains. Coherence is plotted against frequency and is averaged over all trials for each condition. For comparison we included surrogate data, in which spike trains were generated with the same ISI distribution as in the monkey data. Gray dashed lines show coherence between surrogate spike train and its jittered version with Gaussian noise of different SDs (σ from 5 to 500 ms, as listed in the plot). Coherence drops for higher frequencies, suggesting that, on average, prediction is reasonably good only for slow structure in the spike trains. Note that the chance coherence level is at about 0.15 here, as shown by the surrogate data. Chance levels do not tend to zero because coherence is estimated on 10 cross-validation regions (each 17-s duration) and only subsequently averaged over all trials. Colored areas indicate SEs.

In Fig. 5 one observes that coherence is low for high frequencies and rises for low frequencies. Thus the general resemblance of a predicted spike train might be adequate, but the exact spike position is often predicted with some jitter. This is also evident in the example of Fig. 3.

Coherence drops at about 25 Hz for the anesthetized V1 data. This timescale is comparable to a spike train whose spikes are jittered by Gaussian random noise with SD of 25 ms. Coherence levels of such surrogate data are indicated by the dashed lines in Fig. 5. Since the jitter destroys all information in the high frequencies, the plateau at low coherence for the surrogate data can be taken as a significance level for the coherence estimation. In surrogate data low-frequency aspects stay completely intact (thus a coherence of 1), but for predicted spike trains this is only partly the case. However, average coherence rises considerably for larger timescales compared with smaller ones, suggesting that at least in a subset of trials, slow structure is well predicted.

Data from the non-anesthetized monkey are less coherent at low frequencies but much more so than for data from LGN, where almost no significant coherence is observed, even for low frequencies. Note that we have far fewer trials from LGN and non-anesthetized V1, so averaging is less effective in smoothing.

In summary, predicted spike trains are seldom accurate to a spike timing precision of <25 ms, as suggested by comparison to a jittered version of the original spike train. On the other hand, predicted spike trains capture structure on a larger timescale reasonably well, say for clusters of high spiking activity in the 100-ms range.

LFP features important for inferring spikes

For determining the usefulness of particular LFP features for inferring spiking activity, we iteratively select a small number of features out of a large pool of possible features. The selected subset shows minimal prediction error for a given number of features, and therefore selected features can be seen as the most important for prediction. Because spike prediction in LGN is almost impossible, only V1 data are analyzed in the following.

We consider a feature pool consisting of phase and power features (Pf,k, φf,k, and φ̂f,k). Phase is estimated on 45 frequency bands each 2 Hz wide, whereas the power features Pf,k have different frequency resolutions (see methods). Setting k appropriately, we include time lags of ≤3s in both directions (before and after ti). Out of this pool of features, containing together N = 138,115 features, only up to m = 10 features are selected for each trial individually using the algorithm outlined in methods. Figure 6D shows that, on average, selecting only 10 features out of the huge pool is enough for a linear classifier to approach the performance of the linear classifier used previously (Fig. 4), which used 116 general features (dashed line). For the first five selected features the gain in performance is highest.

FIG. 6.

Most useful LFP features for spike prediction. Distribution of m = 5 selected features per trial, pooled over all trials. Color plots show the density of the selected features in the feature space (frequency and time lag). Each feature is represented by a Gaussian with 95% of its density falling into the width of correlatedness (i.e., the time window and bandwidth of spectral estimation for power features; we have chosen a quarter oscillation period for phase features). Marginal plots show the marginal distribution of the histogram for absolute lags ≤3 s, where colors code for feature type [power Pf,k (red); phase features φf,k and φ̂f,k (blue)]. AC: conditions “stm,” “spo,” and “awake” (as described in Fig. 4). See results for discussion. D: performance of linear classifier restricted to m selected features relative to the performance of the full linear classifier. To avoid overfitting, performance is tested on a time region that was not used to estimate the importance of features. Δκ denotes the difference in prediction performance κ between the full linear classifier κFull with 116 features (as in Fig. 4) and the restricted linear classifier using m selected features. 〈Δκ〉 divided by the mean performance of the full classifier is plotted against the number of selected features m. One observes that performance with selected features approaches the full classifier.

Figure 6, AC shows histograms of m = 5 selected, most important features aggregated for all trials. Phase- and power-related features are colored blue and red, respectively. Analogous to previous results, useful features differ only slightly between stimulus-driven activity (Fig. 6A, “stm”) and spontaneous activity (Fig. 6B, “spo”): stimulus induction does not seem to induce a general change in the preference of features for spike–LFP interaction. One notes that in both spontaneous and stimulus-driven activity power fluctuations in the high γ-band (40–90 Hz) are preferred features. Selected frequencies are biased toward high values, with 80–90 Hz being the most likely selection. Indeed, high-frequency power features are selected as the first and most useful feature in about 90% of the trials (and in 82% in non-anesthetized animals; not shown). The time lags of the selected γ power features are almost symmetrically distributed around zero (with a small bias toward positive lags) in a zone spanning about 50 ms to either side. There are smaller symmetrical peaks at 150 ms, which may be attributed to the power estimation, where we use a moving window of 150-ms duration. Likewise peaks at 80 and 60 Hz are introduced by spectral estimation because the bandwidth is roughly 20 Hz (see methods).

We identified low-frequency information as a second class of useful LFP features, in particular phase information of low-frequency bands <10 Hz. Time lags of selected phase features are mostly positive, meaning that the time of the feature is most informative after the spike. Useful lags vary from −50 to 200 ms, depending on the frequency, and they can be as long as 500 ms for the lowest frequency bands (≤2 Hz). Time lags vary according to an oscillation period of the low bands. Power modulations in the low-frequency bands are selected about as often. The time lags of these features are distributed widely, which is caused by the long window setting of 2 s needed to estimate power at low frequencies (see methods).

Bands from 10 to 40 Hz, especially 15 to 30 Hz, seem to be much less important for inferring spikes. Despite a small number of scattered features in the “spo” condition, phase information for γ-bands (e.g., >40 Hz) does not play a role, either.

For the non-anesthetized animals results are hard to interpret, given the limited amount of data (see Fig. 6C). However, it seems that the overall structure is similar to the V1 data of anesthetized monkeys in having high-frequency power features as well as very low frequency phase features for positive lags. However, there seems to be an increase of selected power features for intermediate frequencies.

Both feature types, meaning high-frequency power features around zero lag and low-frequency information, either low-frequency power or low-frequency phase features with positive lags, are often jointly selected among the five optimized features. This shows that individual trials have similar features. We found that in 75% (“stm”), 59% (“spo”), and 72% (“awake”) of the trials both types of features are jointly selected, more specifically a high-frequency power feature (>40 Hz) with absolute lags of <250 ms and a low-frequency phase feature (<10 Hz) with positive lags or a low-frequency power feature <10 Hz. In absolute terms high-frequency powers are preferred over low-frequency features (“stm” 61 vs. 24%; “spo” 62 vs. 19%; “awake” 47 vs. 24%). In “stm” and “spo” conditions low-frequency phase features with positive lags are selected slightly more often in combination with high gamma powers than low-frequency power features (“stm” 49 vs. 44%; “spo” 38 vs. 33%), whereas low-frequency powers are preferred in the awake condition (28 vs. 60%). Neither of the low-frequency features is present in about 25% of the trials when the first 5 selected features are considered, although this value drops to about 5% when the first 10 selected features are examined.

In summary, our analysis shows that two feature types are most useful for predicting spikes: power in the higher bands (>40 Hz) and (to a lesser degree) low-frequency information (<10 Hz), which can be power modulation or phase information with lags around and after the spike.

POPULATION STATISTICS OF LOW-FREQUENCY PHASE FEATURES.

Low-frequency phase features indicate spike positions relative to the low-frequency oscillations of the LFP. This feature thus carries the information of the phase locking to lower bands (Fig. 2, C and D). However, from the perspective of phase locking it is surprising that the informative lags are asymmetrically distributed around the spiking position. This indicates that it is not merely the locking to a phase that is important, but that instead the LFPs at low frequencies display a consistent slow wave following spiking activity and spikes are locked to the onset of that wave. In contrast LFPs before spikes are less well determined on average. This asymmetry of the phase locking to lower bands can be seen in the STA LFP, as pointed out earlier (Fig. 2A).

The form of the STA is stereotypical for the majority of electrodes in V1. As we showed earlier in methods, a high value in the STA is a good feature for classification (given some covariance constraints). Thus the typical form of the STA explains why the phase features for positive lags are consistently selected among the best features. To show the generality of the form of the STA in our data, we select two particular phase features, φA and φB. Feature φA is determined by the position of the first maximum (peak) for positive lags of the 1- to 4-Hz band-pass filtered LFP (cross in Fig. 7A). To reject any AM in that band we take the cosine of the time course of the phase instead of the time course of the LFP (see methods). The second phase feature φB is the valley nearest to zero lag in the 4- to 8-Hz band of the LFP (star in Fig. 7A).

FIG. 7.

Information about spiking activity conveyed by phase features and frequency power features. A: STA. LFP STA (blue line) as well as the STA for the 1- to 4-Hz (black line) and 4- to 8-Hz (green line) band-pass filtered LFP. In the latter cases all power modulation is discarded (see methods), yielding purely phase related signals. The definitions of phase features φA (black cross) and φB (green star) are indicated. B: distribution of lags of features φA and φB for all trials of the anesthetized V1 data. C: information about spikes conveyed by single features. D: redundancy of information about spikes conveyed by any combination of LFP features. Color coding indicates the amount of information synergy (see methods, Eq. 4). See results for discussion.

In Fig. 7B the distribution of these two features is shown in a scatterplot across all trials. Note that the feature φA (black crosses) lies very consistently at mean lag around 112 ± 1 ms, although the height differs somewhat (see marginal distributions in attached plots). The feature φB (green stars) is likewise consistent across trials. However, the height distribution of this second feature (right margin plot) is more skewed to lesser values than for feature φB, suggesting that it might be less useful for prediction. There is also a minority of trials in which neither feature was well expressed, indicated by the scattered outliers.

INFORMATION CONVEYED BY LOW-FREQUENCY BANDS AND HIGH-FREQUENCY POWER FEATURES.

To compare information conveyed by different features about spikes we use the mutual information between target spikes train and predicted spikes train (see methods). Mutual information between the class labels is a lower bound for the mutual information contained between the signals under consideration (Natschläger and Maass 2005). It is only a lower bound, since a classification method might fail to use all the information contained in the signal. However, it is unlikely that a nonlinear SVM classifier would miss much of the dependence in our data, since the relationship between features and spikes seems to be mostly simple proportionality. Recall that for our data a linear classifier already achieves about 90% of the performance of a nonlinear classifier.

First, we tested prediction performance for single-frequency power and low-phase features individually using the SVM classifier. In Fig. 7C we show the average information about the spikes at different frequencies for V1 of anesthetized monkeys (blue crosses). The information change with frequency closely resembles the number of selected power features per frequency from our selection algorithm in the previous section. We note that, on average, frequencies around 80 Hz convey the most information about the spikes. If one uses all the features tested here simultaneously, average performance reaches 0.037 ± 0.002 bits, which is 35% higher than that when the best individual feature is used.

Information contained in the power decreases monotonically with frequency. This decrease can also be seen on the level of individual trials (not shown). If one uses either one of the low-frequency phase features φA and φB on its own, information drops to about a third for φA and much lower for φB compared with that of the best power feature (black and green lines in Fig. 7C, respectively). Despite the usefulness of low-frequency power modulation in combination with high-frequency powers (as shown in the feature selection), low-frequency power exhibits poorer performance as a single feature than the phase features (in particular in comparison to φA; see Fig. 7C). Note that the timing resolution of the phase features is much higher as phase is defined at any moment in time, whereas power has to be estimated within a window of sufficient length. The induced temporal correlation of nearby time points for the low-frequency power seems to be too high to predict spike times on its own.

We next investigated the information conveyed by any two features jointly. If two LFP features F1 and F2 conveyed independent information about spiking activity S, the normalized measure for synergy of information syn (F1, F2|S) (Eq. 4) would be zero. In general, this measure ranges from minus one for completely redundant information to one for completely synergistic information (for details see methods). Figure 7D shows the average normalized synergy of information about the spikes for all combinations of features. Here synergy of information is calculated on the basis of single trials, where trials having joint information not significantly above zero information are excluded (Wilcoxon signed-rank test, P value >0.1). Generally, information conveyed by high-frequency bands is mainly independent from information contained in low-frequency bands. The information in individual high-power features is more redundant [e.g., for 87 and 50 Hz, syn (P50 Hz, P87 Hz|S) = −0.40 ± 0.02] than between high-power features and phase features, where information is nearly independent [synergy values with high-frequency power features around −0.2; for instance, syn (φA, P81 Hz|S) = −0.21 ± 0.01 and syn (φB, P81 Hz|S) = −0.14 ± 0.02]. Phase feature φB becomes more redundant with power for decreasing frequency [e.g., syn (φB, P2 Hz|S) = −0.45 ± 0.03], whereas φA redundancy is relatively low even with low-frequency powers [e.g., syn (φA, P2 Hz|S) = −0.23 ± 0.03]. However, redundancy between any two low-power features is much higher. Both phase features convey almost independent information about spikes [syn (φA, φB|S) = −0.05 ± 0.02].

Note that the high redundancy of information in two high-frequency powers that are <20 Hz apart is a result of spectral estimation, which is done on the bandwidth of 21 Hz (see methods).

In summary, both feature types, high-frequency power and low-frequency information, seem to code for mostly independent information, whereas two high-power features convey more redundant information.

PREDICTION PERFORMANCE IS RELATED TO CLUSTERS OF SPIKES.

We noticed that generally prediction performance is superior on data where spikes tend to cluster to bursts of activity with relatively long silent periods in between. This can be seen if one correlates the interspike interval coefficient of variation (ISI-CV) with prediction performance. If spikes are temporally clustered, many short intervals are interspersed with few very large intervals, causing a large value for ISI-CV, the ratio of SD to mean of the ISI distribution. Thus ISI-CV can be seen as an approximate measure for the degree of temporal clustering of spike trains. One notices a strong correlation between the ISI-CV and prediction performance (rank correlation 0.86), whereas prediction performance is only poorly correlated with the firing rate (0.47). This behavior can also be seen for individual features. High-frequency power features have the highest correlation with ISI-CV (0.92), whereas correlation with single phase features is lower (e.g., 0.64 for φA). Correlation with rate for both features is much lower (high gamma frequency 0.34, phase feature φA 0.44). We found that if one defines larger clusters of spike or burst events directly and discards single spikes in between (see Fig. 8), low-phase features are locked to the timings of such bursts and the performance of phase feature φA is highly correlated with the burst rate (rank correlation 0.82). In our burst definition (see caption of Fig. 8) the average burst length is 122 ± 1 ms, which suggests that low-frequency phase information preferentially codes for (rather sustained) bursts of activity.

FIG. 8.

Locking of high spiking activity events to phases of slow LFP oscillations. Here a high spiking activity event (burst) is defined as having ≥10 spikes. Spikes constituting a burst have to occur within a maximal mean ISI of 5 ms, and spikes not contained in a burst are deleted. Bursts have to occur ≥25 ms apart from each other; otherwise, they are regarded as one continuous event. The middle position is taken as the timing of an event. The probability of event times occurring in a particular phase of the 1- to 4-Hz oscillation of the LFP is plotted for individual electrodes (averaged over 10 trials of conditions “stm” and “spo” recorded at the same electrode site). Electrodes are sorted according to the performance κ of the low-frequency phase feature φA. For about 60% of all electrodes burst times are locked consistently to the upswing of a wave in the low LFP band. For most of the rest of the electrodes too few or no bursts are found when the preceding definition of a burst event is applied. Phase is illustrated in the right margin plot. Burst rates correspond well to the performance of φA (rank correlation 0.82, bottom plot).

DISCUSSION

Local field potentials (LFPs) are the best indicators of integrative activity in an area. They reflect the area's input activity in terms of population excitatory and inhibitory postsynaptic potentials, but also the area's regional processing because they are directly affected by dendritic spikes, voltage-gated oscillations, and various after-potentials—all markers of diverse neural computations (see Buszáki 2006 for an overview). Not surprisingly, an increasing number of studies report their specificity and usefulness in the search for neural correlates of behavior (Kreiman et al. 2006; Lee et al. 2005; Liu and Newsome 2006; Osipova et al. 2006; Rubino et al. 2006; Scherberger et al. 2005). Although these studies show that LFPs convey information that is to some degree independent of spiking activity, it has been suggested and demonstrated by many researchers that spikes synchronize to—or that synchronization gives rise to—specific oscillation frequency of the LFPs, in particular γ-bands (>40 Hz) in visual cortex and θ-band (4–8 Hz) in the hippocampus (see Buszáki 2006 and references therein).

Herein we investigated the relation of spiking activity to LFP on a more fundamental level by asking which aspects of the LFP can generally be exploited to predict spike times. Unlike other approaches in which simple linear interaction between both signals is tested (e.g., with means of coherence or correlation), the classification approach used in our study can exploit multiple different features of the LFP simultaneously in a nonlinear way to infer spiking activity from LFPs. We found that when the best single feature is used, performance drops to about 70%, showing that multiple features are essential. The nonlinear component in the code is rather small, but there is still an average increase of 12% when SVM classifiers are used.

In contrast to V1, the prediction of spikes from LFPs is (almost) impossible in the LGN. The reason for this could be that spiking activity might be less correlated and thus have less effect on the LFP or, alternatively, that the geometrical arrangements of current sources and sinks in the thalamus generate field potentials of lower spatial specificity than those observed in cortex. Furthermore the fact that 80% of the LGN input is cortical and modulatory is an additional potential reason for the LFP-spiking decoupling in this nucleus. Neuromodulation might explain our observation in simultaneous thalamus–cortex recordings that V1 LFP cannot predict LGN spiking. LGN–LFP, on the other hand, is a reasonable predictor of V1 spiking, likely because the former is a better indicator of local LGN activity that is correlated with the V1 LFPs, which in turn can predict the spiking of this cortical area.

From the point of view that LFP is mostly generated by the totality of synaptic input and local processing in a region one might ask whether it is possible to predict spikes solely on the basis of information preceding the actual time of the spike. To evaluate this possibility we recomputed prediction performance for the same feature types but with shifted lags, so that only causal information of the LFP is included. When we did this, prediction performance dropped to 68% of the average noncausal classifier used for the population analyses, which is still well above chance level. However, conclusions about the relation between synaptic input and spiking output are difficult to draw from this number. For instance it is likely that due to temporal correlation in the spike train (and various LFP bands) neighboring times in the time series are good predictors for each other.

LFP features related to spiking activity

Besides having generated and visualized concrete spike trains inferred solely from information contained in the LFPs, our analysis revealed those features that are the most important carriers of information about spiking activity in the LFP and estimated their relative importance and redundancy properties.

The first and most useful feature for inferring spike trains from LFPs is the power modulation of high-frequency components in the upper γ-band from 40 to (at least) 90 Hz. This is in good agreement with other studies that have established a link between gamma-frequency bands and spiking activity (Csicsvari et al. 2003). The biophysical origin of gamma frequencies in the LFP remains a topic for current research. The rather fixed relationship between spikes and LFPs over a wide range of data and conditions could reflect physical constraints (layered organization of the cortex, distribution of sinks, and sources) or inherent properties of the neural network topology and function. For instance, gamma activity might be the effect of fast inhibitory circuits on the LFPs (for a recent review see Bartos et al. 2007).

In contrast to power modulations, phase in the γ-range is much less important for prediction than one might have expected from the well-documented fact that spikes are synchronized in this range (Gray et al. 1989; Kreiter and Singer 1992). Our stimuli consist of cinema movies of several minutes’ duration and containing a mixture of objects, faces, actions, colors, and edges. Thus specific object encoding might be weak and cluttered only with other aspects of the code because the stimulus consists of multiple objects at any given time. Because we assume a stationary spike–LFP relation for the duration of a trial (170 s), more subtle aspects of the relation such as transient rapid neuron-to-neuron synchronizations would be averaged out and therefore not detected as a useful feature. Another possibility for the relative unimportance of gamma-phase features might be that different subgroups of neurons present in the MUA signal lock to different phases in the gamma cycle, as suggested for inhibitory and excitatory neurons (Hasenstaub et al. 2005). In general, if two subgroups of units present in the MUA signal locked to a different feature of the LFP, our classification method should not degrade in performance because it can exploit many features simultaneously. However, the relative usefulness of both features would indeed decrease in the average features analyses.

In addition, locking to phases of higher frequencies is difficult to detect because even a small amount of jitter in spike-time precision abolishes locking. Such jitter might be introduced during binning of the spike timings to LFP sampling resolution (5 ms). This is in contrast to the effect of a small amount of jitter at lower frequencies. Accordingly, the phases of low-frequency components are indeed useful for predicting spiking activity. Moreover, our analysis suggests that the oscillation phases of low frequencies code for larger bursts of temporally correlated spiking activity. In fact, the high probability of spikes occurring in clusters in our V1 data helps to infer spikes from LFPs. Prediction quality is highly correlated with ISI-CV because one can predict spiking activity on a timescale of >25 ms, rather than reaching single spike precision. In the LGN data, where the spikes are less clustered (low ISI-CV, average of 1.1 ± 0.05 compared with 2.4 ± 0.05 in anesthetized V1 and 1.8 ± 0.05 in non-anesthetized V1), it is almost impossible to predict spikes from LFPs. Since we observe stronger clustering for anesthetized data, we conclude that this might be partly an effect of anesthesia (Steriade et al. 1993).

The low-frequency power modulations selected as useful features in a part of trials probably have origins similar to those of the slow-phase features and may code for the relative amount and size of clusters of activity. Thus low-frequency power modulations might provide a slow-changing state variable that is useful in combination with the fast-changing high-frequency powers, as indicated by the low redundancy values.

About half a century ago, several studies were conducted that attempted to relate electrical encephalographic (EEG) signals to spiking activity. Then it was found that spikes occur preferably at the negativity of 0.2- to 2-Hz waves (Fromm and Bond 1964). Since EEG has a basis of origin similar to that of LFPs (Nunez and Srinivasan 2006), this confirms our findings, where spikes occur at the minimum to rising phase of the slow oscillation. It is not just the negativity that is important, however, but also the peak that is seen following the spikes. This is similar to slow-wave sleep of cats as seen in the STA (Destexhe et al. 1999) and is taken to the extreme in spike-wave complexes, which can be observed in cases of epilepsy (e.g., Destexhe et al. 2001). In the latter case, it was suggested that the slow positive wave form after a burst of activity is accompanied by neuronal silence and can be explained by slow inhibitory effects mediated by γ-aminobutyric acid type B (Destexhe et al. 2001). Although we observe neither clear up- and downstates nor any pathological periodic activity in the LFPs of our data, the similarity of the STA suggests that such processes may play a role during more physiological states, albeit in a much weaker form.

Effects of spike detection method

The spiking signal used herein is generated by a simple threshold-based procedure for detecting spiking events from the recordings. This detecting procedure is prone to false-positive detection, as well as to a smaller fraction of missed spikes originating at larger distances or from neighboring interneurons. With our method we expect a false-positive rate of 2 Hz, if one assumes that nonspike values (i.e., noise) in MUA amplitudes are subject to Gaussian distribution (see methods). Thus the spiking signal contains not only the activity of multiple neurons and possibly of different cell types, but also noise spikes. Since noise spikes should be independently distributed with respect to time, false-positive labels should actually reduce performance compared with the “true” spiking signal. We tested higher spike detection thresholds, where the contribution of noise becomes negligible (e.g., for 5SD only 0.05 Hz), and found instead reduced prediction performance, although performance still remained well above chance (not shown). Since it was estimated that in principle spikes arising from ≤1,000 neurons could be detected by a single electrode (Henze et al. 2000), a spiking signal generated with a higher threshold will include fewer smaller spikes from neurons that are further away and incorporate only those neurons that happen to be in the immediate neighborhood of the electrode tip. Thus the spiking signal becomes more local and its relationship to the relatively global LFP signal is naturally weaker. Therefore noise spikes are not likely to artificially enhance our performance results, but influencing the number of neurons whose spikes are detected via threshold will have an effect.

Similarly, spike sorting yielding multiple single-unit activities rather than multiunit activity would naturally decrease prediction performance, since spiking signals become more local and, additionally, the prediction task would change from binary classification to a more difficult multilabel classification task.

Encoding of the stimulus

We found that the relationship between LFPs and spikes is almost unchanged during stimulus-driven activity compared with spontaneous activity. This finding corresponds well to the results of others (Fiser et al. 2004; Kenet et al. 2003; Vincent et al. 2007), who found that the structure of spontaneous activity is rich and sometimes even resembles stimulus-driven activity. Although there is no general change in the structure of features, there are of course transient aspects of the stimuli encoded by spikes (and LFPs) that have an effect on the spike rate, for example, which increases during stimulus presentation (not shown). Because of the encoding of the changing movie stimulus over time, the temporally contiguous training and test sets might differ in their LFP–spikes relation (i.e., in their sample distributions). Thus movie encoding might explain why prediction performance on spontaneous activity is slightly better than that on stimulus-driven activity.

Effects of stimulus encoding might also explain the observation that the LFP is “more global” during spontaneous activity in that prediction of spikes degrades much more slowly with increasing cortical distance than that during stimulus presentation. Thus the stimulus actually decorrelates neural activity spatially in comparison to spontaneous activity.

Note that we do not analyze features that actually encode information about the stimuli. Information contained in LFPs and spiking activity about aspects of the movie stimulus will be investigated in a forthcoming paper (A Belitski, A Gretton, C Magri, Y Murayama, MA Montemurro, NK Logothetis, S Panzeri, unpublished observations) using the very same data as in the present study. Although we show here that the structure of the relationship of spikes and LFP does not change considerably during spontaneous activity, Belitski et al. nevertheless show that very low and very high oscillation powers of the LFP are highly informative about the stimulus. They find that high frequency LFP power series (50–120 Hz) contain information about the stimulus that is partially redundant with that in power series derived from the spike trains. In this respect, their results complement ours.

Conclusion

We conclude that to a certain degree spikes can be inferred from LFPs—a fact that reflects the interaction of these signals. However, we find that millisecond precision, which has been shown to be used for temporal coding (Mainen and Sejnowski 1995), cannot be inferred from LFP. The temporal aspects of neural spiking used for information coding, rate coding, or coding on spike timing remain a topic of current research (Rieke et al. 1999). We might conservatively say that irrespective of whether they are important for coding, time-varying rates on the scale of about a hundred milliseconds can be moderately well inferred from the LFPs, but that exact timings cannot. Thus given our results, it should in principle be possible to develop an appropriate methodology that permits the extraction of certain spiking features from signals measured by methods relying on LFP-like signals, such as functional MRI (Logothetis et al. 2001) or optical recordings (Grinvald et al. 1985, 2004). Nevertheless, the strong dependence of spike predictability on electrode position suggests that the reliability of such predictions may depend on the brain site. Finally, the fact that in the thalamus it is practically impossible to predict spikes from the LFP suggests that computations based on input, local processing, and output—as instantiated in the different frequency bands of the mEFP–can be helpful only for structures with the appropriate element geometry (e.g., fascicles of pyramidal cells vs. potentially close-field arrangement of thalamic neurons) and the proportion of driver to modulator afferents.

GRANTS

This work was supported by the Max Planck Society and the Austrian Science Fund Project S9102.

Acknowledgments

We thank A. Tolias and G. Keliris for providing the non-anesthetized monkey data set, A. Belitski for comments on earlier versions of the manuscript.

Footnotes

  • 1 Note that the optimal weight vector w* has only minimal error for the regression, and that there may be a better weight vector for classification. We neglect this here for the sake of simplicity.

  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

REFERENCES

View Abstract