|
|
||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
INNOVATIVE METHODOLOGY
1Department of Physics, University at Albany, State University of New York, Albany; 2Department of Neuroscience, Albert Einstein College of Medicine, Bronx; 3Cognitive Neuroscience and Schizophrenia Program, Nathan Kline Institute, Orangeburg; 4Department of Psychiatry, Columbia College of Physicians and Surgeons, New York, New York; 5Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut; 6Neuroscience Department, Brown University, Providence, Rhode Island; 7The J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville; and 8Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, Florida
Submitted 24 June 2005; accepted in final form 2 February 2006
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
The last decade has seen great developments in blind source separation (BSS) and independent component analysis (ICA) techniques, such as Infomax ICA (Bell and Sejnowski 1995
), FastICA (Hyvärinen and Oja 1997
), and second-order blind identification (SOBI) (Belouchrani et al. 1993
). These algorithms have been useful in identifying sources in EEG and MEG signals using both ensemble-averaged data (Makeig et al. 1997
; Särelä et al. 1998
; Vigário et al. 1999
) and single trials (Cao et al. 2000
; Jung et al. 1999
; Makeig et al. 2002
; Tang et al. 2002
). However, along with its strengths, each technique has limitations, and often these limitations can be addressed. ICA, for example, allows reliable source (component) separation with minimal a priori assumptions and constraints. Its limitation is that although trial-to-trial variability can assist in separation, these effects are not explicitly considered and quantified, and these are substantial opportunities missed. Also, like many other techniques, ICA solves for maximal independence of components, despite the fact that components are often dynamically coupled. Thus although ICA may be reasonable for source separation per se, it is not explicitly designed to quantify the dynamical interactions between the neuronal ensembles that generate the components. This is one of the most interesting and important aspects of the behavior of brain function and a major focus of our efforts.
Here we describe a model of the sensory-evoked neural response that is more realistic than previous models in that it explicitly models trial-to-trial amplitude as well as latency variability in single-trial responses. Using this model, we derive the differentially variable component analysis (dVCA) algorithm, and demonstrate how different variability patterns in neural ensemble activity can be used to separate and identify their component signals. Using simulations, we evaluate not only the ability of dVCA to characterize single-trial responses, but also its robustness to noise. We also compare the specific capabilities of dVCA with two other popular decomposition techniques: principal component analysis (PCA) and ICA. Finally, we demonstrate how to apply dVCA by analyzing neuronal ensemble responses recorded intracortically in macaque V1. Throughout this paper we emphasize the ability of the dVCA algorithm to 1) more accurately account for observed trial-to-trial variability, 2) use differential variation in the amplitudes and latencies to separate and identify sources, 3) avoid enforcing statistical independence of the sources, and 4) more accurately estimate the ongoing background activity. It also merits emphasis that, in addition to its uses in ERP and ERF source separation, the dVCA algorithm is a generally useful instrument for defining and analyzing single-trial neural responses.
| METHODS |
|---|
|
|
|---|
Trial-to-trial variability of evoked responses can conceivably take many forms: latency shifts, amplitude variation, and even waveshape changes. Our earlier findings have shown that the observed variability can be well described by amplitude and latency variations of stereotypic response waveshape components (Truccolo et al. 2002
). Of course, waveshape variability also exists, but robust single-trial amplitude and latency estimates are nonetheless obtainable with the assumption of fixed component waveshapes. For this reason, we model the response evoked from a single neural ensemble as a stationary stereotypic waveshape that can vary in amplitude and onset latency from trial to trial. We write the response evoked in a given trial mathematically as
s(t
), where the function s( · ) represents the waveshape of the response as a function of elapsed time t,
represents the trial-specific amplitude scaling factor of the response, and
represents the trial-specific onset latency shift. Furthermore, when multiple neural ensembles are engaged in the response to a stimulus, the activity of each ensemble is represented in the model as a separate waveform with a distinct trial-specific amplitude and latency shift. It is important to note that by modeling both the waveshape and its amplitude and latency, there is degeneracy in the model because an overall change in amplitude scale or latency shift can be described either by the amplitude and latency parameters or by the overall scale and shift of the waveshape. To eliminate this degeneracy, we take as a convention that the ensemble average amplitude scaling factor of the response over the recorded trials is unity, 

= 1, and the ensemble average latency shift is zero, 

= 0.
In many experimental paradigms, investigators use several detectors positioned at different locations to measure the evoked responses of neural ensembles. The degree to which a detector records a signal evoked by a particular source depends on many factors including the position and orientation of the source relative to the detector. To describe this sourcedetector coupling, we introduce a coupling matrix C, where the matrix element Cmn describes the degree to which the mth detector detects the nth source. This coupling matrix is known as the mixing matrix in the source separation literature and as the lead-field matrix in electrophysiology.
During the course of an experiment the investigator records responses to multiple presentations of a stimulus. Each presentation is typically called a trial. For the rth recorded trial, we model the data x(t) recorded by the mth detector in component form as
![]() | (1) |
nr is the amplitude scale of the nth source during the rth trial,
nr is the latency shift of the nth source during the rth trial, sn( · ) is the waveshape of the nth source, and
mr(t) is the unmodeled part of the data recorded in the mth detector during the rth trial. The unmodeled part of the data record is typically a combination of the recorded background activity along with any noise in the detector. For simplicity, we assume that
mr(t) has zero mean. Thus Eq. 1 describes the data recorded in a given detector during a given trial as a sum of the signals generated by each of the N neural sources appropriately scaled in amplitude and shifted in latency for that trial and also scaled according to the coupling between each source and that detector plus the signals that we do not yet understand or care to understand. We call Eq. 1 the multiple component event-related potential (mcERP) model of evoked activity.
By adopting the mcERP model of the evoked responses, we implicitly adopt a well-defined set of characteristics capable of describing a neural source. The term "component" refers to the waveshape describing the temporal activation pattern of a particular neural source. Because no information regarding the spatial locations or distributions of the neural sources has gone into the model, this model does not distinguish between two spatially distinct groups of neurons that produce the same waveshape varying identically with latency and proportionally in amplitude. However, in such a situation, examination of the estimated coupling matrix would reveal two spatially distinct sources if there are detectors positioned within the proximity of each source. The major advantage obtained by estimating the coupling matrix is that practical experience in conjunction with previously obtained physiological or anatomical data suggesting source distributions can be used to independently evaluate the solutions obtained with this technique. The disadvantage, aside from withholding information from the algorithm, is that there remain two degeneracies in the model. First, there is no specified order to the sources. Second, there is no specified scale for the coupling matrix; one could halve the coupling matrix elements while doubling the magnitude of the source waveshapes and obtain an equally valid solution. These degeneracies, which are present in every other linear source separation algorithm, including PCA and ICA, pose no difficulties to the interpretation of the results. In our implementation we have chosen to normalize the columns of the coupling matrix so that the maximum value is equal to one. However, it should be noted that this scaling degeneracy could be remedied by adopting a physical model of the source currents (Knuth and Vaughan 1998
). With these conventions defined, it is a straightforward matter to map the parameter values estimated by the dVCA algorithm back to a model of the single-trial response. A single-trial component is found by applying the single-trial estimates to the relevant portion of the mcERP model above, which is simply
nrsn(t
nr). The single-trial ERP measured by the mth detector can be found by summing the contributions from each component,
n=1N Cmn
nrsn(t
nr), which gives a noise-free estimate of the recorded evoked response.
One last notable strength of the mcERP model is that no component waveform is required to be present in every trial. In other words, a single-trial amplitude of zero is allowable for any component. This is important because it acknowledges the empirical fact that in a given site in the brain, the trial-to-trial response is not stereotypic. The general resemblance of most single-trial responses to the averaged waveform arises from the fact that in most experimental trials, similar configurations of neural elements are activated with trial-varying latency and amplitude; by the same token, the occasional extreme deviation from the "stereotypic" waveform shows that essentially different ensemble configurations can substitute on a subset of the trials. By not assuming a priori that identical neural processes are active during every trial of the experiment, one is able to investigate the possibility that different processing strategies are used by the brain during the course of the recordings.
The dVCA algorithm
Bayesian probability theory is used to derive equations that use the recorded data to identify the maximum probability estimates of the mcERP model parameters in Eq. 1. Like PCA and ICA, each source waveshape is modeled as a pointwise time series where each point of the time series is considered to be a model parameter. The entire set of model parameters consists of these time-series points along with the mcERP parameters described above. Equations are obtained for each model parameter and an iterative fixed-point method is used to solve them jointly. The implemented algorithm is hierarchical in the sense that components are added one by one as the data analysis progresses. Figure 1 provides a flowchart describing the analysis stages used by the algorithm. In the text that follows, we present a basic outline of the operation of the algorithm. A more detailed description, including derivations of the equations, can be found in APPENDIX A.
|
The dVCA algorithm improves on the average ERP in two ways: by extending our ability to address single-trial responses and trial-to-trial variability, and by generalizing the one-component stereotypic waveshape model of the average ERP to a multiple-component stereotypic waveshape model. The trial-to-trial variability addressed by dVCA includes only single-trial amplitude and latency variation. Although retaining the concept of a stereotypic component waveshape from the average ERP may be seen as a liability, the state-of-the-art in signal processing does not allow us to consider radical multiple-component waveshape variations because that approach could easily lead to an algorithm that is overly sensitive to noise. Below, we will apply the dVCA algorithm to field potential data recorded from cortical area V1 of the macaque and demonstrate how dVCA can be used to address significant waveshape variations.
The fact that this algorithm is derived from a signal model that describes the recordings as a mixture of components implies that dVCA has something in common with techniques such as factor analysis (FA), the more familiar principal component analysis (PCA), and the more recent independent component analysis (ICA). This is in fact the case, and by reverting to a more simplistic signal model, assigning the appropriate prior probabilities and marginalizing over the source waveshape parameters one can derive, using the same methodology described in APPENDIX A, these more familiar techniques (Knuth 1997
, 1999
, 2005
). PCA assumes that the source amplitudes are Gaussian-distributed and chooses a solution such that a single source will contribute as much of the signal variance as possible. The result is an unphysical separation, which is constrained only by this unnatural variance assumption. ICA allows for more general assumptions regarding the distribution of the source amplitudes. ICA basically assigns non-Gaussian priors to the source amplitude distributions. By assuming that the sources are independent (or equivalently that these priors are factorable) and that there are as many sources as detectors, an integration over all possible source waveshapes can be performed, resulting in a straightforward learning rule for the coupling (mixing) matrix. Mixed signals will tend to have more Gaussian amplitude distributions, so ICA strives to find a separation matrix that minimizes the Gaussianity of the results, thus optimally separating the signals.
dVCA relies on the prior information about the phenomenon of trial-to-trial variability. This information is incorporated by explicitly introducing new relevant model parameters into the signal model (Eq. 1). In the derivation, we assume that we are ignorant about the values of the model parameters and therefore assign uniform priors to the source waveshapes, single-trial amplitudes, and single-trial latencies. One can also view this as an assignment of a multidimensional uniform density, which is maximally ignorant, but also conveniently factorable, because the probability of each variable is independent of one another. However, because it is a multidimensional uniform density, any combination of values of amplitude and latency from any source is as equally probable as any other. So the algorithm is not biased toward any particular values of amplitude or latency given the amplitude or latency of the same source in another trial or another source altogether. In this sense, dVCA does not impose any condition of independence in the basic assumptions underlying the algorithm. All possibilities are equally probable.
Because dVCA makes no explicit assumptions regarding independence, this algorithm has the ability to identify distinct sources with similar waveshapes or similar amplitude and latency variability patterns. We have performed some early experiments demonstrating the ability of dVCA to identify coupled sources and sources with similar waveshapes (Shah et al. 2003
). Not surprisingly, if the sources have similar waveshapes, but different trial-to-trial variability patterns, they can be separated. However, sources with identical waveshapes and identical trial-to-trial variability are inseparable. Later, we demonstrate using field potential data recorded from cortical area V1 of the macaque that coupled sources can be identified. Specifically, we identify amplitudeamplitude coupling across components, latencylatency coupling across components, and amplitudelatency both within and across components.
| RESULTS |
|---|
|
|
|---|
The dVCA algorithm was evaluated using synthetic data to demonstrate the utility of amplitude and latency variability in the identification of multiple evoked components and also to assess the robustness of the algorithm in the presence of noise. We simulated electric field potentials recorded from a linear-array multielectrode with 15 channels spanning the cortical laminae in V1. Specifically, we designed three synthetic ERP components (Fig. 2A) sampled at 2 kHz to approximate the neural ensemble response to diffuse red-light stimulation in macaque V1 (Givre et al. 1995
; Mehta et al. 2000
). Figure 2B shows the field potentials derived from the noise-free synthetic data as the detectors in the multielectrode would record them. Superimposing the field potentials from the three sources and approximating the second spatial derivative of this summed activity yields the current source density (CSD) profile shown in Fig. 2C. The value of the CSD technique (Freeman and Nicholson 1975
; Nicholson and Freeman 1975
; Mitzdorf 1985
; Tenke et al. 1993
) is evident as it both localizes the neural activity at the current sources and sinks, and eliminates volume-conducted, or far-field, activity (i.e., see component c3 below). Many of our results will be displayed using the CSD profile. We emphasize that dVCA works with the recorded field potentials only, and that displaying the CSD profile is used only to emphasize the quality of our results by retrieving accurate (in simulations) or physiologically reasonable (in real data) localizations of neural activity.
|
In the majority of simulations, uncorrelated Gaussian-distributed, additive noise was introduced, as in Eq. 1. The signal-to-noise ratio (SNR) is different for each component from trial to trial because the three simulated components have different single-trial amplitudes. For this reason, we specify the trial-average SNR for each component. Since the mean amplitude scale of the components is set to one, this is easily computed from the standard deviation (SD) of the detected component waveshape divided by the SD of the additive noise
![]() | (2) |
component = 0.876, 0.174, and 0.935 for c1, c2, and c3, respectively. Because we are using multiple detectors, the values of
component were found by multiplying the SD of the waveshape
wave by the L2 norm of the coupling coefficients for that component. For the nth component, this is
![]() | (3) |
The performance of the dVCA algorithm was evaluated in several ways. First, the ability of dVCA to separate the three mixed signals was measured using a unit-normalized Amari error (Amari et al. 1996
), which describes the degree of component mixing (details can be found in APPENDIX B, Eq. B5). A completely separated solution gives an Amari error of zero, whereas the worst case mixture gives an Amari error of one. For three signals of equal variance, an Amari error of 0.065 corresponds to 10% mixing.
Second, the ability of dVCA to estimate each component waveshape was evaluated by calculating the fractional root-mean-square (RMS) error of the estimated waveshape as compared with the correct waveshape. Note that before this comparison could be made the estimated components
(t) needed to be rescaled and permuted because of the indeterminacy described by Eq. B1. For the jth component
![]() | (4) |
Finally, the accuracies of the amplitude and onset latency estimates for the jth component were evaluated by computing the deviation from the correct values for the entire set of single-trial estimates using
![]() | (5) |
![]() | (6) |
), then the error one obtains by working with the average ERP necessarily must be greater than or equal to
because it does not account for trial-to-trial variability. Thus errors in the dVCA estimates below the SDs of the variability represent a significant improvement over the standard assumption that amplitudes and latencies do not vary. Effects of amplitude variability
We first demonstrate that amplitude variability aids in the separation process. Eleven experiments using synthetic data, each consisting of 50 simulated trials from the source component configuration described above, were performed where the degree of variability of the single-trial amplitudes was controlled. To generate the synthetic data, 50 single-trial amplitudes for each component were randomly sampled from a log-normal distribution with a sample mean µamp = 1.0 and sample SDs of
amp = {0, 0.0625, 0.125, 0.1875, 0.219, 0.25, 0.375, 0.5, 0.625, 0.75, 1.0} for each of the 11 sets of synthetic data. Because latency variability was not simulated in this set of experiments, the single-trial latency parameters were set to zero (
jr = 0). Given the component waveshapes, the coupling matrix, and the single-trial amplitudes and latencies, Eq. 1 was used to generate the synthetic data. The SD of
noise = 0.217 was used for the noise, resulting in SNRs of 12.1, 1.9, and 12.7 dB (4.04, 0.80, and 4.31 in terms of SD ratios) for c1, c2, and c3, respectively. The dVCA algorithm was used to estimate the coupling matrix, component waveshapes, and single-trial amplitudes and latencies from the synthetic data. Because the true parameters were known, the performance of the algorithm was evaluated as previously described.
Without amplitude variability, dVCA was unable to separate the components as demonstrated by the high Amari error of 0.219 (see Fig. 3A). A small degree of amplitude variability,
amp = 0.25, renders the problem solvable as demonstrated in Fig. 3A, where the Amari error drops below 0.05 corresponding to a fractional RMS waveshape error of <10% (SNR
23 dB) for three equal variance components, and remains relatively constant with increasing variability hovering about an average of 0.028. Figure 3B shows the dramatic improvement in the waveshape estimation as quantified by the RMS errors, which rapidly drop to levels commensurate with the SNRs of the individual components. The effect of amplitude variability on the single-trial amplitude estimates (not shown) remained relatively constant for
amp
0.25 with mean SDs of the errors in the single-trial amplitude estimates of 0.014, 0.076, and 0.010 and for c1, c2, and c3 respectively. This is well below the SD of the amplitude variability. It should be noted that even though the true onset latencies were set to zero in these simulations, dVCA still estimates these quantities. The errors of the onset latency estimates (not shown) also remained relatively constant with respective mean SDs of 0.417, 2.059, and 1.000 ms.
|
Effects of latency variability
Next we demonstrate that latency variability also aids in the separation process. Eight experiments using synthetic data, each consisting of 50 synthetic trials from the source component configuration described above, were performed where the variability of the single-trial latencies was controlled. In these experiments there was no amplitude variability (
jr = 1). To generate the simulated data, 50 single-trial latencies were randomly drawn from a Gaussian distribution with sample mean µlat = 0 and sample SDs of
lat = {0, 1.25, 2.5, 3.75, 5.0, 6.25, 7.5, 10.0} ms for each of the eight simulations. The same noise variance was used as in the amplitude variability case.
Amari error was found to decrease with increasing latency variability (Fig. 3C), dropping to <0.05 with
lat
7.5 ms. Component waveshape estimation was also found to improve with increasing onset latency variability, although the effect is not nearly so dramatic as in the amplitude variability case. Moreover, with the onset latency variability ranges considered, the accuracy of the algorithm's estimates never attained the levels found with amplitude variability. This difference between the effects of amplitude and latency variability was not attributable to the sampling distributions because we controlled for both the sample mean and variance. Instead, the most likely reason for this effect is the fact that in terms of signal amplitude, amplitude variability is a first-order effect, whereas latency variability, when written as a Taylor expansion, is a second-order effect dependent on the first derivative of the waveshape. Although the amplitudes were all set to one in the simulated data, dVCA still estimates these quantities. Again, these amplitude estimate errors were low for
lat
7.5 ms with the mean SD of the errors in the estimates equal to 0.017, 0.077, and 0.011 for c1, c2, and c3, respectively, which is on the order of the errors seen in the amplitude variability trials. Onset latency estimate errors also remained relatively constant with SDs of 0.250, 2.250, and 1.142 ms, respectively.
Sensitivity to additive white Gaussian noise
In this first noise study we examined the robustness of dVCA in the face of additive white Gaussian noise by simulating 12 different noise levels. Each simulation relied on 50 trials of synthetic data where variability in both the amplitudes and latencies of the components was modeled. Variability ranges were in accordance with those observed in our previous investigations (Truccolo et al. 2001
). Single-trial amplitudes were drawn from a log-normal distribution with sample mean µamp = 1.0 and sample SD
amp = 1.0. Single-trial latencies were drawn from a normal distribution with sample mean µlat = 0 and sample SD
lat = 10.0 ms. The synthetic signal from each electrode (detector) was contaminated with a unique white Gaussian noise waveform, which is specified by
mr(t) in Eq. 1.
The component-specific SNRs and resulting Amari errors, listed in Table 1 and shown in Fig. 4A, indicate a relatively smooth increase in Amari error with decrease in c1 SNR. However, a significant jump in error occurs as the SNR of c2 passes below 17 dB (note that this corresponds to a c1 SNR = 3 dB; see figure caption), suggesting that an SNR level of about 17 dB may denote a transition from a useful data set to a prohibitively noisy one. These results can be compared with the waveshape fractional RMS errors, which grow exponentially with decreasing SNR (Fig. 4B). The errors reach 1.0, signifying that the deviations in the estimates are on the scale of the waveshape itself, at about 15 to 17 dB for the localized components and about 17 dB for the far-field component. Figure 4E shows waveshape results for four different noise levels, providing a better idea of the quality of the separation under these conditions. Most noteworthy is the fact that the majority of the waveshape error arises from the high-frequency contamination of the Gaussian noise rather than mixing of the components. Much of this could easily be improved by incorporating a prior probability describing the expectation that components should be slowly varying with respect to typical sampling rates, which would effectively low-pass filter the results. As we will demonstrate, explicitly filtering a real data set can remove important signals, and alter others (Bogacz et al. 2002
; Mocks et al. 1986
). Regardless, some distortion in the positive phase of c1 can be seen at SNR1 = 3.0 dB, which is much more easily noticeable at SNR1 = 15 dB, where only c1 and c3 remain detectable. The improved accuracy in the estimation of the far-field component c3 (illustrated in Fig. 4B) is most likely explained by the fact that each electrode in the array provides information about the far-field source, whereas for local sources some detectors have small signals.
|
|
Figure 4D shows the behavior of the onset latency estimates, which are less well estimated than the amplitudes. In addition, the degradation of the quality of the far-field estimates was not noticeably different from those of the local sources with 95% of the estimates being within the range of variability down to an SNR of 3 to 1 dB, and 68% of the estimates within the range of variability down to 15 to 18 dB. Figure 4G shows scatterplots of the true onset latencies versus the estimates for c1. The diagonal pattern, which indicates a high level of predictability, is almost lost by 15 dB (r = 0.392). Note that as a result of the difference in variability levels of the amplitudes and onset latencies, the distribution of points in Fig. 4F should not be directly compared with those in Fig. 4G because they are effectively at different magnifications.
Sensitivity to correlated far-field noise
In an effort to more accurately simulate the conditions that may be experienced during a real experiment, we designed a second noise study to determine the effect of ongoing correlated far-field activity on dVCA performance. The ongoing far-field activity (noise) was modeled as a time series with a 1/f spectrum so that correlations would exist at all timescales. To simulate its far-field nature the ongoing activity was coupled identically to each detector, which can also be viewed as a spatial correlation across channels.
Again, 12 levels of the ongoing noise amplitude were tested; however, in this study Gaussian noise was not added to the individual electrodes, only the ongoing correlated far-field noise was used. Noise amplitude was computed from the sample variance of the time-series noise signal. The specific noise levels, SNRs of the three components and resulting Amari errors are listed in Table 2. Figure 5A shows the Amari error smoothly decreasing with increasing SNR. The Amari error reaches the same level of error as in the Gaussian noise case with 20 dB less noise, amounting to large errors around a c1 SNR of 3 to 8 dB. It is apparent that this type of noise more severely limits the ability of dVCA to separate signals. Figure 5B shows the component waveshape errors blowing up around 1 dB, with the effect on the far-field component c3 (red) being understandably catastrophic because both c3 and the noise are correlated across detectors. The amplitude errors reach unacceptable levels around 0 to 3 dB (Fig. 5C). However, most interesting are errors of the onset latencies (Fig. 5D). The errors for the local sources reach high levels (68% of estimates being within the range of variability) between 0 and 1 dB, whereas the errors for the far-field component c3 are large across the entire SNR range considered with 95% of the estimates never being within the range of variability. This is because the correlated far-field noise is severely interfering with the ability to identify the far-field component in any given trial.
|
|
Neuroscientists have several available techniques to analyze evoked responses, although each of these techniques is based on different assumptions and signal models. Here we briefly evaluate two popular techniques, PCA and ICA, and compare them to dVCA using our synthetic data, which as described above, is based on experimental findings regarding both the componentry and their laminar distribution (Givre et al. 1995
; Mehta et al. 2000
) and observed trial-to-trial variability (Truccolo et al. 2002
). ICA was performed by applying the Infomax ICA algorithm in the EEGLAB toolbox (Delorme and Makeig 2004
). The runica.m algorithm (Makeig et al. 1996
) was used both with and without the "extend" option (extended ICA), which allows ICA to model sub-Gaussian sources (Lee et al. 1999
). Because extended ICA consistently gave better results in the single-trial data sets exhibiting trial-to-trial variability, we will present those results here and in Fig. 6. PCA was performed using the EEGLAB routine runpca.m (Colin Humphries, CNL/Salk Institute, August 1997), which computes the singular value decomposition (SVD) of the data and identifies the resulting eigenvectors as the estimates of the source waveshapes. Because each data set has 15 channels, both PCA and ICA estimate 15 source waveshapes. Because it is known that the synthetic data are composed of three sources, the three estimated source waveshapes with highest correlation to the three original sources were chosen for analysis, whereas the remaining estimated sources were ignored. This avoids the difficult issues of estimating the number of sources in both PCA and ICA, and thus gives them a slight advantage.
|
amp = 1.0, normally distributed latency variability with sample mean µlat = 0 and sample SD
lat = 10.0 ms, and possessed additive Gaussian noise identical in SNR level to that of the no variability case. These two data sets were analyzed using PCA and ICA in two different ways: first by averaging the epochs and analyzing the average response (average analysis), and second by treating the data as a long string of concatenated single-trial responses (single-trial analysis). Figure 6A shows the source waveshape results for PCA, extended ICA, and dVCA in the single-trial analysis cases. Note that only the dVCA analysis of the variable responses results in accurate identification of the underlying source waveshapes. The Amari errors in Fig. 6B quantify the degree to which the components were correctly identified in the four cases. In this example, the performance of the original ICA algorithm was comparable to PCA. Because dVCA can analyze only single-trial responses, the dVCA analysis consisted of analyzing the single-trial cases, resulting in only two bars in the bar graph. Note that dVCA performs poorly when there is no trial-to-trial variability. However, the presence of trial-to-trial variability dramatically improves the performance of dVCA, enabling it to surpass both PCA and extended ICA in separation quality.
Finally, we compared the ability of extended ICA and dVCA to identify source waveshapes in the face of additive white Gaussian noise. These data sets are identical to those analyzed in the earlier section on Sensitivity to additive white Gaussian noise. Note also that the data set with the smallest noise level (SNR1 = 15 dB) is the same data set that was used in the variable single-trial case considered above. Figure 6C demonstrates that dVCA consistently outperforms extended ICA with respect to robustness to noise. It should be noted that the extended ICA algorithm outperformed the original ICA algorithm in all but two cases. At SNR1 = 9 dB, the Amari error of extended ICA was 0.240, whereas the Amari error of ICA was 0.177 compared with the Amari error of dVCA, which was 0.100. In the second case, the Amari error of extended ICA and ICA differed by only 0.001.
Application to cortical field potential data
To further demonstrate the utility of dVCA, we applied the algorithm to field potentials recorded from primary visual cortex of a male macaque monkey during the cuing period of an earlier selective attention study (Mehta et al. 2000
) for which all animal care and use procedures were approved by the Institutional Animal Care and Use Committee, and were in accordance with National Institutes of Health publication no. 8623 (rev. 1087). During data collection, the subject was required to discriminate between standard and target visual stimuli for a juice reward. The standard visual stimulus was a 10-µs red-light flash presented through a diffusing plate subtending a 20° visual angle, centered on the point of visual fixation, while the target was presented similarly, although differing slightly in intensity. Stimuli were presented at irregular interstimulus intervals (ISIs) (minimum of 350 ms, average of 2 stimuli/s). Intracortical field potential profiles were recorded using a linear-array multielectrode with 14 contacts (or channels), equally spaced at 150 µm, inserted into V1 and positioned so that the channels spanned all six laminae (see Fig. 2A, middle). The continuous field potential record from all channels, incorporating the stimulus tags, was sampled at 2 kHz and recorded on a PC-based data acquisition system (Neuroscan, El Paso, TX).
The signals examined with dVCA were recorded during 171 trial presentations of the standard visual stimulus and were epoched from 0 to 300 ms (0 ms indicates stimulus presentation). No other preprocessing of the data was performed. The average field potential signals in each electrode contact were calculated and used to determine the current source density (CSD) profile of these data (Fig. 7A). The CSD is an estimate of the second spatial derivative of the electric potential, which is proportional to the vector divergence of the current field. This profile was approximated using a three-point, second-order difference of the field potentials (Nicholson and Freeman 1975
; Schroeder et al. 1995
). Because the CSD represents the divergence of the electric current, it indexes the local regions of transmembrane current flow and eliminates volume-conducted activity generated at a distant site, which often contaminates field potential recordings. This makes the CSD a useful tool for studying the electric signals detected by an electrode array. Consistent with earlier studies reviewed by Schroeder et al. (1995
, 2001
), the visually evoked CSD profile sampled from V1 during this experimental session shows that the earliest activation occurs in the granular subdivisions of Layer 4, which constitute the major target of thalamic afferents. A second focus of activity localizes to the supragranular (laminae 2 and 3) layers and is thought to index a feedforward activation of pyramidal cells by interneurons in the granular layers (Schroeder et al. 1990
, 1991
). Note that because we are using a three-point, second-order difference to approximate the CSD, the top waveform in Fig. 7A is associated with electrode channel 2 rather than channel 1.
|
1 = 0.1044), which is on the order of 10% variation. In contrast, the distribution of single-trial latency shifts (Fig. 7D) shows something quite different and unexpectedit is bimodal with an early latency peak at 4.625 ms and a late latency peak at 2.125 ms with a difference of 6.75 ms. The ratio of late to early responses is 2.29:1 (119:52), as defined by a 1.25-ms cutoff between the two modes. We performed several different analyses to confirm the existence of the bimodal latency distribution displayed in Fig. 7D. First, Fig. 7E displays a colorized plot of the actual, single-trial field potentials (FPs) recorded from electrode channel 10, which is located in the granular layer. Time runs along the horizontal axis, chronological trial number runs along the vertical axis, and color indicates the amplitude of the FP. The red dashes on the left side of the plot indicate trials for which dVCA determined the latency to be early. It is easily seen that these highlighted trials have a response onset before the late trials, confirming that the bimodal latency distribution of the dVCA estimation concurs with observations from the actual data. It is also apparent that early and late responses are grouped over trials, which suggests a slow cyclical shift between different response states with one state dominated by faster and the other by slower responses. Second, we performed a cross-correlation analysis between each single-trial waveshape and the trial-averaged, FP waveform between 25 and 95 ms (this time period captures the initial negative deflection of the averaged FP waveform). This analysis also yielded a bimodal distribution (not shown). Finally, we selectively averaged the early and late FP recordings in channel 10 and showed them to be significantly different in both latency and waveshape (Fig. 7F). The trial-averaged FP waveshape (black) lies between the early (red) and late (blue) waveshapes. As expected, the minimum of the early averaged responses occurs before the minimum of the late averaged responses (5.5-ms difference). Although this time difference reflects the fact that these early responses precede the late responses, it is however not as accurate as the dVCA estimate because it is derived from a single time point rather than from the entire waveshape. It is also not surprising that the late responses more closely resemble the ensemble-averaged FP because they outnumber the early responses by more than 2 to 1.
Because these two types of granular responses exhibit distinct waveshapes and latencies this suggests that different physiological mechanisms are at work in these two processes. These mechanisms will be much easier to distinguish by separating the original data set into two subsetsan early subset and a late subsetand performing separate analyses on each and comparing. We then used dVCA to reestimate the first component in each subset. Figure 8, A and B shows the CSD of the first component from the early and late subsets, respectively. The onset latency of the prominent granular sink over source configuration was determined by descending down the left side of the peak and identifying the point in time where five consecutive previous data points were monotonically increasing. With this measure the response onset latency of the early subset (Fig. 8A) was 28 ms, which is clearly earlier than its counterpart in the late subset (Fig. 8B), which was 42.5 ms, as indicated by the drop lines. The waveshapes of the two types of Layer 4 responses differ significantly, indicating a difference in activation patterns between subsets. This is also confirmed by the different degrees to which this activation appears in the supragranular layers because the later responses seem to be more strongly coupled to the supragranular activation than the early responses.
|
To verify that these UHF oscillations were not present in the late responses, we applied a Morlet-based wavelet transformation to the two residual subsets of channel 11. To preserve information about timefrequency behavior of oscillations not precisely time-locked to the stimulus, we applied the wavelet transform to each single trial separately and the wavelet results were then averaged. These results show that there is a burst of UHF activity ranging from about 160 to 220 Hz and occurring between 42 and 57 ms in the early subset, which is represented as an "island of activity" in the timefrequency plot (Fig. 8D). Such an island of activity is completely absent in the late subset (Fig. 8E). This finding demonstrates the ability of dVCA to identify distinct physiological modes of activation using the single-trial characteristics of the responses, even in situations where the data are unfiltered.
To demonstrate how dVCA can be used to extract multiple components, we focus on the late subset. A complete analysis of the entire data set will be published elsewhere. In this example we model three components (Fig. 9, AC). Components 2 and 3 (c2 and c3) have been magnified by a factor of 2 to make them easily visible, and are thus not on the same scale as component 1 (c1), which is the largest response. To ensure that the SNR is within the acceptable range for dVCA, we used Eq. B4 to compute the SNR for each component in each channel. For a given component this is accomplished by estimating the SD of the component in that channel and the SD of the residual signal in that channel. The average SNR for each component was then computed by averaging its SNRs across channels. We found that all three components are well within the range of acceptable performance with average SNRs of 1.59, 1.68, and 0.98 dB for c1, c2, and c3, respectively.
|
A low-frequency biphasic response is described by c2 (Fig. 9B). Significant onset of this component occurs at 37 ms (see drop line), which is after c1 begins the biphasic activation in the granular layers. Note that the source and sink distribution of c2 are in superficial layers compared with the small supragranular response of c1. At first glance it appears that the spatial distribution of c3 is the same as that of c2. However, whereas these components may involve the same populations of supragranular neurons, c3 also involves others in the granular layers. The initial activation of c3 is at 30 ms, but it is not until 45 ms when the component begins to display a biphasic activation followed later by a slow wave.
Figure 9D shows the single-trial amplitude histograms along the diagonal, along with scatterplots showing the relationships between the single-trial amplitudes of the three components. The initial granular response, c1, has the lowest amplitude variability (
amp1 = 0.102), whereas the later supragranular responses display greater variability (
amp2 = 0.248,
amp3 = 0.362). Figure 9E shows the single-trial latency results along with scatterplots depicting the trial-by-trial relationships between component latencies. Component c1 again displays the least variability (
lat1 = 1.08 ms), but in this case, c2 displays greater variability than that of c3 (
lat2 = 9.85 ms,
lat3 = 1.58 ms).
Correlations in the scatterplots between component parameters indicate dynamic interactions among these components. Of the amplitudeamplitude interactions (Fig. 9D), we see that the amplitudes of c2 are correlated with c3 (r = 0.495, P < 107), so that when c2 is larger than average, c3 is also larger than average. Such correlation might suggest that these components have not been separated. However, note that the latency variability of each component is very different (Fig. 9E), and that there is little correlation between the latency of c2 and the latency