## Abstract

Transfer entropy, presented as a new tool for investigating neural assemblies, quantifies the fraction of information in a neuron found in the past history of another neuron. The asymmetry of the measure allows feedback evaluations. In particular, this tool has potential applications in investigating windows of temporal integration and stimulus-induced modulation of firing rate. Transfer entropy is also able to eliminate some effects of common history in spike trains and obtains results that are different from cross-correlation. The basic transfer entropy properties are illustrated with simulations. The information transfer through a network of 16 simultaneous multiunit recordings in cat's auditory cortex was examined for a large number of acoustic stimulus types. Application of the transfer entropy to a large database of multiple single-unit activity in cat's primary auditory cortex revealed that most windows of temporal integration found during spontaneous activity range between 2 and 15 ms. The normalized transfer entropy shows similarities and differences with the strength of cross-correlation; these form the basis for revisiting the neural assembly concept.

## INTRODUCTION

Over the past decades, most investigations concerning spike trains have focused on one of the main issues in systems neuroscience: understanding the neural code (deCharms and Zador 2000; Eggermont 1998; Perkel and Bullock 1968; Pouget et al. 2000). Basically, the two main approaches depict neurons either as integrators that are sensitive to input firing rates or as coincidence detectors that are sensitive to the temporal patterns of the input (Gerstein and Kirkland 2001). Given that the synchronization between two multiple single-unit recordings has a modest dependency on the sensory stimulus used (Eggermont 1994), coincident firing has mainly been investigated and used in the search for neural assemblies or neuron clusters able to show a stimulus-induced modulation of their synchronized firing [e.g., the gravitational clustering method (Baker and Gerstein 2000; Gerstein and Aertsen 1985) and synchrony clustering (Eggermont 2006)]. Studies based on firing rates have instead mainly focused on the processing abilities of single neurons as a consequence of their unique physiological specialization (type of cell, tuning properties), how they integrate the input activity, and eventually how a population of neurons possibly encodes one feature of the stimulus by a rate code (Pouget et al. 2000). Consequently, there has been a steady rise in interest in the estimation of “information” carried by single neurons, multiple single units, or populations about specific stimuli.

The concept of information with respect to neuronal activity, although stemming from Shannon's theory, has no standardized meaning in neuroscience and has been used in several different ways (Borst and Theunissen 1999). For instance, information rates have been estimated at a microscopic physiological scale in the transmembrane response to hormonal stimuli (Prank et al. 2000) or within a single synapse (London et al. 2002). The entropy carried by a spike train has also been widely investigated, the complexity of estimation being reflected by the large range of methods proposed [e.g., histogram method (Strong et al. 1998), vector spaces (Victor 2002), Lempel–Ziv complexity (Amigo et al. 2004)]. Information-related measures between neurons have also been used to study the effect of noise correlation on encoding and decoding stimuli (Averbeck and Lee 2006). Finally, the mutual information between the spike-train responses and a set of stimuli was also estimated to investigate the discrimination abilities of neurons (Borst and Theunissen 1999; Chechik et al. 2006; Gehr et al. 2000; Werner and Mountcastle 1965).

Herein we present the transfer entropy as a new exploratory tool that provides a bridge between the study of neural assemblies and the information about stimuli carried by individual neurons. More precisely, the transfer entropy estimates the part of the activity of a neuron that does not depend on its own past but depends on the past activity of another neuron. In a nutshell, it estimates the information transferred between two neurons in each direction. To our knowledge, the transfer entropy concept has been discussed in Jumarie (1990), but applied only once in the context of continuous physical systems (Schreiber 2000), and has never been applied to spike trains. Yet, to a certain extent, this tool is able to distinguish information resulting from common history and exclude it by appropriate conditioning of the entropy. Transfer entropy also detects asymmetry in neural relations, allowing studies of possible feedback in neural circuits, a topic that recently gained considerable interest (Contreras et al. 1996; Hupe et al. 1998; Krupa et al. 1999; Sillito et al. 1993, 1994; Yan and Suga 1996). Finally, and importantly, transfer entropy takes into account linear and nonlinear flows and thus may represent a very general way to define the causality strength between two spike trains. In particular, the window size for which maximum information is transferred may be useful to study neural integrative properties.

After presenting the mathematical tenets of the statistics, its basic properties will be elucidated through simulations of independent Poisson processes and compared with those of cross-correlation measures. An exploration of the multiunit activity recorded with an array of 16 electrodes in cat auditory cortex will highlight the potential of the method in network studies. Finally, recordings of spontaneous activity in 21 cats will show the statistical distribution of transfer entropy and above all delineate the size of temporal integration windows in primary auditory cortex as a potential first important physiological result.

## METHODS

### Transfer entropy

Let *X*_{1} and *X*_{2} be two spike trains. Let *X*_{1}^{F}(*t*), *X*_{2}^{F}(*t*) be the number of spikes of *X*_{1} and *X*_{2}, respectively, falling in the upcoming time interval ⌊*t*, *t* + τ_{f}⌋. Similarly, let *X*_{1}^{P}(*t*), *X*_{2}^{P}(*t*) be the number of spikes of *X*_{1} and *X*_{2} falling in the past time interval ⌊*t* − τ_{p}, *t*⌋. τ_{f} and τ_{p} typically range from 1 to <100 ms. In practice, time is considered discrete with *t*_{n} = *n*τ_{f}, *n* ∈ {0, 1, 2,…} such that [*X*_{1}^{F}(*t*_{n})]_{n}, [*X*_{2}^{F}(*t*_{n})]_{n}, [*X*_{1}^{P}(*t*_{n})]_{n}, and [*X*_{2}^{P}(*t*_{n})]_{n} are discrete processes.

In the stationary case, the transfer entropy from *X*_{1} to *X*_{2} can be defined as the amount of mutual information between the past of *X*_{1} (*X*_{1}^{P}) and the future of *X*_{2} (*X*_{2}^{F}) when the past of *X*_{2} (*X*_{2}^{P}) is already known, i.e.

$$TE_{X_1 \to X_2} = I(X_2^F; X_1^P \mid X_2^P) = H(X_2^F \mid X_2^P) - H(X_2^F \mid X_2^P, X_1^P) \tag{1}$$

where *H*(*X*_{2}^{F}|*X*_{2}^{P}) is the entropy of the process *X*_{2}^{F} conditional on its past. Because the distributions of *X*_{1/2}^{F/P} are discrete, the transfer entropy can be written explicitly as

$$TE_{X_1 \to X_2} = \sum_{k,l,m} p(X_2^F = k, X_2^P = l, X_1^P = m) \, \log \frac{p(X_2^F = k \mid X_2^P = l, X_1^P = m)}{p(X_2^F = k \mid X_2^P = l)} \tag{2}$$

where *k*, *l*, *m* ∈ {0, 1, 2,…}. Under independence between *X*_{1} and *X*_{2}, *TE*_{X1→X2} = 0. The equality

$$TE_{X_1 \to X_2} = I(X_2^F; X_2^P, X_1^P) - I(X_2^F; X_2^P) \tag{3}$$

shows that the transfer entropy (*TE*) represents the amount of information provided by the additional knowledge of the past of *X*_{1} in the model describing the information between the past and the future of *X*_{2}.
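The definition above can be illustrated with a minimal plug-in (histogram) estimator. This is a sketch under stated assumptions, not the authors' MATLAB implementation: the function names `window_counts` and `transfer_entropy`, the half-open windows, the use of base-2 logarithms, and the choice to start at the first *t*_{n} for which a full past window fits are all illustrative choices.

```python
import numpy as np
from collections import Counter

def window_counts(spikes, starts, width):
    """Number of spikes falling in each half-open window [start, start + width)."""
    spikes = np.sort(np.asarray(spikes, dtype=float))
    lo = np.searchsorted(spikes, starts, side="left")
    hi = np.searchsorted(spikes, starts + width, side="left")
    return hi - lo

def transfer_entropy(x1, x2, tau_f, tau_p, duration):
    """Plug-in estimate (in bits) of the transfer entropy from x1 to x2.

    x1, x2: spike times in seconds; tau_f, tau_p: future and past window sizes.
    """
    # Discrete times t_n = n * tau_f, kept only where both windows fit in the recording.
    n_first = int(np.ceil(tau_p / tau_f))
    n_last = int(np.floor((duration - tau_f) / tau_f))
    t = np.arange(n_first, n_last + 1) * tau_f
    x2f = window_counts(x2, t, tau_f).tolist()          # future of X2
    x2p = window_counts(x2, t - tau_p, tau_p).tolist()  # past of X2
    x1p = window_counts(x1, t - tau_p, tau_p).tolist()  # past of X1
    n = len(t)
    c_klm = Counter(zip(x2f, x2p, x1p))
    c_kl = Counter(zip(x2f, x2p))
    c_lm = Counter(zip(x2p, x1p))
    c_l = Counter(x2p)
    te = 0.0
    for (k, l, m), c in c_klm.items():
        # p(k | l, m) / p(k | l) = (c_klm / c_lm) / (c_kl / c_l)
        ratio = c * c_l[l] / (c_lm[(l, m)] * c_kl[(k, l)])
        te += (c / n) * np.log2(ratio)
    return te
```

Because the estimate is built from empirical frequencies, it is biased upward when the joint histogram is sparsely populated, which is the motivation for the corrections discussed next.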

When no hypothesis regarding the distribution of *X*_{1/2}^{F/P} is made, the theoretical properties of the transfer entropy are probably extremely difficult to know a priori. Nevertheless, if τ_{f} and τ_{p} are large, the joint distribution between *X*_{2}^{F}, *X*_{1}^{P}, and *X*_{2}^{P} will be broad and sparse. The transfer entropy will thus increase automatically even if there is no causal link. Two corrections are necessary to avoid this effect. We present them in the next several paragraphs.

### Bias

First, we remove from *TE*_{X1→X2} an estimate of the same transfer entropy computed on shuffled data, in which *X*_{1}^{P} is modified to make it independent of *X*_{2}^{F/P}. In practice, we randomly shuffle the interspike intervals (ISIs) of *X*_{1}, which does not change the ISI distribution of *X*_{1} but completely disconnects *X*_{1}^{P} and *X*_{2}^{F/P}. This procedure was previously used in another context in Hung et al. (2002), for instance. Another possibility is to directly shuffle the values of the process *X*_{1}^{P}, which gives similar results and may be faster computationally. The shuffled estimate is dubbed *TE*^{shuffled}_{X1→X2} in the following and is an average of the results obtained over *n* trials.
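The ISI-shuffling step can be sketched as follows. The helper names `shuffle_isis` and `shuffled_te` are hypothetical, and `te_fn` stands for any transfer entropy estimator supplied by the caller as a function of two spike trains; the sketch assumes a nonempty spike train.

```python
import numpy as np

def shuffle_isis(spikes, rng):
    """Surrogate train with the same interspike intervals in random order.

    The first spike time is kept, so the ISI distribution is unchanged
    while the timing relation to any other train is destroyed.
    """
    spikes = np.sort(np.asarray(spikes, dtype=float))
    isis = np.diff(spikes)
    return spikes[0] + np.concatenate(([0.0], np.cumsum(rng.permutation(isis))))

def shuffled_te(te_fn, x1, x2, n_trials, rng):
    """Average of te_fn(surrogate of x1, x2) over n_trials independent shuffles."""
    values = [te_fn(shuffle_isis(x1, rng), x2) for _ in range(n_trials)]
    return float(np.mean(values))
```

Passing a seeded `np.random.default_rng(...)` as `rng` makes the surrogate trains reproducible.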

### Normalization

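Finally, given the properties of mutual information, we define the normalized transfer entropy (*NTE*) by

$$NTE_{X_1 \to X_2} = \frac{TE_{X_1 \to X_2} - TE^{\mathrm{shuffled}}_{X_1 \to X_2}}{H(X_2^F \mid X_2^P)} \tag{4}$$

where $TE^{\mathrm{shuffled}}_{X_1 \to X_2}$ is the estimate obtained on shuffled data. Intuitively, this represents the fraction of information in *X*_{2} not explained by its own past and explained by the past of *X*_{1}.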
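As a hedged illustration of the normalization, the sketch below assumes that the bias-corrected transfer entropy is divided by the conditional entropy of the future of *X*_{2} given its own past. The function names are hypothetical, and returning NaN for a zero denominator is a design choice made here, not something specified in the text.

```python
from collections import Counter
from math import log2

def conditional_entropy(future, past):
    """Plug-in estimate of H(future | past) in bits from two equal-length count sequences."""
    n = len(future)
    joint = Counter(zip(future, past))
    marginal = Counter(past)
    return -sum(c / n * log2(c / marginal[l]) for (_, l), c in joint.items())

def normalized_te(te, te_shuffled, h_cond):
    """Bias-corrected transfer entropy as a fraction of the conditional entropy."""
    if h_cond <= 0.0:
        return float("nan")  # nothing left to explain in the future of X2
    return (te - te_shuffled) / h_cond
```

For example, a raw estimate of 0.5 bits with a shuffled estimate of 0.1 bits and a conditional entropy of 1 bit gives a normalized value of about 0.4.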

### Preferred direction of flow

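Similarly to Wang and Kadia (2001) and Schnupp et al. (2006), we also define a selective index of the preferred direction of flow (*DF*) by

$$DF_{X_1 \to X_2} = \frac{NTE_{X_1 \to X_2} - NTE_{X_2 \to X_1}}{NTE_{X_1 \to X_2} + NTE_{X_2 \to X_1}} \tag{5}$$

where, obviously, $DF_{X_1 \to X_2} = -DF_{X_2 \to X_1}$.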
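A direction index of this kind can be sketched as a contrast ratio between the two *NTE* values. The contrast form (difference over sum) is assumed here by analogy with common selectivity indices, and the `floor` threshold below which the index is reported as undefined is a hypothetical parameter added for illustration.

```python
import math

def direction_of_flow(nte_12, nte_21, floor=1e-3):
    """Contrast index between the two NTE values.

    Returns NaN when both values are below `floor` (or sum to zero),
    because the ratio is then dominated by estimation noise.
    """
    total = nte_12 + nte_21
    if max(abs(nte_12), abs(nte_21)) < floor or total == 0.0:
        return math.nan
    return (nte_12 - nte_21) / total
```

Swapping the two arguments flips the sign of the result, and a purely one-directional transfer gives a value of 1.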

### Final estimates

All previous quantities depend on τ_{f} and τ_{p} values. All simulations and investigations on real data convinced us that a clear peak always exists in the surface of *NTE* values as a function of τ_{f} and τ_{p}. Consequently, we always assigned to *NTE* the maximum over the set of τ_{f} and τ_{p} values.
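The maximization over window sizes amounts to an exhaustive search over a grid of (τ_{f}, τ_{p}) pairs. In the sketch below, `nte_fn` is a hypothetical callable returning the *NTE* for a given pair of window sizes, and the grids are whatever candidate values the analyst chooses.

```python
import itertools

def max_over_windows(nte_fn, taus_f, taus_p):
    """Return (max NTE, (tau_f, tau_p)) over all pairs in the two grids."""
    best = max(itertools.product(taus_f, taus_p), key=lambda fp: nte_fn(*fp))
    return nte_fn(*best), best
```

The pair at which the maximum is reached is worth keeping alongside the value itself, since the window sizes are interpreted later as transmission and integration times.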

### Cross-correlation

Cross-correlograms were calculated using custom-made programs in MATLAB (Eggermont and Smith 1995b). The bin size was 2 ms and the resulting cross-correlogram was smoothed with a three-bin running average. Stationarity estimates of the recordings were based on firing rate (mean and variance) in 100-s-long segments of the 15-min recordings for silence. To correct for the overall firing rate, burst firing, and common periodicities in the firing of the neurons, the cross-covariance was deconvolved with the square root of the product of the autocovariance functions. This deconvolution was done in the frequency domain, where it becomes a simple division; Fourier transformation back to the time domain resulted in the corrected cross-correlation coefficient function.
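The binning and smoothing steps can be sketched as below. This is only an illustration of a raw cross-correlogram with 2-ms bins and a three-bin running average; it does not reproduce the frequency-domain deconvolution by the autocovariance functions, and the function name, the ±50 ms lag range, and the centering of bins on multiples of the bin size are assumptions.

```python
import numpy as np

def cross_correlogram(x1, x2, max_lag=0.05, bin_size=0.002):
    """Smoothed histogram of spike-time differences (x2 minus x1) within +/- max_lag."""
    x1 = np.sort(np.asarray(x1, dtype=float))
    x2 = np.sort(np.asarray(x2, dtype=float))
    half = int(round(max_lag / bin_size))
    lags = np.arange(-half, half + 1) * bin_size           # bin centers
    edges = (np.arange(-half, half + 2) - 0.5) * bin_size  # bins centered on the lags
    if x1.size == 0 or x2.size == 0:
        return lags, np.zeros(lags.size)
    # For each x1 spike, consider only the x2 spikes close enough to matter.
    lo = np.searchsorted(x2, x1 - max_lag - bin_size)
    hi = np.searchsorted(x2, x1 + max_lag + bin_size)
    diffs = np.concatenate([x2[i:j] - t for t, i, j in zip(x1, lo, hi)])
    counts = np.histogram(diffs, bins=edges)[0]
    smooth = np.convolve(counts, np.ones(3) / 3.0, mode="same")  # three-bin running average
    return lags, smooth
```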

### Simulations

We simulated some independent Poisson processes for *X*_{1} and *X*_{2}. We then replaced in *X*_{2} a proportion α ∈ [0, 1] of spikes by the same proportion of spikes of *X*_{1} delayed by 10 ms. In this way, the firing rate (*FR*) of *X*_{2} is not modified, but it creates a causality link from *X*_{1} to *X*_{2} of various strengths proportional to α. The parameter for the exponential distribution underlying the Poisson process is λ = 1/10, which gives an average *FR* of 10 spikes/s for *X*_{1} and *X*_{2}. Spike trains of 300-s duration are generated for both processes.
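One possible reading of this simulation is sketched below: a random fraction α of the spikes of *X*_{2} is removed and the same number of spikes of *X*_{1}, delayed by 10 ms, is inserted instead, so the spike count of *X*_{2} is preserved. The function names, the seed handling, and the choice of which spikes are replaced are assumptions made for illustration.

```python
import numpy as np

def poisson_train(rate, duration, rng):
    """Homogeneous Poisson spike train (times in seconds) on [0, duration)."""
    n = rng.poisson(rate * duration)
    return np.sort(rng.uniform(0.0, duration, size=n))

def coupled_pair(alpha, rate=10.0, duration=300.0, delay=0.010, seed=0):
    """Two Poisson trains where a fraction alpha of x2 is replaced by delayed x1 spikes."""
    rng = np.random.default_rng(seed)
    x1 = poisson_train(rate, duration, rng)
    x2 = poisson_train(rate, duration, rng)
    n_swap = int(round(alpha * min(len(x1), len(x2))))
    keep = rng.choice(len(x2), size=len(x2) - n_swap, replace=False)  # x2 spikes kept
    copied = rng.choice(len(x1), size=n_swap, replace=False)          # x1 spikes copied
    x2_new = np.sort(np.concatenate((x2[keep], x1[copied] + delay)))
    return x1, x2_new
```

Calling `coupled_pair(0.6)` yields a pair in which roughly 60% of the spikes of the second train follow a spike of the first train by exactly 10 ms.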

### Real data

We analyzed spike trains recorded during silence from the primary auditory cortex (AI) of 21 ketamine-anesthetized cats. The length of the recording is 900 s for each data set. Recordings were made with two arrays of eight microelectrodes, arranged in a 4 × 2 pattern with 0.5-mm separation between electrodes. The arrays were independently inserted into the auditory cortex. Details about the anesthesia, the electrode array, and the protocol can be found in Tomita and Eggermont (2005). The spike trains from individual electrodes represent multiple sorted units combined into multiple single-unit recordings. *NTE* is computed between two such multiple single-unit recordings. In addition to spontaneous spiking activity, we also analyzed the spike trains in response to several stimuli used in previous studies: *1*) *Poisson*: Poisson-distributed click trains, with mean click rate of 8/s and dead time of 20 ms, and lasting 15 min (Eggermont and Smith 1995a). *2*) *NoiseAM*: amplitude-modulated noise, modulation frequency 2 to 64 Hz for AM sounds (Eggermont 2002). *3*) *PP*: randomly presented gamma-tone pips at a rate of 20/s with a range of five octaves between 0.625 and 20 kHz (Eggermont 2006). *4*) *Meow*: typical vocalization of a cat, natural and altered with respect to carrier and envelope (Gehr et al. 2000; Gourévitch and Eggermont 2006). *5*) *RMeow*: time-reversed version of the Meow stimulus. *6*) *lpamn*: wide-band noise (bandwidth: 40 kHz) modulated with a 30-Hz low-pass filtered noise (Eggermont 2006). *7*) *BaPa*: presentation of a /ba/–/pa/ continuum in which voice onset time (VOT) was varied in 5-ms steps from 0 to 70 ms (Aizawa and Eggermont 2006). *8*) *Gaps*: noise bursts with gaps from 5 to 70 ms (Aizawa and Eggermont 2006). *9*) *Train*: periodic click trains, repetition rates from 2 to 64 Hz (Eggermont 2002).

## RESULTS

To compute each value of the shuffled estimates, *n* = 20 trials were used.

### Simulations

For α = 1 (full causality, Fig. 1*A*), *NTE*_{X1→X2} reaches its maximum for τ_{p} = τ_{f} = 10 ms, whereas *NTE*_{X2→X1} remains close to zero (Fig. 1*B*), as is discussed in the following. The reason that *NTE* reaches its maximal value for τ_{p} = τ_{f} = 10 ms is explained through the example in Fig. 2, which also explains why we find *NTE*_{X1→X2} = 0 for τ_{p} + τ_{f} < 10 ms.

The peak in Fig. 1*C*, corresponding to the maximum of transfer entropy, is less sharp than that for the *NTE* estimate. The preferred direction of flow (Fig. 1*D*) is always *DF*_{X1→X2} = 1 except when τ_{p} + τ_{f} < 10 ms, where *NTE*_{X1→X2} = 0, as explained previously. Consequently, the *DF* statistic should not be used when both *NTE* estimates are close to zero.

The *NTE* estimate is nonlinearly related to α (Fig. 3), in contrast to the linear dependency for the cross-correlation (*XC*). However, *NTE* is more suited to the study of complex neural networks than *XC*.

Figure 4 *A*, model 1 represents the case α = 0.6 with a delay of 10 ms between *X*_{1} and *X*_{2}. Model 2 (Fig. 4*B*) is a combination of the same 60% of spikes of *X*_{1} but in three fractions of 20%, each part being delayed by 4, 8, and 10 ms, respectively. Such variability in delays may occur if several parallel pathways with a different number of synaptic delays are activated. This situation is common in the brain, especially in nonprimary sensory areas where the neural discharges are spread out temporally (for a comparison of temporal patterns in posterior auditory field and AI see Phillips and Orman 1984).

Compared with model 1, the maximum of *NTE* is only slightly lower for model 2 (Fig. 4, *C* and *D*). In contrast, the peak in the cross-correlation function, with three small peaks being each associated with one of the three delays used (Fig. 4*E*), has dramatically decreased from 0.6 to 0.2 (Fig. 4*F*). Interestingly, the maximum *NTE* occurs for τ_{f} equal to the minimum delay (4 ms) and τ_{p} equal to the maximum delay (10 ms). These properties of *NTE* emphasize its potential as a tool to investigate integration memory and information transfer in neural assemblies.

The rationale for the need of shuffled estimates and normalization is emphasized in Fig. 5, where the quantities of transfer entropy are plotted for values τ_{f} = τ_{p} and α = 0.5. The transfer entropy is increasing when τ_{f} and τ_{p} are increasing (Fig. 5*A*), as a consequence of the broad and sparse joint distributions. Removal of this bias ensures that the transfer entropy stays around 0 when no causality is present (Fig. 5*B*, dashed line for information transfer from *X*_{2} to *X*_{1}). Finally, because the amount of information available in *X*_{1} and *X*_{2} is also increasing when τ_{f} and τ_{p} are increasing (Fig. 5*C*), the normalization of the transfer entropy by this latter value sharpens the main peak at 10 ms. A higher average *FR* for *X*_{1} or *X*_{2} would basically have the same effect as increasing τ_{f} and τ_{p}, i.e., increase of the amount of information available in *X*_{1} or *X*_{2}. Consequently, the combination of bias removal, normalization (*Eq. 4*), and controlling the influence on the future of a channel of its own past (*Eq. 1*) makes *NTE* mostly independent of the firing rate of both neurons.

### Real data: information flow in cortical neural networks

The ability of the transfer entropy to investigate neural assemblies is described in Fig. 6. Two arrays of eight electrodes are inserted in the auditory cortex of a normal hearing cat (Fig. 6*A*), array 1 being in a ventral part of AI where some recording sites showed nonprimary behavior (C3, C5, C6). This classification was based on longer response latency, more sustained responses, and nonmonotonicity, i.e., responses peak at an intermediate intensity level (Fig. 6*B*). For spontaneous firings, the matrix of *NTE* values (Fig. 6*C*) suggests various networks of information transfer graphically represented in Fig. 6*D* (for *NTE* >0.04). Little information was shared between electrodes in different arrays (Fig. 6*C*). A cluster analysis based on *XC* values (Eggermont 2006) showed one cluster for array 2; one cluster consisting of C1, C2, C3, and C4; one cluster consisting of C5 and C6; and one consisting of a single electrode C7 (indicated by different colors in Fig. 6*D*). The maximum of *NTE* between electrodes from different arrays was consistently found for higher values of τ_{f} and τ_{p} (Fig. 6, *E* and *F*). Except for C10, a flow from left to right and bottom to top is visible in array 2 (Fig. 6*D*). Interestingly, a flow from primary to putative nonprimary recording sites is clearly visible for array 1 (Fig. 6*D*), and even associated with small τ_{f} and τ_{p} values (Fig. 6, *E* and *F*). The transitivity rule is respected here—that is, if there is no relation from channel 2 to 1, then there is none from 2 to 3, and from 3 to 1; however, some strong transfers might occur in both directions (for instance, C1 to C4 and C4 to C1, see Fig. 6*C*). One possible hypothesis is that this results from an indirect feedback. However, most information transfer occurs in one single preferred direction, as illustrated in Fig. 6*G* between C2 and C4 both with primary-like responses. 
For real data, just as for the simulations, a single peak is present in the surface of *NTE* values as a function of τ_{f} and τ_{p} (Fig. 6*G*).

One very interesting prospect for the transfer entropy is in the assessment of the neural assembly behavior for different auditory stimuli. A recent study demonstrated that correlated neural activity gives rise to clusters of neurons that expand and contract in size in response to different stimuli (Eggermont 2006). Such results may potentially be extended by means of an information transfer evaluation between arrays of neurons. The responses to different stimuli from the 16 recording sites described above strengthen this assumption (Fig. 7): the global network (i.e., the direction of flow and the recording sites involved) remains unchanged with regard to the stimulus used. However, there is a high variability across stimuli in the strength of information transfer between neurons of the network. For instance, there is more information transferred from site C13 to C7 when the stimulus is tonal and harmonic (Meow, RMeow, BaPa). In contrast, the maximum of information transferred from C16 to C10 is reached for clicks (Poisson, Train). In this particular case the cluster analysis resulted in only one cluster encompassing both arrays. An intriguing result also stems from the variability of τ_{p} across stimuli (Fig. 7*B*). Whereas close sites C9, C10, and C11 shared information for very small and unchanged τ_{p} values over all stimuli, distant sites C7 and C13 showed high variability for τ_{p}. More precisely, presentation of natural and altered Meows provoked more information transferred from C7 to C13 along with longer τ_{p} values. In contrast, pairs C9/C16 and C10/C16 also showed longer τ_{p} values during Meows or silence stimuli without any specific increase of *NTE* compared with other stimuli.

### Real data: global results

Analysis of *NTE* values obtained for spontaneous activity from 5,650 electrode pairs in AI of 21 cats illustrates the putative statistical properties of the transfer entropy in vivo (Fig. 8). The distribution of *NTE* values approximately follows an exponential law (Fig. 8, *A* and *B*) with parameter λ = 1/0.0225, where 0.0225 is equal to the mean *NTE*. In particular, 5% of *NTE* values are >0.0714 and 16% are >0.04, the value taken as a lower limit in constructing the transfer diagram of Fig. 6*D*. The normalized information transfer computed without conditioning on the past of the current neuron was found to be 25% higher than *NTE* values, on average. This suggests that common history between pairs of neurons would account for roughly 20% of information transfer values if conditioning was not performed. The *NTE* values are somewhat correlated with peak cross-correlation values (Fig. 8*C*, correlation coefficient 0.58). Nevertheless, strong variability is apparent, suggesting the existence of pairs of neurons that are transferring information but are poorly synchronized, or the opposite. This reflects the difference in the relations revealed by these two tools. Similarly, the lag for the cross-correlation peak and the τ_{p} values are weakly correlated (0.31), although τ_{p} is generally higher than the lag time in absolute value [*P* < 10^{−6}, Wilcoxon test (Wilcoxon 1945); Fig. 8*D*]. This suggests that activity may be integrated over a larger interval than that strictly associated with the mean delay between neuronal firings.
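The figures quoted above can be checked against an exponential model with mean 0.0225. The short sketch below uses only the numbers reported in the text; the function name `exp_tail` is hypothetical.

```python
import math

def exp_tail(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)

MEAN_NTE = 0.0225

# Tail probabilities predicted by the model at the two reported thresholds.
tail_at_0714 = exp_tail(0.0714, MEAN_NTE)  # about 0.042, versus the reported 5%
tail_at_04 = exp_tail(0.04, MEAN_NTE)      # about 0.17, versus the reported 16%

# An unconditioned value 25% higher than NTE means that the excess
# represents 0.25 / 1.25 = 20% of the unconditioned value.
common_history_share = 0.25 / 1.25
```

The model tails are close to, but not identical with, the empirical percentages, which is consistent with the distribution being described as only approximately exponential.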

### Neural integration times

Figure 9 presents results about transmission times and neural integration times involved in information transfer in cat AI. Most of the influence of past activity was restricted to the next 5 ms of neuronal activity (distribution of τ_{f} values; Fig. 9*A*). In contrast, the duration of past integration memory was larger, generally extending up to 15 ms but occasionally even up to 35 ms (Fig. 9*B*). The highest values for *NTE* were found for integration-memory duration <10 ms (Fig. 9*C*). As expected, the information transfer decreased with distance between neurons (Fig. 9*D*), suggesting that, at least during spontaneous activity, redundancy between neuron activities occurs mostly locally. Consequently, the influence of multiunit activity from one recording site onto a distant one is weak and drowned in thousands of other incoming connections to this site. Another consequence is that the minimum τ_{p} values increase with distance between electrodes (Fig. 9*E*). However, interestingly, some high values for τ_{p} can be found even for nearby electrodes, suggesting the existence of neurons that process input activity over long temporal integration windows, even if in this case the *NTE* is necessarily smaller. *XC* also decreases with distance between neurons in similar fashion to *NTE* (Fig. 9*F*).

## DISCUSSION

### Causality and information transfer

Most tools used in the investigation of causality in neuroscience, especially in electroencephalography (review in Gourévitch et al. 2006), are based on an interpretation of the Granger Causality definition (Granger 1969): “We say that *X*_{1}(*t*) is causing *X*_{2}(*t*) [*X*_{1}(*t*) ⇒ *X*_{2}(*t*)] if we are better able to predict *X*_{2}(*t*) using all available information than if the information apart from *X*_{1}(*t*) had been used.” In his paper, Granger interpreted “better able to predict” as a reduction of the variance of the prediction error. Yet, in the light of information theory, the ability to better predict can also be understood through the entropy of the predicted variable. If the uncertainty (entropy) associated with a random variable is reduced, the prediction of its possible values is indeed improved. From *Eq. 1*, it appears that transfer entropy is the reduction of uncertainty in the future of *X*_{2} (*X*_{2}^{F}) attributed to the knowledge of the past of *X*_{1} (*X*_{1}^{P}). Consequently, when common input does not explain all the activity, *NTE* is a quantification of a causality link in the Granger sense.

Because *NTE* is based on information theory, we also posit that it is a very general way to define causality, a way that encompasses both linear and nonlinear relationships between the activities of a pair of neurons. However, only bivariate cases are considered for *NTE* because “the information apart from” *X*_{1}^{P} is *X*_{2}^{P} and so all the available information is implicitly reduced to *X*_{1}^{P} and *X*_{2}^{P}. We are aware that a “future challenge is to design methods that truly allow neuroscientists to perform multivariate analyses of multiple spike trains data” (Brown et al. 2004). However, even though *NTE* theoretically can easily be extended to an *n*-system of spike trains, it has been restrained in this paper to bivariate cases because of unobserved contributing neurons and the “curse of dimensionality” issues if all units available are used. One consequence is that “direct causality” should probably be avoided as an interpretation of *NTE* in a multiple spike-train context because of common inputs and potential intricate parallel and intermediate pathways between the pairs of neurons or multiple single units studied. A better interpretation may be that, in the Shannon sense, information present in one spike train is transferred by any synaptic pathway and subsequently observed in another train. Such a tool may thus be extremely useful in redundancy studies in the brain.

### Neural assemblies

The greatest interest about neural networks in the brain concerns the parameters describing relations between neurons and their evolution during elicited responses. For instance, the balance between inhibition and excitation appears crucial (Bush and Sejnowski 1996; Kirkland and Gerstein 1998; Xing and Gerstein 1996). It may drive the contraction or enlargement of neural assemblies observed through synchrony (Eggermont 2006). One of the most common definitions for neural assemblies is “a group of neurons [that are] at least transiently working together as indicated by correlation of unit activity” (Gerstein and Kirkland 2001). We feel that the restriction of assembly membership by *correlation* only is too limited. It seems to us that temporal integration, and thus information transfer as quantified by *NTE*, defines another parameter of relations between neurons that is also able to emphasize neural assembly properties. An extended definition of neural assemblies would rather become “a group of neurons that are at least transiently working together as indicated by significant levels of synchronization and short-time integration between their unit activities.” In this respect it is important to notice that the peak widths of the cross-correlograms (Eggermont 2000) range over the same values as the integration times involved in *NTE*.

Presently, the size of microelectrode arrays (mostly 16 or 32 electrodes) does not allow exhaustive sampling of neural networks. However, it is likely that investigations in the next decade will produce hundreds of simultaneous recordings, from which more precise and realistic descriptions of neural assembly processing will arise. Regardless, *NTE* may be useful to make network models or neural computation models more realistic by defining additional physiological parameters (see, for instance, Bush and Sejnowski 1996; Davey et al. 2006; Feldman 1982; Graham and Willshaw 1997; Valiant 2006), especially those including temporal integration (Panchev and Wermter 2006) or feedback (Kirkland and Gerstein 1998; Xing and Gerstein 1996).

In particular, in the continuing debate opposing population codes based on firing rate with neural assembly code resulting from coincident spiking, *NTE* appears as a useful tool to investigate *neural assemblies* resulting from *firing rate* changes induced by temporal integration. Besides, results in Fig. 9*E* support the hypothesis of long temporal integration windows for some neurons even if small consecutive *NTE* values do not allow consistent conclusions. For extended studies of neural assemblies, it is likely that *NTE* can be used to complement the cross-correlation function.

Another important property of *NTE* dealing with neural assemblies concerns the conditioning on the past of *X*_{2} (*X*_{2}^{P}), in the case of *TE*_{X1→X2}. This conditioning cannot exclude common input that would provoke simultaneous activity in *X*_{2}^{F} and *X*_{1}^{P} because of a delay between *X*_{1} and *X*_{2}. Latency from the thalamus to a cortical cell is remarkably constant across the cortex (typically, ≈2 ms), despite the wide divergence of inputs from the thalamus (Salami et al. 2003). This common input would thus occur without latency differences in cortical cell pairs. It thus appears difficult to exclude it if the connection arising from common input overwhelms the strength of the direct connection between the pair of neurons. As previously noticed, this is one reason that *NTE* estimates should preferably be interpreted as an information transfer rather than as a direct causal link. However, the conditioning will exclude all common information between *X*_{1}^{P} and *X*_{2}^{P}. Not only is this important in the context of integration of activities between neurons, but because τ_{p} is most often greater than the lag detected by the cross-correlation method (Fig. 8*D*), this will partly exclude the influence of similar values for *X*_{1}^{P} and *X*_{2}^{P} that would occur if the lag was small and *X*_{1}^{P} and *X*_{2}^{P} were determined by only a common input. We indeed found that 20% of the information transfer between *X*_{1}^{P} and *X*_{2}^{F} alone arises from a common history between *X*_{1}^{P} and *X*_{2}^{P} and is removed by conditioning on *X*_{2}^{P}.

### Technical choices

Applying information theory to any type of data always requires careful thinking about the parameters used. These parameters can indeed dramatically influence results and conclusions. We chose to directly use *Eq. 2* and the data available to estimate the transfer entropy, keeping a nonparametric environment. Some closed forms for *TE* may exist, albeit dependent on the model considered for the data. For instance, using notations introduced in methods, if *X*_{1} and *X*_{2} are Poisson processes, [*X*_{1}^{F}(*t*_{n})]_{n}, [*X*_{2}^{F}(*t*_{n})]_{n}, [*X*_{1}^{P}(*t*_{n})]_{n}, and [*X*_{2}^{P}(*t*_{n})]_{n} all follow a Poisson distribution. The computation of *TE* thus depends only on the model used for the coupling relations between these four random variables. However, to our knowledge, such models have never been seriously considered in the literature and are even suggested as a future challenge in information theory context (Brown et al. 2004). As a consequence, the theoretical distribution of *TE* appears unreachable at this time, similarly to several causality measures recently proposed in electrophysiology [directed coherence or DCOH (Saito and Harashima 1981); directed transfer function or DTF (Kaminski and Blinowska 1991); partial directed coherence or PDC (Sameshima 1999)].

A significance threshold for *TE* is also difficult to determine. One possibility is to use the work of Moddemeijer (1989), who basically noted that the histogram represents statistics following a multinomial distribution. He then proposed an approximation for the variance of the entropy estimate in the case of the histogram approximation of the density. Preliminary investigations adapting this idea to the *TE* statistic did not convince us of the robustness of such an approach, which too often gave significant values. We instead chose to normalize *TE* because a coefficient between 0 and 1 is easier to interpret, like that for correlation or coherence. In our case, *NTE* estimates the part of information conveyed by a channel that is independent of its own past but could be found in the past of another channel. The *TE* statistic alone is indeed not comparable between channels because information conveyed by channels shows a high variability. We then computed the empirical distribution of *NTE* for spontaneous activity, which may be specific for the cat's auditory cortex. Nonetheless, we think that values >0.03 or 0.04 could indicate real information transfer, albeit a modest one. The putative exponential distribution model for *NTE* (Fig. 8*B*) should help to delineate threshold values in future studies showing different *NTE* averages.

Another choice is the use of the same value τ_{p} for both the own past of a channel and the past of the exogenous channel. Obviously, it would be preferable to dissociate them, but the computational cost of adding a parameter to the current τ_{f} and τ_{p}, over which the *NTE* is maximized, would be extremely high. Note that this statistic in its current state already requires careful programming to achieve results in a reasonable time. In fact, the computation speed essentially depends on the joint distribution computation and thus on the number of trials used to compute the shuffled estimate.
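A rough count of joint-distribution estimates shows why a third parameter is costly. The sketch below assumes that every ordered channel pair is evaluated at every grid point and that the shuffled estimate is recomputed at each grid point; the function name, grid sizes, and number of shuffles are hypothetical.

```python
def n_joint_estimates(n_channels, n_tau_f, n_tau_p, n_shuffles, n_tau_p_other=1):
    """Count joint-distribution estimates needed for a full pairwise search.

    n_tau_p_other is the grid size of a separate past window for the
    exogenous channel; 1 means the same tau_p is shared, as in the text.
    """
    ordered_pairs = n_channels * (n_channels - 1)   # direction matters for TE
    grid_points = n_tau_f * n_tau_p * n_tau_p_other
    return ordered_pairs * grid_points * (1 + n_shuffles)


shared = n_joint_estimates(16, n_tau_f=10, n_tau_p=10, n_shuffles=20)
separate = n_joint_estimates(16, n_tau_f=10, n_tau_p=10, n_shuffles=20, n_tau_p_other=10)
print(shared, separate, separate // shared)
```

Under these assumptions the cost grows by a factor equal to the size of the additional grid.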

Similarities with the transfer entropy idea of conditioning with respect to the past of another spike train can already be found in the old “cross-intensity functions” (Cox and Lewis 1966; Perkel et al. 1967), although these have rarely been used with neural data (Brown et al. 2004; Eggermont and Smith 1996), and in the nonlinear causality test of Baek and Brock (1992) improved by Hiemstra and Jones (1994). The mutual information between the synaptic input and the output spike train of a single neuron was also investigated by London et al. (2002) using a finite-order Markov model for sequences of activations. Transitional probabilities were estimated by means of a context-weighting tree representation of all possible models. Although complex, this method might represent an alternative to ours for entropy estimation, even if its ability to test several orders of memory and to manage a high number of pairwise combinations in a reasonable time remains to be proved.

### Physiological correlates of the results for spontaneous activity

It is not surprising that information transfer is relatively low between cortical neurons (most values are <0.15; Fig. 8*A*), somewhat similar to maximum levels of synchrony under spontaneous activity (Eggermont 1994). Several histological observations support the weak influence of one neuron on another, even a neighboring one. Even if one neuron typically receives inputs from several thousands of other neurons [rough estimates of 7,800 for mouse (Braitenberg and Schüz 1998), 9,400 for pyramidal neurons in rat visual cortex (Hellwig 2000), and 24,000–80,000 for human cortex (Abeles 1991)], this number is much smaller than the total number of neurons in the brain [1.6 × 10^{7} for mouse (Braitenberg and Schüz 1998), 10^{10} for humans (Abeles 1991)], and small even compared with the number of neurons that would be contained in the volume of the functional area of this neuron [75,000/mm^{3} in rat visual cortex (Hellwig 2000)]. Moreover, based on excitatory postsynaptic potential values in rat visual cortex (Song et al. 2005), around 26 presynaptic neurons would be needed to cause a postsynaptic action potential, a result that is within the estimate of between five and 300 (Abeles 1991).
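The orders of magnitude behind this argument can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the function name is illustrative, and combining the rat input count with the 26-neuron estimate is a back-of-envelope assumption that both apply to the same kind of neuron.

```python
def input_fraction(n_inputs, n_neurons):
    """Fraction of a neuron population that projects onto one neuron."""
    return n_inputs / n_neurons


mouse = input_fraction(7_800, 1.6e7)         # mouse: inputs / neurons in brain
human_low = input_fraction(24_000, 1e10)     # human, lower input estimate
human_high = input_fraction(80_000, 1e10)    # human, upper input estimate

# Share of a rat pyramidal neuron's inputs that must be active together
# to trigger a spike, assuming about 26 of its 9,400 inputs are needed.
rat_coactive_share = 26 / 9_400

print(f"mouse: {mouse:.2e}, human: {human_low:.1e} to {human_high:.1e}")
print(f"rat co-active share: {rat_coactive_share:.2%}")
```

In each case a single presynaptic neuron accounts for a very small part of the drive to its target, which fits the low *NTE* values observed.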

Similarly, a decrease of transfer entropy and a fortiori synchrony with distance (Fig. 9, *D* and *F*) is consistent with anatomical findings. Hellwig estimated that 70% of synapses of layer 2/3 pyramidal neurons in rat visual cortex are contained in a cylinder-shaped volume of cortex, whose radius parallel to the cortical surface is 500 μm and height is 300 μm (Hellwig 2000). Other studies led to similar results (Gruner et al. 1974; Nicoll and Blakemore 1993). Histological studies of Liley and Wright (1994) and Hellwig (2000) also showed decreasing connection probability with cell separation within pyramidal and stellate neurons of layer 2/3, the probability being <0.2 when the distance is >500 μm. The estimated probability of connection is often even lower in electrophysiology studies, between 5 and 15% for neighbor neurons (Mason et al. 1991; Nicoll and Blakemore 1993; Thomson and Deuchars 1997). Given that the mean synaptic delay in cortex is 1.2 ms with a minimum of 0.5 ms (Mason et al. 1991; Nicoll and Blakemore 1990), it also appears clear that large values for τ_{p} (>10 ms) between two multiple single-unit recordings will be associated with distant connections and several synaptic intermediates. This will weaken the influence of the connection and the likelihood of similar activities, and so make *NTE* values decrease substantially (Fig. 9, *C* and *D*). Even if *NTE* and *XC* show a similar decrease with distance (Fig. 9, *D* and *F*), the variability observed between their values suggests that coincident spiking does not fully reveal the information transferred between neurons (Fig. 8*C*) and emphasizes the importance of part of the neural code based on temporal integration.

τ_{p} values reported in Fig. 9 already provide insight into the windows of temporal integration potentially used in AI. Figure 9*E* shows that long windows (>20 ms) can be found even between neighboring sites (≪1.5 mm). Nevertheless, >80% of such observed windows are <15 ms. To our knowledge, most studies of potential temporal integration in auditory processing analyzed the responses to more or less complex stimuli, never under silence. For instance, some neurons in AI respond to brief periodic stimuli only for repetition rates <20–40 Hz (Eggermont 2002; Lu et al. 2001; Schreiner et al. 1997). Time reversal of short (<50-ms) segments in recorded speech does not affect its intelligibility (Saberi and Perrott 1999). The mutual information between some vocalizations and the neural firings in the ferret reached a maximum when the temporal resolution of analysis was between 10 and 40 ms (Schnupp et al. 2006). From awake marmoset monkey responses to periodic click trains, Wang et al. (2003) concluded that rapidly modulated signals would be integrated within a short-time window of about 20–30 ms. These observations suggest that temporal integration over 10 to 50 ms may occur when processing a more or less complex sound. These findings are thus in line with τ_{p} values mainly between 2 and 15 ms found during silence, their distribution stretching up to 40 ms (Fig. 9*C*). One underlying question concerns the variation of *NTE* and temporal integration windows under various stimulus conditions. Figure 7, *A* and *B* showed that more information may be transferred between some recording sites during specific stimuli such as Poisson and Meows, whereas the length of the window of temporal integration is not perfectly correlated with variations of *NTE*. In particular, the study of spontaneous activity may be of more interest than previously expected because some significant levels of information transfer, and so redundancy, can be found between several multiple single units (Figs. 7*A* and 8*A*), even when they are >1 mm apart (Fig. 9*C*). This preliminary result is intriguing and illustrates the potential of the method in understanding certain aspects of brain processing.

In conclusion, normalized transfer entropy or *NTE* has promising features that should make it useful for neural network analysis. Based on information theory and an intuitive definition, *NTE* quantifies the influence, in a nonrestricted sense, that activity observed in one neuron, or multiple single units, has on another one. *NTE* has great potential interest for studies of temporal integration as part of the neural code. *NTE* is a coefficient between 0 and 1 that is easy to interpret and independent of firing rate. *NTE* may show variability under various stimulus conditions, allowing studies of neural assembly encoding of stimuli. *NTE* appears robust (one peak over τ_{f} and τ_{p}) and shows results complementary to cross-correlation. *NTE* allows studies of feedback in neural circuits. Obviously, further investigations of *NTE*, τ_{f}, and τ_{p} values between different places of a sensory cortical area, and during stimulus presentation, are needed to reveal the potential of this tool and possibly help to understand brain processing. The present application showed that most temporal integration windows during spontaneous activity in cat's primary auditory cortex extend from a few milliseconds to 15 ms.

## GRANTS

This work was supported by the Alberta Heritage Foundation for Medical Research, the National Sciences and Engineering Research Council, a Canadian Institutes of Health–New Emerging Team grant, and the Campbell McLaurin Chair for Hearing Deficiencies.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2007 by the American Physiological Society