JN Add DOIs to your references at manuscript stage!
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


J Neurophysiol 82: 2861-2875, 1999;
0022-3077/99 $5.00
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Wiener, M. C.
Right arrow Articles by Richmond, B. J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Wiener, M. C.
Right arrow Articles by Richmond, B. J.

The Journal of Neurophysiology Vol. 82 No. 6 December 1999, pp. 2861-2875
Copyright ©1999 by the American Physiological Society

Using Response Models to Estimate Channel Capacity for Neuronal Classification of Stationary Visual Stimuli Using Temporal Coding

Matthew C. Wiener and Barry J. Richmond

Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892-4415


    ABSTRACT
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
APPENDIX
REFERENCES

Wiener, Matthew C. and Barry J. Richmond. Using Response Models to Estimate Channel Capacity for Neuronal Classification of Stationary Visual Stimuli Using Temporal Coding. J. Neurophysiol. 82: 2861-2875, 1999. Both spike count and temporal modulation are known to carry information about which of a set of stimuli elicited a response; but how much information temporal modulation adds remains a subject of debate. This question usually is addressed by examining the results of a particular experiment that depend on the specific stimuli used. Developing a response model allows us to ask how much more information is carried by the best use of response strength and temporal modulation together (that is, the channel capacity using a code incorporating both) than by the best use of spike count alone (the channel capacity using the spike count code). This replaces dependence on a particular data set with dependence on the accuracy of the model. The model is constructed by finding statistical rules obeyed by all the observed responses and assuming that responses to stimuli not presented in our experiments obey the same rules. We assume that all responses within the observed dynamic range, even if not elicited by a stimulus in our experiment, could be elicited by some stimulus. The model used here is based on principal component analysis and includes both response strength and a coarse (±10 ms) representation of temporal modulation. Temporal modulation at finer time scales carries little information about the identity of stationary visual stimuli (although it may carry information about stimulus motion or change), and we present evidence that, given its variability, it should not be expected to do so. The model makes use of a linear relation between the logarithms of mean and variance of responses, similar to the widely seen relation between mean and variance of spike count. Responses are modeled using truncated Gaussian distributions. The amount of stimulus-related information carried by spike count in our data are 0.35 and 0.31 bits in primary visual and inferior temporal cortices, respectively, rising to 0.52 and 0.37 bits for the two-principal-component code. The response model estimates that the channel capacity is 1.1 and 1.4 bits, respectively, using the spike count only, rising to 2.0 and 2.2 bits using two principal components. Thus using this representation of temporal modulation is nearly equivalent to adding a second independent cell using the spike count code. This is much more than estimated using transmitted information but far less than would be expected if all degrees of freedom provided by the individual spike times carried independent information.


    INTRODUCTION
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
APPENDIX
REFERENCES

It is not yet completely understood how information is encoded in neuronal spike trains and how much information is carried. In the visual system, it is clear that the number of action potentials elicited by a visual stimulus is an important part of the code for carrying stimulus-related information. There is now strong evidence that modulation of the firing rate during the course of the response to a stimulus presented for the period of a typical intersaccadic interval (~300 ms) carries additional information that is not available from average response strength alone (Heller et al. 1995; Richmond and Optican 1987, 1990; Tovee et al. 1993; Victor and Purpura 1996). Here we investigate how much information temporal modulation adds.

As often understood, the answer to this question depends on the experiment from which we take our data. Different sets of stimuli will elicit different sets of responses, which may encode more or less information using temporal modulation. Using response models can help us answer a question less tied to a particular experiment: how much more information can be carried by the best use of temporal modulation than by the best use of spike count alone? The maximum amount of information that can be carried by a neuron using a particular code (such as spike count, or spike count along with some representation of temporal modulation) is called its channel capacity (Cover and Thomas 1991; Shannon and Weaver 1949). The channel capacity is defined uniquely for a particular neuron using a particular code. Channel capacity's uniqueness comes at a price, however: to calculate channel capacity, we must know all possible responses that can be elicited from a neuron rather than only those observed in our experiment.

Our goal is to construct a response model that not only describes the responses observed in an experiment but also can be used to predict the responses that would be elicited by other stimuli. This substitutes dependence on the accuracy of the model for the limitations imposed by the amount of data typically collected in experiments. Ideally we would like this model to include all features of the response that do carry unique information, without complicating matters by including features that carry little or no unique information. Recently Gershon et al. (1998) presented an approach to constructing such a response model involving spike count only. The model used a widely known relation between mean and variance of spike count (Dean 1981; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogels et al. 1989), and the fact that distributions of spike count were well fit by a truncated Gaussian. This model allowed Gershon et al. (1998), given any mean spike count, to describe the set of responses giving rise to that mean. The set of responses corresponding to a given mean did not depend on whether the mean had actually been elicited by a stimulus in the experiment. Thus the model predicted the responses associated with any possible mean. Because any set of responses has a mean, the model described all sets of responses that could be elicited from the neuron. This allowed Gershon et al. to estimate the neuron's channel capacity using the spike count code.

Here we extend the approach of Gershon et al. (1998) to a code that includes the temporal patterns of rate modulation of the responses. We represent the neuronal responses using the first two principal components. The coefficient with respect to the first principal component is strongly correlated with spike count, and the coefficient with respect to the second principal component is a coarse (±10 ms) measure of the time course of a response, often indicating whether spikes tend to be concentrated at the beginning of a response (Richmond and Optican 1987, 1990). We identify relations among the means and variances of the principal component coefficients similar to the relation between mean and variance of spike count. We also find that distributions of the principal component coefficients are well fit by truncated Gaussian distributions. Now a set of responses must be described by two means---one for each component of the code. As in Gershon et al. (1998), this allows us to predict both the response strength and the temporal patterns of rate modulation for all possible sets of responses.

We estimated channel capacity on the basis of the response models for spike count and for the first two principal components. We found that the two-component code on average increases the channel capacity by 0.8-0.9 bits over that carried by spike count alone (1.1 and 1.4 bits, respectively) in neurons in the primary visual cortex (V1) and area TE of inferior temporal cortex. Therefore the contribution temporal modulation can make to neural information processing for identification of stationary stimuli, although far smaller than if all degrees of freedom in a spike train actually were used to carry such information, is substantially larger than would be estimated from only the responses seen in our experiment. This contrasts with situations in which rapidly changing stimuli drive precisely time-locked neuronal responses, which can lead to extremely high information transmission rates (Buracas et al. 1998). Finally, we present evidence that very little additional channel capacity for identifying stationary stimuli can be expected by using more principal components (that is, by representing the rate variation with higher precision).


    METHODS
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
APPENDIX
REFERENCES

Data sets

We performed new analyses using previously published data. The data came from two studies of supragranular V1 complex cells, each study using two rhesus monkeys performing a simple fixation task (Kjaer et al. 1997; Richmond et al. 1990), and from one study of neurons in IT cortex in two other monkeys performing a simple sequential nonmatch-to-sample task (Eskandar et al. 1992). In these three studies, the visual stimuli were two-dimensional black-and-white patterns based on the Walsh functions (Fig. 1).



View larger version (51K):
[in this window]
[in a new window]
 
Fig. 1. Walsh patterns. For the primary visual cortex (V1) set 1, the 64 stimuli (A) and the corresponding contrast-reversed set were presented on the receptive fields while the monkeys fixated. For V1 set 2, the 8 stimuli (B) and the corresponding contrast-reversed set were presented on the receptive field while the monkey fixated. For both, the stimuli were 2.5° on a side (covering the excitatory receptive field and some of the surround). For the inferior temporal cortex (IT), the 4 × 4 set (16 stimuli) in the bottom left corner of A and the corresponding contrast-reversed set were used as the monkey performed a nonmatch-to-sample task. Stimuli were 4° on a side and were centered at the point of fixation.

In both V1 studies, stimuli were presented centered on the neuronal receptive fields, located in the lower contralateral visual field 1-3° from the fovea. Stimuli covered the excitatory receptive field. At 3° eccentricity, the stimuli were ~2.5° on a side. In the first V1 study (V1 set 1) (Richmond et al. 1990), 128 stimuli were used: sixty-four 8 × 8 pixel patterns and their contrast-reversed counterparts. For V1 set 2 (Kjaer et al. 1997), 16 stimuli were used: a set of eight 16 × 16 pixel patterns and their contrast-reversed counterparts.

In the IT study (Eskandar et al. 1992), stimuli were presented centered on the fovea. The patterns were 4° square. Thirty-two stimuli were used: sixteen 4 × 4 pixel patterns and their contrast-reversed counterparts.

The stimulus was displayed for 320 ms in V1 and 352 ms in IT. To account for response latencies and to avoid contamination from off-responses, spikes were counted during the interval from 30 to 300 ms after stimulus onset for the V1 neurons and 50 to 350 ms after stimulus onset for the IT neurons. Gershon et al. (1998) found that both transmitted information and channel capacity based on the spike count code are stable with respect to small changes in these counting windows. For each neuron, each stimulus was presented approximately the same number of times (±2) in randomized order. Different neurons received different numbers of presentations. The number of presentations of each stimulus was between 10 and 50 in V1 and between 19 and 50 in IT. Seven V1 neurons with <10 trials per stimulus were omitted from these analyses to ensure stability of information estimates (Golomb et al. 1997; Wiener and Richmond 1998). The timing of events, including spikes, was recorded with 1-ms resolution.

Quantifying the responses

Each spike train was low-pass filtered by convolution with a Gaussian distribution with standard deviation of 5 ms and resampled at 1-ms resolution to create a spike density function (Fig. 2, A and B). Convolving and resampling in this way avoids a problem of binned histograms, which do not distinguish between spikes at the center of a bin and those near the edges (Richmond et al. 1987; Sanderson and Kobler 1976; Silverman 1986).



View larger version (30K):
[in this window]
[in a new window]
 
Fig. 2. Representation of spike trains using principal component analysis. A: rasters of the responses of a single V1 cell to 2 stimuli. Horizontal axis shows time from stimulus onset in milliseconds. Each tick mark represents an action potential; each row of tick marks represents a single stimulus-elicited response. Left and right: responses to different stimuli. Mean spike count is equal for the 2 sets of responses, but the responses to the 2nd stimulus (right) show a greater concentration of spikes occurring between ~50 and 120 ms after stimulus onset. B: spike density functions corresponding to the rasters above. Horizontal axis shows time from stimulus presentation (ms) on the same scale as for the rasters. Vertical axis shows the instantaneous firing rate at each time in spikes/s. Dark line shows mean firing rate at each time; the wider gray line shows one standard error above and below the mean. Concentration of spikes in the early part of the responses to the 2nd stimulus (right) is again apparent. C: 1st 4 principal components of the responses of the cell. Horizontal axis of each panel shows time, in milliseconds, after stimulus presentation. Note that a positive coefficient for the 2nd principal component indicates a high concentration of spikes near the beginning of a response. Scale of the principal components and the corresponding coefficients is arbitrary and therefore is omitted. D: principal component representation of the responses shown in the rasters and densities. Each bar plot shows the mean and standard deviation of the 1st 4 (left to right) principal component coefficients of the responses. Line connecting the bases of the bars shows 0. Note the difference in the 2nd principal component coefficients between the 2 panels. This difference reflects the fact that the spikes in the responses shown in the right are more concentrated at early times than are the spikes in the responses shown in the left. (In the calculations in the text the principal components are translated to always be positive; here we use principal components centered at 0 to emphasize the difference in the the coefficients with respect to the 2nd principal component.)

We used principal component analysis to reduce the dimension of the data set. Principal component analysis defines an ordered set of axes, each accounting for more of the variance in the data set than those that follow. These axes can be computed by finding the eigenvectors of the covariance matrix of the data (Ahmed and Rao 1975; Deco and Obradovic 1996). Each data point is defined uniquely by its projections onto these axes; these values are called the coefficients with respect to the principal components, or principal component coefficients. We use the code formed by the first k principal component coefficients because it is the optimal k-dimensional linear code for least-squares reconstruction and minimizes an upper bound of information loss through a one-layer network (Campa et al. 1995; Deco and Obradovic 1996; Plumbley 1991). Principal-component analysis has a long history in signal analysis (Ahmed and Rao 1975) and has been used to study information coded by temporal modulation in neuronal responses (Heller et al. 1995; Kjaer et al. 1994; McClurkin et al. 1991; Optican and Richmond 1987; Richmond and Optican 1987, 1990; Tovee and Rolls 1995; Tovee et al. 1993). Figure 2 shows rasters of the responses elicited by two stimuli from a single V1 neuron, the corresponding spike density functions, the first four principal components, Phi 1-Phi 4, of the responses from the neuron, and the principal component representation of the two responses.

In this study, each neuronal response was represented by the corresponding coefficients with respect to the first and second principal components of the spike density functions (Richmond and Optican 1987, 1990). We call these coefficients phi 1 and phi 2. phi 1 and phi 2 can be translated by arbitrary constants as long as the appropriate constant multiples of Phi 1 and Phi 2 are subtracted from the results. This manipulation is similar to rewriting the expression a + bx as a + bx0 + b(- x0). It is conventional to use the average waveform as the base waveform, translating the axes so that the average waveform corresponds to a vector of zeros. This causes some values of phi 1 and phi 2 to be negative. Here we translated the axes so that phi 1 and phi 2 are always positive and logarithms could be taken. The specific form of the new base waveform is not important for our analyses.

Model of response variability

Estimating channel capacity requires estimating all possible (phi 1, phi 2) response distributions. Our estimate of these distributions must be based on the responses elicited by stimuli actually presented in our experiments. It is important to note that if the responses elicited by the stimuli presented in our experiments are not representative of the responses elicited by other stimuli, our model will generalize poorly.

Ideally, we would estimate the two-dimensional conditional distribution of responses p(phi 1, phi 2|s) directly for each stimulus s. However, we did not have enough responses to each stimulus to do so. Instead, we described the distributions of phi 1 and phi 2 individually, and assumed that phi 2 was independent of phi 1 (except for correlations imposed by truncation at the bounds of the response space, see following text). To be precise
<IT>p</IT>(<IT>&phgr;<SUB>1</SUB>‖</IT><IT>s</IT>)<IT>=</IT><IT>G</IT>(<IT>&mgr;</IT><SUB><IT>1</IT><IT>s</IT></SUB><IT>, &sfgr;</IT><SUP><IT>2</IT></SUP><SUB><IT>1</IT><IT>s</IT></SUB>)<IT>/</IT><IT>K</IT><SUB><IT>1</IT></SUB>(<IT>&phgr;<SUB>1</SUB></IT>)<IT> when &phgr;<SUB>1</SUB>≥&phgr;</IT><SUP><IT>min</IT></SUP><SUB><IT>1</IT></SUB>

=0 otherwise (1)
and
<IT>p</IT>(<IT>&phgr;<SUB>2</SUB>‖</IT><IT>s</IT><IT>, &phgr;<SUB>1</SUB></IT>)<IT>=</IT><IT>G</IT>(<IT>&mgr;</IT><SUB><IT>2</IT><IT>s</IT></SUB><IT>, &sfgr;</IT><SUP><IT>2</IT></SUP><SUB><IT>2</IT><IT>s</IT></SUB>)<IT>/</IT><IT>K</IT><SUB><IT>2</IT></SUB><IT> when &phgr;</IT><SUP><IT>min</IT></SUP><SUB><IT>2</IT></SUB>(<IT>&phgr;<SUB>1</SUB></IT>)<IT>≤&phgr;<SUB>2</SUB>≤&phgr;</IT><SUP><IT>max</IT></SUP><SUB><IT>2</IT></SUB>(<IT>&phgr;<SUB>1</SUB></IT>)

=0 otherwise (2)
where G is the Gaussian distribution with indicated mean and variance, µjs and sigma js2 are the mean and variance of phi j elicited by stimulus s, phi 1min is the minimum value of phi 1, phi 2min (phi 1) and phi 2max (phi 1) are the calculated bounds on phi 2 for given phi 1, and K1 = int phi 1mininfinity dphi 1p(phi 1|s) and K2 = ∫<AR><R><C>&phgr;<SUP>max</SUP><SUB>2</SUB>(&phgr;<SUB>1</SUB>)</C></R><R><C>&phgr;<SUP>min</SUP><SUB>2</SUB>(&phgr;<SUB>1</SUB>)</C></R></AR>:dphi 2p(phi 2|s) are normalizing factors. We show in the results that the Gaussian distribution provides an acceptable fit to the distributions of phi 1 and phi 2 for each stimulus.

Although phi 1 and phi 2 are by construction uncorrelated in the responses from each neuron, correlations are sometimes present in the responses elicited by individual stimuli. In RESULTS, we observe that the set of responses is bounded; these bounds will be described quantitatively at that point. We observe correlations between phi 1 and phi 2 only in those distributions lying close to the response bounds. Therefore our model assumes that correlations between phi 1 and phi 2 arise only as a result of truncating distributions at the boundaries of the response space. We make this same assumption when characterizing the set of all possible response distributions to calculate channel capacity. We make this assumption because we do not have data to justify any other: under other circumstances (and with sufficient data) we might find, and include in a model, such correlations. The close match of our estimates of transmitted information to those obtained using other well-validated methods (see RESULTS) suggests that in this case the assumption is justified.

Means and variances of response distributions

Two-dimensional Gaussian distributions (given our correlation assumptions, preceding text) are characterized by two means and two variances. All four parameters can be measured from experimentally observed responses. In RESULTS, we show that the variances of phi 1 and phi 2 distributions are related to mean phi 1 by a power law
&sfgr;<SUP><IT>2</IT></SUP><SUB><IT>i</IT></SUB><IT>=exp</IT>(<IT>a<SUB>i</SUB></IT>)<IT>&mgr;</IT><SUP><IT>b<SUB>i</SUB></IT></SUP><SUB><IT>1</IT></SUB> (3)
which is equivalent to
log (&sfgr;<SUP><IT>2</IT></SUP><SUB><IT>i</IT></SUB>)<IT>=</IT><IT>a</IT><SUB><IT>i</IT></SUB><IT>+</IT><IT>b</IT><SUB><IT>i</IT></SUB><IT> log &mgr;<SUB>1</SUB></IT> (4)
where µi and sigma i2 are the mean and variance of phi i, for i = 1, 2. ai and bi are estimated by linear regression. These regressions were used to estimate sigma is2 for i = 1, 2 in Eqs. 1 and 2. Note that the variances of both phi 1 and phi 2 were modeled as a function of the mean of phi 1 (we justify this in RESULTS).

Estimates of log(µ) and log(sigma 2) obtained by taking the logarithm of the sample mean and variance are biased, resulting in underestimation of the variance of response distributions and therefore overestimation of transmitted information. We corrected for the bias using a Taylor series expansion; only a few terms are needed for good results (Kendall and Stuart 1961, p. 4-6).

We also used linear regression to estimate the mean and variance of the values of phi 2 when phi 1 takes on a given value, no matter which stimulus elicited the response.

Transmitted information and channel capacity

The information carried in a neuron's response about which member of a set of stimuli is present (Cover and Thomas 1991) is defined as
<IT>I</IT>(<IT>R</IT><IT>; </IT><IT>S</IT>)<IT>=</IT><LIM><OP>∑</OP><LL><IT>s</IT><IT>,</IT><IT>r</IT></LL></LIM> <IT>p</IT>(<IT>s</IT>)<IT>p</IT>(<IT>r</IT><IT>‖</IT><IT>s</IT>)<IT> log<SUB>2</SUB> </IT><FR><NU><IT>p</IT>(<IT>r</IT><IT>‖</IT><IT>s</IT>)</NU><DE><IT>p</IT>(<IT>r</IT>)</DE></FR> (5)
where S is the set of stimuli s, R is the set of responses r, p(r|s) is the conditional probability of response r given stimulus s, p(s) is the probability that stimulus s occurred, and p(r) Sigma s p(r|s)p(s) is the probability of response r. Equation 5 is general; it can be applied to responses r of any dimension. We used both the one-dimensional spike count code and the two-dimensional principal component code, r = phi  = (phi 1, phi 2).

The transmitted information I(Phi ;S) depends on p(s), the distribution of presentation probabilities of the stimuli (see Eq. 5). The channel capacity of a cell is the maximum value of transmitted information over all distributions p(s), where the set of stimuli S should now be understood to include all possible visual stimuli. Finding this maximum requires knowing the conditional response distributions p(r|s) for all stimuli s. We estimate the distributions using the model described earlier.

From Eqs. 1 and 2, we see that estimating p(phi |µ), the probability with which the distribution with mean µ = (µ1, µ2) elicits response phi  = (phi 1, phi 2), requires µ1, µ2, sigma 12, and sigma 22. Given the power-law relations between the means and variances of distributions (Eq. 3), µ1 determines both sigma 12 and sigma 22 but not µ2. Thus for each µ1 there are many possible response distributions, identical except for translation in phi 2. Each distribution, then, is characterized completely by the two means µ1 and µ2. When calculating transmitted information, only the observed distributions are considered (Fig. 3, top); when calculating channel capacity, all possible distributions must be considered (Fig. 3, bottom).



View larger version (17K):
[in this window]
[in a new window]
 
Fig. 3. Schematic of distributions used to calculate transmitted information and channel capacity. Lines represent the boundaries of the phi 1-phi 2 envelope: region of phi 1-phi 2 space in which responses fall. Ellipses represent distributions of responses elicited by various stimuli. Top: calculating transmitted information in an experiment. Response distributions elicited by 4 stimuli, A, B, C, and D, are shown. (These stimuli are merely illustrative and do not correspond to any stimuli from our actual experiments.) Mean values of phi 1 and phi 2, µ1 and µ2, of each distribution can be calculated from the data, and the variances of phi 1 and phi 2, sigma 12 and sigma 22, estimated from the mean of phi 1 using the regression relations. Bottom: when considering channel capacity, we must consider all possible pairs of means µ1, µ2. Value of µ1 determines both sigma 12 and sigma 22, so all distributions with a given value of µ1 will have the same shape. Distributions elicited by A, B, and C (top) are shown, along with several other possible distributions with the same mean values of phi 1. (Distribution elicited by stimulus D is omitted for clarity.)

According to this model, two stimuli that elicit the same µ1 and µ2 from a neuron in fact elicit identical response distributions and therefore cannot be distinguished from one another by that neuron. Thus a stimulus can be identified simply by the mean response it elicits. Equation 5 now can be written
<IT>I</IT>(<IT>&PHgr;; </IT><IT>S</IT>)<IT>=</IT><LIM><OP>∫</OP></LIM> <IT>d</IT><IT>&mgr;</IT><IT>p</IT>(<IT>&mgr;</IT>) <LIM><OP>∫</OP></LIM> <IT>d</IT><IT>&phgr;</IT><IT>p</IT>(<IT>&phgr;‖&mgr;</IT>)<IT> log<SUB>2</SUB> </IT><FR><NU><IT>p</IT>(<IT>&phgr;‖&mgr;</IT>)</NU><DE><IT>p</IT>(<IT>&phgr;</IT>)</DE></FR> (6)
where p(µ) is the probability with which the distribution with mean µ is presented, p(phi |µ) is the probability with which the distribution with mean µ = (µ1, µ2) elicits response phi  = (phi 1, phi 2), and p(phi ) = int µ dphi p(phi |µ) is the total probability with which response phi  occurs. The channel capacity is the maximum of this expression over the two-dimensional distribution of probabilities p(µ) = p1, µ2).

A derivation of Eq. 6 is given in Gershon et al. (1998). That derivation is expressed in terms of the spike count code but remains valid when spike count is replaced with any other response code. Information and channel capacity based on the spike count code were calculated using the method of Gershon et al. (1998).

The search for the maximizing set of probabilities is subject to three constraints: the probabilities must be nonnegative; the probabilities must sum to one; and the range of means must be finite. The first two constraints arise from intrinsic properties of probability distributions. If the third constraint is violated, the transmitted information can be infinite and the problem of maximizing transmitted information is ill-posed. The implementation of these constraints and other numerical issues are discussed in the APPENDIX.

We did not penalize distributions for probability weight falling outside the response envelope bounds for phi 2 (as we did for probability weight falling outside the observed range of phi 1); we simply ignored that portion of the distribution. This allows the widest possible separation between distributions, causing our estimates of channel capacity to be larger than if the boundaries had been strictly enforced. Thus our procedure was designed to give the most generous estimates possible (consistent with the constraints estimated from the data) of channel capacity associated with temporal coding.


    RESULTS
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
APPENDIX
REFERENCES

We considered 6 neurons from one experiment in V1, 29 neurons from a second experiment in V1, and 19 neurons from IT. To calculate transmitted information and channel capacity, we characterized the space of responses, parameterized by phi 1 and phi 2, the coefficients with respect to the first two principal components; determined the regression relations for the means and variances of phi 1 and phi 2 elicited by the stimuli; and constructed a model for the distributions of phi 1 and phi 2.

phi 1 and phi 2 are somewhat abstract, but are related to biologically relevant aspects of the response. Responses with high (low) values of phi 1 correspond to responses with high (low) numbers of spikes (McClurkin et al. 1991; Richmond and Optican 1987, 1990; Tovee et al. 1993); in our data, the median correlation between spike count and phi 1 was 0.91 (interquartile range 0.86-0.97). phi 2 is a coarse (±10 ms) measure of temporal modulation (Optican and Richmond 1987; Richmond and Optican 1987). It often characterizes whether or not spikes are concentrated in the early part of the response (Fig. 2).

Response space

Although the coefficients with respect to the first and second principal components (phi 1 and phi 2) are linearly uncorrelated by construction (Ahmed and Rao 1975; Deco and Obradovic 1996), they are not independent. Figure 4 shows scatterplots of phi 2 versus phi 1 for two cells from V1 and two cells from IT. In these four cells, and in all other cells in our sample, the range of phi 2 increased with increasing phi 1. The apex of the cone is the (phi 1, phi 2) pair representing trials when no spikes were elicited.



View larger version (37K):
[in this window]
[in a new window]
 
Fig. 4. Scatterplot of responses for 4 cells. For each panel, the horizontal axis shows phi 1 and the vertical axis shows phi 2. Each point shows the response (phi 1, phi 2) for a single trial. Although phi 1 and phi 2 are linearly uncorrelated (by construction), they are not independent: range of phi 2 depends on the value of phi 1.

The limited range of phi 2 for a given value of phi 1 reflects the fact that only a certain amount of temporal modulation is possible with a given number of spikes. If each of k spikes in a train could be placed in any of n time bins, assuming only that no bin could hold more than one spike, the number of possible patterns would be (kn) = n!/[k!(n - k)!]. The range of values of phi 2 seen under this combinatorial assumption (Fig. 5, thin line) is much wider than the range observed experimentally for any given number of spikes k (Fig. 5, dots).



View larger version (25K):
[in this window]
[in a new window]
 
Fig. 5. Spike count-phi 2 envelope of a V1 cell. Horizontal axis shows spike count (which is strongly correlated with phi 1), the vertical axis phi 2. Dots show spike count and phi 2 for individual trials. Thick lines show the spike count-phi 2 envelope predicted using the linear relation between log(spike count) and log(variance) of phi 2 in responses with that spike count, and the quadratic relation between spike count and the mean of phi 2 in responses with that spike count, and the assumption that the distribution of phi 2 in responses with a given spike count is normal. Thin lines show the spike count-phi 2 envelope calculated by taking the largest (or smallest) possible projection onto the phi 2 waveform of responses with a given number of spikes. More extreme estimate does not take into account when spikes are most likely to occur or even that it is unlikely that many spikes will occur at nearly the same time. Spike count is used here instead of phi 1 for ease of comparison with the extreme bounds.

Because we do not understand the spike generation process well enough to predict the response envelope, we developed a statistical model of the mean and variance of distributions of phi 2 corresponding to different values of phi 1. We divided the range of phi 1 into a number of bins, each of which defines a phi 1 slice in the two-dimensional response space. Figure 6A shows the top left scatterplot from Fig. 4 with slice boundaries superimposed. The mean and variance of phi 2 were calculated in each phi 1 slice. Figure 6B shows an example of the linear relation between log(variance) of phi 2 and log(mean) of phi 1 in each phi 1 slice. This relation was significant in 47/54 neurons at the P < 0.01 level (median r2 = 0.91, iqr = 0.79-0.94). Figure 6C shows an example of a fit of the mean of phi 2 as a quadratic function of the mean of phi 1 in each slice. The quadratic fit was rejected in only 3/54 neurons (chi 2 test, p <=  0.01) and is used here.



View larger version (50K):
[in this window]
[in a new window]
 
Fig. 6. Predictors of the mean and variance of phi 2 for a given range of phi 1. A: dividing the range of phi 1 into "slices". Top left scatterplot from Fig. 4 is reproduced with the boundaries of the phi 1 slices superimposed. , (phi 1, phi 2) for a single response. B: log(variance) of phi 2 values in phi 1 slices as a linear function of log(mean) phi 1. Horizontal axis shows the mean phi 1 value in each slice. Vertical axis shows the variance of phi 2 in each slice. , values in a single phi 1 slice. C: mean of phi 2 in phi 1 slices as a quadratic function of phi 1. Horizontal and vertical axes show the mean values of phi 1 and phi 2 respectively. , values in a single phi 1 slice. In both B and C,  represent points based on >= 11 responses. Points based on fewer responses are represented by the number of responses. Regression lines were calculated using only means and variances based on >= 11 responses. Note that the axes in A and C are linear, whereas the axes in B are logarithmic.

We used these relations to estimate the bounds of the envelope containing the response space by <A><AC>&mgr;</AC><AC>ˆ</AC></A>phi 2(phi 1) ± k±<A><AC>&sfgr;</AC><AC>ˆ</AC></A>phi 2(phi 1), where <A><AC>&mgr;</AC><AC>ˆ</AC></A> and <A><AC>&sfgr;</AC><AC>ˆ</AC></A> are the mean and standard deviation estimated as in the preceding text. For each cell, k+ (k-) was chosen so that all but 0.1% of the points fell below (above) the boundary. Several points were allowed to fall outside the bounds to prevent excessive influence of outliers. We show below that small changes in the response envelope do not affect our results. For all 54 neurons, the response space envelope estimated using the regression (thick lines in Fig. 5) was much narrower than the envelope assuming spikes can fall in arbitrary 1-ms bins (the thin lines in Fig. 5).

Mean-variance relations of phi 1 and phi 2 by stimulus

The logarithms of the mean and variance of spike count in response to different stimuli are linearly related (Dean 1981; Gershon et al. 1998; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogel et al. 1989). Because phi 1 is correlated strongly with spike count (Richmond and Optican 1987), it is natural that the logarithms of the mean and variance of phi 1 also are related linearly (in 52/54 neurons at P < 0.01; median r2 = 0.79, iqr = 0.61-0.91; see Fig. 7). By simple extension, we might expect there to be a linear relationship between log(mean) and log(variance) of phi 2 as well. However, such a relation existed in only 11/54 of the neurons (P < 0.01; median r2 = 0.09, iqr = 0.02-0.21). Instead, we found that the variance of phi 2 elicited by a single stimulus increased with the mean of phi 1 (in 43/54 neurons at P < 0.01; median r2 = 0.66, iqr = 0.38-0.84; see Fig. 8). This is consistent with the fact that the range (variance) of phi 2 increases with increasing phi 1 (Fig. 4). Adding mean(phi 2) to the model added very little explanatory power (median increase in r2 = 0.02, iqr = 0.0-0.09); for simplicity, we omitted mean(phi 2) from the model. Using these regressions, we predicted the variance of phi 1 and phi 2 distributions elicited by different stimuli from the means of the phi 1 distributions.



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 7. Log(mean) vs. log(variance) of the distributions of phi 1 elicited by different stimuli. Horizontal and vertical axes show (on a logarithmic scale) the mean and variance of phi 1 respectively. Each point represents the responses to a single stimulus. There were 128 stimuli for the V1 set 1 neuron (open circle ), 16 stimuli for the V1 set 2 neuron (), and 32 stimuli for the IT cell (black-triangle). Least-squares regression line for each data set is shown. This example shows the cell with the median slope from each data set.



View larger version (15K):
[in this window]
[in a new window]
 
Fig. 8. Variance of phi 2 is better predicted by mean of phi 1 than by mean of phi 2. Top: variance of phi 2 by stimulus plotted against mean phi 1 by stimulus. Horizontal axis shows mean phi 1 and the vertical axis shows variance phi 2. Bottom: variance of phi 2 by stimulus plotted against mean phi 2 by stimulus. Horizontal axis shows mean phi 2 and the vertical axis shows variance phi 2. Both: axes are logarithmic. , responses elicited by a single stimulus. Regression lines are superimposed.

To check that our regression estimates were robust, we divided the data for each neuron into two sets. The regressions for the two halves were statistically indistinguishable (P > 0.01) for all 54 neurons. The fact that different subsets of the data result in indistinguishable regression lines shows that the model can generalize, and supports using the model to predict the structure of responses to stimuli other than those presented in our experiments.

Distributions of principal component coefficients

Estimating the transmitted information between the visual stimuli and the neuronal responses requires estimating the conditional response distributions p(phi 1|s) and p(phi 2|s). We assumed that the phi 1 and phi 2 distributions are separable (except for the relation between phi 1 and the range of phi 2) and examined normal, lognormal, and gamma distributions, each truncated at the bounds of the response space, as models for the phi 1 and phi 2 distributions.

Using the observed mean and the variance estimated from the regression relation, the normal distribution (truncated at the boundary of the response space) was an acceptable fit for 79% of the distributions of phi 1 elicited by individual stimuli, 83% of the distributions of phi 2 elicited by individual stimuli, and 79% of the distributions of phi 2 given phi 1 regardless of stimulus (Kolmogorov-Smirnov test, P < 0.05). The gamma distribution was acceptable in almost exactly the same cases as the normal distribution, but the lognormal distribution was rejected much more frequently. We chose to use the normal distribution. Because the gamma distribution fit nearly as well as the normal, the information calculations presented in the following text were repeated using the gamma distribution for several cells; the results were indistinguishable from those obtained using the normal distribution.

Transmitted information

In this study, our goal was to estimate channel capacity, not transmitted information (which has been estimated for these data in the past) (see Eskandar et al. 1992; Heller et al. 1995; Kjaer et al. 1997; Richmond et al. 1990). Transmitted information was calculated as a test of our model: estimates based on the model can be compared with estimates obtained using a previously validated neural network method (Golomb et al. 1997; Kjaer et al. 1994). If the assumptions in our model are reasonable, the two methods should give similar results. This has been shown to be the case when spike count is used as the neural code (Gershon et al. 1998); we found that the two methods also give similar results for the (phi 1, phi 2) code used here. The least-squares line relating the two information estimates had intercept 0.04 and slope indistinguishable from 1. The r2 value was 0.94. The difference between the two measurements was of the order of magnitude by which Golomb et al. (1997) found that the neural network underestimates transmitted information with limited numbers of samples. Finding that the information measurements using the two methods are similar led us to believe that the assumptions in our model are reasonable and that the model can be used to estimate channel capacity.

Figure 9 shows, for each neuron, the transmitted information using (phi 1, phi 2) as the neural code plotted against transmitted information using spike count as the neural code. There was no significant difference in the information transmitted using the spike count code by neurons in V1 and IT (V1: 0.35 bits median, interquartile range 0.27-0.55; IT: 0.31 bits median, interquartile range 0.18-0.39; P > 0.01 Kruskal-Wallis). Although the increase in transmitted information from the spike count code to the (phi 1, phi 2) code is significantly larger in V1 than in IT (P < 0.01), the difference in information transmitted using the (phi 1, phi 2) code is still not significant (V1: 0.52 bits median, interquartile range 0.40-0.78; IT: 0.37 bits median, interquartile range 0.22-0.55; P > 0.01). This represents an increase in transmitted information of 55% (median; interquartile range 22-74%) for neurons in V1, and 19% (median; interquartile range 9-41%) for neurons in IT. Transmitted information using phi 1 (which is correlated strongly with spike count) as the code was 12% (median; interquartile range 3-29%) greater than transmitted information using spike count in V1 neurons and 5% (median; interquartile range -1-11%) greater in IT.



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 9. Transmitted information using spike count or two principal components. Horizontal axis shows transmitted information based on spike count. Vertical axis shows transmitted information based on a code using both phi 1 and phi 2. ---, equality.

To check that we did not lose stimulus-related information by smearing spike arrival times with too broad a convolution kernel, we repeated the information calculations with responses smoothed using a Gaussian kernel with a standard deviation of 1 ms rather than 5 ms. No additional information was found.

Channel capacity

Figure 10 shows, for each neuron, the channel capacity using (phi 1, phi 2) as the neural code plotted against channel capacity using spike count as the neural code. There was a small but significant difference between the channel capacities using spike count code of neurons in V1 and IT (V1: 1.1 bits median; interquartile range 0.91-1.30; IT: 1.4 bits median; interquartile range 1.2-1.5; P < 0.01 Kruskal-Wallis). There was no significant difference in channel capacity using the (phi 1, phi 2) code (V1: 2.0 bits median; interquartile range 1.8-2.3; IT: 2.2 bits median; interquartile range 1.8-2.5). This represents an increase in channel capacity of 84% (median; interquartile range 62-124%) for neurons in V1 and an increase of 52% (median; interquartile range 32-95%) for neurons in IT. This increase in channel capacity is a result of temporal modulation and is larger than estimated using only the observed responses (transmitted information). Channel capacity using phi 1 (which was correlated strongly with spike count) as the code differed from channel capacity using spike count as the code by 7% (median; interquartile range -7-22%).



View larger version (23K):
[in this window]
[in a new window]
 
Fig. 10. Channel capacity using spike count or 2 principal components. Horizontal axis shows channel capacity based on spike count alone. Vertical axis shows channel capacity based on a code using phi 1 and phi 2. ---, equality. Channel capacity using (phi 1, phi 2) is always greater than channel capacity using spike count alone.

We performed several analyses to verify that our estimates of channel capacity are robust with respect to small changes in the response space boundaries. As for spike count code (Gershon et al. 1998), channel capacity depends on the range of responses the cell is capable of emitting in response to a stimulus. If we underestimate a neuron's dynamic range, we will underestimate its channel capacity. Here we estimated the neuron's dynamic range based on the responses observed. It is possible that we have not used stimuli that elicit the highest possible firing rates (and so phi 1) from these neurons. Nonetheless the peak firing rates we saw in these V1 and IT neurons are similar to those reported by others using a wide variety of stimuli, including natural stimuli (Baddeley et al. 1997; Perrett et al. 1984; Rolls 1984; Rolls et al. 1982; Tolhurst et al. 1981, 1983; Vogel et al. 1989), so we believe that our estimate of the dynamic range is reasonable. For several neurons, we examined the effect of allowing part of the distribution of phi 1 to fall outside the observed dynamic range. When we allowed as little as 0.5% or as much as 5% of the distribution of phi 1 to fall outside the observed range, estimated channel capacity changed by <4%. Similarly, widening the bounds on phi 2 for given phi 1 by 5% increased the channel capacity by <3%. If new evidence were to show that the proper range for either phi 1 or phi 2 is larger than we have estimated here, channel capacity could be recalculated using these methods.

Distribution of responses achieving channel capacity

We estimated channel capacity by finding the distribution (in 2 dimensions) of mean (phi 1, phi 2) responses that allows the cell to transmit the maximum possible information using a code based on phi 1 and phi 2.

Figure 11A shows an example of such a distribution. The horizontal and vertical axes show means of phi 1 and phi 2, respectively. Shades of gray indicate how frequently each mean is presented to achieve channel capacity. Because some of these distributions are quite broad, the distribution of observed responses arising from this distribution of mean responses is diffuse (not shown). The projection of the (phi 1, phi 2) distribution onto phi 1, that is, the distribution of mean phi 1 implied by the two-dimensional distribution, is shown as a histogram immediately below. Figure 11B shows the distribution of mean phi 1 values that achieves channel capacity using phi 1 alone as the neural code. In both histograms the horizontal axis shows mean phi 1 values, and the vertical axis shows the frequency with which distributions with the appropriate mean value are presented to achieve channel capacity. The projection of the optimal two-dimensional distribution onto the phi 1 axis is less concentrated than the optimal one-dimensional distribution.



View larger version (26K):
[in this window]
[in a new window]
 
Fig. 11. Distributions of responses that achieve channel capacity. A: distribution of mean (phi 1, phi 2) pairs that achieves channel capacity based on phi 1 and phi 2. Mean values of phi 1 and phi 2 are plotted on the horizontal and vertical axes; shades of gray show the probability with which distributions with the given means must be presented to achieve channel capacity. Note that the means of the distributions are clustered near opposite edges of the response space. Because some of these distributions are broad, however, the resulting distribution of observed responses is quite diffuse. A histogram of the projection of the distribution of phi 1 and phi 2 onto phi 1 alone is shown below. B: distribution of mean phi 1 that achieves channel capacity using phi 1 alone as the neural code. Mean values of phi 1 are shown on the horizontal axis. Probability with which each mean phi 1 must be presented to achieve channel capacity is shown on the vertical axis. Projected distribution shown in A is less concentrated than this distribution based on the phi 1 code alone. Two-dimensional distribution loses some phi 1 information because the resulting distribution of phi 1 is not optimal. This loss of information is offset by a gain in information from phi 2.

Role of further principal components

Throughout this study we limited our analysis to two principal components. The reason was practical: using a three-component code increases the computational burden beyond the resources currently available to us. We can, however, address the issue of whether the use of more principal components in the response representation can be expected to lead to substantially higher estimates of information or channel capacity. Successive principal components, by definition, account for successively smaller portions of the response variance, that is, their range decreases. This decrease is rapid in our data. Therefore their information content (and so their contribution to channel capacity) must decline unless the noise associated with each principal component also decreases with the range. Figure 12 shows that this decrease does not happen in our data. Each column shows the distribution (across 54 cells) of the signal-to-noise ratio (the variance of mean responses to stimuli divided by the median variance of responses to single stimuli) for spike count or one of the first 10 principal components. A signal-to-noise ratio less than one means that the variability of responses to a given stimulus is greater than the variability of mean responses to different stimuli, so the responses distinguish only poorly among the stimuli that elicited them. Thus the third principal component will contribute much less channel capacity than the first and second principal components, and the fourth and higher principal components are expected to contribute insignificantly if at all. We verified that, as expected, the fifth through tenth principal components contribute no information not redundant with information in the first principal component. Thus our code using only phi 1 and phi 2 should carry a very large proportion of the information available in the responses.



View larger version (14K):
[in this window]
[in a new window]
 
Fig. 12. Fourth and higher principal components of responses are noisy. Boxplot showing the distribution (across 54 cells) of the signal-to-noise ratio for spike count and for the first 10 principal component coefficients. The signal-to-noise ratio is the variance of mean responses to stimuli divided by the median variance of responses to single stimuli. Notches in each box show 95% confidence intervals on the median; the edges of the box show the 25th and 75th percentiles of the distribution, and the ends of the extended lines show the 5th and 95th percentiles. A value below 1 means that the variability of responses to a given stimulus is greater than the variability of mean responses to different stimuli, so the responses distinguish poorly among the stimuli that elicited them. Therefore principal components above the 4th are not expected to carry information.


    DISCUSSION
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
APPENDIX
REFERENCES

Here we have constructed a model of neuronal responses, based on principal component analysis, that includes both response intensity (firing rate) and a low-precision (±10 ms) representation of temporal modulation. This temporal precision and this code have been shown to carry a very large portion of the information useful for the identification of statically presented two-dimensional stimuli (Heller et al. 1995; McClurkin et al. 1991; Optican and Richmond 1987; Richmond and Optican 1990; Tovee et al. 1993