The Journal of Neurophysiology Vol. 79 No. 3 March 1998, pp. 1135-1144
Copyright ©1998 by the American Physiological Society
Laboratory of Neuropsychology, National Institute of Mental Health, and Laboratory of Developmental Neurobiology, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892
ABSTRACT
Gershon, Ethan D., Matthew C. Wiener, Peter E. Latham, and Barry J. Richmond. Coding strategies in monkey V1 and inferior temporal cortices. J. Neurophysiol. 79: 1135-1144, 1998. We would like to know whether the statistics of neuronal responses vary across cortical areas. We examined stimulus-elicited spike count response distributions in V1 and inferior temporal (IT) cortices of awake monkeys. In both areas, the distribution of spike counts for each stimulus was well described by a Gaussian distribution, with the log of the variance in the spike count linearly related to the log of the mean spike count. Two significant differences in response characteristics were found: both the range of spike counts and the slope of the log(variance) versus log(mean) regression were larger in V1 than in IT. However, neurons in the two areas transmitted approximately the same amount of information about the stimuli and had about the same channel capacity (the maximum possible transmitted information given noise in the responses). These results suggest that neurons in V1 use more variable signals over a larger dynamic range than IT neurons, which use less variable signals over a smaller dynamic range. The two coding strategies are approximately as effective in transmitting information.
Neurons in different regions of the visual system encode different aspects of visual stimuli. For example, neurons in V1 cortex respond strongly to an oriented bar, whereas those in inferior temporal (IT) cortex often require a more complex stimulus. We would like to know whether the differences in what is encoded are reflected in differences in the neuronal response as this may shed light on strategies for cortical processing. Specifically, we ask two questions. First, in what way does the statistical structure of responses differ across areas? Second, how do the differences, if any, affect information transmission?
Data set
We performed new analyses using previously published data. The data came from two studies of supragranular V1 complex cells, each study using two rhesus monkeys performing a simple fixation task (Kjaer et al. 1997
Relationship between mean spike count and its variance
For each cell, each stimulus produces a sample mean spike count, µi, and a sample variance in spike count,
Fitting analytic distributions to the data
We seek a model for the conditional probability, P(n|s), of observing n spikes in response to stimulus s. We examined two widely used probability distributions
Information measures
The information carried in a neuron's response about which member of a set of stimuli is present is defined as (Cover and Thomas 1991
We performed new analyses using previously published data from 41 V1 complex cells from two separate data sets (13 from V1 set 1 and 28 from V1 set 2) and 19 IT neurons (Eskandar et al. 1992
Log(variance) is linearly related to log(mean)
Various researchers have demonstrated a linear relation between the logarithm of the mean stimulus-elicited spike count and the logarithm of its variance in V1 neurons (Dean 1981
Modified Gaussian fits spike count data better than Poisson
We fit modified Gaussian distributions (as described in METHODS) and Poisson distributions to the empirical distribution of responses elicited by each stimulus. Sample fits are shown in Fig. 3. A
Information estimates using a modified Gaussian distribution
Because a modified Gaussian distribution modeled the data better than a Poisson distribution in all three data sets, we used the modified Gaussian to describe the conditional probabilities P(n|s) needed to compute transmitted information. We chose the mean and variance of the modified Gaussian in three ways: by using the observed mean together with the variance predicted by the mean-variance relation, by calculating the mean and variance directly from the data, and by using the mean and variance obtained from the fitting procedure. For comparison we also computed the information using an artificial neural network (Golomb et al. 1997
Channel capacity is approximately the same in V1 and IT
We can compute the channel capacity (assuming a spike count code) by finding the distribution of mean spike counts that yields the highest transmitted information (see METHODS). This requires knowing the minimum and maximum observed spike counts for each cell and the variability in spike count at each mean. The minimum and maximum come directly from the data; for the variability, we assumed that the probability of observing a particular spike count, P(n|µ), was given by a modified Gaussian distribution with mean µ, and with a variance predicted by a linear relation between log(variance) and log(mean).
In the INTRODUCTION, we posed two questions: in what way does the statistical structure of responses differ across areas and how do the differences, if any, affect information transmission? We found that the responses of neurons in V1 and IT cortex do indeed have different structures. The maximum spike count observed in V1 cortex neurons is generally much higher than that in IT cortex neurons. In addition, responses in V1 are much more variable than those in IT.
Transmitted information
Computation of the transmitted information between neural responses and a stimulus set requires that we choose a neural code. Here we chose the number of spikes in a window
Channel capacity
An intrinsic drawback of transmitted information is that it depends on the frequencies with which stimuli are presented. This makes the value of the transmitted information somewhat arbitrary
Calculating channel capacity
To calculate transmitted information, we need an estimate for the response distribution for each stimulus presented in an experiment. These distributions can be estimated directly from the data. Calculating channel capacity is more difficult, because it requires knowing the response distribution for all stimuli, not only those presented in a particular experiment. This problem can be overcome by sorting stimuli into groups based on the response distribution each evokes. Here we will call each such group of stimuli an equivalence class. The neuron cannot distinguish members of an equivalence class from one another. For example, otherwise identical stimuli of different colors produce the same response distribution in a cell insensitive to color. Therefore, rather than considering each stimulus separately, we work with the equivalence classes.
Comparison with other studies
In this study, the information transmitted by the spike count averaged ~1 bit/300 ms (3 bits/s). The channel capacity, although typically two to four times larger in any cell, is also not very large. Other investigators have reported significantly higher transmission rates. A recent preliminary report (Buracas et al. 1996
Questions raised
Presumably neurons in these two regions operate according to the same biophysical principles. How is it that the variance is lower in IT neurons than in V1 neurons? Does the larger dynamic range with larger variance offer some advantage that offsets the energy cost of higher firing rates? Finally, why don't all neurons use a large dynamic range with low variability?
INTRODUCTION
conflicting constraints the relative importance of which must be decided on a case-by-case basis. Here we have the additional problem that we want to compare brain regions that may use different coding schemes. Fortunately, for V1 and IT, the areas we consider here, it has been shown that spike count in a window ~300 ms wide carries most of the stimulus-related information, about 80% (Heller et al. 1995). The remaining 20% of the information is carried in spike timing with an accuracy of ~30 ms in V1 and 60 ms in IT (Heller et al. 1995). Because neurons in V1 fire at about twice the rate of those in IT (see RESULTS), the spike timing accuracy relative to the mean interspike interval is about the same in the two areas. Thus in this paper, we use the spike count as our neural code. This assumption greatly simplifies our calculations, although in principle everything we do here could be applied to coding schemes that include temporal variations.
).
; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogels et al. 1989). We confirm the mean-variance relation in monkey V1, and we observe a similar relation in monkey IT cortex. We then go on to show that P(n|s) is well approximated by a modified Gaussian distribution (the main modification was truncation at 0; see METHODS for details) with mean, µ, that depends on the stimulus, s, and variance that depends only on the mean.
assuming a spike count code is used, and the observed dynamic range and mean-variance relations apply. Thus although neurons in V1 and IT implement different coding strategies, as reflected in the significantly larger variability and range of responses in V1 than in IT, neurons in the two areas are capable of transmitting about the same amount of information using a spike count code.
).
METHODS
; Richmond et al. 1990), and from one study of neurons in area TE of IT cortex in two other monkeys performing a simple sequential nonmatch-to-sample task (Eskandar et al. 1992). The stimuli were centered on V1 neuronal receptive fields, which were located in the lower contralateral visual field 1-3° from the fovea. The IT visual receptive fields were large and bilateral and included the fovea. Standard extracellular recording methods were used throughout.
), 128 stimuli were used: a set of 64 8 × 8 pixel patterns and their contrast-reversed counterparts. For V1 set 2 (Kjaer et al. 1997), 16 stimuli were used: a set of 8 16 × 16 pixel patterns and their contrast-reversed counterparts. In both sets, the patterns covered the excitatory receptive field. At 3° eccentricity, the stimuli were ~2.5° on a side. For the IT experiments, 32 stimuli were used: 16 4 × 4 pixel patterns and their contrast-reversed counterparts. These patterns were 4° square and centered on the fixation point.

FIG. 1.
Walsh patterns. For V1 set 1, the 64 stimuli (A) and the corresponding contrast-reversed set were presented on the receptive fields while the monkey fixated. Stimuli were 2.5° on a side (covering the excitatory receptive field and some of the surround). For V1 set 2, the 8 stimuli (B) and the corresponding contrast-reversed set were presented on the receptive field while the monkey fixated. For inferior temporal (IT), the 4 × 4 set (16 stimuli) in the lower left corner of A and the corresponding contrast-reversed set were used as the monkey performed a nonmatch-to-sample task. The stimuli were 4° on a side and were centered at the point of fixation.
σi², where the subscript i labels stimulus. We use linear regression to fit the curve log σ² = b + m log µ to the set of points (µi, σi²). This results in a slope, m, and intercept, b, for each cell.
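The regression just described can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the choice of base-10 logarithms, and the sample data are assumptions made here, and the sketch does not apply the bias correction for log-transformed sample statistics.

```python
import math

def fit_log_variance_vs_log_mean(means, variances):
    """Least-squares fit of log10(variance) = b + m * log10(mean).

    Stimuli with a non-positive sample mean or variance are skipped
    because their logarithm is undefined. Returns (m, b).
    """
    pts = [(math.log10(mu), math.log10(var))
           for mu, var in zip(means, variances) if mu > 0 and var > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two usable (mean, variance) pairs")
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all means are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    m = sxy / sxx            # slope
    b = y_bar - m * x_bar    # intercept
    return m, b

# Hypothetical per-stimulus statistics for one cell (not data from the paper):
# variances follow var = 1.5 * mean**1.2 exactly, so the fit should recover
# a slope near 1.2 and an intercept near log10(1.5).
means = [1.0, 2.0, 4.0, 8.0, 16.0]
variances = [1.5 * mu ** 1.2 for mu in means]
m, b = fit_log_variance_vs_log_mean(means, variances)
```

One fit of this kind per cell yields the slope and intercept that are then compared across areas.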
2) obtained by taking the logarithm of the sample mean and variance are biased and result in underestimation of the variance of response distributions and overestimation of transmitted information. We corrected for the bias using a Taylor series expansion; only a few terms are needed for good results. See Kendall and Stuart (1961), p. 4-6.
the Poisson distribution and a modified Gaussian distribution. The Gaussian distribution was modified by truncation to eliminate the negative portion followed by normalization. Such distributions have been considered for neural data before (Foldiak 1993). The probability of seeing n spikes was taken to be the integral of this density function between n − 1/2 and n + 1/2 (0 and 1/2 for n = 0). A χ² test was used to compare each of the analytic distributions to the histogram of experimentally observed spike counts. To have enough data for this analysis, only the responses to stimuli that had been presented ≥12 times to a given cell were considered.
1/2 and n + 1/2, we took the probability of observing n spikes, P(n|s), to be proportional to the Gaussian density evaluated at n. The constant of proportionality was chosen to ensure that the total probability summed to one. This alternative method resulted in negligible differences in all quantities we calculated.
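The discretization described above (a Gaussian truncated at zero, renormalized, and integrated over unit-width bins) can be sketched as follows. The function names and the cutoff `n_max` are assumptions introduced for illustration; the sketch covers only the integral-based definition, not the fitting procedure.

```python
import math

def _norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def modified_gaussian_pmf(mu, var, n_max):
    """P(n) for n = 0..n_max under a Gaussian truncated at 0 and renormalized.

    P(n) is the integral of the truncated density over [n - 1/2, n + 1/2],
    with [0, 1/2] used for n = 0. n_max is an illustrative cutoff and should
    be large enough that the remaining tail is negligible.
    """
    sigma = math.sqrt(var)
    mass_above_zero = 1.0 - _norm_cdf(0.0, mu, sigma)  # normalization after truncation
    pmf = []
    for n in range(n_max + 1):
        lo = 0.0 if n == 0 else n - 0.5
        hi = n + 0.5
        pmf.append((_norm_cdf(hi, mu, sigma) - _norm_cdf(lo, mu, sigma)) / mass_above_zero)
    return pmf

# Hypothetical parameters: mean of 5 spikes, variance of 4 spikes^2.
pmf = modified_gaussian_pmf(mu=5.0, var=4.0, n_max=40)
```

A list of this form, one per stimulus, is the P(n|s) that enters the information calculation.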
)
$$I(S;R) = \sum_{s \in S} P(s) \sum_{r \in R} P(r|s) \log_2 \frac{P(r|s)}{P(r)} \qquad (1)$$
where S is the set of stimuli s, R is the set of responses r, P(r|s) is the conditional probability of response r given stimulus s, P(s) is the probability that stimulus s occurred, and $P(r) = \sum_{s \in S} P(r|s)P(s)$ is the probability of response r. Equation 1 is general, but here we confine ourselves to the case where the response r is taken to be the number of spikes elicited by the stimulus. Thus in what follows, we replace P(r|s) with P(n|s) and P(r) with P(n) where n is the number of spikes.
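For a spike count code, the transmitted information can be computed directly from the conditional distributions P(n|s) and the stimulus probabilities P(s). The sketch below assumes information is measured in bits (base-2 logarithm); the function name, argument layout, and example numbers are illustrative and not taken from the paper.

```python
import math

def transmitted_information(p_n_given_s, p_s):
    """Transmitted information I(S;R) in bits for a spike count code.

    p_n_given_s[s][n] is P(n | s); p_s[s] is P(s). Every row of
    p_n_given_s must have the same length and sum to 1.
    """
    n_counts = len(p_n_given_s[0])
    # Marginal response distribution: P(n) = sum over s of P(n|s) P(s)
    p_n = [sum(p_s[s] * p_n_given_s[s][n] for s in range(len(p_s)))
           for n in range(n_counts)]
    info = 0.0
    for s, ps in enumerate(p_s):
        for n in range(n_counts):
            p = p_n_given_s[s][n]
            if ps > 0 and p > 0:  # zero-probability terms contribute nothing
                info += ps * p * math.log2(p / p_n[n])
    return info

# Hypothetical example: two equally likely stimuli, three possible spike counts.
example = [[0.7, 0.2, 0.1],
           [0.1, 0.2, 0.7]]
info = transmitted_information(example, [0.5, 0.5])
```

Two limiting cases are useful checks: perfectly distinguishable responses to two equally likely stimuli give 1 bit, and identical response distributions give 0 bits.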
), so we suspect that the lower bound we compute will not be far from the true maximum transmitted information.
$$I(S;R) = \sum_{\mu} \sum_{s \in [\mu, \mu+\Delta\mu]} P(s) \sum_{n} P(n|s \in [\mu, \mu+\Delta\mu]) \log_2 \frac{P(n|s \in [\mu, \mu+\Delta\mu])}{P(n)} \qquad (2)$$
where the notation s ∈ [µ, µ + Δµ] means restrict s to only those stimuli that produce a response the mean spike count of which lies between µ and µ + Δµ and the sum over µ runs in increments of Δµ. Equation 2 is exact; all we have done is order stimuli by the mean spike count they produce. The next step is to replace P(n|s ∈ [µ, µ + Δµ]) with P(n|µ). This also would be exact in the limit Δµ → 0 if the distribution of spike counts depended only on the mean. We show in the results that it is a good approximation to assume that the distribution of spike counts does depend only on the mean; in particular, it provides an estimate of the transmitted information that is consistent with estimates reached by other accepted methods. Thus we will adopt that approximation here.
[µ, µ + Δµ]) with P(n|µ), we need to express P(µ) in terms of P(s). This can be done by noting that P(s) induces a probability distribution P(µ)
$$P(\mu)\,\Delta\mu = \sum_{s \in [\mu, \mu+\Delta\mu]} P(s) \qquad (3)$$
Then ignoring the error associated with the approximation P(n|s ∈ [µ, µ + Δµ]) ≈ P(n|µ), we write the probability of observing spike count n, averaged over all mean spike counts, as
$$P(n) = \int d\mu\, P(\mu)\, P(n|\mu) \qquad (4)$$
where we replaced the sum over µ that appeared in Eq. 2 with an integral, valid in the limit of small Δµ. Finally, we can rewrite Eq. 2 for I(S;R) in terms of probability distributions over n and µ
$$I(S;R) = \int d\mu\, P(\mu) \sum_{n} P(n|\mu) \log_2 \frac{P(n|\mu)}{P(n)} \qquad (5)$$
with P(µ) and P(n) given in Eqs. 3 and 4, respectively. Again we use an integral over µ rather than a sum.
(µ) =
µ+
µµdµP(µ).

FIG. 2.
Log(mean) vs. log(variance) regression. There were 128 stimuli for the V1 set 1 neuron, 16 stimuli for the V1 set 2 neuron, and 32 stimuli for the IT cell. Least-squares regression line for each data set is shown. This example shows the cell with the median slope from each data set.

FIG. 3.
Sample fits using Poisson and modified Gaussian distributions. A: cell from IT. B: cell from V1. Each row shows the histogram of responses to 1 of 32 (IT) or 128 (V1) stimuli, along with the best-fit modified Gaussian (left) and Poisson (right) distributions. Modified Gaussian provides a better fit, especially when the mean firing rate is large. Stimuli presented here were selected to show responses with a range of mean spike counts for each cell. Note that the scales for the 2 sets of graphs are different.
(µ) for all µ. The second constraint is implemented by requiring that
The third constraint is implemented by requiring that the distribution of spike counts be consistent with the observed data; that is, the distribution of means must not lead to a distribution of spike counts with many counts outside the observed range. Specifically, if nmin and nmax are the minimum and maximum observed spike counts over all stimuli for a particular cell, then we demand that
(6)
where P(n) is defined in Eq. 4, both C+(n) and C−(n) are nondecreasing functions of n, and
is small. Equation 7 ensures that P(n) falls off rapidly for spike counts outside the observed range. To implement the optimization procedure, we need to translate this into a constraint on
(µ) because the search for the maximum value of the transmitted information occurs in
(µ) space. Defining the function
and combining Eqs. 4 and 7, we arrive at
(8)
Equation 9 represents our third constraint. In practice, because expanding the range of spike counts increases transmitted information, we do not have to worry about our range being too small, only too large. Therefore, in Eq. 9, only the equality constraint is important.
(9)
To find the channel capacity, we minimize the function
where h1 and h2 are large constants (h1 = 10¹² and h2 = 10¹⁵ in the calculations presented here; other large values for the constants give similar results). The second and third terms of this expression (Eq. 10) are penalty functions that increase the value of F[(µ)] when the second and third constraints are not met.
(µ) ≥ 0 combined with two linear constraints] is convex, and transmitted information is a concave function with respect to (µ) (Cover and Thomas 1991, p. 31). Therefore, we are guaranteed a single global minimum, and the gradient descent method must converge to that minimum.
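Because the problem is convex, other standard solvers reach the same kind of optimum. The sketch below is not the penalized gradient descent used in this study: it swaps in the Blahut-Arimoto iteration, a textbook algorithm for the capacity of a discrete channel, and it approximates the range constraint simply by restricting the mean spike counts to an integer grid. The slope, intercept, grid limits, and `n_max` are hypothetical values chosen for illustration.

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def count_distribution(mu, slope, intercept, n_max):
    """P(n | mu) for n = 0..n_max under a Gaussian truncated at 0.

    The variance follows log10(var) = intercept + slope * log10(mu),
    so mu must be positive. Probabilities are renormalized over 0..n_max.
    """
    sigma = math.sqrt(10.0 ** (intercept + slope * math.log10(mu)))
    probs = []
    for n in range(n_max + 1):
        lo = 0.0 if n == 0 else n - 0.5
        probs.append(norm_cdf(n + 0.5, mu, sigma) - norm_cdf(lo, mu, sigma))
    total = sum(probs)
    return [p / total for p in probs]

def output_distribution(p_in, channel):
    """P(n) = sum over means of P(mu) P(n | mu)."""
    return [sum(p_in[i] * channel[i][n] for i in range(len(channel)))
            for n in range(len(channel[0]))]

def divergences(channel, q, log):
    """Kullback-Leibler divergence of each row of the channel from q."""
    return [sum(c * log(c / q[n]) for n, c in enumerate(row) if c > 0 and q[n] > 0)
            for row in channel]

def blahut_arimoto(channel, iterations=200):
    """Approximate capacity (bits) and the input distribution that attains it.

    channel[i][n] = P(n | mu_i). The returned value is the mutual information
    at the final input distribution, which approaches the capacity as the
    number of iterations grows.
    """
    k = len(channel)
    p = [1.0 / k] * k
    for _ in range(iterations):
        d = divergences(channel, output_distribution(p, channel), math.log)
        w = [p[i] * math.exp(d[i]) for i in range(k)]
        z = sum(w)
        p = [wi / z for wi in w]
    d_bits = divergences(channel, output_distribution(p, channel), math.log2)
    return sum(p[i] * d_bits[i] for i in range(k)), p

# Hypothetical cell: integer means from 1 to 20 spikes, variance equal to the mean.
means = list(range(1, 21))
channel = [count_distribution(float(mu), 1.0, 0.0, 60) for mu in means]
capacity, p_opt = blahut_arimoto(channel)
```

The second return value plays the role of the optimizing distribution of means, so it can be inspected in the same way as the distribution that achieves channel capacity in the text.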
RESULTS
; Kjaer et al. 1997; Richmond et al. 1990).
; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogels et al. 1989). Using linear regression, we find such a relation for both our V1 complex cells and IT neurons (see Fig. 2).
−2.42 to 1.45, 3/13 constants <0) and 0.60 (range −0.79 to 2.10, 5/28 <0) in V1 set 1 and V1 set 2, respectively, and 0.31 (range −1.03 to 1.82, 5/13 <0) in IT.
χ² test was used to evaluate the fits. The requirement that each response distribution analyzed be based on ≥12 presentations of the given stimulus excluded 7 of 13 of the neurons from V1 set 1. Three of 13 cells had enough presentations per stimulus for all stimuli, and three others had enough presentations for a few stimuli each, for a total of 433 response distributions. All cells from V1 set 2, and all cells from the IT set, had enough presentations for all stimuli.
2, by using the observed mean and (for the Gaussian) the variance predicted by the mean-variance regression, and by using the observed mean and (for the Gaussian) variance. The third method gave such poor results that we dropped it from consideration. The variance of responses to any given stimulus is a sample variance, and therefore is itself a random variable. The regression model uses the variances in response to all stimuli to estimate the variance of response to each stimulus. We believe this explains why the second method is so much more effective than the third method.

FIG. 4.
Chi-squared test of response distributions. Each bar shows the percent of response distributions for which the hypothesis that the data came from the Poisson or modified Gaussian distribution can be rejected (P = 0.05). Modified Gaussian distribution using the best-fit parameters is rejected less often than the distribution using the observed mean and variance calculated using the log(variance) vs. log(mean) regression, indicating that other factors probably influence the variance.
χ² test based on the observed mean and predicted variance for the Gaussian fails more often than 5% of the time at P = 0.05 (6, 25, and 8% in V1 set 1, V1 set 2, and IT, respectively) suggests that factors other than those identified in this paper may influence the variance of the distributions.
; Heller et al. 1995; Kjaer et al. 1994).
the regression method and the network method are nearly equal (Fig. 5). The second method always calculates higher values for transmitted information than the first method [mean difference = 0.047 ± 0.063 (SD) bits], and the third method calculates even higher values (mean difference = 0.072 ± 0.036 bits). These represented median percent differences of 8 and 20%, respectively.

FIG. 5.
Two methods for estimating transmitted information. The x axis shows the mean value calculated using the neural network (Kjaer et al. 1994); the y axis shows the value calculated using the method described in the text. Values calculated using the 2 methods are nearly identical. All cells with enough data to allow analysis (60) are represented.
most information accumulated in just 50 ms (Fig. 6). Information in IT rose much more slowly, beginning to level off after ~150 ms. The early dip in transmitted information in cells in IT is due to latency effects: some stimuli elicit spikes earlier than others, and in small windows this produces information. Because we are using a spike count code, information is reduced as more of the stimuli elicit spikes. Information rises again as different spike counts become distinguishable. This is evidence that latency carries stimulus-related information in IT neuronal responses. Latency has been shown to carry stimulus-related information in V1 (Gawne et al. 1996).

FIG. 6.
Transmitted information as a function of the counting window size. The x axis shows the time from stimulus presentation. IT starts later than V1 because it has a longer latency. The y axis shows the transmitted information accumulated from stimulus presentation (time 0) to the time indicated on the x axis. Information accumulates significantly more quickly in neurons from V1 than in neurons from IT.

FIG. 7.
Channel capacity as a function of the counting window size. The x axis shows the time from stimulus presentation. IT starts later than V1 because it has a longer latency. The y axis shows the channel capacity accumulated from stimulus presentation (time 0) to the time indicated on the x axis. Channel capacity rises more quickly in V1 than in IT, although the difference is not as pronounced as for transmitted information.

FIG. 8.
Distribution of mean responses (each corresponding to a stimulus equivalence class) that maximizes transmitted information. The x axis shows the means. The y axis shows the probability with which the means should occur to achieve channel capacity. Distribution here was calculated using integer means. As noted in the text, using a finer grid does not materially affect the results.
(which controls how many responses can lie outside the observed dynamic range; see METHODS) by a factor of 10 or used a constant instead of a quadratic weighting function [C+(n) = C−(n) = constant; see METHODS]. This did not change the resulting value of the channel capacity by >5% for any of the examples we considered. In addition, numerical implementation of the gradient descent requires that we discretize the probability distribution of the mean spike count. In our simulations, we used a bin size of one spike count so the means took on integer values. Again, to test robustness, in several cases, we decreased the bin size by a factor of 2 and saw little change.
DISCUSSION
330 ms wide. In the two areas we examined, V1 and IT, such a spike count code has been shown to carry ~80% of the information contained in the full neuronal response (Heller et al. 1995). Thus the true transmitted information is ~25% higher than the values we report. Because we are comparing areas in which the downward bias caused by using an incomplete code is about the same, this bias should have virtually no effect on our conclusion that the two areas transmit about the same amount of information.
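The step from "~80% of the information" to "~25% higher" is a one-line calculation. The symbols $I_{\text{count}}$ (information in the spike count) and $I_{\text{full}}$ (information in the full response) are labels introduced here for illustration only:

$$I_{\text{count}} \approx 0.8\, I_{\text{full}} \;\Longrightarrow\; I_{\text{full}} \approx \frac{I_{\text{count}}}{0.8} = 1.25\, I_{\text{count}}$$

so a count-based estimate of 1 bit would correspond to roughly 1.25 bits in the full response, under the assumption that the 80% figure applies to the cell in question.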
it almost always can be made either larger or smaller simply by changing the probabilities with which stimuli are presented. One could imagine adjusting stimulus probabilities to maximize the transmitted information. Shannon and Weaver (1949) defined the resulting maximum value as the channel capacity. It is a function only of the conditional probability distribution P(r|s).
; Shannon and Weaver 1949). Channel capacity, like transmitted information, depends on how we choose to interpret the cell's response, that is, on our assumption about the neural code. However, once we choose a code, the channel capacity is well defined. Because it is always possible that some other code would allow the cell to transmit more information than the code under examination, the channel capacity based on any given code is a lower bound on the amount of information that the cell can transmit. Because the spike-count code has been shown to carry ~80% of the stimulus-related information (Heller et al. 1995), it provides a reasonable first approximation.
; Richmond et al. 1990; Victor and Purpura 1996). We predict that the increase in channel capacity when temporal modulation is taken into account will be proportional to the increase in transmitted information. If this is true, then the actual channel capacities will be ~25% larger than the values we calculated.
). The information capacity is the information present in the signal itself, subject to a model of the noise.
; Heller et al. 1995; Kjaer et al. 1994); the values obtained by the two methods are indistinguishable. Thus although there may be a distribution that fits these data better than the modified Gaussian, the modified Gaussian is a good model for the calculations we want to perform. A Poisson distribution, although often used to model responses, fit our experimental data poorly. Others have reached the same conclusion (Softky and Koch 1993; Victor and Purpura 1996).
; Tolhurst et al. 1981, 1983; van Kan et al. 1985; Vogels et al. 1989). If we know the mean of a response, we can calculate its variance. Therefore, any response distribution can be characterized by its mean.
; Rolls 1984; Rolls et al. 1982; Tolhurst et al. 1981, 1983; Vogels et al. 1989). If new evidence does show that the dynamic range is larger than we observed, the channel capacity can be recalculated. The effects, however, are modest. We calculated the increase in channel capacity assuming that the maximum firing rate for each cell was 25% greater than the measured values. The median increase in channel capacity was 8.5% (range 0.8-20.9%).
). de Ruyter van Steveninck et al. (1997) found less variance, and more information, in the responses of fly H1 cell to a moving coherent stimulus when the stimulus moved along a "presumably more naturalistic" two-dimensional trajectory than when the stimulus moved in one direction at constant speed. However, at least one study that looked for such differences in one monkey visual cortical area (MT) failed to find them (McAdams and Maunsell 1996).
) indicates that the transmitted information rates of MT neurons in the monkey reach 30 bits/s with moving stimuli. de Ruyter van Steveninck et al. (1997) estimated that the responses of the H1 neuron of the fly contain 2.43 bits/30 ms (~80 bits/s). Here we examine factors that may account for the differences in our results.
in MT.
) found that spike count transmitted ~80% of the information available in the full response. Therefore, if we accounted for temporal aspects of the signal, we could expect a 25% rise in transmitted information.
ACKNOWLEDGEMENTS
The authors thank Drs. Mike W. Oram and Karen D. Pettigrew for helpful discussion and comments on the manuscript.
Present addresses: E. D. Gershon, New York University School of Medicine and Center for Neural Science, New York, NY 10016; P. E. Latham, Dept. of Neurobiology, UCLA, Los Angeles, CA 90095.
FOOTNOTES
Address for reprint requests: B. J. Richmond, Laboratory of Neuropsychology, 49 Convent Dr., Bethesda, MD 20892-4415.
Received 25 September 1997; accepted in final form 1 December 1997.