The Journal of Neurophysiology Vol. 82 No. 6 December 1999, pp. 2861-2875
Copyright ©1999 by the American Physiological Society
Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892-4415
ABSTRACT
Wiener, Matthew C. and Barry J. Richmond. Using Response Models to Estimate Channel Capacity for Neuronal Classification of Stationary Visual Stimuli Using Temporal Coding. J. Neurophysiol. 82: 2861-2875, 1999. Both spike count and temporal modulation are known to carry information about which of a set of stimuli elicited a response; but how much information temporal modulation adds remains a subject of debate. This question usually is addressed by examining the results of a particular experiment that depend on the specific stimuli used. Developing a response model allows us to ask how much more information is carried by the best use of response strength and temporal modulation together (that is, the channel capacity using a code incorporating both) than by the best use of spike count alone (the channel capacity using the spike count code). This replaces dependence on a particular data set with dependence on the accuracy of the model. The model is constructed by finding statistical rules obeyed by all the observed responses and assuming that responses to stimuli not presented in our experiments obey the same rules. We assume that all responses within the observed dynamic range, even if not elicited by a stimulus in our experiment, could be elicited by some stimulus. The model used here is based on principal component analysis and includes both response strength and a coarse (±10 ms) representation of temporal modulation. Temporal modulation at finer time scales carries little information about the identity of stationary visual stimuli (although it may carry information about stimulus motion or change), and we present evidence that, given its variability, it should not be expected to do so. The model makes use of a linear relation between the logarithms of mean and variance of responses, similar to the widely seen relation between mean and variance of spike count. Responses are modeled using truncated Gaussian distributions. 
The amount of stimulus-related information carried by spike count in our data is 0.35 and 0.31 bits in primary visual and inferior temporal cortices, respectively, rising to 0.52 and 0.37 bits for the two-principal-component code. The response model estimates that the channel capacity is 1.1 and 1.4 bits, respectively, using the spike count only, rising to 2.0 and 2.2 bits using two principal components. Thus using this representation of temporal modulation is nearly equivalent to adding a second independent cell using the spike count code. This is much more than estimated using transmitted information but far less than would be expected if all degrees of freedom provided by the individual spike times carried independent information.
INTRODUCTION
It is not yet completely understood how
information is encoded in neuronal spike trains and how much
information is carried. In the visual system, it is clear that the
number of action potentials elicited by a visual stimulus is an
important part of the code for carrying stimulus-related information.
There is now strong evidence that modulation of the firing rate during
the course of the response to a stimulus presented for the period of a
typical intersaccadic interval (~300 ms) carries additional
information that is not available from average response strength alone
(Heller et al. 1995; Richmond and Optican 1987, 1990; Tovee et al. 1993;
Victor and Purpura 1996). Here we investigate how much information
temporal modulation adds.
As often understood, the answer to this question depends on the
experiment from which we take our data. Different sets of stimuli will
elicit different sets of responses, which may encode more or less
information using temporal modulation. Using response models can help
us answer a question less tied to a particular experiment: how much
more information can be carried by the best use of temporal modulation
than by the best use of spike count alone? The maximum amount of
information that can be carried by a neuron using a particular code
(such as spike count, or spike count along with some representation of
temporal modulation) is called its channel capacity (Cover and Thomas
1991; Shannon and Weaver 1949). The
channel capacity is defined uniquely for a particular neuron using a
particular code. Channel capacity's uniqueness comes at a price,
however: to calculate channel capacity, we must know all possible
responses that can be elicited from a neuron rather than only those
observed in our experiment.
Our goal is to construct a response model that not only describes the
responses observed in an experiment but also can be used to predict the
responses that would be elicited by other stimuli. This substitutes
dependence on the accuracy of the model for the limitations imposed by
the amount of data typically collected in experiments. Ideally we would
like this model to include all features of the response that do carry
unique information, without complicating matters by including features
that carry little or no unique information. Recently Gershon et al.
(1998) presented an approach to constructing such a response model
involving spike count only. The model used a widely known relation
between mean and variance of spike count (Dean 1981; Lee et al. 1998;
O'Keefe et al. 1997; Tolhurst et al. 1981, 1983; van Kan et al. 1985;
Vogels et al. 1989), and the fact that distributions of spike count were
well fit by a truncated Gaussian. This model allowed Gershon et al. (1998),
given any mean spike count, to describe the set of responses giving
rise to that mean. The set of responses corresponding to a given mean
did not depend on whether the mean had actually been elicited by a
stimulus in the experiment. Thus the model predicted the responses
associated with any possible mean. Because any set of responses has a
mean, the model described all sets of responses that could be elicited
from the neuron. This allowed Gershon et al. to estimate the neuron's
channel capacity using the spike count code.
Here we extend the approach of Gershon et al. (1998) to a code that
includes the temporal patterns of rate modulation of the responses. We
represent the neuronal responses using the first two
principal components. The coefficient with respect to the first
principal component is strongly correlated with spike count, and the
coefficient with respect to the second principal component is a coarse
(±10 ms) measure of the time course of a response, often indicating
whether spikes tend to be concentrated at the beginning of a response
(Richmond and Optican 1987, 1990). We identify relations
among the means and variances of the principal component coefficients
similar to the relation between mean and variance of spike count. We
also find that distributions of the principal component coefficients
are well fit by truncated Gaussian distributions. Now a set of
responses must be described by two means, one for each component of the
code. As in Gershon et al. (1998), this allows us to
predict both the response strength and the temporal patterns of rate
modulation for all possible sets of responses.
We estimated channel capacity on the basis of the response models for
spike count and for the first two principal components. We found that
the two-component code on average increases the channel capacity by
0.8-0.9 bits over that carried by spike count alone (1.1 and 1.4 bits,
respectively) in neurons in the primary visual cortex (V1) and area
TE of inferior temporal cortex. Therefore the contribution
temporal modulation can make to neural information processing for
identification of stationary stimuli, although far smaller than if all
degrees of freedom in a spike train actually were used to carry such
information, is substantially larger than would be estimated from only
the responses seen in our experiment. This contrasts with situations in
which rapidly changing stimuli drive precisely time-locked neuronal
responses, which can lead to extremely high information transmission
rates (Buracas et al. 1998). Finally, we present
evidence that very little additional channel capacity for identifying
stationary stimuli can be expected by using more principal components
(that is, by representing the rate variation with higher precision).
METHODS
Data sets
We performed new analyses using previously published data. The
data came from two studies of supragranular V1 complex cells, each
study using two rhesus monkeys performing a simple fixation task
(Kjaer et al. 1997; Richmond et al. 1990), and from one study of neurons
in IT cortex in two other monkeys performing a simple sequential
nonmatch-to-sample task (Eskandar et al. 1992). In these three studies,
the visual stimuli were two-dimensional black-and-white patterns based on
the Walsh functions (Fig. 1).
In both V1 studies, stimuli were presented centered on the neuronal
receptive fields, located in the lower contralateral visual field
1-3° from the fovea. Stimuli covered the excitatory receptive field.
At 3° eccentricity, the stimuli were ~2.5° on a side. In the
first V1 study (V1 set 1) (Richmond et al. 1990), 128 stimuli were used:
sixty-four 8 × 8 pixel patterns and their contrast-reversed counterparts.
For V1 set 2 (Kjaer et al. 1997), 16 stimuli were used: a set of eight
16 × 16 pixel patterns and their contrast-reversed counterparts.
In the IT study (Eskandar et al. 1992), stimuli were
presented centered on the fovea. The patterns were 4° square.
Thirty-two stimuli were used: sixteen 4 × 4 pixel patterns and
their contrast-reversed counterparts.
The stimulus was displayed for 320 ms in V1 and 352 ms in IT. To
account for response latencies and to avoid contamination from
off-responses, spikes were counted during the interval from 30 to 300 ms after stimulus onset for the V1 neurons and 50 to 350 ms after
stimulus onset for the IT neurons. Gershon et al. (1998) found that
both transmitted information and channel capacity based on
the spike count code are stable with respect to small changes in these
counting windows. For each neuron, each stimulus was presented
approximately the same number of times (±2) in randomized order.
Different neurons received different numbers of presentations. The
number of presentations of each stimulus was between 10 and 50 in V1
and between 19 and 50 in IT. Seven V1 neurons with <10 trials per
stimulus were omitted from these analyses to ensure stability of
information estimates (Golomb et al. 1997; Wiener and Richmond 1998).
The timing of events, including spikes, was recorded with 1-ms resolution.
Quantifying the responses
Each spike train was low-pass filtered by convolution with a
Gaussian distribution with standard deviation of 5 ms and resampled at
1-ms resolution to create a spike density function (Fig. 2, A and B).
Convolving and resampling in this way avoids a problem of binned
histograms, which do not distinguish between spikes at the center of a
bin and those near the edges (Richmond et al. 1987; Sanderson and Kobler
1976; Silverman 1986).
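The smoothing step described above can be sketched in a few lines. This is a minimal illustration only, assuming spike times given in milliseconds relative to stimulus onset; the function name `spike_density` and the conversion to spikes/s are choices made for this example, not details taken from the source.

```python
import math

def spike_density(spike_times_ms, duration_ms, sigma_ms=5.0):
    """Sum one Gaussian kernel per spike and sample the result every 1 ms.

    Returns a list of length duration_ms holding the density in spikes/s.
    """
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    density = []
    for t in range(duration_ms):
        total = 0.0
        for s in spike_times_ms:
            z = (t - s) / sigma_ms
            total += norm * math.exp(-0.5 * z * z)
        # The kernel integrates to 1 spike per ms axis; scale to spikes/s.
        density.append(1000.0 * total)
    return density

# Hypothetical spike train: four spikes in a 300-ms window.
sdf = spike_density([40, 55, 58, 120], duration_ms=300)
```

Because each kernel has unit area, the density summed over the window recovers the spike count whenever the spikes lie well inside the window.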
We used principal component analysis to reduce the dimension of the
data set. Principal component analysis defines an ordered set of axes,
each accounting for more of the variance in the data set than those
that follow. These axes can be computed by finding the eigenvectors of
the covariance matrix of the data (Ahmed and Rao 1975; Deco and Obradovic
1996). Each data point is defined uniquely by its projections onto these
axes; these values are called the coefficients with respect to the
principal components, or principal component coefficients. We use the code
formed by the first k principal component coefficients because it is the
optimal k-dimensional linear code for least-squares reconstruction and
minimizes an upper bound of information loss through a one-layer network
(Campa et al. 1995; Deco and Obradovic 1996; Plumbley 1991).
Principal-component analysis has a long history in signal analysis (Ahmed
and Rao 1975) and has been used to study information coded by temporal
modulation in neuronal responses (Heller et al. 1995; Kjaer et al. 1994;
McClurkin et al. 1991; Optican and Richmond 1987; Richmond and Optican
1987, 1990; Tovee and Rolls 1995; Tovee et al. 1993). Figure 2 shows
rasters of the responses elicited by two stimuli from a single V1 neuron,
the corresponding spike density functions, the first four principal
components, Φ1-Φ4, of the responses from the neuron, and the principal
component representation of the two responses.
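To make the eigenvector construction concrete, the sketch below extracts the first two principal components of a small set of waveforms and projects each waveform onto them. It is an illustration under stated assumptions: power iteration with deflation is used here only to keep the example dependency-free (a full eigendecomposition would normally be used), and the names `leading_eigenpair` and `principal_components` and the toy data are hypothetical.

```python
import math

def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def leading_eigenpair(matrix, iterations=1000):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]      # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                        # zero matrix: nothing left to extract
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def principal_components(responses, k=2):
    """Return the first k components and each response's coefficients on them."""
    n, d = len(responses), len(responses[0])
    mean = [sum(r[j] for r in responses) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in responses]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        eigenvalue, v = leading_eigenpair(cov)
        components.append(v)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    coefficients = [[sum(c[j] * v[j] for j in range(d)) for v in components]
                    for c in centered]
    return components, coefficients

# Toy "waveforms" with three time points each (hypothetical data).
toy = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [3.0, 6.0, -0.1],
       [4.0, 8.0, 0.1], [5.0, 10.0, -0.1]]
components, coefficients = principal_components(toy, k=2)
```

The coefficients computed this way are measured from the average waveform, so some are negative; the translation described in the next paragraph would be applied afterwards.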
In this study, each neuronal response was represented by the
corresponding coefficients with respect to the first and second principal
components of the spike density functions (Richmond and Optican 1987,
1990). We call these coefficients φ1 and φ2. φ1 and φ2 can be
translated by arbitrary constants as long as the appropriate constant
multiples of Φ1 and Φ2 are subtracted from the results. This manipulation
is similar to rewriting the expression a + bx as a + bx0 + b(x − x0). It
is conventional to use the average waveform as the base waveform,
translating the axes so that the average waveform corresponds to a vector
of zeros. This causes some values of φ1 and φ2 to be negative. Here we
translated the axes so that φ1 and φ2 are always positive and logarithms
could be taken. The specific form of the new base waveform is not
important for our analyses.
Model of response variability
Estimating channel capacity requires estimating all possible (φ1, φ2)
response distributions. Our estimate of these distributions must be based
on the responses elicited by stimuli actually presented in our
experiments. It is important to note that if the responses elicited by the
stimuli presented in our experiments are not representative of the
responses elicited by other stimuli, our model will generalize poorly.
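Ideally, we would estimate the two-dimensional conditional distribution
of responses p(φ1, φ2|s) directly for each stimulus s. However, we did
not have enough responses to each stimulus to do so. Instead, we described
the distributions of φ1 and φ2 individually, and assumed that φ2 was
independent of φ1 (except for correlations imposed by truncation at the
bounds of the response space, see following text). To be precise

$$p(\phi_1 \mid s) = \frac{1}{K_1}\,\frac{1}{\sqrt{2\pi\sigma_{1s}^{2}}}\exp\!\left[-\frac{(\phi_1-\mu_{1s})^{2}}{2\sigma_{1s}^{2}}\right], \qquad \phi_1 \ge \phi_{1\min} \tag{1}$$

$$p(\phi_2 \mid s) = \frac{1}{K_2}\,\frac{1}{\sqrt{2\pi\sigma_{2s}^{2}}}\exp\!\left[-\frac{(\phi_2-\mu_{2s})^{2}}{2\sigma_{2s}^{2}}\right], \qquad \phi_{2\min}(\phi_1) \le \phi_2 \le \phi_{2\max}(\phi_1) \tag{2}$$

where µjs and σjs² are the mean and variance of φj elicited by stimulus
s, φ1min is the minimum value of φ1, φ2min(φ1) and φ2max(φ1) are the
calculated bounds on φ2 for given φ1, and K1 and K2 are normalizing
factors, equal to the integrals of the corresponding Gaussians over the
allowed ranges of φ1 and φ2, respectively. We show in the results that the
Gaussian distribution provides an acceptable fit to the distributions of
φ1 and φ2 for each stimulus.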
Although φ1 and φ2 are by construction uncorrelated in the responses
from each neuron, correlations are sometimes present in the responses
elicited by individual stimuli. In RESULTS, we observe that the set of
responses is bounded; these bounds will be described quantitatively at
that point. We observe correlations between φ1 and φ2 only in those
distributions lying close to the response bounds. Therefore our model
assumes that correlations between φ1 and φ2 arise only as a result of
truncating distributions at the boundaries of the response space. We make
this same assumption when characterizing the set of all possible response
distributions to calculate channel capacity. We make this assumption
because we do not have data to justify any other: under other
circumstances (and with sufficient data) we might find, and include in a
model, such correlations. The close match of our estimates of transmitted
information to those obtained using other well-validated methods (see
RESULTS) suggests that in this case the assumption is justified.
Means and variances of response distributions
Two-dimensional Gaussian distributions (given our correlation
assumptions, preceding text) are characterized by two means and two
variances. All four parameters can be measured from experimentally
observed responses. In RESULTS, we show that the variances of φ1 and φ2
distributions are related to mean φ1 by a power law

$$\log \sigma_1^{2} = a_1 + b_1 \log \mu_1 \tag{3}$$

$$\log \sigma_2^{2} = a_2 + b_2 \log \mu_1 \tag{4}$$

where µi and σi² are the mean and variance of φi, for i = 1, 2. ai and
bi are estimated by linear regression. These regressions were used to
estimate σis² for i = 1, 2 in Eqs. 1 and 2. Note that the variances of
both φ1 and φ2 were modeled as a function of the mean of φ1 (we
justify this in RESULTS).
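As an illustration of fitting such a power law, the sketch below runs an ordinary least-squares regression of log(variance) on log(mean) and uses it to predict a variance from a mean. The function names and the synthetic data are assumptions made for this example; the bias correction for the logarithms discussed in the following paragraph is deliberately left out.

```python
import math

def fit_power_law(means, variances):
    """Least-squares fit of log(variance) = a + b * log(mean); returns (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict_variance(mean, a, b):
    """Variance implied by the fitted relation for a given mean."""
    return math.exp(a + b * math.log(mean))

# Synthetic, noise-free example: variance = 1.5 * mean ** 1.2.
example_means = [1.0, 2.0, 4.0, 8.0]
example_variances = [1.5 * m ** 1.2 for m in example_means]
a, b = fit_power_law(example_means, example_variances)
```

The same routine can be applied twice, once with the variances of the first coefficient and once with those of the second, both regressed on the mean of the first coefficient.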
Estimates of log(µ) and log(σ²) obtained by taking the logarithm of the
sample mean and variance are biased, resulting in underestimation of the
variance of response distributions and therefore overestimation of
transmitted information. We corrected for the bias using a Taylor series
expansion; only a few terms are needed for good results (Kendall and
Stuart 1961, p. 4-6).
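The leading term of such a correction follows from a second-order Taylor expansion of the logarithm about the true value. The expressions below are the generic expansion for a sample mean $\bar{x}$ and sample variance $s^2$ from $n$ trials, shown as a sketch; they are not necessarily the exact terms the authors retained, and the result for $\log s^2$ additionally assumes Gaussian samples.

```latex
\begin{align*}
\mathrm{E}[\log \bar{x}] &\approx \log \mu - \frac{\operatorname{Var}(\bar{x})}{2\mu^{2}}
  = \log \mu - \frac{\sigma^{2}}{2 n \mu^{2}}, \\
\mathrm{E}[\log s^{2}] &\approx \log \sigma^{2} - \frac{\operatorname{Var}(s^{2})}{2\sigma^{4}}
  = \log \sigma^{2} - \frac{1}{n-1}
  \quad \text{(Gaussian samples, } \operatorname{Var}(s^{2}) = 2\sigma^{4}/(n-1)\text{)}.
\end{align*}
```

Both naive estimates are therefore biased downward, and a first-order correction adds $s^{2}/(2 n \bar{x}^{2})$ to $\log \bar{x}$ and $1/(n-1)$ to $\log s^{2}$.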
We also used linear regression to estimate the mean and variance of the
values of φ2 when φ1 takes on a given value, no matter which stimulus
elicited the response.
Transmitted information and channel capacity
The information carried in a neuron's response about which
member of a set of stimuli is present (Cover and Thomas 1991) is defined as

$$I(R;S) = \sum_{s} p(s) \sum_{r} p(r \mid s)\, \log_2 \frac{p(r \mid s)}{p(r)} \tag{5}$$

where p(r) = Σs p(r|s)p(s) is the probability of response r. Equation 5
is general; it can be applied to responses r of any dimension. We used
both the one-dimensional spike count code and the two-dimensional
principal component code, r = φ = (φ1, φ2).
The transmitted information I(R;S) depends on p(s), the distribution of
presentation probabilities of the stimuli (see Eq. 5). The channel
capacity of a cell is the maximum value of transmitted information
over all distributions p(s), where the set of stimuli S should now be
understood to include all possible visual stimuli. Finding this maximum
requires knowing the conditional response distributions p(r|s) for
all stimuli s. We estimate the distributions using the model
described earlier.
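For a discrete set of stimuli and responses, transmitted information can be computed directly from the presentation probabilities and the conditional response distributions. The following is a minimal sketch; the function name `transmitted_information` and the table layout (one row of response probabilities per stimulus) are assumptions for illustration.

```python
import math

def transmitted_information(p_s, p_r_given_s):
    """Mutual information in bits between stimulus and response.

    p_s[s] is the presentation probability of stimulus s;
    p_r_given_s[s][r] is the probability of response r given stimulus s.
    """
    n_r = len(p_r_given_s[0])
    # Total probability of each response, summed over stimuli.
    p_r = [sum(p_s[s] * p_r_given_s[s][r] for s in range(len(p_s)))
           for r in range(n_r)]
    info = 0.0
    for s, ps in enumerate(p_s):
        for r, prs in enumerate(p_r_given_s[s]):
            if ps > 0.0 and prs > 0.0:   # zero-probability terms contribute nothing
                info += ps * prs * math.log2(prs / p_r[r])
    return info

# Two equally likely stimuli, each confused with the other 10% of the time.
noisy = transmitted_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

Changing `p_s` while holding `p_r_given_s` fixed changes the result, which is why channel capacity is defined as a maximum over the presentation probabilities.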
From Eqs. 1 and 2, we see that estimating p(φ|µ), the probability with
which the distribution with mean µ = (µ1, µ2) elicits response
φ = (φ1, φ2), requires µ1, µ2, σ1², and σ2². Given the power-law
relations between the means and variances of distributions (Eqs. 3 and
4), µ1 determines both σ1² and σ2² but not µ2. Thus for each µ1
there are many possible response distributions, identical except for
translation in φ2. Each distribution, then, is characterized completely
by the two means µ1 and µ2. When calculating transmitted information,
only the observed distributions are considered (Fig. 3, top); when
calculating channel capacity, all possible distributions must be
considered (Fig. 3, bottom).
According to this model, two stimuli that elicit the same
µ1 and µ2 from a neuron in fact elicit identical response distributions
and therefore cannot be distinguished from one another by that neuron.
Thus a stimulus can be identified simply by the mean response it elicits.
Equation 5 now can be written

$$I(R;S) = \int d\boldsymbol{\mu}\; p(\boldsymbol{\mu}) \int d\boldsymbol{\phi}\; p(\boldsymbol{\phi} \mid \boldsymbol{\mu})\, \log_2 \frac{p(\boldsymbol{\phi} \mid \boldsymbol{\mu})}{p(\boldsymbol{\phi})} \tag{6}$$

where p(φ|µ) is the probability with which the distribution with mean
µ = (µ1, µ2) elicits response φ = (φ1, φ2), and
p(φ) = ∫ dµ p(µ) p(φ|µ) is the total probability with which response φ
occurs. The channel capacity is the maximum of this expression over the
two-dimensional distribution of probabilities p(µ) = p(µ1, µ2).
A derivation of Eq. 6 is given in Gershon et al. (1998). That derivation
is expressed in terms of the spike count code but remains valid when
spike count is replaced with any other response code. Information and
channel capacity based on the spike count code were calculated using the
method of Gershon et al. (1998).
The search for the maximizing set of probabilities is subject to three constraints: the probabilities must be nonnegative; the probabilities must sum to one; and the range of means must be finite. The first two constraints arise from intrinsic properties of probability distributions. If the third constraint is violated, the transmitted information can be infinite and the problem of maximizing transmitted information is ill-posed. The implementation of these constraints and other numerical issues are discussed in the APPENDIX.
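The authors' own optimization procedure is deferred to the APPENDIX and is not reproduced here. As a stand-in, the sketch below uses the Blahut-Arimoto iteration, a standard method for the capacity of a discrete channel, applied to a finite table of conditional response probabilities (which in this setting would come from discretizing the range of means and responses; that discretization is an assumption of the example). The iteration keeps the probabilities nonnegative and normalized at every step.

```python
import math

def blahut_arimoto(channel, tolerance=1e-9, max_iterations=10000):
    """Capacity (bits) and maximizing input distribution of a discrete channel.

    channel[i][j] is the probability of output j given input i.
    """
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in                      # start from the uniform distribution
    lower = 0.0
    for _ in range(max_iterations):
        # Output distribution induced by the current input distribution.
        q = [sum(p[i] * channel[i][j] for i in range(n_in)) for j in range(n_out)]
        # Divergence (bits) of each input's output distribution from q.
        d = [sum(w * math.log2(w / q[j]) for j, w in enumerate(row) if w > 0.0)
             for row in channel]
        weights = [p[i] * 2.0 ** d[i] for i in range(n_in)]
        total = sum(weights)
        lower = math.log2(total)                 # lower bound on capacity
        upper = max(d)                           # upper bound on capacity
        p = [w / total for w in weights]         # reweight toward informative inputs
        if upper - lower < tolerance:
            break
    return lower, p

# Three inputs, two of which produce identical outputs (hypothetical channel).
capacity, best_p = blahut_arimoto([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this toy channel the two indistinguishable inputs end up sharing half of the probability between them, mirroring the point that stimuli eliciting identical response distributions cannot add capacity.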
We did not penalize distributions for probability weight falling
outside the response envelope bounds for φ2 (as we did for probability
weight falling outside the observed range of φ1); we simply ignored that
portion of the distribution. This allows the widest possible separation
between distributions, causing our estimates of channel capacity to be
larger than if the boundaries had been strictly enforced. Thus our
procedure was designed to give the most generous estimates possible
(consistent with the constraints estimated from the data) of channel
capacity associated with temporal coding.
RESULTS
We considered 6 neurons from one experiment in V1, 29 neurons from
a second experiment in V1, and 19 neurons from IT. To calculate
transmitted information and channel capacity, we characterized the
space of responses, parameterized by φ1 and φ2, the coefficients with
respect to the first two principal components; determined the regression
relations for the means and variances of φ1 and φ2 elicited by the
stimuli; and constructed a model for the distributions of φ1 and φ2.
φ1 and φ2 are somewhat abstract, but are related to biologically
relevant aspects of the response. Responses with high (low) values of
φ1 correspond to responses with high (low) numbers of spikes (McClurkin
et al. 1991; Richmond and Optican 1987, 1990; Tovee et al. 1993); in our
data, the median correlation between spike count and φ1 was 0.91
(interquartile range 0.86-0.97). φ2 is a coarse (±10 ms) measure of
temporal modulation (Optican and Richmond 1987; Richmond and Optican
1987). It often characterizes whether or not spikes are concentrated in
the early part of the response (Fig. 2).
Response space
Although the coefficients with respect to the first and second
principal components (φ1 and φ2) are linearly uncorrelated by
construction (Ahmed and Rao 1975; Deco and Obradovic 1996), they are not
independent. Figure 4 shows scatterplots of φ2 versus φ1 for two
cells from V1 and two cells from IT. In these four cells, and in all
other cells in our sample, the range of φ2 increased with increasing φ1,
giving the scatter a cone shape. The apex of the cone is the (φ1, φ2)
pair representing trials when no spikes were elicited.
The limited range of φ2 for a given value of φ1 reflects the fact that
only a certain amount of temporal modulation is possible with a given
number of spikes. If each of k spikes in a train could be placed in any
of n time bins, assuming only that no bin could hold more than one spike,
the number of possible patterns would be
$\binom{n}{k} = n!/[k!(n-k)!]$.
The range of values of φ2 seen under this combinatorial assumption (Fig.
5, thin line) is much wider than the range observed experimentally for
any given number of spikes k (Fig. 5, dots).
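To give a feel for the size of this combinatorial space, the count can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the choice of 270 one-millisecond bins (matching the 30 to 300 ms counting window used for V1) and 10 spikes is an assumed example.

```python
import math

def count_spike_patterns(n_bins, k_spikes):
    """Number of ways to place k spikes in n bins with at most one spike per bin."""
    return math.comb(n_bins, k_spikes)

# Hypothetical example: 10 spikes in 270 one-millisecond bins.
patterns = count_spike_patterns(270, 10)
bits_if_all_used = math.log2(patterns)   # information if every pattern were distinct and usable
```

Even a modest spike count yields an enormous number of nominal patterns, which is the contrast the paragraph above draws with the much narrower range actually observed.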
Because we do not understand the spike generation process well enough
to predict the response envelope, we developed a statistical model of
the mean and variance of distributions of φ2 corresponding to different
values of φ1. We divided the range of φ1 into a number of bins, each of
which defines a φ1 slice in the two-dimensional response space. Figure
6A shows the top left scatterplot from Fig. 4 with slice boundaries
superimposed. The mean and variance of φ2 were calculated in each φ1
slice. Figure 6B shows an example of the linear relation between
log(variance) of φ2 and log(mean) of φ1 in each φ1 slice. This relation
was significant in 47/54 neurons at the P < 0.01 level (median
r² = 0.91, iqr = 0.79-0.94). Figure 6C shows an example of a fit of the
mean of φ2 as a quadratic function of the mean of φ1 in each slice. The
quadratic fit was rejected in only 3/54 neurons (χ² test at the 0.01
level) and is used here.
We used these relations to estimate the bounds of the envelope
containing the response space by µ2(φ1) ± k±σ2(φ1), where µ2(φ1) and
σ2(φ1) are the mean and standard deviation of φ2 estimated as in the
preceding text. For each cell, k+ (k−) was chosen so that all but 0.1%
of the points fell below (above) the boundary. Several points were
allowed to fall outside the bounds to prevent excessive influence of
outliers. We show below that small changes in the response envelope do
not affect our results. For all 54 neurons, the response space envelope
estimated using the regression (thick lines in Fig. 5) was much
narrower than the envelope assuming spikes can fall in arbitrary 1-ms
bins (the thin lines in Fig. 5).
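One way to choose such a multiplier is to standardize each point by the predicted mean and standard deviation at its position and take the appropriate order statistic. The sketch below is an assumed implementation of that idea, not the authors' code; the function name and arguments are hypothetical.

```python
import math

def envelope_multiplier(values, predicted_means, predicted_sds,
                        outside_fraction=0.001, upper=True):
    """Multiplier k such that at most outside_fraction of points lie beyond
    mean + k * sd (upper=True) or below mean - k * sd (upper=False)."""
    sign = 1.0 if upper else -1.0
    # Signed distance of each point from its predicted mean, in sd units.
    z = sorted(sign * (v - m) / s
               for v, m, s in zip(values, predicted_means, predicted_sds))
    # Number of points allowed outside; the small epsilon guards against float rounding.
    allowed = math.floor(outside_fraction * len(z) + 1e-9)
    allowed = min(allowed, len(z) - 1)
    return z[len(z) - 1 - allowed]

# Toy data: 1000 points with predicted mean 0 and sd 1 everywhere.
toy_values = [float(i) for i in range(1000)]
k_plus = envelope_multiplier(toy_values, [0.0] * 1000, [1.0] * 1000)
```

With 1000 points and a 0.1% allowance, exactly one point is left above the upper bound in this toy case.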
Mean-variance relations of φ1 and φ2 by stimulus
The logarithms of the mean and variance of spike count in response
to different stimuli are linearly related (Dean 1981; Gershon et al.
1998; Lee et al. 1998; O'Keefe et al. 1997; Tolhurst et al. 1981, 1983;
van Kan et al. 1985; Vogels et al. 1989). Because φ1 is correlated
strongly with spike count (Richmond and Optican 1987), it is natural
that the logarithms of the mean and variance of φ1 also are related
linearly (in 52/54 neurons at P < 0.01; median r² = 0.79, iqr =
0.61-0.91; see Fig. 7). By simple extension, we might expect there to be
a linear relationship between log(mean) and log(variance) of φ2 as well.
However, such a relation existed in only 11/54 of the neurons
(P < 0.01; median r² = 0.09, iqr = 0.02-0.21). Instead, we found that
the variance of φ2 elicited by a single stimulus increased with the mean
of φ1 (in 43/54 neurons at P < 0.01; median r² = 0.66, iqr =
0.38-0.84; see Fig. 8). This is consistent with the fact that the range
(variance) of φ2 increases with increasing φ1 (Fig. 4). Adding mean(φ2)
to the model added very little explanatory power (median increase in
r² = 0.02, iqr = 0.0-0.09); for simplicity, we omitted mean(φ2) from
the model. Using these regressions, we predicted the variance of φ1 and
φ2 distributions elicited by different stimuli from the means of the
φ1 distributions.
To check that our regression estimates were robust, we divided the data for each neuron into two sets. The regressions for the two halves were statistically indistinguishable (P > 0.01) for all 54 neurons. The fact that different subsets of the data result in indistinguishable regression lines shows that the model can generalize, and supports using the model to predict the structure of responses to stimuli other than those presented in our experiments.
Distributions of principal component coefficients
Estimating the transmitted information between the visual stimuli
and the neuronal responses requires estimating the conditional response
distributions p(φ1|s) and p(φ2|s). We assumed that the φ1 and φ2
distributions are separable (except for the relation between φ1 and the
range of φ2) and examined normal, lognormal, and gamma distributions,
each truncated at the bounds of the response space, as models for the
φ1 and φ2 distributions.
Using the observed mean and the variance estimated from the regression
relation, the normal distribution (truncated at the boundary of the
response space) was an acceptable fit for 79% of the distributions of
φ1 elicited by individual stimuli, 83% of the distributions of φ2
elicited by individual stimuli, and 79% of the distributions of φ2
given φ1 regardless of stimulus (that is, not rejected by a
Kolmogorov-Smirnov test at P < 0.05). The gamma
distribution was acceptable in almost exactly the same cases as the
normal distribution, but the lognormal distribution was rejected much
more frequently. We chose to use the normal distribution. Because the
gamma distribution fit nearly as well as the normal, the information
calculations presented in the following text were repeated using the
gamma distribution for several cells; the results were
indistinguishable from those obtained using the normal distribution.
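A truncated normal density and the Kolmogorov-Smirnov statistic used to compare it with data can be written compactly. This is a minimal sketch under stated assumptions: the function names are hypothetical, the bounds are passed in explicitly, and no P value is computed (only the distance between the empirical and model distribution functions).

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_normal_pdf(x, mu, sigma, lower, upper):
    """Gaussian density renormalized to integrate to 1 on [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    mass = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * mass)

def truncated_normal_cdf(x, mu, sigma, lower, upper):
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    mass = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    return (normal_cdf(x, mu, sigma) - normal_cdf(lower, mu, sigma)) / mass

def ks_statistic(samples, cdf):
    """Largest gap between the empirical distribution function and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap

# Hypothetical usage: compare a few coefficient values with a model truncated at 0 and 10.
model = lambda x: truncated_normal_cdf(x, 2.0, 3.0, 0.0, 10.0)
distance = ks_statistic([0.4, 1.1, 2.3, 2.9, 4.8, 6.0], model)
```

The normalizing mass in `truncated_normal_pdf` plays the role of the normalizing factors in the response model: it restores unit total probability after the tails outside the bounds are discarded.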
Transmitted information
In this study, our goal was to estimate channel capacity, not
transmitted information (which has been estimated for these data in the
past) (see Eskandar et al. 1992; Heller et al. 1995; Kjaer et al. 1997;
Richmond et al. 1990). Transmitted information was calculated as a test
of our model: estimates based on the model can be compared with estimates
obtained using a previously validated neural network method
(Golomb et al. 1997; Kjaer et al. 1994).
If the assumptions in our model are reasonable, the two methods should
give similar results. This has been shown to be the case when spike
count is used as the neural code (Gershon et al. 1998);
we found that the two methods also give similar results for the
(φ1, φ2) code used here. The least-squares line relating the two
information estimates had intercept 0.04 and slope indistinguishable
from 1. The r² value was 0.94. The difference between the two
measurements was of the order of magnitude by which Golomb et al. (1997)
found that the neural network underestimates transmitted information
with limited numbers of samples.
Finding that the information measurements using the two methods are
similar led us to believe that the assumptions in our model are
reasonable and that the model can be used to estimate channel capacity.
Figure 9 shows, for each neuron, the transmitted information using
(φ1, φ2) as the neural code plotted against transmitted information
using spike count as the neural code. There was no significant
difference in the information transmitted using the spike count code by
neurons in V1 and IT (V1: 0.35 bits median, interquartile range
0.27-0.55; IT: 0.31 bits median, interquartile range 0.18-0.39;
P > 0.01 Kruskal-Wallis). Although the increase in transmitted
information from the spike count code to the (φ1, φ2) code is
significantly larger in V1 than in IT (P < 0.01), the difference in
information transmitted using the (φ1, φ2) code is still not
significant (V1: 0.52 bits median, interquartile range 0.40-0.78;
IT: 0.37 bits median, interquartile range 0.22-0.55; P > 0.01). Moving
from the spike count code to the (φ1, φ2) code represents an increase in
transmitted information of 55% (median; interquartile range 22-74%) for
neurons in V1, and 19% (median; interquartile range 9-41%) for neurons
in IT. Transmitted information using φ1 (which is correlated strongly
with spike count) as the code was 12% (median; interquartile range
3-29%) greater than transmitted information using spike count in V1
neurons and 5% (median; interquartile range 1-11%) greater in IT.
To check that we did not lose stimulus-related information by smearing spike arrival times with too broad a convolution kernel, we repeated the information calculations with responses smoothed using a Gaussian kernel with a standard deviation of 1 ms rather than 5 ms. No additional information was found.
Channel capacity
Figure 10 shows, for each neuron, the channel capacity using (φ1, φ2)
as the neural code plotted against channel capacity using spike count as
the neural code. There was a small but significant difference between
the channel capacities using spike count code of neurons in V1 and IT
(V1: 1.1 bits median; interquartile range 0.91-1.30; IT: 1.4 bits
median; interquartile range 1.2-1.5; P < 0.01 Kruskal-Wallis). There
was no significant difference in channel capacity using the (φ1, φ2)
code (V1: 2.0 bits median; interquartile range 1.8-2.3; IT: 2.2 bits
median; interquartile range 1.8-2.5).
This represents an increase in channel capacity of 84% (median;
interquartile range 62-124%) for neurons in V1 and an increase of
52% (median; interquartile range 32-95%) for neurons in IT. This
increase in channel capacity is a result of temporal modulation and is
larger than estimated using only the observed responses (transmitted
information). Channel capacity using φ1 (which was correlated strongly
with spike count) as the code differed from channel capacity using spike
count as the code by 7% (median; interquartile range 7-22%).
We performed several analyses to verify that our estimates of channel
capacity are robust with respect to small changes in the response space
boundaries. As for the spike count code (Gershon et al. 1998), channel capacity depends on the range of responses the
cell is capable of emitting in response to a stimulus. If we
underestimate a neuron's dynamic range, we will underestimate its
channel capacity. Here we estimated the neuron's dynamic range based
on the responses observed. It is possible that we have not used stimuli
that elicit the highest possible firing rates (and so the largest values of the first principal component) from these neurons. Nonetheless, the peak
firing rates we saw in these V1 and IT neurons are similar to those
reported by others using a wide variety of stimuli, including natural
stimuli (Baddeley et al. 1997; Perrett et al. 1984; Rolls 1984; Rolls et al. 1982; Tolhurst et al. 1981, 1983; Vogel et al. 1989), so we believe that our estimate of the dynamic range is
reasonable. For several neurons, we examined the effect of allowing
part of the distribution of the first principal component to fall outside
the observed dynamic range. When we allowed as little as 0.5% or as
much as 5% of that distribution to fall
outside the observed range, estimated channel capacity changed by
<4%. Similarly, widening the bounds on the second principal component for a
given value of the first by 5% increased the channel capacity
by <3%. If new evidence were to show that the proper range for either
principal component is larger than
we have estimated here, channel capacity could be recalculated using
these methods.
Distribution of responses achieving channel capacity
We estimated channel capacity by finding the two-dimensional distribution of mean responses, expressed in the first two principal components, that allows the cell to transmit
the maximum possible information using a code based on those two components.
Figure 11A shows an example
of such a distribution. The horizontal and vertical axes show the means of
the first and second principal components,
respectively. Shades of gray indicate how frequently each mean is
presented to achieve channel capacity. Because some of these
distributions are quite broad, the distribution of observed responses
arising from this distribution of mean responses is diffuse (not
shown). The projection of the two-dimensional distribution onto
the first principal component, that is, the distribution of the mean
first principal component implied by the two-dimensional distribution, is shown as a histogram immediately below. Figure
11B shows the distribution of mean first-principal-component values that achieves channel capacity using
the first principal component alone as the neural code. In both histograms
the horizontal axis shows mean first-principal-component values, and
the vertical axis shows the frequency with which distributions with the
appropriate mean value are presented to achieve channel capacity. The
projection of the optimal two-dimensional distribution onto the
first-principal-component axis is less concentrated than the optimal one-dimensional distribution.
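Finding the distribution of mean responses that maximizes transmitted information can be illustrated with the Blahut-Arimoto iteration, a standard algorithm for the capacity of a discrete channel. This is a sketch of the general idea and is not claimed to be the optimization procedure used in this study. It treats a one-dimensional code, with responses modeled as Gaussians truncated to the dynamic range and a log-linear mean-variance relation as in the model; the grid of means, the coefficients `log_a` and `b`, the range `r_max`, and all function names are hypothetical.

```python
import numpy as np

def truncated_gaussian_channel(means, log_a, b, r_max, n_bins=200):
    """Rows give P(response bin | mean) for Gaussians truncated to [0, r_max].

    Variance follows log(var) = log_a + b * log(mean); values are illustrative.
    """
    edges = np.linspace(0.0, r_max, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    sd = np.sqrt(np.exp(log_a) * means ** b)
    z = (centers[None, :] - means[:, None]) / sd[:, None]
    density = np.exp(-0.5 * z ** 2)
    # Renormalizing each row implements the truncation to the dynamic range
    return density / density.sum(axis=1, keepdims=True)

def row_divergences(channel, p_mean):
    """KL divergence (bits) of each row from the current response marginal."""
    p_resp = p_mean @ channel
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(channel > 0, np.log2(channel / p_resp), 0.0)
    return np.sum(channel * log_ratio, axis=1)

def blahut_arimoto(channel, tol=1e-6, max_iter=5000):
    """Return (capacity estimate in bits, distribution over means achieving it)."""
    p_mean = np.full(channel.shape[0], 1.0 / channel.shape[0])
    for _ in range(max_iter):
        d = row_divergences(channel, p_mean)
        # p_mean @ d is the current information; d.max() bounds capacity from above
        if d.max() - p_mean @ d < tol:
            break
        p_mean = p_mean * np.exp2(d - d.max())
        p_mean /= p_mean.sum()
    return float(p_mean @ row_divergences(channel, p_mean)), p_mean

# Hypothetical neuron: mean responses from 1 to 40, variance equal to the mean
means = np.linspace(1.0, 40.0, 40)
channel = truncated_gaussian_channel(means, log_a=0.0, b=1.0, r_max=45.0)
capacity, p_opt = blahut_arimoto(channel)
```

For a two-dimensional code, the same iteration could be run over a flattened grid of mean pairs; reshaping the resulting distribution and summing over the second axis would then give the projection onto the first axis.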
Role of further principal components
Throughout this study we limited our analysis to two principal
components. The reason was practical: using a three-component code
increases the computational burden beyond the resources currently available to us. We can, however, address the issue of whether the use
of more principal components in the response representation can be
expected to lead to substantially higher estimates of information or
channel capacity. Successive principal components, by definition, account for successively smaller portions of the response variance, that is, their range decreases. This decrease is rapid in our data.
Therefore their information content (and so their contribution to
channel capacity) must decline unless the noise associated with each
principal component also decreases with the range. Figure 12 shows that this decrease does not
happen in our data. Each column shows the distribution (across 54 cells) of the signal-to-noise ratio (the variance of mean responses to
stimuli divided by the median variance of responses to single stimuli)
for spike count or one of the first 10 principal components. A
signal-to-noise ratio less than one means that the variability of
responses to a given stimulus is greater than the variability of mean
responses to different stimuli, so the responses distinguish only
poorly among the stimuli that elicited them. Thus the third principal component will contribute much less channel capacity than the first and
second principal components, and the fourth and higher principal
components are expected to contribute insignificantly if at all. We
verified that, as expected, the fifth through tenth principal
components contribute no information not redundant with information in
the first principal component. Thus our code using only the first and
second principal components should carry a very large
proportion of the information available in the responses.
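The signal-to-noise ratio defined above (the variance of mean responses to stimuli divided by the median variance of responses to single stimuli) can be sketched as follows. The sketch assumes sample variances with `ddof=1` and principal components obtained by singular value decomposition of mean-centered response waveforms; the function names and the synthetic data are illustrative and do not reproduce the recorded responses.

```python
import numpy as np

def pc_coefficients(waveforms, n_components):
    """Project response waveforms (trials x time bins) onto their principal components."""
    centered = waveforms - waveforms.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def signal_to_noise(values, stimulus_ids):
    """Variance of per-stimulus means over the median per-stimulus variance."""
    labels = np.unique(stimulus_ids)
    means = np.array([values[stimulus_ids == s].mean() for s in labels])
    variances = np.array([values[stimulus_ids == s].var(ddof=1) for s in labels])
    return means.var(ddof=1) / np.median(variances)

# Synthetic data: 8 stimuli x 20 trials, one stimulus-dependent waveform plus noise
rng = np.random.default_rng(0)
n_stimuli, n_trials, n_bins = 8, 20, 50
stimulus_ids = np.repeat(np.arange(n_stimuli), n_trials)
template = np.sin(np.linspace(0.0, np.pi, n_bins))
gains = np.linspace(1.0, 8.0, n_stimuli)[stimulus_ids]
noise = rng.normal(0.0, 1.0, (n_stimuli * n_trials, n_bins))
waveforms = gains[:, None] * template[None, :] + noise

coeffs = pc_coefficients(waveforms, n_components=10)
snr = [signal_to_noise(coeffs[:, k], stimulus_ids) for k in range(10)]
```

In this synthetic example only the first component carries stimulus-related signal, so its ratio is large while the later components fall below one.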
| |
DISCUSSION |
|---|
|
|
|---|
Here we have constructed a model of neuronal responses, based on
principal component analysis, that includes both response intensity
(firing rate) and a low-precision (±10 ms) representation of temporal
modulation. This temporal precision and this code have been shown to
carry a very large portion of the information useful for the
identification of statically presented two-dimensional stimuli
(Heller et al. 1995; McClurkin et al. 1991; Optican and Richmond 1987; Richmond and Optican 1990; Tovee et al. 1993