Theoretical studies of mammalian cortex argue that efficient neural codes should be sparse. However, theoretical and experimental studies have used different definitions of the term “sparse” leading to three assumptions about the nature of sparse codes. First, codes that have high lifetime sparseness require few action potentials. Second, lifetime-sparse codes are also population-sparse. Third, neural codes are optimized to maximize lifetime sparseness. Here, we examine these assumptions in detail and test their validity in primate visual cortex. We show that lifetime and population sparseness are not necessarily correlated and that a code may have high lifetime sparseness regardless of how many action potentials it uses. We measure lifetime sparseness during presentation of natural images in three areas of macaque visual cortex, V1, V2, and V4. We find that lifetime sparseness does not increase across the visual hierarchy. This suggests that the neural code is not simply optimized to maximize lifetime sparseness. We also find that firing rates during a challenging visual task are higher than theoretical values based on metabolic limits and that responses in V1, V2, and V4 are well-described by exponential distributions. These findings are consistent with the hypothesis that neurons are optimized to maximize information transmission subject to metabolic constraints on mean firing rate.
- efficient coding
- redundancy reduction
- natural vision
It has long been argued that neural representations in sensory cortex are optimized to efficiently represent natural stimuli (Attneave 1954; Barlow 1961). This “efficient coding hypothesis” is compelling because it provides a theoretical framework for understanding the evolutionary and developmental constraints on neural systems. Moreover, it raises the possibility that there may be a single theoretical principle that determines the structure of neural codes across many species and sensory systems.
A particularly influential form of the efficient coding hypothesis is “sparse coding” (Field 1987; Olshausen and Field 1996; Rolls and Tovee 1995). Broadly speaking, sparse codes are ones in which metabolically expensive action potentials occur rarely, resulting in a code that is both computationally (Treves and Rolls 1991) and metabolically (Levy and Baxter 1996) efficient. The alternative is a dense code that relies on more frequent action potentials and is therefore potentially less metabolically efficient. Experimental studies have provided evidence for sparse coding across multiple sensory modalities: vision (Rolls and Tovee 1995; Vinje and Gallant 2000; Weliky et al. 2003), audition (DeWeese et al. 2003), and olfaction (Perez-Orive et al. 2002; Stopfer 2007). These studies suggest that sparse coding may reflect a general principle of sensory processing across sensory modalities and species.
This simple definition of a sparse code, one in which action potentials are relatively rare, is a convenient way to refer to a broad concept. However, to accurately model systems that use sparse neural codes, or to measure and compare the sparseness of real neural codes, we need a more precise, formal definition. Unfortunately, different studies have defined sparseness in different ways, and, as a result, it is difficult to compare sparseness measurements across studies and to compare models of sparse codes with real neural codes. Moreover, the seemingly small differences between these definitions can have profound theoretical implications. Specifically, it is possible that real neural codes are sparse according to some definitions but not according to others.
This paper examines four definitions of sparseness that frequently appear in the current literature. The first defines a sparse code as one in which neurons have a low mean firing rate (we will refer to this as “low mean firing rate”). This is the simplest definition, and it has an obvious intuitive connection with the term “sparse”. This definition is commonly used colloquially, and it has also been used in quantitative investigations of sparseness (Assisi et al. 2007; Hahnloser et al. 2002). Codes with low mean firing rates have an obvious advantage: action potentials require substantial energy, so codes with lower firing rates have lower metabolic requirements.
The second defines a sparse code as one in which the population response distribution elicited by each stimulus is peaked (we will refer to this as “population sparseness”). A peaked distribution is one that contains many small (or 0) values and a small number of large values. Thus a neural code will have high population sparseness if only a small proportion of neurons are strongly active at any given time. It has been suggested that neural codes with high population sparseness are advantageous because they resemble the inherently sparse structure of the sensory environment (Field 1987). Kurtosis is the standard statistical measure of the peakedness of distributions; however, kurtosis is not appropriate for real neural data, where response distributions are nonnegative. Rolls and Tovee (1995) and Vinje and Gallant (2000) provide alternative measures that can, in principle, be used to quantify the peakedness of nonnegative neuronal response distributions. However, accurate measurement of population sparseness is difficult. Population sparseness depends on the correlations between neural responses (Willmore and Tolhurst 2001), and accurate estimation of those correlations requires a dense survey of the responses of many single neurons. This will inevitably include both neurons with similar receptive fields (for which responses are likely to be correlated) and neurons with dissimilar receptive fields (for which responses are likely to be uncorrelated or even anticorrelated). Since true population sparseness is difficult or even impossible to measure, lifetime sparseness (defined below) is often used as a proxy for population sparseness.
The third definition of a sparse code is one in which individual neurons have peaked lifetime response distributions (we will refer to this as “lifetime sparseness”). A neuron will have high lifetime sparseness if it is silent most of the time but occasionally produces strong responses. Lifetime sparseness is superficially similar to population sparseness because both are measures of the peakedness of response distributions. However, these two types of sparseness are fundamentally different. Lifetime sparseness refers to the distribution of responses of a single neuron across many stimuli, whereas population sparseness refers to the distribution of responses of a population of neurons to a single stimulus. Willmore and Tolhurst (2001) observed that lifetime and population sparseness are effectively decoupled; a neural code may be lifetime-sparse but not population-sparse or vice versa.
Lifetime sparseness was used in two important theoretical investigations of sparseness (Bell and Sejnowski 1997; Olshausen and Field 1996) and has also been applied to experimental data (e.g., Baddeley et al. 1997; Rolls and Tovee 1995; Vinje and Gallant 2000). It can be calculated using the same measures mentioned above for population sparseness. The theoretical advantages of other types of sparseness (redundancy minimization, minimization of mean firing rates, and population sparseness) have also been ascribed to lifetime sparseness. However, these theoretical advantages are not directly conferred by lifetime sparseness, whose true benefits have not yet been articulated clearly.
The final definition of a sparse code is one that maximizes the information transmitted by each action potential (we refer to this as “information per spike”). This idea is intuitively related to efficient neural coding, but its relationship to lifetime sparseness is less clear. Levy and Baxter (1996) suggested that the high metabolic cost of action potential generation may impose an upper limit on the mean sustainable firing rate for a population of neurons. Therefore, to make efficient use of the limited number of available action potentials, each neuron should transmit the maximum possible information with each action potential. Information theory (Cover and Thomas 1991) shows that, for a fixed mean firing rate, maximizing information per spike results in an exponential response distribution. Exponential distributions have high peaks and heavy tails, so if neurons maximize information per spike, they will exhibit exponential response distributions with moderate lifetime sparseness.
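The maximum-entropy argument behind this statement can be sketched as follows. This is the standard textbook derivation, not a reproduction of any calculation in this paper. It treats the response r as a continuous, nonnegative variable with fixed mean μ (both simplifying assumptions), and α and β are Lagrange multipliers introduced here for illustration.

```latex
\begin{aligned}
&\text{maximize } H[P] = -\int_0^\infty P(r)\,\ln P(r)\,dr
\quad\text{subject to}\quad
\int_0^\infty P(r)\,dr = 1,\;\; \int_0^\infty r\,P(r)\,dr = \mu \\[4pt]
&\frac{\partial}{\partial P}\Big[-P\ln P - \alpha P - \beta\, r P\Big] = 0
\;\;\Rightarrow\;\; P(r) = e^{-1-\alpha}\,e^{-\beta r} \\[4pt]
&\text{the two constraints give } \beta = \frac{1}{\mu},\;\; e^{-1-\alpha} = \frac{1}{\mu}
\;\;\Rightarrow\;\; P(r) = \frac{1}{\mu}\,e^{-r/\mu}
\end{aligned}
```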
Although all the definitions provided above refer to sparseness, each refers to a different aspect of neural coding, each is measured in a different way, and each has different theoretical implications. They are similar, however, in that they suggest that action potentials should be relatively rare events. Because of this similarity, it is often assumed that there are no significant differences between the definitions. Therefore, the first aim of this paper is to show that these definitions refer to genuinely different aspects of neural coding. Specifically, we will argue that lifetime-sparse neural codes do not necessarily have low mean firing rates nor do they necessarily have high population sparseness. To demonstrate this, we provide a taxonomy of different theoretical constraints on neural codes and use this to clarify the differences between these constraints.
The second aim of this paper is to test the hypothesis that neural codes in the visual system are optimized to maximize lifetime sparseness. If this is true, then neural codes should be as sparse as possible at each level of the visual hierarchy. In V1, the code used by simple cells is approximately linear, and it seems to be as lifetime-sparse as a linear code can be. Beyond V1, the representation becomes progressively less linear, and it is possible (in principle) for neurons to form codes with much higher lifetime sparseness. Thus, if the lifetime sparseness maximization hypothesis is correct, we should expect the codes in V2 and V4 to have higher lifetime sparseness than the code in V1. To test this hypothesis, we compare the lifetime sparseness of neural codes in three visual areas of the macaque, V1, V2, and V4, under naturalistic conditions.
The final aim of this paper is to test the hypothesis that visual cortex maximizes information transmission per spike. This hypothesis makes different predictions about the structure of neural codes: that their mean firing rates should be constrained and that their response distributions should be exponential in shape. To test this hypothesis, we compare firing rates in V1, V2, and V4 in a challenging naturalistic visual task with theoretical limits and measure the shape of response distributions in these areas.
MATERIALS AND METHODS
Neurophysiological recordings were made from several visual areas in 4 adult male macaques (Macaca mulatta) using 2 different experimental protocols. A head positioner, search coil (in 1 animal), and recording platform were placed using sterile surgical techniques under isoflurane anesthesia [see Mazer and Gallant (2003) for details]. Areas V1, V2, and V4 were located by exterior cranial landmarks and/or direct visualization of the lunate sulcus, and location was confirmed by comparing receptive field properties and response latencies with those reported previously (Desimone and Schein 1987; Gattass et al. 1981; Schmolesky et al. 1998).
All procedures were approved by the Animal Care and Use Committee of the University of California, Berkeley, and were conducted in strict accordance with good practice as defined by the Office of Laboratory Animal Care at UC Berkeley, the National Institutes of Health, the Society for Neuroscience, and the American Association for Laboratory Animal Science.
Visual Search Task
Firing rate measurements were made from neurons in area V4 of two animals while they performed a free-viewing visual search task (see Fig. 1A; Mazer and Gallant 2003). The opportunity to perform the task was cued by presentation of a textured noise pattern. Animals initiated each trial by grasping a touch bar. A search target was then presented at the center of a 21-in. cathode ray tube (CRT; Viewsonic PS790; 37- or 45-cm viewing distance). Each search target was a circular image patch extracted from a commercial digital library (Corel). The radius of the patch was set to be equal to the size of the spatial receptive field, up to a maximum radius of 5°. (Receptive fields of recorded V4 neurons ranged from 0.5 to 16° in radius, with a mean of 4.6°.) Each patch was converted to grayscale and blended smoothly into a background pattern consisting of 1/f-filtered white noise or 1/f random texture.
Animals were given 2–4 s to inspect the search target using voluntary eye movements. This was followed by a 2- to 4-s delay during which only the textured background pattern was visible. Then, an array of 4–25 potential match stimuli was presented (these were also blended into the background pattern). Each array remained on the monitor for 2–5 s. If the search target appeared anywhere in the array, the animal had to release the touch bar no later than 500 ms after array offset to receive liquid reward. If the target was not present in the search array, a new array was presented after a delay of 2–3 s. The delay-array sequence was repeated 1–7 times (with a uniform probability distribution), and the final array on each trial always contained the search target. Failure to detect the target was indicated by an error tone followed by a brief time-out period. Trials lasted between 2 and 42 s.
Lifetime sparseness measurements were made from neurons in V1, V2, and V4 during performance of a simple fixation task (see Fig. 1B; see Willmore et al. 2010 for full details). Eye position was monitored during task performance, and trials in which eye position deviated >0.5° from the fixation spot were excluded from further analysis. The standard deviation of fixational eye movements was typically 0.1°.
First, the receptive field of each neuron was identified using manual mapping, followed by quantitative mapping using sparse noise stimuli. Then, each neuron was probed with a rapidly changing sequence of natural images. Each image was a circular patch extracted from a commercial digital library (Corel). Patches were chosen by an automated algorithm that selected patches at random, favoring those with high contrast (to reduce the frequency of blank stimuli, e.g., patches of sky). Patches were converted to grayscale and adjusted with a gamma nonlinearity of 2.2 to give an appropriate luminance profile on our calibrated, linearized display. The outer edges of the patches (10% of the radius) were blended smoothly into the neutral gray background, for which luminance was chosen to match the mean luminance of the image sequence.
Random image sequences were constructed by concatenating images extracted according to the procedure described above. In areas V1 and V2, the image switched on every frame, giving a presentation rate of 60 Hz (72 Hz in a few cases). In area V4, the image changed on alternate frames, giving a presentation rate of 30 Hz. All images were centered on the receptive field. To ensure that the relative activations of receptive field center and surround were not grossly different between areas, the patch size was adjusted to be 2–4 times the receptive field diameter in V1 and V2 and 1–2 times the receptive field diameter in V4. The entire sequence was broken into 3- to 5-s segments, and 1 segment was presented on each fixation trial. To avoid transient trial onset effects, the 1st 200 ms of data acquired on each trial were discarded before analysis. The total number of images presented to each neuron varied between 4,000 and 80,000.
In both tasks, behavioral control, stimulus presentation, and data collection were performed on a Linux microcomputer using custom software (PyPE). Eye movements were recorded either with a scleral search coil (1 animal, 1 kHz; Judge et al. 1980) or an infrared eye tracker (120 Hz: RK-801, ISCAN, Burlington, MA; or 500 Hz: EyeLink II, SR Research, Toronto, Canada). Latencies associated with the video-based trackers were compensated during offline analysis (Gawne and Martin 2000). Single neuron responses were recorded using high-impedance (nominally 10–25 MΩ), epoxy-coated tungsten microelectrodes (125-μm diameter, 20–25° taper; Frederick Haer, Brunswick, ME). A microdrive system (MM-3BF; National Aperture, Nashua, NH) was used to advance electrodes through the intact dura perpendicular to the cortical surface. In early experiments, signals were amplified (Model 1800; A-M Systems, Seattle, WA) and band-pass filtered (0.1–10 kHz; custom filter), and spikes were isolated using a conventional window discriminator. In later experiments, neural signals were recorded with a dedicated multichannel recording system (amplification, filtering, and spike detection in a single unit; MAP; Plexon, Dallas, TX). In these experiments, spike times were recorded with 1-ms resolution by the same computer system used to control the behavioral task and to monitor eye movements.
Firing rate measurement.
For each neuron, we measured instantaneous firing rate, r, by counting spikes in bins corresponding to the monitor refresh rate (usually 16.7 ms; 13.3 ms in 15 cases). In cases where stimuli were presented more than once, we took the mean of the responses to multiple presentations.
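The binning and averaging described above might be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the function names, the use of spike times in seconds, and the default 1/60-s bin are assumptions made for the example.

```python
import numpy as np

def binned_rate(spike_times_s, duration_s, bin_s=1 / 60):
    """Count spikes in frame-length bins and convert counts to spikes/s."""
    n_bins = int(round(duration_s / bin_s))
    edges = np.arange(n_bins + 1) * bin_s          # bin edges in seconds
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts / bin_s

def mean_over_repeats(trials, duration_s, bin_s=1 / 60):
    """Average the binned rate over repeated presentations of one sequence.

    `trials` is a list of spike-time arrays, one per presentation.
    """
    rates = [binned_rate(t, duration_s, bin_s) for t in trials]
    return np.mean(rates, axis=0)
```

For a 72-Hz display the same functions would be called with `bin_s=1 / 72`.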
Lifetime sparseness measurement.
We define the sparseness, S, of a response distribution, P(r), as follows:

$$S = 1 - \frac{E[r]^2}{E[r^2]} \tag{1}$$

where E[·] is the expectation value. This definition is identical to that used by Vinje and Gallant (2000). This measure is 0 for a dense code and increases to 1 for a sparse code. [It is a rescaling of that used by Treves and Rolls (1991).] We measured the lifetime sparseness, $S_L$, of each neuron by calculating the sparseness of its responses to all presented stimuli. Individual estimates of $S_L$ were then used to calculate mean lifetime sparseness across all n neurons:

$$\bar{S}_L = \frac{1}{n}\sum_{i=1}^{n} S_{L,i} \tag{2}$$

where $S_{L,i}$ is the lifetime sparseness of neuron i.
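As an illustration, the measure can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' code. It assumes the sparseness measure has the form S = 1 − E[r]²/E[r²], which is the form consistent with the binary and exponential cases discussed later in the text, and the function names are hypothetical.

```python
import numpy as np

def sparseness(r):
    """Sparseness of a set of responses: 1 - E[r]^2 / E[r^2] (assumed form)."""
    r = np.asarray(r, dtype=float)
    mean_sq = np.mean(r ** 2)
    if mean_sq == 0:
        return float("nan")        # undefined when every response is zero
    return 1.0 - np.mean(r) ** 2 / mean_sq

def mean_lifetime_sparseness(responses_per_neuron):
    """Average the per-neuron lifetime sparseness over a sample of neurons."""
    return float(np.mean([sparseness(r) for r in responses_per_neuron]))
```

A neuron that responds identically to every stimulus gives 0, and a neuron that responds to one stimulus in four gives 0.75.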
Adjustment for spontaneous activity.
There is no consensus about how spontaneous activity should be treated in neural coding analyses. From a theoretical perspective, subtracting spontaneous activity from measured firing rates should not alter coding efficiency, because, by definition, spontaneous rate is the same for all stimuli for a given neuron. However, if we are to consider the role of metabolic demands in shaping neural codes, spontaneous activity must be included because all action potentials, even spontaneous ones, consume energy. To determine whether our results were affected by spontaneous activity, we characterized the response distributions both with and without subtraction of the spontaneous rate. To estimate the spontaneous rate, we calculated the mode (most commonly occurring value) of the responses of each neuron to the natural image set. Since most natural images do not elicit a response from most neurons (Smyth et al. 2003), the mode of the response distribution is an accurate estimate of the spontaneous firing rate. We calculated adjusted responses as follows:

$$r' = \max(r - \hat{r},\, 0) \tag{3}$$

where r̂ is the mode of the distribution. This adjustment removes the mode and sets to 0 any responses that were previously lower than the mode, modeling the response distribution that would have been produced if the neuron had no spontaneous activity. We found no qualitative difference between lifetime sparseness of the unadjusted responses, r, and the adjusted responses, r′. Therefore, we only report results for unadjusted responses.
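The adjustment described above might look like the following in code. This is a hedged sketch: it assumes responses are discrete spike counts per bin (so that a mode is well defined), it breaks ties in favor of the smallest value, and the function name is hypothetical.

```python
import numpy as np

def adjust_for_spontaneous(r):
    """Subtract the modal response and clip negative values to zero.

    Returns the adjusted responses and the mode used as the
    spontaneous-rate estimate.
    """
    r = np.asarray(r)
    values, counts = np.unique(r, return_counts=True)
    mode = values[np.argmax(counts)]      # most common response value
    return np.maximum(r - mode, 0), mode
```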
To characterize the shape of the response distribution for each neuron, we fit the logarithm of the response distribution, P′(r) = ln[P(r)], with a quadratic function relating P′(r) and r:

$$P'(r) = k + a\,r + b\,r^2 \tag{4}$$

where k, a, and b are constants.
An exponential distribution is perfectly fit by the 1st 2 terms in this equation. Therefore, the 3rd parameter, b, is an estimate of the 1st-order deviation of the distribution from an exponential. Positive values indicate a concave distribution; negative values indicate a convex distribution. (No neuron produced >8 spikes in any 16-ms bin, and there were at most 9 data points for each neuron. Given these data, including higher-order terms would likely result in overfitting of random fluctuations in the distribution shapes. Therefore, higher-order terms were not included in the model.) For comparison, we also fit a linear model consisting of only the 1st 2 terms (i.e., in which b = 0).
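One way to carry out such a fit is sketched below. It is an illustration under stated assumptions, not the authors' procedure: responses are taken to be integer spike counts per bin, empty histogram bins are skipped (their logarithm is undefined), an unweighted least-squares fit via `numpy.polyfit` is used, and the function name is hypothetical.

```python
import numpy as np

def fit_log_distribution(spike_counts):
    """Fit ln P(r) with k + a*r + b*r^2 and, for comparison, with k + a*r."""
    counts = np.bincount(np.asarray(spike_counts, dtype=int))
    r = np.nonzero(counts)[0]                  # spike counts that occurred
    log_p = np.log(counts[r] / counts.sum())   # log of empirical probability
    b, a, k = np.polyfit(r, log_p, 2)          # highest power is returned first
    a_lin, k_lin = np.polyfit(r, log_p, 1)
    return {"quadratic": (k, a, b), "linear": (k_lin, a_lin)}
```

For a distribution that is exactly exponential over the observed counts, the fitted b is zero up to numerical precision, and the quadratic and linear fits coincide.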
Different Definitions of Sparseness
The first aim of this study was to distinguish between several definitions of sparseness that have been proposed in the literature. Figure 2 presents a sparseness taxonomy, and in the following section we demonstrate theoretically that these definitions reflect genuinely different ideas that cannot simply be grouped together under a single rubric of sparseness.
High lifetime sparseness does not imply a low mean firing rate.
The first subdivision in Fig. 2 distinguishes overall activity (as measured by mean firing rate) from measures of response distribution shape: lifetime sparseness, population sparseness, and maximization of information per spike. The mean firing rate of a neuron is a metabolically important quantity. Theoretical studies (Attwell and Laughlin 2001; Lennie 2003) have suggested that the mean firing rates of neurons may be limited by the rate of ATP production. Attwell and Laughlin (2001) propose a limit of 4 Hz; Lennie (2003) proposes a limit between 0.16 and 0.94 Hz.
Constraints on mean firing rate and on response distribution shape are superficially related because they both imply limits on firing patterns. However, this relationship is only superficial. Mean firing rate is a measure of the central tendency of the neural response distribution, whereas lifetime and population sparseness (e.g., kurtosis and the Vinje-Gallant metric) are measures of the distribution's shape. In general, a distribution's central tendency and shape need not be related.
The difference between mean rate and lifetime sparseness is demonstrated in Fig. 3, which shows three schematic response distributions. The two distributions depicted with solid lines have the same mean but differ in shape. As a result of their different shapes, these two distributions also exhibit very different lifetime sparseness using the Vinje-Gallant sparseness measure (see MATERIALS AND METHODS). This means that two neurons with the same mean firing rate can have different lifetime sparseness. Conversely, the two Gaussian distributions (solid and dotted lines) have different means. However, since both are Gaussian, they have the same shape and therefore the same lifetime sparseness. So, two neurons with the same lifetime sparseness may have different mean firing rates. This demonstrates that mean firing rate and lifetime sparseness need not be related.
The decoupling of mean firing rate and lifetime sparseness can be demonstrated formally: if one multiplies the responses of a neuron by a constant, c, the lifetime sparseness is unchanged:

$$S(cr) = 1 - \frac{E[cr]^2}{E[(cr)^2]} = 1 - \frac{c^2\,E[r]^2}{c^2\,E[r^2]} = S(r) \tag{5}$$
One can therefore arbitrarily scale a response distribution (and thereby arbitrarily change the mean firing rate) without altering its lifetime sparseness.
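This scale invariance is easy to check numerically. The sketch below is illustrative only: it assumes the sparseness measure has the form S = 1 − E[r]²/E[r²], and it uses simulated exponential responses with an arbitrary random seed in place of recorded data.

```python
import numpy as np

def sparseness(r):
    """1 - E[r]^2 / E[r^2] (assumed form of the sparseness measure)."""
    r = np.asarray(r, dtype=float)
    return 1.0 - np.mean(r) ** 2 / np.mean(r ** 2)

rng = np.random.default_rng(0)
r = rng.exponential(scale=1.0, size=10_000)   # simulated responses

# Scaling the responses changes the mean rate but not the sparseness.
for c in (0.1, 1.0, 25.0):
    print(f"c={c:5.1f}  mean={np.mean(c * r):8.3f}  S={sparseness(c * r):.4f}")
```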
Equation 5 shows that even though lifetime sparseness and mean firing rate appear to be superficially similar coding properties, they are in fact quite different. They also have different implications. Mean firing rate is a measure of overall responsiveness, which has obvious implications for metabolic efficiency. In contrast, lifetime sparseness is dependent on the shape of the response distribution. This shape may have implications for the information content of neural codes but has no direct effect on metabolic efficiency. Because lifetime and population sparseness are identical measures of shape (but applied to different response distributions), the same argument also applies to population sparseness: neural codes with high population sparseness do not necessarily have low mean firing rates.
Lifetime sparseness and mean firing rate are equivalent for binary responses.
It is worth considering the relationship between mean firing rate and lifetime sparseness in the special case where responses are binary. In neurophysiology experiments, binary responses most commonly arise when action potentials are recorded in response to a single presentation of a stimulus and data are binned at a temporal resolution comparable with the discharge rate of the neuron. Under these conditions, the maximum number of spikes that can occur in any bin is one. [However, DeWeese et al. (2003) observed that responses of neurons in rat auditory cortex were binary over longer durations.] Also, the shape of any binary response distribution will be very different from the continuous distributions associated with sparse coding.
The expected value of a binary response distribution is equal to the proportion of responses, p, equal to 1: $E[r] = n_{r=1}/n_{\mathrm{total}} = p$. Because $E[r^2] = E[r] = p$, the lifetime sparseness of a binary response distribution reduces to:

$$S_L = 1 - \frac{E[r]^2}{E[r^2]} = 1 - \frac{p^2}{p} = 1 - p \tag{6}$$
This equation suggests that a system with a binary response distribution will only have high lifetime sparseness when responses are rare. Thus, for a binary response distribution, the lifetime sparseness is inversely related to the mean firing rate. This relationship is unusual. Over longer, physiologically relevant timescales, neuronal response rates vary continuously, so lifetime sparseness and mean firing rate are not related so closely.
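The binary case can be illustrated with simulated data. This is a sketch under the assumption that the sparseness measure is S = 1 − E[r]²/E[r²]; the function names, sample size, and seed are arbitrary choices for the example.

```python
import numpy as np

def sparseness(r):
    """1 - E[r]^2 / E[r^2] (assumed form of the sparseness measure)."""
    r = np.asarray(r, dtype=float)
    return 1.0 - np.mean(r) ** 2 / np.mean(r ** 2)

def binary_case(p, n=50_000, seed=1):
    """Simulate n binary responses with spike probability p per bin.

    Returns the observed spike proportion and the resulting sparseness.
    """
    rng = np.random.default_rng(seed)
    r = (rng.random(n) < p).astype(float)
    return r.mean(), sparseness(r)

for p in (0.05, 0.25, 0.75):
    p_hat, s = binary_case(p)
    print(f"p_hat={p_hat:.3f}  S={s:.3f}  1-p_hat={1 - p_hat:.3f}")
```

For binary data the sparseness equals one minus the observed spike proportion, so the last two printed columns agree.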
Spontaneous activity affects both mean firing rate and lifetime sparseness.
A second important relationship between mean firing rate and lifetime sparseness is that both are strongly affected by the presence of spontaneous activity. Consider the following simple model of a neural response distribution, which captures the basic shapes of the real response distributions we observed (Fig. 4):

$$P(r) = k\,e^{-\lambda\,|r - r_0|}, \qquad r \ge 0 \tag{7}$$
The exponential distribution with parameter λ approximates the response distribution of a linear filter with sparse responses. For example, the responses of a rectified Gabor filter to natural scene stimuli are typically exponentially distributed. The spontaneous rate is modeled as r0, and k is a normalization term, k = λ/[2 − exp(−λr0)]. Any increase in the spontaneous rate, r0, will increase the mean firing rate. Because the number of relatively small responses decreases, it will also reduce the lifetime sparseness of the distribution.
Example distributions produced by this model are shown in Fig. 5, A–C. The only difference between these 3 distributions is the amount of spontaneous activity, r0. When r0 = 0 (Fig. 5A), the mean firing rate, r̄ = 1.04, and the lifetime sparseness, $S_L$ = 0.52 (the precise value for an exponential is ½). When r0 increases to 2 (Fig. 5C), the mean rate, r̄, increases to 2.44, and the lifetime sparseness decreases to 0.20. Clearly, spontaneous activity increases mean firing rate and reduces lifetime sparseness. Thus any variation in spontaneous activity level will result in anticorrelation between mean firing rate and lifetime sparseness. This relationship is summarized in Fig. 5D.
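The qualitative trend can be reproduced numerically. The sketch below assumes the model density is proportional to exp(−λ|r − r0|) for r ≥ 0, which is the form implied by the stated normalization k = λ/[2 − exp(−λr0)]. It also assumes λ = 1 and a sparseness measure of the form 1 − E[r]²/E[r²]. The numbers it produces depend on these assumptions and need not match the values reported for Fig. 5.

```python
import numpy as np

def model_stats(r0, lam=1.0, r_max=60.0, n=600_001):
    """Mean rate and sparseness of the assumed density k*exp(-lam*|r - r0|), r >= 0."""
    r = np.linspace(0.0, r_max, n)
    dr = r[1] - r[0]
    w = np.full(n, dr)
    w[0] = w[-1] = dr / 2                   # trapezoid-rule weights
    p = np.exp(-lam * np.abs(r - r0))
    p /= np.sum(w * p)                      # numerical stand-in for the constant k
    mean = np.sum(w * r * p)
    mean_sq = np.sum(w * r ** 2 * p)
    return mean, 1.0 - mean ** 2 / mean_sq

for r0 in (0.0, 1.0, 2.0):
    mean, s = model_stats(r0)
    print(f"r0={r0:.0f}  mean={mean:.3f}  S={s:.3f}")
```

Under these assumptions the mean rises and the sparseness falls as r0 increases, which is the anticorrelation described in the text.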
Lifetime sparseness and population sparseness are simply related only if neural responses are independent and identically distributed.
The most important subdivision between measures of response shape (Fig. 2) is between population and lifetime measures. In particular, lifetime and population sparseness are different coding properties (Willmore and Tolhurst 2001). Here, we review this distinction briefly and show that lifetime and population sparseness are likely to be significantly different in typical neurophysiology data sets.
Consider the schematic neural population in Fig. 6A. We can measure the sparseness of the responses of this population in (at least) two different ways. First, we can measure the distribution of the responses of a single neuron to a set of stimuli and then measure the sparseness of this lifetime response distribution. Repeating this measurement for all neurons and averaging the result gives a single value that we refer to as lifetime sparseness:

$$S_L = \frac{1}{m}\sum_{i=1}^{m}\left[1 - \frac{\mu_i^2}{\mu_i^2 + \sigma_i^2}\right] \tag{8}$$

where m is the number of neurons, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the responses of each neuron. Lifetime sparseness can be measured experimentally using standard single-neuron recording techniques (Baddeley et al. 1997; Vinje and Gallant 2000).
Alternatively, one can take the distribution of responses of the whole neuron population in response to a single stimulus and measure their sparseness. Repeating this for all stimuli and averaging the result produces a single value that we refer to as population sparseness:

$$S_P = \frac{1}{n}\sum_{j=1}^{n}\left[1 - \frac{\mu_j^2}{\mu_j^2 + \sigma_j^2}\right] \tag{9}$$

where n is the number of stimuli, and $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the population response to each stimulus. Unlike lifetime sparseness, it is technically challenging to measure population sparseness directly because it requires recording from large numbers of single neurons (ideally simultaneously). It is therefore tempting to use lifetime sparseness as an estimate of population sparseness. However, although lifetime and population sparseness are both normalized measures of distribution shape, they are not generally equivalent. In fact, they are not even closely related.
Lifetime and population sparseness are only equivalent when the neuronal responses are independent and identically distributed (i.i.d.); in this special case, $\mu_i = \mu_j = \mu$ and $\sigma_i = \sigma_j = \sigma$, and so $S_L = S_P$ (Eqs. 8 and 9). However, when neural responses are correlated (as they usually are), lifetime and population sparseness are no longer equivalent. To illustrate this, consider the example shown in Fig. 6A. Here, the responses of different neurons are uncorrelated, and lifetime and population sparseness are closely related. However, in Fig. 6B, all neurons respond to the same single stimulus, so their responses are strongly correlated. As a result, the population sparseness is zero even though the lifetime sparseness is high.
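The distinction can be made concrete with a small response matrix. The sketch below is an illustration, not the authors' analysis: it assumes a sparseness measure of the form 1 − E[r]²/E[r²], uses toy 5 × 5 matrices loosely modeled on the two schematic cases, and treats the sparseness of an all-zero response vector as undefined, excluding it from the average (a convention adopted here for the example).

```python
import numpy as np

def sparseness(responses, axis):
    """1 - E[r]^2 / E[r^2] along one axis; NaN where all responses are zero."""
    r = np.asarray(responses, dtype=float)
    mean = r.mean(axis=axis)
    mean_sq = (r ** 2).mean(axis=axis)
    ratio = np.full(mean.shape, np.nan)
    np.divide(mean ** 2, mean_sq, out=ratio, where=mean_sq > 0)
    return 1.0 - ratio

def lifetime_and_population(responses):
    """`responses` has shape (neurons, stimuli)."""
    s_l = np.nanmean(sparseness(responses, axis=1))   # across stimuli, per neuron
    s_p = np.nanmean(sparseness(responses, axis=0))   # across neurons, per stimulus
    return float(s_l), float(s_p)

# Each neuron responds to a different stimulus (uncorrelated responses).
uncorrelated = np.eye(5)
# All neurons respond to the same single stimulus (strongly correlated).
correlated = np.zeros((5, 5))
correlated[:, 0] = 1.0

print(lifetime_and_population(uncorrelated))
print(lifetime_and_population(correlated))
```

In the first case the two measures are equal; in the second, lifetime sparseness stays high while population sparseness drops to zero.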
Neural responses are not identically distributed.
If real neural response distributions were i.i.d., then lifetime and population sparseness would be equivalent, and so there would be no need to measure population sparseness directly. We therefore investigated whether the neurons in our sample were i.i.d.
Figure 4 provides examples of response distributions from each area, showing the range of different distributions. It is immediately clear that the responses are not identically distributed: the distributions have a wide range of means and shapes. To quantify these differences, we measured the mean response rate of each neuron and then measured the standard deviation of these values within each area. In V1, the standard deviation is 20.5 Hz (mean 18.5 Hz), in V2 it is 19.1 Hz (mean 21.7 Hz), and in V4 it is 22.1 Hz (mean 22.7 Hz). These values indicate very substantial variation about the mean firing rate in each area. In fact, several neurons in each area had mean rates above twice the average for the area. It is clear that this sample of neurons cannot be considered identically distributed. Since the neurons recorded here are not i.i.d., population sparseness cannot be accurately estimated from measurements of lifetime sparseness.
We did not assess the independence of the neural responses in our sample because this is likely to be critically dependent on the operation of short-range connections between neurons (Felsen et al. 2005; Schwartz and Simoncelli 2001). Accurate measurement of independence would therefore require simultaneous recording of neighboring neurons. Since the neurons in our sample were not recorded simultaneously, such estimates would almost certainly be biased.
Measurement of Lifetime Sparseness
The second aim of this study was to evaluate the hypothesis that the brain is optimized to maximize the lifetime sparseness of neural response distributions, as suggested by Olshausen and Field (1996) and Bell and Sejnowski (1997). This hypothesis predicts that neurons in each visual area should have the maximum possible lifetime sparseness.
In V1, the evidence suggests that neurons may indeed exhibit maximal lifetime sparseness. The organization and response properties of V1 simple cells suggest that they form a predominantly linear representation of visual input, apart from the nonlinearity of the spike generation function (Movshon et al. 1978). The receptive fields of simple cells closely resemble Gabor filters (Daugman 1980; Marcelja 1980). Gabor filters represent natural visual stimuli with lifetime sparseness that appears to be maximal for a linear representation (Field 1987; Willmore and Tolhurst 2001). This suggests that V1 simple cells are optimized to maximize lifetime sparseness, subject to the constraint that the representation must remain linear. This circumstantial evidence is supported by the theoretical work of Olshausen and Field (1996), which showed that sparseness maximization produces receptive fields similar to those of V1 simple cells.
Beyond V1, the representation becomes progressively less linear. It is therefore possible that neurons in relatively more central visual areas, like V2 and V4, might form a code with much higher lifetime sparseness than the simple-cell code found in V1. For example, V4 neurons may respond to combinations of edge segments that are more sparsely distributed in the visual world than are simple oriented edges. Extremely selective neurons found at the highest stages of visual processing might have a lifetime sparseness that approaches 1. In general, if the visual cortex is optimized to maximize lifetime sparseness, then we should find that lifetime sparseness increases between areas V1, V2, and V4.
Lifetime sparseness does not increase through the visual hierarchy.
To test whether neurons in relatively more central visual areas have relatively higher lifetime sparseness than those in earlier areas, we measured the lifetime sparseness of neurons in V1 (n = 47), V2 (n = 123), and V4 (n = 83) during passive fixation while natural image patches were presented in and around the receptive field of each neuron. This task is a good assay of lifetime sparseness for two reasons. First, the stimuli used in these experiments were patches extracted from images of natural scenes. This is important because lifetime sparseness depends on both the neurons and the statistical structure of the input. Therefore, lifetime sparseness values measured with synthetic stimuli are likely to be different from ecologically relevant values obtained with natural scenes. Second, we rapidly presented a very large set of stimuli (from 4,000 to 80,000) to each neuron. Presenting stimuli rapidly in this way enabled us to assess response distributions over a very large set of images, helping to ensure that estimated lifetime sparseness values accurately reflect the selectivity of each neuron for natural stimuli. Note, however, that because the images were presented rapidly in these experiments, the measured responses do not reflect the absolute firing rates that would be observed under natural viewing conditions.
Typical response distributions obtained in this experiment are shown in Fig. 4. If the lifetime sparseness maximization hypothesis is correct, then lifetime sparseness should increase from V1 to V2 to V4. In fact, there is little change in lifetime sparseness between these areas (Fig. 7). In V1 the median value is 0.88, in V2 it is 0.81, and in V4 it is 0.80. The change in lifetime sparseness from V1 to V2 is not significant (P = 0.07; V1: n = 47, V2: n = 123; Mann-Whitney U test). There is a small but significant decrease in lifetime sparseness between V1 and V4 (P < 0.01; V1: n = 47, V4: n = 83; Mann-Whitney U test), but this could be due to the slightly different stimulus paradigms used in the two areas. Stimuli were shown at 3× the diameter of the classical receptive field (CRF) of each neuron in V1 and V2 and at only 1.5× the CRF diameter in V4. In V1, larger stimuli produce responses with higher lifetime sparseness (Vinje and Gallant 2000). If the same effect occurs in V4, then it might account for the difference we observe here.
Nevertheless, it is clear that there is no increase in lifetime sparseness from V1 to V2 (where stimulus paradigms were identical), and it is unlikely that there is any increase from V1 to V4. We therefore conclude that there is no demonstrable increase in lifetime sparseness through the visual hierarchy, contrary to the prediction of the lifetime sparseness maximization hypothesis.
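For readers who want to reproduce this kind of comparison, the sketch below computes a lifetime sparseness value per neuron and the median across a sample. It assumes the widely used measure of Vinje and Gallant (2000), S = (1 - (Σr/n)² / (Σr²/n)) / (1 - 1/n); that choice, the function name `lifetime_sparseness`, and the toy responses are assumptions made for illustration, not a restatement of the analysis code used here.

```python
from statistics import median


def lifetime_sparseness(rates):
    """Sparseness of one neuron's responses across n stimuli.

    Returns 0 when the neuron responds equally to every stimulus and 1 when it
    responds to exactly one stimulus (assumed Vinje and Gallant 2000 form).
    """
    n = len(rates)
    if n < 2:
        raise ValueError("need responses to at least two stimuli")
    squared_mean = (sum(rates) / n) ** 2
    mean_square = sum(r * r for r in rates) / n
    if mean_square == 0:
        raise ValueError("sparseness is undefined when every response is zero")
    return (1.0 - squared_mean / mean_square) / (1.0 - 1.0 / n)


# Hypothetical responses (Hz) of three neurons to four stimuli; NOT data from the study.
toy_area = [
    [5.0, 5.0, 5.0, 5.0],   # dense: identical response to every stimulus
    [0.0, 0.0, 0.0, 8.0],   # maximally sparse: responds to one stimulus only
    [1.0, 2.0, 3.0, 10.0],  # intermediate selectivity
]
sparseness_values = [lifetime_sparseness(rates) for rates in toy_area]
area_median = median(sparseness_values)
```

Medians computed this way for each area are the quantities compared across V1, V2, and V4; the significance test itself (Mann-Whitney U) is left out of the sketch.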
Because we presented images rapidly (30 or 60 Hz) in this experiment, it is possible that the response to each image might have been affected by neighboring images in the sequence (Keysers et al. 2001; Macknik and Livingstone 1998). However, presenting stimuli rapidly enabled us to assess the response distributions over a sufficiently large set of images, and on balance we believe that the resulting lifetime sparseness values are a good estimate of the selectivity of each neuron for natural stimuli.
Another potential problem with this analysis is that lifetime sparseness estimates can be biased by spontaneous firing rates. If spontaneous rates were higher in V4 than V1, then this might account for the relatively lower lifetime sparseness found in V4. To control for this possibility, we adjusted the stimulus-evoked responses by first subtracting the mode and then setting response rates less than 0 to 0. This adjustment provides a reasonable estimate of the response distribution that would be obtained from each neuron if it did not have any spontaneous activity. Although this adjustment increased the lifetime sparseness of a few neurons that had very low lifetime sparseness to begin with, it had a negligible effect on the median lifetime sparseness in any cortical area (overall lifetime sparseness change <0.01). Thus our conclusion that lifetime sparseness does not increase from V1 through V4 does not appear to be affected by small differences in spontaneous activity between these areas.
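The spontaneous-rate adjustment described above amounts to a shift followed by rectification. The sketch below assumes that responses are discrete values (for example, spike counts or binned rates) so that the mode is well defined; the function name `remove_spontaneous` and the example responses are hypothetical.

```python
from statistics import mode


def remove_spontaneous(responses):
    """Subtract the modal response, then set any negative result to zero.

    Assumption: responses are discrete (spike counts or binned rates). On
    Python 3.8+ statistics.mode returns the first mode encountered if several
    values tie; for continuous rates the data would need to be binned first.
    """
    baseline = mode(responses)
    return [max(r - baseline, 0.0) for r in responses]


# Hypothetical responses of one neuron; NOT data from the study.
raw = [4.0, 4.0, 4.0, 6.0, 10.0, 4.0, 30.0]
adjusted = remove_spontaneous(raw)
```

Lifetime sparseness can then be recomputed on the adjusted responses and compared with the value obtained from the raw responses.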
Information per Spike
The third aim of this study was to evaluate the hypothesis that the visual cortex is optimized to maximize information per spike. Both Levy and Baxter (1996) and Baddeley et al. (1997) proposed this as an alternative to sparseness maximization. If this hypothesis is correct, we should expect that mean firing rates are subject to metabolic constraints and in addition that neurons should have exponential response distributions.
Mean firing rates are likely to be limited by metabolic constraints.
The first prediction of the information per spike hypothesis is that there is a metabolic constraint on mean firing rate. Attwell and Laughlin (2001) and Lennie (2003) suggested that metabolic constraints do indeed limit the feasible firing rates of neurons in mammalian cortex. In independent calculations, these authors placed upper bounds on mean firing rate at approximately 4 and 1 Hz, respectively. Of course, it is still possible that these metabolic constraints are not important for neural coding: if the maximum feasible rates were much higher than those actually observed, we would conclude that metabolic constraints were not significantly constraining the neural code.
Therefore, we first measured mean firing rates evoked during a passive task where each animal maintained steady fixation on a centrally located spot while visual stimuli were flashed in and around the receptive field. In area V1, the mean was 18.5 Hz, in V2 it was 21.7 Hz, and in V4 it was 22.7 Hz. These rates are all substantially beyond the limits proposed by Attwell and Laughlin (2001) and Lennie (2003). However, these values may not be a good estimate of mean rates during natural viewing conditions, because the fixation task was broken into trials lasting up to only 5 s, and the stimuli consisted of a very rapidly changing sequence of images designed to reduce the effect of adaptation. These high firing rates were therefore not sustained for long periods and may be unusually high.
To obtain an estimate of firing rates under more naturalistic conditions, we measured mean firing rates of area V4 neurons during a naturalistic visual search task (Mazer and Gallant 2003). We believe this task provides a good assessment of firing rates that are likely to occur in V4 during natural vision. The stimuli were natural scene patches superimposed on a background with natural statistics, and animals performed the task using voluntary eye movements. Thus complex stimuli with relatively natural dynamics were present in and around the receptive field of each neuron throughout the task. The task is attentionally demanding, so it is reasonable to expect that the mean firing rates measured in V4 during this task approach the maximum rates that occur during natural vision.
Visual search trials lasted from 2 to 42 s, with a mean length of 16.2 s (see Fig. 8A). Figure 8B shows the distribution of mean firing rates in each trial. The mean of the distribution is 16.5 Hz. Although the firing rate of area V4 neurons during visual search is lower than that observed in V4 during passive fixation and stimulation with rapid sequences, it is still considerably above the metabolic limits proposed by Attwell and Laughlin (2001) and Lennie (2003).
There are a number of possible explanations for this difference. First, the theoretical maximum firing rates may be underestimates. Second, our method may overestimate mean firing rates because extracellular recording biases the sample toward neurons with high firing rates (Olshausen and Field 2005; see discussion). Alternatively, the brain might compensate by reducing activity in other brain areas when one area is particularly active.
Regardless of the explanation for this difference, we can clearly rule out the possibility that neural firing rates are well below the theoretical limits. In fact, the firing rates we observe are sufficiently high that the brain may not be able to sustain them for long periods. This suggests that either the proposed metabolic limits are likely to seriously constrain the structure of neural codes or the proposed metabolic limits need to be revised upward. In either case, it is very likely that the neural code has evolved to work within these metabolic limits.
Striate and extrastriate neurons have approximately exponential response distributions.
The relation between entropy and information means that neurons with high-entropy response distributions are capable of transmitting large amounts of information. The exponential is the maximum entropy distribution for positive values and a fixed mean (Cover and Thomas 1991). Therefore, if visual cortex maximizes information per spike, then the response distribution for each neuron should be exponential in shape. To determine whether response distributions in areas V1, V2, and V4 are exponential during passive fixation and stimulation with natural scenes, we fit the logarithm of lifetime response distributions with a quadratic (Eq. 4).
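The maximum-entropy property invoked above can be checked with a short variational argument. The sketch below treats the firing rate r as a continuous non-negative variable with density p(r) and fixed mean μ; this idealization, and the multiplier symbols λ0 and λ1, are assumptions introduced here for illustration rather than notation from the original analysis.

```latex
% Maximize differential entropy subject to normalization and a fixed mean.
\begin{align}
\text{maximize}\quad & h[p] = -\int_0^\infty p(r)\,\ln p(r)\,dr \\
\text{subject to}\quad & \int_0^\infty p(r)\,dr = 1, \qquad \int_0^\infty r\,p(r)\,dr = \mu .
\end{align}
% Stationarity of the Lagrangian with multipliers \lambda_0 and \lambda_1:
\begin{align}
-\ln p(r) - 1 - \lambda_0 - \lambda_1 r = 0
\quad\Longrightarrow\quad
p(r) = e^{-1-\lambda_0}\,e^{-\lambda_1 r} .
\end{align}
% Imposing the two constraints fixes the multipliers (\lambda_1 = 1/\mu):
\begin{align}
p(r) = \frac{1}{\mu}\,e^{-r/\mu}, \qquad
\ln p(r) = -\ln\mu - \frac{r}{\mu}, \qquad
h[p] = 1 + \ln\mu \ \text{nats}.
\end{align}
```

Because ln p(r) is linear in r with slope -1/μ, an exponential response distribution appears as a straight line after a log transform, so any curvature in the log-transformed distribution signals a departure from the exponential form.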
The log-transformed distributions are shown in Fig. 9, A–C. An exponential distribution would be well-fit by a straight line, requiring only the constant and linear terms. The quadratic term, b, therefore reflects the degree to which the distribution deviates from exponential: negative values indicate convexity, and positive values indicate concavity. The distributions of b in areas V1, V2, and V4 are shown in Fig. 9, D–F. In V1, the mean of b is not significantly different from 0 (t-test, P > 0.9; n = 41). This indicates that there is no systematic bias toward positive or negative curvature. In both V2 and V4, however, the means are significantly lower than 0 (V2: t-test, P << 0.001, n = 108; V4: t-test, P << 0.001, n = 73), indicating convex distributions in both areas. This means that neurons in both V2 and V4 produce more intermediate-strength responses than predicted by the exponential model.
The above analysis indicates that the exponential is not a complete description of response distributions in V2 and V4. To assess the size of this deviation from the exponential model, we compared the variance accounted for (r²) by the quadratic fits with the variance accounted for by a linear fit. The ratios of r² for the two fits are shown in Fig. 9, G–I. In all but a few neurons, the linear model accounts for over 90% as much variance as the quadratic model. This indicates that the deviations from the exponential, although significant, are small. We conclude that responses in V1, V2, and V4 are approximately exponentially distributed.
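The curvature test and the variance-ratio comparison can be sketched as follows. This is an illustration under stated assumptions, not the original fitting code: it assumes the quadratic has the form log p(r) = a + c·r + b·r² with b as the quadratic coefficient, uses ordinary least squares on the non-empty histogram bins, and all function names (`polyfit`, `r_squared`, `exponential_check`) and example histograms are hypothetical.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; coefficients ordered from x**0 upward."""
    m = degree + 1
    # Normal equations a @ coeffs = v, with a[i][j] = sum(x**(i+j)).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            v[row] -= factor * v[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (v[i] - tail) / a[i][i]
    return coeffs


def r_squared(xs, ys, coeffs):
    """Fraction of variance in ys accounted for by the fitted polynomial."""
    y_mean = sum(ys) / len(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    ss_res = sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return 1.0 - ss_res / ss_tot


def exponential_check(bin_centers, probabilities):
    """Return (b, ratio): quadratic coefficient and r2(linear) / r2(quadratic)."""
    # Empty bins are dropped because their logarithm is undefined.
    points = [(x, math.log(p)) for x, p in zip(bin_centers, probabilities) if p > 0]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    linear = polyfit(xs, ys, 1)
    quadratic = polyfit(xs, ys, 2)
    b = quadratic[2]
    return b, r_squared(xs, ys, linear) / r_squared(xs, ys, quadratic)


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical histograms over firing rate (Hz); NOT data from the study.
centers = [2.5 + 5.0 * i for i in range(10)]
exact_exponential = normalize([math.exp(-x / 20.0) for x in centers])
curved = normalize([math.exp(-0.02 * x - 0.001 * x * x) for x in centers])

b_exp, ratio_exp = exponential_check(centers, exact_exponential)
b_curved, ratio_curved = exponential_check(centers, curved)
```

For the exactly exponential histogram the fitted b is zero up to rounding error and the ratio is 1; for the curved histogram b recovers the negative quadratic coefficient and the ratio falls below 1, which is the quantity plotted per neuron in the variance-ratio comparison.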
In this paper, we have highlighted the differences between four forms of sparseness (emphasized in black in Fig. 2): metabolic constraints on the mean firing rates of neurons, lifetime sparseness, population sparseness, and maximization of information transmitted by each spike. Whether visual cortex maximizes sparseness clearly depends on how sparseness is defined.
Lifetime Sparseness or Information Maximization?
Theoretical studies have suggested that sensory neurons should maximize lifetime sparseness (Bell and Sejnowski 1997; Olshausen and Field 1996) and that they should maximize the information content of neuronal response distributions (Levy and Baxter 1996). These hypotheses have often been treated as identical, but in fact they make different predictions about how the structure of coding should progress across the visual hierarchy.
The hypothesis that neurons maximize lifetime sparseness suggests that lifetime sparseness should increase across the visual hierarchy from more peripheral to more central areas (e.g., from V1 to V2 to V4). Neurons in more central visual areas can theoretically use relatively more complex nonlinear codes, and such codes might have much higher lifetime sparseness than the quasi-linear coding scheme found in V1. We measured whether lifetime sparseness increases across the visual hierarchy by measuring responses of neurons in V1, V2, and V4 to flashed natural images. We find that responses in V4 and V2 are no more sparse than in V1 (Fig. 7). This suggests that the neural code in extrastriate visual areas V2 and V4 is not simply optimized to maximize lifetime sparseness.
Unfortunately, we do not currently have access to a data set that would be appropriate for examining neural population sparseness. Population sparseness depends on the correlations between neural responses (Fig. 6), so that a sample of neurons with similar receptive fields is likely to have lower population sparseness than a sample with dissimilar receptive fields. Thus, to accurately estimate population sparseness, one must densely sample neurons with similar and dissimilar receptive fields. In the data collected here, neurons were not recorded particularly close to one another. Therefore, in this report, we did not measure population sparseness.
The most obvious direct test of the information per spike hypothesis (Baddeley et al. 1997; Levy and Baxter 1996) would be to determine whether the rates of mutual information between stimulus and response are maximal. However, this is an ill-defined problem. Unless we know all the constraints that determine how neurons represent information, we cannot know whether the mutual information has been maximized subject to all relevant constraints. Instead, we tested two predictions that follow from this hypothesis.
The first prediction is that there is a constraint on the mean firing rates that neurons can achieve. To test this, we first determined whether mean firing rate is actually constrained in visual cortex by comparing firing rates in V4 during a challenging visual task with theoretical maximum values based on metabolic constraints (Fig. 8). We find that individual neurons sustain firing rates (mean 16.5 Hz) that are sufficiently high that they are likely to be constrained by metabolic considerations (which place maximum rates at 1–4 Hz). The discrepancy between these figures is likely to be accounted for by recording bias; see below. Alternatively, other parts of the brain might be particularly inactive during this task, compensating for the above-average activity of the visual cortex.
The second prediction of the information per spike hypothesis is that neuronal response distributions will be exponential (to maximize the entropy of response distributions given the limit on mean firing rate). We tested the quality of exponential fits of the response distributions of neurons in V1, V2, and V4 recorded during a passive fixation task where natural image patches were presented within and around the CRF of each neuron (Fig. 9). We find that response distributions in all three areas are well-described by exponential distributions and are therefore compatible with the information per spike hypothesis.
Taken together, our results suggest that neurons in early and intermediate visual areas are not simply optimized to maximize lifetime sparseness. This does not rule out the possibility that lifetime sparseness is just one of many constraints that neural codes must simultaneously satisfy. Higher visual areas represent relatively complex visual attributes, and this representation may place a limit on lifetime sparseness so that lifetime sparseness cannot increase through the visual hierarchy. Alternatively, the brain may maximize other forms of sparseness, such as population sparseness. Our results are also compatible with the hypothesis that neurons are optimized to maximize information transmission, subject to a constraint on mean firing rate.
Effect of Image Size on Lifetime Sparseness
The stimuli used to estimate lifetime sparseness in this study consisted of natural image patches scaled to a fixed proportion of the receptive field diameter of each neuron (3× in V1 and V2; 1.5× in V4). Because receptive field diameter increases significantly from V1 to V2 to V4, the image patches also increased in size across the visual hierarchy. One might argue that this procedure biased estimates of lifetime sparseness across areas. However, we believe that manipulating patch size in this way provides the most conservative way to test the hypothesis that lifetime sparseness increases through the visual hierarchy.
The ideal way to perform this experiment would be to present full-field stimuli to all neurons, thereby obtaining an ecologically relevant estimate of sparseness. However, it is not practical to use a monitor that covers the entire visual field. When using a smaller monitor, one is forced to choose between using patches of constant size or scaling the image patches according to receptive field size.
If we had used patches of constant size, we could have made them large enough to ensure that they always covered the CRF of each neuron completely, but coverage of the nonclassical receptive field (nCRF) would inevitably vary with receptive field size. Neurons with small CRFs would receive a lot of nCRF stimulation, and neurons with large CRFs would receive relatively less nCRF stimulation. This would tend to increase the sparseness of neurons with small CRFs and reduce the sparseness of neurons with large CRFs (Vinje and Gallant 2000). Because V4 neurons have larger CRFs than V1 neurons, this would make V4 appear to have lower lifetime sparseness than V1 and would lead us to wrongly reject the hypothesis that sparseness increases from V1 to V4.
For this reason, in this study, we chose to scale the size of each patch according to the size of the CRF. This ensures that the proportion of stimulation impinging on the CRF and the nCRF is constant both within and across visual areas. We believe that this provides a relatively unbiased estimate of lifetime sparseness regardless of CRF diameter.
It is important to remember that in our study, image patches were presented at 3× CRF diameter in V1 and V2 and 1.5× CRF diameter in V4. The small decrease in lifetime sparseness that we observed in V4 may indeed reflect this methodological difference. However, the data from V1 and V2 are directly comparable, and we find no increase in lifetime sparseness between those two areas.
Comparison with Previous Measurements
Our results are compatible with those of several previous experiments that have measured the statistics of response distributions in visual cortex under naturalistic conditions. Several studies have measured mean firing rates in visual cortex of anesthetized animals during natural image stimulation (Baddeley et al. 1997; Maldonado and Babul 2007; Ringach et al. 2002; Tolhurst et al. 2009; Yen et al. 2007). Others have recorded in awake animals during a variety of naturalistic tasks (Baddeley et al. 1997; DiCarlo and Maunsell 2000; Gallant et al. 1998; Griffith and Horn 1966; Guo et al. 2005; Legéndy and Salcman 1985). In particular, Baddeley et al. (1997) measured firing rates in primate inferotemporal cortex during presentation of natural stimuli. They observed significantly lower mean firing rates (4 Hz) than those observed here. This difference is probably explained by the different experimental paradigms. Baddeley et al. (1997) measured responses during passive fixation of a slowly changing movie, whereas the challenging free-viewing visual search task used here might be expected to produce higher firing rates.
Recording Bias in Electrophysiology
The mean firing rates we observe in V4 during the free-viewing visual search task are higher than the limits suggested by Attwell and Laughlin (2001) and Lennie (2003). There are several possible explanations for this discrepancy: neurons might compensate by being relatively silent after each period of activity, or other neurons might compensate by being relatively silent when visual cortex is involved in a difficult task. However, the most likely reason is that extracellular recording methods produce a biased sample (Olshausen and Field 2005). Extracellular electrophysiology experiments depend on the detection and identification of action potentials. Therefore, the more active a neuron is, the more likely it is to be identified and characterized. This problem is exacerbated when the procedure used to identify good candidates for systematic assessment uses either a restricted set of stimuli or a stereotyped behavioral task.
In the experiments that provided the data analyzed here, care was taken to minimize selection bias: neurons were recorded even if they responded weakly, and the stimuli and tasks used to search for neurons were quite different from those used to assess neuronal response properties. However, despite these efforts, it is likely that some sampling bias remains, and so the true mean firing rate, averaged over the population, is likely to be somewhat lower than that observed here.
These data and analyses demonstrate that we must be judicious in our use of the term sparseness and always careful to define exactly which form of sparseness is being measured or considered. In visual cortex, there is substantial evidence that sparseness at the earliest stages of cortical processing may approach theoretically maximal values. However, surprisingly, lifetime sparseness does not appear to increase substantially in extrastriate regions, despite the fact that nonlinearities increase across the visual hierarchy. Taken together, these results suggest that lifetime sparseness is not the primary constraint that defines the neural code in striate and extrastriate visual cortex. Instead, our data suggest that maximization of information may be an important coding constraint.
This work was supported by the National Eye Institute and the National Institute of Mental Health.
No conflicts of interest, financial or otherwise, are declared by the author(s).
We are grateful to R. J. Prenger for assistance with data collection and M. R. Krause for comments on the manuscript.
- Copyright © 2011 the American Physiological Society