Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, Missouri 63110
Submitted 22 August 2003; accepted in final form 8 June 2004
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Fluctuations in firing rate can be classified into 2 major types. First, firing rates tend to fluctuate randomly from one presentation to the next of the same stimulus (Shadlen and Newsome 1998; Tolhurst et al. 1983). Trial-to-trial fluctuation of firing rates (noise) and the effects of noise on the information carried by the neuron have been characterized extensively in various areas of the visual system (Abbott 1994; McAdams and Maunsell 1999; Shadlen and Newsome 1994; Wiener et al. 2001; also see Softky and Koch 1993). Second, the firing rate of a visual cortical neuron rarely stays constant during a sustained presentation of a given stimulus; the neuronal response to visual stimulation typically includes an initial transient peak followed by a declining response (Keysers et al. 2001; Lisberger and Movshon 1999; Müller et al. 1999, 2001; Oram and Perrett 1992; Tolhurst et al. 1980, 1983; also see Rieke et al. 1998). Explorations of the effects of this latter type of firing rate variation in the visual cortex have largely focused on the temporal dynamics of selectivity for various low-level stimulus characteristics, such as orientation, spatial frequency, contrast, and disparity, notably in area V1 (see, e.g., Albrecht et al. 2002; Bredfeldt and Ringach 2002; Frazor et al. 2004; Menz and Freeman 2003, 2004; Müller et al. 2001; Shapley et al. 2003; also see DISCUSSION). Additional studies have examined the temporal dynamics of selectivity for orientation contrast in V1 (Knierim and Van Essen 1992), illusory contours in V2 (Lee and Nguyen 2001), "border ownership" in V2 and V4 (Zhou et al. 2000), complex chromatic and achromatic shapes in inferotemporal cortex (Edwards et al. 2003; Keysers et al. 2001; Oram and Perrett 1992; Tovée et al. 1993), and motion-based features in middle temporal (MT), medial superior temporal (MST), and ventral intraparietal (VIP) areas (Cook and Maunsell 2002; Duffy and Wurtz 1997; Pack and Born 2001; also see DISCUSSION). However, neither the temporal dynamics of intermediate-level shape processing nor the interplay between temporal dynamics and the signal-to-noise ratio has been systematically studied in the extrastriate cortex.
Here, we examine the temporal dynamics of the response to shape stimuli of low to intermediate complexity in the extrastriate visual area V2. We previously showed that neurons in V2 carry detailed shape information about a diverse set of visual shapes, including a variety of gratings and contours (Hegdé and Van Essen 2000, 2003). Using this data set, we studied the temporal dynamics of 3 main types of firing rate variation in V2: 1) stimulus-to-stimulus variation, or signal; 2) trial-to-trial variation, or noise; and 3) cell-to-cell variation in the response to a given stimulus. Our study addressed the temporal dynamics of the shape information conveyed by the firing rate and not the information conveyed by the temporal pattern of the responses per se (McClurkin et al. 1991; Richmond and Optican 1990), or by synchronized firing among subsets of neurons (see Gray 1999; Salinas and Sejnowski 2001; Usrey and Reid 1999).
We find that the magnitude and the nature of the information carried by individual neurons and by the V2 cell population change substantially over the course of the response. The maximal response modulation at the individual cell level occurs during the initial response transients. Information about shape decorrelates over time for many individual cells and for the population, suggesting that shape representation in V2 changes in meaningful ways in association with temporal variations in the firing rate.
| METHODS |
|---|
|
|
|---|
(at 1 kHz) inserted transdurally into the cortex. All animal-related procedures used in this study were reviewed and approved in advance by the Washington University Animal Studies Committee.
Visual stimulation and recording
The animal fixated within a fixation window of 0.5° radius for a juice reward while stimuli were presented within the classical receptive field of the V2 cell under study. The stimulus set consisted of 48 grating stimuli and 80 contour stimuli (Fig. 1). Grating stimuli consisted of conventional sinusoidal gratings, as well as non-Cartesian (hyperbolic and polar) gratings. The contour stimuli consisted of bars, intersections (tristars, crosses, and 5- and 6-armed stars), angles (acute angles, right angles, and obtuse angles), arcs (one-quarter arcs, semicircles, and three-quarter arcs), and circles. The large contour stimuli were matched in size to the cell's preferred bar length (determined qualitatively during receptive field mapping), with the exception of larger obtuse angles and one-quarter arcs, which were scaled down 50% to ensure that they stayed within the classical receptive field. The smaller contours in all cases were one half the size of the corresponding larger contours.
|
Single cells were isolated using a window discriminator (Bak Electronics, Germantown, MD). The cell's receptive field was mapped using multiple mouse-driven bar or grating stimuli on the computer's monitor. The cell's preferred bar parameters, including preferred length, width, color, and orientation, were also determined during the manual mapping. For the recording, the stimulus set was reoriented for each cell according to the cell's preferred orientation (also see legend to Fig. 1). All stimuli were presented in the cell's preferred color (qualitatively assessed using a palette of 6 colors) over a uniform gray background. The line width of contour stimuli was determined by the cell's qualitatively estimated preferred bar width. The grating stimuli had a spatial frequency of 2, 4, or 6 cycles per receptive field diameter and had the same diameter as that of the receptive field and the same mean luminance as that of the background. Stimuli were presented sequentially for 300 ms each with a 300 ms interstimulus interval. Up to 6 stimuli were presented per trial in this fashion. The stimuli were randomly interleaved, so that the effects of systematic eye position drifts, and so forth, over the course of the trial were minimized. In general, the fixation jitter was much narrower for each of the 3 animals than the ±0.5° range allowed, and none of the animals showed any drifts or other systematic effects of fixation fatigue. To reduce the contributions of any receptive field nonuniformities, each stimulus was presented at 3 different jitter positions, spaced evenly from each other, and offset from the receptive field center by 25% of the receptive field radius.
The spikes were collected at a resolution of 1 ms on a Silicon Graphics Indigo2 workstation running custom-written experimental control software. Presentation of visual stimuli was synchronized with the spike-collection software at a temporal resolution of ±7 ms (i.e., 1/72 Hz, the screen refresh rate). The response to each stimulus was recorded over 12 randomly interleaved repetitions (except for 62 cells from one animal, which had 9 repetitions per stimulus, 3 at each jitter position). Only the data from trials throughout which the animal maintained fixation were further analyzed.
Data analyses
The data were analyzed using the statistical utility S-Plus (Insightful, Seattle, WA) or Matlab (The Mathworks, Natick, MA) or custom-written C language software. A total of 196 cells were recorded from 4 hemispheres of 3 animals. In 180 cells, at least one of the 128 stimuli evoked a response greater than the background at a significance level of P < 0.05 (2-tailed t-test with Bonferroni correction for 128 comparisons, α = 0.05/128; see Huberty and Morris 1989; also see Savitz and Olshan 1995; Thompson 1998); all 180 of these were included in this study.
ASSIGNING SPIKES INTO BINS. Spikes fired by the given cell between 0 and 300 ms after stimulus onset were analyzed in this study, unless specified otherwise. For most analyses, the spikes in response to each presentation of each stimulus were divided into 15 consecutive time bins of 20 ms each, together spanning 0–300 ms, so that the bins had identical binwidth but contained a variable number of spikes, depending on the cell's firing rate during the given bin. For some analyses, 15 consecutive bins that contained an equal number of spikes but had variable binwidth were used instead. To calculate these bins for a given cell, the spikes fired by the cell across all repetitions of all stimuli were pooled, preserving the relative temporal order in which they occurred during the 0–300 ms time interval. This ordered train of spikes was divided into 15 consecutive bins so that each bin contained an equal number of spikes; the final 14 or fewer spikes closest to the 300 ms mark were discarded so as to avoid fractional spike counts.
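The two binning schemes described above can be sketched as follows. This is a minimal illustration in Python, not the authors' original S-Plus, Matlab, or C code; the function names and the representation of spikes as lists of times in milliseconds are assumptions made for the example.

```python
def fixed_width_counts(spike_times_ms, n_bins=15, bin_ms=20.0):
    """Count the spikes of one stimulus presentation in equal-width bins.

    Spikes outside the 0 to n_bins * bin_ms window are ignored.
    """
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0.0 <= t < n_bins * bin_ms:
            counts[int(t // bin_ms)] += 1
    return counts


def equal_count_bins(pooled_spike_times_ms, n_bins=15):
    """Split pooled spikes into bins that each hold the same number of spikes.

    The spikes (pooled over all repetitions of all stimuli) are ordered in
    time; the latest spikes that do not fill a whole bin are discarded.
    Returns one (first_spike_time, last_spike_time) pair per bin.
    """
    times = sorted(pooled_spike_times_ms)
    per_bin = len(times) // n_bins
    if per_bin == 0:
        raise ValueError("fewer spikes than bins")
    kept = times[: per_bin * n_bins]  # drops at most n_bins - 1 late spikes
    return [
        (kept[i * per_bin], kept[(i + 1) * per_bin - 1])
        for i in range(n_bins)
    ]
```

With 15 bins, the integer division discards at most 14 spikes, which matches the "14 or fewer" figure given in the text.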
TESTS OF SIGNIFICANCE. Conventional parametric tests of significance were used where appropriate. In most cases, however, we used randomization analysis, by determining whether the value of a suitable test statistic calculated from the actual data differs significantly from the distribution of the same test statistic calculated from randomized data (see Edgington 1995; Manly 1991). For each test, an appropriate test statistic was first calculated using the actual neural response data. The data were then randomized across bins, trials, stimuli, and/or cells as appropriate, and the test statistic was recalculated using the randomized data. The randomization process was repeated 10^6 times [10^3 times in the case of multidimensional scaling (MDS) analyses described below]. The proportion of times the randomized test statistic exceeded the actual test statistic constituted the one-tailed probability P that the actual value of the test statistic was attributable to chance.
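The randomization procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the test statistic (variance of the per-stimulus mean responses), the choice to shuffle across stimuli, the function names, and the small default number of rounds are all assumptions made for the example.

```python
import random
from statistics import mean, pvariance


def between_stimulus_variance(responses):
    """Illustrative test statistic: variance of the per-stimulus means."""
    return pvariance([mean(trials) for trials in responses])


def shuffle_across_stimuli(responses, rng):
    """Reassign single-trial responses to stimuli at random.

    The number of trials per stimulus is preserved.
    """
    pooled = [r for trials in responses for r in trials]
    rng.shuffle(pooled)
    shuffled, start = [], 0
    for trials in responses:
        shuffled.append(pooled[start:start + len(trials)])
        start += len(trials)
    return shuffled


def randomization_p(responses, statistic, n_rounds=1000, seed=0):
    """One-tailed P: fraction of rounds in which the randomized statistic
    exceeds the statistic computed from the actual data."""
    rng = random.Random(seed)
    actual = statistic(responses)
    exceed = sum(
        statistic(shuffle_across_stimuli(responses, rng)) > actual
        for _ in range(n_rounds)
    )
    return exceed / n_rounds
```

Here `responses` is a list with one entry per stimulus, each entry holding the single-trial responses to that stimulus. Other analyses would swap in a different statistic and a different shuffling scheme (across bins, trials, or cells).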
In cases involving multiple comparisons, we adopted a stringent approach of using the Bonferroni correction (α = 0.05/n, where α is the probability of type I error and n is the number of comparisons; see Huberty and Morris 1989; also see Savitz and Olshan 1995; Thompson 1998).
INDICES. To measure the sharpness of a given cell's shape selectivity profile during a given bin, we used the tuning sharpness index (TSI), defined as [1 − (Rmean/Rmax)], where Rmean is the cell's mean response to all 128 stimuli and Rmax is the cell's response to its most effective stimulus. The TSI is sensitive to noise, given that random fluctuations in firing rate will tend to increase Rmax by chance without affecting Rmean, thus increasing TSI.
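As a small worked illustration of the TSI definition above (a sketch only; the function name and the input format, one trial-averaged response per stimulus, are assumptions):

```python
def tuning_sharpness_index(mean_responses):
    """TSI = 1 - (Rmean / Rmax) for one cell during one time bin.

    mean_responses holds the cell's trial-averaged response to each stimulus.
    """
    r_max = max(mean_responses)
    if r_max <= 0:
        raise ValueError("TSI is undefined when the best response is not positive")
    r_mean = sum(mean_responses) / len(mean_responses)
    return 1.0 - r_mean / r_max
```

A cell that responds equally to every stimulus has a TSI of 0, whereas a cell whose best response is 5 times its average response has a TSI of 0.8.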
To measure the modulation of a given cell's responses above random noise levels by the stimuli, we used the response modulation index (RMI). To calculate RMI for a given cell during a given bin, we first calculated the conventional F ratio of the cell's responses to the given set of stimuli, given by F = MSbetween/MSwithin, where MSbetween is the stimulus-to-stimulus variance across trials (or, equivalently, the between-stimulus mean squares), and MSwithin is the average trial-to-trial variance. The F ratio is an explicit measure of the signal-to-noise ratio because MSbetween and MSwithin are measures of the signal and the noise, respectively. Note that MSbetween values subsume the corresponding MSwithin values, so that MSbetween represents both the stimulus-to-stimulus variance and trial-to-trial variance, whereas MSwithin represents only the trial-to-trial variance (see Brase and Brase 1995). To calculate RMI, we randomized the responses across the stimuli and recalculated the F ratio. RMI was defined as the F ratio calculated from the actual data divided by the average F ratio from the randomization rounds. This normalization of the actual F ratio with the average randomized F ratio effectively corrected for deviations of the data set from normality. We did not use information-theoretic measures to quantify information, mainly because the number of repetitions per stimulus in our data set (9–12, depending on the cell) was too small for such analyses. Briefly, small sample sizes (i.e., small numbers of repetitions) result in systematic biases in the information-theoretic estimates of the information conveyed (Panzeri 1996; Panzeri and Treves 1996). To correct for these biases, it is desirable to have at least as many repetitions per stimulus as there are stimuli in the stimulus set (128 in our case); smaller numbers of repetitions will result in systematic overestimation of the information conveyed (Panzeri 1996, p. 91–94; Panzeri and Treves 1996, p. 90–100; also see Rolls et al. 2003). Note that ad hoc corrections, including those suggested by Chee-Orts and Optican (1993), do not adequately correct for this error (see Panzeri 1996, p. 91–94; Panzeri and Treves 1996, p. 88).
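The RMI calculation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the F ratio is computed as in a standard one-way ANOVA, the function names and the default number of randomization rounds are hypothetical, and the sketch does not guard against a shuffle in which every trial within a stimulus is identical (which would make MSwithin zero).

```python
import random


def f_ratio(responses):
    """One-way ANOVA F ratio, F = MSbetween / MSwithin.

    responses[s] holds the single-trial responses to stimulus s.
    """
    k = len(responses)
    n_total = sum(len(trials) for trials in responses)
    grand_mean = sum(sum(trials) for trials in responses) / n_total
    means = [sum(trials) / len(trials) for trials in responses]
    ss_between = sum(
        len(trials) * (m - grand_mean) ** 2
        for trials, m in zip(responses, means)
    )
    ss_within = sum(
        (x - m) ** 2 for trials, m in zip(responses, means) for x in trials
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def response_modulation_index(responses, n_rounds=200, seed=0):
    """RMI = actual F ratio / mean F ratio after shuffling across stimuli."""
    rng = random.Random(seed)
    actual = f_ratio(responses)
    pooled = [x for trials in responses for x in trials]
    sizes = [len(trials) for trials in responses]
    total = 0.0
    for _ in range(n_rounds):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        total += f_ratio(shuffled)
    return actual / (total / n_rounds)
```

An RMI near 1 indicates that the stimuli modulate the response no more than expected from trial-to-trial noise; larger values indicate stronger stimulus-driven modulation.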
CALCULATING POPULATION AVERAGES. To calculate the population average of a given metric, we first normalized the values of the metric across all time bins to a maximum of 1.0 for each cell to correct for cell-to-cell differences in the response magnitude. We then averaged the normalized values across all 180 cells separately for each bin. SE values were calculated for each bin as the SE of the normalized values across all cells in that bin. The only exception to this procedure was the population average of the sharpness of tuning index (TSI) values shown in Fig. 4D, which were averaged across the population without being normalized, because the TSI values ranged from 0 to 1 to begin with.
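The normalization and averaging steps can be sketched as follows. This is a minimal illustration; the function name and the input layout (one list of per-bin values per cell) are assumptions, and the sketch assumes at least 2 cells and a positive maximum for each cell.

```python
def population_average(metric_by_cell):
    """Normalize each cell's metric to a maximum of 1.0 across time bins,
    then return the per-bin mean and SE across cells.

    metric_by_cell[c][b] is the metric for cell c during bin b.
    """
    normalized = [
        [value / max(cell) for value in cell] for cell in metric_by_cell
    ]
    n_cells = len(normalized)
    n_bins = len(normalized[0])
    means, sems = [], []
    for b in range(n_bins):
        column = [cell[b] for cell in normalized]
        m = sum(column) / n_cells
        # Sample variance across cells; SE = sqrt(variance / n_cells).
        variance = sum((v - m) ** 2 for v in column) / (n_cells - 1)
        means.append(m)
        sems.append((variance / n_cells) ** 0.5)
    return means, sems
```

For metrics that already lie between 0 and 1, such as the TSI, the normalization step would simply be skipped.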
|
= 10 ms), and the time value corresponding to the peak was determined to the nearest millisecond. We used randomization analysis (see above) to determine whether the peaks were the result of noise. To do this for the RMI peak for a given cell, we randomized the spike counts of the cell across all bins and stimuli and recalculated the RMI for each bin. We then smoothed the resulting 15-bin time histogram and determined the peak as for the nonrandomized data. This procedure was repeated for 1,000 rounds. The proportion of rounds during which the peak was at least as tall as that from the actual data represented the one-tailed probability P that the peak from the actual data was attributable to noise. The statistical significance of the firing rate peaks was determined similarly.
Analysis of response correlations at the population level
To analyze patterns of response correlation across the population, we used metric MDS and principal components analysis (PCA), as described in Hegdé and Van Essen (2003). Briefly, we used a 128 × 128 correlation matrix of the population response as the input to MDS or PCA. Each element of the matrix represented the correlation coefficient of the responses of the V2 cells (averaged across trials, but non-normalized) to a given pair of the 128 stimuli (see Fig. 8A; also see Hegdé and Van Essen 2003; Kachigan 1991, p. 147).
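The construction of the stimulus-by-stimulus correlation matrix can be sketched as follows. This is a minimal illustration that assumes the Pearson correlation coefficient and a hypothetical input layout (one row of trial-averaged responses per cell); it is not the authors' code.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def stimulus_correlation_matrix(rates):
    """rates[c][s] is the trial-averaged response of cell c to stimulus s.

    Element (i, j) of the result is the correlation, across cells, between
    the population responses to stimulus i and stimulus j.
    """
    n_stim = len(rates[0])
    columns = [[cell[s] for cell in rates] for s in range(n_stim)]
    return [
        [pearson_r(columns[i], columns[j]) for j in range(n_stim)]
        for i in range(n_stim)
    ]
```

For 128 stimuli this yields a symmetric 128 × 128 matrix with ones on the diagonal, which would then serve as the input to MDS or PCA.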
|
ANALYSIS OF MDS CLUSTERS: D RATIO TEST. We used randomization analysis to determine whether the clustering of stimuli, if any, in a given MDS plot was significantly nonrandom. The test statistic was the D ratio, which was directly analogous to the conventional F ratio, making the D ratio test a direct analog of the F test. An MDS plot was first generated using the original 128 × 128 correlation matrix described above. Clusters of data points were provisionally identified from a visual examination of the plot and used to calculate the D ratio, defined as the variance of the between-cluster distances divided by the mean variance of within-cluster distances. The correlation matrix was then randomized and an MDS plot was generated from the randomized matrix. The D ratio was calculated for this MDS plot using the original composition of the clusters. This randomization was repeated as described above (see TESTS OF SIGNIFICANCE), and the clustering in the original matrix was considered significantly nonrandom if the resulting P value was <0.05.
COPHENETIC CORRELATION. Cophenetic correlation is a method of calculating the correlation coefficient between paired matrices or other high-dimensional data (for overviews, see Sneath and Sokal 1973; Sokal and Rohlf 1962). We used a version of this method to measure the similarity between a given pair of MDS plots or correlation matrices (for details, see Hegdé and Van Essen 2003). Like the conventional correlation coefficient, the values of the cophenetic correlation coefficient rC vary from 1.0 (perfect correlation) to 0.0 (no correlation) to −1.0 (perfect anticorrelation).
PCA (S-Plus routine princomp) simplifies complex, high-dimensional data by identifying a small number of factors that underlie global patterns in the data and determining the extent to which each factor, or principal component, accounts for variance in the data. In a manner analogous, but not identical, to multiple linear regression, PCA linearly transforms an original set of variables into a smaller set of independent (uncorrelated) variables (principal components) that represent most of the information in the original set of variables. If the given data set is highly correlated (or equivalently, redundant or low-dimensional), a small number of principal components will account for a large proportion of the data, and the proportion of data explained will tend to fall off sharply from the top component on. As the input data become increasingly decorrelated (or increase in dimensionality, or decrease in redundancy), it takes progressively more principal components to account for a given proportion of the data, and successive principal components will tend to account for more comparable proportions of data. We used this technique to assess the temporal variations in the dimensionality of the V2 population response.
| RESULTS |
|---|
|
|
|---|
When tested with the stimulus set shown in Fig. 1, many V2 cells responded with a brisk initial transient response that rapidly decayed to a sustained level at a lower firing rate. In Fig. 2A, the response pattern for an exemplar V2 cell is shown as the average firing rate across all stimulus presentations, superimposed on individual raster patterns that are ordered in the actual sequence of stimulus presentation. The cell had low spontaneous activity (4 spikes/s), responded with a latency of about 30 ms after stimulus onset, fired maximally (121 spikes/s) about 50 ms after stimulus onset, declined rapidly to below 20 spikes/s, then rebounded to a sustained rate of 20–30 spikes/s. The average response for the entire population of V2 cells (Fig. 2B, solid line) showed a similar response pattern, including a rapid transient response (35–85 ms after the stimulus onset), a sustained response at about 40% of the peak rate, and a modest OFF-response (arrow).
|
Figure 3 illustrates the spike density function (SDF) for individual stimuli presented to the exemplar cell, with each row showing the SDF for a single stimulus averaged across 9 repetitions. Numerous stimuli elicited strong responses during the 40–60 ms interval after the stimulus onset, but most of those were far less effective later in the stimulation period. In contrast, a few stimuli, including the large acute angles at 0 and 180° (rows 81 and 83), elicited a relatively strong response throughout the stimulus presentation.
|
During the 160–180 ms time window (Fig. 4B), the most effective stimuli were the large acute angles at 180° (122 spikes/s, third row) and 0° (120 spikes/s, first row). The cell responded poorly to most other stimuli, including many that were effective during the 40–60 ms bin (e.g., the intermediate frequency radial grating, third row). The TSI increased to 0.8 for the 60–80 ms window, corresponding to a 5-fold ratio between the best and the average response, and it remained near this level for the remainder of the stimulus presentation (Fig. 4D, thin red line). Overall, the cell's shape-selectivity profiles during the 40–60 ms bin versus the 160–180 ms bin were poorly correlated (correlation coefficient r = 0.17). The shape-selectivity profile calculated over the entire 300 ms interval (Fig. 4C) largely resembled that during the 160–180 ms bin (r = 0.69), indicating that there was reasonable consistency in shape selectivity after the initial transient. Together, these results indicate that the response profile of the exemplar cell substantially decorrelated over the course of the response.
The broader tuning (i.e., decreased TSI) during initial transients might in principle reflect a saturation effect, in which many stimuli elicit a near-maximal firing rate. If saturation occurred and were related to the cell's relative refractory period, the variance/mean ratio (VMR) of the responses across all stimuli should decrease during the initial transient, as indeed occurs in V1 cells responding to flashed gratings (Müller et al. 2001). The VMR computed for the exemplar cell was 0.55 for the 40–60 ms window, increased to about 0.8 in the 60–140 ms window, then declined to about 0.65 after 160 ms. This is consistent with some initial saturation in the exemplar cell. However, for the population response, the VMR (thick blue line) was maximal during the 40–60 ms bin during which TSI was lowest and declined subsequently as the TSI values (thick red line) increased. Thus response saturation related to refractoriness is unlikely to be a general explanation for the increases in tuning sharpness after the initial transient.
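For reference, the variance/mean ratio used in the saturation argument above can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and the use of the sample variance (n − 1 denominator) over whatever set of responses is pooled for a given bin is an assumption, as the text does not specify these details.

```python
def variance_mean_ratio(responses):
    """Variance/mean ratio (VMR) of a set of responses from one time bin.

    Assumes at least 2 responses with a positive mean.
    """
    n = len(responses)
    m = sum(responses) / n
    variance = sum((r - m) ** 2 for r in responses) / (n - 1)
    return variance / m
```

A VMR well below its value in other bins during the transient would be the signature of saturation; the population data show the opposite pattern.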
Temporal dynamics of the signal-to-noise ratio
To measure the information conveyed by each neuron after accounting for trial-to-trial fluctuations (i.e., noise), we used the response modulation index (RMI), which is based on the F ratio and provides an explicit measure of the signal-to-noise ratio (see METHODS). In one analysis, the time course of the RMI was calculated using 15 equal time bins (20 ms/bin). Because the use of equal binwidth could in principle underestimate the response modulation during intervals of low average responses, we carried out a separate analysis in which the bins for each cell were adjusted to give an equal total spike count (see METHODS). The results for both analyses were very similar and are illustrated here only for the equal time bin analysis.
Figure 5A shows results for the exemplar cell, with the RMI indicated by the thick solid line and large dots. Measures contributing to the RMI, the conventional F ratio (i.e., the signal-to-noise ratio, medium dotted line), and randomized average F ratio (thin dashed line), are also shown (see legend for details). The RMI had values of 2.8 and 2.9 during the first 2 responsive bins (20–40 and 40–60 ms), rose to a peak of 6.0 during the 60–80 ms bin, then decreased to about 5 from 100–180 ms and about 3.5 from 180–300 ms. Thus the responses of this cell conveyed significant shape information with a time course that differed substantially from the cell's mean firing rate.
|
To assess whether the temporal dynamics depend strongly on the different stimulus classes within our overall stimulus set, we carried out the same analyses, but using the responses to only the 12 sinusoidal stimuli (medium gray line in Fig. 5B) or only the 4 oriented bars (thin black line). The results were statistically indistinguishable (one-way ANOVA, P > 0.05) for both the exemplar cell (data not shown) and for the population as a whole (Fig. 5B), indicating that the information conveyed about spatial frequency and/or orientation follows a similar temporal dynamic pattern as that about the stimulus set as a whole.
Reliability of response modulation during various phases of the response
The population averages of RMI, though informative, do not address the question of whether or to what extent the response modulation was statistically significant during various bins. To address this issue, we determined, for each bin, the cells for which the evoked responses were both significantly above background (as measured by a 2-tailed t-test, P < 0.05 with Bonferroni correction) and were modulated significantly above random levels (as measured by the RMI, P < 0.05 with Bonferroni correction) during the same bin. Of the 180 cells, 161 cells (89%) met both criteria for at least one bin. For each of these 161 cells, we determined the bin during which the cell conveyed the earliest, largest, and latest significant shape information by the above 2 criteria, as illustrated schematically in Fig. 6A. The results are shown in Fig. 6, B–D. About one third (58/161, 36%) and eight tenths (127/161, 79%) of the cells conveyed significant shape information within 40 and 60 ms of the stimulus onset, respectively (Fig. 6B). About one half (81/161, 50%) and two thirds (113/161, 70%) of the cells conveyed maximal shape information within the first 60 and 80 ms after the stimulus onset, respectively (Fig. 6C). About three tenths of the cells (48/161, 30%) conveyed significant shape information as late as the last (i.e., 280–300 ms) bin, although at lower RMI values (Fig. 6D).
|
Because the modulation of a cell's response is closely related to its firing rate, we investigated whether and to what extent the 2 parameters were correlated for a given cell. To address this issue, we carried out 2 analyses. First, we compared the time of maximal response with the time of maximal response modulation for each cell (Fig. 7A). For 151 (84%) of the 180 cells (denoted by filled circles in the scatter plot), the peak response and peak response modulation were both significantly higher than expected from chance, as determined by randomization (P < 0.05 with Bonferroni correction; see METHODS). For these cells and for V2 cells in general, the peak in response modulation was substantially delayed relative to the initial response transients, to an even greater degree than for the exemplar cell. For 30% of the overall population (54/180) and of the cells denoted by filled circles (46/151), the response modulation peaked more than 20 ms after the peak in firing rate. For 23 of these 46 cells (50%), and for 21% of the overall population (37/180), the response modulation (as measured by the RMI value) was significantly larger (P < 0.05 with Bonferroni correction) for the bin during which the RMI was maximal than for the bin during which the firing rate was maximal. The overall correlation between peak response and maximum response modulation was poor across all V2 cells (r = 0.19), and across the 151 cells with significant peaks in the response and the response modulation (r = 0.10). In the second analysis, we calculated the r value between RMI and average neuronal response across all bins for each cell. The r values varied considerably from one cell to the next (Fig. 7B). The correlation was not statistically significant (P > 0.05, r < 0.58, one-tailed Pearson product moment correlation) for about three fifths of the cells (105/180, 58%; open bars). The average r value for all V2 cells was 0.38. Thus the mean response was a poor predictor of response modulation for many V2 cells.
|
The results presented thus far deal with the temporal dynamics of response modulation at the level of individual cells. We also studied the temporal dynamics of the population response, specifically to determine whether and to what extent the population response decorrelated over time. To do this, we calculated a correlation matrix of the population response during each 20 ms bin, so that each element of a given matrix represented the correlation coefficient r between the responses of all V2 cells to a given pair of stimuli (see Fig. 8A and METHODS). The matrices corresponding to 4 selected bins are shown in Fig. 8B.
The matrices varied significantly across the bins (one-way ANOVA, P < 0.05, data not shown). Figure 8C shows the mean r value (±SE) from the correlation matrix corresponding to each bin. During the first bin (0–20 ms) (i.e., before the response onset), the mean r value was low (0.23), and the r values were distributed across stimuli apparently randomly (Fig. 8B, panel 1). The mean r values increased sharply over the next 2 bins, peaking at 0.4 during the 40–60 ms bin, when the population average response was maximal (cf. Fig. 2B).
The population response decorrelated progressively from 60 to 160 ms, then stabilized at a low level (0.19–0.22) comparable to that of the 0–20 ms bin. The SE of the r values (error bars in Fig. 8C) varied little across all 15 bins (range, 0.009–0.01), even as the mean r values varied from one bin to the next, indicating that the r values rose or fell consistently across the various stimuli from one bin to the next.
During the 40–60 ms bin, the response correlation was highest among the gratings (Fig. 8B, panel 2, stimuli 1–48) and some large contour stimuli. This pattern of correlations is more clearly visualized by the MDS plot of this correlation matrix (Fig. 8D), in which the stimuli that elicited relatively consistent responses across the population (i.e., elements of the correlation matrix with similar colors) are clustered together, whereas stimuli that elicited disparate responses from one cell to the next are dispersed farther apart (see METHODS for details). If the response correlations vary randomly among stimuli, the stimuli are expected to scatter randomly, with no significant clustering. If the responses were perfectly correlated or uncorrelated (i.e., r = 1.0 or r = 0.0, respectively) across all stimuli, all stimuli will fall on a single point. Three clusters were identifiable in the MDS plot, one containing all of the grating stimuli (red cluster), another containing 17 large angles and intersections (green cluster), and a third (blue cluster) containing the remaining contour stimuli. These clusters were highly significant as measured by the D ratio test (where the D ratio is a direct analog of the F ratio; see METHODS), with 0/1,000 rounds passing the criterion, indicating that the underlying correlation patterns were highly nonrandom. The pattern in Fig. 8D is very similar to those in a previous analysis of the same data set (Hegdé and Van Essen 2003) that used a somewhat broader time window for analysis (36–134 ms [mean]). The distinction between gratings and contours clearly underlay the separation of the red cluster from the other 2 (green and blue) clusters, whereas the selectivity for stimulus size and for specific angles and curved stimuli may underlie the differentiation between the 2 (blue vs. green) contour clusters (see Hegdé and Van Essen 2003). For later bins, the clustering of stimuli was progressively less pronounced, although still discernible in both the MDS plot (Fig. 8E for the 280–300 ms window) and the correlation matrix (Fig. 8B, panels 3 and 4), indicating that the population response was substantially but not completely decorrelated after this point.
To assess the complexity of the population response during each bin, we analyzed each correlation matrix using PCA (see METHODS). The proportion of the data accounted for by the most informative principal components, and the number of principal components required to account for a criterion amount of the data, are 2 measures of the dimensionality, or complexity, of the data (see Kachigan 1991, p. 246–247). If the population response during a given bin were perfectly decorrelated, each principal component would be expected to account for 0.78% of the total response variation (given a set of 128 stimuli). Figure 9 shows the 10 most informative principal components for each of the 15 bins. During the 0–20 ms bin, before the response onset, the first 10 principal components together accounted for only 46% of the variance. During the 40–60 ms bin, when the population was maximally correlated (see Fig. 8, B–D), the first principal component by itself accounted for 46% of the variance, and the top 3, 5, and 10 components together accounted for 78, 85, and 92% of the data, respectively. Thus a relatively small number of response dimensions accounted for most of the population response during the early transient response. The complexity of the population response increased markedly between 60 and 160 ms; between 160 and 300 ms, the first principal component accounted for 13–18% of the variance and the first 10 components accounted for 58–62% of the variance. We obtained qualitatively similar results when these analyses were repeated using bins with equal spike counts, described above (data not shown).
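The 0.78% baseline quoted above follows from dividing the total variance equally among 128 components. As a short worked calculation (the comparison with the 46% figure is our own arithmetic, added for scale):

```latex
\frac{100\%}{128} = 0.78125\% \approx 0.78\%,
\qquad
\frac{46\%}{0.78125\%} \approx 59
```

In other words, during the 40–60 ms bin the first principal component alone accounts for roughly 59 times the variance expected of a single component in a perfectly decorrelated population response.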
Temporal changes in the response profiles of individual V2 cells
The decorrelations of the population response described above might in principle result from a decorrelation of the response profile of a subset of cells, or from a widespread decorrelation across the population. To distinguish between the 2 scenarios, we measured the extent to which the response profiles of individual cells changed over time. To do this, we calculated the correlation coefficient r between the given cell's response during a given pair of bins for each pairwise combination of the 15 bins (see METHODS).
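As a rough illustration of this per-cell measure, the sketch below computes the Pearson correlation between one cell's response profiles for every pair of bins. The array layout (15 bins by 128 stimuli), the function name, and the synthetic drifting cell are assumptions made for illustration; the actual procedure is the one described in METHODS.

```python
import numpy as np

def bin_correlations(profiles):
    """Pearson r between one cell's response profiles in every pair of bins.

    profiles: array of shape (n_bins, n_stimuli); row i holds the cell's mean
    response to each stimulus during bin i. Returns an (n_bins, n_bins) matrix.
    A bin with a constant profile (e.g., no spikes at all) yields NaN.
    """
    return np.corrcoef(np.asarray(profiles, dtype=float))

n_bins, n_stimuli = 15, 128
rng = np.random.default_rng(1)

# Hypothetical cell whose stimulus preferences drift from an "early"
# profile to an unrelated "late" profile, with independent noise per bin.
early = rng.normal(size=n_stimuli)
late = rng.normal(size=n_stimuli)
weights = np.linspace(1.0, 0.0, n_bins)[:, None]
profiles = (weights * early + (1.0 - weights) * late
            + 0.3 * rng.normal(size=(n_bins, n_stimuli)))

r = bin_correlations(profiles)
r_vs_first = r[0]  # correlation of every bin with the first bin

print(np.round(r_vs_first, 2))
```

Reading one row of the matrix gives the kind of curve plotted against a reference bin: values near 1 indicate an unchanged response profile, and values near 0 indicate that the profile has decorrelated from the reference.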
Figure 10A illustrates the changes in the response profile of the exemplar cell over the course of the response relative to its response profiles during each of the 4 earliest bins. The thick solid line shows the r values across all bins measured relative to the 20–40 ms bin, the earliest bin during which the cell's responses were both significantly elevated from background levels and significantly modulated (see Fig. 6A). The r value during the next bin (40–60 ms bin) was 0.20, indicating that the response profiles decorrelated substantially between the 2 bins. In the subsequent bins, the r values stabilized at a lower level (range, 0.06–0.17), slightly above the chance level (horizontal arrow on the y-axis). Similar results were obtained when the correlation coefficients were calculated relative to the 40–60 ms bin (thick dashed line) or to the 60–80 ms bin (thin dotted line), except that the average r values stabilized at a higher level (mean r = 0.18).
The nature of the temporal changes in shape profiles was diverse; we found no clear patterns across subpopulations of cells in this regard. That is, given the response profile of an individual cell during the early part of the response, it was not possible to predict with any accuracy how the response profile of the cell might change later in the response, other than the trend toward decorrelation already discussed.
| DISCUSSION |
|---|
|
|
|---|
As noted in the INTRODUCTION, many previous studies have addressed the temporal dynamics of various response metrics used in other visual areas or sensory systems. In the visual system, LGN (lateral geniculate nucleus) neurons temporally decorrelate in response to natural movies, but not to white noise, by a process involving linear temporal filtering characteristics of neurons (Dan et al. 1996
). In the visual cortex, many previous studies have explored the temporal dynamics of individual cell responses, especially in area V1 (see, e.g., Albrecht et al. 2002
; Bredfeldt and Ringach 2002
; Celebrini et al. 1993
; DeAngelis et al. 1993
; Jones and Palmer 1987
; Mazer et al. 2002
; Menz and Freeman 2003
, 2004
; Müller et al. 2001
; Ringach et al. 1997
; Shapley et al. 2003
) and in the inferior temporal cortex (Edwards et al. 2003
; Keysers et al. 2001
; Oram and Perrett 1992
; Tovée et al. 1993
). Recently, Müller et al. (2001)
analyzed the temporal dynamics of the signal-to-noise ratio in area V1 of the anesthetized monkey and found that V1 cells convey much more information during the initial transients than during later periods of the same duration. They reported time-dependent changes in the contrast response function but not in orientation tuning curves, and they found that orientation tuning estimated from the first 150 ms of response was indistinguishable from that estimated over the first 1,250 ms. In contrast, Zohary et al. (1990)
reported increased orientation discriminability as the integration window increased from 60 to 100 ms, and Ringach et al. (1997)
reported significant time dependency of orientation tuning using a finer-grained temporal analysis (10 vs. 50 ms bins). In the domain of stereopsis, Menz and Freeman (2003
, 2004
) showed that binocular disparity tuning in cat visual cortex sharpens with time, consistent with coarse-to-fine processing.
In the olfactory system, it has been recently reported that the population response of zebrafish mitral cells to odors decorrelates over a period of 1 s or so (Friedrich and Laurent 2001
, 2004
). This is qualitatively analogous to our observations, except that it occurs over a much slower time course (
800 ms) and is not associated with an increased sharpness of tuning.
An interesting possibility is that the broadly tuned transient responses of V2 neurons provide a relatively low-dimensional representation that subserves a rapid, relatively coarse-grained initial shape analysis (e.g., predator or not?), whereas the more sharply tuned sustained responses provide a higher-dimensional representation that subserves finer-grained discriminative capacities (e.g., which prey is better?) (see Churchland et al. 1994
; Field 1995
; Friedrich and Laurent 2001
; Roweis and Saul 2000
; Seung and Lee 2000
; Tenenbaum et al. 2000
; also see Edwards et al. 2003
, and references therein).
Many lines of psychophysical evidence support the notion that coarse-grained object recognition can occur on a faster timescale, whereas finer-grained object recognition is slower (Donders 1969
; Luce 1986
; Posner 1978
). Reaction times and processing times tend to be shorter for simple detection or categorization tasks than for discrimination tasks (Fabre-Thorpe et al. 1998
; Liu et al. 2002
; Luce 1986
; Posner 1978
; also see Treisman 1988
). In addition, when visual stimuli are flashed only briefly, followed by a mask or as part of a continuous image sequence, subjects can often accurately detect the presence of a given visual object, but not the precise features of the object (Fabre-Thorpe et al. 1998
; Fize et al. 2000
; Keysers et al. 2001
; Reynolds 1981
; Thorpe et al. 1996
).
Neural mechanisms of response decorrelation
What aspects of neural circuitry might account for the changes in sharpness of tuning and the progressive decorrelation across the neuronal population? Saturation of responses during the initial transient could be one contributing factor, but our analysis of response variance relative to the mean suggests that the relative refractory period is not the main factor. However, there might be alternative forms of saturation (e.g., involving contrast normalization mechanisms) that are consistent with our data.
A response decay after an initial transient can arise from depression of cortical excitatory synapses (Markram and Tsodyks 1996
; Müller et al. 2001
; Varela et al. 1997
). However, the fact that V2 responses decorrelate while they decrease in magnitude suggests that the decay is not simply a function of generalized, nonspecific synaptic depression. Response decorrelations in visual cortex may also arise from recruitment of intracortical inputs that shape classical receptive field responses, from context-dependent effects outside the classical receptive field (see Vinje and Gallant 2000), or as a function of recent stimulation of the receptive field (see Müller et al. 1999
; also see Nelson 1991
).
Time as a coding dimension
Given that the shape of tuning profiles of V2 neurons can vary over time, it is obviously of interest to know whether the processes used to decode this information in other visual areas must take these characteristics into account. The notion that the temporal pattern of sensory responses is used to code meaningful information has been suggested by others for the visual system (McClurkin et al. 1991
; Richmond and Optican 1990
) and the olfactory system (Friedrich and Laurent 2001
). This notion of temporal structure in mean firing rates is distinct from temporal coding hypotheses based on synchronized firing among subsets of neurons (for reviews, see Gray 1999
; Salinas and Sejnowski 2001
; Usrey and Reid 1999
). However, both types of temporal coding, though intriguing, remain controversial (Averbeck and Lee 2004
; Shadlen and Movshon 1999
) and are difficult to test incisively.
| GRANTS |
|---|
|
|
|---|
| ACKNOWLEDGMENTS |
|---|
|
|
|---|
Present address of J. Hegdé: Vision Center Laboratory, The Salk Institute for Biological Studies, 10010 N. Torrey Pines Rd., La Jolla, CA 92037.
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: D. Van Essen, Department of Anatomy and Neurobiology, Box 8108, Washington University School of Medicine, St. Louis, MO 63110 (E-mail: vanessen{at}brainvis.wustl.edu).
| REFERENCES |
|---|
|
|
|---|
Adrian ED. The impulses produced by sensory nerve endings. Part I. J Physiol 61: 49–72, 1926.
Albrecht DG, Geisler WS, Frazor RA, and Crane AM. Visual cortex neurons of monkeys and cats: temporal dynamics of the contrast response function. J Neurophysiol 88: 888–913, 2002.
Averbeck BB and Lee D. Coding and transmission of information by neural ensembles. Trends Neurosci 27: 225–230, 2004.